# Ex 4 - Datacube

This notebook briefly describes how to use the `Datacube` class to easily read and process ICEYE datacubes. 

In [1]:
from pathlib import Path
import os
import icecube
from icecube.bin.datacube import Datacube
from icecube.bin.generate_cube import IceyeProcessGenerateCube

Please configure the paths below as needed.

In [2]:
resource_dir = os.path.join(str(Path(icecube.__file__).parent.parent), "tests/resources")
grd_raster_dir = os.path.join(resource_dir, "grd_stack")

# vector labels used in this example
vector_labels_fpath = os.path.join(resource_dir, "labels/dummy_vector_labels.json")
# Excerpt (truncated) of "labels/dummy_vector_labels.json":
#[
#    {
#        "product_file": "ICEYE_GRD_SLED_54549_20210427T215124_hollow_10x10pixels_fake_0.tif",
#        "labels": {
#            "objects": [
#                {
#                    "class": "rand-b",
#                    "bbox": {
#                        "xmin": 2,
#                        "ymin": 5,
#                        "xmax": 9,
#                        "ymax": 7
#                    }
#                },


# cube configuration fpath
cube_config_fpath = os.path.join(resource_dir, "json_config/config_use_case5.json")
# config_use_case5.json 
#{
#    "start_date": 20210425,
#    "end_date" : 20210430,
#    "min_incidence_angle" : 20,
#    "max_incidence_angle" : 34,
#    "temporal_resolution" : 1,
#    "temporal_overlap" : 1
#}
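The configuration above yields a cube with 6 time steps (bands). The following is a minimal sketch of that calculation. It assumes `temporal_resolution` is in days and `end_date` is inclusive, and it ignores `temporal_overlap` and the incidence-angle filter. It is not icecube's implementation, and `temporal_bins` is a hypothetical helper.

```python
from datetime import date, datetime, timedelta

# Values copied from config_use_case5.json
config = {
    "start_date": 20210425,
    "end_date": 20210430,
    "temporal_resolution": 1,
}

def parse_yyyymmdd(value: int) -> date:
    """Convert an integer such as 20210425 into a datetime.date."""
    return datetime.strptime(str(value), "%Y%m%d").date()

def temporal_bins(cfg: dict) -> list:
    """Hypothetical helper: return the start date of each time step.

    Assumes temporal_resolution is in days and end_date is inclusive.
    """
    start = parse_yyyymmdd(cfg["start_date"])
    end = parse_yyyymmdd(cfg["end_date"])
    step = timedelta(days=cfg["temporal_resolution"])
    bins = []
    current = start
    while current <= end:
        bins.append(current)
        current += step
    return bins

bins = temporal_bins(config)
print(len(bins))          # 6
print(bins[0], bins[-1])  # 2021-04-25 2021-04-30
```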


Below, we show how to use the `Datacube` class with a netCDF4 datacube built from GRDs and vector labels.

First, let's build the cube from our sample dataset. Note that any existing icecube dataset can also be attached to a `Datacube` with `Datacube.set_xrdataset(xr.Dataset)`, and a cube saved as a netCDF4 file can be loaded with `Datacube.read_cube(cube_fpath)`.

In [3]:
dc = IceyeProcessGenerateCube.create_cube(grd_raster_dir, cube_config_fpath, vector_labels_fpath)

08/31/2021 10:44:20 AM - sar_datacube_metadata.py - [INFO] - Building the metadata from the folder /home/adupeyrat/Documents/code/icecube/tests/resources/grd_stack using GRD
processing rasters for cubes: 100%|██████████| 6/6 [00:00<00:00, 115.66it/s]
08/31/2021 10:44:20 AM - common_utils.py - [INFO] - create running time is 0.1259 seconds
08/31/2021 10:44:20 AM - sar_datacube_metadata.py - [INFO] - Building the metadata from the folder /home/adupeyrat/Documents/code/icecube/tests/resources/grd_stack using GRD
processing rasters for labels cube: 100%|██████████| 6/6 [00:00<00:00, 2710.37it/s]
08/31/2021 10:44:20 AM - common_utils.py - [INFO] - create running time is 0.0252 seconds



`dc` is an instance of the `Datacube` class, so it encapsulates the attributes and methods we need to process the datacube. The cells below demonstrate some of them:

In [4]:
type(dc)

icecube.bin.datacube.Datacube

In [5]:
# Here is what our dataset looks like; we refer to it as `xds`
xds = dc.xrdataset
xds

| | Array | Chunk |
|---|---|---|
| Bytes | 1.17 kiB | 200 B |
| Shape | (6, 10, 10) | (1, 10, 10) |
| Count | 10 Tasks | 6 Chunks |
| Type | uint16 | numpy.ndarray |


In [6]:
# Fetch all the data variables of the cube. The output shows that our datacube has two data
# variables: "Intensity" and "Labels"
all_dvs = dc.get_data_variables()
all_dvs

['Intensity', 'Labels']

In [7]:
# We can fetch all the images present in the datacube for a particular variable.
# Note that "None" denotes a missing product file, i.e. a temporal gap with no image for that time step.
dc.get_all_products(dc.get_xrarray(all_dvs[0]))

['None',
 'None',
 'ICEYE_GRD_54549_20210427T215124_hollow_10x10pixels_fake_1.tif',
 'ICEYE_GRD_54549_20210427T215124_hollow_10x10pixels_fake_0.tif',
 'None',
 'None']

In [8]:
# Pass a data variable name to get the underlying xr.DataArray. The data cube has 6 bands,
# as set in our configuration. Note that the metadata is stored in xr.DataArray.attrs as one list
# per key, indexed by position in the multi-temporal stack.
dc.get_xrarray(all_dvs[0])

| | Array | Chunk |
|---|---|---|
| Bytes | 1.17 kiB | 200 B |
| Shape | (6, 10, 10) | (1, 10, 10) |
| Count | 10 Tasks | 6 Chunks |
| Type | uint16 | numpy.ndarray |


### Values

In [9]:
# Similarly, we can inspect the xr.DataArray associated with the variable "Labels".
# Note that label objects are serialized before being stored, so they must be
# deserialized to read them back.
dc.get_xrarray(all_dvs[1]).head(1)
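Because label objects are stored in serialized form, reading them back means deserializing each entry. The following is a minimal round-trip sketch. It assumes JSON serialization (icecube's actual encoding may differ) and uses the `"None"` placeholder seen in the outputs for empty time steps. `serialize_labels` and `deserialize_labels` are hypothetical names.

```python
import json

def serialize_labels(labels_per_step):
    """Hypothetical: turn each time step's labels dict into a JSON string.

    Empty time steps (None) are stored as the string "None".
    """
    return ["None" if labels is None else json.dumps(labels) for labels in labels_per_step]

def deserialize_labels(stored):
    """Inverse of serialize_labels: parse JSON strings and map "None" back to None."""
    return [None if item == "None" else json.loads(item) for item in stored]

# A tiny 3-step stack where only the middle step has a labelled product
labels_per_step = [
    None,
    {"objects": [{"class": "rand-b", "bbox": {"xmin": 1, "ymin": 0, "xmax": 5, "ymax": 9}}]},
    None,
]
stored = serialize_labels(labels_per_step)
restored = deserialize_labels(stored)
print(restored[1]["objects"][0]["class"])  # rand-b
```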

Since we work with image files, it is often easier to use the product file name as a reference to get values or metadata from the data arrays. `Datacube.get_product_values(product_file, xr.DataArray)` provides a convenient way to fetch the values (raster pixels or labels) stored for a given image.
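Conceptually, a product-based lookup finds the product's position in the per-band list of product files and returns the band at that position. The following is a minimal pure-Python sketch of the idea. It uses nested lists in place of an `xr.DataArray` and shortened file names. `values_for_product` is a hypothetical helper and not icecube's implementation.

```python
def values_for_product(product_file, product_files, stack):
    """Hypothetical: return the band of `stack` that holds `product_file`.

    product_files[i] names the image stored in stack[i]; "None" marks a temporal gap.
    """
    if product_file == "None" or product_file not in product_files:
        raise KeyError(f"{product_file!r} is not in the cube")
    index = product_files.index(product_file)
    return stack[index]

product_files = ["None", "None", "fake_1.tif", "fake_0.tif", "None", "None"]
# 6 bands of 2x2 "pixels"; band i is filled with the value i for easy checking
stack = [[[i, i], [i, i]] for i in range(6)]

print(values_for_product("fake_1.tif", product_files, stack))  # [[2, 2], [2, 2]]
```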

In [10]:
# Specify the product file and the data array to fetch the raster values for that product
product_file = "ICEYE_GRD_54549_20210427T215124_hollow_10x10pixels_fake_1.tif"
dc.get_product_values(product_file, dc.get_xrarray(all_dvs[0]))

array([[329, 389, 217, 418,  48,  67,  98, 423, 317, 525],
       [434, 508, 348, 198, 323, 436, 286, 320, 550, 407],
       [ 27, 265, 533, 416, 492,  87, 476, 559,  21, 363],
       [ 65,  77, 231, 319, 287,  17, 388, 594,  13, 245],
       [ 42, 360, 184, 164, 491, 253, 491,  59,  34,  75],
       [104, 100, 551, 504, 107,  31, 524, 376, 121, 264],
       [245, 321,  70, 441, 276, 573, 455, 417, 389, 251],
       [528, 323,  83, 333, 514, 229,  58, 202, 342, 351],
       [ 33, 246, 452, 307,   7, 300, 334, 248, 397,   1],
       [268, 325,  26, 227,  74, 482, 122, 136, 237, 483]], dtype=uint16)

In [11]:
# The same works for the labels. Note that here we pass the "Labels" xr.DataArray instead
product_file = "ICEYE_GRD_54549_20210427T215124_hollow_10x10pixels_fake_1.tif"
dc.get_product_values(product_file, dc.get_xrarray(all_dvs[1]))

{'objects': [{'class': 'rand-b',
   'bbox': {'xmin': 1, 'ymin': 0, 'xmax': 5, 'ymax': 9}},
  {'class': 'rand-b', 'bbox': {'xmin': 8, 'ymin': 9, 'xmax': 8, 'ymax': 9}},
  {'class': 'rand-b', 'bbox': {'xmin': 6, 'ymin': 9, 'xmax': 8, 'ymax': 9}},
  {'class': 'rand-b', 'bbox': {'xmin': 9, 'ymin': 0, 'xmax': 9, 'ymax': 3}},
  {'class': 'rand-b', 'bbox': {'xmin': 5, 'ymin': 7, 'xmax': 9, 'ymax': 8}},
  {'class': 'rand-c', 'bbox': {'xmin': 9, 'ymin': 7, 'xmax': 9, 'ymax': 8}},
  {'class': 'rand-a', 'bbox': {'xmin': 6, 'ymin': 5, 'xmax': 6, 'ymax': 5}},
  {'class': 'rand-c', 'bbox': {'xmin': 2, 'ymin': 9, 'xmax': 2, 'ymax': 9}},
  {'class': 'rand-b', 'bbox': {'xmin': 2, 'ymin': 7, 'xmax': 6, 'ymax': 7}},
  {'class': 'rand-a', 'bbox': {'xmin': 5, 'ymin': 5, 'xmax': 8, 'ymax': 8}},
  {'class': 'rand-c', 'bbox': {'xmin': 5, 'ymin': 3, 'xmax': 5, 'ymax': 4}},
  {'class': 'rand-a', 'bbox': {'xmin': 9, 'ymin': 5, 'xmax': 9, 'ymax': 5}},
  {'class': 'rand-a', 'bbox': {'xmin': 0, 'ymin': 5, 'xmax': 7
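These labels come from the vector-labels JSON, in which each entry pairs a `product_file` with its `labels`. The following is a minimal sketch that indexes such a file by product and counts objects per class. It assumes the structure of the excerpt shown earlier. The sample contains only three of the objects printed above, not the full test file, and `labels_by_product` is a hypothetical helper.

```python
import json
from collections import Counter

# Small sample in the same shape as labels/dummy_vector_labels.json
raw = """
[
  {
    "product_file": "ICEYE_GRD_54549_20210427T215124_hollow_10x10pixels_fake_1.tif",
    "labels": {
      "objects": [
        {"class": "rand-b", "bbox": {"xmin": 1, "ymin": 0, "xmax": 5, "ymax": 9}},
        {"class": "rand-c", "bbox": {"xmin": 9, "ymin": 7, "xmax": 9, "ymax": 8}},
        {"class": "rand-a", "bbox": {"xmin": 6, "ymin": 5, "xmax": 6, "ymax": 5}}
      ]
    }
  }
]
"""

def labels_by_product(entries):
    """Hypothetical: map each product file name to its labels dict."""
    return {entry["product_file"]: entry["labels"] for entry in entries}

index = labels_by_product(json.loads(raw))
labels = index["ICEYE_GRD_54549_20210427T215124_hollow_10x10pixels_fake_1.tif"]
class_counts = Counter(obj["class"] for obj in labels["objects"])
print(sorted(class_counts.items()))  # [('rand-a', 1), ('rand-b', 1), ('rand-c', 1)]
```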

### Metadata
`Datacube` can fetch metadata at three levels: the `xr.Dataset`, an `xr.DataArray`, or a single product.

In [12]:
# Fetch the metadata associated with the xr.Dataset. It is empty, as intended at the dataset level.
dc.get_xrdataset_metadata()

{}

Next, let's check the metadata associated with our `xr.DataArray`s, starting with the first data variable, "Intensity".

The output below is a dictionary whose values are lists. The keys are the union of the attributes found across all images in the stack. Each key maps to a list with one entry per position in the stack, holding the value for the image at that position. Where an image lacks a key, or a time step has no image, the entry is "None".
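This dict-of-lists layout can be reproduced with plain dictionaries. The following is a minimal sketch. It assumes that per-image attributes are available as dicts (or `None` for an empty time step) and that missing values are stored as the string `"None"`, as in the outputs. `stack_metadata` is a hypothetical name and not icecube's implementation.

```python
def stack_metadata(per_step_attrs):
    """Hypothetical: merge per-image attribute dicts into one dict of lists.

    Keys are the union of all keys seen; list position i holds the value for
    stack index i, or "None" when that image lacks the key or the step is empty.
    """
    keys = []
    for attrs in per_step_attrs:
        for key in attrs or {}:
            if key not in keys:  # keep first-seen order
                keys.append(key)
    return {
        key: [str((attrs or {}).get(key, "None")) for attrs in per_step_attrs]
        for key in keys
    }

per_step_attrs = [
    None,
    {"product_file": "fake_1.tif", "heading": "349.91295192092355"},
    {"product_file": "fake_0.tif"},
]
metadata = stack_metadata(per_step_attrs)
print(metadata["product_file"])  # ['None', 'fake_1.tif', 'fake_0.tif']
print(metadata["heading"])       # ['None', '349.91295192092355', 'None']
```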

In [13]:
# Fetch the metadata associated with the first data variable ("Intensity")
metadata = dc.get_xrarray_metadata(all_dvs[0])
# show only the first 5 keys
{k: metadata[k] for k in list(metadata)[:5]}

{'avg_scene_height': ['None',
  'None',
  '110.74176',
  '110.74176',
  'None',
  'None'],
 'product_file': ['None',
  'None',
  'ICEYE_GRD_54549_20210427T215124_hollow_10x10pixels_fake_1.tif',
  'ICEYE_GRD_54549_20210427T215124_hollow_10x10pixels_fake_0.tif',
  'None',
  'None'],
 'heading': ['None',
  'None',
  '349.91295192092355',
  '349.91295192092355',
  'None',
  'None'],
 'velX': ['None',
  'None',
  '[-4673.12223293 -4673.79355399 -4674.46481748 -4675.13602017\n -4675.80716526 -4676.47824955 -4677.14927463 -4677.82024208\n -4678.49114871 -4679.16199769 -4679.83278583 -4680.50351471\n -4681.17418593 -4681.84479628 -4682.51534894 -4683.18584072\n -4683.85627319 -4684.52664797 -4685.19696183 -4685.86721796\n -4686.53741317 -4687.20754904 -4687.87762716 -4688.54764432\n -4689.21760372 -4689.88750215 -4690.5573412  -4691.22712246\n -4691.89684272 -4692.56650517 -4693.23610662 -4693.90564864\n -4694.57513283 -4695.24455598 -4695.91392128 -4696.58322553\n -4697.25247031 -4697.9216572
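As the output shows, metadata values are stored as strings, including array-valued attributes such as `velX`, which appear to be printed arrays. The following is a minimal sketch for converting them back to numbers. It assumes that array values are whitespace-separated and wrapped in square brackets, as in the output above. `parse_number` and `parse_array` are hypothetical helpers.

```python
def parse_number(value):
    """Convert a metadata string to float; map the "None" placeholder to None."""
    return None if value == "None" else float(value)

def parse_array(value):
    """Parse a bracketed, whitespace-separated array string into a list of floats."""
    if value == "None":
        return None
    # split() with no argument also splits on the embedded newlines
    return [float(token) for token in value.strip().strip("[]").split()]

heading = parse_number("349.91295192092355")
vel_x = parse_array("[-4673.12223293 -4673.79355399 -4674.46481748 -4675.13602017\n -4675.80716526]")
print(len(vel_x))  # 5
```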

In [14]:
# Similarly, show the metadata associated with the second data variable ("Labels"). As the output shows,
# it only contains the list of product files.
dc.get_xrarray_metadata(all_dvs[1])

{'product_file': ['None',
  'None',
  'ICEYE_GRD_54549_20210427T215124_hollow_10x10pixels_fake_1.tif',
  'ICEYE_GRD_54549_20210427T215124_hollow_10x10pixels_fake_0.tif',
  'None',
  'None']}

In [15]:
# Let's try to fetch metadata associated with a single product in xr.DataArray
product_file = "ICEYE_GRD_54549_20210427T215124_hollow_10x10pixels_fake_0.tif"
dc.get_metadata_by_product(product_file, dc.get_xrarray(all_dvs[0]))

{'avg_scene_height': '110.74176',
 'product_file': 'ICEYE_GRD_54549_20210427T215124_hollow_10x10pixels_fake_0.tif',
 'heading': '349.91295192092355',
 'velX': '[-4673.12223293 -4673.79355399 -4674.46481748 -4675.13602017\n -4675.80716526 -4676.47824955 -4677.14927463 -4677.82024208\n -4678.49114871 -4679.16199769 -4679.83278583 -4680.50351471\n -4681.17418593 -4681.84479628 -4682.51534894 -4683.18584072\n -4683.85627319 -4684.52664797 -4685.19696183 -4685.86721796\n -4686.53741317 -4687.20754904 -4687.87762716 -4688.54764432\n -4689.21760372 -4689.88750215 -4690.5573412  -4691.22712246\n -4691.89684272 -4692.56650517 -4693.23610662 -4693.90564864\n -4694.57513283 -4695.24455598 -4695.91392128 -4696.58322553\n -4697.25247031 -4697.92165722 -4698.59078306 -4699.259851\n -4699.92885785 -4700.59780519 -4701.26669461 -4701.93552292\n -4702.60429329 -4703.27300253 -4703.94165222 -4704.61024395\n -4705.27877453 -4705.94724712 -4706.61565854 -4707.28401038\n -4707.95230421 -4708.62053684 -4709

In [16]:
# Let's try to get the metadata associated with our Labels now, which should be just the product name itself.
product_file = "ICEYE_GRD_54549_20210427T215124_hollow_10x10pixels_fake_0.tif"
dc.get_metadata_by_product(product_file, dc.get_xrarray(all_dvs[1]))

{'product_file': 'ICEYE_GRD_54549_20210427T215124_hollow_10x10pixels_fake_0.tif'}
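Product-level metadata amounts to one "column" of the list-valued attributes: the values at the product's position in the stack. The following is a minimal sketch. It assumes the dict-of-lists layout shown earlier and uses shortened file names. `metadata_for_product` is a hypothetical helper and not icecube's implementation.

```python
def metadata_for_product(product_file, metadata):
    """Hypothetical: collapse a dict of per-band lists to one product's values."""
    index = metadata["product_file"].index(product_file)  # raises ValueError if absent
    return {key: values[index] for key, values in metadata.items()}

metadata = {
    "avg_scene_height": ["None", "None", "110.74176", "110.74176", "None", "None"],
    "product_file": ["None", "None", "fake_1.tif", "fake_0.tif", "None", "None"],
}
print(metadata_for_product("fake_0.tif", metadata))
# {'avg_scene_height': '110.74176', 'product_file': 'fake_0.tif'}
```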

In [17]:
# Finally, save the datacube to the desired destination if needed
output_fpath = "./cube.nc"
dc.to_file(output_fpath, format="netCDF4")