# Deep dive into the Zarr format: Inside `Sentinel_1_SLC.zarr`

## Introduction
This tutorial introduces the structure of a `zarr` sample for **Sentinel 1 SLC** (Single Look Complex) radar data. We will demonstrate how to visualise the `.zarr` encoding structure, explore embedded information, and retrieve metadata for further processing.

### Prerequisites
A sample dataset for this tutorial can be obtained from the [EOPF available Samples](https://common.s3.sbg.perf.cloud.ovh.net/product.html). To explore other datasets, update the product name where the code comments indicate.

For local **Sentinel 1 SLC** data exploration, download the resource named `S01SIWSLC_....zarr` into the same directory as this example.

> **Note:** <br>
> Further sample descriptions will be included in subsequent notebook updates.<br>
> To look into the `.zarr` product naming, visit [the EOPF product types and file naming rules](https://cpm.pages.eopf.copernicus.eu/eopf-cpm/main/PSFD/3-product-types-naming-rules.html).<br>
> Names can give some context about the type of product we are working with.
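
As a small illustration of the context a name carries, the sketch below splits the sample product name used in this tutorial into its underscore-separated fields. Only the fields that can be cross-checked against the STAC metadata printed later in this notebook (start time, polarisation, swath) are interpreted, and that interpretation is an assumption; the naming rules linked above remain the authoritative description.

```python
from datetime import datetime
from pathlib import Path

# Sample product name used later in this tutorial
product_name = "S01SIWSLC_20231201T170634_0027_A117_S27C_VH_IW1_249411.zarr"

# Drop the ".zarr" suffix and split on underscores
fields = Path(product_name).stem.split("_")

# Assumed meaning, cross-checked only against the metadata shown in this notebook;
# the remaining fields are deliberately left uninterpreted.
product_type = fields[0]
sensing_start = datetime.strptime(fields[1], "%Y%m%dT%H%M%S")
polarisation = fields[5]
swath = fields[6]

print(product_type, sensing_start.isoformat(), polarisation, swath)
```

The variable names (`product_type`, `sensing_start`, `polarisation`, `swath`) are illustrative and not part of any EOPF API.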

To manage the required libraries, it is recommended to work within a dedicated and stable setup. To ensure package compatibility and avoid conflicts, the following virtual environment setup is suggested:

For Conda:

`conda create --name zarr_explore python=3.11 xarray zarr dask numpy jupyter`

For pip (on Windows):

`python -m venv .zarr_explore`<br>
`.zarr_explore\Scripts\activate.bat`<br>
`pip install xarray zarr dask numpy jupyter`

### Setting up the environment
The `xarray` library facilitates the handling of labelled multi-dimensional arrays, enabling more efficient processing. This library will be explored in detail in [Chapter 3](). <br>
Check out its [documentation](https://docs.xarray.dev/en/stable/) for additional resources.

We then import the specific dependencies.

In [11]:
import os
import xarray as xr

To retrieve only the names of the groups stored inside the `zarr`, the following function walks the tree recursively and prints the name of each node, indented according to its depth. <br>
This gives a general overview of the stored elements without the detailed default `xarray` representation.

In [12]:
def print_gen_structure(node, indent=""):
    print(f"{indent}{node.name}")  # print the current node's name (None for an unnamed root)
    for child_name, child_node in node.children.items():  # loop over the node's child groups
        print_gen_structure(child_node, indent + "  ")  # recurse into each child with a deeper indent
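
To see the recursion in isolation, here is a minimal sketch of the same traversal on a plain nested dictionary, so it runs without any data download. The function `format_structure` and the `toy_tree` hierarchy are hypothetical stand-ins, not part of `xarray`; the sketch returns the lines instead of printing them directly, which makes the result easy to inspect.

```python
def format_structure(name, children, indent=""):
    """Return the indented lines for a node and, recursively, for its children."""
    lines = [f"{indent}{name}"]
    for child_name, grandchildren in children.items():
        # Each level of nesting adds two spaces of indentation
        lines.extend(format_structure(child_name, grandchildren, indent + "  "))
    return lines


# Hypothetical stand-in hierarchy: a plain dict, not a DataTree
toy_tree = {
    "conditions": {"orbit": {}, "gcp": {}},
    "measurements": {},
    "quality": {"noise": {}},
}

lines = format_structure("root", toy_tree)
print("\n".join(lines))
```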

The `xarray` function `open_datatree()` opens and decodes a `DataTree` from a file or file-like object (in this case, the stored `.zarr`), creating a tree node for each group within the file.

In [13]:
# Open the Zarr store with xarray as a DataTree
s1_zarr_sample = xr.open_datatree(
    'S01SIWSLC_20231201T170634_0027_A117_S27C_VH_IW1_249411.zarr',  # Substitute with the downloaded sample of your interest
    engine="zarr",  # storage format
    chunks={},  # keep the chunking stored on disk and load the arrays lazily with dask
)
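
The product name above is hard-coded. As a hedged alternative, the sketch below looks for products in a directory by name pattern, assuming the downloaded `.zarr` product is an unpacked directory whose name starts with `S01SIWSLC_`. The helper `find_zarr_products` is a hypothetical name introduced for illustration.

```python
from pathlib import Path


def find_zarr_products(directory=".", pattern="S01SIWSLC_*.zarr"):
    """Return the directories in `directory` whose names match `pattern`, sorted by name."""
    return sorted(p for p in Path(directory).glob(pattern) if p.is_dir())


candidates = find_zarr_products()
if candidates:
    print("Found:", candidates[0].name)
else:
    print("No product matching the pattern was found in this directory.")
```

A path returned by this helper could then be passed to `xr.open_datatree()` in place of the literal file name.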

The following output lists the subgroups contained inside the main `conditions`, `measurements`, and `quality` groups of the `.zarr`.

In [14]:
print('Zarr Sentinel 1 SLC')
print_gen_structure(s1_zarr_sample.root) 
print("-" * 30)

Zarr Sentinel 1 SLC
None
  conditions
    antenna_pattern
    attitude
    azimuth_fm_rate
    dc_estimate
    gcp
    orbit
    reference_replica
    replica
    terrain_height
  measurements
  quality
    calibration
    noise
    noise_azimuth
    noise_range
------------------------------


For a finer view of the `zarr` product, `xarray` also gives access to a representation of its entire content. This representation displays each group defined inside the `.zarr` file and its respective arrays, including detailed information such as general metadata, dimensions, chunking geometry, and chunk size.

In [15]:
# Print the detailed structure of the opened Zarr store.
# Uncomment these lines if you want to print the whole dataset.
# print("Dataset Structure:")
# print(s1_zarr_sample)
# print("-" * 30)

If we want to extract specific information, `xarray`'s labels allow us to retrieve it by group. <br>
<br>
Let's say we want to visualise only the `elevation_angle` stored in this asset.<br>
According to the structure above, it is located inside the antenna pattern information. The path of that group inside the `zarr`, `conditions/antenna_pattern`, allows us to retrieve the group's content. <br>
We can visualise it:

In [16]:
# Retrieving the satellite's antenna-related conditions:
print(s1_zarr_sample['conditions/antenna_pattern'])

<xarray.DataTree 'antenna_pattern'>
Group: /conditions/antenna_pattern
    Dimensions:            (azimuth_time: 10, slant_range_time: 712)
    Coordinates:
      * azimuth_time       (azimuth_time) datetime64[ns] 80B 2023-12-01T17:06:35....
      * slant_range_time   (slant_range_time) float32 3kB 0.005336 ... 0.00569
    Data variables:
        elevation_angle    (azimuth_time, slant_range_time) float32 28kB dask.array<chunksize=(10, 712), meta=np.ndarray>
        elevation_pattern  (azimuth_time, slant_range_time) complex64 57kB dask.array<chunksize=(10, 712), meta=np.ndarray>
        incidence_angle    (azimuth_time, slant_range_time) float32 28kB dask.array<chunksize=(10, 712), meta=np.ndarray>
        roll               (azimuth_time) float64 80B dask.array<chunksize=(10,), meta=np.ndarray>
        terrain_height     (azimuth_time) float64 80B dask.array<chunksize=(10,), meta=np.ndarray>


Note that to explore the groups and their definitions inside the `zarr` interactively, we can drop the `print()` statement. <br>
This enables the `xarray.DataTree` **drop-down** interface, which lets us explore group-related metadata and information interactively. <br>
We can visualise each contained `array` and its `dtype`.

In [None]:
# Retrieving the same group as an interactive xarray.DataTree:
s1_zarr_sample['conditions/antenna_pattern']

| Variable | Data type | Shape | Chunk shape | Bytes (array) | Bytes (chunk) | Dask graph |
|---|---|---|---|---|---|---|
| elevation_angle | float32 numpy.ndarray | (10, 712) | (10, 712) | 27.81 kiB | 27.81 kiB | 1 chunks in 2 graph layers |
| elevation_pattern | complex64 numpy.ndarray | (10, 712) | (10, 712) | 55.62 kiB | 55.62 kiB | 1 chunks in 2 graph layers |
| incidence_angle | float32 numpy.ndarray | (10, 712) | (10, 712) | 27.81 kiB | 27.81 kiB | 1 chunks in 2 graph layers |
| roll | float64 numpy.ndarray | (10,) | (10,) | 80 B | 80 B | 1 chunks in 2 graph layers |
| terrain_height | float64 numpy.ndarray | (10,) | (10,) | 80 B | 80 B | 1 chunks in 2 graph layers |


Inside this product, the main data of the Sentinel 1 mission, the SLC, is stored in the `measurements` group.
Looking further inside it, we find the chunked arrays containing the complex-valued radar signal.

In [18]:
# Retrieving the SLC data inside .zarr:
s1_zarr_sample['/measurements']

| Data type | Shape | Chunk shape | Bytes (array) | Bytes (chunk) | Dask graph |
|---|---|---|---|---|---|
| int64 numpy.ndarray | (1501,) | (1501,) | 11.73 kiB | 11.73 kiB | 1 chunks in 2 graph layers |
| int64 numpy.ndarray | (22694,) | (22694,) | 177.30 kiB | 177.30 kiB | 1 chunks in 2 graph layers |
| complex64 numpy.ndarray | (1501, 22694) | (1501, 3000) | 259.89 MiB | 34.36 MiB | 8 chunks in 2 graph layers |
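
The chunk count and sizes reported for the complex-valued array follow directly from its shape, chunk shape, and data type. The arithmetic can be sketched as follows, using only the numbers from the output above and the fact that one `complex64` value occupies 8 bytes:

```python
import math

shape = (1501, 22694)        # array shape reported above
chunk_shape = (1501, 3000)   # chunk shape reported above
itemsize = 8                 # bytes per complex64 value (two float32 parts)

# Number of chunks along each axis; the last chunk along an axis may be smaller
chunks_per_axis = [math.ceil(n / c) for n, c in zip(shape, chunk_shape)]
n_chunks = math.prod(chunks_per_axis)

array_mib = math.prod(shape) * itemsize / 2**20
chunk_mib = math.prod(chunk_shape) * itemsize / 2**20

print(chunks_per_axis, n_chunks)                     # [1, 8] 8
print(f"{array_mib:.2f} MiB, {chunk_mib:.2f} MiB")   # 259.89 MiB, 34.36 MiB
```

Seven of the eight chunks are full width (3000 columns); the last one holds the remaining 1694 columns.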


Additionally, through `s1_zarr_sample.attrs` we can access both the `stac_discovery` and the `other_metadata` entries. <br>
<br>
For example, the properties inside `stac_discovery`:

In [19]:
# STAC metadata style:
s1_zarr_sample.attrs['stac_discovery']['properties']

{'created': '2022-01-06T05:31:50Z',
 'end_datetime': '2023-12-01_t17:07:00.671447',
 'eopf: instrument_swath': 'iw1',
 'eopf:collection': '',
 'eopf:data_take_id': 407067,
 'eopf:image_size': {},
 'eopf:instrument_mode': 7,
 'eopf:instrument_swath': '',
 'eopf:processing_baseline': '',
 'eopf:timeline': '',
 'eopf:type': '',
 'instrument': 'Synthetic Aperture Radar',
 'mission': 'Sentinel-1',
 'platform': 'Sentinel-1A',
 'product_type': '',
 'provider': [{'name': 'L2 RP Processor', 'roles': ['processor']},
  {'name': 'ESA', 'roles': ['producer']}],
 'sar:center_frequency': 5405000454.33435,
 'sar:frequency_band': '',
 'sar:instrument_mode': '',
 'sar:looks_equivalent_number': '',
 'sar:pixel_spacing_range': '',
 'sar:polarization': 'vh',
 'sar:product_type': '',
 'sar:resolution_azimuth': '',
 'sar:resolution_range': '',
 'sat:absolute_orbit_number': 51464,
 'sat:anx_datetime': '',
 'sat:orbit_state': 'ascending',
 'sat:relative_orbit_number': '',
 'start_datetime': '2023-12-01_t17:06:
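
Many of the properties in this sample are empty strings or empty dictionaries. A minimal sketch for keeping only the populated entries is shown below; it works on a hand-copied subset of the output above so that it runs on its own, and `drop_empty` is a hypothetical helper name.

```python
# Hand-copied subset of the properties printed above
properties = {
    "eopf:collection": "",
    "eopf:data_take_id": 407067,
    "eopf:image_size": {},
    "mission": "Sentinel-1",
    "platform": "Sentinel-1A",
    "sar:frequency_band": "",
    "sar:polarization": "vh",
    "sat:orbit_state": "ascending",
}


def drop_empty(mapping):
    """Return a copy of `mapping` without empty-string or empty-dict values."""
    return {key: value for key, value in mapping.items() if value not in ("", {})}


populated = drop_empty(properties)
for key, value in populated.items():
    print(f"{key}: {value}")
```

In the notebook, the same helper could be applied to `s1_zarr_sample.attrs['stac_discovery']['properties']`.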

And inside `other_metadata`, the raw data analysis entry (a subset chosen to keep the output readable):

In [20]:
# Complementing metadata:
s1_zarr_sample.attrs['other_metadata']['raw_data_analysis']

{'raw_data_analysis': {'azimuth_time': '2023-12-01_t17:06:32.765215',
  'i_bias': 0.1381503939628601,
  'iq_gain_imbalance': 1.032348990440369,
  'iq_quadrature_departure': 0.005993634928017855,
  'q_bias': 0.3798973858356476,
  'support': {'i_bias_lower_bound': -0.004488567821681499,
   'i_bias_upper_bound': 0.004488567821681499,
   'i_bias_used_for_correction': 0.2398381978273392,
   'iq_gain_imbalance_used_for_correction': 0.9858453273773193,
   'iq_gain_lower_bound': 0.9988433122634888,
   'iq_gain_upper_bound': 1.00115704536438,
   'iq_quadrature_departure_lower_bound': -0.8596398830413818,
   'iq_quadrature_departure_upper_bound': 0.8716259002685547,
   'iq_quadrature_departure_used_for_correction': 0.08416946977376938,
   'q_bias_lower_bound': -0.004347919020801783,
   'q_bias_upper_bound': 0.004347919020801783,
   'q_bias_used_for_correction': 0.1212906986474991}}}
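
Nested metadata like this can be easier to scan once flattened. The sketch below flattens a hand-copied subset of the output above into dotted keys and parses the `azimuth_time` string, whose `_t` separator is taken from the values shown in this sample. `flatten` is a hypothetical helper, and the timestamp format string is an assumption based only on this one product.

```python
from datetime import datetime

# Hand-copied subset of the output above
raw_data_analysis = {
    "azimuth_time": "2023-12-01_t17:06:32.765215",
    "i_bias": 0.1381503939628601,
    "support": {
        "i_bias_lower_bound": -0.004488567821681499,
        "i_bias_upper_bound": 0.004488567821681499,
    },
}


def flatten(mapping, prefix=""):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in mapping.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{full_key}."))
        else:
            flat[full_key] = value
    return flat


flat = flatten(raw_data_analysis)

# Format assumed from the sample values: date and time joined by "_t"
azimuth_time = datetime.strptime(flat["azimuth_time"], "%Y-%m-%d_t%H:%M:%S.%f")

for key, value in flat.items():
    print(f"{key}: {value}")
```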

## Conclusion
This tutorial provides an initial understanding of the `zarr` structure for a Sentinel 1 SLC radar sample. <br>
<br>
By using the `xarray` library, one can effectively navigate and inspect the different components within the `zarr` format, including its metadata and array organisation.<br>
This foundation will help in understanding the subsequent data analysis and processing workflows covered in our series.

For a deeper description of the metadata structure, follow the [metadata structure]() tutorial.