# Discover EOPF Zarr - Sentinel-2 L2A

## Introduction
This tutorial introduces you to the structure of a Zarr sample for **Sentinel-2 L2A** data. We will demonstrate how to access and open a Zarr product sample with `xarray`, how to visualise the `zarr` encoding structure, explore embedded information, and retrieve relevant metadata for further processing.

### Prerequisites
This tutorial uses a re-processed sample dataset from the EOPF Sample Service that is available for download [here](https://common.s3.sbg.perf.cloud.ovh.net/product.html).

The downloaded Zarr product is a Sentinel-2 L2A tile from 29 June 2023 (file name: `S02MSIL2A_20230629T120347_0000_A064_TC64.zarr.zip`). To reproduce the example, download the data product and store it in the same folder as this notebook.

<hr>

#### Import libraries

In [None]:
import os
import xarray as xr

#### Helper functions

##### `print_gen_structure`
This function helps us retrieve and visualise the names of the groups stored inside a `zarr`. It prints a general overview of the elements inside the `zarr`, indenting each level of nesting.

In [2]:
def print_gen_structure(node, indent=""):
    print(f"{indent}{node.name}")  # print the name of the current node
    for child_name, child_node in node.children.items():  # iterate over the direct children
        print_gen_structure(child_node, indent + "  ")  # recurse, indenting one level deeper
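
The helper only relies on two attributes of each node, `name` and `children`. As a minimal sketch under that assumption, the same recursion can return the full group paths as a list instead of printing them. `Node` is a hypothetical stand-in class so that the example runs without the downloaded product, and `collect_group_paths` is an illustrative name, not part of `xarray`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in exposing the two attributes the helper uses."""
    name: Optional[str] = None
    children: dict = field(default_factory=dict)


def collect_group_paths(node, prefix=""):
    """Return the path of every group below `node`, depth first."""
    paths = []
    for child_name, child_node in node.children.items():
        path = f"{prefix}/{child_name}"
        paths.append(path)
        # Recurse into the child, extending the path prefix.
        paths.extend(collect_group_paths(child_node, path))
    return paths


# A small tree that mirrors part of the Sentinel-2 L2A structure.
root = Node(children={
    "measurements": Node("measurements", {
        "reflectance": Node("reflectance", {
            "r10m": Node("r10m"),
            "r20m": Node("r20m"),
            "r60m": Node("r60m"),
        }),
    }),
})

group_paths = collect_group_paths(root)
print(group_paths)
```

Returning a list rather than printing makes it easy to filter the paths, for example to keep only the groups that end in `r10m`.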

<hr>

## Open a Zarr Store

In a first step, we use the function `open_datatree()` from the `xarray` library to open a Zarr store as a DataTree.

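The call uses two keyword arguments: `engine="zarr"` selects the Zarr backend for reading the store, and `chunks={}` loads the arrays lazily as `dask` arrays using the chunking preferred by the backend, which for Zarr generally matches the chunking stored on disk.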

**Note**: The final print of the `DataTree` object is commented out, as the display can be quite verbose, showing the entire content within the Zarr. An alternative is to apply a helper function that only displays the higher level structure as shown in the next code cell.

In [None]:
# Open the Zarr store with xarray as a DataTree
s2l2a_zarr_sample = xr.open_datatree(
    "S02MSIL2A_20230629T120347_0000_A064_TC64.zarr",
    engine="zarr",  # read the store with the Zarr backend
    chunks={},  # load lazily with dask, using the backend's preferred chunking
)

# s2l2a_zarr_sample
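
The file name given in the Prerequisites ends in `.zarr.zip`, while the call above opens a path ending in `.zarr`, so it can be worth checking what is actually on disk before opening the store. The sketch below uses only the standard library. `describe_store_path` is a hypothetical helper name, not part of `xarray`.

```python
import os


def describe_store_path(path):
    """Classify a local path before handing it to xarray (hypothetical helper)."""
    if os.path.isdir(path):
        return "directory"
    if os.path.isfile(path):
        return "file"
    return "missing"


store_path = "S02MSIL2A_20230629T120347_0000_A064_TC64.zarr"
print(f"{store_path}: {describe_store_path(store_path)}")
```

If the result is `missing`, check that the product was stored next to the notebook under the name used in the call.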

If you apply the helper function `print_gen_structure` on the root of the DataTree object, you will get a listing of the tree-like structure of the object. You see all Zarr groups, such as `measurements`, `quality` and `conditions`, their sub-groups and content.

In [None]:
print("Zarr Sentinel 2 L2A Structure")
print_gen_structure(s2l2a_zarr_sample.root) 
print("-" * 30)

Zarr Sentinel 2 L2A Structure
None
  conditions
    geometry
    mask
      detector_footprint
        r10m
        r20m
        r60m
      l1c_classification
        r60m
      l2a_classification
        r20m
        r60m
    meteorology
      cams
      ecmwf
  measurements
    reflectance
      r10m
      r20m
      r60m
  quality
    atmosphere
      r10m
      r20m
      r60m
    l2a_quicklook
      r10m
      r20m
      r60m
    mask
      r10m
      r20m
      r60m
    probability
      r20m
------------------------------


## Extract information from Zarr groups

Next, we can explore the content of individual Zarr groups. Passing the path of a group and subgroup in square brackets returns the corresponding node of the DataTree. Let us, for example, extract the subgroup `reflectance` under `measurements`.

As a result, you will see that the parent node `measurements/reflectance` has three subgroups: `r10m`, `r20m` and `r60m`. Each subgroup holds the band arrays available at one of the three spatial resolutions of the Sentinel-2 L2A data (10 m, 20 m and 60 m).

The `xarray.DataTree` structure also exposes group-related metadata and information. For example, you can find the `chunksize` of each array and the coordinates.

In [None]:
# Retrieving the reflectance groups:
s2l2a_zarr_sample["measurements/reflectance"]

<xarray.DataTree 'reflectance'>
Group: /measurements/reflectance
├── Group: /measurements/reflectance/r10m
│       Dimensions:  (y: 10980, x: 10980)
│       Coordinates:
│         * x        (x) int32 44kB 300005 300015 300025 300035 ... 409775 409785 409795
│         * y        (y) int32 44kB 4600015 4600005 4599995 ... 4490245 4490235 4490225
│       Data variables:
│           b02      (y, x) float64 964MB dask.array<chunksize=(1830, 1830), meta=np.ndarray>
│           b03      (y, x) float64 964MB dask.array<chunksize=(1830, 1830), meta=np.ndarray>
│           b04      (y, x) float64 964MB dask.array<chunksize=(1830, 1830), meta=np.ndarray>
│           b08      (y, x) float64 964MB dask.array<chunksize=(1830, 1830), meta=np.ndarray>
├── Group: /measurements/reflectance/r20m
│       Dimensions:  (y: 5490, x: 5490)
│       Coordinates:
│         * x        (x) int32 22kB 300010 300030 300050 300070 ... 409750 409770 409790
│         * y        (y) int32 22kB 4600010 4599990 4599970 .
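
The sizes reported for the `r10m` group can be cross-checked with a little arithmetic. The sketch below copies the dimensions and `chunksize` from the printed output and assumes 8 bytes per `float64` value; the variable names are illustrative.

```python
# Values copied from the printed output for the r10m group above.
ny, nx = 10980, 10980          # array dimensions (y, x)
chunk_y, chunk_x = 1830, 1830  # dask chunksize
itemsize = 8                   # bytes per float64 value

array_bytes = ny * nx * itemsize
chunk_bytes = chunk_y * chunk_x * itemsize
n_chunks = (ny // chunk_y) * (nx // chunk_x)

print(f"array size : {array_bytes / 1e6:.0f} MB")  # 964 MB, as in the output
print(f"chunk size : {chunk_bytes / 1e6:.1f} MB")  # 26.8 MB
print(f"chunks     : {n_chunks}")                  # 36
```

Each 10 m band is therefore split into a 6 x 6 grid of chunks, so reading a small spatial subset only touches a fraction of the 964 MB array.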

## Extract Zarr metadata on different levels

Through `s2l2a_zarr_sample.attrs`, we can inspect both the `stac_discovery` and the `other_metadata` entries.

For example, the properties inside `stac_discovery`:

In [None]:
# STAC metadata style:
s2l2a_zarr_sample.attrs["stac_discovery"]['properties']

{'created': '2023-06-29T12:03:47+00:00',
 'datetime': '2018-08-20T08:36:01.024Z',
 'end_datetime': '2018-08-20T08:36:01.024000+00:00',
 'eo:bands': [{'center_wavelength': 442.7,
   'common_name': 'coastal',
   'full_width_half_max': 0.02,
   'name': 'b01',
   'solar_illumination': 1884.69},
  {'center_wavelength': 492.7,
   'common_name': 'blue',
   'full_width_half_max': 0.065,
   'name': 'b02',
   'solar_illumination': 1959.66},
  {'center_wavelength': 559.8,
   'common_name': 'green',
   'full_width_half_max': 0.035,
   'name': 'b03',
   'solar_illumination': 1823.24},
  {'center_wavelength': 664.6,
   'common_name': 'red',
   'full_width_half_max': 0.03,
   'name': 'b04',
   'solar_illumination': 1512.06},
  {'center_wavelength': 704.1,
   'common_name': 'rededge',
   'full_width_half_max': 0.015,
   'name': 'b05',
   'solar_illumination': 1424.64},
  {'center_wavelength': 740.5,
   'common_name': 'rededge',
   'full_width_half_max': 0.015,
   'name': 'b06',
   'solar_illumination'
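
Because `eo:bands` is a plain list of dictionaries, it can be reshaped with ordinary Python. The sketch below works on an excerpt copied from the output above (first four bands, three keys each) so that it runs on its own. In the notebook, `eo_bands` would instead be read from `s2l2a_zarr_sample.attrs["stac_discovery"]["properties"]["eo:bands"]`.

```python
# Excerpt copied from the `eo:bands` output above (first four bands only).
eo_bands = [
    {"name": "b01", "common_name": "coastal", "center_wavelength": 442.7},
    {"name": "b02", "common_name": "blue", "center_wavelength": 492.7},
    {"name": "b03", "common_name": "green", "center_wavelength": 559.8},
    {"name": "b04", "common_name": "red", "center_wavelength": 664.6},
]

# Map each band name to its centre wavelength for quick look-ups.
wavelength_by_band = {band["name"]: band["center_wavelength"] for band in eo_bands}

# Find the band name that corresponds to a common name.
red_band = next(band["name"] for band in eo_bands if band["common_name"] == "red")

print(wavelength_by_band)
print(red_band)
```

Looking bands up by `common_name` rather than by band number keeps later code readable, although names such as `rededge` appear more than once in the full list.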

And from `other_metadata`, we are able to retrieve the information specific to a band, for example the **red** reflectance band `b04`:

In [None]:
# Complementing metadata:
s2l2a_zarr_sample.attrs["other_metadata"]['band_description']['b04']

{'bandwidth': 30.0,
 'central_wavelength': 664.6,
 'onboard_compression_rate': '2.97',
 'onboard_integration_time': '1.3872929',
 'physical_gain': '4.51587741',
 'spectral_response_step': '1',
 'spectral_response_values': '0.00141521 0.02590238 0.11651178 0.39088616 0.74959342 0.94485805 0.98011173 0.99406309 1 0.99545475 0.99052772 0.97733476 0.94055988 0.87894956 0.81629384 0.77345952 0.75448766 0.75991531 0.7826343 0.8101689 0.83612975 0.86125424 0.88609106 0.91138767 0.93405146 0.95042063 0.9592573 0.96039555 0.95913395 0.95809013 0.95527459 0.94376465 0.89490799 0.74426308 0.476777 0.22960399 0.08009118 0.02617076 0.00415242',
 'units': 'nm',
 'wavelength_max': 684.0,
 'wavelength_min': 646.0}
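
Several of these entries are stored as strings, so they need converting before any computation. The sketch below parses the spectral response of `b04` from values copied from the output above. It assumes that the first sample sits at `wavelength_min` and that samples are spaced `spectral_response_step` nm apart; this is an assumption, not something the metadata states, although the last wavelength it yields does coincide with `wavelength_max`.

```python
# Values copied from the `b04` band description shown above.
b04 = {
    "spectral_response_step": "1",
    "spectral_response_values": "0.00141521 0.02590238 0.11651178 0.39088616 0.74959342 0.94485805 0.98011173 0.99406309 1 0.99545475 0.99052772 0.97733476 0.94055988 0.87894956 0.81629384 0.77345952 0.75448766 0.75991531 0.7826343 0.8101689 0.83612975 0.86125424 0.88609106 0.91138767 0.93405146 0.95042063 0.9592573 0.96039555 0.95913395 0.95809013 0.95527459 0.94376465 0.89490799 0.74426308 0.476777 0.22960399 0.08009118 0.02617076 0.00415242",
    "wavelength_min": 646.0,
    "wavelength_max": 684.0,
}

# Convert the string-encoded entries to numbers.
step = float(b04["spectral_response_step"])
response = [float(value) for value in b04["spectral_response_values"].split()]

# Assumption: samples start at wavelength_min and are `step` nm apart.
wavelengths = [b04["wavelength_min"] + i * step for i in range(len(response))]

# Wavelength at which the response reaches its maximum.
peak_wavelength = wavelengths[response.index(max(response))]

print(len(response), wavelengths[-1], peak_wavelength)
```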

## Now it is your turn

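Repeat the steps above with other parts of the product. For example, extract a different group from the structure listing, such as `quality/l2a_quicklook`, and inspect its dimensions, coordinates and chunking. Then look up another band in `eo:bands`, for example the green band `b03`, and compare its `center_wavelength` with that of `b04`.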

## Conclusion
This tutorial provides an initial understanding of the `zarr` structure for a Sentinel-2 L2A sample.

By using the `xarray` library, we can effectively navigate and inspect the different components within the `zarr` format, including its metadata and array organisation.
This foundation will help you understand the data analysis and processing workflows covered later in this series.