### Introduction

In this tutorial, we will compare the access, performance, and size of the `.SAFE` Sentinel-2 L2A Items available at the [Copernicus Data Space Ecosystem](https://dataspace.copernicus.eu/) (CDSE) with the equivalent `.zarr` products retrieved directly from the [EOPF Sentinel Zarr Sample Service STAC Catalog](https://stac.browser.user.eopf.eodc.eu/?.language=en).

### What we will learn

- `.SAFE` access through the CDSE
- `.zarr` access through the STAC Catalog
- The 

### Prerequisites

It is advised that you go through the previous [section](33_eopf_stac_connection.ipynb), as it gives you an introduction on how to access a STAC catalog programmatically.

As we will be using the traditional `.SAFE` format for comparison, you will need a user account on the [CDSE](https://dataspace.copernicus.eu/). If you have not yet registered, you can create an account [here](https://identity.dataspace.copernicus.eu/auth/realms/CDSE/login-actions/registration?execution=10f09889-a37b-4bbf-90c9-b833b16ddb55&client_id=account-console&tab_id=KPrlB_gk1KE). To register on the platform and learn about all the data you can access, follow [this tutorial](https://documentation.dataspace.copernicus.eu/Registration.html).<br>
This registration will allow you to generate tokens that will be needed to access and download the Sentinel data sets we are interested in.

#### Item of Interest

To compare the equivalent `.zarr` Item to the traditional `.SAFE` item, we will focus on the southern area of Lower Saxony, Germany.
The items that cover this geographical area of interest for this tutorial are:

The .SAFE product:
- `S2C_MSIL2A_20250415T102041_N0511_R065_T32UNC_20250415T160234.SAFE`

The .zarr product:
- `S2C_MSIL2A_20250415T102041_N0511_R065_T32UNC_20250415T160234.zarr`

It is important to point out that, at the current stage of the re-engineering, one `.SAFE` item corresponds to exactly one `.zarr` item, meaning they cover the same extent and bounding box.

In the future, ESA plans to develop an entire data set dedicated to Zarr and to unify it under a new general format. These efforts are currently under development.

To follow the ongoing discussion, visit the [CDSE Forum](https://forum.dataspace.copernicus.eu/) for the latest updates.


<hr>

#### Import libraries

In [1]:
import requests
import os
import getpass
import numpy as np
import time
import zipfile
import pystac
import matplotlib.pyplot as plt
#from typing import List, Optional, cast
from pystac import Collection, MediaType
from pystac_client import Client, CollectionClient
from datetime import datetime
import xarray as xr
import rioxarray
import s3fs
import xml.etree.ElementTree as ET

#### Helper functions

##### `get_access_token()`

The CDSE provides a tutorial on how to download Items of interest for the Sentinel missions via its API, using the **Open Data Protocol** (OData).
The following function uses your CDSE credentials to generate the access token needed for the retrieval.

In [2]:
def get_access_token(username: str, password: str) -> str:
    """Request an access token from the CDSE identity service."""
    data = {
        "client_id": "cdse-public",
        "username": username,  # CDSE account e-mail
        "password": password,
        "grant_type": "password",
    }
    r = requests.post(  # send the token request
        "https://identity.dataspace.copernicus.eu/auth/realms/CDSE/protocol/openid-connect/token",
        data=data,
    )
    try:
        r.raise_for_status()
    except requests.HTTPError as e:
        # r.text is used instead of r.json() because an error body may not be valid JSON
        raise Exception(
            f"Access token creation failed. Response from the server was: {r.text}"
        ) from e
    return r.json()["access_token"]

<hr>

## Connection to the CDSE

Once an account is created, we can provide our credentials to retrieve the files of interest and download them directly to our local machine.

In [3]:
user_name = input("Enter your CDSE username (email): ")
def_pass = getpass.getpass("Enter your CDSE password: ")  # hides the password while typing


Because we want to compare performance, we will measure the running time of this procedure, from providing the credentials until the data is available in our local working space.
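The start/stop timing pattern used throughout this tutorial can also be wrapped in a small helper. The following is a minimal sketch, not part of the original notebook: the name `timed` and the `timings` dictionary are hypothetical, and `time.perf_counter()` is swapped in for `time.time()` because it is intended for measuring elapsed intervals.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, results: dict):
    """Measure the wall-clock time of the enclosed block and store it under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so a failed step is still timed
        results[label] = time.perf_counter() - start


timings = {}
with timed("example", timings):
    sum(range(100_000))  # stand-in for the download or open step

print(f"Running time: {timings['example']:.3f} sec")
```

Collecting the measurements in one dictionary makes it easier to compare the `.SAFE` and `.zarr` steps side by side at the end.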


In [4]:
st = time.time()   # Starts measuring the access time
access_token = get_access_token(user_name, def_pass)

# Downloading the Item of interest:

# Item of interest, with the URL provided through the CDSE
url = "https://download.dataspace.copernicus.eu/odata/v1/Products(5d369f6c-909e-44f6-8fef-3a8220ba13e1)/$value"
headers = {"Authorization": f"Bearer {access_token}"}
session = requests.Session()
session.headers.update(headers)  # the session sends the token with every request
response = session.get(url, stream=True)  # streamed request; the body is read in the next cell
response.raise_for_status()  # stop early if the download request was rejected


To access the data, we stream the response into a local `.zip` file in chunks, so the whole archive never has to be held in memory.

In [5]:
with open("sentinel_2_SAFE.zip", "wb") as file:
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            file.write(chunk)
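The chunked write above can be wrapped in a reusable function that also reports how many bytes were written. This is a sketch under assumptions: `save_stream` and its `expected_size` parameter are hypothetical names, and the demo feeds it a plain list of byte strings instead of a real HTTP response.

```python
import os
import tempfile
from typing import Iterable, Optional


def save_stream(chunks: Iterable[bytes], path: str, expected_size: Optional[int] = None) -> int:
    """Write an iterable of byte chunks to `path` and return the number of bytes written."""
    written = 0
    with open(path, "wb") as file:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                written += file.write(chunk)
    if expected_size is not None and written != expected_size:
        raise IOError(f"Incomplete download: {written} of {expected_size} bytes written")
    return written


# Demo with in-memory chunks standing in for response.iter_content(...)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.zip")
demo_written = save_stream([b"abc", b"", b"defg"], demo_path, expected_size=7)
print(f"Wrote {demo_written} bytes to {demo_path}")
```

With the tutorial's `response`, the call would look like `save_stream(response.iter_content(chunk_size=8192), "sentinel_2_SAFE.zip")`. If the server sends a `Content-Length` header, its value could be passed as `expected_size`, although that header is not guaranteed to be present.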

### `.SAFE` size

To get an overview of the structure and size of the selected file, we can unzip it and access each of the stored bands, together with the corresponding metadata of the item.

To verify that we accessed the correct file, we check the name of the extracted archive:


In [6]:
#Unzip the archive
with zipfile.ZipFile("sentinel_2_SAFE.zip", "r") as zip_ref:
    zip_ref.extractall(".") #Extract to the current directory

#Find the extracted .SAFE folder in the current directory
safe_input = [f for f in os.listdir(".") if f.endswith(".SAFE")][0]
print(f"Extracted: {safe_input}")

Extracted: S2C_MSIL2A_20250415T102041_N0511_R065_T32UNC_20250415T160234.SAFE
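The archive can also be inspected before it is extracted, which is a cheap way to confirm its contents and to compare compressed and uncompressed sizes. The sketch below uses only the standard `zipfile` module; `summarize_zip` is a hypothetical helper, and the demo archive is built in memory with placeholder file names.

```python
import io
import zipfile


def summarize_zip(zip_source) -> dict:
    """Return top-level entries, file count, and byte totals of a zip archive."""
    with zipfile.ZipFile(zip_source, "r") as zf:
        infos = zf.infolist()
        return {
            "top_level": sorted({info.filename.split("/")[0] for info in infos}),
            "n_files": sum(not info.is_dir() for info in infos),
            "compressed_bytes": sum(info.compress_size for info in infos),
            "uncompressed_bytes": sum(info.file_size for info in infos),
        }


# Demo archive built in memory; the names only mimic the layout of a .SAFE product
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("DEMO.SAFE/MTD_MSIL2A.xml", "<a/>")
    zf.writestr("DEMO.SAFE/GRANULE/DEMO_GRANULE/MTD_TL.xml", "<b/>")
buffer.seek(0)

summary = summarize_zip(buffer)
print(summary)
```

With the tutorial data, `summarize_zip("sentinel_2_SAFE.zip")` would list the single `.SAFE` folder as the top-level entry.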


The `.SAFE` format is organised in granules, which store the available bands at different resolutions, the quality masks, and the complementary metadata that allows the files to be reconstructed and managed through different **GIS** software or programming languages.
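To make the granule layout more tangible, the band files can be grouped by resolution folder. The sketch below assumes the usual L2A layout `GRANULE/<granule>/IMG_DATA/R10m|R20m|R60m/*.jp2`; `list_band_files` is a hypothetical helper, and the demo runs on an empty placeholder tree instead of the real product.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def list_band_files(safe_dir) -> dict:
    """Group the .jp2 band files of a .SAFE folder by resolution folder (R10m, R20m, ...)."""
    bands = defaultdict(list)
    for jp2 in sorted(Path(safe_dir).glob("GRANULE/*/IMG_DATA/R*/*.jp2")):
        bands[jp2.parent.name].append(jp2.name)
    return dict(bands)


# Placeholder tree that mimics the assumed layout; the file names are invented
demo_safe = Path(tempfile.mkdtemp()) / "DEMO.SAFE"
for resolution, band in [("R10m", "B04_10m"), ("R10m", "B08_10m"), ("R20m", "B05_20m")]:
    folder = demo_safe / "GRANULE" / "DEMO_GRANULE" / "IMG_DATA" / resolution
    folder.mkdir(parents=True, exist_ok=True)
    (folder / f"DEMO_{band}.jp2").touch()

demo_bands = list_band_files(demo_safe)
for resolution, files in demo_bands.items():
    print(resolution, files)
```

With the extracted product, `list_band_files(safe_input)` would show which bands are available at each resolution.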

To calculate the size of the file, we add up the component sizes, resulting in:

In [7]:
safe_size = sum(
    os.path.getsize(os.path.join(dp, f))
    for dp, dn, filenames in os.walk(safe_input)
    for f in filenames
) / (1024**3)  # converts bytes to gigabytes (1024**3 bytes)
et = time.time()    # Ends measuring the access time

print(f"SAFE Directory Size: {safe_size:.3f} GB")
print(f'Running time: {(et-st):.3f} sec')

SAFE Directory Size: 0.698 GB
Running time: 138.723 sec
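The total can be broken down further to see which file types account for most of the size. This is a small sketch with a hypothetical helper, `size_by_suffix`, demonstrated on a throwaway directory with placeholder files.

```python
import os
import tempfile
from collections import Counter


def size_by_suffix(directory: str) -> Counter:
    """Sum file sizes in bytes per file extension below `directory`."""
    sizes = Counter()
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            suffix = os.path.splitext(name)[1].lower() or "(none)"
            sizes[suffix] += os.path.getsize(os.path.join(dirpath, name))
    return sizes


# Throwaway directory with placeholder files; pass `safe_input` for the real product
demo_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_dir, "GRANULE"))
with open(os.path.join(demo_dir, "MTD_MSIL2A.xml"), "wb") as f:
    f.write(b"x" * 10)
with open(os.path.join(demo_dir, "GRANULE", "band.jp2"), "wb") as f:
    f.write(b"x" * 300)

demo_sizes = size_by_suffix(demo_dir)
for suffix, n_bytes in demo_sizes.most_common():
    print(f"{suffix}: {n_bytes} bytes")
```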


Note that, to access a whole `.SAFE` item and later retrieve individual bands through the CDSE, it is easier to download the whole product.


## Connection to the EOPF Zarr STAC Catalog

Our first step is to establish a connection to the path where the `.zarr` file is stored inside the EOPF Zarr STAC Catalog. This involves defining the `url` of the STAC endpoint. See the previous [section](./33_eopf_stac_connection.ipynb) for a more detailed explanation of how to retrieve the endpoint `url`.

As we are interested in accessing the actual `.zarr` file, we need to locate it inside the cloud storage. We start by defining the endpoint of the object storage:

In [8]:
st = time.time()
fs = s3fs.S3FileSystem(anon=True, client_kwargs={"endpoint_url": "https://objectstore.eodc.eu:2222"})

It is also important to define the object path, which indicates the bucket (assigned to the mission), the prefixes (for grouping) and the `.zarr` object we are interested in.

In [9]:
bucket = 'e05ab01a9d56408d82ac32d69a5aae2a:202504-s02msil2a/'
prefixes = '15/products/cpm_v256/'
zarr_product = 'S2C_MSIL2A_20250415T102041_N0511_R065_T32UNC_20250415T160234.zarr'

zarr_store = bucket + prefixes + zarr_product
print(zarr_store)

e05ab01a9d56408d82ac32d69a5aae2a:202504-s02msil2a/15/products/cpm_v256/S2C_MSIL2A_20250415T102041_N0511_R065_T32UNC_20250415T160234.zarr
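Instead of typing the bucket and prefixes by hand, the store path can be derived from a full object URL. The sketch below assumes that such a URL has the form endpoint + `/` + store path, which is an assumption rather than something stated in this tutorial; the example URL is simply the endpoint and path used above joined together, and `split_store_href` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def split_store_href(href: str) -> tuple:
    """Split an object URL into (endpoint_url, store_path)."""
    parts = urlsplit(href)
    endpoint = f"{parts.scheme}://{parts.netloc}"
    store_path = parts.path.lstrip("/")  # bucket, prefixes and .zarr object
    return endpoint, store_path


# Assumed URL form: the endpoint and store path from this tutorial, joined with "/"
example_href = (
    "https://objectstore.eodc.eu:2222/"
    "e05ab01a9d56408d82ac32d69a5aae2a:202504-s02msil2a/15/products/cpm_v256/"
    "S2C_MSIL2A_20250415T102041_N0511_R065_T32UNC_20250415T160234.zarr"
)
endpoint_url, store_path = split_store_href(example_href)
print(endpoint_url)
print(store_path)
```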


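To access the defined bucket, we need to adjust how the S3 request is prepared. The bucket name contains a colon (`:`), which the default bucket-name validation of the underlying S3 client would typically reject. We therefore unregister the first handler attached to the `before-parameter-build.s3` event, so that requests for this specific bucket can be built.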

In [11]:
handlers = fs.s3.meta.events._emitter._handlers
handlers_to_unregister = handlers.prefix_search("before-parameter-build.s3")
if handlers_to_unregister: # Check if there are handlers to unregister
    handler_to_unregister = handlers_to_unregister[0]
    fs.s3.meta.events._emitter.unregister("before-parameter-build.s3", handler_to_unregister)
else:
    print("No 'before-parameter-build.s3' handler found to unregister. This might be fine or indicate a different boto3/s3fs version.")

Once the parameters are defined, we can create a mapping of the store, which lets us access the location like a "regular" file directory through S3 API calls and open the `.zarr` file:

In [12]:
s3_map = s3fs.S3Map(root=zarr_store, s3=fs)

try:
    # Use xarray.open_zarr with the S3Map
    zarr_input = xr.open_zarr(s3_map)
    print("Successfully opened Zarr store!")
except Exception as e:
    print(f"Error opening Zarr store: {e}")

Successfully opened Zarr store!


### `.zarr` size

Then, calculating the size and retrieval time results in:

In [13]:
zarr_size = (fs.du(zarr_store)) / (1024**3)
et = time.time()
print(f"Size: {zarr_size:.3f} GB")
print(f'Running time: {(et-st):.3f} sec')

Size: 0.845 GB
Running time: 1.457 sec


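As we can see, the `.zarr` product is similar in size to the `.SAFE` product, although slightly larger (0.845 GB vs. 0.698 GB). When it comes to the time it takes to access the data, however, `.zarr` responds much faster: about 1.5 seconds compared with almost 139 seconds. Keep in mind that the `.SAFE` timing includes downloading and unzipping the whole product, while opening the `.zarr` store is lazy and transfers no image data.

And, most relevant of all, we do not even store the `.zarr` locally: everything is computed on the fly, without the need to download the file.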
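The two measurements can be put into proportion with a short calculation. The numbers below are copied from the outputs printed earlier in this tutorial; they will differ from run to run and between network connections.

```python
# Values copied from the printed outputs above
safe_size_gb, zarr_size_gb = 0.698, 0.845
safe_time_s, zarr_time_s = 138.723, 1.457

size_ratio = zarr_size_gb / safe_size_gb  # how much larger the .zarr product is
speedup = safe_time_s / zarr_time_s       # how much faster the .zarr access was

print(f".zarr size relative to .SAFE: {size_ratio:.2f}x")
print(f".zarr access speed-up: {speedup:.0f}x")
```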

## Metadata Access

### `.SAFE`

The metadata available for each `.SAFE` product is contained in two files found inside the store: the `MTD_MSIL2A.xml` and `MTD_TL.xml` files.

Inside them, we can find essential information about the dataset, such as:

- Satellite Name & Sensor Type
- Processing Level & Projection
- Tile ID and Sensing Time
- Cloud Coverage Information

To access such files, we can position ourselves inside the `.SAFE` store and get the `.xml` tree structure:

In [14]:
st=time.time()

granule_path = os.path.join(safe_input, "GRANULE")
granules = os.listdir(granule_path)
granule_folder = os.path.join(granule_path, granules[0], "IMG_DATA")

# Retrieves the main metadata file
metadata_file = os.path.join(safe_input, "MTD_MSIL2A.xml")

# Retrieves the granule-specific metadata stored in .SAFE
dtree = ET.parse(os.path.join(granule_path, granules[0], "MTD_TL.xml"))

# Parses the main .xml file so we can search it for specific elements
tree = ET.parse(metadata_file)
root = tree.getroot()
droot = dtree.getroot()

Once the metadata is loaded into an `ElementTree`, we can parse it and retrieve the specific information we are interested in.

To visualise the whole structure, we can run the code below, including the commented-out part, and explore the metadata of the `.SAFE` product line by line.

In [15]:
print("--- Content of MTD_MSIL2A.xml (root element) ---")
if root is not None:

    # The entire MTD_MSIL2A.XML structure of the root element
    print("\n--- XML of MTD_MSIL2A.xml (root element) ---")
    ET.indent(tree, space="  ") # Adds indentation for readability
    print(ET.tostring(root, encoding='unicode', xml_declaration=True))

else:
    print("Root element for MTD_MSIL2A.xml not loaded.")


# print("\n\n--- Content of MTD_TL.xml (droot element) ---")
# if droot is not None:
#     # The entire MTD_TL.XML structure of the droot element
#     print("\n--- XML of MTD_TL.xml (droot element) ---")
#     if dtree:
#         ET.indent(dtree, space="  ")
#         print(ET.tostring(droot, encoding='unicode', xml_declaration=True))
#     else:
#         # if dtree wasn't created (e.g., if MTD_TL.xml was not found)
#         print(ET.tostring(droot, encoding='unicode'))

# else:
#     print("droot element for MTD_TL.xml not loaded.")


--- Content of MTD_MSIL2A.xml (root element) ---

--- XML of MTD_MSIL2A.xml (root element) ---
<?xml version='1.0' encoding='utf-8'?>
<ns0:Level-2A_User_Product xmlns:ns0="https://psd-15.sentinel2.eo.esa.int/PSD/User_Product_Level-2A.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://psd-15.sentinel2.eo.esa.int/PSD/User_Product_Level-2A.xsd">
  <ns0:General_Info>
    <Product_Info>
      <PRODUCT_START_TIME>2025-04-15T10:20:41.026Z</PRODUCT_START_TIME>
      <PRODUCT_STOP_TIME>2025-04-15T10:20:41.026Z</PRODUCT_STOP_TIME>
      <PRODUCT_URI>S2C_MSIL2A_20250415T102041_N0511_R065_T32UNC_20250415T160234.SAFE</PRODUCT_URI>
      <PROCESSING_LEVEL>Level-2A</PROCESSING_LEVEL>
      <PRODUCT_TYPE>S2MSI2A</PRODUCT_TYPE>
      <PROCESSING_BASELINE>05.11</PROCESSING_BASELINE>
      <PRODUCT_DOI>https://doi.org/10.5270/S2_-znk9xsj</PRODUCT_DOI>
      <GENERATION_TIME>2025-04-15T16:02:34.000000Z</GENERATION_TIME>
      <PREVIEW_IMAGE_URL>Not applicable</PREVIEW_I

The elements of interest inside the `.xml` files provide the following information:

In [16]:
safe_metadata = {
    "product_id" :root.findtext(".//PRODUCT_URI"),
    "satellite": root.findtext(".//SPACECRAFT_NAME"),
    "processing_level": root.findtext(".//PROCESSING_LEVEL"),    
    "product_type": root.findtext(".//PRODUCT_TYPE"),
    "instrument": root.findtext(".//DATATAKE_TYPE"),    
    "instrument_mode": root.findtext(".//DATATAKE_TYPE"),
    "epsg": droot.findtext(".//HORIZONTAL_CS_CODE"),
    "date_of_retrieval": root.findtext(".//GENERATION_TIME")[:10],
    "sensing_time": root.findtext(".//PRODUCT_START_TIME")[11:19],
    "cloud_coverage": droot.findtext(".//CLOUDY_PIXEL_OVER_LAND_PERCENTAGE")
}

et = time.time()

print(f'Running time: {(et-st):.3f} sec')
print(f'Some of the .SAFE selected metadata:')
safe_metadata

Running time: 0.012 sec
Some of the .SAFE selected metadata:


{'product_id': 'S2C_MSIL2A_20250415T102041_N0511_R065_T32UNC_20250415T160234.SAFE',
 'satellite': 'Sentinel-2C',
 'processing_level': 'Level-2A',
 'product_type': 'S2MSI2A',
 'instrument': 'INS-NOBS',
 'instrument_mode': 'INS-NOBS',
 'epsg': 'EPSG:32632',
 'date_of_retrieval': '2025-04-15',
 'sensing_time': '10:20:41',
 'cloud_coverage': '25.952634'}

As we can see, the metadata is consolidated for the whole `.SAFE` product, so to find a specific piece of information we have to search through the entire metadata tree.
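One detail worth guarding against when searching the tree: `findtext` returns `None` when an element is missing, so slicing the result directly (as in `[:10]`) would raise a `TypeError`. The following sketch shows a defensive lookup; `find_text` is a hypothetical helper, and the XML string is a shortened stand-in for `MTD_MSIL2A.xml` with values taken from the output above.

```python
import xml.etree.ElementTree as ET


def find_text(root, tag: str, default=None):
    """Return the stripped text of the first descendant `tag`, or `default` if absent."""
    value = root.findtext(f".//{tag}")
    return value.strip() if value is not None else default


# Shortened stand-in for the real metadata file
sample_xml = """<Product>
  <General_Info>
    <Product_Info>
      <PROCESSING_LEVEL>Level-2A</PROCESSING_LEVEL>
      <GENERATION_TIME>2025-04-15T16:02:34.000000Z</GENERATION_TIME>
    </Product_Info>
  </General_Info>
</Product>"""
sample_root = ET.fromstring(sample_xml)

level = find_text(sample_root, "PROCESSING_LEVEL")
generated = find_text(sample_root, "GENERATION_TIME", default="")[:10]
missing = find_text(sample_root, "SPACECRAFT_NAME", default="unknown")
print(level, generated, missing)
```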

### `.zarr`

For the `.zarr` case, we can access the metadata directly from the item we opened earlier through its `.attrs` attribute, indicating the group each entry belongs to. We again measure the processing time:

In [17]:
st = time.time()

zarr_metadata = {
    "product_id": zarr_input.attrs["stac_discovery"]["id"],
    "mission_id": zarr_input.attrs["stac_discovery"]["properties"]["platform"],
    "processing_level": zarr_input.attrs["stac_discovery"]["properties"]["processing:level"],
    "product_type": zarr_input.attrs["stac_discovery"]["properties"]["product:type"],
    "instrument": zarr_input.attrs["stac_discovery"]["properties"]["instrument"],
    "instrument_mode": zarr_input.attrs["stac_discovery"]["properties"]["eopf:instrument_mode"],
    "epsg": zarr_input.attrs["stac_discovery"]["properties"]["proj:epsg"],
    "processing_time": zarr_input.attrs["stac_discovery"]["properties"]["created"][:10],
    "cloud_cover": zarr_input.attrs["stac_discovery"]["properties"]["eo:cloud_cover"]
}
et = time.time()

print(f'Running time: {(et-st):.3f} sec')
print('Some of the .zarr selected metadata:')
zarr_metadata

Running time: 0.000 sec
Some of the .zarr selected metadata:


{'product_id': 'S2C_MSIL2A_20250415T102041_N0511_R065_T32UNC_20250415T160234.SAFE',
 'mission_id': 'sentinel-2c',
 'processing_level': 'L2A',
 'product_type': 'S02MSIL2A',
 'instrument': 'msi',
 'instrument_mode': 'INS-NOBS',
 'epsg': 32632,
 'processing_time': '2025-04-15',
 'cloud_cover': 25.95717}
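Because the attributes are nested dictionaries, a missing key anywhere along the path raises a `KeyError`. A small lookup helper keeps such code robust. In the sketch below, `get_nested` is a hypothetical helper and `sample_attrs` is a hand-written stand-in for `zarr_input.attrs`, reduced to a few of the keys shown above.

```python
def get_nested(mapping, keys, default=None):
    """Follow `keys` through nested dictionaries, returning `default` if any key is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hand-written stand-in for zarr_input.attrs
sample_attrs = {
    "stac_discovery": {
        "properties": {
            "platform": "sentinel-2c",
            "eo:cloud_cover": 25.95717,
        },
    }
}

platform = get_nested(sample_attrs, ["stac_discovery", "properties", "platform"])
cloud_cover = get_nested(sample_attrs, ["stac_discovery", "properties", "eo:cloud_cover"])
gsd = get_nested(sample_attrs, ["stac_discovery", "properties", "gsd"], default="n/a")
print(platform, cloud_cover, gsd)
```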

We can see that the request is processed almost instantly: the attributes were already read when the store was opened, so each lookup is a plain dictionary access.

It is important to point out that all the `.zarr`-based requests are made through the connection to the bucket where the `.zarr` objects are stored.

This storage approach shows how faster responses can improve the performance of EO workflows.
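The two metadata dictionaries describe the same product but encode some values differently, for example `'EPSG:32632'` versus `32632`. The sketch below normalises a few fields before comparing them. The values are copied from the outputs above, the helper names are hypothetical, and the cloud cover tolerance of 0.01 is an arbitrary choice: the `.SAFE` value was read from `CLOUDY_PIXEL_OVER_LAND_PERCENTAGE`, so a small difference from `eo:cloud_cover` is not surprising.

```python
import math

# Values copied from the two metadata outputs above
safe_values = {"satellite": "Sentinel-2C", "epsg": "EPSG:32632", "cloud_coverage": "25.952634"}
zarr_values = {"mission_id": "sentinel-2c", "epsg": 32632, "cloud_cover": 25.95717}


def normalize_epsg(value) -> int:
    """Turn 'EPSG:32632' or 32632 into the integer code 32632."""
    return int(str(value).upper().replace("EPSG:", ""))


def normalize_platform(value: str) -> str:
    """Compare platform names case-insensitively."""
    return value.strip().lower()


matches = {
    "platform": normalize_platform(safe_values["satellite"])
    == normalize_platform(zarr_values["mission_id"]),
    "epsg": normalize_epsg(safe_values["epsg"]) == normalize_epsg(zarr_values["epsg"]),
    # Arbitrary tolerance of 0.01 percentage points
    "cloud_cover": math.isclose(
        float(safe_values["cloud_coverage"]), zarr_values["cloud_cover"], abs_tol=0.01
    ),
}
print(matches)
```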

## 💪 Now it is your turn

With the foundations learned so far, you are now equipped to access products from the EOPF Zarr STAC catalog. These are your tasks:

### Task 1: 

### Task 2: 

### Task 3: 


## Conclusion

In this section we compared....


## What's next?

This online resource is under active development, so stay tuned for regular updates.