In [None]:
%pip install zarr rasterio rioxarray xarray dask

In [None]:
### Sentinel-2 SAFE to Zarr Conversion Workflow

# Introduction
"""
This Jupyter Notebook demonstrates how to convert Sentinel-2 imagery from ESA's SAFE (Standard Archive Format for Europe) format to the cloud-native Zarr format.

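In the SAFE format, a product is a hierarchical directory of XML metadata and one JPEG2000 file per band and resolution. This layout is convenient for downloading whole products but less so for cloud-based workflows: JPEG2000 is relatively expensive to decode and awkward to read in small windows from object storage. Zarr stores each array as a grid of compressed chunks with JSON metadata, so a reader can fetch only the chunks it needs and process them in parallel, which makes it well suited to large-scale remote sensing applications.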

### Notebook Outline
1. **Understanding Sentinel-2 SAFE Format**
2. **Loading Sentinel-2 SAFE Data**
3. **Preprocessing and Reorganization**
4. **Converting to Zarr Format**
5. **Verifying and Inspecting Zarr Data**
6. **Conclusion and Further Reading**

Let's begin!
"""

In [1]:
# Required Libraries
import os
import rasterio
import rioxarray
import xarray as xr
import zarr
import dask.array as da
import shutil
from pathlib import Path

In [3]:
# Section 1: Understanding Sentinel-2 SAFE Format
"""
Sentinel-2 Level-1C and Level-2A products are distributed in the SAFE format, which consists of a structured directory containing metadata and raster image files.

Key components of SAFE format:
- **Metadata (XML)**: Contains acquisition details, processing parameters, and geolocation data.
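- **Granules**: The `GRANULE/` folder holds the image data of the product's tile (granule). Current single-tile products contain one granule, whose `IMG_DATA/` folder holds the band files; Level-2A products split them into `R10m/`, `R20m/` and `R60m/` subfolders by resolution.
- **JPEG2000 Bands (.jp2)**: Each spectral band is stored as a separate JPEG2000 image, and Level-2A products add auxiliary layers (AOT, WVP, SCL, TCI) in the same format.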

More information: https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/data-formats
"""

# Section 2: Loading Sentinel-2 SAFE Data
"""
We'll start by pointing to the image folder (IMG_DATA) of a Sentinel-2 SAFE product and listing its JPEG2000 files.
"""


In [3]:
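# IMG_DATA folder of the (single) granule inside the SAFE product; adjust this path to your local copy
safe_dir = Path("C:/Users/Usuario/Desktop/Jobs/ThriveGEO/data/S2B_MSIL2A_20250307T103809_N0511_R008_T32ULC_20250307T125310.SAFE/GRANULE/L2A_T32ULC_A041790_20250307T103811/IMG_DATA/")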

In [4]:
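# Locate all .jp2 files under IMG_DATA: spectral bands plus the AOT, WVP, SCL and TCI layers,
# spread over the R10m, R20m and R60m subfolders. The glob order depends on the file system.
band_files = list(safe_dir.glob("**/*.jp2"))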

In [5]:
print(f"Found {len(band_files)} band files.")
print("Example file:", band_files[0])

Found 36 band files.
Example file: C:\Users\Usuario\Desktop\Jobs\ThriveGEO\data\S2B_MSIL2A_20250307T103809_N0511_R008_T32ULC_20250307T125310.SAFE\GRANULE\L2A_T32ULC_A041790_20250307T103811\IMG_DATA\R10m\T32ULC_20250307T103809_AOT_10m.jp2
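The 36 files are not 36 distinct bands: they include non-spectral layers such as AOT (as in the example path above), and several bands appear at more than one resolution. A minimal sketch for grouping the spectral bands by resolution, assuming file names of the form `<tile>_<sensing time>_<layer>_<resolution>.jp2` seen above; `group_spectral_bands` and the regular expressions are illustrative, not from any library.

```python
import re
from collections import defaultdict
from pathlib import PurePath

# File names follow <tile>_<sensing time>_<layer>_<resolution>.jp2,
# e.g. T32ULC_20250307T103809_B05_60m.jp2
NAME_RE = re.compile(r"^T\w{5}_\d{8}T\d{6}_(?P<layer>[A-Z0-9]{3})_(?P<res>\d{2}m)$")
SPECTRAL_RE = re.compile(r"^B(0[1-9]|1[0-2]|8A)$")  # B01-B12 and B8A; excludes AOT, WVP, SCL, TCI

def group_spectral_bands(paths):
    """Return {resolution: {band: path}} for the spectral-band files in `paths`."""
    groups = defaultdict(dict)
    for path in map(PurePath, paths):
        m = NAME_RE.match(path.stem)
        if m and SPECTRAL_RE.match(m["layer"]):
            groups[m["res"]][m["layer"]] = path
    return dict(groups)

# A few file names in the same layout as the listing above
files = [
    "IMG_DATA/R10m/T32ULC_20250307T103809_AOT_10m.jp2",
    "IMG_DATA/R10m/T32ULC_20250307T103809_B02_10m.jp2",
    "IMG_DATA/R20m/T32ULC_20250307T103809_B8A_20m.jp2",
    "IMG_DATA/R20m/T32ULC_20250307T103809_SCL_20m.jp2",
    "IMG_DATA/R60m/T32ULC_20250307T103809_B05_60m.jp2",
]
groups = group_spectral_bands(files)
print({res: sorted(bands) for res, bands in groups.items()})
```

In the notebook, `group_spectral_bands(band_files)` should work directly on the `Path` objects returned by the glob.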


In [9]:
# Section 3: Preprocessing and Reorganization
"""
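Since each Sentinel-2 band is stored in a separate file, we need to load the files one by one and stack them into a single xarray Dataset.
Files at the same resolution share one pixel grid, but the 10 m, 20 m and 60 m grids differ, so we also need to check the coordinate reference system (CRS) and geotransform before combining bands.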
"""


In [13]:
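# Load one JPEG2000 file as an xarray DataArray with pixel-centre x/y coordinates
import numpy as np

def load_band(file_path):
    with rasterio.open(file_path) as src:
        band_data = src.read(1)  # First band only (TCI files hold 3 RGB bands; only red is kept)
        t = src.transform  # north-up affine transform: x = t.c + t.a * col, y = t.f + t.e * row
        return xr.DataArray(band_data, dims=("y", "x"), coords={
            "x": t.c + t.a * (np.arange(src.width) + 0.5),   # pixel centres along x
            "y": t.f + t.e * (np.arange(src.height) + 0.5),  # pixel centres along y
        }, name=file_path.stem, attrs={"crs": src.crs.to_string()})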
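The x/y coordinates attached in `load_band` are the pixel centres implied by the raster's geotransform (rasterio's `src.xy(row, col)` also returns centres by default). The pure-Python sketch below spells out that arithmetic; it assumes a north-up raster and takes the six affine coefficients in rasterio's `a`..`f` order. `pixel_centres` is a hypothetical helper, and the 60 m grid origin is made up for illustration.

```python
def pixel_centres(transform, width, height):
    """Pixel-centre x/y coordinates of a north-up raster.

    `transform` holds the affine coefficients (a, b, c, d, e, f) in rasterio's order:
    x = a*col + b*row + c and y = d*col + e*row + f for the pixel corner (col, row).
    """
    a, b, c, d, e, f = transform
    if b != 0 or d != 0:
        raise ValueError("rotated or sheared rasters are not handled here")
    xs = [c + a * (col + 0.5) for col in range(width)]   # +0.5 moves from corner to centre
    ys = [f + e * (row + 0.5) for row in range(height)]  # e is negative for north-up rasters
    return xs, ys

# Hypothetical 60 m grid: 1830 x 1830 pixels, upper-left corner at (300000, 5800000)
xs, ys = pixel_centres((60.0, 0.0, 300000.0, 0.0, -60.0, 5800000.0), 1830, 1830)
print(xs[:2], ys[:2], xs[-1])
```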

In [14]:
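# Stack all band files into one xarray Dataset.
# The 10 m, 20 m and 60 m files sit on different pixel grids, so each resolution gets its
# own dimension names (x_10m/y_10m, ...); sharing plain x/y would outer-join the grids
# into one large, NaN-padded array.
dataset = xr.Dataset({
    band.stem: load_band(band).rename({"x": f"x_{band.parent.name[1:]}", "y": f"y_{band.parent.name[1:]}"})
    for band in band_files
})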
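Why resolution matters when stacking: xarray aligns the variables' x/y indexes when a Dataset is built (an outer join by default), so bands on different grids do not simply line up. The sketch below counts pixel-centre x coordinates along one tile edge for the three resolutions; the tile corner at x = 300000 is an assumption, while the 109.8 km tile width follows from 10980 pixels at 10 m. `centre_coords` is a hypothetical helper.

```python
def centre_coords(origin, res, n):
    """Pixel-centre coordinates (metres) of n pixels of size res starting at origin."""
    return {origin + res * i + res / 2 for i in range(n)}

# Assumed tile corner; a Sentinel-2 tile spans 109.8 km on each side
origin, extent = 300000, 109800
grids = {res: centre_coords(origin, res, extent // res) for res in (10, 20, 60)}
union = grids[10] | grids[20] | grids[60]

print({res: len(c) for res, c in grids.items()}, "-> outer join:", len(union))
```

The 10 m and 20 m centres never coincide, so the joined axis has 16470 entries instead of 10980, and the NaN padding also forces integer bands to floating point. Keeping one grid per resolution (or resampling to a common grid) avoids this.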

In [22]:
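# All files of a tile share one CRS, so reading it from any band is enough
with rasterio.open(band_files[5]) as src:  # `with` closes the file handle afterwards
    crs = src.crs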
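As a sanity check on `crs`, the expected EPSG code can be derived from the MGRS tile ID, since Sentinel-2 tiles are projected in WGS 84 / UTM: the first two digits give the UTM zone, and latitude bands C to M lie south of the equator (EPSG 327xx) while N to X lie north of it (EPSG 326xx). `utm_epsg_from_tile` is a hypothetical helper written for this check.

```python
def utm_epsg_from_tile(tile_id):
    """EPSG code of the WGS 84 / UTM zone for an MGRS tile ID such as "32ULC"."""
    zone = int(tile_id[:2])
    lat_band = tile_id[2].upper()
    if not 1 <= zone <= 60 or lat_band not in "CDEFGHJKLMNPQRSTUVWX":
        raise ValueError(f"not an MGRS tile ID: {tile_id!r}")
    # Latitude bands C-M are in the southern hemisphere, N-X in the northern one
    return (32600 if lat_band >= "N" else 32700) + zone

print(utm_epsg_from_tile("32ULC"))
```

In the notebook, `crs.to_epsg() == utm_epsg_from_tile("32ULC")` should then hold for tile T32ULC.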

In [24]:
band_files[5]

WindowsPath('C:/Users/Usuario/Desktop/Jobs/ThriveGEO/data/S2B_MSIL2A_20250307T103809_N0511_R008_T32ULC_20250307T125310.SAFE/GRANULE/L2A_T32ULC_A041790_20250307T103811/IMG_DATA/R60m/T32ULC_20250307T103809_B05_60m.jp2')

In [25]:
dataset