<a href="https://jupyterhub.user.eopf.eodc.eu/hub/user-redirect/git-pull?repo=https://github.com/eopf-toolkit/eopf-101&branch=main&urlpath=lab/tree/eopf-101/41_rio_tiler_s2_fundamentals.ipynb" target="_blank">
  <button style="background-color:#0072ce; color:white; padding:0.6em 1.2em; font-size:1rem; border:none; border-radius:6px; margin-top:1em;">
    🚀 Launch this notebook in JupyterLab
  </button>
</a>

## Introduction

This notebook demonstrates efficient tiling workflows with EOPF Zarr data using **rio-tiler** and **rioxarray**. We'll show how direct Zarr access with well-chosen chunking keeps tile generation fast for web mapping and visualization tasks.

**rio-tiler** is a Python library for creating map tiles from raster data sources. Combined with EOPF Zarr's cloud-optimized format, it can generate tiles for web mapping applications without downloading entire datasets.

## What we will learn

- 🗺️ How to integrate rio-tiler with EOPF Zarr datasets
- 🎨 Generate map tiles (RGB and false color composites) from Sentinel-2 data
- 📊 Understand the relationship between Zarr chunks and tile performance
- ⚡ Observe memory usage patterns for large optical datasets
- 🌍 Create interactive web map visualizations

## Prerequisites

This tutorial builds on concepts from previous sections:
- [Understanding Zarr Structure](24_zarr_struct_S2L2A.ipynb) - Sentinel-2 data organization
- [STAC and xarray Tutorial](44_eopf_stac_xarray_tutorial.ipynb) - Accessing EOPF data
- [Zarr Chunking Strategies](sections/2x_about_eopf_zarr/253_zarr_chunking_practical.ipynb) - Chunking fundamentals

As rio-tiler is extensively used in this notebook, familiarity with its core concepts is beneficial. Refer to the [rio-tiler documentation](https://docs.rio-tiler.io/en/latest/) for more details.

![rio-tiler with EOPF Zarr](img/rio-tiler.png)

**Required packages**: `rio-tiler`, `rioxarray`, `xarray`, `zarr`, `pystac-client`, `numpy`, `matplotlib`

<hr>

# Section 1: Direct Zarr Access Setup

We'll start by connecting to the EOPF STAC catalog and loading a Sentinel-2 L2A dataset with its native Zarr chunking configuration.

### Import libraries

In [None]:
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
from pystac_client import Client
from pystac import MediaType
from rio_tiler.io import XarrayReader
from rio_tiler.models import ImageData
import rioxarray # used through `.rio` accessor
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

print("✅ Libraries imported successfully")

### Connect to EOPF STAC Catalog

We'll search for a cloud-free Sentinel-2 L2A scene over a test region.

In [None]:
# Connect to EOPF STAC API
eopf_stac_api_root = "https://stac.core.eopf.eodc.eu/"
catalog = Client.open(url=eopf_stac_api_root)

# Search for Sentinel-2 L2A over Napoli during summer 2025
search_results = catalog.search(
    collections='sentinel-2-l2a',
    bbox=(14.268124, 40.835933, 14.433823, 40.898202),  # Napoli AOI
    datetime='2025-06-01T00:00:00Z/2025-09-30T23:59:59Z',  # Summer 2025
    max_items=1,
    filter={
        "op": "and",
        "args": [
            {
                "op": "lte",
                "args": [
                    {"property": "eo:cloud_cover"},
                    5  # Cloud cover less than or equal to 5%
                ]
            }
        ]
    },
    filter_lang='cql2-json'
)

# Get first item
items = list(search_results.items())
if not items:
    raise ValueError("No items found. Try adjusting the search parameters.")

item = items[0]
print(f"📦 Found item: {item.id}")
print(f"📅 Acquisition date: {item.properties.get('datetime', 'N/A')}")
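
The `filter` argument in the search above is a CQL2 JSON expression. As a minimal sketch, the cloud cover threshold can be built by a small helper so that the value and its description cannot drift apart. `max_cloud_cover_filter` is a hypothetical name used here for illustration; it is not part of pystac-client.

```python
def max_cloud_cover_filter(max_percent: float) -> dict:
    """Build a CQL2 JSON filter keeping items with eo:cloud_cover <= max_percent."""
    if not 0 <= max_percent <= 100:
        raise ValueError("max_percent must be between 0 and 100")
    return {
        "op": "lte",
        "args": [{"property": "eo:cloud_cover"}, max_percent],
    }


# Same 5% threshold as in the search above
cloud_filter = max_cloud_cover_filter(5)
print(cloud_filter)
```

The resulting dictionary could be passed as `filter=cloud_filter` together with `filter_lang='cql2-json'`. This sketch passes the single comparison on its own instead of wrapping it in an `and` clause, on the assumption that the STAC API accepts a bare comparison.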

### Open Zarr Dataset with xarray

We'll use xarray's `open_datatree()` to access the hierarchical EOPF Zarr structure directly from cloud storage.

In [None]:
# Get Zarr URL from STAC item
item_assets = item.get_assets(media_type=MediaType.ZARR)
zarr_url = item_assets['product'].href
print(f"🌐 Zarr URL: {zarr_url}")

# Open with xarray DataTree
dt = xr.open_datatree(
    zarr_url,
    engine="zarr",
    chunks="auto"  # dask derives chunk sizes from the stored Zarr chunks; use chunks={} for the exact on-disk chunking
)

print("\n📂 Available groups in DataTree:")
for group in sorted(dt.groups):
    if dt[group].ds.data_vars:
        print(f"  {group}: {list(dt[group].ds.data_vars.keys())}")

### Explore Sentinel-2 Band Structure

Sentinel-2 L2A provides bands at three spatial resolutions:
- **10m**: B02 (Blue), B03 (Green), B04 (Red), B08 (NIR)
- **20m**: B05, B06, B07, B8A, B11, B12
- **60m**: B01, B09 (B10, the cirrus band, is not included in L2A products)
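
To make the band layout concrete, here is a small lookup sketch that maps a band name to the group expected to hold it at native resolution. The `r10m` path appears in this notebook; the `r20m` and `r60m` paths are an assumption that the other groups follow the same naming pattern, and `reflectance_group` is a hypothetical helper.

```python
# Native resolution (metres) of each Sentinel-2 L2A band.
# B10 (cirrus) is left out because L2A products do not include it.
NATIVE_RESOLUTION_M = {
    "b02": 10, "b03": 10, "b04": 10, "b08": 10,
    "b05": 20, "b06": 20, "b07": 20, "b8a": 20, "b11": 20, "b12": 20,
    "b01": 60, "b09": 60,
}


def reflectance_group(band: str) -> str:
    """Return the assumed DataTree group path for a band at native resolution."""
    resolution = NATIVE_RESOLUTION_M[band.lower()]
    # Assumption: r20m and r60m are named like the r10m group used in this notebook
    return f"/measurements/reflectance/r{resolution}m"


print(reflectance_group("B04"))
print(reflectance_group("B12"))
```

Checking the printed DataTree groups from the previous cell is the reliable way to confirm these paths for a given product.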

Let's examine the 10m resolution group, which we'll use for RGB visualization.

In [None]:
# Access 10m resolution bands
ds_10m = dt['/measurements/reflectance/r10m'].to_dataset()

print("\n🔍 10m Resolution Dataset:")
print(f"Dimensions: {dict(ds_10m.sizes)}")
print(f"Bands: {list(ds_10m.data_vars.keys())}")
print(f"Coordinates: {list(ds_10m.coords.keys())}")

# Check chunking configuration
if 'b04' in ds_10m:
    chunks = ds_10m['b04'].chunks
    print(f"\n📦 Current chunk configuration: {chunks}")
    print(f"   Y-axis chunks: {chunks[0] if len(chunks) > 0 else 'N/A'}")
    print(f"   X-axis chunks: {chunks[1] if len(chunks) > 1 else 'N/A'}")

### Extract and Set CRS Information

EOPF Zarr stores CRS information in the root DataTree attributes under `other_metadata.horizontal_CRS_code`. We need to extract this and set it using rioxarray for rio-tiler compatibility.

In [None]:
# Extract CRS from EOPF metadata (following geozarr.py approach)
epsg_code_full = dt.attrs.get("other_metadata", {}).get("horizontal_CRS_code", "EPSG:4326")
epsg_code = epsg_code_full.split(":")[-1]  # Extract numeric part (e.g., "32632" from "EPSG:32632")

print(f"📍 Extracted CRS from EOPF metadata: EPSG:{epsg_code}")
print(f"   Full code: {epsg_code_full}")

# Set CRS on the dataset using rioxarray
ds_10m.rio.write_crs(f"epsg:{epsg_code}", inplace=True)

print("\n✅ CRS set successfully on dataset")
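
Because the extraction above falls back to `EPSG:4326` when the metadata key is missing, a quick plausibility check can help catch a silent fallback. The sketch below parses the code from a dictionary shaped like the EOPF root attributes and compares it with the standard UTM zone for a point in the area of interest. The attribute values are hypothetical, both helper names are illustrative, and the zone formula ignores the special zones around Norway and Svalbard.

```python
import math


def parse_epsg(attrs: dict, default: str = "EPSG:4326") -> int:
    """Read the numeric EPSG code from EOPF-style root attributes."""
    code = attrs.get("other_metadata", {}).get("horizontal_CRS_code", default)
    return int(str(code).split(":")[-1])


def utm_epsg_for(lon: float, lat: float) -> int:
    """EPSG code of the standard WGS 84 / UTM zone containing lon/lat."""
    zone = int(math.floor((lon + 180) / 6)) + 1
    return (32600 if lat >= 0 else 32700) + zone


# Hypothetical attributes shaped like the EOPF root metadata
attrs = {"other_metadata": {"horizontal_CRS_code": "EPSG:32633"}}
centre_lon, centre_lat = 14.35, 40.87  # roughly the centre of the Napoli AOI

print(parse_epsg(attrs))
print(utm_epsg_for(centre_lon, centre_lat))
```

A mismatch is not necessarily an error, since Sentinel-2 tiles near a zone boundary may be delivered in the neighbouring zone, but an unexpected `4326` for UTM-gridded data is a strong hint that the metadata lookup failed.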

### Verify Geospatial Metadata

With the CRS set, rioxarray exposes the CRS, bounds, and affine transform of the dataset. rio-tiler relies on all three to locate tiles, so it is worth checking them before going further.

In [None]:
# Verify CRS and geospatial metadata
crs = ds_10m.rio.crs
bounds = ds_10m.rio.bounds()
transform = ds_10m.rio.transform()

print("\n🌍 Geospatial Metadata:")
print(f"   CRS: {crs}")
print(f"   EPSG Code: {crs.to_epsg()}")
print(f"   Bounds (left, bottom, right, top): {bounds}")
print(f"   Transform: {transform}")
print(f"\n   Width: {ds_10m.sizes['x']} pixels")
print(f"   Height: {ds_10m.sizes['y']} pixels")
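
The bounds and the transform describe the same grid. As a rough sketch of how they relate for a north-up raster, the bounds can be derived from the six affine coefficients plus the raster size. The coefficient values below are hypothetical and only illustrate a 10 m grid of 10980 × 10980 pixels; `bounds_from_transform` is an illustrative helper, not a rioxarray function.

```python
def bounds_from_transform(transform, width, height):
    """Compute (left, bottom, right, top) for a north-up raster.

    transform holds the affine coefficients (a, b, c, d, e, f) where
    x = a * col + b * row + c and y = d * col + e * row + f.
    """
    a, b, c, d, e, f = transform
    if b != 0 or d != 0:
        raise ValueError("rotated grids are not handled in this sketch")
    left, top = c, f
    right = c + a * width    # a is the pixel width (positive)
    bottom = f + e * height  # e is the pixel height (negative for north-up)
    return (left, bottom, right, top)


# Hypothetical coefficients for a 10 m UTM grid
transform = (10.0, 0.0, 399960.0, 0.0, -10.0, 4600020.0)
print(bounds_from_transform(transform, width=10980, height=10980))
# (399960.0, 4490220.0, 509760.0, 4600020.0)
```

Comparing such a manual calculation with `ds_10m.rio.bounds()` is a quick way to confirm that the `x` and `y` coordinates describe the grid you expect.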

# Section 2: Rio-tiler Integration Basics

Now we'll integrate rio-tiler to generate map tiles from our Zarr dataset.

### Prepare DataArray for Rio-tiler

Rio-tiler's `XarrayReader` works with **DataArrays**, not Datasets. We need to stack our RGB bands into a single DataArray with a 'band' dimension.

In [None]:
# Verify dataset is ready
if ds_10m.rio.crs is None:
    raise ValueError("CRS not set! Check previous steps.")

# Stack RGB bands into a single DataArray for rio-tiler
# XarrayReader requires a DataArray, not a Dataset
rgb_bands = xr.concat(
    [ds_10m['b04'], ds_10m['b03'], ds_10m['b02']],  # Red, Green, Blue
    dim='band'
).assign_coords(band=['red', 'green', 'blue'])

# Preserve CRS information
rgb_bands = rgb_bands.rio.write_crs(ds_10m.rio.crs)

print("✅ DataArray prepared for rio-tiler")
print(f"   CRS: {rgb_bands.rio.crs}")
print(f"   Shape: {rgb_bands.shape} (band, y, x)")
print(f"   Bands: {list(rgb_bands.coords['band'].values)}")

### Generate True Color RGB Tile

We'll create a Web Mercator tile (zoom 12) showing true color composite (B04-Red, B03-Green, B02-Blue).

In [None]:
# Create RGB composite using XarrayReader with our stacked DataArray
with XarrayReader(rgb_bands) as src:
    # Get dataset info
    print(f"\n📊 Dataset Info:")
    print(f"   CRS: {src.crs}")
    print(f"   Bounds: {src.bounds}")
    print(f"   Available bands: {src.band_names}")
    
    # Get a tile at zoom level 12 for Napoli area
    # Data is rescaled for visualization
    tile = src.tms.tile(14.23, 40.83, 12)
    rgb_data = src.tile(tile.x, tile.y, 12, tilesize=512).rescale(((0, 0.4),))

print(f"\n✅ RGB tile generated:")
print(f"   CRS: {rgb_data.crs}")
print(f"   Bounds: {rgb_data.bounds}")
print(f"   Shape: {rgb_data.data.shape}")
print(f"   Data type: {rgb_data.data.dtype}")
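
The `src.tms.tile(lon, lat, zoom)` call above turns a longitude/latitude pair into tile indices. For the default Web Mercator grid, the conversion can be sketched with the standard XYZ ("slippy map") formula below. This is an independent illustration, not the code rio-tiler or morecantile runs internally, and `lonlat_to_tile` is a hypothetical helper name.

```python
import math


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple:
    """Return the (x, y) XYZ tile containing lon/lat in the Web Mercator grid."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the grid edge still map to a valid tile
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


# Same point and zoom level as in the cell above
print(lonlat_to_tile(14.23, 40.83, 12))
```

Each increase in zoom level doubles the number of tiles per axis, so the same point falls into a tile covering a quarter of the previous area.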

### Visualize True Color Composite

Let's display the RGB composite. The tile was already rescaled linearly from the 0-0.4 reflectance range to 0-255 in the previous cell, which gives the image usable contrast.

In [None]:
plt.figure(figsize=(10, 10))
plt.imshow(rgb_data.array.transpose(1, 2, 0))
plt.title('Sentinel-2 L2A True Color Composite (B04-B03-B02)', fontsize=14, fontweight='bold')
plt.xlabel('X-coordinate')
plt.ylabel('Y-coordinate')
plt.grid(False)
plt.tight_layout()
plt.show()
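
The `rescale(((0, 0.4),))` call used when generating the tile is a linear stretch: values are clipped to an input range and mapped onto 0-255. A scalar sketch of that idea follows. It is a simplified stand-in written for this tutorial; rio-tiler works on whole arrays and may round differently.

```python
def linear_rescale(value, in_range=(0.0, 0.4), out_range=(0, 255)):
    """Clip value to in_range, then map it linearly onto out_range."""
    in_min, in_max = in_range
    out_min, out_max = out_range
    clipped = min(max(value, in_min), in_max)
    scaled = (clipped - in_min) / (in_max - in_min)
    # Truncate to an integer, as a cast to an 8-bit type would
    return int(out_min + scaled * (out_max - out_min))


for reflectance in (0.0, 0.1, 0.4, 0.9):
    print(reflectance, "->", linear_rescale(reflectance))
```

Reflectance above 0.4 (bright clouds, snow, some roofs) saturates at 255. Lowering the upper bound brightens dark scenes at the cost of clipping more bright pixels.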

# Section 3: Understanding the Data Flow

Let's examine how Zarr chunks relate to tile generation and performance.

### Visualize Chunk Grid and Tile Requests

The key to understanding chunk performance is seeing **where** tiles land relative to chunk boundaries. Let's create a visualization showing the chunk grid overlaid on actual data.

In [None]:
# Create a smaller test dataset for visualization
# First create a subset, then rechunk it to show visible chunk grid

subset_size = 2048
ds_subset_original = ds_10m.isel(x=slice(0, subset_size), y=slice(0, subset_size))

print("📦 Original Subset:")
print(f"   Size: {subset_size}×{subset_size} pixels")
print(f"   Original chunks: {ds_subset_original['b04'].chunks}")

# Rechunk to a size that will create a visible grid (512×512)
ds_subset = ds_subset_original.chunk({'y': 512, 'x': 512})

print("\n📦 Rechunked Test Dataset:")
print(f"   Size: {subset_size}×{subset_size} pixels")
print(f"   New chunks: {ds_subset['b04'].chunks}")
print(f"   Bounds: {ds_subset.rio.bounds()}")

# Get chunk information
if ds_subset['b04'].chunks:
    chunk_y = ds_subset['b04'].chunks[0][0]
    chunk_x = ds_subset['b04'].chunks[1][0]
    n_chunks_y = len(ds_subset['b04'].chunks[0])
    n_chunks_x = len(ds_subset['b04'].chunks[1])
    # Use the actual item size of the array instead of assuming 2 bytes per pixel
    bytes_per_pixel = ds_subset['b04'].dtype.itemsize
    
    print("\n🧩 Chunk Structure:")
    print(f"   Chunk size: {chunk_y}×{chunk_x} pixels")
    print(f"   Number of chunks: {n_chunks_y}×{n_chunks_x} = {n_chunks_y * n_chunks_x} total")
    print(f"   Chunk size per band: ~{(chunk_y * chunk_x * bytes_per_pixel) / (1024**2):.2f} MB")
    
print("\nNote: We rechunked to 512×512 to create a visible grid for demonstration.")

In [None]:
from zarr_tiling_utils import visualize_chunks_and_tiles

# Visualize with test dataset (now properly rechunked to 512×512)
tile_info = visualize_chunks_and_tiles(ds_subset, tile_size=256, num_sample_tiles=4)
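
The relationship shown in the plot can also be worked out with integer arithmetic. The sketch below lists which chunks a square tile window overlaps, assuming the tile is given as a pixel offset in the dataset's own grid. Both function names are hypothetical and independent of the `zarr_tiling_utils` helper.

```python
def chunks_for_window(start: int, size: int, chunk: int) -> range:
    """Indices of the chunks along one axis overlapped by [start, start + size)."""
    first = start // chunk
    last = (start + size - 1) // chunk
    return range(first, last + 1)


def chunks_for_tile(col: int, row: int, tile_size: int, chunk: int) -> list:
    """(chunk_row, chunk_col) pairs that a square tile window must read."""
    return [
        (cy, cx)
        for cy in chunks_for_window(row, tile_size, chunk)
        for cx in chunks_for_window(col, tile_size, chunk)
    ]


# 256-pixel tiles on the 512-pixel chunk grid used above
print(chunks_for_tile(0, 0, 256, 512))      # tile sits inside a single chunk
print(chunks_for_tile(384, 384, 256, 512))  # tile straddles a chunk corner
```

A tile that crosses a chunk corner needs four chunks even though it covers only a quarter of one chunk's area, which is the effect the visualization highlights.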

### Compare Chunking Strategies

Now let's create datasets with different chunking strategies and see how they affect tile access patterns. For simplicity, we assume the requested tiles are expressed in the dataset's own CRS, so tile edges run parallel to chunk edges.

We'll test the following scenarios:
1. **Aligned**: Chunks match tile size (256x256)
2. **Larger**: Chunks much larger than tiles (1024x1024)
3. **Misaligned**: Chunks don't align with tiles (300x300)
4. **Misaligned Large**: Large chunks that don't align with tiles (700x700)
5. **Smaller**: Chunks smaller than tiles (128x128)
6. **Misaligned Small**: Very small chunks that don't align with tiles (100x100)

In [None]:
# Create six versions of the subset with different chunking strategies
strategies = {
    'Aligned (256x256)': {'chunks': {'y': 256, 'x': 256}},
    'Larger (1024x1024)': {'chunks': {'y': 1024, 'x': 1024}},
    'Misaligned (300x300)': {'chunks': {'y': 300, 'x': 300}},
    'Misaligned Large (700x700)': {'chunks': {'y': 700, 'x': 700}},
    'Smaller (128x128)': {'chunks': {'y': 128, 'x': 128}},
    'Misaligned Small (100x100)': {'chunks': {'y': 100, 'x': 100}}
}

# Rechunk datasets in memory
ds_variants = {}
for name, config in strategies.items():
    ds_rechunked = ds_subset.chunk(config['chunks'])
    ds_variants[name] = ds_rechunked
    
    chunk_info = ds_rechunked['b04'].chunks
    print(f"{name}:")
    print(f"  Chunks: {chunk_info[0][0]}x{chunk_info[1][0]}")
    print(f"  Number of chunks: {len(chunk_info[0])}×{len(chunk_info[1])} = {len(chunk_info[0]) * len(chunk_info[1])}")
    print()
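
When the chunk size does not divide the array size, the last chunk along each axis is shorter. The loop above prints only the first chunk length, so the remainder is easy to miss. The following sketch computes the full layout along one axis for the 2048-pixel subset; `chunk_layout` is an illustrative helper that mirrors how uniform chunk sizes are normally split, not a dask or xarray function.

```python
def chunk_layout(dim_size: int, chunk: int) -> tuple:
    """Chunk lengths along one axis, with a shorter final chunk if needed."""
    full, remainder = divmod(dim_size, chunk)
    return (chunk,) * full + ((remainder,) if remainder else ())


for chunk in (256, 1024, 300, 700, 128, 100):
    layout = chunk_layout(2048, chunk)
    print(f"{chunk:>4}: {len(layout)} chunks per axis, last chunk {layout[-1]} px")
```

The result can be compared with the first element of `ds_rechunked['b04'].chunks` for each strategy.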

### Visualize Chunking Strategy Comparison

Let's visualize how a single tile request maps to chunks in each strategy:

In [None]:
from zarr_tiling_utils import compare_chunking_strategies

results = compare_chunking_strategies(ds_variants, tile_size=256, tile_x=512, tile_y=512)
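
To put rough numbers on the comparison, the sketch below estimates, for each strategy, how many chunks one 256-pixel tile touches and how many pixels must be read relative to the pixels actually needed. It assumes that `tile_x=512` and `tile_y=512` are pixel offsets into the 2048-pixel subset, which is a guess about the helper's interface, and it ignores compression and network latency.

```python
def read_amplification(tile_start, tile_size, chunk, dim_size=2048):
    """Return (chunks_read, pixels_read / pixels_needed) for a square tile.

    The tile is assumed to start at the same offset on both axes.
    """
    first = tile_start // chunk
    last = (tile_start + tile_size - 1) // chunk
    # Length of every chunk touched along one axis (the last chunk may be short)
    lengths = [min(chunk, dim_size - i * chunk) for i in range(first, last + 1)]
    pixels_per_axis = sum(lengths)
    return len(lengths) ** 2, pixels_per_axis ** 2 / tile_size ** 2


for chunk in (256, 1024, 300, 700, 128, 100):
    n_chunks, ratio = read_amplification(512, 256, chunk)
    print(f"{chunk:>4} px chunks: {n_chunks} chunk(s) read, {ratio:.2f}x the pixels needed")
```

Under these assumptions, the aligned 256 and 128 pixel layouts read exactly the pixels needed, but the 128 pixel layout needs four requests instead of one. The 700 pixel layout reads roughly thirty times more data than the tile contains.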

## Conclusion

In this notebook, we've learned:

1. ✅ How to **extract CRS from EOPF metadata** and set it for rioxarray compatibility
2. ✅ How to **prepare DataArrays for rio-tiler** by stacking bands with `xr.concat()`
3. ✅ How to **generate true color RGB map tiles** from Sentinel-2 data
4. ✅ How to **visualize spatial relationships** between Zarr chunks and tile requests
5. ✅ How to **compare chunking strategies** and their impact on tile access patterns
6. ✅ When to rechunk for better tiling performance

### Key Takeaways

- **Spatial alignment is everything**: Tile positions relative to chunk boundaries determine efficiency
- **1 chunk per tile is optimal**: Match chunk size to tile size when possible (e.g., 256×256)
- **2-4 chunks is acceptable**: Provides good balance for multi-zoom support
- **EOPF's default chunking**: Optimized for bulk processing (~4096px), not web tiling
- **Rechunking trade-offs**: Smaller chunks = more chunks to manage but better tile efficiency
- **CRS matters**: Data CRS (UTM) vs tile CRS (Web Mercator) affects spatial queries
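
One way to relate the Web Mercator tile grid to the 10 m native grid is to estimate the ground resolution of a tile at a given zoom level. The sketch below uses the usual spherical Web Mercator approximation; `ground_resolution` is an illustrative helper, and the latitude is that of the point used for the tile request earlier.

```python
import math

EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137.0  # WGS 84 equatorial radius


def ground_resolution(zoom: int, lat: float = 0.0, tilesize: int = 256) -> float:
    """Approximate metres per pixel of a Web Mercator tile at a given latitude."""
    pixels_around_equator = tilesize * 2 ** zoom
    return EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat)) / pixels_around_equator


for zoom in range(10, 15):
    resolution = ground_resolution(zoom, lat=40.83, tilesize=512)
    print(f"zoom {zoom}: ~{resolution:.1f} m per pixel")
```

At this latitude a 512-pixel tile at zoom 12 works out at roughly 14 m per pixel, close to the 10 m native grid. Lower zoom levels therefore read many native pixels per output pixel, which is where chunk layout and overviews start to matter.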

### Important Patterns for EOPF Data

```python
# 1. Extract CRS from EOPF metadata
epsg_code = dt.attrs["other_metadata"]["horizontal_CRS_code"].split(":")[-1]
ds = ds.rio.write_crs(f"epsg:{epsg_code}")

# 2. Stack bands for rio-tiler
stacked_bands = xr.concat([ds['b04'], ds['b03'], ds['b02']], dim='band')
stacked_bands = stacked_bands.rio.write_crs(ds.rio.crs)

# 3. Rechunk for tiling (if needed)
ds_tiling = ds.chunk({'y': 256, 'x': 256})

# 4. Use with XarrayReader
with XarrayReader(stacked_bands) as src:
    tile = src.tile(x, y, z, tilesize=256)
```

**Completed in approximately 15-20 minutes ⏱️**

## What's Next?

In the next notebooks, we'll dive deeper into:

- **Notebook 2**: [Chunking Strategy Optimization with Sentinel-1 SAR](42_rio_tiler_s1_chunking.ipynb) - Systematic benchmarking of different chunk sizes
- **Notebook 3**: [Projections and TMS with Sentinel-3 OLCI](43_rio_tiler_s3_projections.ipynb) - Optimizing spatial reference systems for global datasets

### 💪 Try It Yourself

**Challenge 1**: Create a custom band combination (e.g., SWIR-NIR-Red for B12-B08-B04)

**Challenge 2**: Compare tile generation performance for different zoom levels

**Challenge 3**: Experiment with different chunk sizes by rechunking the dataset