<a href="https://jupyterhub.user.eopf.eodc.eu/hub/user-redirect/git-pull?repo=https://github.com/eopf-toolkit/eopf-101&branch=main&urlpath=lab/tree/eopf-101/41_rio_tiler_s2_fundamentals.ipynb" target="_blank">
  <button style="background-color:#0072ce; color:white; padding:0.6em 1.2em; font-size:1rem; border:none; border-radius:6px; margin-top:1em;">
    🚀 Launch this notebook in JupyterLab
  </button>
</a>

## Introduction

This notebook demonstrates efficient tiling workflows with EOPF Zarr data using **rio-tiler** and **rioxarray**. We'll show how reading Zarr directly, with chunking matched to the access pattern, affects performance for web mapping and visualization tasks.

**Rio-tiler** is a powerful Python library designed for creating map tiles from raster data sources. Combined with EOPF Zarr's cloud-optimized format, it enables efficient tile generation for web mapping applications without downloading entire datasets.

## What we will learn

- 🗺️ How to integrate rio-tiler with EOPF Zarr datasets
- 🎨 Generate map tiles (RGB and false color composites) from Sentinel-2 data
- 📊 Understand the relationship between Zarr chunks and tile performance
- ⚡ Observe memory usage patterns for large optical datasets
- 🌍 Create interactive web map visualizations

## Prerequisites

This tutorial builds on concepts from previous sections:
- [Understanding Zarr Structure](24_zarr_struct_S2L2A.ipynb) - Sentinel-2 data organization
- [STAC and xarray Tutorial](44_eopf_stac_xarray_tutorial.ipynb) - Accessing EOPF data
- [Zarr Chunking Strategies](sections/2x_about_eopf_zarr/253_zarr_chunking_practical.ipynb) - Chunking fundamentals

**Required packages**: `rio-tiler`, `rioxarray`, `xarray`, `zarr`, `pystac-client`, `matplotlib`, `psutil`

<hr>

# Section 1: Direct Zarr Access Setup

We'll start by connecting to the EOPF STAC catalog and loading a Sentinel-2 L2A dataset with its native Zarr chunking configuration.

### Import libraries

In [1]:
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
from pystac_client import Client
from pystac import MediaType
from rio_tiler.io import XarrayReader
from rio_tiler.models import ImageData
import rioxarray
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

print("✅ Libraries imported successfully")

✅ Libraries imported successfully


### Connect to EOPF STAC Catalog

We'll search for a Sentinel-2 L2A scene over a test region (Napoli). The query below does not filter on cloud cover, so check the returned scene before relying on it.

In [2]:
# Connect to EOPF STAC API
eopf_stac_api_root = "https://stac.core.eopf.eodc.eu/"
catalog = Client.open(url=eopf_stac_api_root)

# Search for Sentinel-2 L2A over Napoli during summer 2025
search_results = catalog.search(
    collections='sentinel-2-l2a',
    bbox=(14.268124, 40.835933, 14.433823, 40.898202),  # Napoli AOI
    datetime='2025-07-01T00:00:00Z/2025-08-31T23:59:59Z',  # Summer 2025
    max_items=1
)

# Get first item
items = list(search_results.items())
if not items:
    raise ValueError("No items found. Try adjusting the search parameters.")

item = items[0]
print(f"📦 Found item: {item.id}")
print(f"📅 Acquisition date: {item.properties.get('datetime', 'N/A')}")

📦 Found item: S2A_MSIL2A_20250829T100041_N0511_R122_T33TVF_20250829T121701
📅 Acquisition date: 2025-08-29T10:00:41.024000Z


### Open Zarr Dataset with xarray

We'll use xarray's `open_datatree()` to access the hierarchical EOPF Zarr structure directly from cloud storage.

In [3]:
# Get Zarr URL from STAC item
item_assets = item.get_assets(media_type=MediaType.ZARR)
zarr_url = item_assets['product'].href
print(f"🌐 Zarr URL: {zarr_url}")

# Open with xarray DataTree
dt = xr.open_datatree(
    zarr_url,
    engine="zarr",
    chunks="auto"  # Let dask choose chunk sizes; pass chunks={} to keep the native Zarr chunks
)

print("\n📂 Available groups in DataTree:")
for group in sorted(dt.groups):
    if dt[group].ds.data_vars:
        print(f"  {group}: {list(dt[group].ds.data_vars.keys())}")

🌐 Zarr URL: https://objects.eodc.eu:443/e05ab01a9d56408d82ac32d69a5aae2a:202508-s02msil2a/29/products/cpm_v256/S2A_MSIL2A_20250829T100041_N0511_R122_T33TVF_20250829T121701.zarr

📂 Available groups in DataTree:
  /conditions/geometry: ['mean_sun_angles', 'mean_viewing_incidence_angles', 'sun_angles', 'viewing_incidence_angles']
  /conditions/mask/detector_footprint/r10m: ['b02', 'b03', 'b04', 'b08']
  /conditions/mask/detector_footprint/r20m: ['b05', 'b06', 'b07', 'b11', 'b12', 'b8a']
  /conditions/mask/detector_footprint/r60m: ['b01', 'b09', 'b10']
  /conditions/mask/l1c_classification/r60m: ['b00']
  /conditions/mask/l2a_classification/r20m: ['scl']
  /conditions/mask/l2a_classification/r60m: ['scl']
  /conditions/meteorology/cams: ['aod1240', 'aod469', 'aod550', 'aod670', 'aod865', 'bcaod550', 'duaod550', 'omaod550', 'ssaod550', 'suaod550', 'z']
  /conditions/meteorology/ecmwf: ['msl', 'r', 'tco3', 'tcwv', 'u10', 'v10']
  /measurements/reflectance/r10m: ['b02', 'b03', 'b04', 'b

### Explore Sentinel-2 Band Structure

Sentinel-2 L2A provides bands at three spatial resolutions:
- **10m**: B02 (Blue), B03 (Green), B04 (Red), B08 (NIR)
- **20m**: B05, B06, B07, B8A, B11, B12
- **60m**: B01, B09, B10

Let's examine the 10m resolution group, which we'll use for RGB visualization.

In [4]:
# Access 10m resolution bands
ds_10m = dt['/measurements/reflectance/r10m'].to_dataset()

print("\n🔍 10m Resolution Dataset:")
print(f"Dimensions: {dict(ds_10m.sizes)}")
print(f"Bands: {list(ds_10m.data_vars.keys())}")
print(f"Coordinates: {list(ds_10m.coords.keys())}")

# Check chunking configuration
if 'b04' in ds_10m:
    chunks = ds_10m['b04'].chunks
    print(f"\n📦 Current chunk configuration: {chunks}")
    print(f"   Y-axis chunks: {chunks[0] if len(chunks) > 0 else 'N/A'}")
    print(f"   X-axis chunks: {chunks[1] if len(chunks) > 1 else 'N/A'}")


🔍 10m Resolution Dataset:
Dimensions: {'y': 10980, 'x': 10980}
Bands: ['b02', 'b03', 'b04', 'b08']
Coordinates: ['x', 'y']

📦 Current chunk configuration: ((4096, 4096, 2788), (4096, 4096, 2788))
   Y-axis chunks: (4096, 4096, 2788)
   X-axis chunks: (4096, 4096, 2788)


### Extract and Set CRS Information

EOPF Zarr stores CRS information in the root DataTree attributes under `other_metadata.horizontal_CRS_code`. We need to extract this and set it using rioxarray for rio-tiler compatibility.

In [5]:
# Extract CRS from EOPF metadata (following geozarr.py approach)
epsg_code_full = dt.attrs.get("other_metadata", {}).get("horizontal_CRS_code", "EPSG:4326")
epsg_code = epsg_code_full.split(":")[-1]  # Extract numeric part (e.g., "32632" from "EPSG:32632")

print(f"📍 Extracted CRS from EOPF metadata: EPSG:{epsg_code}")
print(f"   Full code: {epsg_code_full}")

# Set CRS on the dataset using rioxarray
ds_10m = ds_10m.rio.write_crs(f"epsg:{epsg_code}")

print("\n✅ CRS set successfully on dataset")

📍 Extracted CRS from EOPF metadata: EPSG:32633
   Full code: EPSG:32633

✅ CRS set successfully on dataset
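
The cell above falls back to `"EPSG:4326"` when the metadata key is missing, which would silently label projected UTM coordinates as degrees. A minimal sketch of a stricter parser follows; `parse_epsg` is a hypothetical helper (not part of rioxarray or the EOPF tooling), and it assumes the attribute layout read in the cell above.

```python
import re

def parse_epsg(attrs: dict) -> int:
    """Return the numeric EPSG code stored in EOPF-style root attributes.

    Hypothetical helper: assumes the layout used above, i.e.
    attrs["other_metadata"]["horizontal_CRS_code"] == "EPSG:<digits>".
    """
    code = attrs.get("other_metadata", {}).get("horizontal_CRS_code")
    if code is None:
        # Fail loudly instead of guessing a CRS
        raise KeyError("horizontal_CRS_code not found in other_metadata")
    match = re.fullmatch(r"epsg:(\d+)", str(code).strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Unrecognised CRS code: {code!r}")
    return int(match.group(1))

# Stand-in for dt.attrs, using the value printed above
example_attrs = {"other_metadata": {"horizontal_CRS_code": "EPSG:32633"}}
print(parse_epsg(example_attrs))
```

The returned integer can be passed to `rio.write_crs` in the same way as the string built in the cell above.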


In [6]:
# Verify CRS and geospatial metadata
crs = ds_10m.rio.crs
bounds = ds_10m.rio.bounds()
transform = ds_10m.rio.transform()

print(f"\n🌍 Geospatial Metadata:")
print(f"   CRS: {crs}")
print(f"   EPSG Code: {crs.to_epsg()}")
print(f"   Bounds (left, bottom, right, top): {bounds}")
print(f"   Transform: {transform}")
print(f"\n   Width: {ds_10m.sizes['x']} pixels")
print(f"   Height: {ds_10m.sizes['y']} pixels")


🌍 Geospatial Metadata:
   CRS: EPSG:32633
   EPSG Code: 32633
   Bounds (left, bottom, right, top): (399960.0, 4490220.0, 509760.0, 4600020.0)
   Transform: | 10.00, 0.00, 399960.00|
| 0.00,-10.00, 4600020.00|
| 0.00, 0.00, 1.00|

   Width: 10980 pixels
   Height: 10980 pixels


### Verify Geospatial Metadata

Now we can access CRS, bounds, and transform information through rioxarray. This is essential for rio-tiler integration.
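
To make the transform concrete, here is a small plain-Python sketch of the pixel-to-coordinate mapping it encodes. The six coefficients are copied from the output above; the function names are illustrative only, and the inverse assumes a north-up grid without rotation.

```python
# Affine coefficients copied from the transform printed above:
#   x = A * col + B * row + C
#   y = D * col + E * row + F
A, B, C = 10.0, 0.0, 399960.0
D, E, F = 0.0, -10.0, 4600020.0

def pixel_to_xy(col: float, row: float) -> tuple[float, float]:
    """Map a pixel corner (col, row) to projected coordinates (x, y)."""
    return (A * col + B * row + C, D * col + E * row + F)

def xy_to_pixel(x: float, y: float) -> tuple[int, int]:
    """Map projected coordinates to the containing pixel (north-up grids only)."""
    return (int((x - C) // A), int((F - y) // abs(E)))

print(pixel_to_xy(0, 0))          # upper-left corner of the grid
print(pixel_to_xy(10980, 10980))  # lower-right corner of the grid
```

The two corners reproduce the bounds reported by `ds_10m.rio.bounds()`: left and top come from pixel (0, 0), right and bottom from pixel (10980, 10980).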

# Section 2: Rio-tiler Integration Basics

Now we'll integrate rio-tiler to generate map tiles from our Zarr dataset.

### Setup XarrayReader for Multispectral Data

Rio-tiler's `XarrayReader` allows us to treat xarray datasets as tile sources.

In [7]:
# Verify dataset is ready for rio-tiler
# CRS should already be set from previous step
if ds_10m.rio.crs is None:
    raise ValueError("CRS not set! Check previous steps.")

print("✅ Dataset prepared for rio-tiler")
print(f"   CRS: {ds_10m.rio.crs}")
print(f"   Available bands: {list(ds_10m.data_vars.keys())}")

✅ Dataset prepared for rio-tiler
   CRS: EPSG:32633
   Available bands: ['b02', 'b03', 'b04', 'b08']


### Generate True Color RGB Tile

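rio-tiler can serve Web Mercator tiles (for example at zoom 12). Here we use `part()`, which reads a bounding box rather than an x/y/z tile, to build a true color composite (B04-Red, B03-Green, B02-Blue).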

In [8]:
# Stack the RGB bands into a single DataArray (band, y, x):
# XarrayReader expects a DataArray, not a Dataset
rgb_da = ds_10m[['b04', 'b03', 'b02']].to_array(dim='band', name='reflectance')
rgb_da = rgb_da.rio.write_crs(ds_10m.rio.crs)

with XarrayReader(rgb_da) as src:
    # Get dataset info
    print(f"\n📊 Dataset Info:")
    print(f"   CRS: {src.crs}")
    print(f"   Bounds: {src.bounds}")
    print(f"   Available bands: {list(ds_10m.data_vars.keys())}")
    
    # part() reads a bounding box; bounds_crs defaults to WGS84 (lon/lat)
    rgb_data = src.part(
        bbox=(14.219686024247595, 40.81406906961218, 14.322682850419474, 40.866016421491814),
        max_size=1024,  # Maximum output dimension (needs a recent rio-tiler release)
    )

print(f"\n✅ RGB tile generated:")
print(f"   Shape: {rgb_data.data.shape}")
print(f"   Data type: {rgb_data.data.dtype}")


📊 Dataset Info:
   CRS: EPSG:32633
   Bounds: (399960.0, 4490220.0, 509760.0, 4600020.0)
   Available bands: ['b02', 'b03', 'b04', 'b08']
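
For reference, a Web Mercator (XYZ) tile is addressed by column `x`, row `y` and zoom `z`. The sketch below uses the standard slippy-map formulas to convert a longitude/latitude into those indices and back to the tile's geographic bounds. The helper names are illustrative and not part of rio-tiler; the sample point is the approximate centre of the bounding box used above.

```python
import math

def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) index of the XYZ tile containing a lon/lat point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

def tile_bounds(x: int, y: int, zoom: int) -> tuple[float, float, float, float]:
    """Return (west, south, east, north) of an XYZ tile in degrees."""
    n = 2 ** zoom

    def lat_of(row: int) -> float:
        # Latitude of the northern edge of a given tile row
        return math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * row / n))))

    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    return west, lat_of(y + 1), east, lat_of(y)

lon, lat = 14.2712, 40.8400  # approximate centre of the bounding box read above
x, y = lonlat_to_tile(lon, lat, zoom=12)
west, south, east, north = tile_bounds(x, y, 12)
print(f"zoom 12 tile: x={x}, y={y}")
print(f"covers lon {west:.4f}..{east:.4f}, lat {south:.4f}..{north:.4f}")
```

rio-tiler readers such as `XarrayReader` expose a `tile()` method that takes these three indices, so the same composite could be requested per tile instead of per bounding box.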

### Visualize True Color Composite

Let's visualize the RGB composite with histogram stretching for better contrast.

In [None]:
# Apply simple contrast stretch (2% linear stretch)
def stretch_rgb(data, lower_percentile=2, upper_percentile=98):
    """Apply percentile stretch to RGB data."""
    stretched = np.zeros_like(data, dtype=np.uint8)
    for i in range(3):
        band = data[i]
        p_low, p_high = np.percentile(band[band > 0], [lower_percentile, upper_percentile])
        band_stretched = np.clip((band - p_low) / (p_high - p_low) * 255, 0, 255)
        stretched[i] = band_stretched.astype(np.uint8)
    return stretched

# Stretch and visualize
rgb_stretched = stretch_rgb(rgb_data.data)

plt.figure(figsize=(12, 10))
plt.imshow(np.transpose(rgb_stretched, (1, 2, 0)))
plt.title('Sentinel-2 L2A True Color Composite (B04-B03-B02)', fontsize=14, fontweight='bold')
plt.xlabel('X-coordinate')
plt.ylabel('Y-coordinate')
plt.grid(False)
plt.tight_layout()
plt.show()

### Generate False Color Composite for Vegetation

False color composite using NIR-Red-Green (B08-B04-B03) highlights vegetation in red tones.

In [None]:
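# Generate false color composite (NIR-Red-Green)
# XarrayReader expects a DataArray, so stack the three bands first
fc_da = ds_10m[['b08', 'b04', 'b03']].to_array(dim='band', name='reflectance')
fc_da = fc_da.rio.write_crs(ds_10m.rio.crs)

with XarrayReader(fc_da) as src:
    false_color_data = src.part(
        bbox=src.bounds,      # full scene extent
        bounds_crs=src.crs,   # bounds are already in the dataset CRS
        max_size=1024,
    )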

# Stretch and visualize
false_color_stretched = stretch_rgb(false_color_data.data)

plt.figure(figsize=(12, 10))
plt.imshow(np.transpose(false_color_stretched, (1, 2, 0)))
plt.title('False Color Composite for Vegetation (B08-B04-B03)', fontsize=14, fontweight='bold')
plt.xlabel('X-coordinate')
plt.ylabel('Y-coordinate')
plt.grid(False)
plt.tight_layout()
plt.show()

print("\n🌿 In this visualization:")
print("   - Bright red areas = Dense, healthy vegetation")
print("   - Dark red/brown = Sparse vegetation or bare soil")
print("   - Blue tones = Urban areas, water bodies")

# Section 3: Understanding the Data Flow

Let's examine how Zarr chunks relate to tile generation and performance.

### Chunk to Tile Relationship

Understanding how tiles map to Zarr chunks is crucial for optimization.

In [None]:
# Analyze chunk configuration
band_b04 = ds_10m['b04']

print("\n📦 Zarr Chunk Analysis:")
print(f"   Full array shape: {band_b04.shape}")
print(f"   Chunk shape: {band_b04.chunks}")
print(f"   Number of chunks: {band_b04.data.npartitions if hasattr(band_b04.data, 'npartitions') else 'N/A'}")
print(f"   Data type: {band_b04.dtype}")

# Calculate chunk size in MB
if band_b04.chunks:
    chunk_y, chunk_x = band_b04.chunks[0][0], band_b04.chunks[1][0]
    chunk_size_mb = (chunk_y * chunk_x * band_b04.dtype.itemsize) / (1024 * 1024)
    print(f"   Single chunk size: ~{chunk_size_mb:.2f} MB")
    
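    # Estimate chunks needed for a 256x256 pixel window read at native resolution.
    # This is a best case: it assumes the window is aligned to the chunk grid and
    # ignores reprojection. A Web Mercator tile at zoom 12 covers more source pixels.
    tile_size = 256
    chunks_per_tile = np.ceil(tile_size / chunk_x) * np.ceil(tile_size / chunk_y)
    print(f"\n🗺️ For a {tile_size}x{tile_size} tile:")
    print(f"   Estimated chunks accessed: ~{int(chunks_per_tile)}")
    print(f"   Uncompressed data read: ~{chunk_size_mb * chunks_per_tile:.2f} MB")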
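
The estimate above assumes the window sits neatly inside one chunk. As a rough sketch of the general case, the following plain-Python helpers count how many chunks an arbitrary pixel window intersects, given the per-axis chunk tuples that xarray reports. The function names are hypothetical and the chunk sizes are the ones printed earlier in this notebook.

```python
from bisect import bisect_right
from itertools import accumulate

def chunk_span(start: int, stop: int, chunk_sizes: tuple[int, ...]) -> range:
    """Indices of the chunks along one axis overlapped by pixels [start, stop)."""
    edges = list(accumulate(chunk_sizes))  # cumulative end offset of each chunk
    first = bisect_right(edges, start)     # chunk holding the first pixel
    last = bisect_right(edges, stop - 1)   # chunk holding the last pixel
    return range(first, last + 1)

def chunks_touched(window, y_chunks, x_chunks) -> int:
    """Count chunks hit by window = (row_start, row_stop, col_start, col_stop)."""
    r0, r1, c0, c1 = window
    return len(chunk_span(r0, r1, y_chunks)) * len(chunk_span(c0, c1, x_chunks))

dask_chunks = (4096, 4096, 2788)        # per-axis chunks printed earlier
aligned = (0, 256, 0, 256)              # window inside a single chunk
straddling = (4000, 4256, 4000, 4256)   # window crossing a boundary on both axes

print(chunks_touched(aligned, dask_chunks, dask_chunks))
print(chunks_touched(straddling, dask_chunks, dask_chunks))
```

A window of the same size reads one chunk when it falls inside a chunk and four when it crosses a boundary on both axes, which is why alignment matters as much as chunk size.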

### Memory Usage Patterns

Let's observe memory usage when accessing data at different scales.

In [None]:
import psutil
import os

def get_memory_usage():
    """Get current process memory usage in MB."""
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / (1024 * 1024)

# Test different tile sizes
tile_sizes = [256, 512, 1024]
memory_usage = []

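print("\n💾 Memory Usage by Tile Size:")

# XarrayReader expects a DataArray, so stack the RGB bands first (lazy, no data read yet)
rgb_stack = ds_10m[['b04', 'b03', 'b02']].to_array(dim='band', name='reflectance')
rgb_stack = rgb_stack.rio.write_crs(ds_10m.rio.crs)

for size in tile_sizes:
    mem_before = get_memory_usage()
    
    with XarrayReader(rgb_stack) as src:
        tile = src.part(
            bbox=src.bounds,
            bounds_crs=src.crs,  # bounds are already in the dataset CRS
            max_size=size,
        )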
    
    mem_after = get_memory_usage()
    mem_delta = mem_after - mem_before
    memory_usage.append(mem_delta)
    
    print(f"   {size}x{size} tile: {mem_delta:>6.2f} MB")

# Visualize memory usage
plt.figure(figsize=(10, 6))
plt.bar([str(s) for s in tile_sizes], memory_usage, color='steelblue', alpha=0.7)
plt.xlabel('Tile Size (pixels)', fontsize=12)
plt.ylabel('Memory Delta (MB)', fontsize=12)
plt.title('Memory Usage by Tile Size', fontsize=14, fontweight='bold')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
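
Process-level memory deltas include allocator and library overhead, so they can be noisy. A useful lower bound is the raw size of the output array: bands × height × width × bytes per sample. The sketch below assumes 4-byte samples (for example `float32`); that is an assumption, so check the `dtype` printed for your data.

```python
def array_megabytes(bands: int, height: int, width: int, itemsize: int) -> float:
    """Raw in-memory size of a (bands, height, width) array in MB."""
    return bands * height * width * itemsize / (1024 * 1024)

# itemsize=4 is an assumption (e.g. float32); adjust to the actual dtype
for size in (256, 512, 1024):
    mb = array_megabytes(bands=3, height=size, width=size, itemsize=4)
    print(f"{size}x{size}: {mb:.2f} MB")
```

Under that assumption the three sizes come to 0.75 MB, 3 MB and 12 MB: each doubling of the tile edge quadruples the array, because memory follows the pixel count rather than the edge length.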

### Handling Multi-Resolution Bands

Sentinel-2 has bands at different resolutions. Let's compare performance across resolutions.

In [None]:
import time

# Load different resolution groups and set CRS
resolutions = {
    '10m': dt['/measurements/reflectance/r10m'].to_dataset(),
    '20m': dt['/measurements/reflectance/r20m'].to_dataset(),
    '60m': dt['/measurements/reflectance/r60m'].to_dataset()
}

# Set CRS on all resolution datasets
for res_name, ds_res in resolutions.items():
    resolutions[res_name] = ds_res.rio.write_crs(f"epsg:{epsg_code}")

print("\n⚡ Performance by Resolution:")
print(f"{'Resolution':<12} {'Dimensions':<20} {'Chunk Size':<20} {'Read Time (s)'}")
print("=" * 75)

for res_name, ds_res in resolutions.items():
    # Get first band for testing
    first_band = list(ds_res.data_vars.keys())[0]
    band_data = ds_res[first_band]
    
    # Time a small read operation
    start = time.time()
    _ = band_data.isel(x=slice(0, 500), y=slice(0, 500)).values
    read_time = time.time() - start
    
    dims = f"{band_data.shape[0]}x{band_data.shape[1]}"
    chunks = f"{band_data.chunks[0][0]}x{band_data.chunks[1][0]}" if band_data.chunks else "N/A"
    
    print(f"{res_name:<12} {dims:<20} {chunks:<20} {read_time:.3f}")

print("\n💡 Key Insights:")
print("   - Larger chunks = fewer HTTP requests but more data transfer")
print("   - Chunk size should align with typical access patterns (tiles)")
print("   - EOPF Zarr uses optimized chunks for efficient access")
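
When comparing the timings, keep in mind that the same 500×500 pixel window covers a different ground area at each resolution. A short sketch of the arithmetic, using the scene bounds printed earlier and assuming all three resolution groups share that extent:

```python
# Scene bounds (left, bottom, right, top) in metres, as printed earlier
left, bottom, right, top = 399960.0, 4490220.0, 509760.0, 4600020.0

def grid_shape(pixel_size_m: float) -> tuple[int, int]:
    """Return (height, width) in pixels for the scene extent at a pixel size."""
    width = round((right - left) / pixel_size_m)
    height = round((top - bottom) / pixel_size_m)
    return height, width

window = 500  # pixels per side, as in the timing loop above
for pixel_size in (10, 20, 60):
    height, width = grid_shape(pixel_size)
    span_km = window * pixel_size / 1000
    print(f"{pixel_size} m: {height} x {width} px, {window}-px window spans {span_km:.0f} km")
```

The 60 m read therefore covers a ground area 36 times larger than the 10 m read for the same number of pixels, so the timings compare equal pixel counts, not equal areas.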

### Performance Summary

Let's visualize the chunk-to-tile alignment.

In [None]:
# Create visualization of chunk vs tile coverage
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Optimal alignment (chunk size = tile size)
ax1 = axes[0]
tile_grid = np.zeros((8, 8))
tile_grid[2:6, 2:6] = 1  # Tile coverage
ax1.imshow(tile_grid, cmap='RdYlGn', alpha=0.7)
ax1.set_title('Optimal: 1 Chunk per Tile', fontsize=12, fontweight='bold')
ax1.set_xlabel('Chunks aligned with tiles')
ax1.grid(False)

# Suboptimal alignment (mismatched sizes)
ax2 = axes[1]
suboptimal_grid = np.zeros((8, 8))
suboptimal_grid[1:7, 1:7] = 0.5  # Multiple chunks needed
suboptimal_grid[2:6, 2:6] = 1  # Tile coverage
ax2.imshow(suboptimal_grid, cmap='RdYlGn', alpha=0.7)
ax2.set_title('Suboptimal: Multiple Chunks per Tile', fontsize=12, fontweight='bold')
ax2.set_xlabel('Chunks not aligned with tiles')
ax2.grid(False)

plt.tight_layout()
plt.show()

print("\n🎯 Best Practices:")
print("   ✓ Match chunk size to tile size (e.g., 256x256, 512x512)")
print("   ✓ Use consolidated metadata (.zmetadata) to reduce requests")
print("   ✓ Consider zoom levels when choosing chunk sizes")
print("   ✓ Align chunks to power-of-2 boundaries for web mercator")
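
To relate zoom levels to source resolution, the sketch below computes the approximate ground resolution of a Web Mercator tile at a given latitude and finds the first zoom at which the tile pixels are at least as fine as the source pixels. It uses the standard Web Mercator scale formula; the function names are illustrative, and the latitude is an approximation for the Napoli AOI.

```python
import math

# Web Mercator uses a sphere with the WGS84 equatorial radius
EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137.0

def ground_resolution(zoom: int, lat_deg: float, tile_size: int = 256) -> float:
    """Approximate metres per pixel of a Web Mercator tile at a latitude."""
    scale = math.cos(math.radians(lat_deg))
    return EARTH_CIRCUMFERENCE_M * scale / (tile_size * 2 ** zoom)

def native_zoom(pixel_size_m: float, lat_deg: float, tile_size: int = 256) -> int:
    """Smallest zoom whose tile pixels are at least as fine as the source pixels."""
    zoom = 0
    while ground_resolution(zoom, lat_deg, tile_size) > pixel_size_m:
        zoom += 1
    return zoom

lat = 40.85  # roughly the latitude of the Napoli AOI
for pixel_size in (10, 20, 60):
    print(f"{pixel_size} m bands are fully resolved from zoom {native_zoom(pixel_size, lat)}")
print(f"zoom 12 resolution: {ground_resolution(12, lat):.1f} m/pixel")
```

Below the native zoom each tile pixel aggregates several source pixels, so a single tile request spans more chunks than its 256×256 output size suggests.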

## Conclusion

In this notebook, we've learned:

1. ✅ How to integrate **rio-tiler** with EOPF Zarr datasets
2. ✅ Generated RGB and false color map tiles from Sentinel-2 data
3. ✅ Understood the critical relationship between Zarr chunks and tile performance
4. ✅ Observed memory usage patterns for different tile sizes
5. ✅ Learned about multi-resolution band handling strategies

### Key Takeaways

- **Chunk alignment matters**: Optimal performance requires matching chunk sizes to tile access patterns
- **EOPF Zarr is pre-optimized**: Default chunking (1830×1830 for 10m bands) balances storage and access efficiency
- **Memory scales with tile size**: Memory grows with the number of pixels read, so doubling the tile edge roughly quadruples the array size
- **Rio-tiler simplifies tiling**: XarrayReader provides a clean interface for tile generation

**Completed in approximately 15-20 minutes ⏱️**

## What's Next?

In the next notebooks, we'll dive deeper into:

- **Notebook 2**: [Chunking Strategy Optimization with Sentinel-1 SAR](42_rio_tiler_s1_chunking.ipynb) - Systematic benchmarking of different chunk sizes
- **Notebook 3**: [Projections and TMS with Sentinel-3 OLCI](43_rio_tiler_s3_projections.ipynb) - Optimizing spatial reference systems for global datasets

### 💪 Try It Yourself

**Challenge 1**: Create a custom band combination (e.g., SWIR-NIR-Red for B12-B08-B04). Note that B12 is a 20m band, so it must be resampled to the 10m grid first.

**Challenge 2**: Compare tile generation performance for different zoom levels

**Challenge 3**: Experiment with different chunk sizes by rechunking the dataset