## Introduction

## Seeds

## 3 Zarr Optimize Algorithm

### 3.1 Description

The idea of this algorithm is to exploit the chunking of the cloud-native Zarr format in order to load only those parts of the Sentinel-2 scenes that are actually needed.
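
The chunk selection behind this idea comes down to integer arithmetic on pixel offsets. The following is a minimal sketch, not the notebook's implementation: it assumes a north-up raster with square pixels and square chunks. The function name `overlapping_chunks`, its parameters, and the example values (10 m pixels, 1000-pixel chunks, a 10980 × 10980 array) are hypothetical and chosen only for illustration.

```python
import math

def overlapping_chunks(bbox, x_origin, y_origin, pixel_size, chunk_size, shape):
    """Return (row_chunks, col_chunks) ranges of chunks that intersect bbox.

    bbox       : (xmin, ymin, xmax, ymax) in the raster's CRS
    x_origin   : x coordinate of the raster's left edge
    y_origin   : y coordinate of the raster's top edge (north-up assumed)
    pixel_size : pixel edge length in CRS units
    chunk_size : chunk edge length in pixels (square chunks assumed)
    shape      : (n_rows, n_cols) of the full array
    """
    xmin, ymin, xmax, ymax = bbox
    n_rows, n_cols = shape

    # Pixel window that covers the bbox, clipped to the array extent
    col_start = max(0, math.floor((xmin - x_origin) / pixel_size))
    col_stop = min(n_cols, math.ceil((xmax - x_origin) / pixel_size))
    row_start = max(0, math.floor((y_origin - ymax) / pixel_size))
    row_stop = min(n_rows, math.ceil((y_origin - ymin) / pixel_size))

    if col_start >= col_stop or row_start >= row_stop:
        return range(0), range(0)  # bbox lies outside the raster

    # Chunk indices touched by that pixel window
    row_chunks = range(row_start // chunk_size, (row_stop - 1) // chunk_size + 1)
    col_chunks = range(col_start // chunk_size, (col_stop - 1) // chunk_size + 1)
    return row_chunks, col_chunks


# Hypothetical example: a 10 km x 10 km window inside a 109.8 km wide scene
rows, cols = overlapping_chunks(
    bbox=(25_000, 50_000, 35_000, 60_000),
    x_origin=0,
    y_origin=109_800,
    pixel_size=10,
    chunk_size=1000,
    shape=(10_980, 10_980),
)
print(list(rows), list(cols))  # [4, 5] [2, 3]
```

In this hypothetical layout only 4 of the 121 chunks would have to be read for the 10 km × 10 km window.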

1. First, we create a 10 km × 10 km grid (EPSG:5325) that covers the whole of Iceland.  
2. Then we load the seeds (GeoJSON points) and mark every grid cell that contains at least one seed point as a *candidate*.  
3. For each *candidate* 10 km × 10 km cell, we query the Sentinel-2 L2A collection via [STAC](https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items) to find all images in the given month whose footprint intersects the cell.  
4. We then use the chunking of the Zarr format to extract only those chunks whose spatial extent overlaps the *candidate* cell and reproject them to EPSG:5325.  
5. From all extracted data within the 10 km × 10 km *candidate* cell, we compute the median per pixel and per band (to reduce measurement noise and cloud contamination).  
6. Based on these per-pixel, per-band medians, we calculate the Normalized Difference Snow Index (NDSI).  
7. Every pixel with NDSI > 0.42 is classified as snow/ice.  
8. If more than 30% of the pixels within a 10 km × 10 km *candidate* cell are classified as snow/ice, its four edge-adjacent neighbouring 10 km × 10 km cells (those that have not yet been processed) are also marked as *candidate* cells.  
9. The algorithm repeats steps 3 to 8 until no additional *candidate* cells remain.
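
Steps 8 and 9 amount to a breadth-first region growing over the grid. The following is a minimal sketch under stated assumptions: cells are addressed by `(row, col)` tuples, and a caller-supplied function returns the snow/ice fraction of a cell (standing in for steps 3 to 7). The names `grow_candidates` and `snow_fraction_of` are hypothetical and do not appear in the notebook.

```python
from collections import deque

def grow_candidates(seed_cells, snow_fraction_of, n_rows, n_cols, threshold=0.30):
    """Process candidate cells until no new candidates appear.

    seed_cells       : iterable of (row, col) cells that contain a seed
    snow_fraction_of : callable (row, col) -> fraction of snow/ice pixels
    Returns (processed, snowy): all cells that were evaluated, and the
    subset whose snow/ice fraction exceeded the threshold.
    """
    initial = list(dict.fromkeys(seed_cells))  # drop duplicates, keep order
    queue = deque(initial)
    processed = set(initial)  # cells that are queued or already evaluated
    snowy = set()

    while queue:
        row, col = queue.popleft()
        if snow_fraction_of((row, col)) > threshold:
            snowy.add((row, col))
            # Mark the four edge-adjacent neighbours as new candidates
            for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                neighbour = (row + d_row, col + d_col)
                inside = 0 <= neighbour[0] < n_rows and 0 <= neighbour[1] < n_cols
                if inside and neighbour not in processed:
                    processed.add(neighbour)
                    queue.append(neighbour)
    return processed, snowy


# Toy example: a single row of five cells with made-up snow fractions
fractions = {(0, 0): 0.9, (0, 1): 0.5, (0, 2): 0.1, (0, 3): 0.9, (0, 4): 0.9}
processed, snowy = grow_candidates([(0, 0)], fractions.get, n_rows=1, n_cols=5)
print(sorted(processed))  # [(0, 0), (0, 1), (0, 2)]
print(sorted(snowy))      # [(0, 0), (0, 1)]
```

The growth stops at cell `(0, 2)`: it is evaluated because its neighbour exceeded the threshold, but it does not pass the threshold itself, so the snowy cells beyond it are never reached.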


### 3.2 Implementation

#### 3.2.1 Step 0: Imports and Configuration

In [None]:
# Standard libraries
import warnings
warnings.filterwarnings('ignore')

# Geospatial libraries
import geopandas as gpd
from shapely.geometry import box, Point
import pyproj

# Data processing
import xarray as xr
import numpy as np
import pandas as pd

# STAC API for Sentinel-2 data
from pystac_client import Client

# Visualization
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import folium
from folium import plugins

# Configuration
EPSG_ICELAND = 5325  # ISN2004 / Lambert 2004
BOUNDING_BOX_ICELAND = [1400000,100000,2000000,500000]
GRID_SIZE = 10000  # 10 km in meters
NDSI_THRESHOLD = 0.42
SNOW_PERCENTAGE_THRESHOLD = 0.30

print("All libraries successfully imported")
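
The two thresholds configured above are used in steps 5 to 8 of the description. A minimal sketch of that computation follows. It assumes the standard NDSI definition, (green − SWIR) / (green + SWIR), which for Sentinel-2 is commonly computed from bands B03 and B11. The band choice, the function name `snow_fraction`, and the decision to ignore pixels without valid data are assumptions and are not taken from the notebook.

```python
import numpy as np

NDSI_THRESHOLD = 0.42
SNOW_PERCENTAGE_THRESHOLD = 0.30

def snow_fraction(green_stack, swir_stack):
    """Compute the median-based NDSI and the snow/ice fraction of a cell.

    green_stack, swir_stack : float arrays of shape (time, y, x),
                              with NaN marking missing observations
    Returns (ndsi, fraction), where fraction is the share of valid
    pixels with NDSI above NDSI_THRESHOLD.
    """
    # Per-pixel temporal median of each band, ignoring missing values
    green = np.nanmedian(green_stack, axis=0)
    swir = np.nanmedian(swir_stack, axis=0)

    denominator = green + swir
    with np.errstate(divide="ignore", invalid="ignore"):
        ndsi = np.where(denominator != 0, (green - swir) / denominator, np.nan)
        valid = np.isfinite(ndsi)
        snow = valid & (ndsi > NDSI_THRESHOLD)

    n_valid = int(valid.sum())
    fraction = float(snow.sum()) / n_valid if n_valid > 0 else 0.0
    return ndsi, fraction


# Toy example: the top row looks like snow, the bottom row does not
green = np.full((3, 2, 2), 0.2)
green[:, 0, :] = 0.8
green[1, 0, 0] = 0.1  # a single outlier that the median suppresses
swir = np.full((3, 2, 2), 0.2)
swir[:, 0, :] = 0.1

ndsi, fraction = snow_fraction(green, swir)
is_snowy_cell = fraction > SNOW_PERCENTAGE_THRESHOLD
```

Here `is_snowy_cell` is the condition that, according to step 8, triggers the expansion to neighbouring cells.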

#### 3.2.2 Step 1: Load and Visualize Seeds

The seeds are points derived from the glacier polygons of CORINE Land Cover (CLC). We load the seed points from the GeoJSON file and display an initial overview.

In [None]:
# Load seeds
seeds = gpd.read_file('data/Iceland_Seeds.geojson')

# Check CRS and transform to EPSG:5325 if necessary
print(f"Original CRS: {seeds.crs}")
if seeds.crs.to_epsg() != EPSG_ICELAND:
    seeds = seeds.to_crs(epsg=EPSG_ICELAND)
    print(f"Transformed to EPSG:{EPSG_ICELAND}")

# Overview of the seeds
print(f"\nNumber of seed points: {len(seeds)}")
print(f"\nFirst 3 seeds:")
print(seeds.head(3))

# Get bounds of the seeds
bounds = seeds.total_bounds
print(f"\nBounding box of seeds (EPSG:{EPSG_ICELAND}):")
print(f"   X: [{bounds[0]:.0f}, {bounds[2]:.0f}]")
print(f"   Y: [{bounds[1]:.0f}, {bounds[3]:.0f}]")

Visualization of seed points:

In [None]:
# Interactive visualization of the seeds with satellite imagery
seeds_wgs84 = seeds.to_crs(epsg=4326)

# Calculate center of Iceland for map
center_lat = seeds_wgs84.geometry.y.mean()
center_lon = seeds_wgs84.geometry.x.mean()

# Create interactive map with satellite imagery
map_seeds = folium.Map(
    location=[center_lat, center_lon],
    zoom_start=6,
    tiles=None
)

# Add Esri World Imagery
folium.TileLayer(
    tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}',
    attr='Esri World Imagery',
    name='Satellite Imagery',
    overlay=False,
    control=True
).add_to(map_seeds)

# Add seed points to map
for idx, row in seeds_wgs84.iterrows():
    folium.CircleMarker(
        location=[row.geometry.y, row.geometry.x],
        radius=6,
        color='red',
        fill=True,
        fillColor='red',
        fillOpacity=0.5,
        popup=f"Seed {idx}",
        tooltip="Glacier Seed"
    ).add_to(map_seeds)

# Add layer control
folium.LayerControl().add_to(map_seeds)

print(f"Step 1 completed: {len(seeds)} seeds loaded and visualized")
print("Interactive map created with satellite imagery background")

# Display map
map_seeds

#### 3.2.3 Step 2: Create 10 km × 10 km Grid

We create a regular 10 km × 10 km grid that covers all of Iceland and mark every grid cell that contains at least one seed point as a *candidate*.

In [None]:
def create_grid(bounds, grid_size):
    """
    Creates a regular grid with the given cell size and bounding box.
    
    Parameters:
    -----------
    bounds : tuple
        (xmin, ymin, xmax, ymax) in meters
    grid_size : int
        Size of a grid cell in meters
        
    Returns:
    --------
    gpd.GeoDataFrame with grid cells
    """
    xmin, ymin, xmax, ymax = bounds
    
    # Round the grid boundaries to multiples of grid_size
    xmin = np.floor(xmin / grid_size) * grid_size
    ymin = np.floor(ymin / grid_size) * grid_size
    xmax = np.ceil(xmax / grid_size) * grid_size
    ymax = np.ceil(ymax / grid_size) * grid_size
    
    # Create grid cells
    grid_cells = []
    grid_ids = []
    cell_id = 0
    
    y = ymin
    while y < ymax:
        x = xmin
        while x < xmax:
            # Create a box per grid cell
            cell = box(x, y, x + grid_size, y + grid_size)
            grid_cells.append(cell)
            grid_ids.append(cell_id)
            cell_id += 1
            x += grid_size
        y += grid_size
    
    # Create GeoDataFrame
    grid = gpd.GeoDataFrame({
        'cell_id': grid_ids,
        'geometry': grid_cells
    }, crs=f"EPSG:{EPSG_ICELAND}")
    
    return grid

# Create grid
grid = create_grid(BOUNDING_BOX_ICELAND, GRID_SIZE)

print(f"Grid created:")
print(f"Number of grid cells: {len(grid)}")
print(f"Cell size: {GRID_SIZE/1000} km × {GRID_SIZE/1000} km")
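
As a plausibility check on the printed cell count, the grid dimensions can be derived with plain arithmetic. The sketch below repeats the outward snapping used in `create_grid`; the helper name `grid_shape` is hypothetical and not part of the notebook.

```python
import math

def grid_shape(bounds, grid_size):
    """Return (n_rows, n_cols) of the grid after snapping bounds outward."""
    xmin, ymin, xmax, ymax = bounds
    x0 = math.floor(xmin / grid_size) * grid_size
    y0 = math.floor(ymin / grid_size) * grid_size
    x1 = math.ceil(xmax / grid_size) * grid_size
    y1 = math.ceil(ymax / grid_size) * grid_size
    n_cols = (x1 - x0) // grid_size
    n_rows = (y1 - y0) // grid_size
    return n_rows, n_cols


n_rows, n_cols = grid_shape([1400000, 100000, 2000000, 500000], 10000)
print(n_rows, n_cols, n_rows * n_cols)  # 40 60 2400
```

With the configured bounding box, the grid therefore has 40 rows and 60 columns, and `len(grid)` should report 2400 cells.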

Next, we mark every cell that contains at least one seed as a *candidate*:

In [None]:
def mark_candidate_cells(grid, seeds):
    """
    Marks grid cells that contain at least one seed.
    
    Parameters:
    -----------
    grid : gpd.GeoDataFrame
        GeoDataFrame with grid cells
    seeds : gpd.GeoDataFrame
        GeoDataFrame with seed points
        
    Returns:
    --------
    gpd.GeoDataFrame with additional columns:
        - is_candidate: bool indicating if the cell contains seeds
        - seed_count: number of seeds in the cell
    """
    # Initialize all cells as non-candidates
    grid['is_candidate'] = False
    grid['seed_count'] = 0
    
    # Spatial join: which grid cells contain which seeds?
    joined = gpd.sjoin(grid, seeds, how='inner', predicate='contains')
    
    # Count seeds per grid cell; cells without seeds get a count of 0
    seed_counts = joined.groupby('cell_id').size()
    grid['seed_count'] = grid['cell_id'].map(seed_counts).fillna(0).astype(int)
    
    # A cell is a candidate if it contains at least one seed
    grid['is_candidate'] = grid['seed_count'] > 0
    
    return grid

grid = mark_candidate_cells(grid, seeds)

# Statistics
n_candidates = grid['is_candidate'].sum()
print(f"\n Grid statistics:")
print(f"   Total grid cells: {len(grid)}")
print(f"   Candidate cells: {n_candidates}")
print(f"   Seeds per candidate (average): {grid[grid['is_candidate']]['seed_count'].mean():.1f}")
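
Because the grid is regular, the cell that contains a point can also be computed directly from its coordinates, which is a handy cross-check for the spatial join. The sketch below assumes the bounds are already aligned to multiples of the cell size and that cells are numbered row by row starting at the lower-left corner, as in `create_grid`. The function name `cell_id_for_point` is hypothetical.

```python
import math

def cell_id_for_point(x, y, bounds, grid_size):
    """Return the cell_id of the grid cell containing (x, y), or None.

    bounds : (xmin, ymin, xmax, ymax), assumed aligned to grid_size
    """
    xmin, ymin, xmax, ymax = bounds
    n_cols = round((xmax - xmin) / grid_size)
    n_rows = round((ymax - ymin) / grid_size)

    col = math.floor((x - xmin) / grid_size)
    row = math.floor((y - ymin) / grid_size)
    if not (0 <= col < n_cols and 0 <= row < n_rows):
        return None  # point lies outside the grid

    # Row-major numbering: x varies fastest, rows count upward from ymin
    return row * n_cols + col


bounds = (1400000, 100000, 2000000, 500000)
print(cell_id_for_point(1405000, 105000, bounds, 10000))  # 0
print(cell_id_for_point(1415000, 105000, bounds, 10000))  # 1
print(cell_id_for_point(1405000, 115000, bounds, 10000))  # 60
```

One difference is worth noting: with the `contains` predicate, a seed that lies exactly on a cell edge is not counted for any cell, whereas this arithmetic assigns it to the cell above or to the right of that edge.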

We then create an interactive map that shows the seeds, the grid, and the candidate cells on top of satellite imagery:

In [None]:
# Interactive visualization: Grid with candidate cells on satellite imagery

# Transform to WGS84 for mapping
grid_wgs84 = grid.to_crs(epsg=4326)
candidates_wgs84 = grid_wgs84[grid_wgs84['is_candidate']]
non_candidates_wgs84 = grid_wgs84[~grid_wgs84['is_candidate']]
seeds_wgs84 = seeds.to_crs(epsg=4326)

# Calculate center for map
center_lat = seeds_wgs84.geometry.y.mean()
center_lon = seeds_wgs84.geometry.x.mean()

# Create interactive map with satellite imagery
map_grid = folium.Map(
    location=[center_lat, center_lon],
    zoom_start=6,
    tiles=None
)

# Add Esri World Imagery
folium.TileLayer(
    tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}',
    attr='Esri World Imagery',
    name='Satellite Imagery',
    overlay=False,
    control=True
).add_to(map_grid)

# Add ALL grid cells
folium.GeoJson(
    non_candidates_wgs84,
    name='All Grid Cells',
    style_function=lambda x: {
        'fillColor': 'lightblue',
        'color': 'lightgray',
        'weight': 0.5,
        'fillOpacity': 0.1,
        'opacity': 0.4
    },
    tooltip=folium.GeoJsonTooltip(
        fields=['cell_id'],
        aliases=['Cell ID:'],
        localize=True
    )
).add_to(map_grid)

# Add candidate cells
folium.GeoJson(
    candidates_wgs84[['cell_id', 'seed_count', 'geometry']],
    name='Candidate Cells',
    style_function=lambda x: {
        'fillColor': 'blue',
        'color': 'darkblue',
        'weight': 2.5,
        'fillOpacity': 0.4,
        'opacity': 0.7
    },
    tooltip=folium.GeoJsonTooltip(
        fields=['cell_id', 'seed_count'],
        aliases=['Cell ID:', 'Seeds:'],
        localize=True
    ),
    popup=folium.GeoJsonPopup(
        fields=['cell_id', 'seed_count'],
        aliases=['Cell ID:', 'Number of Seeds:']
    )
).add_to(map_grid)

# Add seed points as feature group
seed_group = folium.FeatureGroup(name='Glacier Seeds')
for idx, row in seeds_wgs84.iterrows():
    folium.CircleMarker(
        location=[row.geometry.y, row.geometry.x],
        radius=3,
        color='red',
        fill=True,
        fillColor='red',
        fillOpacity=0.7,
        popup=f"<b>Seed {idx}</b>",
        tooltip="Glacier Seed"
    ).add_to(seed_group)

seed_group.add_to(map_grid)

# Add fullscreen button
plugins.Fullscreen().add_to(map_grid)

# Add layer control (allows toggling layers on/off)
folium.LayerControl(collapsed=False).add_to(map_grid)

print(f"Step 2 completed: Grid with {n_candidates} candidate cells created")
print(f"Interactive map created with {len(grid)} grid cells ({n_candidates} highlighted as candidates)")
print("Tip: Use the layer control in the top right to toggle layers on/off")

# Display map
map_grid

#### 3.2.4 Step 3: STAC API Query for Sentinel-2 L2A

We connect to the EOPF STAC Catalog and search for Sentinel-2 L2A scenes that cover a test candidate cell for a specific month.

In [None]:
# Select a test candidate cell
test_cell = grid[grid['is_candidate']].iloc[0]

print("Test candidate cell selected:")
print(f"Cell ID: {test_cell['cell_id']}")
print(f"Number of seeds: {test_cell['seed_count']}")

# Bounds of the test cell in EPSG:5325
cell_bounds = test_cell.geometry.bounds  # (minx, miny, maxx, maxy)
print(f"\nCell bounds (EPSG:{EPSG_ICELAND}):")
print(f"X: [{cell_bounds[0]:.0f}, {cell_bounds[2]:.0f}]")
print(f"Y: [{cell_bounds[1]:.0f}, {cell_bounds[3]:.0f}]")

In [None]:
# Transform bounds to WGS84 for STAC query
cell_gdf = gpd.GeoDataFrame([test_cell], crs=f"EPSG:{EPSG_ICELAND}")
cell_wgs84 = cell_gdf.to_crs(epsg=4326)
bbox_wgs84 = cell_wgs84.total_bounds  # (minx, miny, maxx, maxy)

print(f"\nCell bounds (WGS84 for STAC query):")
print(f"Lon: [{bbox_wgs84[0]:.4f}, {bbox_wgs84[2]:.4f}]")
print(f"Lat: [{bbox_wgs84[1]:.4f}, {bbox_wgs84[3]:.4f}]")

# Test time period (e.g., July 2025 - summer)
test_date_start = "2025-07-01"
test_date_end = "2025-07-31"

print(f"\nTest period: {test_date_start} to {test_date_end}")
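
The test period above is hard-coded. When the query is repeated for other months, the start and end dates can be derived from the year and month instead. A minimal sketch using only the standard library follows; the helper name `month_range` is hypothetical.

```python
import calendar
from datetime import date

def month_range(year, month):
    """Return (first_day, last_day) of a month as ISO date strings."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    start = date(year, month, 1).isoformat()
    end = date(year, month, last_day).isoformat()
    return start, end


print(month_range(2025, 7))  # ('2025-07-01', '2025-07-31')
print(month_range(2024, 2))  # ('2024-02-01', '2024-02-29')
```

The returned pair has the same form as `test_date_start` and `test_date_end`.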

In [None]:
# Connect to EOPF STAC Catalog
STAC_URL = "https://stac.core.eopf.eodc.eu"

print(f"Connecting to EOPF STAC Catalog: {STAC_URL}")
catalog = Client.open(STAC_URL)

# Search for Sentinel-2 L2A scenes
print(f"\nSearching for Sentinel-2 L2A scenes...")
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=bbox_wgs84,
    datetime=[test_date_start, test_date_end]
)

# Collect results
items = list(search.items())

print(f"Scenes found: {len(items)}")
if len(items) > 0:
    print(f"\nFirst scene:")
    first_item = items[0]
    print(f"ID: {first_item.id}")
    print(f"Date: {first_item.datetime}")
    print(f"Cloud Cover: {first_item.properties.get('eo:cloud_cover', 'N/A')}%")
else:
    print("No scenes found. Try a different time period.")
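
Beyond the first scene, it can be useful to get an overview of all returned scenes ordered by cloud cover. The sketch below is an optional helper and not part of the algorithm described above. It relies only on the `id`, `datetime`, and `properties` attributes of the returned items. The function name `rank_by_cloud_cover` and the `max_cloud_cover` parameter are hypothetical, and the demo uses stand-in objects rather than real STAC items.

```python
from datetime import datetime
from types import SimpleNamespace

def rank_by_cloud_cover(items, max_cloud_cover=100.0):
    """Return (id, date, cloud_cover) tuples sorted from clearest to cloudiest.

    Items without an 'eo:cloud_cover' property, or with a value above
    max_cloud_cover, are skipped.
    """
    ranked = []
    for item in items:
        cloud = item.properties.get("eo:cloud_cover")
        if cloud is None or cloud > max_cloud_cover:
            continue
        day = item.datetime.date().isoformat() if item.datetime else None
        ranked.append((item.id, day, float(cloud)))
    return sorted(ranked, key=lambda entry: entry[2])


# Stand-in objects that mimic the attributes used above (not real scenes)
demo_items = [
    SimpleNamespace(id="scene-a", datetime=datetime(2025, 7, 3),
                    properties={"eo:cloud_cover": 80.0}),
    SimpleNamespace(id="scene-b", datetime=datetime(2025, 7, 8),
                    properties={"eo:cloud_cover": 12.5}),
    SimpleNamespace(id="scene-c", datetime=datetime(2025, 7, 13),
                    properties={}),
]
print(rank_by_cloud_cover(demo_items, max_cloud_cover=50.0))
# [('scene-b', '2025-07-08', 12.5)]
```

In the notebook, the same function could be called with the `items` list returned by the search.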

## Sentinel Native Algorithm

## Algorithm Comparison

### Benchmark Zarr Optimize Algorithm

### Benchmark Sentinel Native Algorithm

### Conclusion

## Temporal Glacier Analysis