---
title: Pangeo X EarthCODE
subtitle: D.03.10 HANDS-ON TRAINING - EarthCODE 101 Hands-On Workshop - Example showing how to access data on the EarthCODE Open Science Catalog and working with the Pangeo ecosystem on EDC
authors:
  - name: Deyan Samardzhiev
    github: sunnydean
    orcid: 0009-0003-3803-8522
    affiliations:
      - id: Lampata UK
        institution: Lampata UK
reviewers:
  - name: Anne Fouilloux
    orcid: 0000-0002-1784-2920
    github: annefou
    affiliations:
      - id: Simula Research Laboratory
        institution: Simula Research Laboratory
        ror: 00vn06n10
date: 2025-06-01
thumbnail: https://raw.githubusercontent.com/ESA-EarthCODE/documentation/refs/heads/main/pages/public/img/EarthCODE_kv_transparent.png
keywords: ["earthcode", "pangeo", "stac", "xarray", "earth observation", "remote sensing"]
tags: ["pangeo"]
releaseDate: 2025-06-01
datePublished: 2025-06-01
dateModified: 2025-06-01
banner: ../static/PANGEO.png
github: https://github.com/sunnydean/LPS25_Pangeo_x_EarthCODE_Workshop
license: MIT
---

In [None]:
# # remove for EDC actual version, just for local testing
# from distributed.client import _global_clients
# for client in list(_global_clients.values()):
#     client.close()



## Table of Contents
```{contents}
:depth: 1
```

## Context
We will use the [Pangeo](https://pangeo.io/) open-source software stack to fetch data published on EarthCODE together with publicly available Sentinel-2 satellite data, and to generate burn severity maps for assessing areas affected by wildfires.

### Methodology approach
* Access Sentinel-2 L2A cloud optimised dataset through STAC
* Compute the Normalised Burn Ratio (NBR) index to highlight burned areas
* Classify burn severity

### Highlights
* The NBR index uses near-infrared (NIR) and shortwave-infrared (SWIR) wavelengths.
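
As a small worked example of the index, NBR is computed as (NIR - SWIR) / (NIR + SWIR). The sketch below uses made-up reflectance values, and the guard against a zero denominator is an addition for illustration rather than part of the workflow in this notebook.

```python
import numpy as np

def nbr(nir, swir):
    """Normalised Burn Ratio from NIR and SWIR reflectance values."""
    nir = np.asarray(nir, dtype="float64")
    swir = np.asarray(swir, dtype="float64")
    total = nir + swir
    # Where both bands are 0 (e.g. nodata) the ratio is undefined, so return NaN
    safe_total = np.where(total == 0, 1.0, total)
    return np.where(total == 0, np.nan, (nir - swir) / safe_total)

# Illustrative reflectances (not measured data)
healthy = float(nbr(0.40, 0.10))  # high NIR, low SWIR: vegetated surface
burned = float(nbr(0.15, 0.30))   # low NIR, high SWIR: burned surface
print(f"healthy NBR = {healthy:.2f}, burned NBR = {burned:.2f}")
```

Healthy vegetation gives a high NBR and burned surfaces a low one, which is why the difference between pre-fire and post-fire NBR highlights burned areas.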


## Data
We will use Sentinel-2 data accessed via [Element 84's Earth Search STAC API](https://element84.com/earth-search/) and the [SeasFire Data Cube](https://opensciencedata.esa.int/products/seasfire-cube/collection) to find burned areas, inspect them in more detail, and generate burn severity maps.



#### Related publications
* https://www.sciencedirect.com/science/article/pii/S1470160X22004708#f0035
* https://github.com/yobimania/dea-notebooks/blob/e0ca59f437395f7c9becca74badcf8c49da6ee90/Fire%20Analysis%20Compiled%20Scripts%20(Gadi)/dNBR_full.py
* *Alonso, Lazaro, Gans, Fabian, Karasante, Ilektra, Ahuja, Akanksha, Prapas, Ioannis, Kondylatos, Spyros, Papoutsis, Ioannis, Panagiotou, Eleannna, Michail, Dimitrios, Cremer, Felix, Weber, Ulrich, & Carvalhais, Nuno. (2022). SeasFire Cube: A Global Dataset for Seasonal Fire Modeling in the Earth System (0.4) [Data set]. Zenodo. @alonso-2024. The same dataset can also be downloaded from Zenodo: https://zenodo.org/records/13834057*
* *https://registry.opendata.aws/sentinel-2-l2a-cogs/*










### Packages

As best practices dictate, we recommend that you install and import all the necessary libraries at the top of your Jupyter notebook.

In [None]:
import json
import xarray
import rasterio
import rioxarray  # registers the `.rio` accessor used later in this notebook

from datetime import datetime
from datetime import timedelta

import numpy as np
import pandas as pd
import geopandas as gpd

import hvplot.xarray
import dask.distributed

import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from shapely import geometry

from pystac_client import Client as pystac_client
from odc.stac import configure_rio, stac_load

import os
import xrlint.all as xrl
from xcube.core.verify import assert_cube


# Start Your Dask Cluster

Create a Dask cluster as described in the [Dask 101 guide](../pangeo101/dask101.ipynb).

In [None]:
from dask_gateway import Gateway
gateway = Gateway()
cluster = gateway.new_cluster()
cluster.scale(2)
client = cluster.get_client()
client

In [None]:
# Alternatively, create a local Dask cluster on your own machine.

# from dask.distributed import Client
# client = Client()   
# client

# Load the Data

In [None]:
http_url = "https://s3.waw4-1.cloudferro.com/EarthCODE/OSCAssets/seasfire/seasfire_v0.4.zarr/"

ds = xarray.open_dataset(
    http_url,
    engine='zarr',
    chunks={},
    consolidated=True,
    # storage_options={'token': 'anon'}
)
ds

# Forest Fires Search

Search for a forest fire in Europe over the last couple of years using the burned area data from the Global Wildfire Information System (GWIS).


In [None]:
gwis = ds.gwis_ba
gwis

We load the area-of-interest (AOI) file created in the previous stage.

In [None]:
with open("../aoi/feature.json") as f:
    feature = json.load(f)

poly = geometry.shape(feature["geometry"])
bbox = list(poly.bounds)

polygon = gpd.GeoDataFrame(index=[0], crs="epsg:4326", geometry=[geometry.box(*bbox)])
min_lon, min_lat, max_lon, max_lat = polygon.total_bounds

lat_start = gwis.latitude.sel(latitude=max_lat, method="nearest").item()
lat_stop  = gwis.latitude.sel(latitude=min_lat, method="nearest").item()
lon_start = gwis.longitude.sel(longitude=min_lon, method="nearest").item()
lon_stop  = gwis.longitude.sel(longitude=max_lon, method="nearest").item()
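
The cell above snaps the bounding box to the nearest grid cell centres, and takes the latitude start from `max_lat` because the latitude coordinate is assumed to run from north to south. The sketch below illustrates the same idea with plain NumPy; the 0.25 degree grid and the bounding box are assumptions chosen for illustration, not values read from the cube.

```python
import numpy as np

# Assumed 0.25-degree grid of cell centres, with latitude stored north to south
lat = np.arange(89.875, -90, -0.25)
lon = np.arange(-179.875, 180, 0.25)

def nearest(values, target):
    """Return the grid value closest to `target`."""
    return float(values[np.abs(values - target).argmin()])

# Hypothetical bounding box: (min_lon, min_lat, max_lon, max_lat)
min_lon, min_lat, max_lon, max_lat = (21.1, 38.9, 22.3, 39.8)

# With descending latitude, the slice starts at the northern edge
lat_start, lat_stop = nearest(lat, max_lat), nearest(lat, min_lat)
lon_start, lon_stop = nearest(lon, min_lon), nearest(lon, max_lon)

rows = (lat <= lat_start) & (lat >= lat_stop)
cols = (lon >= lon_start) & (lon <= lon_stop)
print(lat_start, lat_stop, lon_start, lon_stop)
print(int(rows.sum()), "rows x", int(cols.sum()), "columns selected")
```

If the start and stop were swapped on a descending coordinate, a label-based slice would select nothing, which is a common source of empty selections.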

# Masking

To keep only the relevant land data, we apply the land-sea mask supplied with the dataset.

In [None]:
## in SeasFire Datacube v2.0 the burned area variables would have the water bodies masked with ERA-5 land sea mask 
mask= ds['lsm']
mask

Now that the data we need is loaded, we find the date of the largest fire within the area of interest between 2018-01-01 and 2021-01-01.

In [None]:
gwis_aoi = gwis.sel(time=slice('2018-01-01','2021-01-01'), latitude=slice(lat_start, lat_stop),longitude=slice(lon_start, lon_stop))

In [None]:
# Date on which the burned area summed over the AOI is highest
date_max_fire = gwis_aoi.sum(dim=['latitude', 'longitude']).idxmax(dim='time').compute()
date_max_fire

In [None]:
biggest_fire_aoi = gwis_aoi.sel(time=date_max_fire)
biggest_fire_aoi
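
To make the selection logic concrete, the sketch below reproduces the "sum over space, then take the time label of the maximum" step with plain NumPy on a tiny synthetic cube. The dates and burned-area values are invented for illustration.

```python
import numpy as np

# Synthetic cube: 4 time steps x 2 latitudes x 3 longitudes (made-up values)
times = np.array(
    ["2020-07-03", "2020-07-11", "2020-07-19", "2020-07-27"],
    dtype="datetime64[D]",
)
burned = np.zeros((4, 2, 3))
burned[1] = 10.0          # small, widespread burning
burned[2, 0, 0] = 500.0   # one large fire in a single cell
burned[3] = 5.0

# Collapse the two spatial axes, leaving one total per time step
total_per_step = burned.sum(axis=(1, 2))

# Look up the time label at the position of the largest total
date_max = times[total_per_step.argmax()]
print(total_per_step, date_max)
```

Because the criterion is the total over the whole area, a single intense fire can outweigh widespread low-level burning, as in the third time step here.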

Plot the burned areas to get a first impression of the data.

In [None]:
# gwis_all=gwis.resample(time="1Y").sum()
biggest_fire_aoi = biggest_fire_aoi.where(mask)
biggest_fire_aoi.plot()


The Pangeo stack also includes visualization tools, such as hvPlot, that make it easy to plot data interactively through a simple interface.

In [None]:

# Plot it interactively, with some context
biggest_fire_aoi.hvplot(
    x='longitude',
    y='latitude',
    cmap='viridis',
    colorbar=True,
    frame_width=600,
    frame_height=400,
    geo=True,
    tiles='OSM'
)


In [None]:
# Get the date of the fire, plus the dates one week before and one week after it
fire_date_t = pd.to_datetime(date_max_fire.values.item())
week_before = (fire_date_t - timedelta(days=7))
week_after = (fire_date_t + timedelta(days=7))

In [None]:
print(week_before.date(), "---" , week_after.date())
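
The Sentinel-2 searches later in this notebook use a 30-day window ending one week before the fire and a 30-day window starting one week after it, written as `start/end` strings. The helper below is a hypothetical sketch of that arithmetic using only the standard library; the function name and the example date are assumptions for illustration.

```python
from datetime import datetime, timedelta

def search_windows(fire_date, buffer_days=7, window_days=30):
    """Return (pre_fire, post_fire) datetime ranges as 'start/end' strings."""
    before = fire_date - timedelta(days=buffer_days)
    after = fire_date + timedelta(days=buffer_days)
    pre = f"{(before - timedelta(days=window_days)).date()}/{before.date()}"
    post = f"{after.date()}/{(after + timedelta(days=window_days)).date()}"
    return pre, post

# Hypothetical fire date
pre, post = search_windows(datetime(2020, 8, 1))
print("pre-fire :", pre)
print("post-fire:", post)
```

Leaving a buffer around the fire date reduces the chance of picking a scene acquired while the fire was still active or obscured by smoke.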

In [None]:
index_name = 'NBR'

bandnames_dict = {
    'nir': 'nir',
    'swir22': 'swir22'
}

# Normalised Burn Ratio, Lopez Garcia 1991
def calc_nbr(ds):
    return (ds.nir - ds.swir22) / (ds.nir + ds.swir22)

index_dict = {'NBR': calc_nbr}
index_dict

In [None]:
catalog = pystac_client.open("https://earth-search.aws.element84.com/v1")
chunk={}

In [None]:
week_before_start = (week_before - timedelta(days=30))
time_range = str(week_before_start.date()) + "/" + str(week_before.date())

query1 = catalog.search(
    collections=["sentinel-2-l2a"], datetime=time_range, limit=100,
    bbox=bbox, query={"eo:cloud_cover": {"lt": 0.5}}
)

items = list(query1.items())
print(f"Found: {len(items):d} datasets")

items_pre = min(items, key=lambda item: item.properties["eo:cloud_cover"])

prefire_ds = stac_load(
    [items_pre],
    bands=("nir", "swir22"),
    chunks=chunk,  # <-- use Dask
    groupby="datetime",
    bbox=bbox,
)
prefire_ds

In [None]:
week_after_end = (week_after + timedelta(days=30))
time_range = str(week_after.date()) + "/" + str(week_after_end.date())

query2 = catalog.search(
    collections=["sentinel-2-l2a"], datetime=time_range, limit=100,
    bbox=bbox, query={"eo:cloud_cover": {"lt": 0.5}}
)

items = list(query2.items())
print(f"Found: {len(items):d} datasets")

items_post = min(items, key=lambda item: item.properties["eo:cloud_cover"])

postfire_ds = stac_load(
    [items_post],
    bands=("nir", "swir22"),
    chunks=chunk,  # <-- use Dask
    groupby="datetime",
    bbox=bbox,
)
postfire_ds
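
Both search cells pick the scene with the lowest `eo:cloud_cover`, a property expressed in percent, so `{"lt": 0.5}` keeps only scenes with less than 0.5 % cloud. With such a strict filter a search can return nothing, in which case `min()` raises an unhelpful error. The sketch below shows one way to guard against that; `pick_least_cloudy` is a hypothetical helper, and the mock items only mimic the `.properties` attribute of real STAC items.

```python
from types import SimpleNamespace

def pick_least_cloudy(items):
    """Return the item with the lowest cloud cover, or fail with a clear message."""
    items = list(items)
    if not items:
        raise ValueError(
            "STAC search returned no items; widen the date range "
            "or relax the cloud-cover filter"
        )
    return min(items, key=lambda item: item.properties["eo:cloud_cover"])

# Mock items standing in for search results (made-up values)
mock_items = [
    SimpleNamespace(id="a", properties={"eo:cloud_cover": 0.31}),
    SimpleNamespace(id="b", properties={"eo:cloud_cover": 0.02}),
    SimpleNamespace(id="c", properties={"eo:cloud_cover": 0.47}),
]
best = pick_least_cloudy(mock_items)
print(best.id, best.properties["eo:cloud_cover"])
```

In the notebook this would replace the bare `min(items, ...)` call, for example `items_pre = pick_least_cloudy(query1.items())`.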

In [None]:
# Rename bands in dataset to use simple names 
bands_to_rename = {
    a: b for a, b in bandnames_dict.items() if a in prefire_ds.variables
}

# prefire
prefire_ds[index_name] = index_dict[index_name](prefire_ds.rename(bands_to_rename) / 10000.0)

# postfire
postfire_ds[index_name] = index_dict[index_name](postfire_ds.rename(bands_to_rename) / 10000.0)


In [None]:
# calculate delta NBR
prefire_burnratio = prefire_ds.NBR.isel(time=0)
postfire_burnratio = postfire_ds.NBR.isel(time=0)

delta_NBR = prefire_burnratio - postfire_burnratio

dnbr_dataset = delta_NBR.to_dataset(name='delta_NBR').persist() # <--- load and keep data into your workers

In [None]:
dnbr_dataset
delta_NBR

In [None]:
fig = plt.figure(1, figsize=[7, 10])

# We're using cartopy and are plotting in PlateCarree projection 
# (see documentation on cartopy)
ax = plt.subplot(1, 1, 1, projection=ccrs.PlateCarree())
ax.coastlines(resolution='10m')
ax.gridlines(draw_labels=True)

# The data are in a projected CRS whose EPSG code is read from `spatial_ref`.
# Passing that CRS as `transform` lets cartopy reproject the data onto the PlateCarree axes.
prefire_burnratio.plot(ax=ax, transform=ccrs.epsg(prefire_burnratio.spatial_ref.values), cmap='RdBu_r',
                       cbar_kwargs={'orientation':'horizontal','shrink':0.95})

# One way to customize your title
plt.title( pd.to_datetime(prefire_burnratio.time.values.item()).strftime("%d %B %Y"), fontsize=18)

In [None]:
fig = plt.figure(1, figsize=[7, 9])

# We're using cartopy and are plotting in PlateCarree projection 
# (see documentation on cartopy)
ax = plt.subplot(1, 1, 1, projection=ccrs.PlateCarree())
ax.coastlines(resolution='10m')
ax.gridlines(draw_labels=True)

# The data are in a projected CRS whose EPSG code is read from `spatial_ref`.
# Passing that CRS as `transform` lets cartopy reproject the data onto the PlateCarree axes.
postfire_burnratio.plot(ax=ax, transform=ccrs.epsg(postfire_burnratio.spatial_ref.values), cmap='RdBu_r',
                        cbar_kwargs={'orientation':'horizontal','shrink':0.95})

# One way to customize your title
plt.title( pd.to_datetime(postfire_burnratio.time.values.item()).strftime("%d %B %Y"), fontsize=18)

In [None]:
fig = plt.figure(1, figsize=[7, 10])

# We're using cartopy and are plotting in PlateCarree projection 
# (see documentation on cartopy)
ax = plt.subplot(1, 1, 1, projection=ccrs.PlateCarree())
ax.coastlines(resolution='10m')
ax.gridlines(draw_labels=True)

# The data are in a projected CRS whose EPSG code is read from `spatial_ref`.
# Passing that CRS as `transform` lets cartopy reproject the data onto the PlateCarree axes.
dnbr_dataset.delta_NBR.plot(ax=ax, transform=ccrs.epsg(dnbr_dataset.delta_NBR.spatial_ref.values), cmap='RdBu_r',
                            cbar_kwargs={'orientation':'horizontal','shrink':0.95})

# One way to customize your title
plt.title( "Delta NBR", fontsize=18)

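The burn severity classes shown below are taken from the UN-SPIDER recommended practice on the Normalized Burn Ratio: https://un-spider.org/advisory-support/recommended-practices/recommended-practice-burn-severity/in-detail/normalized-burn-ratio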

![img](https://un-spider.org/sites/default/files/table+legend.PNG)
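
The table above can be turned into a per-pixel classification. The sketch below uses `np.digitize` with the class boundaries commonly quoted for this table (-0.25, -0.10, 0.10, 0.27, 0.44, 0.66); treat the boundaries and labels as assumptions to check against the linked page, and note that the function name is hypothetical.

```python
import numpy as np

# Class boundaries on the dNBR scale (assumed from the severity table)
BOUNDS = np.array([-0.25, -0.10, 0.10, 0.27, 0.44, 0.66])
LABELS = [
    "Enhanced regrowth, high",
    "Enhanced regrowth, low",
    "Unburned",
    "Low severity",
    "Moderate-low severity",
    "Moderate-high severity",
    "High severity",
]

def classify_dnbr(dnbr):
    """Map dNBR values to class indices 0..6, keeping NaN as NaN."""
    dnbr = np.asarray(dnbr, dtype="float64")
    # digitize returns i such that BOUNDS[i-1] <= value < BOUNDS[i]
    classes = np.digitize(dnbr, BOUNDS)
    return np.where(np.isnan(dnbr), np.nan, classes)

sample = np.array([-0.3, 0.0, 0.15, 0.27, 0.5, 0.8, np.nan])
classes = classify_dnbr(sample)
for value, cls in zip(sample, classes):
    label = "no data" if np.isnan(cls) else LABELS[int(cls)]
    print(f"{value:6.2f} -> {label}")
```

With these boundaries, a threshold of 0.27 corresponds to the lower edge of the moderate-low severity class.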

In [None]:
BURN_THRESH = 0.27
burn_mask = dnbr_dataset.delta_NBR > BURN_THRESH           # True/False mask, same shape as raster
burn_mask.plot()

In [None]:
dx, dy = dnbr_dataset.delta_NBR.rio.resolution()
pixel_area_ha = abs(dx * dy) / 1e4       # 10m × 10m  → 0.01 ha
pixel_area_ha

pixels_burned   = burn_mask.sum().compute().item()   # integer number of burned pixels
burned_area_ha  = pixels_burned * pixel_area_ha

print(f"Pixels burned : {pixels_burned:,d}")
print(f"Burned area   : {burned_area_ha:,.2f} ha")
print(f"GWIS burned area : {biggest_fire_aoi.sum().compute().item():,.2f} ha")
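
The area estimate is simply the number of pixels flagged as burned multiplied by the area of one pixel, converted from square metres to hectares. The sketch below checks that arithmetic on a synthetic mask; the helper name and the mask are made up for illustration.

```python
import numpy as np

def burned_area_ha(mask, dx_m, dy_m):
    """Burned area in hectares from a boolean mask and the pixel size in metres."""
    pixel_area_ha = abs(dx_m * dy_m) / 1e4  # 1 ha = 10,000 m^2
    return int(np.count_nonzero(mask)) * pixel_area_ha

# Synthetic mask: a 20 x 50 pixel block flagged as burned
mask = np.zeros((100, 100), dtype=bool)
mask[10:30, 40:90] = True

# The y resolution is often negative for north-up rasters, hence abs() above
area = burned_area_ha(mask, 10.0, -10.0)
print(f"{np.count_nonzero(mask)} pixels -> {area:.2f} ha")
```

This calculation only holds when the raster is in a projected CRS with metre units; in a geographic CRS the resolution is in degrees and the result would be meaningless.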

In [None]:
dnbr_dataset['burned_ha_mask'] = burn_mask

In [None]:
dnbr_dataset = dnbr_dataset.persist()
biggest_fire_aoi = biggest_fire_aoi.persist()


# r_dnbr_dataset = dnbr_dataset.rename({'x': 'lon', 'y': 'lat'})
# r_gwis_all = gwis_all.rename({'longitude': 'lon', 'latitude': 'lat'})

In [None]:
biggest_fire_aoi = biggest_fire_aoi.rio.write_crs(ds.rio.crs)

The overlay is imprecise because of the reprojection (curvilinear to rectilinear), but we can see that the fires are generally in the north-west and north-east regions, with two distinct occurrences.

In [None]:
biggest_fire_aoi_reprojected = biggest_fire_aoi.rio.reproject(dnbr_dataset.delta_NBR.rio.crs)

dnbr_plot = dnbr_dataset.delta_NBR.hvplot(
    width=700,
    height=700,
    title='dNBR (10 m) with GWIS overlay',
    alpha=1.0
)

# Plot the reprojected coarse dataset as transparent overlay
gwis_plot = biggest_fire_aoi_reprojected.hvplot(
    cmap='Reds',
    alpha=0.3,
    clim=(0, biggest_fire_aoi.max().compute().item())
)

# Combine them interactively
combined_plot = dnbr_plot * gwis_plot

combined_plot


# Saving Your Work

Before saving, lint the dataset with xrlint to validate your cube.

In [None]:
linter = xrl.new_linter("recommended")
linter.validate(dnbr_dataset)


# Add metadata descriptions to our data

As a best practice, a data cube should be well described. Tools such as xrlint help identify which standards to apply, so that your published datasets carry consistent, standard metadata.

In [None]:
# Assign dataset-level attributes
dnbr_dataset.attrs.update({
    'title': 'Delta NBR and Burned Area Mask Dataset',
    'history': 'Created by reprojecting and aligning datasets for fire severity analysis',
    'Conventions': 'CF-1.7'
})


# Assign variable-level attributes for delta_NBR
dnbr_dataset.delta_NBR.attrs.update({
    'institution': 'Lampata',
    'source': 'Sentinel-2 imagery; processed with open-source dNBR code, element84...',
    'references': 'https://example.com/ref',
    'comment': 'dNBR values represent change in vegetation severity post-fire',
    'standard_name': 'difference_normalized_burn_ratio',
    'long_name': 'Differenced Normalized Burn Ratio (dNBR)',
    'units': '1'  # dNBR is a dimensionless index
})

# Example for burned_ha_mask data variable
dnbr_dataset.burned_ha_mask.attrs.update({
    'standard_name': 'burned_area_mask',
    'long_name': 'Burned Area Mask (dNBR above threshold)',
    'units': '1',  # boolean mask, dimensionless
    'institution': 'Your Institution Name',
    'source': 'Derived from wildfire impact analysis',
    'references': 'https://example.com/ref',
    'comment': 'Burned area mask showing presence of burned areas'
})


# Assert Cube

Other tools are available as well, such as xcube's `assert_cube`, which validates the data cube (e.g. that dimensions, chunks, and grids match across data arrays).

In [None]:
# Add a time dimension first: xcube's cube convention includes a `time` dimension
dnbr_dataset = dnbr_dataset.expand_dims(time=[postfire_ds.time.isel(time=0).values])
dnbr_dataset

In [None]:

assert_cube(dnbr_dataset)  # raises ValueError if it's not xcube-valid


Chunk the data so that future users can read it efficiently.

In [None]:
dnbr_dataset = dnbr_dataset.chunk({"time": 1, "y": 1000, "x": 1000})
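
To judge whether a chunk shape is reasonable, it helps to estimate the uncompressed size of one chunk. The sketch below does that for the chunk shape used above; the helper is hypothetical, and the dtypes are assumptions about how the variables are stored.

```python
import math
import numpy as np

def chunk_megabytes(chunk_shape, dtype):
    """Uncompressed size of one chunk in megabytes (10^6 bytes)."""
    return math.prod(chunk_shape) * np.dtype(dtype).itemsize / 1e6

chunk_shape = (1, 1000, 1000)  # (time, y, x), as in the cell above

float_mb = chunk_megabytes(chunk_shape, "float64")  # e.g. delta_NBR
bool_mb = chunk_megabytes(chunk_shape, "bool")      # e.g. burned_ha_mask
print(f"float64 chunk: {float_mb} MB, bool chunk: {bool_mb} MB")
```

Chunks of a few megabytes are a common starting point for cloud storage, but the best size depends on how the data will be read: very small chunks add request overhead, while very large ones force readers to fetch more than they need.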


In [None]:
print(type(dnbr_dataset.burned_ha_mask.data)) # check data format 

In [None]:

save_at_folder = '../wildfires'
os.makedirs(save_at_folder, exist_ok=True)

# Define the output path within your notebook folder
output_path = os.path.join(save_at_folder, "dnbr_dataset.zarr")

# save
dnbr_dataset.to_zarr(output_path, mode="w")