<a href="https://jupyterhub.user.eopf.eodc.eu/hub/user-redirect/git-pull?repo=https://github.com/eopf-toolkit/eopf-101&branch=main&urlpath=lab/tree/eopf-101/06_eopf_zarr_in_action/65_vegetation_anomalies" target="_blank">
  <button style="background-color:#0072ce; color:white; padding:0.6em 1.2em; font-size:1rem; border:none; border-radius:6px; margin-top:1em;">
    🚀 Launch this notebook in JupyterLab
  </button>
</a>

### Introduction

Forests are the largest terrestrial carbon sinks, playing a critical role in regulating the global carbon cycle through sustained CO$_2$ uptake. However, extreme climate events (such as heatwaves and droughts) can disrupt forest functioning, reducing photosynthetic activity and, in severe cases, causing tree mortality.

While in situ measurements provide valuable information on forest health, collecting such data over large areas is costly, time-consuming, and logistically challenging. Scalable and continuous monitoring therefore requires more efficient approaches.

In this case study, we will explore how to use **Sentinel-2 L2A data stored as zarr data cubes** to analyze vegetation anomalies in forest ecosystems in Germany. Specifically, we will focus on two ICOS sites affected by the drought of 2018: [DE-Hai](https://meta.icos-cp.eu/resources/stations/ES_DE-Hai) and [DE-Tha](https://meta.icos-cp.eu/resources/stations/ES_DE-Tha), with DE-Tha left for learners to explore on their own.

By leveraging **spatiotemporal data cubes**, we will compute spectral indices and derive anomaly time series to evaluate forest responses to extreme events in CO$_2$ uptake (Gross Primary Production, GPP, from ICOS). This notebook will guide you through a **modular workflow** for:

1. Accessing Sentinel-2 zarr data from STAC.  
2. Calculating vegetation indices.  
3. Detecting anomalies in time series.  
4. Visualizing forest responses to environmental extremes.

Through this case study, you will see the potential of **zarr-based data cubes** for scalable forest monitoring.

### What we will learn

In this notebook, you will gain hands-on experience with the following:

- 🚀 **Creating Sentinel-2 L2A data cubes** from the [EOPF zarr STAC](https://stac.core.eopf.eodc.eu/).  
- 🔎 **Operating on data cubes**, including resampling and interpolation.  
- 🌿 **Computing spectral indices** using [Awesome Spectral Indices](https://github.com/awesome-spectral-indices/awesome-spectral-indices).  
- 📈 **Deriving vegetation anomalies** from time series data.  
- 📊 **Visualizing time series** and comparing results against GPP measurements from [ICOS](https://www.icos-cp.eu/).

### Prerequisites

To analyze vegetation anomalies, we will work with **vegetation indices**. To leverage the full power of `xarray`, we will use the **`spyndex` Python package**, which provides a Python API for the **Awesome Spectral Indices** catalogue. This gives access to **over 200 spectral indices**, which can be computed directly on various Python data types, including `xarray` datasets and data arrays.  

For more details, refer to the [Awesome Spectral Indices paper in *Scientific Data*](https://doi.org/10.1038/s41597-023-02096-0).

<hr>

#### Import libraries

In [None]:
import xarray as xr
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as sps

import pystac_client
import dask
import spyndex
import os

from pyproj import Transformer
from zarr_wf_utils import validate_scl
from dask.distributed import Client

#### Helper functions

##### `get_items`

This function **queries the EOPF STAC API** for **Sentinel-2 L2A items** that intersect a specified latitude/longitude point within a given **date range**.

In [None]:
def get_items(lat, lon, start_date="2017-01-01", end_date="2024-12-31", return_as_dicts=False):
    """
    Query the EOPF STAC API for Sentinel-2 L2A items intersecting a given latitude/longitude point
    within a specified date range.

    Parameters
    ----------
    lat : float
        Latitude of the point of interest (WGS84).
    lon : float
        Longitude of the point of interest (WGS84).
    start_date : str, default="2017-01-01"
        Start of the date range.
    end_date : str, default="2024-12-31"
        End of the date range.
    return_as_dicts : bool, default=False
        If True, return items as dictionaries; otherwise return STAC Item objects.

    Returns
    -------
    list
        List of STAC Items (or dictionaries) matching the query.
    """

    # Connect to the STAC API
    client = pystac_client.Client.open("https://stac.core.eopf.eodc.eu/")
    
    # Define a GeoJSON Point for spatial filtering
    point = {"type": "Point", "coordinates": [lon, lat]}
    
    # Search for Sentinel-2 L2A items intersecting the point and within the date range
    search = client.search(
        collections=["sentinel-2-l2a"],
        intersects=point,
        datetime=f"{start_date}/{end_date}"
    )

    # Return items either as STAC objects or plain dictionaries
    if return_as_dicts:
        items = list(search.items_as_dicts())
    else:
        items = list(search.items())

    return items
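
The coordinate order of the GeoJSON point and the format of the datetime interval are easy to get wrong. As a minimal sketch, the search arguments can be assembled and inspected before any request is sent. The helper name `build_search_kwargs` is hypothetical; it is not part of this notebook or of `pystac_client`.

```python
def build_search_kwargs(lat, lon, start_date, end_date):
    """Assemble the keyword arguments used for the STAC search (hypothetical helper)."""
    return {
        "collections": ["sentinel-2-l2a"],
        # GeoJSON positions are ordered (longitude, latitude), not (lat, lon)
        "intersects": {"type": "Point", "coordinates": [lon, lat]},
        # A closed interval is written as "start/end"
        "datetime": f"{start_date}/{end_date}",
    }


# DE-Hai coordinates, restricted to the drought year as an example
kwargs = build_search_kwargs(51.079212, 10.452168, "2018-01-01", "2018-12-31")
print(kwargs["intersects"]["coordinates"])
print(kwargs["datetime"])
```

A shorter date range like this one is also a cheap way to test the workflow before querying the full 2017 to 2024 period.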

##### `latlon_to_buffer_bbox`

This function **converts a latitude/longitude point** to a specified **projected coordinate system (EPSG)** and then generates a **square bounding box** centered on that point.

In [None]:
def latlon_to_buffer_bbox(lat, lon, epsg, buffer=500):
    """
    Convert a latitude/longitude point to a projected coordinate system (EPSG),
    then generate a square bounding box centered on that point.

    Parameters
    ----------
    lat : float
        Latitude of the point of interest (WGS84).
    lon : float
        Longitude of the point of interest (WGS84).
    epsg : str or int
        Target projected coordinate system (e.g., EPSG:32632). Used for distance-based buffering.
    buffer : float, default=500
        Half-size of the square buffer (in meters). The output box will span
        2*buffer in width and height.

    Returns
    -------
    tuple
        (minx, miny, maxx, maxy) bounding box coordinates in the target EPSG.
    """

    # Transformer: convert geographic coordinates (lon, lat) to projected (x, y)
    transformer = Transformer.from_crs("EPSG:4326", epsg, always_xy=True)

    # Perform the coordinate transformation
    x, y = transformer.transform(lon, lat)

    # Construct the bounding box by applying the buffer in projected units (meters)
    minx = x - buffer
    maxx = x + buffer
    miny = y - buffer
    maxy = y + buffer

    return (minx, miny, maxx, maxy)
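
To get a feeling for the size of the resulting cube, the buffer arithmetic can be checked without any reprojection. The sketch below uses made-up projected coordinates (not the position of any real site) and a hypothetical helper `buffer_bbox` that repeats only the last step of the function above.

```python
def buffer_bbox(x, y, buffer=500):
    """Square box of half-size `buffer` (meters) around a projected point (x, y)."""
    return (x - buffer, y - buffer, x + buffer, y + buffer)


# Illustrative UTM-like coordinates in meters (assumed values)
minx, miny, maxx, maxy = buffer_bbox(600_000.0, 5_660_000.0, buffer=500)

width_m = maxx - minx
height_m = maxy - miny

# Approximate number of 20 m pixels across the box
# (the exact count depends on how pixel centers fall relative to the box edges)
pixels_across = width_m / 20

print(width_m, height_m, pixels_across)
```

With the default buffer of 500 m and the 20 m bands, each time step therefore covers roughly 50 by 50 pixels.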

##### `open_and_curate_data`

This function **opens a Sentinel-2 STAC item (Zarr)**, performs a **spatial subset** around a given point, applies **SCL-based masking**, and returns an **`xarray.Dataset`** containing only the selected reflectance bands (e.g. `B04`, `B8A`) with a **time dimension**. It uses the `validate_scl` function from `zarr_wf_utils.py`.

> **Note:** This function is decorated with `dask.delayed`, so calling it does not run anything; it returns a lazy `Delayed` object that is executed only when computed explicitly (for example with `dask.compute`).

In [None]:
@dask.delayed
def open_and_curate_data(item, lat, lon, bands=["b04", "b8a"], resolution=20, buffer=500, items_as_dicts=False):
    """
    Open a Sentinel-2 STAC item (Zarr), spatially subset it around a point,
    apply SCL-based masking, and return an xarray Dataset containing only
    selected reflectance bands with a time dimension.

    Parameters
    ----------
    item : pystac.Item or dict
        STAC item describing the Sentinel-2 observation. Can be provided
        either as a dict (e.g., after JSON serialization) or as a pystac.Item.
    lat : float
        Latitude of the point of interest (WGS84).
    lon : float
        Longitude of the point of interest (WGS84).
    bands : list
        Bands to retrieve.
    resolution : int, optional
        Spatial resolution to load for reflectance bands and SCL. Default is 20 m.
    buffer : int, optional
        Half-size of the square buffer (in meters) around the point of interest.
        The function extracts a bounding box of size (2*buffer) centered on (lat, lon).
    items_as_dicts : bool, optional
        If True, treat `item` as a Python dictionary with STAC-like structure.
        If False, treat it as a pystac.Item object.

    Returns
    -------
    ds : xr.Dataset (wrapped in dask.delayed)
        Dataset with dimensions (time, y, x) containing:
        - reflectances
        - only valid pixels according to SCL filtering
        - only pixels inside the buffered bounding box
        A time dimension is added so datasets can be concatenated later.
    """

    # --------------------------------------------------------------
    # 1. Extract STAC asset HREF and timestamp
    # --------------------------------------------------------------

    # Standard STAC read depending on input format
    if items_as_dicts:
        href = item["assets"]["product"]["href"]
        datetime_value = item["properties"]["datetime"]
    else:
        href = item.assets["product"].href
        datetime_value = item.properties["datetime"]

    # Convert the STAC datetime to daily precision numpy datetime
    # (removing the trailing "Z" timezone indicator)
    time = np.datetime64(datetime_value.replace("Z", "")).astype("datetime64[D]")

    # Resolution code used by S2 Zarr hierarchy (e.g., "r20m")
    resolution = f"r{resolution}m"

    # --------------------------------------------------------------
    # 2. Open Zarr datatree
    # --------------------------------------------------------------
    ds = xr.open_datatree(
        href,
        engine="zarr",
        consolidated=True,
        chunks="auto"
    )

    # --------------------------------------------------------------
    # 3. Determine projection and build projected bounding box
    # --------------------------------------------------------------

    # EPSG code for the scene (e.g., 32632 for Sentinel-2 tile)
    epsg = ds.attrs["other_metadata"]["horizontal_CRS_code"]

    # Convert (lat, lon) to a projected bounding box centered on the point
    # Buffer is in meters, so this is done in projected CRS
    minx, miny, maxx, maxy = latlon_to_buffer_bbox(lat, lon, epsg, buffer)

    # --------------------------------------------------------------
    # 4. Extract SCL (Scene Classification Layer) and build valid mask
    # --------------------------------------------------------------

    # Access the classification layer at the correct resolution
    scl = ds.conditions.mask.l2a_classification[resolution].scl

    # Convert SCL to a boolean mask indicating valid surface reflectance pixels
    valid_mask = validate_scl(scl)

    # --------------------------------------------------------------
    # 5. Extract reflectance bands and apply mask
    # --------------------------------------------------------------

    # reflectance[...] is a datatree, convert to dataset then select bands
    ds = (
        ds.measurements.reflectance[resolution]
        .to_dataset()[bands]
        .where(valid_mask)          # Apply SCL mask (invalid → NaN)
    )

    # --------------------------------------------------------------
    # 6. Spatial subsetting to bounding box
    # --------------------------------------------------------------

    ds = ds.where(
        (ds.x >= minx) & (ds.x <= maxx) &
        (ds.y >= miny) & (ds.y <= maxy),
        drop=True
    )

    # --------------------------------------------------------------
    # 7. Add time dimension for temporal stacking
    # --------------------------------------------------------------

    # Enforce the shape (time, y, x) so multiple calls can concatenate cleanly
    ds = ds.expand_dims(time=[time])

    return ds
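
The implementation of `validate_scl` is not shown in this notebook. As a rough sketch of what SCL-based masking does, the example below uses NumPy on a tiny array. The set of discarded classes is an assumption chosen for illustration and may differ from the classes that `validate_scl` actually filters.

```python
import numpy as np

# Assumed set of SCL classes to discard (illustrative, not taken from validate_scl):
# 0 no data, 1 saturated or defective, 3 cloud shadows,
# 8 cloud medium probability, 9 cloud high probability, 10 thin cirrus
INVALID_SCL = [0, 1, 3, 8, 9, 10]

# Tiny synthetic scene: SCL classes and red reflectance on the same grid
scl = np.array([[4, 4, 9],
                [3, 5, 4]])
red = np.array([[0.04, 0.05, 0.60],
                [0.02, 0.10, 0.04]])

# True where the pixel is usable
valid_mask = ~np.isin(scl, INVALID_SCL)

# Same idea as `.where(valid_mask)` in xarray: invalid pixels become NaN
red_masked = np.where(valid_mask, red, np.nan)

print(valid_mask)
print(red_masked)
```

Masked pixels become NaN, so the later median aggregations and the temporal interpolation skip them instead of using contaminated reflectances.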

##### `curate_gpp`

This function **loads a GPP time series**, computes **weekly anomalies**, identifies **extreme low-GPP events**, and filters these extremes to retain only events lasting at least a specified number of **consecutive weeks**.

In [None]:
def curate_gpp(dataset="DE-Hai", percentile=0.1, consecutive_weeks=2):
    """
    Load a GPP time series, compute weekly anomalies, identify extreme low-GPP
    events, and filter extremes to retain only events with at least a specified
    number of consecutive weeks.

    Parameters
    ----------
    dataset : str
        Name of the CSV file (without extension) found in ./data/.
    percentile : float
        Lower-tail percentile used to define extreme negative anomalies
        (e.g., 0.1 => 10th percentile of the anomaly distribution).
    consecutive_weeks : int
        Minimum run length of consecutive extreme weeks required for an
        extreme event to be retained.

    Returns
    -------
    df : pandas.DataFrame
        A dataframe indexed by weekly timestamps, containing GPP, anomalies,
        week-of-year, and a final binary 'extreme' flag.
    """

    # --------------------------------------------------------------
    # 1. Load & preprocess time series
    # --------------------------------------------------------------
    df = pd.read_csv(os.path.join("data", f"{dataset}.csv"))

    # Ensure 'time' is parsed as a datetime
    df["time"] = pd.to_datetime(df["time"])

    # Optionally restrict to a start date for consistency
    df = df[df.time >= "2017-01-01"]

    # Set time as index for easier resampling and time-based operations
    df = df.set_index("time")

    # --------------------------------------------------------------
    # 2. Temporal aggregation: convert to weekly median GPP
    # --------------------------------------------------------------
    df = df.resample("1W").median()

    # Extract week-of-year for building a weekly climatology
    df["weekofyear"] = df.index.isocalendar().week

    # --------------------------------------------------------------
    # 3. Compute weekly climatology (multi-year mean per week)
    # --------------------------------------------------------------
    df_msc = df.groupby("weekofyear")["GPP_NT_VUT_REF"].mean()

    # Weekly anomaly = observed GPP - weekly climatological mean
    df["anomaly"] = df["GPP_NT_VUT_REF"] - df["weekofyear"].map(df_msc)

    # --------------------------------------------------------------
    # 4. Identify extreme anomalies using a Gaussian percentile threshold
    # --------------------------------------------------------------
    # Fit a normal distribution to the anomaly series
    dist = sps.norm(
        loc=df["anomaly"].mean(),
        scale=df["anomaly"].std()
    )

    # Lower-tail anomaly threshold corresponding to the chosen percentile
    q = np.abs(dist.ppf(percentile))

    # Initial binary extreme flag: 1 = extreme low anomaly
    df["extreme"] = 0
    df.loc[df["anomaly"] <= -q, "extreme"] = 1

    # --------------------------------------------------------------
    # 5. Filter extremes: keep only runs with >= consecutive_weeks
    # --------------------------------------------------------------
    s = df["extreme"]

    # Identify contiguous groups of identical values (0-runs and 1-runs)
    groups = (s != s.shift()).cumsum()

    # Compute the run length for each time step
    run_lengths = s.groupby(groups).transform("size")

    # Final extreme flag: extreme only if in a run of sufficient length
    df["extreme"] = ((s == 1) & (run_lengths >= consecutive_weeks)).astype(int)

    return df
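
The last two steps of `curate_gpp` can be traced on a small synthetic series. This is a minimal sketch with made-up anomaly values; it uses `statistics.NormalDist` from the standard library in place of `scipy.stats.norm`, and a fixed mean of 0 and standard deviation of 2 instead of values fitted to the data.

```python
from statistics import NormalDist

import pandas as pd

# Synthetic weekly anomalies (illustrative values only)
anomaly = pd.Series(
    [0.5, -3.0, 0.2, -2.8, -3.1, 0.1, -2.9, -3.3, -2.7],
    index=pd.date_range("2018-06-03", periods=9, freq="W"),
)

# 10th percentile of a normal distribution with assumed mean 0 and std 2
threshold = NormalDist(mu=0.0, sigma=2.0).inv_cdf(0.1)

# Step 4: flag weeks at or below the lower-tail threshold
extreme = (anomaly <= threshold).astype(int)

# Step 5: label runs of identical values, then measure the length of each run
groups = (extreme != extreme.shift()).cumsum()
run_lengths = extreme.groupby(groups).transform("size")

# Keep only extreme weeks that belong to a run of at least two weeks
filtered = ((extreme == 1) & (run_lengths >= 2)).astype(int)

print(round(threshold, 3))
print(extreme.tolist())
print(filtered.tolist())
```

The isolated extreme week in second position is dropped, while the runs of two and three consecutive weeks are kept.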

Global colors to be used for visualization.

In [None]:
# Colors for indices
NDVI_COLOR = 'limegreen'
kNDVI_COLOR = 'darkviolet'

# Color for zero line in anomalies
ZERO_COLOR = 'red'

<hr>

## Vegetation Anomalies in the Hainich National Park

We will start the notebook with a forest ecosystem that was severely affected by the drought of 2018: DE-Hai.

In [None]:
# DE-Hai Coordinates
HAI_LAT = 51.079212
HAI_LON = 10.452168

Initialize a **Dask distributed client** to enable parallel and delayed computation. This will manage the execution of tasks, such as loading and processing large Sentinel-2 zarr datasets, efficiently.

In [None]:
client = Client()
client

### Load the GPP data

Load the **GPP time series** for the DE-Hai site and compute **weekly anomalies**, identifying extreme low-GPP events.

In [None]:
HAI_df = curate_gpp("DE-Hai")

### Create the Sentinel-2 L2A Data Cube

Query the **EOPF STAC API** to retrieve Sentinel-2 L2A items for the DE-Hai site. For each item, **open and curate** the data by subsetting around the site coordinates and selecting the relevant bands.  

The datasets are then **computed in parallel with Dask**, concatenated along the **time dimension**, sorted by time, and finally **loaded into memory** as a single `xarray.Dataset`.

In [None]:
# Get all items as a list of dicts
HAI_items = get_items(HAI_LAT,HAI_LON,return_as_dicts=True)

# Create the delayed Dask objects
HAI_ds = [open_and_curate_data(item,lat=HAI_LAT,lon=HAI_LON,items_as_dicts=True) for item in HAI_items]

# Compute the delayed objects in parallel. This outputs a list of xarray.Dataset objects
HAI_data = dask.compute(*HAI_ds)

# Concatenate the previous list using the time dimension and sort it
HAI_ds = xr.concat(HAI_data,dim="time").sortby("time")

# Load it into memory
HAI_ds = HAI_ds.compute()

### Compute Vegetation Indices

Compute **spectral indices** for the DE-Hai dataset using the [`spyndex` package](https://github.com/awesome-spectral-indices/spyndex). In this example, we calculate **NDVI** and **kernel NDVI (kNDVI)**.

- [**NDVI (Normalized Difference Vegetation Index)**](https://ntrs.nasa.gov/citations/19740022614) is a widely used index to monitor vegetation health and greenness. It is calculated as:

$$\text{NDVI} = \frac{N - R}{N + R}$$

where `N` is the near-infrared band (`B8A`) and `R` is the red band (`B04`).

- [**kNDVI (Kernel NDVI)**](https://doi.org/10.1126/sciadv.abc7447) is a kernelized version of NDVI. Here, an **RBF (Radial Basis Function) kernel** is applied:

$$\text{kNDVI} = \frac{k(N,N) - k(N,R)}{k(N,N) + k(N,R)}$$

where $k(a,b)$ is the RBF kernel:

$$k(N,N) = 1$$

$$k(N,R) = \exp\left(-\frac{(N-R)^2}{2 \sigma^2}\right)$$

Here, $\sigma$ is calculated as the median in the time dimension of $0.5(N+R)$.

The `spyndex.computeIndex` function computes both indices across the time series and stores them in an **`xarray.Dataset`** named `idx` for subsequent anomaly analysis.

The `spyndex.computeKernel` function computes the kernel for the kNDVI.

Note that in both cases `spyndex` only requires the input data to compute the indices; there is no need to hard-code the formulas.
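
To make the formulas concrete, here is a small NumPy sketch with made-up reflectance values. It simplifies one thing: $\sigma$ is computed per observation as $0.5(N+R)$ rather than as a median over time. Under that assumption the exponent becomes $2\,\text{NDVI}^2$ and the expression collapses to $\text{kNDVI} = \tanh(\text{NDVI}^2)$; with the temporal median used in this notebook the two are only approximately equal.

```python
import numpy as np

N = np.array([0.45, 0.30, 0.25])  # NIR reflectance (illustrative values)
R = np.array([0.05, 0.10, 0.20])  # red reflectance (illustrative values)

# NDVI = (N - R) / (N + R)
ndvi = (N - R) / (N + R)

# Assumption: sigma per observation, not the temporal median used in the notebook
sigma = 0.5 * (N + R)

# RBF kernel k(N, R) and the kernelized index, with k(N, N) = 1
k_nr = np.exp(-((N - R) ** 2) / (2 * sigma**2))
kndvi = (1.0 - k_nr) / (1.0 + k_nr)

print(ndvi)
print(kndvi)
print(np.tanh(ndvi**2))
```

The last two printed arrays coincide, which is a convenient sanity check when comparing against the values returned by `spyndex`.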


In [None]:
idx = spyndex.computeIndex(
    ["NDVI","kNDVI"], # Indices to compute
    N = HAI_ds["b8a"], # NIR band
    R = HAI_ds["b04"], # Red band
    kNN = 1.0,
    kNR = spyndex.computeKernel(
        "RBF", # RBF kernel
        a = HAI_ds["b8a"],
        b = HAI_ds["b04"],
        sigma = ((HAI_ds["b8a"] + HAI_ds["b04"])/2.0).median("time")
    )
).to_dataset("index")

Add the name and units of each index to the attributes according to the CF conventions.

In [None]:
idx.NDVI.attrs["long_name"] = spyndex.indices.NDVI.long_name
idx.NDVI.attrs["units"] = "1"

idx.kNDVI.attrs["long_name"] = spyndex.indices.kNDVI.long_name
idx.kNDVI.attrs["units"] = "1"

Resample the NDVI and kNDVI time series to **weekly frequency**, taking the **median** within each week. After resampling, fill temporal gaps by applying **cubic interpolation** along the time dimension. This produces smooth, continuous weekly index time series suitable for anomaly computation.

In [None]:
idx = idx.resample(time="1W").median().interpolate_na(dim="time",method="cubic")
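
The effect of weekly resampling followed by gap filling can be seen on a tiny `pandas` series. This is a sketch with made-up values, and it swaps in linear interpolation so that it runs without extra dependencies; the notebook itself uses cubic interpolation on the `xarray` dataset.

```python
import pandas as pd

# Four cloud-free observations: three in the first week, one in the fourth
obs = pd.Series(
    [0.2, 0.4, 0.6, 1.0],
    index=pd.to_datetime(["2021-01-05", "2021-01-07", "2021-01-09", "2021-01-29"]),
)

# Weekly bins end on Sunday; weeks without observations become NaN
weekly = obs.resample("1W").median()

# Fill the two empty weeks (linear here, cubic in the notebook)
filled = weekly.interpolate(method="linear")

print(weekly)
print(filled)
```

The median of the first week is 0.4, the two empty weeks are filled with 0.6 and 0.8, and the last week keeps its single observation of 1.0.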

### Calculate Vegetation Anomalies

Compute the **median seasonal cycle** (MSC) for NDVI and kNDVI.

By grouping the time series by `weekofyear` and taking the **median across years**, this step produces a climatological baseline that represents the typical vegetation state for each week of the year.  

The MSC will be used later to derive vegetation anomalies.

In [None]:
msc = idx.groupby("time.weekofyear").median("time")

Plot the MSC of the NDVI.

In [None]:
msc.NDVI.plot.imshow(col = "weekofyear",cmap = "viridis",col_wrap = 8,vmin=0,vmax=1)

Plot the MSC of the kNDVI.

In [None]:
msc.kNDVI.plot.imshow(col = "weekofyear",cmap = "viridis",col_wrap = 8,vmin=0,vmax=1)

Compute **vegetation anomalies** by subtracting the **median seasonal cycle (MSC)** from the weekly NDVI and kNDVI values. This step isolates deviations from the expected seasonal pattern, allowing us to identify abnormal vegetation conditions potentially linked to stress or extreme events.

In [None]:
idx_anomalies = idx.groupby("time.weekofyear") - msc
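
The same two steps (median seasonal cycle, then subtraction) can be sketched on a one-dimensional `pandas` series. The values are made up, with 2018 set lower in the first week to mimic a stressed year.

```python
import pandas as pd

# Two weeks of the year observed in three different years (illustrative values)
dates = pd.to_datetime([
    "2018-01-01", "2018-01-08",
    "2019-01-01", "2019-01-08",
    "2020-01-01", "2020-01-08",
])
ndvi = pd.Series([0.50, 0.76, 0.80, 0.78, 0.82, 0.80], index=dates)

# ISO week of the year for every time step
week = ndvi.index.isocalendar().week

# Median seasonal cycle, broadcast back to every time step
msc = ndvi.groupby(week).transform("median")

# Anomaly = observation minus the typical value for that week
anomaly = ndvi - msc

print(anomaly.round(2))
```

Because the baseline is a median, a single unusual year (the 0.50 in 2018) does not pull the reference value down, so its anomaly of about -0.30 stands out clearly.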

Add the name and units of each index anomaly to the attributes according to the CF conventions.

In [None]:
idx_anomalies.NDVI.attrs["long_name"] = spyndex.indices.NDVI.long_name + " Anomaly"
idx_anomalies.NDVI.attrs["units"] = "1"

idx_anomalies.kNDVI.attrs["long_name"] = spyndex.indices.kNDVI.long_name + " Anomaly"
idx_anomalies.kNDVI.attrs["units"] = "1"

### Visualize Time Series

Aggregate the indices in space using the median to produce a time series.

In [None]:
idx_agg = idx.median(["x","y"])

Plot the NDVI and kNDVI time series together with the GPP measurements for the DE-Hai site.

A secondary axis is used to display GPP, allowing direct visual comparison between vegetation dynamics and ecosystem productivity.

Extreme low-GPP events are highlighted as shaded red intervals.  

These events are defined as **periods of at least two consecutive weeks** in which GPP anomalies fall **below the 10th percentile** of a normal distribution fitted to the anomalies.  

This visualization helps link vegetation index anomalies to observed reductions in carbon uptake, revealing how forest canopy responses relate to ecosystem-scale stress signals.

In [None]:
fig, ax = plt.subplots(figsize=(15, 3))

ax.plot(idx_agg.time, idx_agg["NDVI"],  color=NDVI_COLOR,  label="NDVI")
ax.plot(idx_agg.time, idx_agg["kNDVI"], color=kNDVI_COLOR, label="kNDVI")
ax.set_ylim([-0.15,1.2])
ax.set_ylabel("VI")
ax.legend(loc="upper left")

ax2 = ax.twinx()
ax2.scatter(HAI_df.index, HAI_df["GPP_NT_VUT_REF"],
            s=20, color="grey", alpha=0.6, label="GPP")
ax2.set_ylim([-3.5,17.5])
ax2.set_ylabel("GPP")
ax2.legend(loc="upper right")

# Shade each run of consecutive extreme weeks
extreme_mask = HAI_df["extreme"] == 1
groups = (extreme_mask != extreme_mask.shift()).cumsum()

for _, group in HAI_df[extreme_mask].groupby(groups):
    start = group.index.min()
    end   = group.index.max()
    ax.axvspan(start, end, color="red", alpha=0.15)

plt.title("NDVI, kNDVI, GPP and Extreme Events")
plt.tight_layout()
plt.show()
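
The shading logic in the cell above can be isolated as a small span extraction. This is a sketch on a synthetic 0/1 flag series (made-up values) that returns the start and end timestamps that would be handed to `axvspan`.

```python
import pandas as pd

# Synthetic weekly extreme flag (illustrative values only)
flag = pd.Series(
    [0, 1, 1, 0, 0, 1, 1, 1],
    index=pd.date_range("2018-07-01", periods=8, freq="W"),
)

is_extreme = flag == 1

# A new run id starts whenever the flag changes value
run_id = (is_extreme != is_extreme.shift()).cumsum()

# One (start, end) pair per run of extreme weeks
spans = [
    (run.index.min(), run.index.max())
    for _, run in flag[is_extreme].groupby(run_id[is_extreme])
]

for start, end in spans:
    print(start.date(), end.date())
```

Each span runs from the first to the last weekly timestamp of a run, so a run of `n` weeks is drawn `n - 1` weeks wide.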

Aggregate the anomalies of the indices in space using the median to produce a time series.

In [None]:
idx_anomalies_agg = idx_anomalies.median(["x","y"])

Visualize the **anomaly time series** of NDVI, kNDVI, and GPP for the DE-Hai site.  

Here, both vegetation indices and GPP have been transformed into **weekly anomalies**, representing deviations from their typical seasonal cycles. A horizontal line at zero indicates the expected baseline.

A secondary axis displays **GPP anomalies**, allowing direct comparison between canopy-level spectral responses and ecosystem-level carbon uptake changes.

Extreme low-GPP events are shown as shaded red intervals.  

This plot highlights where vegetation index anomalies co-occur with carbon uptake anomalies (e.g. 2021) and where they diverge from them (e.g. 2018), offering insight into forest responses to stress events.

In [None]:
fig, ax = plt.subplots(figsize=(15, 3))

ax.plot(idx_anomalies_agg.time, idx_anomalies_agg["NDVI"],  color=NDVI_COLOR,  label="NDVI")
ax.plot(idx_anomalies_agg.time, idx_anomalies_agg["kNDVI"], color=kNDVI_COLOR, label="kNDVI")
ax.axhline(0, color=ZERO_COLOR, linewidth=1)
ax.set_ylim([-0.45,0.45])
ax.set_ylabel("VI Anomaly")
ax.legend(loc="upper left")

ax2 = ax.twinx()
ax2.scatter(HAI_df.index, HAI_df["anomaly"],
            s=20, color="grey", alpha=0.6, label="GPP")
ax2.set_ylim([-6.5,6.5])
ax2.set_ylabel("GPP Anomaly")
ax2.legend(loc="upper right")

# Shade each run of consecutive extreme weeks
extreme_mask = HAI_df["extreme"] == 1
groups = (extreme_mask != extreme_mask.shift()).cumsum()

for _, group in HAI_df[extreme_mask].groupby(groups):
    start = group.index.min()
    end   = group.index.max()
    ax.axvspan(start, end, color="red", alpha=0.15)

plt.title("NDVI, kNDVI, GPP Anomalies and Extreme Events")
plt.tight_layout()
plt.show()

<hr>

## 💪 Now it is your turn

The following exercises will help you reproduce the previous workflow for another dataset.

### Task 1: Create a data cube for DE-Tha
* Use the coordinates of DE-Tha (provided in the cell below) to create a data cube for this site.
* Retrieve the red edge bands in addition to the NIR and red bands.
* Modify the code as you need.

In [None]:
# DE-Tha Coordinates
THA_LAT = 50.9625
THA_LON = 13.56515

# Get all items as a list of dicts
# THA_items = get_items(THA_LAT,THA_LON,return_as_dicts=True)

# Create the delayed Dask objects
# THA_ds = [open_and_curate_data(..., bands=["b04", "b05", "b06", "b07", "b8a"]) for ...]

# ...

### Task 2: Compute Vegetation Indices
* Select a vegetation index from [Awesome Spectral Indices](https://github.com/awesome-spectral-indices/awesome-spectral-indices) that uses the Red Edge bands.
* Modify the code as you need.

In [None]:
# Indices that include any of the red edge bands
# (the loop variable is called `name` so it does not overwrite the `idx` dataset)
for name, attrs in spyndex.indices.items():
    if any(item in ["RE1","RE2","RE3"] for item in attrs.bands):
        print(name)

### Task 3: Calculate Vegetation Anomalies
* Calculate anomalies for the selected index and compare them against the GPP anomalies of DE-Tha.

In [None]:
THA_df = curate_gpp("DE-Tha")

# THA_df["GPP_NT_VUT_REF"] -> GPP values
# THA_df.index -> time dimension (weekly timestamps; "time" is the index, not a column)
# THA_df["anomaly"] -> Anomalies
# THA_df["extreme"] -> Extreme = 1, Normal condition = 0

## Conclusion

In this notebook, we explored how **Sentinel-2 L2A zarr data cubes** can be used to monitor forest vegetation dynamics and detect anomalous behavior linked to ecosystem stress. By leveraging zarr, STAC-based discovery, and `xarray`/`Dask` for scalable computation, we built an end-to-end workflow that included:

- Accessing Sentinel-2 data from the EOPF zarr STAC  
- Creating spatiotemporal data cubes centered on forest monitoring sites  
- Computing spectral indices (NDVI, kNDVI) using the Awesome Spectral Indices catalogue  
- Constructing weekly time series and climatological baselines  
- Deriving vegetation anomalies and comparing them with GPP anomalies from ICOS  
- Identifying and visualizing extreme low-GPP events

Through the joint analysis of **spectral indices** and **ecosystem productivity**, we demonstrated how remote sensing can reveal (or not) signals of forest stress and complement flux tower observations. This workflow illustrates the value of **zarr-based EO data**, **open standards (STAC)**, and **modern geospatial Python tools** for reproducible and scalable environmental monitoring.

### Acknowledgements

We would like to thank ICOS for providing the data on the Ecosystem stations DE-Hai [1] and DE-Tha [2].

### References

[1] Knohl, A., Schulze, E.-D., Kolle, O., & Buchmann, N. (2003). Large carbon uptake by an unmanaged 250-year-old deciduous forest in Central Germany. Agricultural and Forest Meteorology, 118(3–4), 151–167. https://doi.org/10.1016/s0168-1923(03)00115-1

[2] Grünwald, T., & Bernhofer, C. (2007). A decade of carbon, water and energy flux measurements of an old spruce forest at the Anchor Station Tharandt. Tellus B: Chemical and Physical Meteorology, 59(3), 387. https://doi.org/10.1111/j.1600-0889.2007.00259.x
