### Author

- **Walid Ghariani** - [GitHub Profile](https://github.com/WalidGharianiEAGLE)

### Affiliation
- **DHI** - https://www.dhigroup.com/

### Introduction

Water is a vital part of Earth's ecosystems, supporting biodiversity and sustaining human livelihoods. Monitoring surface water dynamics (occurrence, frequency and change) is important for managing resources and mitigating natural hazards such as floods and droughts. Wetlands in particular are unique ecosystems shaped by precipitation, hydrological processes and coastal dynamics, which together determine their habitats and species diversity. Within this context, the [Keta](https://rsis.ramsar.org/ris/567) and [Songor](https://rsis.ramsar.org/ris/566) [Ramsar](https://www.ramsar.org/) Sites in southeastern Ghana present a dynamic coastal wetland system where water levels fluctuate due to rainfall, tidal influence, and lagoon-river interactions.

These Ramsar sites consist of different wetland classes such as marshes, floodplains, mangroves, and seasonally inundated grasslands. These unique sites are also critical habitats for thousands of resident and migratory birds, fish, and sea turtles, while supporting local livelihoods through fishing, agriculture, and salt production. Therefore, effective monitoring of the water dynamics of these ecosystems is essential for conserving biodiversity and sustaining community resources.

Sentinel-1 Synthetic Aperture Radar (SAR) can detect water under all weather conditions, overcoming limitations of optical imagery caused by cloud cover. By processing and analyzing Sentinel-1 time series of radar backscatter (VH, VV), water occurrence, frequency and dynamics in Ramsar wetlands can be detected and monitored, providing valuable insights into these complex wetland sites.

In this notebook, we will explore how to monitor surface water dynamics in coastal wetlands using Sentinel-1 time series. We will process radar backscatter data to detect water occurrence, frequency, and change over time.

<hr>

### What we will learn

- 🛰️ Accessing Sentinel-1 Ground Range Detected (GRD) `.zarr` data via the EODC STAC API using `pystac-client`.
- 🛠️ Preprocessing radar imagery including spatial subsetting, radiometric calibration, speckle filtering, georeferencing, and regridding.
- 🌊 Generating surface water masks using an adaptive thresholding algorithm.
- 📊 Time series analysis of water dynamics.


#### Import libraries

In [None]:
import datetime as dt
from tqdm import tqdm

import fsspec
import numpy as np
import xarray as xr
import hvplot.xarray
import pandas as pd
import matplotlib.pyplot as plt
from pystac_client import Client

import session_info
import warnings

warnings.filterwarnings("ignore", category=UserWarning, module="pkg_resources")

#### Helper functions

Before starting the analysis workflow, we import several helper functions from **`zarr_s1_utils.py`** that implement key processing steps for Sentinel-1.

In [None]:
from zarr_s1_utils import (
    subset,
    radiometric_calibration,
    lee_filter_dask,
    regrid,
    xr_threshold_otsu,
)

In [None]:
session_info.show()

<hr>

## Data search

Since we are interested in processing data that covers the [Keta](https://rsis.ramsar.org/ris/567) and [Songor](https://rsis.ramsar.org/ris/566) [Ramsar](https://www.ramsar.org/) Sites in southeastern Ghana, we define the search parameters for filtering the Sentinel-1 GRD data using `pystac-client` in the EOPF STAC Catalog.

In [None]:
# Configs for the Area Of Interest (AOI), time range and Polarization
aoi_bounds = [0.60912065235268, 5.759873096746288, 0.714565658530316, 5.837736228130655]
date_start = dt.datetime(2024, 1, 1)
date_end = dt.datetime(2025, 1, 1)

In [None]:
catalog = Client.open("https://stac.core.eopf.eodc.eu")
search = catalog.search(
    collections=["sentinel-1-l1-grd"],
    bbox=aoi_bounds,
    datetime=f"{date_start:%Y-%m-%d}/{date_end:%Y-%m-%d}",
)
items = search.item_collection()

To get an overview of the retrieved scenes, we can inspect the first item.

In [None]:
# Let's inspect the first item
items[0]

### Data exploration

To have an overview of Sentinel-1's `.zarr` product, we can navigate its hierarchical structure, and extract the data, geolocation conditions, and calibration metadata for the polarization we are interested in.

In [None]:
polarization = "VH"  # or "VV"

url = items[0].assets["product"].href
store = fsspec.get_mapper(url)
datatree = xr.open_datatree(store, engine="zarr", chunks={})
group = [x for x in datatree.children if f"{polarization}" in x][0]
group

In [None]:
grd = datatree[group]["measurements/grd"]
gcp = datatree[group]["conditions/gcp"].to_dataset()
calibration = datatree[group]["quality/calibration"].to_dataset()

In [None]:
grd

In [None]:
gcp

In [None]:
calibration

## Preprocessing

### Spatial subset

To focus our analysis on the chosen AOI, we can efficiently crop our dataset using a spatial subset. The function **`subset()`** determines the slices in `azimuth_time` and `ground_range` that cover the AOI, and then extracts and masks the corresponding portion of the GRD dataset.

The function **`subset()`** takes the following keyword arguments:

* **`grd`**: The GRD dataset to be cropped (the radar image in azimuth and ground range coordinates).
* **`gcp_ds`**: The GCP (Ground Control Points) dataset containing the latitude and longitude grids used to geolocate the GRD image.
* **`aoi_bounds`**: The geographic bounding box of the Area of Interest, given as: `[min_lon, min_lat, max_lon, max_lat]`.
* **`offset`**: The number of GCP grid cells to include around the AOI center. This adds a small margin around the AOI to ensure the cropped region fully covers it.
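
As a rough illustration of the idea behind `subset()` (not the actual implementation in `zarr_s1_utils.py`), the sketch below finds the GCP grid cells that fall inside a bounding box and pads the resulting index range by `offset` cells. The helper name `aoi_index_range` and the toy latitude/longitude grids are hypothetical.

```python
import numpy as np


def aoi_index_range(lat, lon, aoi_bounds, offset=1):
    """Return (row_slice, col_slice) of the GCP grid cells covering the AOI.

    Hypothetical helper for illustration; `lat` and `lon` are 2D arrays
    holding the latitude/longitude of each ground control point.
    """
    min_lon, min_lat, max_lon, max_lat = aoi_bounds
    inside = (lon >= min_lon) & (lon <= max_lon) & (lat >= min_lat) & (lat <= max_lat)
    rows, cols = np.nonzero(inside)
    if rows.size == 0:
        raise ValueError("AOI does not intersect the GCP grid")
    # Pad by `offset` cells and clip to the grid extent
    r0 = max(int(rows.min()) - offset, 0)
    r1 = min(int(rows.max()) + offset, lat.shape[0] - 1)
    c0 = max(int(cols.min()) - offset, 0)
    c1 = min(int(cols.max()) + offset, lat.shape[1] - 1)
    return slice(r0, r1 + 1), slice(c0, c1 + 1)


# Toy 6 x 8 GCP grid spanning 5-6 degrees N and 0-1.4 degrees E
lat, lon = np.meshgrid(np.linspace(6.0, 5.0, 6), np.linspace(0.0, 1.4, 8), indexing="ij")
row_slice, col_slice = aoi_index_range(lat, lon, [0.5, 5.3, 0.9, 5.7], offset=1)
print(row_slice, col_slice)
```

A larger `offset` gives a wider safety margin at the cost of reading more data.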

In [None]:
grd_subset = subset(grd=grd, gcp_ds=gcp, aoi_bounds=aoi_bounds, offset=1)

In [None]:
grd_subset

In [None]:
grd_subset.plot(robust=True, cmap="cividis")
plt.show()

### Radiometric calibration

To obtain physically meaningful backscatter values that can be used for quantitative analysis, we need to apply a radiometric calibration. This step converts the raw pixel values into a calibrated normalized radar cross section, correcting for incidence angle and sensor characteristics so that SAR images from different dates or viewing geometries are directly comparable.

- Reference: https://step.esa.int/docs/tutorials/S1TBX%20SAR%20Basics%20Tutorial.pdf

The radiometric calibration is done using the **`radiometric_calibration()`** function which takes the following keyword arguments:

* **`grd`**: The GRD data array to be calibrated.
* **`calibration_ds`**: The calibration dataset containing the radiometric calibration lookup tables provided with the product. This dataset is interpolated to the GRD grid before being applied.
* **`calibration_type`**: The name of the calibration parameter to use from the calibration dataset. In this case, `sigma_nought` is used to compute the sigma nought backscatter coefficient.
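
The calibration idea can be sketched on toy numbers. For Sentinel-1 Level-1 products, the calibrated value is the squared digital number divided by the squared lookup-table value at the same pixel. The example below is a simplified, one-dimensional illustration with made-up values; the real lookup table varies in both azimuth and range, and `radiometric_calibration()` may interpolate it differently.

```python
import numpy as np

# Toy digital numbers (DN) for a 2 x 5 image patch (hypothetical values)
dn = np.array([[100.0, 120.0, 80.0, 60.0, 200.0],
               [110.0, 90.0, 70.0, 65.0, 180.0]])

# Hypothetical calibration LUT, sampled only at pixels 0 and 4 along range
lut_pixels = np.array([0, 4])
lut_sigma = np.array([400.0, 500.0])

# Interpolate the LUT to every pixel in the range direction
pixels = np.arange(dn.shape[1])
a_sigma = np.interp(pixels, lut_pixels, lut_sigma)

# Calibrated backscatter in linear scale: sigma0 = DN^2 / A^2
sigma_0_toy = dn**2 / a_sigma**2
print(sigma_0_toy.round(4))
```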

To see which data variables in the calibration dataset can be used for the radiometric calibration, we can list them:

In [None]:
calibration.data_vars

We apply the calibration, obtaining:

In [None]:
sigma_0 = radiometric_calibration(
    grd=grd_subset, calibration_ds=calibration, calibration_type="sigma_nought"
)

In [None]:
sigma_0

In [None]:
sigma_0.plot(robust=True, cmap="cividis", cbar_kwargs={"label": "Sigma Nought"})

### Speckle filtering

Raw SAR imagery is characterized by a "grainy" or "salt and pepper" effect caused by random constructive and destructive interference, known as **speckle**. To reduce this noise, we apply the spatial **Lee Filter** ([Lee et al., 1994](https://doi.org/10.1080/02757259409532206)), which averages the pixel values while preserving edges.

For speckle filtering we will use the **`lee_filter_dask()`** function which takes the following keyword arguments:

* **`da`**: The input `xarray.DataArray` to be filtered. Here, `sigma_0` is the radiometrically calibrated GRD subset.
* **`size`**: The size of the square moving window used to compute the local statistics for the Lee filter. Odd numbers are recommended (e.g., 5×5). Larger windows produce stronger smoothing but may reduce spatial detail.
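
To make the role of the local statistics concrete, here is a minimal NumPy sketch of a simplified Lee filter. It is not the code behind `lee_filter_dask()`; in particular, approximating the noise variance by the global image variance is an assumption made for illustration, and the name `lee_filter_sketch` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def lee_filter_sketch(img, size=5):
    """Simplified Lee filter on a 2D array (illustrative only)."""
    pad = size // 2
    padded = np.pad(img, pad, mode="reflect")
    windows = sliding_window_view(padded, (size, size))
    local_mean = windows.mean(axis=(-2, -1))
    local_var = windows.var(axis=(-2, -1))
    # Assumption: the noise variance is approximated by the global image variance
    noise_var = img.var()
    weight = local_var / (local_var + noise_var)
    # Homogeneous areas (low local variance) move towards the local mean,
    # while edges (high local variance) stay closer to the original value
    return local_mean + weight * (img - local_mean)


rng = np.random.default_rng(0)
# Two homogeneous regions with multiplicative, speckle-like noise
clean = np.where(np.arange(40)[None, :] < 20, 0.02, 0.2) * np.ones((40, 40))
noisy = clean * rng.gamma(shape=4.0, scale=0.25, size=clean.shape)
filtered = lee_filter_sketch(noisy, size=5)
print(noisy[:, :15].std(), filtered[:, :15].std())
```

The printed values compare the spread inside one homogeneous region before and after filtering; the filtered spread should be clearly smaller.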

In [None]:
sigma_0_spk = lee_filter_dask(da=sigma_0, size=5)

In [None]:
sigma_0_spk

The result is a cleaner image:

In [None]:
sigma_0_spk.plot(robust=True, cmap="cividis", cbar_kwargs={"label": "Sigma Nought"})
plt.title("Sigma Nought after Lee Filter")
plt.show()

### Georeferencing and regridding

Sentinel-1 GRD Zarr data comes in an irregular image geometry that does not align with common geographic grids. To conduct spatial analysis, compare scenes, and produce maps, we need to georeference and resample the data onto a regular grid.<br>
We will use an [ODC GeoBox](https://odc-geo.readthedocs.io/en/latest/intro-geobox.html) along with [scipy.griddata](https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.griddata.html) to interpolate the SAR values onto a consistent latitude-longitude grid. These utilities are available within the **`regrid()`** function which takes the following keyword arguments:

* **`da`**: The input `xarray.DataArray` to regrid. Here, `sigma_0_spk` is the speckle-filtered GRD subset.
* **`bounds`**: The geographic bounding box for the output grid: `[min_lon, min_lat, max_lon, max_lat]`.
* **`resolution`**: Tuple `(dx, dy)` defining the grid spacing in the coordinate units of the CRS. For example, `10 / 111320` degrees corresponds roughly to 10 meters.
* **`crs`**: Coordinate reference system of the output grid (default is `"EPSG:4326"` for WGS84).
* **`method`**: Interpolation method for mapping irregularly spaced data to the regular grid. Options include `"nearest"`, `"linear"`, or `"cubic"`.
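
The resampling step can be pictured with a small brute-force example. The sketch below uses plain NumPy as a stand-in for nearest-neighbour interpolation (it does not call `scipy.interpolate.griddata` or build a GeoBox), and the function `regrid_nearest` and the toy coordinates are hypothetical. It is only practical for small arrays.

```python
import numpy as np


def regrid_nearest(values, lon, lat, bounds, resolution):
    """Nearest-neighbour resampling of scattered samples onto a regular lon/lat grid."""
    min_lon, min_lat, max_lon, max_lat = bounds
    # Pixel centres of the target grid (y decreasing, north-up)
    xs = np.arange(min_lon + resolution / 2, max_lon, resolution)
    ys = np.arange(max_lat - resolution / 2, min_lat, -resolution)
    gx, gy = np.meshgrid(xs, ys)
    # Squared distance from every target cell to every source sample
    d2 = (gx.ravel()[:, None] - lon.ravel()[None, :]) ** 2 + (
        gy.ravel()[:, None] - lat.ravel()[None, :]
    ) ** 2
    nearest = d2.argmin(axis=1)
    return values.ravel()[nearest].reshape(gy.shape), xs, ys


# Toy example: a slightly skewed 4 x 4 source grid
rows, cols = np.meshgrid(np.arange(4), np.arange(4), indexing="ij")
lon = 0.0 + 0.1 * cols + 0.02 * rows  # longitudes drift with the row index
lat = 1.0 - 0.1 * rows
values = (rows * 4 + cols).astype(float)

grid, xs, ys = regrid_nearest(
    values, lon, lat, bounds=[0.0, 0.7, 0.4, 1.0], resolution=0.1
)
print(grid.shape)
```

Nearest-neighbour resampling keeps the original backscatter values unchanged, which is one reason it is a reasonable default before thresholding; `"linear"` or `"cubic"` produce smoother output but create new interpolated values.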

In [None]:
resolution = 10 / 111320  # approx 10 meters in degrees
crs = "epsg:4326"

sigma_0_spk_geo = regrid(
    da=sigma_0_spk,
    bounds=aoi_bounds,
    resolution=resolution,
    crs=crs,
    method="nearest",  # "linear" or "cubic"
)

In [None]:
sigma_0_spk_geo

In [None]:
sigma_0_spk_geo.odcgeobox

In [None]:
sigma_0_spk_geo.plot(robust=True, cmap="cividis", cbar_kwargs={"label": "Sigma Nought"})
plt.title("Regridded Sigma Nought after Lee Filter")
plt.show()

### Convert backscatter to dB

We convert the regridded `sigma_0` backscatter intensity from linear scale to decibels (dB) using a logarithmic transformation. This enhances contrast and simplifies statistical analysis and interpretation of the image. It is considered a standard approach for representing SAR intensity.
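
As a quick numeric check of the transformation, linear values of 1.0, 0.1 and 0.01 map to 0, -10 and -20 dB. The toy snippet below also masks non-positive values before taking the logarithm, since `log10(0)` is `-inf`; whether such masking is needed for a given scene is an assumption to verify on the data.

```python
import numpy as np

sigma_linear = np.array([1.0, 0.1, 0.01, 0.0])
# Replace non-positive values by NaN before taking the logarithm
valid = np.where(sigma_linear > 0, sigma_linear, np.nan)
sigma_db = 10 * np.log10(valid)
print(sigma_db)
```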

In [None]:
sigma_0_spk_geo_db = 10 * np.log10(sigma_0_spk_geo)

In [None]:
sigma_0_spk_geo_db

In [None]:
sigma_0_spk_geo_db.plot(
    robust=True, cmap="cividis", cbar_kwargs={"label": "Sigma Nought dB"}
)
plt.title("Calibrated Sigma Nought in dB")
plt.show()

In [None]:
sigma_0_spk_geo_db.hvplot.image(
    x="x",
    y="y",
    robust=True,
    cmap="cividis",
    title="SAR GRD",
)

## Water mask 

To separate water from non-water surfaces, we first inspect the distribution of backscatter values using a histogram. Based on the following histogram, we could choose -19 dB as a threshold.

In [None]:
plt.hist(sigma_0_spk_geo_db.values.ravel(), bins=50, alpha=0.7)
plt.title("Histogram of Sigma Nought (dB) Values")
plt.show()

Since SAR water thresholds vary across scenes and times, a fixed cutoff is unreliable. Therefore, we apply an adaptive thresholding method using the **Otsu** algorithm provided by [skimage](https://scikit-image.org/docs/stable/api/skimage.filters.html#skimage.filters.threshold_otsu), which automatically determines an optimal threshold from the intensity distribution and creates a water mask accordingly. This algorithm is available within the **`xr_threshold_otsu`** function which takes the following keyword arguments:

* **`da`**: Input `xarray.DataArray` to threshold. 
* **`mask_nan`**: If True (default), any NaN values are ignored during threshold computation.
* **`return_threshold`**: If True, the calculated threshold value is stored as an attribute of the resulting mask.
* **`mask_name`**: Optional name for the binary mask DataArray. This helps with metadata or when saving to file.
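
For intuition, Otsu's method picks the histogram split that maximizes the variance between the two resulting classes. The following NumPy sketch follows that idea on synthetic dB values; it is an illustration rather than the code used by `xr_threshold_otsu`, and the function name `otsu_threshold` and the simulated water/land distributions are assumptions.

```python
import numpy as np


def otsu_threshold(values, nbins=256):
    """Otsu threshold: maximise the between-class variance of the histogram."""
    values = values[np.isfinite(values)]  # ignore NaN / inf
    hist, edges = np.histogram(values, bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)              # counts at or below each bin
    w1 = np.cumsum(hist[::-1])[::-1]  # counts at or above each bin
    m0 = np.cumsum(hist * centers) / np.maximum(w0, 1)
    m1 = (np.cumsum((hist * centers)[::-1]) / np.maximum(w1[::-1], 1))[::-1]
    # Between-class variance for a split between bin i and bin i + 1
    between = w0[:-1] * w1[1:] * (m0[:-1] - m1[1:]) ** 2
    return centers[np.argmax(between)]


rng = np.random.default_rng(42)
water = rng.normal(-24.0, 1.0, 3000)  # simulated low backscatter (water)
land = rng.normal(-14.0, 1.5, 7000)   # simulated higher backscatter (land)
db = np.concatenate([water, land, [np.nan]])

t = otsu_threshold(db)
is_water = db < t
print(f"threshold = {t:.1f} dB, water fraction = {is_water.mean():.2f}")
```

The method works best when the histogram is clearly bimodal; scenes with very little water or very little land may yield a threshold that does not separate the two classes.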

In [None]:
water_mask = xr_threshold_otsu(
    da=sigma_0_spk_geo_db, mask_nan=True, return_threshold=True, mask_name="water_mask"
)
print(f"Otsu threshold: {water_mask.attrs['threshold']}")

In [None]:
water_mask

In [None]:
(1 - water_mask).plot(cmap="Blues")
plt.title("Water Mask (1 = Water, 0 = Non-Water)")
plt.show()

In [None]:
(1 - water_mask).hvplot.image(
    x="x",
    y="y",
    cmap="Blues",
    robust=True,
    title="Water Mask",
)

## Time series analysis 

So far we have walked through each processing step separately. To automate the workflow and apply it efficiently across Sentinel-1 acquisitions, we can now wrap all these operations into a single processing function **`process_item`**. This allows us to automatically generate water masks for every item in our STAC collection.

In [None]:
def process_item(
    item, aoi_bounds, polarization="VH", resolution=10 / 111320, crs="epsg:4326"
):
    """Process a STAC item to generate water mask and timestamp."""
    url = item.assets["product"].href
    store = fsspec.get_mapper(url)
    datatree = xr.open_datatree(store, engine="zarr", chunks={})

    # Select the group matching the requested polarization ("VH" or "VV")
    group = [x for x in datatree.children if f"{polarization}" in x][0]

    grd = datatree[group]["measurements/grd"]
    gcp = datatree[group]["conditions/gcp"].to_dataset()
    calibration = datatree[group]["quality/calibration"].to_dataset()

    grd_subset = subset(grd, gcp, aoi_bounds, offset=1)
    sigma_0 = radiometric_calibration(
        grd_subset, calibration, calibration_type="sigma_nought"
    )
    sigma_0_spk = lee_filter_dask(sigma_0, size=5)
    sigma_0_spk_geo = regrid(
        da=sigma_0_spk,
        bounds=aoi_bounds,
        resolution=resolution,
        crs=crs,
        method="nearest",
    )
    sigma_0_spk_geo_db = 10 * np.log10(sigma_0_spk_geo)
    water_mask = xr_threshold_otsu(
        sigma_0_spk_geo_db, return_threshold=True, mask_name="water_mask"
    )

    t = np.datetime64(item.properties["datetime"][:-2], "ns")
    water_mask = water_mask.assign_coords(time=t)

    return water_mask

Now we loop through all STAC items and use `process_item` to build the full water-mask dataset.

In [None]:
water_masks = []
thresholds = []

for item in tqdm(items):
    water_mask = process_item(item, aoi_bounds)
    water_masks.append(1 - water_mask)  # invert mask to have 1 = water
    thresholds.append(water_mask.attrs["threshold"])

water_mask_ds = xr.concat(water_masks, dim="time")
water_mask_ds = water_mask_ds.assign_coords(threshold=("time", thresholds)).sortby(
    "time"
)

In [None]:
water_mask_ds

In [None]:
# Check the water mask thresholds over time
plt.plot(water_mask_ds.time, water_mask_ds.threshold, marker="o")
plt.title("Water Mask Thresholds over Time")
plt.show()

In [None]:
dates = water_mask_ds.time
dates

Let's inspect the water mask for the first 4 dates.

In [None]:
water_mask_ds.sel(time=dates[:4]).plot(col="time", cmap="Blues", vmin=0, vmax=1)

We found that the raw Sentinel-1 GRD scene for 2024-06-07 contains artifacts that lead to incorrect water classification. Therefore, we will exclude it from our water frequency analysis.

In [None]:
bad_date = pd.to_datetime("2024-06-07").date()
filtered_water_ds = water_mask_ds.sel(
    time=water_mask_ds.time.to_index().date != bad_date
)

In [None]:
filtered_water_ds

### Monthly surface water frequency

Now that we have the water masks, we can group them by month, compute the average water presence for each pixel, and convert it to a percentage (%) to represent monthly water frequency.

In [None]:
monthly_water_frequency = filtered_water_ds.groupby("time.month").mean("time") * 100

monthly_water_frequency.name = "SWF"
month_names = pd.date_range(start="2024-01-01", periods=12, freq="ME").strftime("%B")
monthly_water_frequency.coords["month"] = ("month", month_names)
monthly_water_frequency = monthly_water_frequency.assign_attrs(
    long_name="Surface Water Frequency"
)
monthly_water_frequency = monthly_water_frequency.assign_attrs(units="%")
monthly_water_frequency

In [None]:
monthly_water_frequency.plot(col="month", col_wrap=4, cmap="Blues", robust=True)

### Annual surface water frequency

Let's calculate the average water occurrence across the entire time series over the year.

In [None]:
annual_water_frequency = filtered_water_ds.mean("time") * 100
annual_water_frequency = annual_water_frequency.assign_attrs(
    long_name="Surface Water Frequency"
)
annual_water_frequency = annual_water_frequency.assign_attrs(units="%")

In [None]:
annual_water_frequency.plot(cmap="Blues")
plt.title("Annual Surface Water Frequency (%)")
plt.show()

In [None]:
annual_water_frequency.hvplot.image(
    x="x",
    y="y",
    robust=True,
    cmap="Blues",
    title="Annual Surface Water Frequency (%)",
)

### Annual water change

We can also compute the standard deviation of water presence over time. This highlights areas with high water variability (seasonal or dynamic water bodies).
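
A useful property for reading this map: for a binary (0/1) series in which a pixel is water in a fraction `p` of the acquisitions, the population standard deviation equals `sqrt(p * (1 - p))`. It therefore peaks at 50% when a pixel is water half of the time and drops to 0 for permanently wet or permanently dry pixels. The short check below uses toy values only.

```python
import numpy as np

# Fraction of acquisitions in which a pixel is classified as water
p = np.array([0.0, 0.1, 0.5, 0.9, 1.0])
# For a 0/1 series the (population) standard deviation is sqrt(p * (1 - p))
std_percent = 100 * np.sqrt(p * (1 - p))
print(std_percent)

# Check against a concrete series: water in 5 of 10 acquisitions
series = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
series_std_percent = series.std() * 100
print(series_std_percent)
```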

In [None]:
annual_water_change = filtered_water_ds.std("time") * 100
annual_water_change = annual_water_change.assign_attrs(
    long_name="Surface Water Variation"
)
annual_water_change = annual_water_change.assign_attrs(units="%")

In [None]:
annual_water_change.plot(cmap="magma")
plt.title("Annual Surface Water Variation")
plt.show()

In [None]:
annual_water_change.hvplot.image(
    x="x",
    y="y",
    robust=True,
    cmap="magma",
    title="Annual Surface Water Variation",
)

<hr>

## 💪 Now it is your turn

Congratulations 🎉<br>

We have worked through the complete workflow for analyzing Sentinel-1 surface water dynamics from `.zarr` data. Now it is your turn to explore and expand the analysis in the following ways:

### Task 1: Explore your own area of interest

Choose a different wetland, lake system, river delta, or floodplain anywhere in the world. Use different STAC search configurations (`aoi_bounds`, `date_start`, `date_end`) and derive the:

* Seasonal variation
* Maximum water extent
* Minimum water extent
* Long-term changes in water extent (more than 1 year of data)

### Task 2: Experiment with other polarizations and combinations

Instead of relying solely on `VH` for water extraction try:

* `VV` polarization
* `VH/VV` ratio
* Dual-polarization thresholding or machine learning classification ([Kreiser, Z. et al., 2018](https://doi.org/10.1109/IGARSS.2018.8517447))
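
As a starting point for the `VH/VV` ratio, note that a ratio in linear scale becomes a simple difference once both bands are expressed in dB. The toy values below are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

# Toy linear-scale backscatter for three pixels (hypothetical values)
vh = np.array([0.001, 0.01, 0.02])
vv = np.array([0.01, 0.05, 0.2])

ratio_db = 10 * np.log10(vh / vv)
# Equivalent: difference of the two bands once both are in dB
diff_db = 10 * np.log10(vh) - 10 * np.log10(vv)
print(ratio_db.round(1), np.allclose(ratio_db, diff_db))
```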

### Task 3: Compare or integrate Sentinel-1 with Sentinel-2

* Design a workflow for Sentinel-2 water detection and consider using spectral indices such as the Normalized Difference Water Index (NDWI) [Gao, 1996](https://doi.org/10.1016/S0034-4257(96)00067-3) or the Modified Normalized Difference Water Index (mNDWI) [Xu, 2006](https://doi.org/10.1080/01431160600589179).

* Compare Sentinel-2 water detection with our Sentinel-1 workflow.
* Recent studies show that combining Sentinel-1 and Sentinel-2 data can improve water detection ([Bioresita et al., 2019](https://doi.org/10.1080/01431161.2019.1624869); [Kaplan and Avdan 2018](https://doi.org/10.5194/isprs-archives-XLII-3-729-2018)). Consider exploring methods that fuse the information from both sensors.
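
As a possible starting point for the Sentinel-2 side, the mNDWI of Xu (2006) is computed as `(green - swir) / (green + swir)`. The sketch below uses made-up reflectance values; the function name `mndwi`, the sample values, and the threshold of 0 are assumptions to be tuned per scene.

```python
import numpy as np


def mndwi(green, swir):
    """Modified NDWI (Xu, 2006): (green - swir) / (green + swir)."""
    green = np.asarray(green, dtype=float)
    swir = np.asarray(swir, dtype=float)
    total = green + swir
    # Avoid division by zero where both bands are 0
    safe_total = np.where(total != 0, total, 1.0)
    return np.where(total != 0, (green - swir) / safe_total, np.nan)


# Hypothetical surface reflectances: water, vegetation, bare soil
green = np.array([0.06, 0.05, 0.12])
swir = np.array([0.01, 0.15, 0.25])

index = mndwi(green, swir)
water = index > 0  # a common starting threshold; tune per scene
print(index.round(2), water)
```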

<hr>

## Conclusion

In this notebook, we demonstrated how to use Sentinel-1 data in `.zarr` format for time-series analysis of surface water dynamics in a coastal wetland area that is prone to cloud cover. The Zarr structure is especially useful for efficient extraction of data over a specific area of interest without loading the full dataset.

We developed a streamlined time series workflow using `pystac-client` and the EODC STAC API to preprocess Sentinel-1 GRD data. This included spatial subsetting, radiometric calibration, speckle filtering, georeferencing, and regridding. We also implemented an automated process for surface water detection and derived surface water occurrence, frequency, and change.

**Note**: Although in this notebook we opted to use `VH` polarization to detect the water, users may also experiment with `VV` polarization or combine both polarizations for improved water detection, and implement their own methods for deriving surface water masks. This workflow can also be adapted for other applications, such as monitoring lake dynamics and flood events.

<hr>

## What's next?

This resource is constantly updated! Stay tuned for new chapters 🛰️!