# Water Detection with Sentinel-2 Pre-trained Model

[![image](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/opengeos/geoai/blob/main/docs/examples/water_detection_s2_hf.ipynb)

This notebook demonstrates surface water detection using a semantic segmentation model trained on the [Earth Surface Water Dataset](https://zenodo.org/records/5205674). The model uses an **EfficientNet-B4** encoder with a **UNet++** decoder architecture, trained on Sentinel-2 multispectral imagery (6 bands).

## Key Features

- **Pre-trained model** loaded directly from HuggingFace Hub
- **6-band Sentinel-2 input** (Blue, Green, Red, NIR, SWIR1, SWIR2)
- **Sliding-window inference** for processing large satellite scenes
- **Vectorization** of predicted masks into water body polygons

## Install package

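To use the `geoai-py` package, ensure it is installed in your environment. This notebook also relies on `timm`, `segmentation-models-pytorch`, and `smoothify`. Uncomment the command below to install them if needed.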

In [None]:
# %pip install geoai-py timm segmentation-models-pytorch smoothify

## Import libraries

In [None]:
import geoai

## Download sample data

Download a sample Sentinel-2 scene and its ground-truth water mask from a copy of the [Earth Surface Water Dataset](https://zenodo.org/records/5205674) hosted on HuggingFace.

In [None]:
image_url = "https://huggingface.co/datasets/giswqs/s2-water-dataset/resolve/main/val_scene/S2A_L2A_20190318_N0211_R061_6Bands_S2.tif"
image_path = geoai.download_file(image_url)

In [None]:
truth_url = "https://huggingface.co/datasets/giswqs/s2-water-dataset/resolve/main/val_truth/S2A_L2A_20190318_N0211_R061_S2_Truth.tif"
truth_path = geoai.download_file(truth_url)

## Visualize input data

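View the Sentinel-2 scene as a false-color composite (NIR, Red, Green, i.e. bands 4, 3, 2 of the 6-band image) for better water visibility. Water absorbs strongly in the near infrared, so it appears dark in this composite.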

In [None]:
geoai.view_raster(image_path, indexes=[4, 3, 2], vmax=3000)
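The `indexes` and `vmax` arguments choose which bands are displayed and how they are stretched. Below is a minimal sketch of such a linear stretch. It assumes a (bands, rows, cols) NumPy array and 1-based band numbers. `make_composite` is a hypothetical helper, not part of geoai, and `view_raster` may scale values differently internally.

```python
import numpy as np


def make_composite(stack, indexes, vmax):
    """Build a display-ready RGB composite from a (bands, rows, cols) array.

    `indexes` are 1-based band numbers (rasterio-style) and `vmax` is the
    value mapped to full brightness.
    """
    # Select the three requested bands, converting 1-based to 0-based indices
    rgb = stack[[i - 1 for i in indexes]].astype("float32")
    # Linear stretch: 0 -> 0.0, vmax -> 1.0, brighter values are clipped
    rgb = np.clip(rgb / vmax, 0.0, 1.0)
    # Move bands last so the result is (rows, cols, 3), as matplotlib expects
    return np.transpose(rgb, (1, 2, 0))


# Synthetic 6-band scene standing in for the downloaded GeoTIFF
rng = np.random.default_rng(0)
scene = rng.integers(0, 6000, size=(6, 64, 64))
composite = make_composite(scene, indexes=[4, 3, 2], vmax=3000)
print(composite.shape)  # (64, 64, 3)
```

Clipping at `vmax` trades detail in bright areas, such as clouds, for contrast in darker surfaces like water and vegetation.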

## Run water detection

Use the pre-trained model from HuggingFace Hub to detect surface water. The `timm_segmentation_from_hub` function downloads the model weights and configuration, then runs sliding-window inference on the input scene. Here it uses 512-pixel windows that overlap by 256 pixels and processes them in batches of 4.

**Model details:**
- **Architecture**: UNet++ with EfficientNet-B4 encoder
- **Training data**: Earth Surface Water Dataset (Sentinel-2)
- **Input**: 6-band Sentinel-2 (B2, B3, B4, B8, B11, B12)
- **Classes**: Background (0) and Water (1)

In [None]:
output_path = "s2_water_prediction.tif"

geoai.timm_segmentation_from_hub(
    input_path=image_path,
    output_path=output_path,
    repo_id="giswqs/s2-water-unetplusplus-efficientnet-b4",
    window_size=512,
    overlap=256,
    batch_size=4,
)
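Sliding-window inference cuts the scene into `window_size`-pixel tiles that advance by `window_size - overlap` pixels. It predicts each tile and then merges the overlapping results. The sketch below shows one common merging strategy: average the class scores where windows overlap, then take the argmax. It is a simplified illustration, not the actual implementation of `timm_segmentation_from_hub`. `predict_fn` and `dummy_predict` are hypothetical stand-ins for the model.

```python
import numpy as np


def window_starts(length, window_size, overlap):
    """Start offsets along one axis so that windows cover [0, length)."""
    stride = window_size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than window_size")
    if length <= window_size:
        return [0]
    starts = list(range(0, length - window_size + 1, stride))
    # Add a final window flush with the edge if the stride skipped the tail
    if starts[-1] + window_size < length:
        starts.append(length - window_size)
    return starts


def sliding_window_predict(image, predict_fn, window_size, overlap, num_classes):
    """Average per-class scores over overlapping windows, then take argmax.

    image: (bands, rows, cols) array. predict_fn maps a (bands, h, w) tile to
    (num_classes, h, w) scores.
    """
    _, rows, cols = image.shape
    scores = np.zeros((num_classes, rows, cols), dtype="float32")
    counts = np.zeros((rows, cols), dtype="float32")
    for r in window_starts(rows, window_size, overlap):
        for c in window_starts(cols, window_size, overlap):
            tile = image[:, r:r + window_size, c:c + window_size]
            scores[:, r:r + window_size, c:c + window_size] += predict_fn(tile)
            counts[r:r + window_size, c:c + window_size] += 1
    scores /= counts  # every pixel is covered by at least one window
    return scores.argmax(axis=0).astype("uint8")  # 0 = background, 1 = water


def dummy_predict(tile):
    """Toy 'model': low values in band 4 (NIR, index 3) are called water."""
    water = (tile[3] < 0.1).astype("float32")
    return np.stack([1.0 - water, water])


img = np.ones((6, 100, 130), dtype="float32")
img[3, 20:40, 50:90] = 0.0  # a rectangular "lake"
mask = sliding_window_predict(img, dummy_predict, window_size=32, overlap=16, num_classes=2)
print(window_starts(1024, 512, 256))  # [0, 256, 512]
```

With the notebook's settings (`window_size=512`, `overlap=256`), every interior pixel is predicted by several windows. Overlapping predictions help suppress artifacts at tile borders. Real implementations may also pad tiles smaller than the window, or weight window centers more heavily than edges.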

## Visualize raster mask

View the predicted water mask overlaid on the input imagery. With `nodata=0`, background pixels are rendered transparent so that only predicted water is drawn.

In [None]:
geoai.view_raster(
    output_path,
    nodata=0,
    basemap=image_path,
    opacity=0.5,
    backend="ipyleaflet",
)

## Compare with ground truth

Compare the model prediction against the ground truth annotation.

In [None]:
save_path = "s2_water_comparison.png"

fig = geoai.plot_prediction_comparison(
    original_image=image_path,
    prediction_image=output_path,
    ground_truth_image=truth_path,
    titles=["Sentinel-2 (False Color)", "Prediction", "Ground Truth"],
    figsize=(15, 5),
    save_path=save_path,
    show_plot=True,
    indexes=[5, 4, 3],
    divider=5000,
)
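The plot gives a visual comparison. Pixel-wise scores make it quantitative. The sketch below computes precision, recall, F1, and IoU for the water class. It assumes water is encoded as `1` in both rasters and that both share the same grid, so check the ground-truth encoding before relying on the numbers. `water_metrics` is a hypothetical helper, not a geoai function.

```python
import numpy as np


def water_metrics(pred, truth, water_value=1, valid=None):
    """Pixel-wise precision, recall, F1 and IoU for the water class."""
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    if valid is None:
        valid = np.ones(pred.shape, dtype=bool)  # optionally exclude nodata pixels
    p = (pred == water_value) & valid
    t = (truth == water_value) & valid
    tp = np.sum(p & t)   # water in both
    fp = np.sum(p & ~t)  # predicted water, actually background
    fn = np.sum(~p & t)  # missed water
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


# With the notebook's files (assumes rasterio is installed and grids match):
#   import rasterio
#   with rasterio.open(output_path) as src: pred = src.read(1)
#   with rasterio.open(truth_path) as src: truth = src.read(1)
pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
m = water_metrics(pred, truth)
print(m["iou"])  # 0.5
```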

## Vectorize water mask

Convert the predicted raster mask to vector polygons representing water bodies.

In [None]:
output_vector_path = "s2_water_polygons.geojson"
gdf = geoai.raster_to_vector(
    raster_path=output_path,
    output_path=output_vector_path,
    min_area=100,
    simplify_tolerance=None,
)
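Conceptually, vectorization groups touching water pixels into connected regions, traces the outline of each region, and drops regions below `min_area`. The sketch below shows only the grouping and size-filtering step, in plain NumPy. It assumes 4-connectivity and counts area in pixels. `raster_to_vector` may use a different connectivity or area unit. Tracing the outlines is typically delegated to a library routine such as `rasterio.features.shapes`. `label_regions` is a hypothetical helper.

```python
from collections import deque

import numpy as np


def label_regions(mask, min_pixels=1):
    """Label 4-connected regions of nonzero pixels, dropping small ones.

    Returns (labels, sizes): labels is 0 for background and for dropped
    regions, otherwise 1..len(sizes); sizes[k - 1] is the pixel count of label k.
    """
    mask = np.asarray(mask, dtype=bool)
    rows, cols = mask.shape
    visited = np.zeros(mask.shape, dtype=bool)
    labels = np.zeros(mask.shape, dtype="int32")
    sizes = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0, c0] or visited[r0, c0]:
                continue
            # Breadth-first flood fill from an unvisited water pixel
            visited[r0, c0] = True
            queue = deque([(r0, c0)])
            members = []
            while queue:
                r, c = queue.popleft()
                members.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and mask[nr, nc] and not visited[nr, nc]:
                        visited[nr, nc] = True
                        queue.append((nr, nc))
            # Keep only regions that meet the size threshold
            if len(members) >= min_pixels:
                sizes.append(len(members))
                for r, c in members:
                    labels[r, c] = len(sizes)
    return labels, sizes


mask = np.array([
    [1, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0, 0],
])
labels, sizes = label_regions(mask, min_pixels=2)
print(sizes)  # [4, 6]
```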

## Smooth water body polygons

Smooth the vectorized polygons using the [smoothify](https://github.com/DPIRD-DMA/Smoothify) library to produce more natural-looking water body boundaries.

In [None]:
smoothed_path = "s2_water_smoothed.geojson"
gdf = geoai.smooth_vector(
    gdf,
    smooth_iterations=3,
    output_path=smoothed_path,
)
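Polygons traced from pixels have stair-step edges. One classic way to round them is Chaikin's corner-cutting algorithm. Each pass replaces every edge with points at 1/4 and 3/4 of its length. The sketch below applies it to a closed ring as an illustration only. It is not necessarily the method smoothify uses, and its `iterations` argument is only loosely analogous to `smooth_iterations`.

```python
def chaikin_smooth(ring, iterations=3):
    """Smooth a closed ring [(x, y), ...] by corner cutting.

    The first point is not repeated at the end; each pass doubles the
    number of vertices.
    """
    points = list(ring)
    for _ in range(iterations):
        new_points = []
        n = len(points)
        for i in range(n):
            (x0, y0), (x1, y1) = points[i], points[(i + 1) % n]
            # Replace each edge with its 1/4 and 3/4 points
            new_points.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            new_points.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        points = new_points
    return points


# A pixel-like square; corners get progressively rounded
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
smoothed = chaikin_smooth(square, iterations=3)
print(len(smoothed))  # 32
```

Corner cutting shrinks shapes slightly. One pass turns the 10 × 10 square (area 100) into an octagon of area 87.5. This is one reason the area filter in this notebook is applied after smoothing.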

## Add geometric properties

Calculate geometric properties such as area and perimeter for each detected water body.

In [None]:
gdf_props = geoai.add_geometric_properties(gdf, area_unit="m2", length_unit="m")
gdf_props.head()
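In a projected CRS with metre units, area and perimeter reduce to the shoelace formula and the sum of edge lengths. Sentinel-2 products are typically delivered in UTM. The sketch below handles a single exterior ring and ignores holes. `add_geometric_properties` may handle CRS conversion and interior rings differently. If a layer is in geographic coordinates (degrees), reproject it first, for example with `gdf.to_crs(gdf.estimate_utm_crs())`. `ring_area_perimeter` is a hypothetical helper.

```python
import math


def ring_area_perimeter(ring):
    """Area (shoelace formula) and perimeter of a closed ring [(x, y), ...].

    Coordinates must be projected; with metre units the results are in
    m^2 and m. The first point is not repeated at the end.
    """
    n = len(ring)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0           # shoelace term
        perimeter += math.hypot(x1 - x0, y1 - y0)  # edge length
    return abs(twice_area) / 2.0, perimeter


# A 30 m x 20 m rectangle
area, perimeter = ring_area_perimeter([(0, 0), (30, 0), (30, 20), (0, 20)])
print(area, perimeter)  # 600.0 100.0
```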

## Filter small artifacts

Remove detected regions with an area of 100 m² or less, which are unlikely to be actual water bodies.

In [None]:
gdf_filtered = gdf_props[gdf_props["area_m2"] > 100]
print(f"Water bodies detected: {len(gdf_filtered)}")
print(f"Removed {len(gdf_props) - len(gdf_filtered)} small artifacts")
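Area thresholds are easier to choose when expressed in pixels. The arithmetic below assumes 10 m pixels, so check the actual resolution of the scene (for example with rasterio's `src.res`). Under that assumption one pixel covers 100 m², and `area_m2 > 100` removes only polygons no larger than roughly a single pixel. `pixels_to_area` is a hypothetical helper, not part of geoai.

```python
def pixels_to_area(num_pixels, pixel_size_x, pixel_size_y=None):
    """Ground area covered by num_pixels pixels of the given size (map units)."""
    if pixel_size_y is None:
        pixel_size_y = pixel_size_x
    # abs() because geotransforms often store a negative y pixel size
    return num_pixels * abs(pixel_size_x) * abs(pixel_size_y)


print(pixels_to_area(1, 10))   # 100   -> the threshold used above
print(pixels_to_area(25, 10))  # 2500  -> e.g. keep only bodies of 25+ pixels
```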

## Visualize water body polygons

Display the detected water body polygons on an interactive map, colored by area.

In [None]:
geoai.view_vector_interactive(
    gdf_filtered,
    column="area_m2",
    tiles=image_path,
)

## Split map comparison

Create a split map that compares the detected water bodies (left) with the original imagery (right).

In [None]:
geoai.create_split_map(
    left_layer=gdf_filtered,
    right_layer=image_path,
    left_args={"style": {"color": "blue", "fillOpacity": 0.3}},
    basemap=image_path,
)

## Water body area statistics

Analyze the distribution of water body sizes in the detected polygons.

In [None]:
print(gdf_filtered["area_m2"].describe())

In [None]:
import matplotlib.pyplot as plt

# Histogram of polygon areas
gdf_filtered["area_m2"].hist(bins=50)
plt.xlabel("Area (m\u00b2)")
plt.ylabel("Count")
plt.title("Distribution of Water Body Areas")
plt.show()

## Save results

Save the final water body polygons to a GeoJSON file.

In [None]:
gdf_filtered.to_file("s2_water_bodies_final.geojson", driver="GeoJSON")
print(f"Saved {len(gdf_filtered)} water body polygons to s2_water_bodies_final.geojson")

## Summary

This notebook demonstrated:

1. **Loading a pre-trained model** from HuggingFace Hub with a single function call
2. **Running water detection** on Sentinel-2 imagery using sliding-window inference
3. **Comparing predictions** against ground truth annotations
4. **Vectorizing results** into water body polygons
5. **Smoothing polygons** with smoothify for natural-looking boundaries
6. **Analyzing water bodies** with geometric properties and area statistics
7. **Visualizing results** with interactive maps and split-map comparisons

### Model Details

| Property | Value |
|----------|-------|
| Architecture | UNet++ |
| Encoder | EfficientNet-B4 |
| Training Data | Earth Surface Water Dataset |
| Input | 6-band Sentinel-2 (B2, B3, B4, B8, B11, B12) |
| Classes | Background (0), Water (1) |
| HuggingFace | [giswqs/s2-water-unetplusplus-efficientnet-b4](https://huggingface.co/giswqs/s2-water-unetplusplus-efficientnet-b4) |

### References

- Earth Surface Water Dataset: Luo, X. et al. (2021). An applicable and automatic method for earth surface water mapping based on multispectral images. *International Journal of Applied Earth Observation and Geoinformation*, 103, 102472. https://doi.org/10.1016/j.jag.2021.102472
- Dataset: https://zenodo.org/records/5205674