# Water Observations using Landsat Timeseries

## Background
Geoscience Australia's Water Observations from Space (WOfS) classifier is a decision tree that labels each pixel of an individual multispectral Landsat observation as water or not water. Summarizing these per-observation classifications over time gives a surface-water occurrence map.

## Description
In this example, we apply the WOfS classifier to USGS Landsat Collection 2 Level-2 surface reflectance data to map surface-water occurrence and long-term trends.

### Packages and functions
Import Python packages that are used for the analysis.

In [None]:
# only importing required packages

from datacube import Datacube
from odc.geo.geom import point

#from odc.algo import mask_cleanup

# if dask is used
#import dask.config
#from dask.distributed import Client, LocalCluster

# for landsat data access
import os
from odc.stac import configure_s3_access
from piksel.utils import patch_usgs_landsat

# for wofs on landsat
import numpy as np
from wofs.classifier import classify
from wofs.constants import NO_DATA
from wofs.filters import c2_filter, eo_filter, terrain_filter
from wofs.virtualproduct import scale_and_clip_dataarray
from xarray import DataArray

# for DSM
from pystac_client import Client
from odc.stac import load
from planetary_computer import sign_url

In [None]:
from collections import Counter

def mostcommon_crs(datasets):
    """Return the CRS that occurs most often among the given datasets."""
    crs_counts = Counter(dataset.metadata_doc["crs"] for dataset in datasets)
    return crs_counts.most_common(1)[0][0]

## Select a study site

Keep the area small for long time-series analysis, because every additional time step multiplies the amount of data to load.
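
As a rough illustration of why the area matters, the sketch below estimates the uncompressed size of a loaded cube around a buffered point. The helper name `estimate_cube_bytes`, the figure of 111.32 km per degree, the 2 bytes per pixel and the 40 time steps are assumptions made for illustration, not values read from the datacube.

```python
import math

def estimate_cube_bytes(buffer_deg, resolution_m, n_bands, n_times, bytes_per_pixel=2):
    """Rough uncompressed size of a (time, band, y, x) cube around a buffered point."""
    # Assumption: one degree is about 111.32 km, which is reasonable near the equator
    side_m = 2 * buffer_deg * 111_320
    pixels_per_side = math.ceil(side_m / resolution_m)
    return pixels_per_side ** 2 * n_bands * n_times * bytes_per_pixel

# Six spectral bands plus one QA band, 40 hypothetical time steps
small = estimate_cube_bytes(0.04, 30, n_bands=7, n_times=40)
large = estimate_cube_bytes(0.4, 30, n_bands=7, n_times=40)

print(f"0.04 degree buffer: {small / 1e6:.1f} MB")
print(f"0.4 degree buffer: {large / 1e6:.1f} MB")
```

A buffer ten times wider needs roughly one hundred times the memory, so it is usually easier to test on a small site first.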

In [None]:
start_date = "2025-01"
end_date = "2025-12"

coords = -4.02, 120.02  # Lake Tempe

aoi_point = point(coords[1], coords[0], crs="EPSG:4326")
area = aoi_point.buffer(0.04).boundingbox

area.explore()

## Load satellite data

Load spectral bands and data quality measurements relevant for WOfS.
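
The `qa_pixel` band packs several quality flags into one integer per pixel. The sketch below shows how individual flags could be read with plain NumPy. The bit positions follow the USGS Collection 2 QA_PIXEL layout; the helper `qa_flag` and the sample values are illustrative, and this is not a description of how `c2_filter` works internally.

```python
import numpy as np

# Bit positions of single-bit flags in the Collection 2 QA_PIXEL band
QA_BITS = {
    "fill": 0,
    "dilated_cloud": 1,
    "cirrus": 2,
    "cloud": 3,
    "cloud_shadow": 4,
    "snow": 5,
    "clear": 6,
    "water": 7,
}

def qa_flag(qa_pixel, name):
    """Boolean array that is True where the named QA_PIXEL bit is set."""
    return ((qa_pixel >> QA_BITS[name]) & 1).astype(bool)

# Sample values: fill, clear land, cloud, cloud shadow
qa = np.array([1, 21824, 22280, 23888], dtype="uint16")

is_fill = qa_flag(qa, "fill")
is_cloud = qa_flag(qa, "cloud")
is_shadow = qa_flag(qa, "cloud_shadow")
```

Because the flags are categorical, `qa_pixel` is resampled with `nearest` rather than `bilinear` when loading.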

In [None]:
dc = Datacube()

In [None]:
# bands must be supplied in this order, which the classifier expects
wofs_bands = ["blue", "green", "red", "nir08", "swir16", "swir22"]

In [None]:
# define sensor and resolution
product = ["ls5_c2l2_sr", "ls7_c2l2_sr", "ls8_c2l2_sr", "ls9_c2l2_sr"]
resolution = 30
qa_band = "qa_pixel"

In [None]:
# Find datasets
datasets = dc.find_datasets(
    product=product,
    time=(start_date, end_date),
    longitude=(area.left, area.right),
    latitude=(area.bottom, area.top),
    # only use high-quality imagery for testing
    landsat_collection_category='T1',
    cloud_cover = (0, 60),
)

crs = mostcommon_crs(datasets)

print(f"Found {len(datasets)} datasets")
print(f"Most common CRS is {crs}")

data = dc.load(
    datasets=datasets,
    longitude=(area.left, area.right),
    latitude=(area.bottom, area.top),
    resolution=resolution,
    output_crs=crs,
    measurements=wofs_bands + [qa_band],
    group_by="solar_day",
    dask_chunks={"time": 1, "x": 500, "y": 500},
    resampling={
        "*": "bilinear",
        qa_band: "nearest",
    },
    patch_url=patch_usgs_landsat,
    driver="rio",
)

In [None]:
#data=data.compute()
data

## Digital Elevation Model for terrain masking

In areas with steep terrain, slope masking helps because water is unlikely to occur on steep slopes, and terrain shadow is easily misclassified as water.
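
To make the idea concrete, here is a minimal sketch of slope masking on a synthetic elevation grid, assuming a projected grid with 30 m pixels. The function name `slope_degrees` and the 12 degree threshold are illustrative assumptions; `terrain_filter` from `wofs` applies its own logic and thresholds.

```python
import numpy as np

def slope_degrees(elevation, pixel_size):
    """Slope in degrees from a 2D elevation array on a regular grid in metres."""
    # np.gradient returns the derivative along rows (y) first, then columns (x)
    dz_dy, dz_dx = np.gradient(elevation.astype("float64"), pixel_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Synthetic ramp that rises 10 m per 30 m pixel along x
elevation = np.tile(np.arange(5) * 10.0, (5, 1))
slope = slope_degrees(elevation, pixel_size=30.0)

# Hypothetical threshold, chosen for illustration only
steep = slope > 12.0
```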

In [None]:
# DSM for terrain masking
mspc_catalog = "https://planetarycomputer.microsoft.com/api/stac/v1"

# Get a STAC client
client = Client.open(mspc_catalog)

# Load elevation
dem_items = list(
    client.search(
        collections=["cop-dem-glo-30"],
        intersects = data.odc.geobox.geographic_extent,
    ).items()
)

dem = load(dem_items, like=data, patch_url=sign_url).squeeze()

dsm = dem.rename({"data": "elevation"})

In [None]:
#dsm.elevation.plot.imshow()

## Water classification with masking

We will use the full suite of functions from `wofs` to classify water and mask invalid and unreliable observations.
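
The classification and the individual masks are combined with a bitwise OR, after which any pixel carrying the nodata flag is reset to nodata only. The sketch below reproduces that pattern on a toy array. The bit values are assumptions that mirror the flags published for WOfS; the authoritative values live in `wofs.constants`.

```python
import numpy as np

# Assumed bit values for illustration; check wofs.constants for the real ones
NO_DATA = 1 << 0
CLOUD = 1 << 6
WATER = 1 << 7

classified = np.array([WATER, 0, WATER, WATER], dtype="uint8")
cloud_mask = np.array([0, CLOUD, CLOUD, 0], dtype="uint8")
nodata_mask = np.array([False, False, False, True])

# Each mask contributes its own bit, so flags accumulate per pixel
combined = classified | cloud_mask | nodata_mask

# Pixels flagged as nodata keep only the nodata bit
combined[(combined & NO_DATA) == NO_DATA] = NO_DATA
```

Here the third pixel ends up with both the water and cloud bits set, while the fourth pixel loses its water bit because it has no valid data.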

In [None]:
# define functions to apply for wofs

def scale_data(data, scale_factor=1, add_offset=0):
    """Apply scaling and offset"""
    attrs = data.attrs
    data = data.apply(scale_and_clip_dataarray, keep_attrs=False,
                      scale_factor=scale_factor, add_offset=add_offset,
                      clip_range=(0, 10000), valid_range=None)
    data.attrs = attrs
    return data

def scale_data_ls(data):
    """Convert Collection 2 Level-2 DNs (reflectance = DN * 0.0000275 - 0.2) to reflectance x 10000"""
    return scale_data(data, scale_factor=0.275, add_offset=-2000)


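def ls_wofs(in_data, dsm):
    """
    Classify one Landsat time slice and flag invalid or unreliable pixels.

    in_data: xarray Dataset holding the WOfS bands and qa_pixel.
    dsm: xarray Dataset with an "elevation" variable on the same grid.
    """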
    data_scaled = scale_data_ls(in_data[wofs_bands])

    if 'time' in data_scaled.dims.keys():
        data_scaled = data_scaled.squeeze("time", drop=True)

    # Run the classifier
    wofs_raw = classify(data_scaled.to_array())

    # Set up attrs on the DSM
    dsm.attrs["crs"] = in_data.odc.crs

    # Set up the masks and combine them
    no_data = in_data.red == 0

    wofs_masked = (
        wofs_raw
        | eo_filter(data_scaled)
        | terrain_filter(dsm, in_data, no_data=0, ignore_dsm_no_data=True)
        | c2_filter(in_data.qa_pixel)
        | no_data
    )

    # Messy little thing to make any nodata areas ONLY nodata
    nodata_set = np.bitwise_and(wofs_masked, NO_DATA) == NO_DATA

    # If we don't specifically set the dtype in the following line,
    # dask arrays explode to int64s. Make sure it stays a uint8!
    wofs_masked.data[nodata_set] = np.array(NO_DATA, dtype="uint8")

    return wofs_masked

In [None]:
# ls_wofs needs the elevation data, so pass it through as a keyword argument
wofs_masked = data.groupby("time").map(ls_wofs, dsm=dsm)

## Interpreting WOfS bit masks
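
Each WOfS pixel value is a sum of bit flags, so a single number can describe several conditions at once. The sketch below decodes a value into flag names. The flag layout is an assumption based on the published description of Geoscience Australia's WOfS product (some versions use the bit with value 4 for sea rather than low solar angle); verify it against `wofs.constants` before relying on it. The names `WOFS_FLAGS` and `decode_wofs` are hypothetical.

```python
import numpy as np

# Assumed flag layout (bit value -> meaning); verify against wofs.constants
WOFS_FLAGS = {
    1: "nodata",
    2: "noncontiguous",
    4: "low_solar_angle",
    8: "terrain_shadow",
    16: "high_slope",
    32: "cloud_shadow",
    64: "cloud",
    128: "water_observed",
}

def decode_wofs(value):
    """List the names of all flags set in a single WOfS pixel value."""
    return [name for bit, name in WOFS_FLAGS.items() if int(value) & bit]

values = np.array([0, 128, 192, 1], dtype="uint8")
decoded = [decode_wofs(v) for v in values]

# Under this layout, a clear wet pixel is exactly 128 and a clear dry pixel is 0
clear_wet = values == 128
clear_dry = values == 0
```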

## All-time and annual statistics
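
One common summary is the water frequency: the number of clear wet observations divided by the number of clear observations for each pixel. The sketch below computes it with NumPy on a toy stack, assuming that a value of 128 means clear and wet and a value of 0 means clear and dry. The function name `water_frequency` is hypothetical, and this is one possible way to summarize the classification rather than a prescribed method.

```python
import numpy as np

def water_frequency(wofs, axis=0):
    """Fraction of clear observations that were wet, per pixel."""
    wet = (wofs == 128).sum(axis=axis)  # assumed: only the water bit is set
    dry = (wofs == 0).sum(axis=axis)    # assumed: no flags are set
    clear = wet + dry
    # Pixels with no clear observation get NaN instead of a misleading zero
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(clear > 0, wet / clear, np.nan)

# Toy stack with shape (time, pixel)
stack = np.array(
    [[128, 0, 64],
     [128, 0, 1],
     [0, 0, 192],
     [64, 0, 64]],
    dtype="uint8",
)
frequency = water_frequency(stack)
```

For annual statistics, the same calculation could be applied to each year separately, for example by grouping an xarray object on `time.year`.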

## Further exploration

* Are there other areas where this analysis could be applied?

* What additional types of analysis could help support your use case?

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 