# Water classification with radar from Sentinel 1

To define the coastline in a satellite image, one needs to be able to distinguish land from water. While this can be done with optical data, it is also useful to look at how it can be done with radar data. Radar images are largely unaffected by weather and cloud cover, so they can provide more reliable coverage.

In this notebook, you'll explore how to distinguish land and water in Sentinel 1 images using a series of commands from the Open Data Cube, as well as some self-defined functions.

As you work through the notebook you will:
1. Pick a study area along the coast.
1. Explore available data products and load Sentinel 1 data.
1. Visualize the returned data.
1. Perform pre-processing steps on the Sentinel 1 VV and VH bands.
1. Design a classifier to distinguish land and water.
1. Apply the classifier to the study area and interpret the results.
1. Investigate how to identify change in the coastline.

Let's get started.

## Picking the study area

The example we've selected looks at part of the coastline of Melville Island, which sits off the coast of the Northern Territory, Australia. The study area also contains an additional small island, which will be useful for assessing how well radar data distinguishes between land and water.

Run the following two cells to set the latitude and longitude range, and then view the area.

In [None]:
latitude = (-11.287611, -11.085876)
longitude = (130.324262, 130.452652)

In [None]:
from utils.display import display_map
display_map(latitude = latitude, longitude = longitude)

## Loading available data

Before loading the data, we'll need to import the Open Data Cube library and load the `Datacube` class.

In [None]:
import datacube

dc = datacube.Datacube(app = 'sentinel-1-water-classifier')

When working with the Open Data Cube, it's important to check which products are available. You can do this with the `list_products()` method provided by the `Datacube` class. Run the cell and identify the available Sentinel 1 products, which should contain 's1' in the name. You can scroll across the table for additional information about each available product.

In [None]:
dc.list_products()

### Specify product information

Before loading the data, you'll need to specify which product you want to load. You should have found one Sentinel 1 product in the product list. In the next cell, replace `product_name` with the name you found in the available products list. You'll need to keep the quotation marks.

In [None]:
product_information = dict(product = "product_name",
                           output_crs = "EPSG:4326",
                           resolution = (0.00013557119,0.00013557119))

In [None]:
# Answer -- remove later. Keep for running purposes.

product_information = dict(product = "s1_gamma0_geotif_scene",
                           output_crs = "EPSG:4326",
                           resolution = (0.00013557119,0.00013557119))
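The `resolution` value is given in degrees because the output CRS is `EPSG:4326`. As a rough sanity check, the sketch below converts it to metres. It assumes a spherical-Earth approximation of about 111.32 km per degree, so treat the result as indicative only; the variable names are made up for this illustration.

```python
import math

# Values copied from the notebook cells above.
latitude = (-11.287611, -11.085876)
resolution_deg = 0.00013557119

# Assumption: spherical-Earth approximation of ~111.32 km per degree.
METRES_PER_DEGREE = 111_320.0

mid_latitude = sum(latitude) / 2
pixel_height_m = resolution_deg * METRES_PER_DEGREE
# Lines of longitude converge towards the poles, so scale by cos(latitude).
pixel_width_m = pixel_height_m * math.cos(math.radians(mid_latitude))

print(f"Pixel size: about {pixel_height_m:.1f} m (N-S) by {pixel_width_m:.1f} m (E-W)")
```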

### Specify latitude and longitude information

We can specify the latitude and longitude bounds of our area using the variables we defined earlier in the notebook.

In [None]:
area_information = dict(latitude = latitude,
                        longitude = longitude) 

### Load Data

You might have noticed that we defined the product and area information a little differently than we did in other notebooks. Above, we specified the information in two dictionaries, which the `dc.load()` function can access by including `**` before the name of each dictionary, as demonstrated in the next cell.

*Note that the load command will return an error if you have provided an incorrect product name in the* `product_information` *dictionary. If you see such an error, check that you correctly specified the name of the Sentinel 1 data product.*

In [None]:
dataset = dc.load(**product_information, **area_information)
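If the `**` syntax is new to you, the sketch below shows it in isolation. `describe_query` is a hypothetical stand-in written for this illustration, not part of the Open Data Cube; it only echoes its keyword arguments so you can see that unpacking two dictionaries is equivalent to typing each keyword out by hand.

```python
product_information = dict(product="product_name",
                           output_crs="EPSG:4326",
                           resolution=(0.00013557119, 0.00013557119))
area_information = dict(latitude=(-11.287611, -11.085876),
                        longitude=(130.324262, 130.452652))


def describe_query(product, output_crs, resolution, latitude, longitude):
    """Hypothetical helper: report the keyword arguments it received."""
    return f"{product} in {output_crs} at {resolution}, lat {latitude}, lon {longitude}"


# `**` expands each dictionary into keyword arguments.
unpacked = describe_query(**product_information, **area_information)

# The same call with every keyword written out explicitly.
explicit = describe_query(product="product_name",
                          output_crs="EPSG:4326",
                          resolution=(0.00013557119, 0.00013557119),
                          latitude=(-11.287611, -11.085876),
                          longitude=(130.324262, 130.452652))

print(unpacked == explicit)
```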

If the load was successful, running the next cell should return the `xarray` summary of the dataset. Make a note of the dimensions and data variables, as you'll need them during the data preparation and analysis.

In [None]:
dataset

## Visualize loaded data

Sentinel 1 data has two bands, *VV* and *VH*, which correspond to the polarisation of the radar signal sent and received by the satellite. *VV* refers to the satellite sending out a vertically-polarised signal and receiving a vertically-polarised signal back, whereas *VH* refers to the satellite sending out a vertically-polarised signal and receiving a horizontally-polarised signal back. These two bands can tell us different things about the area we're studying. 

Before running any plotting commands, we'll load the *matplotlib* library in the cell below, along with the *numpy* library. We'll also make use of the in-built plotting functions from *xarray*.

*Note that we take the base-10 logarithm of the bands before plotting them, so that we work on a logarithmic scale rather than in linear digital number (DN). Multiplying these values by 10 would give decibels (dB).*

In [None]:
import matplotlib.pyplot as plt
import numpy as np
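The logarithmic scaling can be checked on a few made-up backscatter values. Conventionally, decibels are 10 times the base-10 logarithm of the linear value, so the plain `np.log10` used in this notebook gives numbers ten times smaller than the dB figure. The input values below are invented for illustration only.

```python
import numpy as np

# Made-up linear backscatter values, for illustration only.
linear = np.array([0.001, 0.01, 0.1, 1.0])

log_scaled = np.log10(linear)      # the scaling used in this notebook
decibels = 10 * np.log10(linear)   # the conventional decibel value

for lin, log_value, db in zip(linear, log_scaled, decibels):
    print(f"linear={lin:<6} log10={log_value:5.1f} dB={db:6.1f}")
```

On this scale, a `log10` value of -2 corresponds to -20 dB.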

### Visualize VH bands

In [None]:
# Plot all VH observations for the year 

converted_vh = np.log10(dataset.vh)  # Log-scale the data for plotting

converted_vh.plot(cmap="Blues", col="time", col_wrap=5)
plt.show()

In [None]:
# Plot the average of all VH observations

mean_converted_vh = converted_vh.mean(dim = "time")

fig = plt.figure(figsize=(7,9))
mean_converted_vh.plot(cmap = "Blues")
plt.title("Average VH")
plt.show()

What key differences do you notice between each individual observation and the mean?
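One way to build intuition for this question is to simulate it. The sketch below is not Sentinel 1 data: it assumes a constant true backscatter and models speckle as multiplicative exponential noise with a mean of 1, which is a common simplification for single-look intensity images. It then compares the spread of a single observation with the spread of the time average.

```python
import numpy as np

rng = np.random.default_rng(0)

n_times, height, width = 25, 50, 50
true_backscatter = 0.05  # made-up constant value for a uniform surface

# Assumption: speckle is multiplicative, exponentially distributed, mean 1.
speckle = rng.exponential(scale=1.0, size=(n_times, height, width))
observations = true_backscatter * speckle

single_std = observations[0].std()          # spread within one observation
mean_std = observations.mean(axis=0).std()  # spread within the time average

print(f"std of one observation: {single_std:.4f}")
print(f"std of the time mean:   {mean_std:.4f}")
```

Averaging independent observations reduces the noise, which is why the mean image tends to look smoother than any individual one.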

### Visualize VV bands  

We've provided two empty cells for you to perform the same analysis as above, but now for the *VV* band. Try and type the code out -- it will help you get better at using the Open Data Cube library!

In [None]:
# Plot all VV observations for the year



In [None]:
# Answer -- remove later. Keep for running purposes.

converted_vv = np.log10(dataset.vv)  # Log-scale the data for plotting

converted_vv.plot(cmap="Blues", col="time", col_wrap=5)
plt.show()

In [None]:
# Plot the average of all VV observations



In [None]:
# Answer -- remove later. Keep for running purposes.

mean_converted_vv = converted_vv.mean(dim = "time")

fig = plt.figure(figsize=(7,9))
mean_converted_vv.plot(cmap = "Blues")
plt.title("Average VV")
plt.show()

What key differences do you notice between each individual observation and the mean? What about differences between the average *VH* and *VV* bands?

Take a look back at the map image to remind yourself of the shape of the land and water of our study area. In both bands, what distinguishes the land and the water?

## Preprocessing the data through filtering

### Speckle Filtering using Lee Filter

You may have noticed that the water in the individual *VV* and *VH* images isn't a consistent colour. The distortion you're seeing is a type of noise known as speckle, which gives the images a grainy appearance. If we want to be able to easily decide whether any particular pixel is water or land, we need to reduce the chance of misinterpreting a water pixel as a land pixel due to the noise.

Speckle can be reduced through filtering. If interested, you can find a technical introduction to speckle filtering [here](https://earth.esa.int/documents/653194/656796/Speckle_Filtering.pdf). For now, it is enough to know that we can filter the data using the Python function defined in the next cell:

In [None]:
# Adapted from https://stackoverflow.com/questions/39785970/speckle-lee-filter-in-python

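from scipy.ndimage import uniform_filter, variance

def lee_filter(da, size):
    """Apply a Lee speckle filter to a 2D DataArray using a size x size window."""
    img = da.values
    # Local mean and local variance within each window
    img_mean = uniform_filter(img, (size, size))
    img_sqr_mean = uniform_filter(img**2, (size, size))
    img_variance = img_sqr_mean - img_mean**2

    # Variance of the whole image, used here as the noise estimate
    overall_variance = variance(img)

    # Weights near 0 return the local mean (smooth areas); weights near 1 keep the original pixel (edges)
    img_weights = img_variance / (img_variance + overall_variance)
    img_output = img_mean + img_weights * (img - img_mean)
    return img_output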

Now that we've defined the filter, we can run it on the *VV* and *VH* data. You might have noticed that the function takes a `size` argument. This sets the width of the filter window in pixels; a larger window blurs the image more.

In [None]:
# Set any null values to 0 before applying the filter to prevent issues
dataset_zero_filled = dataset.where(~dataset.isnull(), 0)

# Create a new entry in dataset corresponding to filtered VV and VH data
dataset["filtered_vv"] = dataset_zero_filled.vv.groupby('time').apply(lee_filter, size=7)
dataset["filtered_vh"] = dataset_zero_filled.vh.groupby('time').apply(lee_filter, size=7)
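To see how the `size` argument changes the smoothing, you can try the filter on a synthetic image. The sketch below is an approximation rather than the notebook's exact function: `box_mean` is a NumPy-only stand-in for SciPy's `uniform_filter`, and the "water" and "land" values and the gamma-distributed speckle are assumptions made for this illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_mean(img, size):
    """Mean over a size x size window (odd size); stand-in for uniform_filter."""
    pad = size // 2
    padded = np.pad(img, pad, mode="symmetric")
    return sliding_window_view(padded, (size, size)).mean(axis=(-2, -1))


def lee_filter_sketch(img, size):
    """Same arithmetic as the Lee filter above, applied to a plain 2D array."""
    img_mean = box_mean(img, size)
    img_variance = box_mean(img**2, size) - img_mean**2
    overall_variance = img.var()
    weights = img_variance / (img_variance + overall_variance)
    return img_mean + weights * (img - img_mean)


rng = np.random.default_rng(1)

# Synthetic scene: dark "water" on the left, brighter "land" on the right.
base = np.full((64, 64), 0.01)
base[:, 32:] = 0.1
noisy = base * rng.gamma(shape=4.0, scale=0.25, size=base.shape)

# Measure the remaining noise well inside the water region.
water = (slice(None), slice(0, 24))
original_spread = noisy[water].std()
spread = {size: lee_filter_sketch(noisy, size)[water].std() for size in (3, 7)}

print(f"unfiltered: {original_spread:.5f}")
for size, value in spread.items():
    print(f"size={size}:     {value:.5f}")
```

In this synthetic case the larger window leaves less residual noise over the water, at the cost of more blurring near the land boundary.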

### Visualize Filtered VH bands

In [None]:
# Plot all filtered VH observations for the year 

converted_filtered_vh = np.log10(dataset.filtered_vh)  # Log-scale the data for plotting

converted_filtered_vh.plot(cmap="Blues", col="time", col_wrap=5)
plt.show()

In [None]:
# Plot the average of all filtered VH observations

mean_converted_filtered_vh = converted_filtered_vh.mean(dim = "time")

fig = plt.figure(figsize=(7,9))
mean_converted_filtered_vh.plot(cmap = "Blues")
plt.title("Average filtered VH")
plt.show()

### Visualize Filtered VV bands

In [None]:
# Plot all filtered VV observations for the year



In [None]:
# Answer -- remove later. Keep for running purposes.

converted_filtered_vv = np.log10(dataset.filtered_vv)  # Log-scale the data for plotting

converted_filtered_vv.plot(cmap="Blues", col="time", col_wrap=5)
plt.show()

In [None]:
# Plot the average of all filtered VV observations



In [None]:
# Answer -- remove later. Keep for running purposes.

mean_converted_filtered_vv = converted_filtered_vv.mean(dim = "time")

fig = plt.figure(figsize=(7,9))
mean_converted_filtered_vv.plot(cmap = "Blues")
plt.title("Average filtered VV")
plt.show()

Now that you've finished filtering the data, compare the plots before and after filtering; you should be able to see its impact. If you're having trouble spotting it, it's more noticeable in the *VH* band. 

### Observing VV and VH histograms

Another way to observe the impact of filtering is to view histograms of the pixel values before and after filtering.

In [None]:
fig = plt.figure(figsize = (15,3))
_ = np.log10(dataset.filtered_vv).plot.hist(bins = 1000, label = "VV filtered")
_ = np.log10(dataset.vv).plot.hist(bins = 1000, label = "VV", alpha = .5)
plt.legend()
plt.title("Comparison of filtered VV bands to original") 
plt.show()

In [None]:
fig = plt.figure(figsize = (15,3))
_ = np.log10(dataset.filtered_vh).plot.hist(bins = 1000, label = "VH filtered")
_ = np.log10(dataset.vh).plot.hist(bins = 1000, label = "VH", alpha = .5)
plt.legend()
plt.title("Comparison of filtered VH bands to original")
plt.show()

## Designing a threshold-based water classifier

A 2D visualization of the imagery alone suggests a stark contrast between `land` and `water` values.    
The histogram of the filtered S1 data shows a clear bimodal distribution in the `filtered_vh` values.   

In this section, a classifier is built based on a static threshold on `filtered_vh` values.  

$$ threshold = -2.0 $$

In [None]:
threshold = -2.0

The classifier separates data into two classes, data above, and data below the threshold. An assumption is made that values of both segments correspond to the same `water` and `not water` distinctions we make visually.  


<br>  

$$  water(Dataset) = \left\{
     \begin{array}{lr}
       True & :   \log_{10}(Dataset_{VH}) < threshold\\
       False & :   \log_{10}(Dataset_{VH}) \ge threshold
     \end{array}
   \right. $$  

<br>
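The rule can be tried on a handful of made-up values before applying it to the real data. This sketch uses the strict comparison that the classifier code in this notebook uses; the `filtered_vh` numbers are invented for illustration.

```python
import numpy as np

threshold = -2.0

# Made-up filtered VH values in linear units, for illustration only.
filtered_vh = np.array([[0.001, 0.004, 0.020],
                        [0.003, 0.050, 0.100]])

# Pixels whose log10 value falls below the threshold are labelled water.
is_water = np.log10(filtered_vh) < threshold

print(is_water)
```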


### Visualize threshold

In [None]:
fig = plt.figure(figsize = (15,3))
plt.axvline(x=threshold, label='Threshold at {}'.format(threshold), color = "red")
_ = np.log10(dataset.filtered_vh).plot.hist(bins = 1000, label = "VH filtered")
_ = np.log10(dataset.vh).plot.hist(bins = 1000, label = "VH", alpha = .5)
plt.legend()
plt.title("Histogram Comparison of filtered VH bands to original") 

In [None]:
fig, ax = plt.subplots(figsize = (15,3))
_ = np.log10(dataset.filtered_vh).plot.hist(bins = 1000, label = "VH filtered")
ax.axvspan(xmin=threshold, xmax = -.5, alpha=0.25, color='red', label = "Not Water")
ax.axvspan(xmin=-3.5, xmax = threshold, alpha=0.25, color='green', label = "Water")
plt.legend()
plt.title("Filtered VH histogram split into water and not-water classes") 

## Coding the classifier

In [None]:
import numpy as np
import xarray as xr 

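def s1_water_classifier(ds: xr.Dataset, threshold: float = -2.0) -> xr.Dataset:
    """Classify water as pixels whose log10 of Lee-filtered VH falls below `threshold`."""
    assert "vh" in ds.data_vars, "This classifier expects a variable named `vh` expressed in DN, not dB values"
    # Zero-fill nulls first, as in the filtering step above; a single NaN would
    # otherwise make the overall variance NaN and blank the whole time slice.
    filtered = ds.vh.fillna(0).groupby('time').apply(lee_filter, size=7)
    water_data_array = np.log10(filtered) < threshold
    return water_data_array.to_dataset(name = "s1_wofs")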

## Running the classifier

In [None]:
dataset["s1_wofs"] = s1_water_classifier(dataset).s1_wofs

## Validation

### Water Classification Frequency

In [None]:
plt.figure(figsize = (15,12))
dataset.s1_wofs.mean(dim = "time").plot(cmap = "jet_r")

> #### Interpretation and Ideas: 

- Classifications are fairly consistent inland and offshore.  
- The coastline is not consistently classified as water.
- Check the variance.
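The frequency map is simply the mean of the boolean classifications along the time axis. A tiny hand-made stack (four time steps of a 2 x 3 image, values invented for illustration) shows what the numbers mean:

```python
import numpy as np

# Made-up classifications with shape (time, y, x); True means water.
classifications = np.array([
    [[True, True, False], [True, False, False]],
    [[True, False, False], [True, False, False]],
    [[True, True, False], [True, True, False]],
    [[True, False, False], [True, False, False]],
])

# Booleans average as 0 and 1, giving the fraction of time each pixel is water.
water_frequency = classifications.mean(axis=0)

print(water_frequency)
```

A value of 1 means the pixel was classified as water at every time step, 0 means never, and anything in between marks a pixel whose class changes.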

### Water Classification Standard Deviation

In [None]:
plt.figure(figsize = (15,12))
dataset.s1_wofs.std(dim = "time").plot(cmap = "jet")

> #### Interpretation and Ideas: 

- Variance can capture long-term trends such as coastal erosion or degradation, but it may also capture noise.  
  Take, for example, an alternating sequence of classifications $ts_1 = [0,1,0,1,...,0,1, 0, 1]$ and the step sequence $ts_2 = [0,0,0,0,...,1,1,1,1]$.    
  If both contain the same number of 0s and 1s, then $var(ts_1) = var(ts_2)$, even though the former reflects frequent alternation between water and not-water while the latter reflects a lasting transition. 
  
- The coastline is not consistently classified as water.
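The two example sequences can be checked numerically. The sketch below uses eight-element versions of each and also counts the number of state changes, which is one simple (assumed, not from this notebook) way to tell the two patterns apart when the variance cannot.

```python
import numpy as np

ts_1 = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # frequent alternation
ts_2 = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # one lasting transition

# Both sequences hold four 0s and four 1s, so their variances match.
var_1, var_2 = ts_1.var(), ts_2.var()

# Counting the changes between consecutive time steps separates them.
changes_1 = np.count_nonzero(np.diff(ts_1))
changes_2 = np.count_nonzero(np.diff(ts_2))

print(var_1, var_2)
print(changes_1, changes_2)
```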

## Detecting Coastal Change 

### Simple Differencing Approach

In [None]:
t1 = 0
t2 = 26

In [None]:
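# Cast the boolean classifications to integers first; NumPy does not support subtracting boolean arrays
water = dataset.s1_wofs.astype("int8")
change = water.isel(time = t1) - water.isel(time = t2)
change = change.where(change != 0)  # Mask unchanged pixels
dataset["change"] = change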
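To see what the differencing produces, here is the same idea on two small hand-made masks. The values are invented for illustration, and the masks are cast to integers before subtracting.

```python
import numpy as np

# Made-up water masks for two dates; True means water.
water_t1 = np.array([[True, True, False],
                     [True, False, False]])
water_t2 = np.array([[True, False, False],
                     [True, True, False]])

difference = water_t1.astype(int) - water_t2.astype(int)

# Keep only the changed pixels; unchanged pixels become NaN so they are not plotted.
change = np.where(difference != 0, difference.astype(float), np.nan)

print(difference)
print(change)
```

A value of 1 marks a pixel that was water at the first date but not the second, and -1 marks the reverse.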

In [None]:
plt.figure(figsize = (15,12))
dataset.filtered_vh.mean(dim = "time").plot(cmap = "Blues")
dataset.change.plot(cmap = "jet", levels = 2) 

## Autocorrelation

In [None]:
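def rtk(ts: np.ndarray, k: int = 1) -> float:
    """Return the mean lag-k product of `ts`, i.e. the mean of ts[i] * ts[i - k] (requires k >= 1)."""
    # Pad the end of one copy and the start of the other so that element i of `a` lines up with element i - k of `b`
    a = np.append(np.array(ts).copy(),
                  np.zeros(k))
    
    b = np.append(np.zeros(k),
                  np.array(ts).copy())
    
    # Drop the k padded positions at each end, keeping only genuine pairs
    auto = (a * b)[k:-k]
    return np.mean(auto)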
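A few short sequences show how this lag-1 measure behaves. `lag_product_mean` below is a compact re-statement written for this illustration (it slices instead of padding) and is assumed to be equivalent to `rtk` for `k >= 1`.

```python
import numpy as np


def lag_product_mean(ts, k=1):
    """Mean of ts[i] * ts[i - k] over all valid i (requires k >= 1)."""
    ts = np.asarray(ts, dtype=float)
    return np.mean(ts[k:] * ts[:-k])


alternating = [0, 1, 0, 1, 0, 1, 0, 1]  # water never persists between steps
transition = [0, 0, 0, 0, 1, 1, 1, 1]   # water persists after the change
always_water = [1, 1, 1, 1, 1, 1, 1, 1]

r_alternating = lag_product_mean(alternating)
r_transition = lag_product_mean(transition)
r_always = lag_product_mean(always_water)

print(r_alternating, r_transition, r_always)
```

The measure is highest when water persists from one time step to the next, so it separates the alternating sequence from the lasting transition even though their variances are equal.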

In [None]:
auto_correlation = np.apply_along_axis(rtk,0,dataset.s1_wofs)

In [None]:
auto_correlation_ds = xr.DataArray(auto_correlation,
                                   coords = dict((k, dataset[k].values) for k in ('latitude', 'longitude')),
                                   dims = ('latitude', 'longitude'))

In [None]:
freq = dataset.s1_wofs.mean(dim = "time")
varying_pixels = np.logical_and(freq != 0, freq != 1) 

In [None]:
fig = plt.figure(figsize = (15,3))
_ = auto_correlation_ds.where(varying_pixels).plot.hist(bins = 256)
plt.title("Histogram of autocorrelation") 


In [None]:
plt.figure(figsize = (15,12))
dataset.filtered_vh.mean(dim = "time").plot(cmap = "Blues")
dataset.change.where(auto_correlation_ds > 0.8).plot(cmap = "jet", levels = 2)