# Data wrangling: raster data 

This lab will demonstrate a range of techniques for wrangling (transforming) raster data. This will include techniques to tidy, check, and visualise data; data subsetting; and various operations to transform image pixel values or summarise raster datasets. 

Here, you will work with two Sentinel-2 satellite images captured before and after <a href="https://emergency.copernicus.eu/mapping/list-of-components/EMSR489" target="_blank">Tropical Cyclone Yasa struck Fiji in December 2020</a>. Tropical Cyclone Yasa made landfall in Fiji on 17-18 December, with heavy rain and storm surges causing flooding. You will convert these images into datasets that reflect the presence of water and moisture on the land surface and conduct a change detection exercise comparing pre- and post-event images to estimate the area of cyclone-induced flooding. Here, we'll be focusing on flood impacts on croplands surrounding Labasa on the island of Vanua Levu. 

![](https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/week-4-disaster-charter-sentinel-1-labasa.jpg)


## Setup

### Run the labs

You can run the labs locally on your machine or you can use cloud environments provided by Google Colab. **If you're working with Google Colab be aware that your sessions are temporary and you'll need to take care to save, backup, and download your work.**

<a href="https://colab.research.google.com/github/geog3300-agri3003/coursebook/blob/main/docs/notebooks/week-4_1.ipynb" target="_blank">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

### Download data

If you need to download the data for this lab, run the following code snippet. 

In [None]:
import os
import subprocess

if "data_lab-4_1" not in os.listdir(os.getcwd()):
    subprocess.run('wget "https://github.com/geog3300-agri3003/lab-data/raw/main/data_lab-4_1.zip"', shell=True, capture_output=True, text=True)
    subprocess.run('unzip "data_lab-4_1.zip"', shell=True, capture_output=True, text=True)
    if "data_lab-4_1" not in os.listdir(os.getcwd()):
        print("Has a directory called data_lab-4_1 been downloaded and placed in your working directory? If not, try re-executing this code chunk")
    else:
        print("Data download OK")

### Install packages

If you're working in Google Colab, you'll need to install the required packages that don't come with the Colab environment.

In [None]:
if 'google.colab' in str(get_ipython()):
    !pip install xarray[complete]
    !pip install rioxarray
    !pip install mapclassify
    !pip install rasterio

## What is data wrangling?

<a href="https://r4ds.had.co.nz/wrangle-intro.html" target="_blank">Wickham and Grolemund (2017)</a> and <a href="https://wesmckinney.com/book/" target="_blank">McKinney (2022)</a> state that data wrangling consists of data import, data cleaning, and data transformation. 

#### Data import

Data import was covered in week 3 with examples of how to read tabular, vector, and raster data into Python programs. 

#### Data cleaning

Data cleaning includes handling outliers and missing data. Here, we'll cloud mask Sentinel-2 remote sensing images, which is a data cleaning exercise. 

#### Data transformation

<a href="https://wesmckinney.com/book/" target="_blank">McKinney (2022)</a> defines data transformation as mathematical or statistical operations applied to data to generate new datasets. Data transformation can also include operations that reshape datasets or combine two or more datasets.

<details>
    <summary><b>Detailed notes on data transformation for spatial and non-spatial data</b></summary>
<p></p>
As we're working with spatial and non-spatial data, we can categorise data transformation operations as attribute operations, spatial operations, geometry operations, and raster-vector operations (<a href="https://geocompr.robinlovelace.net/index.html" target="_blank">Lovelace et al. (2022)</a>).
<p></p>
    
**Attribute operations** are applied to non-spatial data (attribute data). This could be a tabular dataset without any spatial information, the attribute table of a vector dataset, or the pixel values of a raster dataset. Common attribute operations include:

* Selecting columns from a table based on a condition. 
* Selecting (subsetting) pixels from a raster based on a condition.
* Filtering rows from a table based on a condition. 
* Creating a new column of values using a function applied to existing data.
* Computing summary statistics of columns in a table or of pixel values in a raster.
* Joining datasets based on matching values in columns (keys).

**Spatial operations** transform data using the data's geographic information including shape and location. Vector spatial operations include:

* Spatial subsetting by selecting data points based on a geographic condition (e.g. selecting all fields in Western Australia).
* Spatial joins where datasets are combined based on their relationship in space. 
* Spatial aggregation where summaries are produced for regions (e.g. the average crop yield for all fields in a region).

Spatial operations on raster data are based on map algebra concepts and include:

* Local operations which are applied on a pixel by pixel basis (e.g. converting a raster of temperature values in °F to °C).
* Focal operations which summarise or transform a raster value using the values of neighbouring pixels (e.g. computing the average value within a 3 x 3 pixel moving window).
* Zonal operations which summarise or transform raster values using values inside an irregular shaped zone.
* Global operations which summarise the entire raster (e.g. computing the minimum value in the raster dataset). 

**Geometry operations** transform a dataset's geographic information. Common geometry operations for vector data include:

* Simplification of shapes.
* Computing the centroid of polygons.
* Clipping (subsetting) of geometries based on their intersection or relationship with another geometry. 

and geometry operations on raster data typically involve changing the spatial resolution and include:

* Aggregation or disaggregation.
* Resampling.

**Raster-vector operations** involve both raster and vector datasets and include:

* Cropping or masking raster data using a vector geometry.
* Extracting raster values that intersect with a vector geometry.
* Rasterisation where a vector dataset is transformed to a raster layer.
* Vectorisation where a raster dataset is transformed to a vector layer.
</details>

<p></p>

In this lab we're focusing on data transformation operations applied to raster data. Raster data breaks the Earth's surface up into a grid of cells (pixels). Each pixel is assigned a value that corresponds to the geographic feature or phenomenon of interest. In particular, we'll be working with remote sensing images where each pixel in a two dimensional raster stores a reflectance value (i.e. how much incoming light was reflected off the portion of the Earth's land surface that the pixel represents). 

To complete the task of creating a map of flooding following Tropical Cyclone Yasa, we will apply a range of subsetting, map algebra, and geometry operations to transform multispectral Sentinel-2 remote sensing images into a flood map.  

### Import modules

In [None]:
import os
import time

import rioxarray as rxr
import xarray as xr
import plotly.express as px
import numpy as np
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors
import stackstac
from collections import OrderedDict

import plotly.io as pio

# setup renderer
if 'google.colab' in str(get_ipython()):
    pio.renderers.default = "colab"
else:
    pio.renderers.default = "jupyterlab"

In [None]:
data_path = os.path.join(os.getcwd(), "data_lab-4_1")

### Lab data

Let's quickly inspect the files that we have downloaded for this lab.

In [None]:
os.listdir(data_path)

* `s2_tc_yasa_pre_event.tif` is a four-band Sentinel-2 image from 25 October 2020. The four bands are blue, green, red, and near infrared (NIR).
* `s2_tc_yasa_pre_event_cloud_probability.tif` is a cloud probability raster for the Sentinel-2 image from 25 October 2020. Each pixel has a value between 0 and 100 indicating the probability of that pixel being cloud covered.
* `s2_tc_yasa_post_event.tif` is a four-band Sentinel-2 image from 19 December 2020. The four bands are blue, green, red, and near infrared (NIR).
* `s2_tc_yasa_post_event_cloud_probability.tif` is a cloud probability raster for the Sentinel-2 image from 19 December 2020. Each pixel has a value between 0 and 100 indicating the probability of that pixel being cloud covered.

The cloud probability rasters are generated using <a href="https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_CLOUD_PROBABILITY#description" target="_blank">Sentinel Hub's sentinel2-cloud-detector machine learning algorithm</a>.

## Xarray refresher

<a href="https://docs.xarray.dev/en/stable/" target="_blank">Xarray</a> `DataArray` data structures are objects that store multidimensional arrays of raster values along with metadata that describes those values. `xarray.DataArray` objects have the following properties:

* `values`: the multidimensional array of raster values
* `dims`: a tuple of names for the dimensions of the array (e.g. instead of referring to axis 0 as the 0th (row) dimension of an array, that dimension can have a descriptive label such as latitude)
* `coords`: a dict-like container of array-like objects that describe the location of each array element along a dimension (e.g. a 1D array of latitude values describing the location on the Earth's surface of each row in the array)
* `attrs`: a `dict` of metadata attributes describing the dataset

`xarray.DataArray` objects can be stored within a larger container called an `xarray.Dataset`. An `xarray.Dataset` can store many `xarray.DataArray` objects that share `dims` and `coords`. This is useful if you have arrays of different variables that correspond to the same locations and time periods (e.g. you could have separate arrays for temperature and precipitation values organised within a single `xarray.Dataset`).

![](https://docs.xarray.dev/en/stable/_images/dataset-diagram.png)

*Schematic of an xarray.Dataset (source: xarray Getting Started)*
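As a quick illustration of these properties, the sketch below builds a small `xarray.DataArray` by hand and then combines it with a second array into an `xarray.Dataset`. The variable names, values, and coordinates are made up for illustration. In this lab, `rioxarray` creates these objects for us when we read GeoTIFF files.

```python
import numpy as np
import xarray as xr

# A tiny, made-up 2 x 3 grid of temperature values (illustrative only)
temp = xr.DataArray(
    np.array([[20.1, 21.3, 22.0],
              [19.8, 20.5, 21.7]]),
    dims=("y", "x"),  # dimension names
    coords={"y": [-16.40, -16.41], "x": [179.30, 179.31, 179.32]},  # labels along each dimension
    attrs={"units": "degC"},  # free-form metadata
)

# A second (made-up) variable on the same grid
precip = xr.full_like(temp, 5.0)
precip.attrs = {"units": "mm"}

print(temp.dims)    # dimension names
print(temp.coords)  # coordinate labels for y and x
print(temp.attrs)   # metadata

# Both arrays share dims and coords, so they can be stored in one Dataset
ds = xr.Dataset({"temperature": temp, "precipitation": precip})
print(ds)
```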

### Data input

The `rioxarray` package provides tools for reading and writing raster geospatial data files into `xarray.DataArray` objects.

Let's pass the path to a GeoTIFF file of the pre Tropical Cyclone Yasa Sentinel-2 image into the `rioxarray` `open_rasterio()` function:

In [None]:
pre_tc_yasa_s2_path = os.path.join(data_path, "s2_tc_yasa_pre_event.tif")
pre_s2 = rxr.open_rasterio(pre_tc_yasa_s2_path)

In [None]:
pre_s2

## Subsetting data

<a href="https://docs.xarray.dev/en/latest/user-guide/indexing.html#indexing-and-selecting-data" target="_blank">Subsetting data (or selecting data)</a> refers to operations that extract a subset of data from a larger dataset. Subsetting operations can be spatial or non-spatial. 

**Spatial subsetting** operations select data based on their location in space (e.g. extracting all paddocks that intersect with a farm boundary, or subsetting all remote sensing image pixels within a geographic region). 

**Non-spatial subsetting** operations select data based upon their attribute values or another non-spatial condition. For example, we could subset all values in a column of temperature measurements where the temperature was greater than 30 °C, or all temperature measurements that fall within a date range. We can also subset data based on their position within a dataset (e.g. select the first 10 rows of a DataFrame) or based on labels (e.g. select columns by column name). Subsetting data based on their position within the dataset is often referred to as indexing. 

`xarray.DataArray` objects support label-based and positional subsetting:

* With positional subsetting, we can access `xarray.DataArray` values by selecting them using `[]` and index positions. For example, `demo_ds[0:4, 0:3, 0:3]` would select the first four bands (along the 0th dimension), the first three rows (along the 1st dimension), and the first three columns (along the 2nd dimension) of the array.
* With label-based subsetting, we can use the `sel()` method to select elements from the array using dimension names and coordinate labels. For example, `demo_ds.sel(time="2020-01-31", band=[1, 2])` would select array values that correspond to 31 January 2020 and have band labels 1 and 2. 
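Here's a minimal sketch of positional and label-based subsetting on a small, made-up array. `demo_ds` is hypothetical: unlike the first bullet point, it has four dimensions (`time`, `band`, `y`, `x`) so that both examples work on the same object. `isel()` is xarray's method for positional selection using dimension names instead of axis order.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for demo_ds: 2 dates x 4 bands x 5 rows x 5 columns
demo_ds = xr.DataArray(
    np.arange(2 * 4 * 5 * 5).reshape(2, 4, 5, 5),
    dims=("time", "band", "y", "x"),
    coords={
        "time": pd.to_datetime(["2020-01-31", "2020-02-29"]),
        "band": [1, 2, 3, 4],
    },
)

# Positional subsetting with []: first date, first three bands, first two rows and columns
pos = demo_ds[0, 0:3, 0:2, 0:2]

# The same positional selection, but naming each dimension with isel()
pos_named = demo_ds.isel(time=0, band=slice(0, 3), y=slice(0, 2), x=slice(0, 2))

# Label-based subsetting with sel(): match coordinate values, not positions
lab = demo_ds.sel(time="2020-01-31", band=[1, 2])
print(lab.dims, lab.shape)
```

Selecting a single date drops the `time` dimension, while selecting a list of bands keeps the `band` dimension.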

**It is recommended that you read the xarray guide on <a href="https://docs.xarray.dev/en/latest/user-guide/indexing.html#indexing-and-selecting-data" target="_blank">Subsetting data (or selecting data)</a>.**

To visualise the pre Tropical Cyclone Yasa Sentinel-2 image as an RGB image on our display, we need to select the red, green, and blue bands. This is a good use case for the `sel()` method. 

In [None]:
pre_s2.sel(band=[3, 2, 1]).plot.imshow(vmin=0, vmax=2000, add_labels=False, aspect=3, size=4)

#### Recap quiz

**Can you visualise the `pre_s2` image as a false colour composite where near infrared is rendered as red, red is rendered as green, and green is rendered as blue on the display?**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

```python
# false colour composite: NIR (band 4) as red, red (band 3) as green, green (band 2) as blue
pre_s2.sel(band=[4, 3, 2]).plot.imshow(vmin=0, vmax=2000, add_labels=False, aspect=3, size=4)
```
</details>

As discussed in previous labs, one of the nice features of `xarray.DataArray` objects is the ability to store information about the array data as dimensions with associated coordinates. Let's add a time dimension to store the date of image capture. This is important as our workflow involves change detection, so it's useful to keep track of which array corresponds to the image captured before Tropical Cyclone Yasa and which to the image captured after it. 

In [None]:
# add time dimension and coords
pre_s2 = pre_s2.expand_dims(dim={"time": [pd.to_datetime("2020-10-25")]}, axis=0)
pre_s2

## Map algebra

Following <a href="https://geocompr.robinlovelace.net/index.html" target="_blank">Lovelace et al. (2022)</a>, we refer to map algebra as operations that transform raster pixel values via statistical or mathematical operations which can involve combining pixel values from different raster layers or using neighbouring raster values. 

<a href="https://geocompr.robinlovelace.net/index.html" target="_blank">Lovelace et al. (2022)</a> outline four different types of map algebra operations:

### Local operations

**Local** map algebra operations operate on a pixel by pixel basis; the mathematical operation is applied independently to each pixel without reference to neighbouring pixel values. For example, addition, subtraction, multiplication, and logical operations can all be applied on a pixel by pixel basis. 

Commonly used local operations when working with remote sensing data are computing spectral indices or masking out cloudy pixels. Spectral indices are pixel by pixel mathematical combinations of spectral reflectance in different wavelengths that are used to monitor vegetation or land surface conditions. Read <a href="https://doi.org/10.1038/s43017-022-00298-5" target="_blank">Zeng et al. (2022)</a> for a review of vegetation indices.

The normalised difference vegetation index (NDVI) is used for tracking vegetation condition and representing the greenness of vegetation in a remote sensing image. 

The NDVI is computed as:

$NDVI=\frac{NIR-red}{NIR+red}$

Thus, the NDVI is computed via division, subtraction, and addition operations computed on a pixel by pixel basis using raster data corresponding to red and near infrared reflectance. 
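As a minimal sketch of a local operation, here's the NDVI computed for two made-up 2 x 2 rasters of red and NIR reflectance. Because xarray arithmetic is applied element by element, each output pixel depends only on the red and NIR values at the same location.

```python
import numpy as np
import xarray as xr

# Made-up 2 x 2 reflectance rasters (values between 0 and 1)
red = xr.DataArray(np.array([[0.05, 0.10], [0.20, 0.04]]), dims=("y", "x"))
nir = xr.DataArray(np.array([[0.45, 0.30], [0.25, 0.02]]), dims=("y", "x"))

# Pixel-by-pixel subtraction, addition, and division: a local operation
ndvi = (nir - red) / (nir + red)
print(ndvi.round(2).values)
```

The top-left pixel (high NIR, low red) gets a high NDVI, typical of green vegetation, while the bottom-right pixel (NIR lower than red) gets a negative NDVI.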

### Focal / neighbourhood operations

Focal operations update a pixel's value using a combination of values from a regularly shaped neighbourhood centred on the focal pixel (e.g. a 3 x 3 or 5 x 5 pixel neighbourhood). An example of a focal operation is dilation, which assigns the focal pixel the maximum value found within the neighbourhood. Dilation is often used to extend the coverage of cloud masks to conservatively remove thin clouds and poor atmospheric conditions from remote sensing images. 
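Here's a minimal sketch of dilation as a focal operation, assuming a made-up 5 x 5 cloud mask (1 = cloudy, 0 = clear). In this sketch, xarray's `rolling()` method provides the 3 x 3 moving window; other tools (e.g. `scipy.ndimage`) could be used instead.

```python
import numpy as np
import xarray as xr

# Hypothetical 5 x 5 cloud mask: 1 = cloudy, 0 = clear, one cloudy pixel in the centre
cloud = xr.DataArray(np.zeros((5, 5)), dims=("y", "x"))
cloud[2, 2] = 1

# Dilation: each pixel takes the maximum value in its 3 x 3 neighbourhood.
# center=True centres the window on the focal pixel; min_periods=1 lets
# edge pixels use the partial window instead of returning NaN.
dilated = cloud.rolling(y=3, x=3, center=True, min_periods=1).max()
print(dilated.values.astype(int))
```

The single cloudy pixel grows into a 3 x 3 block of cloudy pixels, so clear pixels next to a cloud are also masked.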

### Zonal operations

Zonal operations summarise raster values in a target layer using a categorical zones raster layer to identify zones. For example, suppose we have a raster layer of NDVI values and a zones layer where each pixel value represents a crop type. A zonal operation could compute the mean NDVI for each crop type. 
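Here's a minimal sketch of a zonal mean using xarray's `groupby()` method. The NDVI values and the crop-type zones (`crop`, with codes 1 and 2) are hypothetical. The zones array is given a name so xarray can use it to label the output.

```python
import numpy as np
import xarray as xr

coords = {"y": [0, 1], "x": [0, 1]}

# Hypothetical NDVI raster and a zones raster of crop-type codes on the same grid
ndvi = xr.DataArray(np.array([[0.2, 0.4], [0.6, 0.8]]), dims=("y", "x"), coords=coords)
crop = xr.DataArray(np.array([[1, 1], [2, 2]]), dims=("y", "x"), coords=coords, name="crop")

# Zonal mean: group NDVI pixels by crop code and average within each group
zonal_mean = ndvi.groupby(crop).mean()
print(zonal_mean.to_series())
```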

### Global operations

Global operations compute summary statistics for an entire raster layer. For example, we could have a raster layer indicating the presence of forest or non-forest areas. Counting the number of forest pixels and multiplying the count by the pixel area gives a global summary statistic: the area of forest cover. 
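A minimal sketch of this forest area calculation, assuming a made-up forest / non-forest raster with 10 m x 10 m pixels:

```python
import numpy as np
import xarray as xr

# Hypothetical forest / non-forest raster (True = forest)
forest = xr.DataArray(
    np.array([[True, False, True],
              [True, True, False]]),
    dims=("y", "x"),
)
pixel_area_m2 = 10 * 10  # assumed 10 m x 10 m pixels

# Global operation: one summary value for the whole raster
n_forest = int(forest.sum())  # True counts as 1, False as 0
forest_area_ha = n_forest * pixel_area_m2 / 10_000  # 1 ha = 10,000 m²
print(n_forest, forest_area_ha)
```

With real georeferenced rasters, the pixel size should come from the data rather than being assumed (e.g. rioxarray's `rio.resolution()` returns the pixel width and height in CRS units).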

## Local operations

As mentioned above, local operations perform mathematical operations on arrays on a pixel by pixel basis. To identify and compute the area of cropland flooded by Tropical Cyclone Yasa, we will need to perform a range of local operations:

1) Convert a cloud probability raster into a binary cloud mask (i.e. set all pixels to `True` where cloud probability is less than 50%).
2) Use the cloud mask to mask all cloudy pixels in Sentinel-2 multispectral satellite images (i.e. set all pixels to no data where the cloud mask is `False` - cloudy pixels).
3) Compute a spectral index, the normalised difference water index (NDWI), using bands of green and near infrared reflectance to highlight water and moisture presence.
4) Compute a difference image using the before and after Tropical Cyclone Yasa NDWI images to identify locations of change which are indicative of flood impacts.
5) Threshold the difference image to identify pixels where a large change in NDWI occurred to represent an estimate of flood extent.
6) Mask out non-cropland pixels using a land cover map to return an array of flooded cropland pixels.

An example of a local operation is dividing each pixel value by 10000 (or multiplying by 0.0001). This is necessary because the data represent spectral reflectance, so they should have values between 0 and 1 (i.e. the ratio of light reflected off the land surface to incoming light in a spectral band). If you look at the <a href="https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR_HARMONIZED#bands" target="_blank">information page for the Sentinel-2 remote sensing data</a> we are using here, you can see it has a scale factor of 0.0001. This means the reflectance data has been scaled by multiplying it by 10000 so it can be stored as an integer type. We need to convert it back to reflectance units. 

In [None]:
pre_s2 = pre_s2 / 10000
pre_s2

### Cloud masking

As you can see in the RGB image above, there are clouds obscuring parts of the land surface in the pre Tropical Cyclone Yasa Sentinel-2 image. We'll need to mask out those clouds. Let's start by loading a cloud probability raster layer where each pixel value is a cloud probability score for the Sentinel-2 image taken before Tropical Cyclone Yasa. 

In [None]:
pre_s2_cloud_prob_path = os.path.join(data_path, "s2_tc_yasa_pre_event_cloud_probability.tif")
pre_s2_cloud_prob = rxr.open_rasterio(pre_s2_cloud_prob_path)
pre_s2_cloud_prob

#### Recap quiz

**Can you use the `plot.imshow()` method of an `xarray.DataArray` object to visualise the cloud probability raster?**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

```python
pre_s2_cloud_prob.sel(band=1).plot.imshow(aspect=3, size=4)
```
</details>

Next, we need to convert the `pre_s2_cloud_prob` `xarray.DataArray` of cloud probability scores (i.e. values between 0 and 100) to a binary array where pixels have the value of `True` if they're not cloudy. We can use a cloud probability threshold of 50, where a value less than 50 indicates no cloud. This involves applying a less than (`<`) operation to every pixel in `pre_s2_cloud_prob`.

In [None]:
pre_s2_cloud_mask = pre_s2_cloud_prob < 50
pre_s2_cloud_mask

We can plot the cloud mask `pre_s2_cloud_mask` to see which pixels are clear and which are cloudy. Note that `True` values will be rendered as 1 and `False` as 0. 

In [None]:
pre_s2_cloud_mask.sel(band=1).plot.imshow(aspect=3, size=4)

Now, use the cloud mask to set all cloudy pixels in the `pre_s2` `xarray.DataArray` of Sentinel-2 multispectral data to NaN (not a number, a no data indicator). We can use the <a href="https://docs.xarray.dev/en/stable/user-guide/indexing.html#masking-with-where" target="_blank">`where()` method</a> of `xarray.DataArray` objects to do this. 

The `where()` method takes in an array of bool type values (`True`, `False`) and sets all pixels to NaN (no data) where the value passed into `where()` is `False`. Or, `where()` can take a conditional or comparison statement that evaluates to `True` or `False`.
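The sketch below shows both uses of `where()` on a small, made-up array (`refl` and `clear` are hypothetical). It also shows that a two-dimensional mask (`y`, `x`) is matched to a three-dimensional array (`band`, `y`, `x`) by dimension name, which is why a single-band cloud mask can mask every band of an image.

```python
import numpy as np
import xarray as xr

# Made-up reflectance: 2 bands x 2 rows x 2 columns
refl = xr.DataArray(
    np.array([[[0.1, 0.2], [0.3, 0.4]],
              [[0.5, 0.6], [0.7, 0.8]]]),
    dims=("band", "y", "x"),
    coords={"band": [1, 2]},
)

# Clear-sky mask on the y / x grid only: True = keep, False = cloudy
clear = xr.DataArray(np.array([[True, False], [True, True]]), dims=("y", "x"))

# Keep values where the mask is True; NaN elsewhere.
# The 2-D mask is broadcast across the band dimension by dimension name.
masked = refl.where(clear)

# where() also accepts a comparison that evaluates to True / False
bright_only = refl.where(refl > 0.35)
print(masked.values)
```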

Let's apply the cloud mask to `pre_s2`. 

In [None]:
pre_s2_cm = pre_s2.where(pre_s2_cloud_mask.sel(band=1))

We also need to update the metadata for the `pre_s2_cm` `xarray.DataArray` object to identify the no data value. This is important for when we save the array data to a file (e.g. GeoTIFF file) to keep a record of which pixels have no data. We can do this using the `rio` accessor from <a href="https://corteva.github.io/rioxarray/stable/getting_started/nodata_management.html" target="_blank">rioxarray</a>.

In [None]:
pre_s2_cm.rio.write_nodata(np.nan, encoded=True, inplace=True)
# check no data value has been set
print(f"nodata: {pre_s2_cm.rio.nodata}")
print(f"encoded_nodata: {pre_s2_cm.rio.encoded_nodata}")
pre_s2_cm

#### Recap quiz

**How could you check that the cloud mask has been successfully applied?**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

You could print <code>pre_s2_cm</code> and inspect the array values to see if any have been converted to NaN. Or, better, you could plot the RGB image and see if the clouds have been masked out.

```python
pre_s2_cm.sel(time="2020-10-25", band=[3, 2, 1]).plot.imshow(vmin=0, vmax=0.2, add_labels=False, aspect=3, size=4)
```
</details>

#### Recap quiz

**Can you repeat the process we have gone through above to cloud mask the post Tropical Cyclone Yasa Sentinel-2 image?**

**First, you will need to create a path to the file `s2_tc_yasa_post_event.tif`, and then you will need to read the file into an `xarray.DataArray` object using rioxarray's `open_rasterio()` function. Use the variable name `post_s2` as a reference for the `xarray.DataArray` object.**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

```python
post_tc_yasa_s2_path = os.path.join(data_path, "s2_tc_yasa_post_event.tif")
post_s2 = rxr.open_rasterio(post_tc_yasa_s2_path)
```
</details>

**Next, you need to add a time dimension as the first (0th) axis. Set the time value to `"2020-12-19"`, which corresponds to the date of the Sentinel-2 image capture.**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

```python
# add time dimension and coords
post_s2 = post_s2.expand_dims(dim={"time": [pd.to_datetime("2020-12-19")]}, axis=0)
```
</details>

**Now, can you rescale the Sentinel-2 multispectral reflectance values to be between 0 and 1 by dividing all array values by 10000?**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

```python
# rescale reflectance to values between 0 and 1
post_s2 = post_s2 / 10000
```
</details>

**Can you read in the cloud probability raster and set all values less than 50 to `True`? Use the variable name `post_s2_cloud_mask` to refer to the cloud mask array.**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

```python
post_s2_cloud_prob_path = os.path.join(data_path, "s2_tc_yasa_post_event_cloud_probability.tif")
post_s2_cloud_prob = rxr.open_rasterio(post_s2_cloud_prob_path)
post_s2_cloud_mask = post_s2_cloud_prob < 50
```
</details>

**Finally, can you apply the cloud mask to `post_s2` using the `where()` method to set all cloudy pixels to NaN (no data)?**

**Remember to update the no data value in the `post_s2_cm` `xarray.DataArray` object's metadata.**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

```python
# apply cloud mask
post_s2_cloud_mask = post_s2_cloud_prob < 50
post_s2_cm = post_s2.where(post_s2_cloud_mask.sel(band=1))
post_s2_cm.rio.write_nodata(np.nan, encoded=True, inplace=True)
```
</details>

### Spectral indices

Spectral indices are mathematical combinations of spectral bands from remote sensing images to highlight features of interest on the land surface. To detect flooding associated with Tropical Cyclone Yasa we need to use a spectral index that's sensitive to the presence of water and moisture. We will compute the normalised difference water index (NDWI).

The NDWI is computed as:

$NDWI=\frac{Green-NIR}{Green+NIR}$

The NDWI ranges from -1 to 1, with positive values corresponding to the presence of water. The NDWI uses reflectance in the green and near infrared (NIR) portions of the electromagnetic spectrum. NIR light is absorbed by water and relatively more green light is reflected. It is this contrast between green and NIR reflectance that highlights water and moisture in images. 
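To see how the green / NIR contrast drives the sign of the NDWI, here's a minimal sketch with a small helper function (`ndwi`, written for illustration) applied to made-up reflectance values for a water pixel and a vegetated pixel.

```python
import numpy as np

def ndwi(green, nir):
    """Normalised difference water index, computed element-wise."""
    green = np.asarray(green, dtype=float)
    nir = np.asarray(nir, dtype=float)
    return (green - nir) / (green + nir)

# Illustrative (made-up) reflectance values
water = ndwi(0.08, 0.02)       # NIR mostly absorbed by water -> positive NDWI
vegetation = ndwi(0.08, 0.40)  # strong NIR reflectance -> negative NDWI
print(round(float(water), 2), round(float(vegetation), 2))
```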

#### Recap quiz

**Computing the NDWI is a local map algebra operation: the NDWI is computed for each pixel independently. Can you compute the NDWI using the arrays referenced by `pre_s2_cm` and `post_s2_cm`?**

**Assign the results to the variables `pre_s2_ndwi` and `post_s2_ndwi`. You will need to use the <a href="https://docs.xarray.dev/en/latest/generated/xarray.DataArray.sel.html" target="_blank">`.sel()` method</a> to select the bands that correspond to green and NIR reflectance. Green is band 2 and NIR is band 4.**

**If you need help using `sel()`, look at how it is used earlier in this notebook or refer to the xarray documentation.**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

```python
pre_s2_ndwi = (pre_s2_cm.sel(band=2) - pre_s2_cm.sel(band=4)) / (pre_s2_cm.sel(band=2) + pre_s2_cm.sel(band=4))
post_s2_ndwi = (post_s2_cm.sel(band=2) - post_s2_cm.sel(band=4)) / (post_s2_cm.sel(band=2) + post_s2_cm.sel(band=4))
```
</details>

#### Recap quiz

**Why do we need to use the `sel()` method to select the green and NIR bands?**

<details>
    <summary><b>answer</b></summary>

The NDWI equation subtracts, adds, and divides two-dimensional arrays (rasters) pixel by pixel (i.e. subtracting the NIR array from the green array and adding the green and NIR arrays). The Sentinel-2 arrays we're working with here store four bands (blue, green, red, and NIR) along a `band` dimension. Selecting a single band with `sel()` drops the `band` dimension and returns the pixel values for that band only, so we need one `sel()` call for green and one for NIR before we can combine them in the NDWI equation. 
</details>

Let's check the NDWI was computed correctly by visualising the data. The values should be between -1 and 1, and higher values should correspond to the presence of water. The ocean and rivers should appear in blue shades as we're using a green to blue colour palette here (low NDWI values are mapped to green shades and high NDWI values are mapped to blue shades). 

Note that here we use the `sel()` method again, but we're subsetting along the time dimension. As there is only one period along the time dimension, we're effectively removing the time information from the `xarray.DataArray` object to leave a two-dimensional array for plotting. 

In [None]:
pre_s2_ndwi.sel(time="2020-10-25").plot.imshow(cmap="GnBu", robust=True, aspect=3, size=4)

In [None]:
post_s2_ndwi.sel(time="2020-12-19").plot.imshow(cmap="GnBu", robust=True, aspect=3, size=4)

### Change detection

Change detection in the context of remote sensing image analysis means comparing two or more remote sensing images from different dates to detect change on the Earth's land surface. Change detection can be implemented as a local map algebra operation: comparing pixel values across time, where a large change in values indicates a change in land surface conditions. 

We can detect flooded locations following Tropical Cyclone Yasa using a change detection analysis by comparing the pre- and post-event NDWI images. A large increase in NDWI values would indicate flooding. 

#### Recap quiz

**A simple change detection technique is to compute the difference in NDWI values for the pre and post event images. Can you do this and assign the difference image to the variable `diff_ndwi`?**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

```python
diff_ndwi = post_s2_ndwi.sel(time="2020-12-19") - pre_s2_ndwi.sel(time="2020-10-25")
diff_ndwi
```
</details>

Let's plot the difference image. We'll use the `"Blues"` colour palette so areas of an increase in water should appear as darker blue shades. 

In [None]:
diff_ndwi.plot.imshow(cmap="Blues", robust=True, aspect=3, size=4)

#### Recap quiz

**Can you consider a strength and a limitation of using the `"Blues"` colour palette to visualise change in NDWI values?**

<details>
    <summary><b>answer</b></summary>

<p><b>strength:</b> change indicating increasing NDWI values are rendered in darker bluer shades which intuitively looks wetter.</p>
<p><b>weakness:</b> it is a sequential colour palette of blue shades from decreasing to increasing change in NDWI values. The sequential palette does not obviously convey both increasing and decreasing change values around zero. We could use a diverging palette with increasing change in NDWI represented by blue shades and decreasing change in NDWI represented by red shades.</p>

</details>

#### Recap quiz

**Here are the docs for <a href="https://docs.xarray.dev/en/stable/generated/xarray.plot.imshow.html" target="_blank"><code>plot.imshow()</code></a> and <a href="https://matplotlib.org/stable/gallery/color/colormap_reference.html" target="_blank">Matplotlib's colour palettes</a>. Can you use these docs to render the NDWI difference image with a diverging colour palette centred on zero?**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

```python
diff_ndwi.plot.imshow(cmap="RdBu", robust=True, center=0, aspect=3, size=4)
```
</details>

#### Recap quiz

**What are some of the challenges / limitations to using the difference as a change detection metric?**

<details>
    <summary><b>answer</b></summary>

The magnitude of the difference can be influenced by how reflective a surface is or by the illumination conditions when the image was captured. For example, a small absolute difference in spectral reflectance for a surface that does not reflect much might be a large relative change. Similarly, if the capture conditions for either the pre or post image were brighter, that could amplify or attenuate the difference signal. Therefore, relative change is often used as a change detection metric. 
</details>

<p></p>

<a href="https://pro.arcgis.com/en/pro-app/latest/tool-reference/image-analyst/compute-change-raster.htm" target="_blank">ESRI provide a formula for computing the relative change</a> between two images:

$\text{relative change}=\frac{Post - Pre}{\max(Post, Pre)}$

The difference between the post-event and pre-event NDWI images is divided, pixel by pixel, by the larger of the post-event and pre-event NDWI values at that pixel. Let's compute the relative change. 
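To see what this formula does to individual pixels, here's a small sketch using made-up NDWI values for four pixels (the arrays `pre` and `post` are hypothetical, not taken from the Sentinel-2 images). NumPy's `np.maximum()` compares the two arrays element by element and propagates NaN, so cloud-masked pixels stay masked.

```python
import numpy as np

# Made-up pre- and post-event NDWI values for four pixels (NaN = cloud-masked)
pre = np.array([0.2, -0.5, -0.6, np.nan])
post = np.array([0.6, 0.2, -0.2, 0.4])

diff = post - pre
# Relative change: divide by the larger of the two values at each pixel
rel_change = diff / np.maximum(post, pre)
print(rel_change)
```

Note the third pixel: NDWI increases from -0.6 to -0.2, but because both values are negative the denominator is negative and the relative change comes out as -2.0. Likewise, a denominator close to zero produces very large values. This is worth keeping in mind when interpreting relative change maps for an index that ranges from -1 to 1, such as the NDWI.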

In [None]:
rel_change_ndwi = diff_ndwi / np.maximum(post_s2_ndwi.sel(time="2020-12-19"), pre_s2_ndwi.sel(time="2020-10-25"))
rel_change_ndwi

Let's plot the relative change map. 

In [None]:
rel_change_ndwi.plot.imshow(cmap="RdBu", robust=True, center=0, aspect=3, size=4)

### Flood extent maps

Here, we'll classify pixels as flooded or not flooded using a simple threshold chosen from visual inspection of the data.

Let's treat pixels with a relative change in NDWI greater than 1 as flooded.

In [None]:
flood_map = rel_change_ndwi > 1
flood_map.plot.imshow(cmap="Blues", aspect=3, size=4)

## Geometry operations

### Vector geometry operations

*Vector geometry operations transform the shape, size, and / or projection of vector datasets. They're applied to the geometries as opposed to the dataset attributes.*

We can load a "ground truth" flood map prepared by UNITAR and compare its flood extents with our change image. 

In [None]:
unitar_yasa_gdf = gpd.read_file(os.path.join(data_path, "st1_20201219_floodextent_labasa.geojson"))
unitar_yasa_gdf.explore()

The `xarray.DataArray` object `flood_map` stores a raster representation of the flooded extents (i.e. areas where there was a large relative change in NDWI after Tropical Cyclone Yasa made landfall). 

However, it would be useful to overlay the vector geometries of the "ground truth" flood extents on our flood map to see whether our choice of threshold is sensible. To do this, we need to make sure the coordinate reference systems (CRS) of our raster data and vector data match. Let's check their CRS.

In [None]:
print(f"The CRS of the raster data is: {rel_change_ndwi.rio.crs}")
print(f"The CRS of the vector data is: {unitar_yasa_gdf.crs}")

#### Recap quiz

**By executing `unitar_yasa_gdf.crs`, are we accessing an attribute or a method?**

<details>
    <summary><b>answer</b></summary>

An attribute. Specifically, we are accessing the <code>crs</code> attribute of a <code>GeoDataFrame</code> object. The <code>crs</code> attribute is self-describing: it stores the coordinate reference system that the geometry information in the <code>GeoDataFrame</code> corresponds to.
</details>

The CRS of the vector data is EPSG:4326, which uses latitude and longitude to identify locations on the Earth's surface. The CRS of the raster data is EPSG:32760, which is UTM zone 60S, a projected CRS. This means the raster data has been projected onto a flat surface. Let's convert the vector data to match the raster data's CRS. 

In [None]:
unitar_yasa_gdf_epsg_32760 = unitar_yasa_gdf.to_crs("EPSG:32760")

#### Recap quiz

**By executing `unitar_yasa_gdf.to_crs("EPSG:32760")`, are we accessing an attribute or a method?**

<details>
    <summary><b>answer</b></summary>

A method. We can tell it is a method from the parentheses following <code>to_crs()</code> and because the name "to" implies that we are doing something. Specifically, here, we are converting our geometry data to another CRS. 
</details>

By executing `to_crs()` we are performing a **geometry data transformation** operation. We are converting the geometry information (i.e. coordinates defining the polygons of flood extents) from one CRS to another. 
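
Here is a minimal sketch of the same geometry transformation applied to a single made-up point at roughly the location of Labasa (about 16.43° S, 179.39° E; approximate and for illustration only). The longitude and latitude in degrees become an easting and northing in metres. Southern-hemisphere UTM zones such as EPSG:32760 add a false northing of 10,000,000 m, which is why the northing comes out large and positive.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical point near Labasa, stored as longitude / latitude (EPSG:4326)
pt_gdf = gpd.GeoDataFrame(geometry=[Point(179.39, -16.43)], crs="EPSG:4326")

# Transform the geometry to WGS 84 / UTM zone 60S; to_crs() returns a new GeoDataFrame
pt_gdf_utm = pt_gdf.to_crs("EPSG:32760")

print(pt_gdf.geometry.iloc[0])      # coordinates in degrees (longitude, latitude)
print(pt_gdf_utm.geometry.iloc[0])  # coordinates in metres (easting, northing)
print(pt_gdf_utm.crs)
```

The original `pt_gdf` is left unchanged, just as `unitar_yasa_gdf` keeps its EPSG:4326 coordinates after creating `unitar_yasa_gdf_epsg_32760`.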

Now, let's plot our raster flood map and the vector "ground truth" flood extents on the same map. You can see the correspondence between the dark blue pixels of flooding detected using our remote sensing data and the cyan outlines of the "ground truth" flood extents.


In [None]:
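fig, ax = plt.subplots(figsize=(12, 4))

# raster layer: pixels classified as flooded (True) are drawn in dark blue
flood_map.plot.imshow(
    cmap="Blues",
    ax=ax,
)

# vector layer: UNITAR flood extents drawn as unfilled cyan outlines on top of the raster
unitar_yasa_gdf_epsg_32760.plot(
    color="None",
    edgecolor="cyan",
    linewidth=0.25,
    ax=ax,
    zorder=4,
)

ax.axis('off')
plt.show()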

### Raster geometry operations

*Raster geometry operations transform the shape, resolution, size, and / or projection of raster datasets. They're applied to the pixel geometries as opposed to the dataset attributes (i.e. pixel values).* 

The final task we need to perform is to estimate the area of cropland that was flooded. To do this we need a map of cropland extents. We can read in a 10 m spatial resolution land cover map extracted from the <a href="https://gee-community-catalog.org/projects/esrilc2020/" target="">ESRI 2020 Global Land Use Land Cover from Sentinel-2 dataset</a>. 

In the ESRI land cover map, each pixel is assigned an integer value that corresponds to a land cover class. Cropland is class value 5. Let's read in the data. 

In [None]:
esri_lulc_path = os.path.join(data_path, "esri_lulc_2020.tif")
esri_lulc = rxr.open_rasterio(esri_lulc_path)

In [None]:
esri_lulc

Let's visualise the ESRI land cover map. Don't worry about the fiddly code to draw the legend. 

In [None]:
esri_lulc_colours = [
    "#1A5BAB",
    "#358221",
    "#A7D282",
    "#87D19E",
    "#FFDB5C",
    "#EECFA8",
    "#ED022A",
    "#EDE9E4",
    "#F2FAFF",
    "#C8C8C8",
  ]

esri_lulc_classes = [
    "water", 
    "trees", 
    "grass", 
    "flooded vegetation", 
    "crops", 
    "scrub/shrub", 
    "built area", 
    "bare ground", 
    "snow/ice", 
    "clouds",
]

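from matplotlib.colors import ListedColormap

fig, ax = plt.subplots(figsize=(12, 4))
# one colour per class value 1-10; fixing vmin / vmax keeps each class on its own colour
# even if some classes (e.g. snow/ice) are absent from the image
cmap = ListedColormap(esri_lulc_colours)
esri_lulc.sel(band=1).plot.imshow(cmap=cmap, vmin=0.5, vmax=10.5, ax=ax, add_colorbar=False)

# legend entries "1: water", "2: trees", ... drawn as coloured squares
legend_labels = [f"{i + 1}: {esri_lulc_classes[i]}" for i in range(len(esri_lulc_classes))]
ax.legend(handles=[
    plt.Line2D([0], [0], marker="s", color="w", markerfacecolor=esri_lulc_colours[i], markersize=10, label=label)
    for i, label in enumerate(legend_labels)
])
plt.show()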

To identify flooded croplands, we'll need to create a cropland mask that we can apply to our flood map (a similar process to cloud masking). This requires a binary (`True` / `False`) cropland layer with the same shape and CRS as our flood map. First, let's check the shape and CRS of our flood map and land cover map.

In [None]:
print(f"The shape of the flood map is: {flood_map.shape}")
print(f"The shape of the land cover map is: {esri_lulc.shape}")

In [None]:
print(f"The CRS of the flood map is: {flood_map.rio.crs}")
print(f"The CRS of the land cover map is: {esri_lulc.rio.crs}")

We can see that the shape and CRS of the land cover map do not match those of the flood map. Therefore, we'll need to perform a range of raster geometry operations to change the shape and dimensions of the land cover data. First, we'll need to reproject the land cover map to the CRS EPSG:32760, which can change the shape of the array (here we're moving from a geographic, latitude / longitude CRS to a projected CRS on a flat surface). Then, we need to clip the land cover map to the extent of the flood map. Finally, we need to resample the land cover map so its pixels line up with the flood map's pixel grid and it has the same shape. Helpfully, the rioxarray package has a <a href="" target="_blank">`reproject_match()` method</a> that handles all of these steps for us.

In [None]:
esri_lulc_match = esri_lulc.rio.reproject_match(flood_map)

Let's check this worked OK.

In [None]:
print(f"The shape of the flood map is: {flood_map.shape}")
print(f"The shape of the land cover map is: {esri_lulc_match.shape}")

In [None]:
print(f"The CRS of the flood map is: {flood_map.rio.crs}")
print(f"The CRS of the land cover map is: {esri_lulc_match.rio.crs}")

Now, let's create a cropland mask. Cropland is represented by the pixel value 5.

In [None]:
cropland = esri_lulc_match == 5
cropland

In [None]:
cropland.sel(band=1).plot.imshow(cmap="Greens", aspect=3, size=4)
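
Here is a minimal xarray sketch of the same masking step on a tiny made-up land cover array. Comparing a `DataArray` with an integer returns a boolean `DataArray` of the same shape. Looking up the class code from the class list, where position `i` holds class value `i + 1` as in the legend above, is one way to avoid hard-coding `5`. The `isin()` line shows how several classes could be combined into one mask; pairing crops with flooded vegetation here is purely illustrative.

```python
import numpy as np
import xarray as xr

# Class names in code order (class value = list position + 1), as in the ESRI legend
classes = ["water", "trees", "grass", "flooded vegetation", "crops",
           "scrub/shrub", "built area", "bare ground", "snow/ice", "clouds"]
crops_code = classes.index("crops") + 1  # 5

# Hypothetical 3 x 3 land cover array
lulc = xr.DataArray(np.array([[5, 5, 2], [1, 5, 4], [7, 5, 5]]), dims=("y", "x"))

# Boolean mask: True where the pixel is cropland
cropland = lulc == crops_code

# Mask covering more than one class (illustrative choice of classes)
crops_or_flooded_veg = lulc.isin([crops_code, classes.index("flooded vegetation") + 1])

print(int(cropland.sum()), int(crops_or_flooded_veg.sum()))  # 5 6
```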

#### Recap quiz

**As a final exercise, can you use the cropland mask to create a flood map of only flooded croplands? Use the cloud masking example as a template for how to do this.**

**Can you compute the area of cropland that is flooded? You can do this by counting the flooded cropland pixels (for example, by summing the mask) and multiplying the count by 100, since each 10 m x 10 m pixel covers 100 square metres.**

You can find information on aggregating `xarray.DataArray` objects <a href="https://docs.xarray.dev/en/latest/user-guide/computation.html#agg" target="_blank">here</a>. It is recommended that you use the <a href="https://docs.xarray.dev/en/latest/generated/xarray.DataArray.sum.html" target="_blank">`sum()` method</a> for this task. 

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

```python
flooded_croplands = flood_map.where(cropland)
print(f"The area of flooded croplands is {(flooded_croplands.sum() * 100).values} square metres")
```
</details>
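
To check the logic of the answer on something small, here is a sketch with made-up 3 x 3 arrays on a 10 m grid. `where(cropland)` replaces non-cropland pixels with `NaN`, and `sum()` skips `NaN` by default, so the sum counts flooded cropland pixels. The sketch casts the boolean flood map to float first so the intermediate array is an ordinary float array; that cast is a choice made here, not part of the answer above. Combining the two masks with `&` gives the same count. The pixel size is read from the coordinate spacing rather than hard-coded, and dividing by 10,000 converts square metres to hectares.

```python
import numpy as np
import xarray as xr

# Hypothetical 10 m grid: x and y coordinates in metres
coords = {"y": [30.0, 20.0, 10.0], "x": [0.0, 10.0, 20.0]}
flood_map = xr.DataArray(
    np.array([[True, True, False], [False, True, False], [True, False, False]]),
    coords=coords, dims=("y", "x"),
)
cropland = xr.DataArray(
    np.array([[True, False, False], [True, True, True], [True, True, False]]),
    coords=coords, dims=("y", "x"),
)

# Approach used in the answer: non-cropland pixels become NaN and are skipped by sum()
n_where = int(flood_map.astype(float).where(cropland).sum())

# Equivalent approach: count pixels that are both flooded AND cropland
n_and = int((flood_map & cropland).sum())

# Pixel size from the coordinate spacing (10 m x 10 m here)
res_x = abs(float(flood_map.x.values[1] - flood_map.x.values[0]))
res_y = abs(float(flood_map.y.values[1] - flood_map.y.values[0]))
area_m2 = n_and * res_x * res_y

print(n_where, n_and, area_m2, area_m2 / 10_000)  # 3 3 300.0 0.03
```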