# Demo: Working with pixel quality masks

This notebook demonstrates how to apply a pixel quality mask to remove poor-quality or undesired pixels from loaded data. 

The notebook demonstrates:

1. How to load data with multiple pixel quality masks
1. How to apply a given mask to loaded data
1. How to apply morphological operations and custom nodata values to the mask

## Set up
The following cell should be uncommented and run if you installed the package in editable mode and are actively developing and testing modules.
Otherwise, it can be left commented.

In [None]:
# %load_ext autoreload
# %autoreload 2

### Enable logging

This will allow you to see info and warning messages from the package.

In [None]:
import logging
import sys

logging.basicConfig(
    format="%(asctime)s | %(levelname)s : %(message)s",
    level=logging.INFO,
    stream=sys.stdout,
)

### Import the relevant packages

Masking functionality is accessed directly from the `RasterBase` class and is designed to operate on the contents of the `masks` attribute.

In [None]:
import numpy as np
from pprint import pprint

from eo_insights.stac_configuration import de_australia_stac_config
from eo_insights.raster_base import RasterBase, QueryParams, LoadParams

## Set up and run query

For more information on how to load data, see [load_demo.ipynb](load_demo.ipynb).
This demonstration uses Digital Earth Australia's Sentinel-2 product, which comes with two pixel quality masks: `fmask` and `s2cloudless`.

In [None]:
query_params = QueryParams(
    bbox=(145.01, -37.46, 145.02, -37.45),  # west, south, east, north
    start_date="2020-11-01",
    end_date="2020-12-01",
)

load_params = LoadParams(
    crs="EPSG:3577",
    resolution=10,
    bands=("red", "green", "blue", "nir", "fmask", "s2cloudless"),
)

stac_raster = RasterBase.from_stac_query(
    config=de_australia_stac_config,
    collections=["ga_s2am_ard_3", "ga_s2bm_ard_3"],
    query_params=query_params,
    load_params=load_params,
)

## Apply masking

To start, it is useful to display an unmasked version of the data.

### Display an RGB plot for a subset of images

In [None]:
stac_raster.data.isel(time=slice(0, 3))[
    ["red", "green", "blue"]
].to_array().plot.imshow(col="time", vmin=0, vmax=3000)

### Apply fmask and create new masked variables

When applying a mask, you can either apply it in place, which overwrites the original variables, or keep the originals and write the results to new variables.
This choice is made separately for the contents of `data` and `masks`, through the `data_inplace` and `mask_inplace` arguments.

The next few cells demonstrate how masking works when both `data_inplace` and `mask_inplace` are set to `False`.

In [None]:
stac_raster.apply_mask("fmask", data_inplace=False, mask_inplace=False)

Running the above step produces two INFO messages:

- Converting categorical mask to boolean
- Selecting all pixels belonging to any of ['nodata', 'cloud', 'shadow', 'snow', 'water']

The first message indicates that `fmask` is listed as a categorical mask in the configuration.
The second message indicates that, by default, applying `fmask` selects all pixels belonging to any of `['nodata', 'cloud', 'shadow', 'snow', 'water']`.
This can be confirmed by looking at the configuration settings for `fmask`:

In [None]:
pprint(de_australia_stac_config.collections["ga_s2am_ard_3"].masks["fmask"])

When applying a mask, the first step is to identify which pixels should be excluded and which should be kept.
For a categorical mask, this produces a boolean version of the mask, where pixels in the selected categories are `True` and all remaining pixels are `False`.
The boolean mask is then inverted when it is applied to the data, so that the selected pixels are the ones removed.

By default, masked pixels are replaced with the band's default nodata value.
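
The steps above can be sketched with plain NumPy. This is a minimal illustration and not the package's implementation: the category codes, the array values, and the nodata value of `-999` are assumptions made for the example.

```python
import numpy as np

# Hypothetical category codes, for illustration only; the real values
# come from the mask configuration.
CATEGORIES = {"nodata": 0, "valid": 1, "cloud": 2, "shadow": 3, "snow": 4, "water": 5}

# A tiny categorical mask and one band of data.
fmask = np.array([[1, 2, 1], [3, 1, 5], [0, 1, 1]], dtype=np.uint8)
red = np.array([[100, 200, 300], [400, 500, 600], [700, 800, 900]], dtype=np.int16)

# Step 1: boolean mask, True where a pixel belongs to a selected category.
selected = ["nodata", "cloud", "shadow", "snow", "water"]
codes = [CATEGORIES[name] for name in selected]
fmask_bool = np.isin(fmask, codes)

# Step 2: invert the mask so that True means "keep", then replace every
# other pixel with the nodata value (assumed to be -999 here).
nodata = -999
red_masked = np.where(~fmask_bool, red, nodata)

print(red_masked)
```

Only the pixels whose category is not in the selected list keep their original value.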

When `mask_inplace` is set to `False`, a boolean version of the mask is saved in a new variable, `fmask_bool`, as shown below:

In [None]:
stac_raster.masks

When `data_inplace` is set to `False`, the masked version of each band is saved to a new variable named `<bandname>_masked` (for example, `red_masked`), as shown below:

In [None]:
stac_raster.data
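
The difference between the two modes can be illustrated with a dictionary of NumPy arrays standing in for the dataset. This is only a sketch of the naming behaviour described above: `mask_bands` is a hypothetical helper, not part of the package, and the nodata value of `-999` is an assumption.

```python
import numpy as np


def mask_bands(bands, keep, nodata, inplace=True):
    """Hypothetical helper: mask every band in a dict of arrays.

    With inplace=True the original entries are overwritten; otherwise the
    originals are kept and results are stored under "<name>_masked".
    """
    out = bands if inplace else dict(bands)
    for name in list(bands):
        masked = np.where(keep, bands[name], nodata)
        out[name if inplace else f"{name}_masked"] = masked
    return out


bands = {"red": np.array([1, 2, 3]), "green": np.array([4, 5, 6])}
keep = np.array([True, False, True])

result = mask_bands(bands, keep, nodata=-999, inplace=False)
print(sorted(result))  # ['green', 'green_masked', 'red', 'red_masked']
```

With `inplace=False`, the original `red` and `green` entries are left untouched, which makes it easy to compare masked and unmasked values side by side.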

Having applied the mask, the masked bands can now be displayed.

In [None]:
stac_raster.data.isel(time=slice(0, 3))[
    ["red_masked", "green_masked", "blue_masked"]
].to_array().plot.imshow(col="time", vmin=0, vmax=3000)

### Apply s2cloudless in place

The next few cells demonstrate how masking works when `data_inplace` and `mask_inplace` are set to `True`, which is the default behaviour.
As such, the two arguments can be omitted from the `apply_mask()` call.

In [None]:
stac_raster.apply_mask("s2cloudless")

Again, it is possible to view the configuration settings for this mask:

In [None]:
pprint(de_australia_stac_config.collections["ga_s2am_ard_3"].masks["s2cloudless"])

Because the in-place approach was used, the original bands and masks have been overwritten. This can be seen by displaying the bands and the masks:

In [None]:
stac_raster.data.isel(time=slice(0, 3))[
    ["red", "green", "blue"]
].to_array().plot.imshow(col="time", vmin=0, vmax=3000)

Viewing the `masks` attribute shows that applying the mask in place converts it from its original type to a boolean:

In [None]:
stac_raster.masks

### Additional functionality: custom nodata and morphological operations

The next few cells demonstrate how to apply morphological operations (`opening`, `closing`, `dilation` and `erosion`) to the mask, as well as how to specify a custom `nodata` value.

Morphological operations are supplied to the `mask_filters` argument as a list of tuples, with each tuple containing the name of the operation and the radius to use for the disk kernel.
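
To illustrate what these operations do to a boolean mask, the following sketch implements them with plain NumPy. It is not the package's implementation: the helper names (`disk`, `dilate`, `erode`, `apply_filters`) are hypothetical, the radius is assumed to be in pixels, the filters are assumed to run in list order, and pixels beyond the image edge are treated as unset for dilation and as set for erosion.

```python
import numpy as np


def disk(radius):
    """Boolean disk-shaped kernel of the given radius (assumed in pixels)."""
    y, x = np.ogrid[-radius : radius + 1, -radius : radius + 1]
    return x**2 + y**2 <= radius**2


def dilate(mask, kernel):
    """Binary dilation: a pixel is True if any pixel under the kernel is True."""
    r = kernel.shape[0] // 2
    padded = np.pad(mask, r, constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    rows, cols = mask.shape
    # OR together one shifted copy of the mask per True kernel cell.
    for dy, dx in zip(*np.nonzero(kernel)):
        out |= padded[dy : dy + rows, dx : dx + cols]
    return out


def erode(mask, kernel):
    """Binary erosion: a pixel stays True only if all pixels under the kernel are True."""
    # Erosion equals the inverse of dilating the inverted mask; this
    # shortcut is valid because the disk kernel is symmetric.
    return ~dilate(~mask, kernel)


OPERATIONS = {
    "dilation": dilate,
    "erosion": erode,
    "opening": lambda m, k: dilate(erode(m, k), k),  # removes small True regions
    "closing": lambda m, k: erode(dilate(m, k), k),  # fills small False gaps
}


def apply_filters(mask, mask_filters):
    """Apply (operation, radius) tuples to a boolean mask, in list order."""
    for name, radius in mask_filters:
        mask = OPERATIONS[name](mask, disk(radius))
    return mask


mask = np.zeros((9, 9), dtype=bool)
mask[0:5, 0:3] = True  # a larger flagged block
mask[4, 4] = True      # an isolated single-pixel detection

opened = apply_filters(mask, [("opening", 1)])
grown = apply_filters(mask, [("opening", 1), ("dilation", 1)])
print(opened.astype(int))
print(grown.astype(int))
```

In this example the opening removes the isolated pixel while keeping most of the larger block, and the dilation that follows grows the remaining region outward by one pixel. Combining the two is a common way to discard speckle in a cloud mask and then buffer the clouds that remain.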

For this step, we load a clean copy of the data, to avoid using the data that was modified in place during the last step.

In [None]:
stac_raster_2 = RasterBase.from_stac_query(
    config=de_australia_stac_config,
    collections=["ga_s2am_ard_3", "ga_s2bm_ard_3"],
    query_params=query_params,
    load_params=load_params,
)

In [None]:
stac_raster_2.apply_mask(
    "fmask",
    mask_filters=[("opening", 3), ("dilation", 5)],
    nodata=np.nan,
    data_inplace=False,
    mask_inplace=False,
)

The effects of the morphological operations and of using `NaN` as the nodata value are evident when plotting the masked data:

In [None]:
stac_raster_2.data.isel(time=slice(0, 3))[
    ["red_masked", "green_masked", "blue_masked"]
].to_array().plot.imshow(col="time", vmin=0, vmax=3000)
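
One practical consequence of choosing `NaN` as the nodata value is that the masked data can no longer be stored in an integer type, because integer arrays have no representation for `NaN`. The sketch below shows this with plain NumPy; the array values are made up, and whether the package casts bands in exactly this way is an assumption.

```python
import numpy as np

red = np.array([[100, 200], [300, 400]], dtype=np.int16)
keep = np.array([[True, False], [True, True]])

# Cast to a float type first, since integer arrays cannot hold NaN.
red_masked = np.where(keep, red.astype("float32"), np.nan)

print(red_masked.dtype)
print(np.isnan(red_masked).sum())  # number of masked pixels

# NaN-aware reductions skip the masked pixels.
print(np.nanmean(red_masked))
```

The benefit is that NaN-aware functions such as `np.nanmean` ignore masked pixels automatically, whereas a numeric nodata value would need to be filtered out before computing statistics.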