# GEM ML Framework Demonstrator - Deforestation Detection
In these notebooks, we provide an in-depth example of how the GEM ML framework can be used for segmenting deforested areas using Sentinel-2 imagery as input and the [TMF dataset](https://forobs.jrc.ec.europa.eu/TMF/) as a reference.
The idea is to use a neural network (NN) model for the analysis.
Thanks to the flexibility of the GEM ML framework, we can easily substitute the model in the future by adjusting only the configuration file.
We will have a look at the following notebooks separately:
- 00_Configuration
- 01_DataAcquisition
- 02_DataNormalization
- 03_TrainingValidationTesting
- 04_Inference_Clouds

Authors: Michael Engel (m.engel@tum.de) and Joana Reuss (joana.reuss@tum.de)

-----------------------------------------------------------------------------------

# Configuration
Here, we define the configuration of our segmentation pipeline.
Let's import all libraries we need for that!

In [None]:
from libs.ConfigME import Config
import os
import platform
import datetime as dt
from sentinelhub import SHConfig
from matplotlib.colors import ListedColormap

## Configuration pipeline

We initialize the configuration with a project name and the base names used for storing results and the configuration file itself.

In [None]:
config = Config(
    name = 'GEM-ML-Framework_DeforestationDetection', # name of the project
    savename = 'DeforestationDetectionRun', # basic name to store stuff
    savename_config = "config.dill" # name of configuration file
)

Our pipeline consists of the four notebooks that follow this configuration notebook.

In [None]:
config.file_DataAcquisition = "01_DataAcquisition.ipynb"
config.file_DataNormalization = "02_DataNormalization.ipynb"
config.file_TrainingValidationTesting = "03_TrainingValidationTesting.ipynb"
config.file_showcase = "04_Inference_Clouds.ipynb"

Let's define the directories we are working with, i.e. where to store our `EOPatches` and results.
Defining them here ensures that every path is specified only once.

In [None]:
#%% folder where data necessary for running the notebook is stored such as the geojson of the AOI
config.dir_inputs = os.path.join(os.getcwd(),"inputs")
config.dir_extra = os.path.join(os.getcwd(),"extra")

#%% results
config.basedir = os.path.join(os.getcwd(),config["savename"])
config.dir_results = os.path.join(config["basedir"], "results")
config.dir_checkpoints = os.path.join(config["dir_results"], "checkpoints")
config.dir_tensorboard = os.path.join(config["dir_results"], "tensorboard")
config.dir_imgs = os.path.join(config["dir_results"], "imgs")
config.dir_imgs_validation = os.path.join(config["dir_imgs"],"PredictionValidation")

#%% locations for collected data
config.dir_data = os.path.join(config["basedir"],"data")
config.dir_train = os.path.join(config["dir_data"], "train")
config.dir_validation = os.path.join(config["dir_data"], "validation")
config.dir_test = os.path.join(config["dir_data"], "test")
config.dir_showcase = os.path.join(config["dir_data"], "showcase")

#%% locations for GeoTiffs
config.dir_tiffs = os.path.join(config["dir_results"],"tiffs")
config.dir_tiffs_train = os.path.join(config["dir_tiffs"],"train")
config.dir_tiffs_validation = os.path.join(config["dir_tiffs"],"validation")
config.dir_tiffs_test = os.path.join(config["dir_tiffs"],"test")
config.dir_tiffs_showcase = os.path.join(config["dir_tiffs"],"showcase")

#%% caching
config.dir_cache = os.path.join(os.getcwd(),"cache")

We define some `tif` filenames for storing the results of our showcase.

In [None]:
config.savename_showcase_tiff = config["savename"]+"_showcase.tif"
config.savename_showcase_tiff_post = config["savename"]+"_showcase_postprocessed.tif"

## Reference data configuration

Our reference is obtained from the [TMF dataset](https://forobs.jrc.ec.europa.eu/TMF/).

In [None]:
config.path_reference = os.path.join(config["dir_inputs"],"JRC_TMF_AnnualChange_v2_2021_SAM_ID30_N0_W60.tif").replace("\\","/")

The six original classes are aggregated into the following four:

In [None]:
config.class_water = 1
config.class_forest = 2
config.class_deforestation = 3
config.class_indefinite = 4

The simpler class scheme above was obtained by merging the following original classes:
- `1: Undisturbed tropical moist forest` and `4: Tropical moist forest regrowth` become forest
- `2: Degraded tropical moist forest` and `3: Deforested land` become deforestation

The original classes `5` and `6` are kept as water and indefinite, respectively.

We want to map our reference data in accordance with the simpler class scheme. Therefore, we apply the following label mapping:

In [None]:
config.labelmapping = {
    1:config["class_forest"],
    2:config["class_deforestation"],
    3:config["class_deforestation"],
    4:config["class_forest"],
    5:config["class_water"],
    6:config["class_indefinite"]
}
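
To illustrate what this mapping does to a reference raster, here is a minimal NumPy sketch using a lookup table. The helper `remap_labels`, the toy raster and the choice to send unmapped values to the indefinite class are assumptions made for illustration; the actual remapping is performed later in the pipeline and may be implemented differently.

```python
import numpy as np

# Class values as defined in the configuration above.
CLASS_WATER, CLASS_FOREST, CLASS_DEFORESTATION, CLASS_INDEFINITE = 1, 2, 3, 4

labelmapping = {
    1: CLASS_FOREST,
    2: CLASS_DEFORESTATION,
    3: CLASS_DEFORESTATION,
    4: CLASS_FOREST,
    5: CLASS_WATER,
    6: CLASS_INDEFINITE,
}

def remap_labels(raster, mapping, fill=CLASS_INDEFINITE):
    """Remap non-negative integer labels via a lookup table.

    Values without an entry in `mapping` become `fill` (an assumption of this sketch).
    """
    raster = np.asarray(raster)
    size = max(max(mapping), int(raster.max())) + 1
    lut = np.full(size, fill, dtype=np.uint8)
    for old, new in mapping.items():
        lut[old] = new
    return lut[raster]

tmf = np.array([[1, 2, 3], [4, 5, 6]])  # toy raster containing the six original classes
remapped = remap_labels(tmf, labelmapping)
print(remapped)
```

A lookup table keeps the remapping vectorized, which matters for rasters with millions of pixels.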

Further, we would like to incorporate the cloud cover in our reference.
Accordingly, we define the desired class value for clouds.

In [None]:
config.class_clouds = 0

Our new reference labels call for a distinct colormap with one color per class value, ordered from 0 (clouds) to 4 (indefinite).

In [None]:
config.cmap_reference = ListedColormap([
    "white", # clouds
    "blue", # water
    "darkgreen", # forest
    "orange", # deforestation
    "black" # indefinite
])

## Configuration for acquiring Sentinel data

In case you did not store your credentials on disk in advance, take a look at the following [notebook](https://gitlab.lrz.de/mkoerner/projects-and-proposals/projects/2020_GEM/howto-eo-learn/-/blob/main/1_Configuration/tutorial1_config.ipynb).

Loading Sentinel Hub **credentials** from storage:

In [None]:
#%% Sentinel Hub credentials
config.SHconfig = SHConfig()

Here, we define the spatial resolution and the pixel width of the patches that will later be fed to our model.

In [None]:
config.patchpixelwidth = 256
config.resolution = 20
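
As a quick sanity check, the ground footprint of a single patch follows from these two values. The sketch below assumes that `resolution` is given in metres per pixel, which fits the 20 m Sentinel-2 bands but is not stated explicitly by the configuration.

```python
# Assumption: the resolution is given in metres per pixel.
patchpixelwidth = 256  # pixels per patch edge
resolution = 20        # metres per pixel (assumed unit)

patch_edge_m = patchpixelwidth * resolution
patch_area_km2 = (patch_edge_m / 1000) ** 2

print(f"Patch edge: {patch_edge_m} m")
print(f"Patch area: {patch_area_km2:.2f} km^2")
```

Under this assumption, each patch covers 5120 m by 5120 m on the ground.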

For the sake of completeness, the user can apply a buffer to the AOIs. We do not use one here.

In [None]:
config.AOIbuffer = 0

Further, we set the maximum cloud coverage allowed for our data. The value is a fraction between 0 and 1, so `0.3` corresponds to 30 %.

In [None]:
config.maxcc = 0.3

We have defined our areas of interest (AOIs) separately for training, validation and testing, both spatially (one AOI each) and temporally (one set of start and end parameters each), and saved them as `geojson` files. We can now simply point to these files and assign their location to a configuration parameter.

We choose 2021-12-31 as the cutoff date for training, validation and testing. Setting `config.start_train = 1` requests the acquisition closest to, but not later than, the cutoff date that satisfies the maximum allowed cloud coverage.

In [None]:
config.AOI_train = os.path.join(config["dir_inputs"],"AOI_train.geojson")
config.start_train = 1
config.end_train = dt.datetime(year=2021,month=12,day=31,hour=23,minute=59,second=59)
config.checktimedelta = dt.timedelta(days=365)

config.AOI_validation = os.path.join(config["dir_inputs"],"AOI_validation.geojson")
config.start_validation = config["start_train"]
config.end_validation = config["end_train"]

config.AOI_test = os.path.join(config["dir_inputs"],"AOI_test.geojson")
config.start_test = config["start_train"]
config.end_test = config["end_train"]
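
One way to read these settings: among all acquisitions within `checktimedelta` before the cutoff date, keep those that do not exceed `maxcc` and take the one closest to the cutoff. The sketch below illustrates that reading on made-up acquisition records. The function `pick_acquisition`, its parameter `n` and the toy data are hypothetical and not part of the GEM ML framework, whose actual selection logic lives in the data acquisition notebook.

```python
import datetime as dt

def pick_acquisition(acquisitions, end, checktimedelta, maxcc, n=1):
    """Return the n-th closest acquisition date before `end` within the cloud limit.

    `acquisitions` is a list of (datetime, cloud_fraction) tuples (toy format).
    Returns None if fewer than `n` acquisitions qualify.
    """
    window_start = end - checktimedelta
    candidates = sorted(
        (date for date, cc in acquisitions
         if window_start <= date <= end and cc <= maxcc),
        reverse=True,  # most recent first
    )
    return candidates[n - 1] if len(candidates) >= n else None

end = dt.datetime(2021, 12, 31, 23, 59, 59)
toy_acquisitions = [
    (dt.datetime(2021, 12, 28), 0.80),  # too cloudy
    (dt.datetime(2021, 12, 18), 0.10),  # closest valid acquisition
    (dt.datetime(2021, 11, 3), 0.05),   # valid, but further away
    (dt.datetime(2020, 6, 1), 0.00),    # outside the one-year window
]

chosen = pick_acquisition(toy_acquisitions, end, dt.timedelta(days=365), maxcc=0.3)
print(chosen)
```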

Furthermore, we have defined a showcase AOI used for inference, for which we acquire data from October 2022.

In [None]:
config.AOI_showcase = os.path.join(config["dir_inputs"],"AOI_showcase.geojson")
config.start_showcase = dt.datetime(year=2022,month=10,day=1)
config.end_showcase = dt.datetime(year=2022,month=11,day=1)

## Configuration for ML pipeline and training setup

In the following, we define some general ML parameters.

As we want to use both CPU and GPU, we have to define the number of threads and the device. On Windows, we restrict ourselves to a single thread.

In [None]:
config.threads = 1 if platform.system()=="Windows" else 5
config.device = "cuda"
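
Note that `"cuda"` is hard-coded here, so a machine without a CUDA-capable GPU needs a different value. A minimal sketch of a fallback is shown below; the helper `resolve_device` is hypothetical and not part of the framework.

```python
def resolve_device(preferred="cuda"):
    """Return `preferred` if it is usable, otherwise fall back to "cpu"."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    if preferred.startswith("cuda") and not torch.cuda.is_available():
        return "cpu"
    return preferred

device = resolve_device("cuda")
print(device)
```

The result could then be assigned to `config.device` instead of the fixed string.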

In [None]:
config.n_epochs = 128
config.num_classes = 4
config.batch_size = 12
config.max_batch_size = 3
config.checkpoint_bestloss = True
config.checkpoint_bestmetric = True
config.checkpoint_freq = 8
config.eval_freq = 2
config.seed = 42

We use the DeepLabV3Plus architecture as provided by the `segmentation_models_pytorch` library of [Pavel Yakubovskiy](https://segmentation-modelspytorch.readthedocs.io/en/latest/).

In [None]:
config.module_model = "segmentation_models_pytorch.DeepLabV3Plus"
config.kwargs_model = {
    "encoder_name":"resnet34", # think of changing this default value!
    "encoder_depth":5,
    "encoder_weights":"imagenet", # think of changing this default value!
    "encoder_output_stride":16, # think of changing this default value!
    "decoder_channels":256, # think of changing this default value!
    "decoder_atrous_rates":(12, 24, 36), # think of changing this default value!
    "in_channels":6,
    "classes":config["num_classes"],
    "activation":None, # think of changing this default value!
    "upsampling":4, # think of changing this default value!
    "aux_params":None, # think of changing this default value!
}
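
Storing the model as a dotted module path plus keyword arguments is what makes it replaceable through the configuration alone. How the framework resolves these strings is not shown in this notebook. A minimal sketch of the general technique, using a hypothetical helper `instantiate` and a standard-library class as a stand-in for the model, could look like this:

```python
import importlib

def instantiate(dotted_path, kwargs):
    """Import `package.module.Name` and call it with the given keyword arguments."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)(**kwargs)

# Stand-in example using the standard library. With the library installed, the same
# call pattern would apply to "segmentation_models_pytorch.DeepLabV3Plus" together
# with the keyword arguments stored in kwargs_model.
delta = instantiate("datetime.timedelta", {"days": 365})
print(delta)
```

With this pattern, swapping the architecture only requires changing the path string and its keyword arguments.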

Next, we define the names under which our trained model is stored on disk.

In [None]:
config.model_savename = config["savename"]
config.model_savename_bestloss = config["model_savename"]+"_bestloss"
config.model_savename_bestmetric = config["model_savename"]+"_bestmetric"
config.model_savename_inference = config["savename"]+"_inference"
config.model_savename_inference_bestloss = config["model_savename_inference"]+"_bestloss"
config.model_savename_inference_bestmetric = config["model_savename_inference"]+"_bestmetric"

We will use the classic [CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html).
We will not apply loss reduction (`"reduction":"none"`) since we would like to apply our mask manually in the training notebook.

In [None]:
config.module_loss = "torch.nn.CrossEntropyLoss"
config.kwargs_loss = {
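    "weight":None, # optional per-class weights, think of changing this default value!
    "size_average":None, # deprecated in PyTorch, kept at its default
    "ignore_index":-100,
    "reduce":None, # deprecated in PyTorch, kept at its default
    "reduction":"none", # keep the per-pixel loss so that the mask can be applied manually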
    "label_smoothing":0.0,
}
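
With `reduction` set to `"none"`, the loss keeps one value per pixel, so the averaging has to be done explicitly after masking. The following NumPy sketch mimics that behaviour for illustration only. The functions `cross_entropy_per_pixel` and `masked_mean` are hypothetical stand-ins, and which pixels are masked is decided in the training notebook, not here.

```python
import numpy as np

def cross_entropy_per_pixel(logits, target):
    """Unreduced cross-entropy: logits (C, H, W), integer target (H, W) -> loss (H, W)."""
    shifted = logits - logits.max(axis=0, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    rows, cols = np.indices(target.shape)
    return -log_probs[target, rows, cols]

def masked_mean(loss, mask):
    """Average the loss over valid pixels only (mask == True)."""
    mask = mask.astype(bool)
    return loss[mask].sum() / max(int(mask.sum()), 1)

logits = np.zeros((4, 2, 2))                    # uniform prediction over 4 classes
target = np.array([[0, 1], [2, 3]])             # class index per pixel
mask = np.array([[True, True], [False, True]])  # False marks pixels to ignore

loss = cross_entropy_per_pixel(logits, target)
mean_loss = masked_mean(loss, mask)
print(loss.shape, mean_loss)
```

Because the prediction is uniform over four classes, every pixel contributes a loss of ln(4), and the masked pixel simply drops out of the average.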

We will use the standard [Adam Optimizer](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html).

In [None]:
config.module_optimizer = "torch.optim.Adam"
config.kwargs_optimizer = {
    "lr":0.007,
    "betas":(0.9, 0.999),
    "eps":1e-08,
    "weight_decay":1e-06,
    "amsgrad":False
}

For evaluation, we need some metrics.
We will use the standard accuracy and Cohen's kappa.
We emphasize that you can use any number of metrics by expanding this list.

In [None]:
config.module_metric = ["../utils/metrics.accuracy", "../utils/metrics.cohen_kappa"]
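
For reference, both metrics can be computed from label pairs alone. The sketch below is a plain-Python stand-in written for illustration; it is not the code in `utils/metrics`, whose signatures may differ. Cohen's kappa corrects the observed agreement for the agreement expected by chance and is undefined when the expected agreement equals 1.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement from the marginal label frequencies.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n**2
    return (p_observed - p_expected) / (1 - p_expected)

y_true = [1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [1, 1, 2, 3, 3, 3, 4, 2]

acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
print(acc, kappa)
```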

## Configuration for Data Normalization

For the data normalization, we use the `QuantileScaler_eolearn_tdigest` as established by TUM.
Hence, we need to define the filenames and corresponding parameters.

In [None]:
config.savename_tdigest = config["savename"]+"_TDigest.npy" 
config.savename_scaler = config["savename"]+"_QuantileScaler.dill" 

config.scaler_minquantile = 0.02 # minquantile
config.scaler_maxquantile = 0.98 # maxquantile
config.scaler_valmin = 0 # corresponding value for minquantile
config.scaler_valmax = 1 # corresponding value for maxquantile

config.scaler_nanval = [0,0,0,0,0,0] # value to replace nans with
config.scaler_infval = [0,0,0,0,0,0] # value to replace infs with
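
The parameters above describe a linear rescaling in which the lower quantile is mapped to `scaler_valmin` and the upper quantile to `scaler_valmax`. The sketch below illustrates this for a single band with made-up quantile values. The function `quantile_scale` is hypothetical, and details such as clipping or the order of NaN handling in `QuantileScaler_eolearn_tdigest` are not covered by it.

```python
import math

def quantile_scale(value, q_min, q_max, valmin=0.0, valmax=1.0, nanval=0.0, infval=0.0):
    """Map q_min -> valmin and q_max -> valmax linearly; replace NaN and inf.

    Values outside [q_min, q_max] are not clipped in this sketch (an assumption).
    """
    if math.isnan(value):
        return nanval
    if math.isinf(value):
        return infval
    return valmin + (value - q_min) * (valmax - valmin) / (q_max - q_min)

# Hypothetical band statistics: 2 % quantile = 200, 98 % quantile = 3000.
samples = (200.0, 1600.0, 3000.0, float("nan"), float("inf"))
scaled = [quantile_scale(v, 200.0, 3000.0) for v in samples]
print(scaled)
```

Using quantiles instead of the minimum and maximum makes the scaling robust against outliers such as saturated pixels.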

## Final configuration setup

Finally, we apply some checking routines and store our configuration file on disk.

In [None]:
#%% saving and checking
#%%% check directories
config.checkdir()
#%%% check files
config.checkfile()
#%%% check modules
config.checkmodule()
#%%% save config
file = config.save()
file2 = config.save(os.path.join(config["dir_results"],config["savename_config"])) # saving to results folder
#%% print config
# config.print()

In [None]:
print("Done")