# GEM ML Framework Demonstrator - Water Segmentation
In these notebooks, we will get a feeling for how the GEM ML framework can be used to segment water bodies, using Sentinel-1 imagery as input and the Sentinel-2 based normalized difference water index (NDWI) as a reference.
The idea is to use a neural network (NN) model for the analysis.
Thanks to the flexibility of the GEM ML framework, the model can be replaced by changing only the configuration.
We will look at the following notebooks separately:
- 00_Configuration
- 01_DataAcquisition
- 02_DataNormalization
- 03_TrainingValidationTesting
- 04_PyTorchTasks_ModelForwardTask

by Michael Engel (m.engel@tum.de)

-----------------------------------------------------------------------------------

# Configuration
Here, we define the configuration of our segmentation pipeline.
Let's import all libraries we need for that!

In [None]:
import datetime as dt
import os
import platform

from sentinelhub import SHConfig
from torch.cuda import is_available as cuda_available

from libs.ConfigME import Config

Now, we can initialize the configuration file with a proper name and identifiers for storing.

In [None]:
config = Config(
    name = 'GEM-ML-Framework_WaterSegmentation', # name of the project
    savename = 'WaterSegmentationRun', # basic name to store stuff
    savename_config = "config.dill" # name of configuration file
)
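
Throughout this notebook, entries are written as attributes (`config.name = ...`) and read back by key (`config["name"]`). To make that pattern concrete, here is a minimal sketch of a container that supports both styles. Note that `MiniConfig` is a hypothetical stand-in written for illustration only; it is not the implementation of `libs.ConfigME.Config`, which offers more (e.g. saving and checking).

```python
class MiniConfig:
    """Hypothetical stand-in: attribute-style writes, key-style reads."""

    def __init__(self, **entries):
        # store every keyword argument as a plain attribute
        for key, value in entries.items():
            setattr(self, key, value)

    def __getitem__(self, key):
        # config["key"] simply looks up the attribute of the same name
        return getattr(self, key)

    def __setitem__(self, key, value):
        setattr(self, key, value)


mini = MiniConfig(savename="WaterSegmentationRun")
# derive a new entry from an existing one, as done throughout this notebook
mini.model_savename_bestloss = mini["savename"] + "_bestloss"
```

Deriving entries from earlier ones in this way is what allows each value to be defined in a single place.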

Apart from this configuration notebook, our pipeline is defined by four notebooks.

In [None]:
config.file_DataAcquisition = "01_DataAcquisition.ipynb"
config.file_DataNormalization = "02_DataNormalization.ipynb"
config.file_TrainingValidationTesting = "03_TrainingValidationTesting.ipynb"
config.file_PyTorchTasks_ModelForwardTask = "04_PyTorchTasks_ModelForwardTask.ipynb"

Let's define the directories we are working with, i.e. where to store our `EOPatches` and results.
That way, every path is defined exactly once.

In [None]:
#%% folder where data necessary for running the notebook is stored such as the geojson of the AOI
config.dir_inputs = os.path.join(os.getcwd(),"inputs")
config.dir_extra = os.path.join(os.getcwd(),"extra")

#%% results
config.basedir = os.path.join(os.getcwd(),config["savename"])
config.dir_results = os.path.join(config["basedir"], "results")
config.dir_checkpoints = os.path.join(config["dir_results"], "checkpoints")
config.dir_tensorboard = os.path.join(config["dir_results"], "tensorboard")
config.dir_imgs = os.path.join(config["dir_results"], "imgs")
config.dir_imgs_validation = os.path.join(config["dir_imgs"],"PredictionValidation")

#%% locations for collected data
config.dir_data = os.path.join(config["basedir"],"data")
config.dir_train = os.path.join(config["dir_data"], "train")
config.dir_validation = os.path.join(config["dir_data"], "validation")
config.dir_test = os.path.join(config["dir_data"], "test")
config.dir_showcase = os.path.join(config["dir_data"], "showcase")

#%% locations for GeoTiffs
config.dir_tiffs = os.path.join(config["dir_results"],"tiffs")
config.dir_tiffs_train = os.path.join(config["dir_tiffs"],"train")
config.dir_tiffs_validation = os.path.join(config["dir_tiffs"],"validation")
config.dir_tiffs_test = os.path.join(config["dir_tiffs"],"test")
config.dir_tiffs_showcase = os.path.join(config["dir_tiffs"],"showcase")

#%% caching
config.dir_cache = os.path.join(os.getcwd(),"cache")

Let's load our **credentials** for Sentinel Hub from storage.
If you don't have your credentials stored on disk yet, you should have a look at this [configuration guide](https://sentinelhub-py.readthedocs.io/en/latest/configure.html).

In [None]:
#%% Sentinel Hub credentials
config.SHconfig = SHConfig()

Here, we define parameters such as the resolution and the pixel width of the patches that will later be fed to our model.

In [None]:
config.patchpixelwidth = 256
config.resolution = 20

Further, we set the maximum cloud coverage allowed for our reference observations and the maximum time difference allowed between our input data and the reference date.

In [None]:
config.maxcc = 0.5
config.datatimedelta = dt.timedelta(days=1,hours=12)
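
To illustrate what `datatimedelta` means in practice, here is a minimal sketch of a tolerance check between a reference acquisition and a candidate input acquisition. The symmetric window, the function name `within_tolerance` and the example timestamps are assumptions made for illustration; the actual filtering happens in the data acquisition notebook and may differ.

```python
import datetime as dt

# same value as in the configuration above
datatimedelta = dt.timedelta(days=1, hours=12)


def within_tolerance(reference_time, candidate_time, tolerance=datatimedelta):
    """Return True if the candidate lies within +/- tolerance of the reference."""
    return abs(candidate_time - reference_time) <= tolerance


# hypothetical acquisition times, chosen only for this example
reference = dt.datetime(2022, 8, 30, 6, 0)  # reference (Sentinel-2) observation
close_s1 = dt.datetime(2022, 8, 31, 13, 0)  # 31 hours later, inside the 36 hour window
far_s1 = dt.datetime(2022, 9, 1, 0, 0)      # 42 hours later, outside the window

accepted = within_tolerance(reference, close_s1)
rejected = within_tolerance(reference, far_s1)
```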

In the next step, we define our areas of interest, both spatially and temporally.

In [None]:
config.AOI_train = os.path.join(config["dir_inputs"],"PakistanFlood_train.json")
config.start_train = dt.datetime(year=2022,month=8,day=30)
config.end_train = dt.datetime(year=2022,month=9,day=1)

config.AOI_validation = os.path.join(config["dir_inputs"],"PakistanFlood_validation.json")
config.start_validation = config["start_train"]
config.end_validation = config["end_train"]

config.AOI_test = os.path.join(config["dir_inputs"],"PakistanFlood_test.json")
config.start_test = config["start_train"]
config.end_test = config["end_train"]

config.AOI_showcase = os.path.join(config["dir_inputs"],"NigeriaFlood.json")
config.end_showcase = dt.datetime(year=2022,month=11,day=7)
# config.end_showcase = dt.datetime(year=2020,month=9,day=7) # KhotanRiver (disabled: it would override the date above)
config.checktimedelta_showcase = dt.timedelta(days=80)
config.n_observations_showcase = 8

To prevent overlapping training, validation and testing regions, we erode our AOIs by half the patch width in metres.
Hence, we set a (negative) buffer value for that purpose.

In [None]:
config.AOIbuffer = -config["patchpixelwidth"]*config["resolution"]/2
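
As a quick sanity check of that buffer, here is the arithmetic with the values chosen above, followed by a one-dimensional illustration of why half a patch width suffices. We assume that `resolution` is given in metres per pixel and that patches are centred on points inside the eroded AOI; the AOI extent used here is made up.

```python
patchpixelwidth = 256  # pixels per patch edge
resolution = 20        # metres per pixel (assumed unit)

patch_width_m = patchpixelwidth * resolution  # 5120 m
aoi_buffer = -patch_width_m / 2               # -2560.0 m, negative means erosion

# 1D illustration with a hypothetical AOI spanning 0 m to 20000 m
aoi_min, aoi_max = 0.0, 20000.0
eroded_min, eroded_max = aoi_min - aoi_buffer, aoi_max + aoi_buffer

# a patch centred on the border of the eroded AOI still ends exactly at the original border
patch_left = eroded_min - patch_width_m / 2
patch_right = eroded_max + patch_width_m / 2
```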

Since we want to store some results of our showcase, we have to define some savenames for those.

In [None]:
config.savename_showcase_tiff = "NigeriaFlood_WaterMask.tif"
config.savename_showcase_tiff_reproject = "NigeriaFlood_WaterMask_reprojected.tif"
config.savename_showcase_GradientShap_tiff = "NigeriaFlood_GradientShap.tif"
config.savename_showcase_GradientShap_tiff_reproject = "NigeriaFlood_GradientShap_reprojected.tif"

Since the pipeline should run on both CPU and GPU, we have to define the number of threads and the device.

In [None]:
config.threads = 1 if platform.system()=="Windows" else 5
config.device = "cuda" if cuda_available() else "cpu"

In the following, we define some general ML parameters.

In [None]:
config.n_epochs = 32
config.num_classes = 2
config.batch_size = 16
config.max_batch_size = 6
config.checkpoint_bestloss = True
config.checkpoint_freq = 8
config.eval_freq = 2
config.seed = 42

We want to use the DeepLabV3Plus architecture as provided by [Pavel Yakubovskiy](https://segmentation-modelspytorch.readthedocs.io/en/latest/).

In [None]:
config.module_model = "segmentation_models_pytorch.DeepLabV3Plus"
config.kwargs_model = {
    "encoder_name":"resnet34", # think of changing this default value!
    "encoder_depth":5, # think of changing this default value!
    "encoder_weights":"imagenet", # think of changing this default value!
    "encoder_output_stride":16, # think of changing this default value!
    "decoder_channels":256, # think of changing this default value!
    "decoder_atrous_rates":(12, 24, 36), # think of changing this default value!
    "in_channels":2,
    "classes":config["num_classes"],
    "activation":None, # think of changing this default value!
    "upsampling":4, # think of changing this default value!
    "aux_params":None, # think of changing this default value!
}
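
Storing the model as a dotted path plus a dictionary of keyword arguments is what makes it replaceable through configuration alone. How the framework resolves such a string is not shown in this notebook, so the following is only a sketch of one common way to do it with `importlib`; the helper `load_object` is hypothetical. To keep the example runnable without extra packages, it resolves `datetime.timedelta` as a stand-in for the model class.

```python
import importlib


def load_object(dotted_path):
    """Resolve a string such as 'package.module.ClassName' to the object it names."""
    module_name, _, attribute = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# stand-in for a (module, kwargs) pair taken from the configuration
module_path = "datetime.timedelta"
kwargs = {"days": 1, "hours": 12}

factory = load_object(module_path)
instance = factory(**kwargs)
```

With `segmentation_models_pytorch` installed, the same helper could be called as `load_object(config["module_model"])(**config["kwargs_model"])`.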

Of course, we want to store our trained model to disk.

In [None]:
config.model_savename = config["savename"]
config.model_savename_bestloss = config["model_savename"]+"_bestloss"
config.model_savename_inference = config["savename"]+"_inference"
config.model_savename_inference_bestloss = config["model_savename_inference"]+"_bestloss"

Here, we use the classic [CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html).
We set `reduction` to `"none"` because we want to apply our mask manually in the training notebook.

In [None]:
config.module_loss = "torch.nn.CrossEntropyLoss"
config.kwargs_loss = {
    "weight":None,
    "size_average":None, # deprecated in PyTorch, kept at its default
    "ignore_index":-100,
    "reduce":None, # deprecated in PyTorch, kept at its default
    "reduction":"none",
    "label_smoothing":0.0,
}
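
To show what applying the mask manually amounts to, here is a dependency-free sketch that computes a per-pixel cross-entropy and then averages it over valid pixels only. It uses plain Python lists instead of tensors, and the function names and example values are made up; the actual masking is implemented in the training notebook and may differ in detail.

```python
import math


def cross_entropy(logits, target):
    """Unreduced cross-entropy of one pixel: -log(softmax(logits)[target])."""
    shift = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(v - shift) for v in logits))
    return log_sum_exp - logits[target]


def masked_mean_loss(pixel_logits, targets, mask):
    """Average the per-pixel losses over pixels whose mask value is 1."""
    losses = [cross_entropy(l, t) for l, t in zip(pixel_logits, targets)]
    n_valid = sum(mask)
    if n_valid == 0:
        return 0.0  # nothing valid in this batch, so nothing to learn from
    return sum(loss * m for loss, m in zip(losses, mask)) / n_valid


# three pixels with two class scores each; the third pixel is masked out
pixel_logits = [[0.0, 0.0], [2.0, 0.0], [0.0, 5.0]]
targets = [0, 0, 0]
mask = [1, 1, 0]

loss = masked_mean_loss(pixel_logits, targets, mask)
```

Had the loss been reduced to its mean right away, the badly predicted third pixel would have contributed even though it is invalid.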

We will use the standard [Adam Optimizer](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html).

In [None]:
config.module_optimizer = "torch.optim.Adam"
config.kwargs_optimizer = {
    "lr":0.007,
    "betas":(0.9, 0.999),
    "eps":1e-08,
    "weight_decay":1e-06,
    "amsgrad":False
}

For evaluation, we need some metrics.
We will use the standard accuracy and Cohen's kappa.
Note that you can use an arbitrary number of metrics by extending that list.

In [None]:
config.module_metric = ["../utils/metrics.accuracy", "../utils/metrics.cohen_kappa"]
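
For reference, here is a small sketch of how these two metrics are defined, written in plain Python on flat label lists. These functions are illustrative re-implementations of the textbook definitions, not the code in `utils/metrics`; the example labels are made up.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples for which prediction and reference agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # expected agreement if both label sets were independent
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n**2
    if p_e == 1:
        return float("nan")  # kappa is undefined if only one class occurs in both
    return (p_o - p_e) / (1 - p_e)


# hypothetical labels: 1 = water, 0 = no water
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
```

Cohen's kappa is a useful complement to accuracy for water segmentation because the classes are typically imbalanced.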

For the data normalisation, we use the `QuantileScaler_eolearn_tdigest` as established by TUM.
Hence, we need to define the savenames and corresponding parameters.

In [None]:
config.savename_tdigest = config["savename"]+"_TDigest.npy" 
config.savename_scaler = config["savename"]+"_QuantileScaler.dill" 

config.scaler_minquantile = 0.01 # minquantile
config.scaler_maxquantile = 0.96 # maxquantile
config.scaler_valmin = 0 # corresponding value for minquantile
config.scaler_valmax = 1 # corresponding value for maxquantile

config.scaler_nanval = [0,0] # value to replace nans with
config.scaler_infval = [0,0] # value to replace infs with
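
To give an idea of what these parameters do, here is a sketch of a quantile-based linear scaling for a single channel. It rests on several assumptions: the value at `scaler_minquantile` is mapped to `scaler_valmin` and the value at `scaler_maxquantile` to `scaler_valmax`, NaNs and infs are replaced by the given values, and the two-element lists above hold one replacement value per input channel. The sketch also computes exact quantiles with the standard library, whereas the actual scaler estimates them with a t-digest; the function names are hypothetical.

```python
import math
import statistics


def fit_quantile_bounds(samples, q_min=0.01, q_max=0.96):
    """Return the sample values at the q_min and q_max quantiles (exact, no t-digest)."""
    cuts = statistics.quantiles(samples, n=100, method="inclusive")  # 99 percentile cut points
    return cuts[round(q_min * 100) - 1], cuts[round(q_max * 100) - 1]


def scale(value, low, high, val_min=0.0, val_max=1.0, nan_val=0.0, inf_val=0.0):
    """Map low -> val_min and high -> val_max linearly; replace NaN and inf."""
    if math.isnan(value):
        return nan_val
    if math.isinf(value):
        return inf_val
    return val_min + (value - low) * (val_max - val_min) / (high - low)


samples = list(range(101))  # toy data: 0, 1, ..., 100
low, high = fit_quantile_bounds(samples)
scaled_mid = scale(48.5, low, high)
```

Values outside the two quantiles end up outside the interval between `scaler_valmin` and `scaler_valmax` in this sketch, since no clipping is applied.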

Finally, we must not forget to run some checking routines and to store our configuration to disk.

In [None]:
#%% saving and checking
#%%% check directories
config.checkdir()
#%%% check files
config.checkfile()
#%%% check modules
config.checkmodule()
#%%% save config
file = config.save()
file2 = config.save(os.path.join(config["dir_results"],config["savename_config"])) # saving to results folder
#%% print config
# config.print()