<font face='Calibri' size='2'> <i>eSBAE - Notebook Series - Part 4, version 0.1, April 2023. Andreas Vollrath, Food and Agriculture Organization of the United Nations, Rome</i>
</font>

![title](images/header.png)

# IV - eSBAE Dataset Augmentation
### Run various change detection algorithms on previously extracted time-series data
-------

This notebook walks you through running several change detection algorithms on a set of points, using [Google Earth Engine](https://earthengine.google.com/) as well as Python routines. The workflow is designed to handle thousands of points and uses parallelization to retrieve information from the platform efficiently.

**You will need**:
- a valid Earth Engine account ([sign up here](https://code.earthengine.google.com/register))
- a successful run of Notebook 3 of the eSBAE notebook series

**This notebook runs best on an r16 instance.** To start one, go back to the terminal tab and type "r16". You may first need to stop your previous session by clicking the $ amount at the bottom right of the screen, and/or refresh the web page, before you can request a new instance in the terminal.

### 1 - Import libraries

In [1]:
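# initialize EE on the high-volume endpoint
import ee
try:
    ee.Initialize(opt_url='https://earthengine-highvolume.googleapis.com')
except Exception:
    # initialization failed (e.g. no stored credentials): authenticate, then retry
    ee.Authenticate()
    ee.Initialize(opt_url='https://earthengine-highvolume.googleapis.com')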
    
from sampling_handler import DatasetAugmentation

  warn("cupy is not available in this environment, GPU fonctionnalities won't be available")


### 2 - Basic Input Variables

In [2]:
esbae = DatasetAugmentation(
    
    # your project name, as set in previous notebooks
    project_name = 'my_first_esbae_project',

    # start of calibration period (mainly for bfast)
    calibration_start = '2015-01-01',  # YYYY-MM-DD format

    # Actual period of interest, i.e. monitoring period
    monitor_start =  '2020-01-01',  # YYYY-MM-DD format
    monitor_end   =  '2023-01-01',  # YYYY-MM-DD format

    # select the band for univariate ts-analysis (has to be inside bands list)
    ts_band = 'ndfi'
)

INFO: Using existing project directory at /home/sepal-user/module_results/esbae/esbae_project_Jan25
INFO: Using existent config file from project directory /home/sepal-user/module_results/esbae/esbae_project_Jan25
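
The three dates describe a calibration period followed by a monitoring period. The sketch below checks that they are valid and in order before anything is run. It assumes the calibration period runs from `calibration_start` up to `monitor_start`; `check_periods` is a hypothetical helper for illustration, not part of `sampling_handler`.

```python
from datetime import date


def check_periods(calibration_start, monitor_start, monitor_end):
    """Return the length in days of the calibration and monitoring periods.

    Hypothetical helper (not part of sampling_handler). Raises ValueError
    if a date is not in YYYY-MM-DD format or the dates are out of order.
    """
    cal, start, end = (
        date.fromisoformat(d)
        for d in (calibration_start, monitor_start, monitor_end)
    )
    if not cal < start < end:
        raise ValueError(
            "expected calibration_start < monitor_start < monitor_end"
        )
    return (start - cal).days, (end - start).days


# same dates as in the cell above
calibration_days, monitor_days = check_periods(
    '2015-01-01', '2020-01-01', '2023-01-01'
)
```

With the dates used above, this gives a five-year calibration period and a three-year monitoring period.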


### 3 - Basic Settings

In [3]:
# select basic pre-processing options
esbae.outlier_removal =     True
esbae.smooth_ts =           True

# Select the algorithms to run
esbae.run_cusum =           True
esbae.run_bfast =           True
esbae.run_ts_metrics =      True
esbae.run_bs_slope =        True
esbae.run_jrc_nrt =         True     # needs further debugging right now
esbae.run_ccdc =            True
esbae.run_land_trendr =     False     # not yet implemented
esbae.run_global_products = True
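
To illustrate what the two pre-processing switches do conceptually, the sketch below masks observations that lie more than a z-score threshold from the series mean and then applies a centred rolling mean. This is a sketch under assumptions, not the `sampling_handler` implementation: the function names, the threshold of 3 and the window of 3 observations are all illustrative.

```python
from statistics import mean, pstdev


def remove_outliers(values, z_threshold=3.0):
    """Replace values further than z_threshold std devs from the mean with None."""
    valid = [v for v in values if v is not None]
    mu, sigma = mean(valid), pstdev(valid)
    if sigma == 0:
        return list(values)
    return [
        v if v is not None and abs(v - mu) / sigma <= z_threshold else None
        for v in values
    ]


def rolling_mean(values, window=3):
    """Centred rolling mean that skips missing (None) observations."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        neighbours = [
            v for v in values[max(0, i - half):i + half + 1] if v is not None
        ]
        smoothed.append(mean(neighbours) if neighbours else None)
    return smoothed


# illustrative NDFI-like series with one spurious drop at index 4
series = [0.80, 0.82, 0.79, 0.81, -0.95, 0.80,
          0.78, 0.83, 0.81, 0.80, 0.79, 0.82]
cleaned = remove_outliers(series, z_threshold=3.0)
smoothed = rolling_mean(cleaned, window=3)
```

The spurious drop is masked first, so the rolling mean at that position is computed from its two valid neighbours only.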

### 4 - Advanced Settings
This section sets the parameters for each time-series algorithm and selects which of the global products available on GEE to include.
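
As an example of what one of these settings produces, the `ts_metrics` entry requests four summary statistics (`mean`, `stddev`, `min`, `max`) for each listed band. A minimal sketch of that reduction for a single point follows. The function name, the `band_metric` naming of the output keys and the use of the population standard deviation are assumptions for illustration, not the library's behaviour.

```python
from statistics import mean, pstdev


def time_scan_metrics(series_by_band,
                      metrics=('mean', 'stddev', 'min', 'max')):
    """Reduce each band's time series to one value per requested metric."""
    reducers = {'mean': mean, 'stddev': pstdev, 'min': min, 'max': max}
    return {
        f'{band}_{metric}': reducers[metric](values)
        for band, values in series_by_band.items()
        for metric in metrics
    }


# illustrative values for one point and two bands
point = {
    'ndfi': [0.8, 0.7, 0.9, 0.6],
    'nir': [0.30, 0.32, 0.28, 0.30],
}
features = time_scan_metrics(point)
```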

In [4]:
esbae.bfast = {
    'run': esbae.run_bfast,
    'start_monitor': esbae.monitor_start,
    'freq': 365,
    'k': 3,
    'hfrac': 0.25,
    'trend': True,
    'level': 0.05,
    'backend': 'python'
}

esbae.cusum = {
    'run': esbae.run_cusum,
    'nr_of_bootstraps': 1000
}

esbae.bs_slope = {
    'run': esbae.run_bs_slope,
    'nr_of_bootstraps': 1000
}

esbae.ts_metrics = {
    'run': esbae.run_ts_metrics,
    'bands': ['red', 'nir', 'swir1', 'swir2', 'ndfi', 'brightness', 'greenness', 'wetness'],
    'metrics': ['mean', 'stddev', 'min', 'max'],  # DO NOT CHANGE YET
    'outlier_removal': False,
    'z_threshhold': 3
}

esbae.ccdc = {
    'run': esbae.run_ccdc,
    'breakpointBands': ['green', 'red', 'nir', 'swir1', 'swir2'],
    'tmaskBands': ['green', 'swir2'],
    'minObservations': 6,
    'chiSquareProbability': 0.99,
    'minNumOfYearsScaler': 1,
    'dateFormat': 2,
    'lambda': 20,
    'maxIterations': 1000
}

esbae.land_trendr = {
    'run': esbae.run_land_trendr,
    'maxSegments': 6,
    'spikeThreshold': 0.9,
    'vertexCountOvershoot': 3,
    'preventOneYearRecovery': True,
    'recoveryThreshold': 0.25,
    'pvalThreshold': 0.05,
    'bestModelProportion': 0.75,
    'minObservationsNeeded': 3
}

esbae.jrc_nrt = {
    'run': esbae.run_jrc_nrt
}

esbae.global_products = {
    'run': esbae.run_global_products,
    'gfc': True,
    'tmf': True,
    'tmf_years': True,
    'esa_lc20': True,
    'copernicus_lc': True,
    'esri_lc': True,
    'lang_tree_height': True,
    'potapov_tree_height': True,
    'elevation': True,
    'dynamic_world_tree_prob': True,
    'dynamic_world_class_mode': True
}

# number of parallel workers (presumably Python-side and Earth Engine-side, respectively)
esbae.py_workers = 75
esbae.ee_workers = 15
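
The `nr_of_bootstraps` setting controls how many randomized series are used to judge whether a detected change is significant. The sketch below shows the general CuSum idea: the cumulative sum of residuals from the mean is compared against the same statistic for randomly reordered copies of the series. It is written under assumptions and is not the `sampling_handler` implementation; the function names and the fixed seed are illustrative.

```python
import random
from itertools import accumulate
from statistics import mean


def cusum_magnitude(values):
    """Range (max - min) of the cumulative sum of residuals from the mean."""
    mu = mean(values)
    cumulative = list(accumulate(v - mu for v in values))
    return max(cumulative) - min(cumulative)


def cusum_confidence(values, nr_of_bootstraps=1000, seed=42):
    """Share of reordered series with a smaller CuSum range than observed."""
    rng = random.Random(seed)
    observed = cusum_magnitude(values)
    shuffled = list(values)
    smaller = 0
    for _ in range(nr_of_bootstraps):
        rng.shuffle(shuffled)  # random reordering removes any temporal structure
        if cusum_magnitude(shuffled) < observed:
            smaller += 1
    return observed, smaller / nr_of_bootstraps


# illustrative NDFI-like series with an abrupt drop halfway through
ndfi = [0.80, 0.82, 0.79, 0.81, 0.80, 0.78,
        0.35, 0.30, 0.33, 0.31, 0.34, 0.32]
magnitude, confidence = cusum_confidence(ndfi, nr_of_bootstraps=1000)
```

A confidence close to 1 means that almost no random reordering produces a CuSum range as large as the observed one, which points to a real change. More bootstraps give a finer-grained confidence value at the cost of run time.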

### 5 - Run the dataset augmentation

In [5]:
esbae.augment()

INFO: Verifying parameter settings...
INFO: Initializing dataset augmentation routine...
INFO: Accumulating batch files of 1/1...
INFO: Running the dataset augmentation routines on 178 points...
INFO: Cleaning the time-series from outliers.
INFO: Outlier removal finished in: 0:00:00.363487
INFO: Smoothing the time-series with a rolling mean.
INFO: Time-series smoothing finished in: 0:00:00.598824
INFO: Creating a subset of the time-series for the full analysis period (calibration & monitoring).
INFO: Time-series subsetting finished in: 0:00:00.102829
INFO: Running CCDC
INFO: Running the B-FAST algorithm on current batch of points.
INFO: BFAST finished in: 0:00:03.953350
INFO: Running the CuSum algorithm on current batch of points.
INFO: CuSum finished in: 0:00:02.270201
INFO: Running the time-scan on current batch of points.
INFO: Time-scan metrics for band red finished in: 0:00:00.053703
INFO: Time-scan metrics for band nir finished in: 0:00:00.052616
INFO: Time-scan metrics for band 