# IV. Optimization of parameters

---

In the previous [tutorial](03_vi_time-series_plot.ipynb), we saw that the default parameters of each method do not perform well: the detection dates can be very far from the reference date, and the methods can be too sensitive to the slightest variations in the time series. We therefore propose here to optimize the parameters so as to minimize the difference between the reference date and the detection date.

---

## 0. Install NRT package (optional)

Depending on your Python environment, you can install the NRT package in different ways. Here, we install it in the current session (uncomment the line of code if necessary).

In [None]:
#!pip install nrt

## 1. Import libraries
First, we import some basic libraries.

In [1]:
import os
import pickle
import warnings
warnings.filterwarnings('ignore')
from IPython.display import display  # used below to render the result tables

We also import the spatial libraries.

In [2]:
import datetime as dt
import geopandas as gpd
import xarray as xr

...and an add-on to the NRT package, available only in this tutorial. It lets you test different parameters of the NRT methods in order to minimize the lag between the reference date and the detection date.

In [3]:
from nrt_utils import Nrt_run, params_bounds

---

## 2. Loading samples and VI time-series

### 2.1. Point samples

We are working here on a sample of 50 points. Each point is described by an identifier (`id`) and the date (`SAMPLE_1`) at which dieback begins, obtained by visual interpretation.

In [4]:
# open samples
vector = gpd.read_file(r"data_ref/plot01_samples_2.shp")
vector.head()

Unnamed: 0,id,SAMPLE_1,geometry
0,1,18437,POINT (3953934.425 2913558.843)
1,2,18437,POINT (3954234.425 2913558.843)
2,3,18414,POINT (3954434.425 2913558.843)
3,4,18437,POINT (3955234.425 2913058.843)
4,5,18232,POINT (3954534.425 2912958.843)
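
The `SAMPLE_1` values are integers rather than calendar dates. Assuming they count days since 1970-01-01 (an assumption, not stated in the data; it places the values above in 2019 and 2020), they could be converted to dates with the standard library. The helper name `days_to_date` is illustrative:

```python
import datetime as dt

# Assumed origin of the day counts (not confirmed by the data source).
EPOCH = dt.date(1970, 1, 1)


def days_to_date(days):
    """Convert a day count since 1970-01-01 to a calendar date."""
    return EPOCH + dt.timedelta(days=int(days))


# Reference values taken from the table above.
for value in (18437, 18414, 18232):
    print(value, "->", days_to_date(value).isoformat())
```

If the assumption holds, the reference dates all fall after the pivot date used below, that is, inside the monitoring period.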


For this exercise, we keep 70% of the points for optimizing the method parameters and the remaining 30% for quality control.

In [5]:
# draw 70% of the points at random for training
vector_train = vector.sample(frac=0.7)
# the remaining points form the test set
vector_test = vector.drop(vector_train.index)

print('training samples:', len(vector_train))
print('test samples    :', len(vector_test))

training samples: 35
test samples    : 15
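
Because `sample` draws points at random, the split (and every statistic computed from it) changes between runs. A minimal sketch of a reproducible split, assuming a fixed seed is acceptable. The toy table and the seed value `42` are illustrative, and a plain pandas `DataFrame` stands in for the GeoDataFrame, which offers the same `sample` and `drop` methods:

```python
import pandas as pd

# Toy stand-in for the 50 sample points (illustrative values only).
points = pd.DataFrame({"id": range(1, 51)})

# random_state fixes the draw, so the same points are selected on every run.
train = points.sample(frac=0.7, random_state=42)
# The test set is whatever was not drawn for training.
test = points.drop(train.index)

print("training samples:", len(train))
print("test samples    :", len(test))
```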


`Nrt_run` from `nrt_utils` needs two pieces of information:
- the field name of the sampling vector describing the reference date
- the "pivot" date that separates the fitting and the monitoring periods

In [6]:
fdate = 'SAMPLE_1'
# pivot date (yyyy, mm, dd) fit/monitor
pdate = dt.datetime(2018, 6, 30)


### 2.2. Loading VI time-series

In this example, we test one of the available datasets:

- vi-mask: VI time-series with cloud masks applied

In [7]:
with open('nrt_var.txt', 'rb') as f:
    dict_var = pickle.load(f)

startdate = dict_var['startdate']
enddate = dict_var['enddate']
output_dir = dict_var['output_dir']

In [8]:
vi_mask = xr.open_dataset(
    os.path.join(output_dir, f'S2TS_{startdate}-{enddate}_vi-mask.nc')
    )
crswir = vi_mask.crswir

---

## 3. Finding the most suitable parameters
### 3.1. Generating different parameter configurations

Here, for each parameter of a method, we specify the minimum value, the maximum value and the step, or an explicit list of values. Then, we call the `params_bounds` function to generate all the possible parameter combinations.

In [9]:
method_iqr = 'IQR'
params_config_iqr = {'sensitivity': (0.0, 5.0, 0.2),
                     'boundary': (1, 7, 2)}
params_test_iqr = params_bounds(params_config_iqr)
print(f"total number of {method_iqr} tests: {len(params_test_iqr)}")

method_ewma = 'EWMA'
params_config_ewma = {'sensitivity': (1, 10, 2),
                      'lambda_': (0, 1, 0.25),
                      'threshold_outlier': (1, 20, 2)}
params_test_ewma = params_bounds(params_config_ewma)
print(f"total number of {method_ewma} tests: {len(params_test_ewma)}")

method_cusum = 'CUSUM'
params_config_cusum = {'sensitivity': (0.0, 10.0, 0.2)}
params_test_cusum = params_bounds(params_config_cusum)
print(f"total number of {method_cusum} tests: {len(params_test_cusum)}")

method_mosum = 'MOSUM'
params_config_mosum = {'sensitivity': (0.001, 0.05, 0.001),
                       'h': [0.25, 0.5, 1]}
params_test_mosum = params_bounds(params_config_mosum)
print(f"total number of {method_mosum} tests: {len(params_test_mosum)}")

total number of IQR tests: 75
total number of EWMA tests: 200
total number of CUSUM tests: 50
total number of MOSUM tests: 147
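
The counts above are consistent with a Cartesian product in which each `(min, max, step)` tuple excludes the maximum and each list is used as given. The sketch below reproduces that behaviour; `expand` and `grid_from_bounds` are hypothetical helpers written for illustration, not the actual implementation of `params_bounds`:

```python
import itertools
import math


def expand(spec):
    """Turn a (start, stop, step) tuple into explicit values; keep lists as given."""
    if isinstance(spec, tuple):
        start, stop, step = spec
        # Small tolerance so floating-point noise does not add a value at `stop`.
        n = math.ceil((stop - start) / step - 1e-9)
        return [round(start + i * step, 10) for i in range(n)]
    return list(spec)


def grid_from_bounds(config):
    """Hypothetical helper: Cartesian product of all expanded parameter values."""
    names = list(config)
    values = [expand(config[name]) for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*values)]


iqr_grid = grid_from_bounds({"sensitivity": (0.0, 5.0, 0.2), "boundary": (1, 7, 2)})
print(len(iqr_grid))  # 25 sensitivity values x 3 boundary values
print(iqr_grid[:3])
```

Each element of the grid is a dictionary of keyword values, the same shape as the `params_test_iqr[0]` entry printed later in this tutorial.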


### 3.2. Testing in parallel

To speed up the calculation, the tests are distributed across processes with Python's `multiprocessing` package, so we can define the number of processes to use.

In [10]:
nb_process = 10
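
As a rough illustration of how a parameter grid can be spread over worker processes, here is a minimal sketch built on `multiprocessing.Pool`. The scoring function `evaluate` and the wrapper `run_grid` are placeholders invented for this example; they do not reflect how `nrt_utils` is implemented:

```python
from multiprocessing import Pool


def evaluate(params):
    """Placeholder scoring function (hypothetical): distance of sensitivity to 1.2."""
    score = abs(params["sensitivity"] - 1.2)
    return {**params, "score": score}


def run_grid(param_grid, nb_process):
    """Evaluate every configuration, spreading the work over nb_process workers."""
    if nb_process <= 1:
        # Serial fallback, convenient for debugging.
        return [evaluate(p) for p in param_grid]
    with Pool(processes=nb_process) as pool:
        # map() returns the results in the same order as the input grid.
        return pool.map(evaluate, param_grid)


if __name__ == "__main__":  # needed on platforms that spawn worker processes
    grid = [{"sensitivity": s / 10} for s in range(0, 30, 2)]
    results = run_grid(grid, nb_process=2)
    best = min(results, key=lambda r: r["score"])
    print(best)
```

The worker function is defined at module level so that it can be pickled and sent to the worker processes.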

#### 3.2.1. IQR method

Now, the `nrt_stat_in_parallel` method runs the detection method with each parameter configuration over the training sample points.

In [11]:
if __name__ == "__main__":  # mandatory for using multiproc pooling
    optim_iqr = Nrt_run(crswir, method_iqr, pdate, vector_train, fdate)
    results_iqr = optim_iqr.nrt_stat_in_parallel(params_test_iqr, nb_process)

results_iqr.to_csv(f'{output_dir}/optim_{method_iqr}_crswir.csv')
results_iqr.head()

duration: 1.6614600896835328 min


Unnamed: 0,sensitivity,boundary,tp,count,mean,std,min,25%,50%,75%,max
0,0.0,1,1.0,35.0,630.89,125.71,335.0,575.0,685.0,724.0,780.0
1,0.0,3,1.0,35.0,601.11,138.6,200.0,534.0,650.0,708.0,772.0
2,0.0,5,1.0,35.0,518.2,194.19,37.0,393.0,580.0,680.0,760.0
3,0.2,1,1.0,35.0,623.74,129.19,287.0,574.0,673.0,717.5,780.0
4,0.2,3,1.0,35.0,513.06,193.58,57.0,365.5,570.0,685.0,772.0
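
The statistic columns carry the same names as the output of pandas' `describe()`, and in the tables `tp` equals `count` divided by the number of points (for example 34 / 35 = 0.97). A minimal sketch of how such a row could be derived from per-point lags, assuming `tp` is the share of points with a detection; this is a guess about `nrt_utils`, and the lag values are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical lags in days for 5 points; NaN marks a point with no detection.
lags = pd.Series([0.0, 18.0, 37.0, 65.0, np.nan])

# describe() skips NaN and returns count, mean, std, min, quartiles and max.
stats = lags.describe().round(2)
# Share of points for which a detection exists.
stats["tp"] = round(lags.notna().mean(), 2)
print(stats)
```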


For each parameter configuration, the result holds summary statistics of the lag in days. We now display, for each statistic, the configurations that minimize it.

In [12]:
fields = ['mean', 'std', 'min', '25%', '50%', '75%', 'max']

for i in fields:
    print(f"parameters for min value in {i}:")
    display(results_iqr.loc[results_iqr[i] == results_iqr[i].min()].sort_values(by=['mean']).head())

parameters for min value in mean:


Unnamed: 0,sensitivity,boundary,tp,count,mean,std,min,25%,50%,75%,max
20,1.2,5,1.0,35.0,51.51,48.27,0.0,18.5,37.0,65.0,212.0


parameters for min value in std:


Unnamed: 0,sensitivity,boundary,tp,count,mean,std,min,25%,50%,75%,max
20,1.2,5,1.0,35.0,51.51,48.27,0.0,18.5,37.0,65.0,212.0


parameters for min value in min:


Unnamed: 0,sensitivity,boundary,tp,count,mean,std,min,25%,50%,75%,max
20,1.2,5,1.0,35.0,51.51,48.27,0.0,18.5,37.0,65.0,212.0
17,1.0,5,1.0,35.0,55.26,52.88,0.0,24.5,35.0,61.0,212.0
28,1.8,3,1.0,35.0,56.83,57.01,0.0,10.0,35.0,80.0,227.0
29,1.8,5,0.97,34.0,58.12,56.77,0.0,15.0,42.0,74.25,212.0
34,2.2,3,1.0,35.0,58.26,61.71,0.0,11.0,35.0,86.0,227.0


parameters for min value in 25%:


Unnamed: 0,sensitivity,boundary,tp,count,mean,std,min,25%,50%,75%,max
28,1.8,3,1.0,35.0,56.83,57.01,0.0,10.0,35.0,80.0,227.0


parameters for min value in 50%:


Unnamed: 0,sensitivity,boundary,tp,count,mean,std,min,25%,50%,75%,max
17,1.0,5,1.0,35.0,55.26,52.88,0.0,24.5,35.0,61.0,212.0
28,1.8,3,1.0,35.0,56.83,57.01,0.0,10.0,35.0,80.0,227.0
37,2.4,3,1.0,35.0,58.14,59.78,5.0,12.5,35.0,94.0,227.0
34,2.2,3,1.0,35.0,58.26,61.71,0.0,11.0,35.0,86.0,227.0
40,2.6,3,1.0,35.0,60.4,64.28,0.0,11.0,35.0,98.0,235.0


parameters for min value in 75%:


Unnamed: 0,sensitivity,boundary,tp,count,mean,std,min,25%,50%,75%,max
17,1.0,5,1.0,35.0,55.26,52.88,0.0,24.5,35.0,61.0,212.0


parameters for min value in max:


Unnamed: 0,sensitivity,boundary,tp,count,mean,std,min,25%,50%,75%,max
20,1.2,5,1.0,35.0,51.51,48.27,0.0,18.5,37.0,65.0,212.0
17,1.0,5,1.0,35.0,55.26,52.88,0.0,24.5,35.0,61.0,212.0
23,1.4,5,1.0,35.0,57.2,53.6,5.0,18.5,37.0,75.0,212.0
29,1.8,5,0.97,34.0,58.12,56.77,0.0,15.0,42.0,74.25,212.0
32,2.0,5,0.97,34.0,60.68,59.95,0.0,15.0,37.5,96.75,212.0


Here, we can see that the configuration *"sensitivity: **1.2** | boundary: **5**"* seems to give the best results. On **average**, the lag between the reference date and the detection date is about **50 days** (mean of 51.51 in this run), and for 75% of the points it does not exceed 65 days. These figures vary from run to run because the training sample is drawn at random.

Now we can compare these results with the test dataset:

In [13]:
params_config_iqr = {'sensitivity': [1.2],
                     'boundary': [5]}
params_test_iqr = params_bounds(params_config_iqr)
optim_iqr = Nrt_run(crswir, method_iqr, pdate, vector_test, fdate)
results_iqr = optim_iqr.nrt_stat(params_test_iqr[0])

print(f"Statistics for the selected parameters:{params_test_iqr[0]}\n---")
results_iqr

Statistics for the selected parameters:{'sensitivity': 1.2, 'boundary': 5}
---


{'sensitivity': 1.2,
 'boundary': 5,
 'tp': 0.93,
 'count': 14.0,
 'mean': 32.0,
 'std': 20.99,
 'min': 0.0,
 '25%': 21.75,
 '50%': 30.0,
 '75%': 46.5,
 'max': 72.0}
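
To read the training and test results side by side, the two sets of statistics can be placed in one table. The sketch below copies the values reported above for *sensitivity 1.2 | boundary 5*; the column name `test - train` is chosen for illustration:

```python
import pandas as pd

fields = ["tp", "count", "mean", "std", "min", "25%", "50%", "75%", "max"]
# Values copied from the training and test outputs shown above.
train = pd.Series([1.0, 35.0, 51.51, 48.27, 0.0, 18.5, 37.0, 65.0, 212.0], index=fields)
test = pd.Series([0.93, 14.0, 32.0, 20.99, 0.0, 21.75, 30.0, 46.5, 72.0], index=fields)

comparison = pd.DataFrame({"train": train, "test": test})
# Negative values mean the statistic is lower on the test points.
comparison["test - train"] = (comparison["test"] - comparison["train"]).round(2)
print(comparison)
```

In this run the lags are shorter on the test points than on the training points, but with only 14 detections out of 15 test points the comparison should be read with caution.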

#### 3.2.2. EWMA method

In [14]:
if __name__ == "__main__":
    optim_ewma = Nrt_run(crswir, method_ewma, pdate, vector_train, fdate)
    results_ewma = optim_ewma.nrt_stat_in_parallel(params_test_ewma, nb_process)

results_ewma.to_csv(f'{output_dir}/optim_{method_ewma}_crswir.csv')
results_ewma.head()

duration: 4.31643765370051 min


Unnamed: 0,sensitivity,lambda_,threshold_outlier,tp,count,mean,std,min,25%,50%,75%,max
0,1.0,0.0,1.0,1.0,35.0,637.2,127.97,337.0,576.0,687.0,725.0,782.0
1,1.0,0.0,3.0,1.0,35.0,637.97,127.51,337.0,576.0,690.0,725.0,782.0
2,1.0,0.0,5.0,1.0,35.0,637.97,127.51,337.0,576.0,690.0,725.0,782.0
3,1.0,0.0,7.0,1.0,35.0,637.97,127.51,337.0,576.0,690.0,725.0,782.0
4,1.0,0.0,9.0,1.0,35.0,637.97,127.51,337.0,576.0,690.0,725.0,782.0


In [15]:
for i in fields:
    print(f"parameters for min value in {i}:")
    display(results_ewma.loc[results_ewma[i] == results_ewma[i].min()].sort_values(by=['mean']).head())

parameters for min value in mean:


Unnamed: 0,sensitivity,lambda_,threshold_outlier,tp,count,mean,std,min,25%,50%,75%,max
131,7.0,0.25,3.0,0.03,1.0,48.0,,48.0,48.0,48.0,48.0,48.0


parameters for min value in std:


Unnamed: 0,sensitivity,lambda_,threshold_outlier,tp,count,mean,std,min,25%,50%,75%,max
132,7.0,0.25,5.0,0.97,34.0,54.88,49.45,2.0,10.0,51.5,83.25,217.0


parameters for min value in min:


Unnamed: 0,sensitivity,lambda_,threshold_outlier,tp,count,mean,std,min,25%,50%,75%,max
112,5.0,0.75,5.0,0.89,31.0,63.55,68.97,0.0,12.0,35.0,106.5,235.0
91,5.0,0.25,3.0,0.54,19.0,70.16,70.64,0.0,28.5,37.0,88.0,295.0
102,5.0,0.5,5.0,1.0,35.0,75.77,137.05,0.0,6.5,23.0,88.0,658.0
133,7.0,0.25,7.0,1.0,35.0,78.31,121.24,0.0,10.0,37.0,99.0,533.0
92,5.0,0.25,5.0,1.0,35.0,82.26,140.78,0.0,13.5,30.0,86.5,658.0


parameters for min value in 25%:


Unnamed: 0,sensitivity,lambda_,threshold_outlier,tp,count,mean,std,min,25%,50%,75%,max
102,5.0,0.5,5.0,1.0,35.0,75.77,137.05,0.0,6.5,23.0,88.0,658.0


parameters for min value in 50%:


Unnamed: 0,sensitivity,lambda_,threshold_outlier,tp,count,mean,std,min,25%,50%,75%,max
102,5.0,0.5,5.0,1.0,35.0,75.77,137.05,0.0,6.5,23.0,88.0,658.0


parameters for min value in 75%:


Unnamed: 0,sensitivity,lambda_,threshold_outlier,tp,count,mean,std,min,25%,50%,75%,max
131,7.0,0.25,3.0,0.03,1.0,48.0,,48.0,48.0,48.0,48.0,48.0


parameters for min value in max:


Unnamed: 0,sensitivity,lambda_,threshold_outlier,tp,count,mean,std,min,25%,50%,75%,max
131,7.0,0.25,3.0,0.03,1.0,48.0,,48.0,48.0,48.0,48.0,48.0
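
In the tables above, the configuration with the lowest mean lag (row 131) has a `count` of 1 and a `tp` of 0.03, so minimizing a single statistic can select a configuration that yields a result for almost none of the 35 points. A possible refinement, sketched here with three rows copied from the EWMA results and an arbitrary `tp` threshold of 0.9 (the threshold is an assumption, not part of the tutorial), is to discard such configurations before ranking:

```python
import pandas as pd

# Three rows copied from the EWMA results above (subset of columns).
rows = pd.DataFrame(
    {
        "sensitivity": [7.0, 7.0, 5.0],
        "lambda_": [0.25, 0.25, 0.5],
        "threshold_outlier": [3.0, 5.0, 5.0],
        "tp": [0.03, 0.97, 1.0],
        "mean": [48.0, 54.88, 75.77],
        "75%": [48.0, 83.25, 88.0],
    },
    index=[131, 132, 102],
)

min_tp = 0.9  # illustrative threshold on the share of detected points
reliable = rows[rows["tp"] >= min_tp]
# Rank the remaining configurations by mean lag, then by the 75% quantile.
ranked = reliable.sort_values(by=["mean", "75%"])
print(ranked.head())
```

With this filter, row 132 (*sensitivity 7 | lambda_ 0.25 | threshold_outlier 5*) ranks first among the three rows shown.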


#### 3.2.3. CUSUM method

In [16]:
if __name__ == "__main__":
    optim_cusum = Nrt_run(crswir, method_cusum, pdate, vector_train, fdate)
    results_cusum = optim_cusum.nrt_stat_in_parallel(params_test_cusum, nb_process)

results_cusum.to_csv(f'{output_dir}/optim_{method_cusum}_crswir.csv')
results_cusum.head()

duration: 1.0859586238861083 min


Unnamed: 0,sensitivity,tp,count,mean,std,min,25%,50%,75%,max
0,0.0,0.51,18.0,166.17,106.38,42.0,117.25,157.0,201.25,495.0
1,0.2,0.94,33.0,386.64,252.95,8.0,140.0,497.0,600.0,695.0
2,0.4,0.94,33.0,468.33,245.14,12.0,197.0,588.0,673.0,702.0
3,0.6,0.94,33.0,527.33,217.42,17.0,395.0,615.0,688.0,760.0
4,0.8,0.97,34.0,549.62,192.14,40.0,487.5,612.5,697.25,765.0


In [17]:
for i in fields:
    print(f"parameters for min value in {i}:")
    display(results_cusum.loc[results_cusum[i] == results_cusum[i].min()].sort_values(by=['mean']).head())


parameters for min value in mean:


Unnamed: 0,sensitivity,tp,count,mean,std,min,25%,50%,75%,max
40,8.0,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
41,8.2,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
42,8.4,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
43,8.6,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
44,8.8,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0


parameters for min value in std:


Unnamed: 0,sensitivity,tp,count,mean,std,min,25%,50%,75%,max
40,8.0,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
41,8.2,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
42,8.4,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
43,8.6,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
44,8.8,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0


parameters for min value in min:


Unnamed: 0,sensitivity,tp,count,mean,std,min,25%,50%,75%,max
1,0.2,0.94,33.0,386.64,252.95,8.0,140.0,497.0,600.0,695.0
9,1.8,0.94,33.0,386.64,252.95,8.0,140.0,497.0,600.0,695.0


parameters for min value in 25%:


Unnamed: 0,sensitivity,tp,count,mean,std,min,25%,50%,75%,max
44,8.8,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
45,9.0,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
46,9.2,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
47,9.4,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
40,8.0,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0


parameters for min value in 50%:


Unnamed: 0,sensitivity,tp,count,mean,std,min,25%,50%,75%,max
44,8.8,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
45,9.0,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
46,9.2,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
47,9.4,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
40,8.0,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0


parameters for min value in 75%:


Unnamed: 0,sensitivity,tp,count,mean,std,min,25%,50%,75%,max
40,8.0,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
41,8.2,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
42,8.4,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
43,8.6,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
44,8.8,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0


parameters for min value in max:


Unnamed: 0,sensitivity,tp,count,mean,std,min,25%,50%,75%,max
40,8.0,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
41,8.2,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
42,8.4,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
43,8.6,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0
44,8.8,0.51,18.0,149.89,80.36,35.0,96.25,150.0,195.5,325.0


#### 3.2.4. MOSUM method

In [18]:
if __name__ == "__main__":
    optim_mosum = Nrt_run(crswir, method_mosum, pdate, vector_train, fdate)
    results_mosum = optim_mosum.nrt_stat_in_parallel(params_test_mosum, nb_process)

results_mosum.to_csv(f'{output_dir}/optim_{method_mosum}_crswir.csv')
results_mosum.head()

duration: 3.2865004817644756 min


Unnamed: 0,sensitivity,h,tp,count,mean,std,min,25%,50%,75%,max
0,0.001,0.25,0.97,34.0,100.56,135.36,5.0,15.0,32.5,125.0,533.0
1,0.001,0.5,0.97,34.0,91.56,124.6,0.0,22.25,43.5,87.25,458.0
2,0.001,1.0,0.94,33.0,60.55,85.72,0.0,15.0,35.0,57.0,378.0
3,0.002,0.25,0.97,34.0,100.53,136.37,0.0,15.0,32.5,125.0,533.0
4,0.002,0.5,0.97,34.0,106.41,143.49,0.0,18.0,36.0,102.25,488.0


In [19]:
for i in fields:
    print(f"parameters for min value in {i}:")
    display(results_mosum.loc[results_mosum[i] == results_mosum[i].min()].sort_values(by=['mean']).head())

parameters for min value in mean:


Unnamed: 0,sensitivity,h,tp,count,mean,std,min,25%,50%,75%,max
2,0.001,1.0,0.94,33.0,60.55,85.72,0.0,15.0,35.0,57.0,378.0


parameters for min value in std:


Unnamed: 0,sensitivity,h,tp,count,mean,std,min,25%,50%,75%,max
2,0.001,1.0,0.94,33.0,60.55,85.72,0.0,15.0,35.0,57.0,378.0


parameters for min value in min:


Unnamed: 0,sensitivity,h,tp,count,mean,std,min,25%,50%,75%,max
2,0.001,1.0,0.94,33.0,60.55,85.72,0.0,15.0,35.0,57.0,378.0
5,0.002,1.0,0.94,33.0,70.76,101.81,0.0,15.0,30.0,70.0,408.0
8,0.003,1.0,0.94,33.0,78.52,113.59,0.0,18.0,28.0,70.0,425.0
14,0.005,1.0,0.94,33.0,79.52,118.07,0.0,18.0,30.0,65.0,433.0
11,0.004,1.0,0.94,33.0,79.73,117.1,0.0,18.0,30.0,65.0,430.0


parameters for min value in 25%:


Unnamed: 0,sensitivity,h,tp,count,mean,std,min,25%,50%,75%,max
2,0.001,1.0,0.94,33.0,60.55,85.72,0.0,15.0,35.0,57.0,378.0
5,0.002,1.0,0.94,33.0,70.76,101.81,0.0,15.0,30.0,70.0,408.0
3,0.002,0.25,0.97,34.0,100.53,136.37,0.0,15.0,32.5,125.0,533.0
0,0.001,0.25,0.97,34.0,100.56,135.36,5.0,15.0,32.5,125.0,533.0
6,0.003,0.25,0.97,34.0,113.62,148.52,0.0,15.0,36.5,127.25,533.0


parameters for min value in 50%:


Unnamed: 0,sensitivity,h,tp,count,mean,std,min,25%,50%,75%,max
8,0.003,1.0,0.94,33.0,78.52,113.59,0.0,18.0,28.0,70.0,425.0


parameters for min value in 75%:


Unnamed: 0,sensitivity,h,tp,count,mean,std,min,25%,50%,75%,max
2,0.001,1.0,0.94,33.0,60.55,85.72,0.0,15.0,35.0,57.0,378.0


parameters for min value in max:


Unnamed: 0,sensitivity,h,tp,count,mean,std,min,25%,50%,75%,max
2,0.001,1.0,0.94,33.0,60.55,85.72,0.0,15.0,35.0,57.0,378.0
