![header](../figures/DC_SSH_QG_mapping-banner.png)

***
**Authors:**  Datlas & IGE <br>
**Copyright:** 2022 Datlas & IGE <br>
**License:** MIT

<div class="alert alert-block alert-success">
    <h1><center>Evaluate the baseline</center></h1>
    <h5><center>This notebook illustrates how to evaluate a mapping reconstruction, here one produced with the baseline.</center></h5> 
</div> 




# III- Demo. Data Evaluation

An example SSH reconstruction was produced in the "baseline_oi.ipynb" notebook. This notebook evaluates that reconstruction against the reference field.

The notebook is structured as follows:

     1) read the reference and reconstructed SSH fields,
     2) put both fields on a common spatio-temporal grid,
     3) compare the reconstructed and reference SSH fields (statistical/spectral comparison),
     4) display the leaderboard metrics.

- the statistical comparison is based on the RMSE-based score $RMSE_{S}$ defined as:

$$RMSE_{S}(t) = 1 - \frac{RMSE(t)}{RMS(SSH_{true})}$$


where RMS is the root mean square function, and with:


$$RMSE(t) = \sqrt{ \frac{1}{N} \sum_{i=1}^N (SSH_{reconstruction}(t,i) - SSH_{true}(t,i))^2   }$$


N is the number of pixels included in the study domain.
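
As a minimal sketch of the two formulas above, the score can be computed with plain NumPy on arrays shaped `(time, pixel)`. The function name `rmse_score` and the toy data are hypothetical, and the sketch assumes that $RMS(SSH_{true})$ is taken over the domain at each time step; the notebook itself relies on `rmse_based_scores` from `src.mod_eval`, whose internals are not shown here.

```python
import numpy as np

def rmse_score(ssh_rec, ssh_true):
    """Return RMSE_S(t) for arrays shaped (time, pixel). Hypothetical helper."""
    # RMSE(t): root mean square error over the N pixels of each time step
    rmse_t = np.sqrt(np.mean((ssh_rec - ssh_true) ** 2, axis=1))
    # RMS of the reference field (assumption: computed per time step)
    rms_true = np.sqrt(np.mean(ssh_true ** 2, axis=1))
    return 1.0 - rmse_t / rms_true

# Toy data: 3 time steps, 4 pixels
ssh_true = np.array([[1.0, -1.0, 1.0, -1.0],
                     [2.0, 2.0, -2.0, -2.0],
                     [0.5, 0.5, 0.5, 0.5]])
ssh_rec = ssh_true.copy()
ssh_rec[1] += 1.0  # constant error of 1.0 at the second time step

score = rmse_score(ssh_rec, ssh_true)
print(score)
```

A score of 1 means a perfect reconstruction at that time step; the score decreases as the error grows relative to the signal amplitude.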

- the spectral analysis is based on the wavenumber-frequency power spectral density score $PSD_{S}^{wf}$ defined as:

$$PSD_{S}^{wf} = 1 - \frac{PSD^{wf}(SSH_{reconstruction} - SSH_{true})}{PSD^{wf}(SSH_{true})}$$
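
As a rough illustration of the spectral score above, the sketch below estimates the wavenumber-frequency PSD of a `(time, x)` array with a raw 2D periodogram and forms the ratio bin by bin. The names `psd_wf` and `psd_score` are hypothetical, and the periodogram omits the detrending, windowing, and averaging that a real spectral estimate would normally use; the notebook's own computation is done by `psd_based_scores` from `src.mod_eval`.

```python
import numpy as np

def psd_wf(field):
    """Raw periodogram of a (time, x) array (simplified PSD estimate)."""
    spectrum = np.fft.fft2(field)
    return np.abs(spectrum) ** 2 / field.size

def psd_score(ssh_rec, ssh_true):
    """1 - PSD(error) / PSD(reference), evaluated per spectral bin."""
    psd_err = psd_wf(ssh_rec - ssh_true)
    psd_true = psd_wf(ssh_true)
    return 1.0 - psd_err / psd_true

# Toy data: random reference field, reconstruction with half the amplitude
rng = np.random.default_rng(0)
ssh_true = rng.standard_normal((16, 32))
ssh_rec = 0.5 * ssh_true

score = psd_score(ssh_rec, ssh_true)
print(score.min(), score.max())
```

In this toy case the error has half the amplitude of the signal at every scale, so the error PSD is a quarter of the reference PSD and the score is 0.75 in every bin.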

The **Leaderboard** summarizes the following key metrics:

   1) $\overline{RMSE_{S}(t)}$ : mean RMSE score (normalized RMSE) 
 
   2) $\sigma(RMSE_{S}(t))$ : standard deviation of the RMSE score over time, which gives insight into the temporal stability of the reconstruction
 
   3) $\lambda_{x}$ : the minimum spatial scale resolved (wavelength in degrees) 
 
   4) $\lambda_{t}$ : the minimum temporal scale resolved (wavelength in days)
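
This notebook does not spell out how the minimum resolved scale is extracted from the spectral score. One possible approach, sketched below purely as an assumption, is to take the smallest wavelength at which the score reaches a threshold (0.5 is used here for illustration). The helper `resolved_wavelength` and the sample values are hypothetical.

```python
import numpy as np

def resolved_wavelength(wavelength, score, threshold=0.5):
    """Smallest wavelength where `score` reaches `threshold` (hypothetical helper)."""
    order = np.argsort(wavelength)
    wl = np.asarray(wavelength, dtype=float)[order]
    sc = np.asarray(score, dtype=float)[order]
    above = np.nonzero(sc >= threshold)[0]
    if above.size == 0:
        return np.nan  # the threshold is never reached: no scale resolved
    i = above[0]
    if i == 0:
        return wl[0]   # already resolved at the smallest sampled wavelength
    # Linear interpolation between the two samples bracketing the threshold
    frac = (threshold - sc[i - 1]) / (sc[i] - sc[i - 1])
    return wl[i - 1] + frac * (wl[i] - wl[i - 1])

# Toy 1D score as a function of wavelength (degrees)
wavelength_deg = np.array([0.5, 1.0, 2.0, 4.0])
score = np.array([0.0, 0.25, 0.75, 0.9])

lambda_x = resolved_wavelength(wavelength_deg, score)
print(lambda_x)
```

The same helper could be applied along the time axis to obtain a value in days, under the same assumed threshold.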


In [None]:
import xarray as xr
import numpy
import warnings
import xrft
import logging
import os
import sys
import pandas as pd
warnings.filterwarnings('ignore')

In [None]:
logger = logging.getLogger()
logger.setLevel(logging.INFO)

In [None]:
sys.path.append('..')

In [None]:
from src.mod_oi import *
from src.mod_inout import *
from src.mod_regrid import *
from src.mod_eval import *
from src.mod_plot import *

In [None]:
time_min = numpy.datetime64('2012-10-22')                # domain min time
time_max = numpy.datetime64('2012-12-02')                # domain max time

### 1) reading of reference and reconstructed SSH fields

In [None]:
# reconstructed SSH field
input_file = '../results/ssh_reconstruction_2012-10-22-2012-12-02_jason1.nc'
ds_oi1_grid = xr.open_dataset(input_file)
ds_oi1_grid

In [None]:
# reference SSH field
# Note: dc_ref is used for the regridding step
dc_ref = xr.open_mfdataset('../dc_ref/*.nc', combine='nested', concat_dim='time', parallel=True)
dc_ref

In [None]:
# Note: dc_ref_sample is used for the regridding step (a daily mean is enough)
dc_ref_sample = dc_ref.sel(time=slice(time_min, time_max)).resample(time='1D').mean()
del dc_ref
dc_ref_sample

### 2) put the fields on a common spatio-temporal grid (regridding)

In [None]:
# Regrid    
ds_oi1_regrid = oi_regrid(ds_oi1_grid, dc_ref_sample)

### 3) comparison of reconstructed and reference SSH fields (statistical/spectral comparison)

In [None]:
# Evaluate: RMSE-based and PSD-based scores of the regridded reconstruction against the reference
rmse_t_oi1, rmse_xy_oi1, leaderboard_nrmse, leaderboard_nrmse_std = rmse_based_scores(ds_oi1_regrid, dc_ref_sample)
psd_oi1, leaderboard_psds_score, leaderboard_psdt_score  = psd_based_scores(ds_oi1_regrid, dc_ref_sample)

In [None]:
# Define outputs
output_directory = '../results/'
# Create the output directory if it does not exist yet
os.makedirs(output_directory, exist_ok=True)
filename_rmse_t = output_directory + 'rmse_t_ssh_reconstruction_2012-10-22-2012-12-02_jason1.nc'
filename_rmse_xy = output_directory + 'rmse_xy_ssh_reconstruction_2012-10-22-2012-12-02_jason1.nc'
filename_psd = output_directory + 'psd_ssh_reconstruction_2012-10-22-2012-12-02_jason1.nc'
filename_dc_ref_sample = output_directory + 'dc_ref_2012-10-22-2012-12-02_sample.nc'
filename_oi_regrid = output_directory + 'ssh_reconstruction_regridded_2012-10-22-2012-12-02_jason1.nc'

In [None]:
# Save results
rmse_t_oi1.to_netcdf(filename_rmse_t)
rmse_xy_oi1.to_netcdf(filename_rmse_xy)
psd_oi1.name = 'psd_score'
psd_oi1.to_netcdf(filename_psd)
dc_ref_sample.to_netcdf(filename_dc_ref_sample)
ds_oi1_regrid.to_netcdf(filename_oi_regrid)

### 4) display leaderboard metrics

In [None]:
data = [['demo 1 nadir', 
         leaderboard_nrmse, 
         leaderboard_nrmse_std, 
         leaderboard_psds_score, 
         leaderboard_psdt_score,
        'Covariances not optimized',
        'example_data_eval.ipynb']]
Leaderboard = pd.DataFrame(data, 
                           columns=['Method', 
                                    "µ(RMSE)", 
                                    "σ(RMSE)", 
                                    'λx (degree)', 
                                    'λt (days)', 
                                    'Notes',
                                    'Reference'])

In [None]:
print("Summary of the leaderboard metrics:")
Leaderboard

In [None]:
print(Leaderboard.to_markdown())