# Optimizing analysis parameters $L$ and $\epsilon^2$
This notebook shows different ways to optimize the analysis parameters: the correlation length $L$ and the normalized observation error variance $\epsilon^2$.

In [None]:
using DIVAnd
using PyPlot
using Dates
using Statistics
include("../config.jl")

## Data reading

In [None]:
varname = "Salinity"
filename = salinityprovencalfile
download_check(salinityprovencalfile, salinityprovencalfileURL)

obsval,obslon,obslat,obsdepth,obstime,obsid = loadobs(Float64, filename, varname);

### Topography and grid definition

See the notebook on [bathymetry](../2-Preprocessing/06-topography.ipynb) for more explanations.

In [None]:
dx = dy = 0.125/2.
lonr = 2.5:dx:12.
latr = 42.3:dy:44.6

mask,(pm,pn),(xi,yi) = DIVAnd_rectdom(lonr, latr)

bathname = gebco04file
download_check(gebco04file, gebco04fileURL)

Reading the bathymetry and creating the mask

In [None]:
bx,by,b = load_bath(bathname,true,lonr,latr)

# keep the grid points where b >= 1.0 (sea points)
mask = b .>= 1.0

## Data selection for example

Cross-validation, error calculations, etc. assume independent data. Hence, do not take high-resolution vertical profiles with all their data, but restrict yourself to specific, small depth ranges. Here we use August data near the surface (`datadepth - depthprecision` ≤ depth < `datadepth`).     
We also keep only values above 37 to eliminate obvious outliers:

In [None]:
datadepth=1
depthprecision=0.5

sel = (obsdepth .< datadepth) .& 
(obsdepth .>= (datadepth-depthprecision)) .& 
(Dates.month.(obstime) .== 8) .& 
(obsval .> 37)

obsval = obsval[sel]
obslon = obslon[sel]
obslat = obslat[sel]
obsdepth = obsdepth[sel]
obstime = obstime[sel]
obsid = obsid[sel];
@show (size(obsval))
checkobs((obslon,obslat,obsdepth,obstime),obsval,obsid)
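To make the selection criteria explicit, the sketch below applies the same boolean expression to a handful of made-up observations (the arrays `depths`, `times` and `vals` are hypothetical). It shows that the depth window is half-open: a point at exactly `datadepth - depthprecision` is kept, a point at exactly `datadepth` is not.

```julia
using Dates

# Made-up observations, for illustration only
depths = [0.2, 0.6, 0.9, 1.0, 5.0]
times  = [DateTime(2000, 8, 1), DateTime(2000, 8, 2), DateTime(2000, 7, 2),
          DateTime(2000, 8, 3), DateTime(2000, 8, 4)]
vals   = [38.0, 38.1, 38.2, 38.0, 36.5]

datadepth = 1
depthprecision = 0.5

# Same criteria as in the cell above: depth in [0.5, 1), August, value above 37
sel = (depths .< datadepth) .&
      (depths .>= (datadepth - depthprecision)) .&
      (Dates.month.(times) .== 8) .&
      (vals .> 37)

println(sel)
```

Only the second observation passes: the first is too shallow, the third is from July, the fourth sits on the excluded upper bound and the last is both too deep and below 37.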

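We reduce the weight of observations that are close to one another, so that clusters of nearby points do not dominate the analysis. The resulting `rdiag` multiplies $\epsilon^2$ for each observation.      
**⚠️** This operation is particularly costly when dealing with big datasets.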

In [None]:
?DIVAnd.weight_RtimesOne

In [None]:
rdiag=1.0./DIVAnd.weight_RtimesOne((obslon,obslat),(0.03,0.03))
@show maximum(rdiag),mean(rdiag)
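The idea behind this weighting can be sketched as follows. This is a simplified illustration, not the algorithm used by `DIVAnd.weight_RtimesOne`: the hypothetical helper `neighbour_weights` sums a Gaussian kernel over all other observations, so an isolated point gets a weight of 1 and a point with close neighbours gets a smaller weight. Taking `1 ./ w` then gives a larger relative error variance to clustered points, as `rdiag` does above. Because every pair of observations is visited, this naive version costs O(N²), which also illustrates why the operation becomes expensive for large datasets.

```julia
# Hypothetical helper (not part of DIVAnd): naive O(N^2) neighbour weighting
function neighbour_weights(x::Vector{Float64}, y::Vector{Float64}, lenx, leny)
    n = length(x)
    w = zeros(n)
    for i in 1:n
        s = 0.0
        for j in 1:n
            # squared distance between observations i and j, scaled by the length scales
            d2 = ((x[i] - x[j]) / lenx)^2 + ((y[i] - y[j]) / leny)^2
            s += exp(-d2)
        end
        # s >= 1 because the term j == i contributes exp(0) = 1
        w[i] = 1 / s
    end
    return w
end

# Three clustered points and one isolated point (made-up coordinates, degrees)
x = [3.0, 3.0, 3.01, 8.0]
y = [43.0, 43.0, 43.0, 44.0]
w = neighbour_weights(x, y, 0.03, 0.03)
rdiag_sketch = 1 ./ w   # larger for clustered points, about 1 for isolated ones
println(rdiag_sketch)
```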

## Analysis

Compute the analysis `fi` of the anomalies relative to the mean of the observations (i.e., the data mean is used as background).      
The structure `s` is stored for later use.

In [None]:
len=1
epsilon2=1
fi,s = DIVAndrun(mask,(pm,pn),(xi,yi),(obslon,obslat),obsval.-mean(obsval),len,epsilon2*rdiag);

Generate some plots:
1. Analysis with data points
2. Data residuals
3. Residuals vs value

In [None]:
figure()
# add back the mean that was removed before the analysis
pcolor(xi,yi,fi.+mean(obsval),vmin=37,vmax=38.5);
colorbar(orientation="horizontal")
scatter(obslon,obslat,s=2,c=obsval,vmin=37,vmax=38.5)
aspectratio = 1/cos(mean([ylim()...]) * pi/180)
gca().set_aspect(aspectratio)
dataresiduals=DIVAnd_residualobs(s,fi)
rscale=sqrt(var(obsval))
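The plotting cells set the axis aspect ratio to 1/cos(latitude). The reason: at latitude φ, one degree of longitude spans only cos φ times the distance of one degree of latitude, so one unit on the y axis must be drawn 1/cos φ times longer than one unit on the x axis. A minimal sketch of the same formula (the name `aspect_for` is hypothetical), evaluated at the mean latitude of the grid used here:

```julia
# Ratio of the displayed length of one degree of latitude to one degree of longitude,
# evaluated at the mean latitude (cosd takes its argument in degrees)
aspect_for(latmin, latmax) = 1 / cosd((latmin + latmax) / 2)

a = aspect_for(42.3, 44.6)   # latitude range of latr
println(round(a, digits = 3))
```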

In [None]:
figure()
scatter(obslon,obslat,s=2,c=dataresiduals,vmin=-rscale,vmax=rscale,cmap=ColorMap("RdBu_r"));
colorbar(orientation="horizontal")
gca().set_aspect(aspectratio)
title("Residuals");

In [None]:
figure()
scatter(obsval, dataresiduals, s=2)
title("Residuals as function of value");
xlabel("Salinity")
ylabel("Residuals");
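The data residuals shown above are the differences between each observation and the analysis evaluated at the observation location. `DIVAnd_residualobs` does this with DIVAnd's own observation operator; the sketch below is only a simplified stand-in, assuming a regular grid and bilinear interpolation, with hypothetical helper names `bilinear` and `residual`. Points outside the grid are extrapolated from the nearest cell, and land points (NaN in the field) are not handled.

```julia
# Hypothetical helper: bilinear interpolation of a gridded field f[i,j]
# (i along lonr, j along latr) at the point (lon, lat)
function bilinear(lonr::AbstractRange, latr::AbstractRange, f::AbstractMatrix, lon, lat)
    # fractional 1-based indices of the point in the grid
    u = (lon - first(lonr)) / step(lonr) + 1
    v = (lat - first(latr)) / step(latr) + 1
    i = clamp(floor(Int, u), 1, length(lonr) - 1)
    j = clamp(floor(Int, v), 1, length(latr) - 1)
    a = u - i
    b = v - j
    return (1 - a) * (1 - b) * f[i, j] + a * (1 - b) * f[i + 1, j] +
           (1 - a) * b * f[i, j + 1] + a * b * f[i + 1, j + 1]
end

# Residual = observation minus analysis interpolated at the observation location
residual(obs, lonr, latr, f, lon, lat) = obs - bilinear(lonr, latr, f, lon, lat)

# Check on a linear field, which bilinear interpolation reproduces exactly
lonr_t = 0.0:0.5:2.0
latr_t = 0.0:0.5:1.0
f_t = [x + 2y for x in lonr_t, y in latr_t]
println(bilinear(lonr_t, latr_t, f_t, 0.75, 0.25))
```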

# Cross validation

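Cross-validation withholds some of the data, performs the analysis without them and measures the misfit between the withheld values and the analysis at their locations. Three methods are implemented.
## Define method used
The method is selected by the last argument of `DIVAnd_cv`:
- 0: automatic choice between the three methods below (default value)
- 1: full cross-validation
- 2: sampled cross-validation
- 3: generalized cross-validation (GCV)

The two preceding integer arguments (here `2,3`) set how many test values are sampled around the current $L$ and $\epsilon^2$.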
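To illustrate what full cross-validation and GCV measure, here is a toy 1D analysis that is not DIVAnd: a Gaussian background covariance with length scale `L` and an observation error variance `epsilon2` relative to the background variance (helper names `gausscov` and `cv_scores` are hypothetical). For such a linear analysis, the analysis at the data points is `A * d`, and the leave-one-out residual of point i can be obtained without rerunning the analysis, as `r[i] / (1 - A[i,i])`. GCV replaces each `A[i,i]` by their average `tr(A)/n`, which is cheaper when the diagonal of `A` is not available.

```julia
using LinearAlgebra, Statistics

# Toy Gaussian covariance between 1D positions (assumption, not DIVAnd's operator)
gausscov(x, L) = [exp(-((a - b) / L)^2) for a in x, b in x]

# Mean squared leave-one-out error and GCV score for given L and epsilon2
function cv_scores(x, d, L, epsilon2)
    n = length(d)
    K = gausscov(x, L)
    A = K / (K + epsilon2 * I)                # influence matrix: analysis at data = A * d
    r = d - A * d                             # data residuals
    loo = r ./ (1 .- diag(A))                 # exact leave-one-out residuals
    gcv = mean(r .^ 2) / (1 - tr(A) / n)^2    # generalized cross-validation score
    return mean(loo .^ 2), gcv
end

# Synthetic "observations": smooth signal plus small-scale variability
x = collect(range(0, 10, length = 40))
d = sin.(x) .+ 0.1 .* cos.(7 .* x)
d = d .- mean(d)                              # anomalies, as in the notebook

for L in (0.5, 1.0, 2.0), epsilon2 in (0.1, 1.0)
    cv, gcv = cv_scores(x, d, L, epsilon2)
    println("L = $L, epsilon2 = $epsilon2: CV = $(round(cv, digits = 4)), GCV = $(round(gcv, digits = 4))")
end
```

The pair (`L`, `epsilon2`) with the lowest score would be retained; `DIVAnd_cv` explores multiplicative factors around the given values in a similar spirit.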

In [None]:
bestfactorl=ones(4)
bestfactore=ones(4)
figure()
for imeth=0:3
    # 2,3: numbers of test values around len and epsilon2; imeth: CV method (0 to 3)
    bestfactorl[imeth+1],bestfactore[imeth+1], cvval,cvvalues, x2Ddata,y2Ddata,cvinter,xi2D,yi2D = DIVAnd_cv(mask,(pm,pn),(xi,yi),(obslon,obslat),obsval.-mean(obsval),len,epsilon2*rdiag,2,3,imeth);
    @show bestfactorl[imeth+1],bestfactore[imeth+1]
    
    subplot(2,2,imeth+1)
    pcolor(xi2D,yi2D,cvinter)#,vmin=0,vmax=0.04)
    colorbar()
    xlabel("Log10 scale factor L")
    ylabel("Log10 scale factor e2")
    plot(x2Ddata,y2Ddata,".")
    plot(log10.(bestfactorl[imeth+1]), log10.(bestfactore[imeth+1]),"o")
    title("Method $imeth")
end

**⚠️ WARNING:** any optimization that results in a length scale smaller than about 4 times the grid spacing is meaningless.
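As a quick check of this rule of thumb with the grid of this notebook (`dx = 0.125/2` degrees), the smallest meaningful length scale is 4 × 0.0625 = 0.25 degrees. The helper `meaningfullen` below is hypothetical, not part of DIVAnd:

```julia
dx = 0.125 / 2          # grid spacing used in this notebook (degrees)
minlen = 4 * dx         # smallest meaningful length scale: 0.25 degrees

# Hypothetical helper: is a length scale large enough compared with the grid?
meaningfullen(L; spacing = dx) = L >= 4 * spacing

for L in (0.1, 0.25, 1.0)
    println("L = $L: ", meaningfullen(L) ? "OK" : "too small compared with the grid")
end
```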

Analysis with optimized values:


In [None]:
# element 2 holds the factors found with method 1 (full CV)
newl=len*bestfactorl[2]
newe=epsilon2*bestfactore[2]

fi,s = DIVAndrun(mask,(pm,pn),(xi,yi),(obslon,obslat),obsval.-mean(obsval),newl,newe*rdiag);

In [None]:
"New L $(round(newl, digits=3)) and new e2 $(round(newe, digits=3))"

In [None]:
figure()
pcolor(xi,yi,fi.+mean(obsval),vmin=37,vmax=38.5);
colorbar(orientation="horizontal")
gca().set_aspect(1/cos(mean([ylim()...]) * pi/180))
dataresiduals=DIVAnd_residualobs(s,fi)
scatter(obslon,obslat,s=2,c=obsval,vmin=37,vmax=38.5)
title("New L  = $(round(newl, digits=3))\n new e2 = $(round(newe, digits=3))")
rscale=sqrt(var(obsval))

In [None]:
figure()
scatter(obslon,obslat,s=2,c=dataresiduals,vmin=-rscale,vmax=rscale);
colorbar(orientation="horizontal")
gca().set_aspect(1/cos(mean([ylim()...]) * pi/180))
title("Residuals")

In [None]:
figure()
scatter(obsval,dataresiduals)
title("Residuals as function of value");

## Only one parameter optimized

If $L$ has already been fixed by another calibration, you can choose to optimize only $\epsilon^2$ (by setting the number of test values for $L$ to 0):

In [None]:
lenfixed=1
epsilon2=.1
figure()
for imeth=0:3
    # 0: L is not optimized; 4: number of test values around epsilon2
    bestfactore[imeth+1], cvval,cvvalues, x2Ddata,cvinter,xi2D = 
    DIVAnd_cv(mask,(pm,pn),(xi,yi),(obslon,obslat),obsval.-mean(obsval),lenfixed,epsilon2*rdiag,0,4,imeth);

    subplot(2,2,imeth+1)
    plot(xi2D,cvinter,"-")
    xlabel("Log10 scale factor e2")
    plot(x2Ddata,cvvalues,".")
    plot(log10.(bestfactore[imeth+1]), cvval,"o")
    title("Method $imeth")
end
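The best factor is read from a curve interpolated through the sampled CV values (`cvinter` above). As an illustration only, and not necessarily the interpolation used by `DIVAnd_cv`, one simple way to locate the optimum from three sampled values is to fit a parabola in log10 space and take its vertex. The helper `parabola_min` and the CV values below are hypothetical:

```julia
# Hypothetical helper: abscissa of the vertex of the parabola through three points
function parabola_min(x::NTuple{3,Float64}, y::NTuple{3,Float64})
    x1, x2, x3 = x
    y1, y2, y3 = y
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    A = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    B = (x3^2 * (y1 - y2) + x2^2 * (y3 - y1) + x1^2 * (y2 - y3)) / denom
    A > 0 || error("no minimum: the parabola opens downward")
    return -B / (2A)
end

logf = (-1.0, 0.0, 1.0)            # log10 of the tested factors for epsilon2
cvs  = (2.69, 1.09, 1.49)          # made-up CV values for illustration
best = 10^parabola_min(logf, cvs)  # about 10^0.3, i.e. close to 2.0
println(best)
```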

In [None]:
newl=lenfixed
# element 3 holds the factor found with method 2 (sampled CV)
newe=epsilon2*bestfactore[3]
@show newe
fi,s = DIVAndrun(mask,(pm,pn),(xi,yi),(obslon,obslat),obsval.-mean(obsval),newl,newe*rdiag);

In [None]:
figure()
pcolor(xi,yi,fi.+mean(obsval),vmin=37,vmax=38.5);
colorbar(orientation="horizontal")
gca().set_aspect(1/cos(mean([ylim()...]) * pi/180))
dataresiduals=DIVAnd_residualobs(s,fi)
scatter(obslon,obslat,s=2,c=obsval,vmin=37,vmax=38.5)

rscale=sqrt(var(obsval))

In [None]:
figure()
scatter(obslon,obslat,s=2,c=dataresiduals,vmin=-rscale,vmax=rscale);
colorbar(orientation="horizontal")
gca().set_aspect(1/cos(mean([ylim()...]) * pi/180))
title("Residuals")

In [None]:
figure()
scatter(obsval,dataresiduals)
title("Residuals as function of value");

## Adaptive method

Desroziers, G., L. Berre, B. Chapnik and P. Poli (2005). Diagnosis of observation, background and analysis-error statistics in observation space. *Q. J. R. Meteorol. Soc.*, 131, pp. 3385–3396, doi: [10.1256/qj.05.108](https://rmets.onlinelibrary.wiley.com/doi/abs/10.1256/qj.05.108).

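This adaptive method is activated by calling `DIVAnd_cv` with 0 test values for both $L$ and $\epsilon^2$ (the `0,0` arguments). Each call returns a multiplicative correction factor for $\epsilon^2$, which is applied iteratively: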

In [None]:
myiterations=7
cvbest2=zeros(myiterations);
eps2=zeros(myiterations)
epsilon2=1
for i=1:myiterations
    cvval,factor=DIVAnd_cv(mask,(pm,pn),(xi,yi),(obslon,obslat),obsval.-mean(obsval),lenfixed,epsilon2*rdiag,0,0,3);
    eps2[i]=epsilon2;
    cvbest2[i]=cvval;
    epsilon2=epsilon2*factor
    @show epsilon2
end

Alternatively, the Desroziers diagnostic can be applied directly to an analysis computed with `DIVAndrun`, using `DIVAnd_adaptedeps2`:

In [None]:
myiterations=7

eps2=zeros(myiterations)
epsilon2=1
for i=1:myiterations
    fit,sit=DIVAndrun(mask,(pm,pn),(xi,yi),(obslon,obslat),obsval.-mean(obsval),lenfixed,epsilon2*rdiag);
    eps2[i]=epsilon2;
    factor=DIVAnd_adaptedeps2(sit,fit)
    epsilon2=epsilon2*factor
    @show epsilon2
end
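Both loops above rely on the diagnostics of Desroziers et al. (2005): with $d$ the observations minus the background and $d_a$ the analysis minus the background at the observation locations, $\langle d - d_a, d\rangle$ estimates the observation error variance and $\langle d_a, d\rangle$ the background error variance, so their ratio estimates $\epsilon^2$. The sketch below applies this to the same kind of toy 1D Gaussian-covariance analysis as before (not DIVAnd; helper names are hypothetical) and iterates the multiplicative update of $\epsilon^2$, as the notebook loops do. Convergence of the iteration is not guaranteed in general.

```julia
using LinearAlgebra

# Toy Gaussian covariance between 1D positions (assumption, not DIVAnd's operator)
gausscov(x, L) = [exp(-((a - b) / L)^2) for a in x, b in x]

# Desroziers estimate of eps2 = sigma_o^2 / sigma_b^2, returned as a correction factor
function desroziers_factor(x, d, L, epsilon2)
    K = gausscov(x, L)
    da = (K / (K + epsilon2 * I)) * d           # analysis at the data locations (background = 0)
    eps2_est = dot(d, d - da) / dot(d, da)      # (obs - analysis)'d / (analysis - background)'d
    return eps2_est / epsilon2
end

# Repeated multiplicative update of epsilon2, keeping the history
function adapt_eps2(x, d, L, epsilon2, niter)
    history = Float64[]
    for _ in 1:niter
        epsilon2 *= desroziers_factor(x, d, L, epsilon2)
        push!(history, epsilon2)
    end
    return history
end

x = collect(range(0, 10, length = 40))
d = sin.(x) .+ 0.1 .* cos.(7 .* x)
d = d .- sum(d) / length(d)                     # anomalies
hist = adapt_eps2(x, d, 1.0, 1.0, 7)
println(hist)
```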

Perform a new analysis with the optimized value of $\epsilon^2$:

In [None]:
newl=lenfixed
newe=epsilon2
@show newe
fi,s = DIVAndrun(mask,(pm,pn),(xi,yi),(obslon,obslat),obsval.-mean(obsval),newl,newe*rdiag);

In [None]:
figure()
pcolor(xi,yi,fi.+mean(obsval),vmin=37,vmax=38.5);
colorbar(orientation="horizontal")
gca().set_aspect(1/cos(mean([ylim()...]) * pi/180))
dataresiduals=DIVAnd_residualobs(s,fi)
scatter(obslon,obslat,s=2,c=obsval,vmin=37,vmax=38.5)

rscale=sqrt(var(obsval))

In [None]:
figure()
scatter(obslon,obslat,s=2,c=dataresiduals,vmin=-rscale,vmax=rscale);
colorbar(orientation="horizontal")
gca().set_aspect(1/cos(mean([ylim()...]) * pi/180))
title("Residuals")

In [None]:
figure()
scatter(obsval,dataresiduals)
title("Residuals as function of value");

In [None]:
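# variances of the residuals, of the observations and of the analysis (NaN outside the mask excluded)
var(dataresiduals),var(obsval),var(filter(!isnan, fi))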

## More information

using DIVAnd

In [None]:
?DIVAnd_cv

=========================================================================================================================

# Exercise

1. Redo the analysis for different data by changing the `datadepth` parameter introduced in the data selection part.      
(Compare the surface behaviour with deeper regions by using another `datadepth` value.)
2. Remove the data weight modification.
3. Possibly force the cross-validation method (use `?DIVAnd_cv`).
4. Once optimized, redo the optimization using the first estimate as the starting point.

## ⚠️ Important note
Remember: optimization relies on a series of hypotheses.       
In particular, data independence and isotropy are very often NOT ensured.       
When in doubt, increase $\epsilon^2$ and/or check for "duplicates".