# Influence of the analysis parameters
The notebook shows how the main analysis parameters:
* $L$ = correlation length (can be horizontal, vertical or temporal);     
* $\epsilon^2$ = noise-to-signal ratio;     

influence the resulting interpolated field.
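
To build some intuition before running DIVAnd, the sketch below uses a toy 1D optimal interpolation with a Gaussian covariance. It is not DIVAnd's solver or kernel: the function names (`gausscov`, `toy_analysis`) and the data values are made up for illustration, and the example only mimics the roles played by $L$ and $\epsilon^2$.

```julia
using LinearAlgebra

# Toy Gaussian covariance with correlation length L (assumption: NOT DIVAnd's kernel)
gausscov(x, y, L) = exp(-(x - y)^2 / (2 * L^2))

# Toy optimal interpolation of anomalies onto the positions xtarget
function toy_analysis(xtarget, xobs, anomalies, L, epsilon2)
    B = [gausscov(a, b, L) for a in xobs, b in xobs]      # obs-obs covariance
    C = [gausscov(t, b, L) for t in xtarget, b in xobs]   # target-obs covariance
    return C * ((B + epsilon2 * I) \ anomalies)
end

# Made-up observation positions and anomalies
xobs = [1.0, 2.0, 4.0, 7.0]
anomalies = [0.5, -0.2, 0.3, -0.4]

# Effect of the noise-to-signal ratio on the analysis at the data points
for epsilon2 in (1e-6, 1.0, 100.0)
    fit = toy_analysis(xobs, xobs, anomalies, 1.0, epsilon2)
    println("epsilon2 = ", epsilon2, ": ", round.(fit, digits = 3))
end

# Effect of the correlation length on the analysis away from the data
for L in (0.5, 1.0, 5.0)
    far = toy_analysis([10.0], xobs, anomalies, L, 1.0)[1]
    println("L = ", L, ": analysis at x = 10 is ", round(far, digits = 4))
end
```

In this toy setting a small $\epsilon^2$ makes the analysis follow the data closely, a large $\epsilon^2$ pulls it towards the background (zero anomaly), and $L$ controls how far the information from each observation spreads.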

In [None]:
import Pkg
Pkg.activate("../..")
Pkg.instantiate()
using DIVAnd
using Makie, CairoMakie, GeoMakie
using Dates
using Statistics
include("../config.jl")

## Data reading
The observations are read from a netCDF file containing salinity measurements in the Provençal Basin.

In [None]:
varname = "Salinity"
download_check(salinityprovencalfile, salinityprovencalfileURL)
obsval, obslon, obslat, obsdepth, obstime, obsid =
    loadobs(Float64, salinityprovencalfile, varname);

## Topography and grid definition

See the notebook on [bathymetry](../2-Preprocessing/2-01-topography.ipynb) for more explanations about the bathymetry preparation.

In [None]:
dx = dy = 0.125 / 2.0
lonr = 2.5:dx:12.0
latr = 42.3:dy:44.6

mask, (pm, pn), (xi, yi) = DIVAnd_rectdom(lonr, latr)

bathname = gebco04file
download_check(gebco04file, gebco04fileURL)

### Create the mask

In [None]:
bx, by, b = load_bath(bathname, true, lonr, latr)

# Keep only the grid points whose bathymetry value is at least 1.0
mask = b .>= 1.0

## Data selection for example

Cross-validation, error calculations, etc. assume independent data.     
Hence we do not use all the data from high-resolution vertical profiles but restrict ourselves to a small depth range.     
For this example we select near-surface data (depth less than 1 m) measured in August.

In [None]:
sel = (obsdepth .< 1) .& (Dates.month.(obstime) .== 8)

obsval = obsval[sel]
obslon = obslon[sel]
obslat = obslat[sel]
obsdepth = obsdepth[sel]
obstime = obstime[sel]
obsid = obsid[sel];
@show(size(obsval))
checkobs((obslon, obslat, obsdepth, obstime), obsval, obsid)

## Analysis
### Simple analysis
The function to call is `DIVAndrun`:
```
fi,s = DIVAndrun(mask,(pm,pn),(xi,yi),(obslon,obslat),obsval.-mean(obsval),
            len,epsilon2;alphabc=0);
```
where

* `fi` is the analysis, computed here using the data mean as background,
* `s` is a structure stored for later use.

Among the inputs:

* `mask` is the land-sea mask,     
* `(pm,pn)` are the metrics (inverse of the resolution),     
* `(xi,yi)` is the grid on which the interpolation is performed.   

The grid and the metrics were created earlier with `DIVAnd_rectdom`; the mask was then rebuilt from the bathymetry.

* `(obslon,obslat)` are the positions of the observations, obtained using `loadobs` (beginning of this notebook).     
* `obsval.-mean(obsval)` is the data anomalies (observations minus mean value).

`len` (correlation length) and `epsilon2` (noise-to-signal ratio) are the two main analysis parameters that we test below.
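
As a rough illustration of what the grid, the metrics and the anomalies represent, the sketch below rebuilds similar quantities by hand on a regular grid. It is an assumption-laden stand-in, not the implementation of `DIVAnd_rectdom`: the `_demo` names and the `toyobs` values are hypothetical and chosen so that they do not overwrite the notebook variables.

```julia
using Statistics

# Same grid spacing and ranges as in this notebook
dx_demo = dy_demo = 0.125 / 2.0
lonr_demo = 2.5:dx_demo:12.0
latr_demo = 42.3:dy_demo:44.6

# Coordinates of every grid node (first dimension: longitude)
xi_demo = [lon for lon in lonr_demo, lat in latr_demo]
yi_demo = [lat for lon in lonr_demo, lat in latr_demo]

# Metrics: inverse of the grid spacing, constant on a regular grid
pm_demo = fill(1 / dx_demo, size(xi_demo))
pn_demo = fill(1 / dy_demo, size(yi_demo))

# Anomalies: made-up salinity values minus their mean
toyobs = [37.9, 38.1, 38.3, 37.7]
toyanomalies = toyobs .- mean(toyobs)

println("grid size:            ", size(xi_demo))
println("metric (1/spacing):   ", pm_demo[1, 1])
println("sum of the anomalies: ", sum(toyanomalies))
```

With metrics expressed as the inverse of a spacing in degrees, the correlation length `len` is expressed in degrees as well.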

### Loop on the parameters
We perform the analysis for different values of $L$ and $\epsilon^2$.   
<div class="alert alert-block alert-warning">
⚠️ Don't forget to add back the mean value when plotting the results: <code>fi .+ mean(obsval)</code>.
</div>

In [None]:
fig = Figure()
for i = 1:3
    for j = 0:3
        len = 5 * 10.0^(i - 2)
        epsilon2 = 10.0^(-2 * j + 2)

        # Perform analysis
        fi, s = DIVAndrun(
            mask,
            (pm, pn),
            (xi, yi),
            (obslon, obslat),
            obsval .- mean(obsval),
            len,
            epsilon2;
            alphabc = 0,
        )

        # Make plot
        ga = GeoAxis(
            fig[i, j];
            dest = "+proj=merc",
            title = "L=$len\nepsilon2=$(round(epsilon2, digits=6))",
        )
        heatmap!(
            ga,
            lonr,
            latr,
            fi .+ mean(obsval),
            interpolate = false,
            colorrange = (37.0, 38.5),
        )

    end
end
colgap!(fig.layout, 0)
rowgap!(fig.layout, 0)
fig

## But which combination to use?

Visual inspection tells you which analyses are obviously (?) too noisy or too smooth.     
Some suspect data points also show up when $L$ and $\epsilon^2$ are small.     
Let's create some figures showing the analysis, the data values and the residuals.

### Data values over analysis

In [None]:
bx[1], bx[end], by[1], by[end]

In [None]:
len = 1
epsilon2 = 1
fi, s = DIVAndrun(
    mask,
    (pm, pn),
    (xi, yi),
    (obslon, obslat),
    obsval .- mean(obsval),
    len,
    epsilon2,
);

fig = Figure()
ga = GeoAxis(
    fig[1, 1];
    dest = "+proj=merc",
    xticks = 2.0:1.0:14,
    yticks = 42.0:1.0:45.0,
    title = "Observations and analysis with L=$len, epsilon2=$(round(epsilon2, digits=6))",
)
hm = heatmap!(
    ga,
    lonr,
    latr,
    fi .+ mean(obsval),
    interpolate = false,
    colorrange = (37.0, 38.5),
)
sc = scatter!(ga, obslon, obslat, color = obsval, colorrange = (37.0, 38.5))
contourf!(ga, bx, by, b, levels = [-1e5, 0, 1.0], colormap = Reverse(:binary))
xlims!(bx[1], bx[end])
ylims!(by[1], by[end])
Colorbar(fig[2, 1], sc, vertical = false, label = "S")
fig

### Residuals
We get them using `DIVAnd_residualobs`.
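
As a reminder of what a residual is, here is the usual definition written out. The symbols $d_j$, $\varphi$ and $\mathbf{x}_j$ are notation introduced for this sketch and do not appear elsewhere in the notebook:

$$
r_j = d_j - \varphi(\mathbf{x}_j), \qquad j = 1, \dots, N
$$

where $d_j$ is the $j$-th observation anomaly (here `obsval .- mean(obsval)`), $\varphi$ is the analysed anomaly field `fi`, and $\varphi(\mathbf{x}_j)$ is its value interpolated at the position of the observation. Since the same mean is removed from the data and added back to the analysis, the residuals are the same whether they are computed on anomalies or on full values.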

In [None]:
dataresiduals = DIVAnd_residualobs(s, fi)
rscale = std(obsval)

fig = Figure()
ga = GeoAxis(
    fig[1, 1];
    dest = "+proj=merc",
    title = "Residuals",
    xticks = 2.0:1.0:14,
    yticks = 42.0:1.0:45.0,
)
sc = GeoMakie.scatter!(
    ga,
    obslon,
    obslat,
    color = dataresiduals,
    colorrange = (-rscale, rscale),
    colormap = Reverse(:RdBu)
)
contourf!(ga, bx, by, b, levels = [-1e5, 0, 1.0], colormap = Reverse(:binary))
xlims!(bx[1], bx[end])
ylims!(by[1], by[end])
Colorbar(fig[2, 1], sc, vertical = false, label = "Salinity residual")
fig

### Observed values vs. residuals

In [None]:
fig = Figure()
ax = Axis(
    fig[1, 1],
    xlabel = "Data values",
    ylabel = "Residuals",
    title = "Residuals as function of value",
)
scatter!(ax, obsval, dataresiduals)
fig


Note how the residuals change: they decrease when you decrease $\epsilon^2$.

<div class="alert alert-block alert-warning">
⚠️ Low residuals are not necessarily a good sign, since the analysis was computed from the very data points it is being compared with.
</div>
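
The warning above can be illustrated with a toy 1D optimal interpolation (Gaussian covariance, made-up data). This is a sketch under stated assumptions, not DIVAnd's method: `gausscov`, `toy_analysis` and `rms` are hypothetical helpers. It compares the residuals at points used in the analysis with leave-one-out residuals, where each point is predicted without using its own value.

```julia
using LinearAlgebra, Statistics

# Toy Gaussian covariance (assumption: NOT DIVAnd's kernel)
gausscov(x, y, L) = exp(-(x - y)^2 / (2 * L^2))

# Toy optimal interpolation of anomalies onto the positions xtarget
function toy_analysis(xtarget, xobs, anomalies, L, epsilon2)
    B = [gausscov(a, b, L) for a in xobs, b in xobs]
    C = [gausscov(t, b, L) for t in xtarget, b in xobs]
    return C * ((B + epsilon2 * I) \ anomalies)
end

# Made-up observation positions and anomalies
xobs = [1.0, 2.0, 4.0, 7.0]
anomalies = [0.5, -0.2, 0.3, -0.4]
L, epsilon2 = 1.0, 1e-6

# Residuals at points that were used in the analysis
res_in = anomalies .- toy_analysis(xobs, xobs, anomalies, L, epsilon2)

# Leave-one-out residuals: each point is predicted from the other points only
res_loo = map(eachindex(xobs)) do k
    keep = setdiff(eachindex(xobs), k)
    prediction = toy_analysis([xobs[k]], xobs[keep], anomalies[keep], L, epsilon2)[1]
    anomalies[k] - prediction
end

rms(r) = sqrt(mean(abs2, r))
println("RMS residual, data used in the analysis: ", rms(res_in))
println("RMS residual, leave-one-out:             ", rms(res_loo))
```

With a very small $\epsilon^2$ the first number is close to zero because the analysis simply reproduces the data, while the leave-one-out residuals give a more honest picture of how well unobserved locations are predicted.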

In [None]:
# Exclude NaN values (e.g. land points) before computing the variances
var(dataresiduals[.!isnan.(dataresiduals)]), var(obsval), var(fi[.!isnan.(fi)])

## Exercise

Change the parameters $L$ or $\epsilon^2$ and see what happens       
(there is no need to rerun the whole notebook, only the last cells).

<div class="alert alert-block alert-warning">
<h2> ⚠️ Important take-home messages</h2>
</div>

1. Remember that analyses are not very sensitive to changes in $L$ or $\epsilon^2$ if (in 2D) $L \sqrt{\epsilon^2}$ remains constant and data coverage is reasonable.     
2. To see changes in the analysis, you need significant changes in $L$ or $\epsilon^2$ that also change $L \sqrt{\epsilon^2}$.         
3. Changing the parameters by a few percent does not really modify the analysis, even if $L \sqrt{\epsilon^2}$ changes.
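
To relate these messages to the figure produced by the parameter loop, the sketch below tabulates $L \sqrt{\epsilon^2}$ for the same parameter values as in that loop. The variable names (`lens`, `epsilon2s`, `products`) are introduced here for illustration only.

```julia
# Parameter values used in the loop of this notebook
lens = [5 * 10.0^(i - 2) for i = 1:3]
epsilon2s = [10.0^(-2 * j + 2) for j = 0:3]

# Product L * sqrt(epsilon2) for every combination
products = [(len, epsilon2, len * sqrt(epsilon2)) for len in lens for epsilon2 in epsilon2s]

# List the combinations sorted by their product
for (len, epsilon2, p) in sort(products, by = last)
    println("L = ", len, ", epsilon2 = ", epsilon2,
            " -> L*sqrt(epsilon2) = ", round(p, sigdigits = 3))
end
```

Combinations that share the same product are, according to the first message, the ones expected to give the most similar analyses, provided data coverage is reasonable.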