# Estimating the correlation length 

* The correlation length $L$ determines whether two points separated by a given distance are correlated or not.
* `DIVAnd` includes several tools to estimate the correlation length.
* We will start with a 2D case and then consider the 3D case.
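
To make the role of $L$ concrete, here is a minimal sketch that evaluates a correlation function at a few distances. It assumes a Gaussian-shaped correlation purely for illustration; this is not the kernel used internally by `DIVAnd`, and the names `gaussian_correlation` and `Ldemo` are hypothetical.

```julia
# Illustrative correlation model (an assumption, not DIVAnd's kernel):
# the correlation decays as a Gaussian of the distance r scaled by L.
gaussian_correlation(r, L) = exp(-(r / L)^2 / 2)

Ldemo = 50e3   # hypothetical correlation length, in meters

for r in (0.0, 25e3, 50e3, 150e3)
    c = round(gaussian_correlation(r, Ldemo), digits = 3)
    println("r = $(r / 1000) km -> correlation ≈ $c")
end
```

Points much closer than `Ldemo` are strongly correlated, while points several times `Ldemo` apart are practically uncorrelated.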

In [None]:
import Pkg
Pkg.activate("../..")
Pkg.instantiate()
using DIVAnd
using Makie, CairoMakie, GeoMakie
using Dates
using Statistics
using Random
using Printf
include("../config.jl")

## Data reading

The file `WOD-Salinity-Provencal.nc` contains salinity measurements obtained from the [World Ocean Database](https://www.nodc.noaa.gov/OC5/WOD/pr_wod.html) for the Provençal Basin (Mediterranean Sea). The profiles were not interpolated vertically.

The resulting correlation length can fluctuate a bit between runs, because the correlation is based on a collection of random pairs. 
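
The run-to-run variability mentioned above can be reproduced with a minimal sketch, assuming a simplified 1D setting with synthetic data. The helper `empirical_covariance` and all variable names are hypothetical and only illustrate the idea; the actual pair sampling and binning in `DIVAnd` may differ.

```julia
using Random, Statistics

# Hypothetical helper (not part of DIVAnd): estimate the covariance as a
# function of distance from randomly drawn pairs of 1D observations.
function empirical_covariance(rng, x, v; npairs = 10_000, binwidth = 1.0, nbins = 5)
    va = v .- mean(v)                 # work with anomalies
    sums = zeros(nbins)
    counts = zeros(Int, nbins)
    for _ = 1:npairs
        i, j = rand(rng, eachindex(x)), rand(rng, eachindex(x))
        i == j && continue            # skip pairs made of the same point
        bin = floor(Int, abs(x[i] - x[j]) / binwidth) + 1
        bin > nbins && continue       # ignore pairs that are too far apart
        sums[bin] += va[i] * va[j]
        counts[bin] += 1
    end
    return sums ./ max.(counts, 1), counts
end

rng = MersenneTwister(42)
xdemo = 10 .* rand(rng, 500)                   # synthetic positions
vdemo = sin.(xdemo) .+ 0.1 .* randn(rng, 500)  # synthetic, spatially correlated values

# Two different random selections of pairs from the same data
covar_a, _ = empirical_covariance(MersenneTwister(1), xdemo, vdemo)
covar_b, _ = empirical_covariance(MersenneTwister(2), xdemo, vdemo)
println(covar_a)
println(covar_b)   # close to covar_a, but not identical
```

Because the two calls draw different pairs, the binned covariances differ slightly, and so would any length scale fitted to them.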

### 2D case

* First, let's consider only the data on a 2D surface (longitude and latitude).
* Download the data file if it is not already present.

In [None]:
varname = "Salinity"
download_check(salinityprovencalfile, salinityprovencalfileURL)

bathname = gebco04file
download_check(gebco04file, gebco04fileURL)

lonr = 3.:0.1:11.8
latr = 42.:0.1:44.5
depthr = [0.,5., 10., 15., 20., 25., 30., 40., 50., 66, 
    75, 85, 100, 112, 125, 135, 150, 175, 200, 225, 250, 
    275, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 
    800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 
    1300, 1350, 1400, 1450, 1500, 1600, 1750, 1850, 2000];
bathisglobal = true

### Extract the bathymetry and mask for plotting

In [None]:
bx, by, b = extract_bath(bathname, bathisglobal, lonr, latr);
_, _, mask = load_mask(bathname, bathisglobal, lonr, latr, depthr);

Load the data and print basic information about them that can be used for a quality check.

In [None]:
obsval, obslon, obslat, obsdepth, obstime, obsid =
    loadobs(Float64, salinityprovencalfile, "Salinity")
checkobs((obslon, obslat, obsdepth, obstime), obsval, obsid)

In [None]:
f = Figure()
ax = Axis(f[1, 1], xlabel = "Salinity", title = "Number of observations")
hist!(ax, obsval, bins = 1000)
xlims!(ax, 37, 39)
f

### Removing suspect observations
<div class="alert alert-block alert-warning">
⚠️ It is quite important to remove outliers for the correlation length estimation, as they can have a significant impact.
</div>

For this example we select only the first month (January) and we remove some questionable observations.

In [None]:
badid = ["wod_015600782O","wod_015602753O","wod_015604717O","wod_015606732O",
         "wod_015607117O","wod_015607524O","wod_015607893O","wod_015924970O",
         "wod_015926914O","wod_015928739O","wod_016650545O","wod_008518725O",
         "wod_007643915O","wod_015875365O","wod_006614816O","wod_006614929O",
         "wod_006614937O","wod_007644875O","wod_009996947O","wod_010742471O",
         "wod_010742472O","wod_006614931O","wod_006614934O","wod_006625408O",
         "wod_006752127O","wod_006752129O"];

good = (37.6 .< obsval .< 38.75) .& map(id -> !(id in badid),obsid)


sel = (Dates.month.(obstime) .== 1) .& good
x = (obslon[sel], obslat[sel], obsdepth[sel]);
v = obsval[sel];
z = depthr;

### Data plot
Plot the observations at a given level. What do you think will happen if you run `plotobs.(z)`?

In [None]:
function plotobs(z)

    sel = (Dates.month.(obstime) .== 1) .& (abs.(obsdepth .- z) .< 50) .& good # .& (obsval .< 38.3)

    fig = Figure()
    ga = GeoAxis(
        fig[1, 1];
        dest = "+proj=merc",
        title = "Depth: $(z[1]) m ($(sum(sel)) observations)",
    )
    sc = scatter!(ga, obslon[sel], obslat[sel], color = obsval[sel])
    contourf!(ga, bx, by, b, levels = [-1e5, 0, 1.0], colormap = Reverse(:binary))
    Colorbar(fig[2, 1], sc, vertical = false, label = "S")
    #GeoMakie.xlims!(ga, (lonr[1], lonr[end]))
    #GeoMakie.ylims!(ga, (latr[1], latr[end]))
    fig
end

In [None]:
plotobs(z[10])

## Analysis

Prepare the domain, mask and background field.

In [None]:
mask, pmn, xyi = DIVAnd.domain(bathname, bathisglobal, lonr, latr, depthr)
sz = size(mask);
# obs. coordinate matching selection
xsel = (obslon[sel], obslat[sel], obsdepth[sel])

vm = mean(obsval[sel])
va = obsval[sel] .- vm
toaverage = [true, true, false]
background_len = (zeros(sz), zeros(sz), fill(50.0, sz))
background_epsilon2 = 1000.0


fi, vaa = DIVAnd.DIVAnd_averaged_bg(
    mask,
    pmn,
    xyi,
    xsel,
    va,
    background_len,
    background_epsilon2,
    toaverage;
)

fbackground = fi .+ vm
@debug "fbackground: $(fbackground[1,1,:])"

v = vaa;
sel2 = isfinite.(v)
x = (xsel[1][sel2], xsel[2][sel2], xsel[3][sel2])
v = v[sel2];

### Plotting the background

In [None]:
fig = Figure()
ax =
    Axis(fig[1, 1], xlabel = "Salinity", ylabel = "Depth (m)", title = "Background profile")
lines!(ax, fbackground[1, 1, :], -depthr)
fig

## Horizontal correlation length
Estimate the horizontal correlation length for different depth levels, using the function `fithorzlen`  
(can take a few minutes).

In [None]:
@time lenxy, infoxy = fithorzlen(x, v, [0.0]);

The function `fithorzlen` can take optional arguments:
- `distfun`: the function used to get the distance between two points (default: the Euclidean distance);
- `searchz`: the vertical search distance, i.e. how far from the depth level of interest observations are still taken into account (50 meters by default; it can also be a function of the depth).
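
As a minimal sketch of a depth-dependent `searchz`, the snippet below evaluates the function `z -> z / 4 + 10` at a few depths and applies it to a made-up depth vector. The names `searchz_demo`, `depths_demo` and `near_demo` are hypothetical, and the exact selection rule used inside `fithorzlen` is assumed here to be a simple distance threshold.

```julia
# Depth-dependent vertical search distance (in meters)
searchz_demo = z -> z / 4 + 10

for depth in (0.0, 100.0, 1000.0)
    println("depth = $depth m -> search distance = $(searchz_demo(depth)) m")
end

# Made-up observation depths (m) and the resulting selection around a level z
# (assumption: an observation is kept if it lies within the search distance)
depths_demo = [0.0, 8.0, 30.0, 120.0, 400.0]
near_demo(z) = abs.(depths_demo .- z) .<= searchz_demo(z)

println(near_demo(0.0))
println(near_demo(100.0))
```

The search window widens with depth, which compensates for the sparser sampling of the deeper layers.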

In [None]:
@time lenxy, infoxy =
    fithorzlen(x, v, z; distfun = DIVAnd.distfun_m, searchz = z -> (z / 4 + 10));

One can get information about the fitting and its quality using the object `infoxy`:

In [None]:
infoxy[:fitinfos][1]

Extract the information for the level `k = 1`:
* `covar`: the empirical covariance
* `fitcovar`: the covariance fitted to the empirical covariance
* `distx`: distance
* `range`: part of `covar` used for the fitting
* `rqual`: the quality for the fit (1: excellent, 0: poor)

In [None]:
k = 1
covar = infoxy[:fitinfos][k][:covar]
fitcovar = infoxy[:fitinfos][k][:fitcovar]
distx = infoxy[:fitinfos][k][:distx]
range = infoxy[:fitinfos][k][:range]
rqual = infoxy[:fitinfos][k][:rqual]

### Create plots

In [None]:
fig = Figure()
ax = Axis(
    fig[1, 1],
    ylabel = "Covariance [psu²]",
    xlabel = "distance [m]",
    title = "Correlation length fitting",
)
lines!(distx, covar, label = "empirical covariance")
lines!(
    distx[range],
    covar[range],
    color = :red,
    label = "empirical covariance used for fitting",
)
axislegend()
fig

In [None]:
fig = Figure()
ax = Axis(
    fig[1, 1],
    ylabel = "Covariance [psu²]",
    xlabel = "distance [m]",
    title = "Correlation length fitting",
)
lines!(
    distx[range],
    covar[range],
    color = :red,
    label = "empirical covariance used for fitting",
)
lines!(
    distx[range],
    fitcovar[range],
    color = :green,
    label = "fitted covariance (rqual = $(@sprintf("%4.3f",rqual)))",
)
axislegend(ax)
fig

<div class="alert alert-block alert-info">
🖋️ Try different values of `k` (the level index) and re-run the previous two cells.<br>     
Note that at some levels the fit is quite poor.<br>
Additional filtering (vertically) is done to smooth the horizontal correlation length.
</div>

### Horizontal correlation length with respect to the depth
For the deepest layers, there are fewer observations, hence the decreasing quality of the fit.

In [None]:
rqual = [f[:rqual] for f in infoxy[:fitinfos]]

fig = Figure()
ax1 = Axis(
    fig[1, 1],
    ylabel = "Depth (m)",
    xlabel = "Horizontal correlation length (km)",
    title = "Correlation length profile",
)
scatterlines!(ax1, lenxy / 1000, -z, color = :black, label = "Correlation length")
lines!(ax1, infoxy[:len] / 1000, -z, linestyle = :dash)
axislegend(ax1)

ax2 = Axis(fig[1, 2], ylabel = "Depth (m)", xlabel = "Quality of the fit")
scatterlines!(ax2, rqual, -z)
fig

In [None]:
plotobs(2000)

It is useful to limit the acceptable range of the correlation length by providing a function `limitfun` that takes the depth `z` and the estimated correlation length `len` as arguments     
and returns the adjusted correlation length. This adjustment is done before the filtering.
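
A minimal sketch of such a limiting function, evaluated on its own: estimates below 25 km are raised to 25 km, estimates above 60 km are lowered to 60 km, and values in between are left unchanged. The name `limit_demo` is hypothetical; the bounds are those used in the call to `fithorzlen` in this section.

```julia
# Limit the correlation length (in meters) to the interval [25 km, 60 km];
# the depth z is accepted as an argument but not used here
limit_demo = (z, len) -> min(max(len, 25e3), 60e3)

for len in (10e3, 40e3, 90e3)
    println("$(len / 1000) km -> $(limit_demo(0.0, len) / 1000) km")
end

# Base's `clamp` expresses the same bound
clamp_demo = (z, len) -> clamp(len, 25e3, 60e3)
println(clamp_demo(0.0, 90e3) == limit_demo(0.0, 90e3))
```

Since the depth is passed as the first argument, the bounds could also be made depth dependent.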

In [None]:
@time lenxy2, infoxy2 = fithorzlen(
    x,
    v,
    z;
    distfun = DIVAnd.distfun_m,
    limitfun = (z, len) -> min(max(len, 25e3), 60e3),
);

In [None]:
fig = Figure()
ax1 = Axis(
    fig[1, 1],
    ylabel = "Depth (m)",
    xlabel = "Horizontal correlation length (m)",
    title = "Correlation length profile",
)
scatterlines!(ax1, lenxy2, -z, color = :black)
#plot(infoxy2[:len],-z,":");
fig

## Vertical correlation length 
The vertical correlation length is also estimated for different depth levels, using the function `fitvertlen`.

In [None]:
?fitvertlen

In [None]:
lenz, infoz = fitvertlen(x, v, z);

### Make a plot

In [None]:
k = 45
covar = infoz[:fitinfos][k][:covar]
fitcovar = infoz[:fitinfos][k][:fitcovar]
distx = infoz[:fitinfos][k][:distx]
range = infoz[:fitinfos][k][:range]
rqual = infoz[:fitinfos][k][:rqual]

fig = Figure()
ax1 = Axis(
    fig[1, 1],
    ylabel = "Covariance [psu²]",
    xlabel = "distance [m]",
    title = "Correlation length fitting",
)
lines!(ax1, distx, covar, label = "empirical covariance", color = :black)
lines!(
    ax1,
    distx[range],
    covar[range],
    color = :red,
    label = "empirical covariance used for fitting",
)
axislegend(ax1)
fig

In [None]:
rqual = [f[:rqual] for f in infoz[:fitinfos]]

fig = Figure()
ax1 = Axis(
    fig[1, 1],
    ylabel = "Depth (m)",
    xlabel = "Vertical correlation length (m)",
    title = "Correlation length profile",
)
scatterlines!(ax1, lenz, -z, color = :black, label = "Correlation length")
lines!(ax1, infoz[:len], -z, linestyle = :dash)
axislegend(ax1)

ax2 = Axis(fig[1, 2], ylabel = "Depth (m)", xlabel = "Quality of the fit")
scatterlines!(ax2, rqual, -z)
fig

In [None]:
infoz[:fitinfos][end]

An alternative is to use the spacing of the vertical levels to obtain a reasonable first guess of the vertical correlation length.

In [None]:
Dz = (z[3:end] - z[1:end-2]) / 2
lenz = 3 * [Dz[1], Dz..., Dz[end]]
lenzf = DIVAnd.smoothfilter(1:length(lenz), lenz, 10)

fig = Figure()
ax1 = Axis(
    fig[1, 1],
    ylabel = "Depth (m)",
    xlabel = "Vertical correlation length (m)",
    title = "Correlation length profile",
)
lines!(ax1, lenz, -z)
lines!(ax1, lenzf, -z);
fig
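
To see what the centered difference in the cell above produces, here is the same computation on a short, made-up depth vector (the names `zdemo`, `Dzdemo` and `lenzdemo` are hypothetical; the smoothing step is left out).

```julia
zdemo = [0.0, 5.0, 10.0, 20.0, 50.0]   # made-up depth levels (m)

# Half the distance between the levels above and below each interior level
Dzdemo = (zdemo[3:end] - zdemo[1:end-2]) / 2

# Repeat the first and last values for the boundary levels, then scale by 3
lenzdemo = 3 * [Dzdemo[1], Dzdemo..., Dzdemo[end]]

println(Dzdemo)
println(lenzdemo)
```

The guess is therefore three times the local vertical resolution, so it grows with depth as the levels become more widely spaced.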