# DIVAnd Oceanographic Data Analysis - Mediterranean Sea Chlorophyll-a

This notebook uses DIVAnd (Data-Interpolating Variational Analysis in n-dimensions) to interpolate chlorophyll-a observations onto a regular grid covering the Mediterranean Sea. The analysis produces seasonal three-dimensional fields (longitude, latitude, depth) from sparse observational data.

## Workflow Overview:
1. **Setup & Configuration** - Import packages and define analysis parameters
2. **Data Acquisition** - Load observational and bathymetry data
3. **Data Visualization** - Plot observation locations and bathymetry
4. **Domain Preparation** - Create and edit analysis masks
5. **Data Quality Control** - Filter observations for realistic values
6. **Analysis Parameters** - Configure DIVAnd correlation lengths and regularization
7. **Output Configuration** - Set up NetCDF file structure and metadata
8. **Analysis Execution** - Run the main DIVAnd interpolation

## 1. Package Imports and Basic Setup

Import all required Julia packages for oceanographic data analysis:

In [None]:
# Import required Julia packages for oceanographic data analysis
using NCDatasets      # For reading and writing NetCDF files
using PhysOcean       # Physical oceanography utilities
using DataStructures  # For ordered dictionaries and other data structures
using DIVAnd          # Data-Interpolating Variational Analysis in n-dimensions
using PyPlot          # Plotting library (matplotlib wrapper)
using Dates           # Date and time handling
using Statistics      # Statistical functions (mean, etc.)
using Random          # Random number generation
using Printf          # String formatting with printf-style syntax

In [5]:
# Name of the NetCDF file that holds the observations
# Replace "data.nc" with the path to your own file
datafile = "data.nc"

"data.nc"

In [None]:
# Check that the observation file defined above (`datafile`) exists
if isfile(datafile)
    # Open the NetCDF file in read-only mode
    NCDatasets.Dataset(datafile, "r") do ds
        println("File opened successfully: $datafile")

        # List all variables in the file
        println("\nAvailable variables:")
        for (varname, var) in ds
            println("  $varname: $(size(var)) - $(typeof(var))")
            # Print attributes if any
            if length(keys(var.attrib)) > 0
                println("    Attributes: $(keys(var.attrib))")
            end
        end

        # List global attributes
        println("\nGlobal attributes:")
        for (attr_name, attr_value) in ds.attrib
            println("  $attr_name: $attr_value")
        end

        # Example: Read a specific variable (replace 'variable_name' with actual variable)
        # if haskey(ds, "variable_name")
        #     data = ds["variable_name"][:]
        #     println("\nVariable data shape: $(size(data))")
        #     println("Variable data type: $(typeof(data))")
        # end
    end
else
    println("File not found: $datafile")
    println("Please check the filename and path.")
end

## 2. Spatial and Temporal Domain Configuration

Define the analysis grid parameters, spatial boundaries, and time periods:

In [9]:
# Define spatial grid parameters for the Mediterranean Sea analysis
dx, dy = 0.125, 0.125  # Grid resolution in degrees (longitude, latitude)
lonr = -6:dx:37        # Longitude range from -6° to 37° E covering entire Mediterranean
latr = 30:dy:46        # Latitude range from 30° to 46° N covering entire Mediterranean
timerange = [Date(2003,06,06),Date(2012,01,01)];  # Time period for analysis

# Full set of depth levels (in meters), kept for reference
# This definition is overwritten by the reduced list below
depthr = [0.,5., 10., 15., 20., 25., 30., 40., 50., 66, 
    75, 85, 100, 112, 125, 135, 150, 175, 200, 225, 250, 
    275, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 
    800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 
    1300, 1350, 1400, 1450, 1500, 1600, 1750, 1850, 2000];

# Reduced depth levels for faster computation
# Chlorophyll-a is primarily found in the euphotic zone (0-150m depth)
depthr = [0., 10., 25., 50., 100., 150.];  # 6 depths actually used in the analysis

# Define analysis parameters
varname = "Water body chlorophyll-a"    # Variable being analyzed (same string as the long_name passed to NCODV.load)
yearlist = [2003:2012]; # Years to include in analysis
monthlist = [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]; # Seasonal groupings (quarters)

# Create time selector for seasonal analysis
TS = DIVAnd.TimeSelectorYearListMonthList(yearlist,monthlist);
@show TS;

TS = TimeSelectorYearListMonthList{Vector{UnitRange{Int64}}, Vector{Vector{Int64}}}(UnitRange{Int64}[2003:2012], [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
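With a single year range and four month groups, the selector defines four seasonal time instances. The sketch below illustrates how an observation time maps to one of those groups. It uses only the `Dates` standard library; `season_index` is a hypothetical helper written for this illustration and is not how DIVAnd implements the selection internally. The observation times are made up.

```julia
using Dates

# Same seasonal grouping and year range as in the cell above
monthlist = [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]
yearrange = 2003:2012

# Hypothetical helper (not part of DIVAnd): index of the month group that
# contains `t`, or `nothing` if the year lies outside `yearrange`
function season_index(t, yearrange, monthlist)
    year(t) in yearrange || return nothing
    return findfirst(months -> month(t) in months, monthlist)
end

# Made-up observation times for illustration
times = [DateTime(2003,1,7), DateTime(2007,8,15), DateTime(2012,12,28), DateTime(2013,1,1)]
for t in times
    println(t, " -> ", season_index(t, yearrange, monthlist))
end
```

The first three times fall into groups 1, 3 and 4; the last one is outside the year range and is not assigned to any group.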


## 3. Observational Data Download and Loading

Load the chlorophyll-a observations from the NetCDF file defined above (`datafile`). `NCODV.load` selects the variable by its `long_name`, here `"Water body chlorophyll-a"`:

In [10]:
# Load observation values, coordinates, times and identifiers from the NetCDF file
@time obsval,obslon,obslat,obsdepth,obstime,obsid = NCODV.load(Float64, datafile, 
    "Water body chlorophyll-a");

6600 out of 30189 - 21.862267713405544 %
13130 out of 30189 - 43.49266289045679 %
19860 out of 30189 - 65.78555102852032 %
26570 out of 30189 - 88.01218987048263 %
  9.355467 seconds (1.18 M allocations: 55.139 MiB, 0.14% gc time)


## 4. Observational Data Visualization and Quality Check

Plot the geographic distribution of observation points and perform basic quality checks:

In [11]:
# Create a figure showing the geographic distribution of observation points
figure("Adriatic-Data")
ax = subplot(1,1,1)
plot(obslon, obslat, "ko", markersize=.1)  # Plot observation locations as small black dots
aspectratio = 1/cos(mean(latr) * pi/180)   # Calculate proper aspect ratio for latitude
ax.tick_params("both",labelsize=6)
gca().set_aspect(aspectratio)

# Check quality and consistency of observations
checkobs((obslon,obslat,obsdepth,obstime),obsval,obsid)

┌ Info: Checking ranges for dimensions and observations
└ @ DIVAnd C:\Users\nholodkov\.julia\packages\DIVAnd\4UymR\src\obsstat.jl:77

              minimum and maximum of obs. dimension 1: (3.2175331115722656, 19.19866943359375)
              minimum and maximum of obs. dimension 2: (39.10667037963867, 45.77027893066406)
              minimum and maximum of obs. dimension 3: (0.0, 100.0)
              minimum and maximum of obs. dimension 4: (DateTime("2003-01-07T12:07:21"), DateTime("2012-12-28T08:04:25"))
                          minimum and maximum of data: (9.999999747378752e-5, 147.0)
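The `aspectratio` used in the plot compensates for the fact that one degree of longitude covers less distance than one degree of latitude away from the equator. A small worked example of that arithmetic, using only the `Statistics` standard library and the latitude range defined earlier:

```julia
using Statistics

latr = 30:0.125:46               # latitude range of the analysis grid
midlat = mean(latr)              # 38.0, the midpoint of the range
aspectratio = 1 / cosd(midlat)   # same value as 1/cos(midlat * pi/180)

println(round(aspectratio, digits=3))   # prints 1.269
```

A value of about 1.27 means that each degree of latitude is drawn roughly 27% taller than a degree of longitude is wide, which keeps distances approximately isotropic near 38° N.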


## 5. Bathymetry Data Processing

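Load the bathymetry (seafloor depth) needed for the land/sea mask and interpolate it onto the analysis grid. The download step is commented out, so the file `gebco_30sec_8.nc` must already be present: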

In [14]:
bathname = "gebco_30sec_8.nc"
#if !isfile(bathname)
#    download("https://dox.ulg.ac.be/index.php/s/U0pqyXhcQrXjEUX/download",bathname)
#else
#    @info("Bathymetry file already downloaded")
#end

# Load bathymetry data and interpolate to our Mediterranean grid
@time bx,by,b = load_bath(bathname,true,lonr,latr);

# Plot the bathymetry data for the Mediterranean Sea
figure("Mediterranean-Bathymetry")
ax = subplot(1,1,1)
pcolor(bx, by, permutedims(b, [2,1]));  # Create colored map of bathymetry
colorbar(orientation="vertical", shrink=0.8).ax.tick_params(labelsize=8)
contour(bx, by, permutedims(b, [2,1]), [0, 0.1], colors="k", linewidths=.5)  # Add coastline contour
gca().set_aspect(aspectratio)
ax.tick_params("both",labelsize=6)
title("Mediterranean Sea Bathymetry")

  1.688142 seconds (6.63 M allocations: 338.347 MiB, 3.71% gc time, 99.61% compilation time)

## 6. Analysis Domain Mask Creation and Editing

Create 3D masks that define where the analysis should be performed (water vs land) and edit them to exclude specific regions:

In [15]:
# Create a 3D mask for the analysis domain
# This mask determines which grid points are valid for analysis (water vs land)
mask = falses(size(b,1),size(b,2),length(depthr))
for k = 1:length(depthr)
    for j = 1:size(b,2)
        for i = 1:size(b,1)
            mask[i,j,k] = b[i,j] >= depthr[k]  # True where water depth >= analysis depth
        end
    end
end
@show size(mask)

size(mask) = (345, 129, 6)


(345, 129, 6)

In [16]:
# Plot the initial mask (surface level) for Mediterranean
figure("Mediterranean-Mask")
ax = subplot(1,1,1)
gca().set_aspect(aspectratio)
ax.tick_params("both",labelsize=6)
pcolor(bx,by, transpose(mask[:,:,1])); 
title("Mediterranean Sea Initial Mask")

# Create coordinate grids for mask editing
grid_bx = [i for i in bx, j in by];
grid_by = [j for i in bx, j in by];

# Edit the mask to remove specific regions (adapted for Mediterranean)
mask_edit = copy(mask);
# Remove Atlantic Ocean areas west of Gibraltar (longitude < -5.5°)
sel_mask1 = (grid_bx .<= -5.5);  
# Remove Black Sea connections (north of 42° and east of 27°)
sel_mask2 = (grid_by .>= 42.0) .& (grid_bx .>= 27.0);
# Remove areas that are too far north (> 45.5°) to focus on main Mediterranean basin
sel_mask3 = (grid_by .>= 45.5);
# Apply all mask edits
mask_edit = mask_edit .* .!sel_mask1 .* .!sel_mask2 .* .!sel_mask3;
@show size(mask_edit)

# Plot the edited mask for Mediterranean
figure("Mediterranean-Mask-Edited")
ax = subplot(1,1,1)
ax.tick_params("both",labelsize=6)
pcolor(bx, by, transpose(mask_edit[:,:,1])); 
gca().set_aspect(aspectratio)
title("Mediterranean Sea Edited Mask")

size(mask_edit) = (345, 129, 6)


PyObject Text(0.5, 1.0, 'Mediterranean Sea Edited Mask')
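The triple loop that builds the mask can also be written as a single broadcast. The sketch below checks that both forms agree on a small made-up bathymetry array; it follows the convention used in this notebook, where positive values of `b` are water depths and negative values are land.

```julia
# Toy bathymetry in metres (positive = water depth, negative = land);
# the numbers are made up for illustration
b = [ -5.0   20.0  120.0;
      30.0  200.0   -1.0]
depthr = [0., 10., 25., 50., 100., 150.]

# Loop version, as in the cell above
mask_loop = falses(size(b,1), size(b,2), length(depthr))
for k in eachindex(depthr), j in axes(b,2), i in axes(b,1)
    mask_loop[i,j,k] = b[i,j] >= depthr[k]
end

# Broadcast version: depthr is reshaped to 1×1×nz so that it varies
# along the third dimension only
mask_bc = b .>= reshape(depthr, 1, 1, :)

println(mask_loop == mask_bc)               # prints true
println(vec(sum(mask_bc, dims=(1,2))))      # wet cells per level: [4, 4, 3, 2, 2, 1]
```

The count of wet cells shrinks with depth, which is the expected behaviour: a grid cell is part of the domain at a given level only if the seafloor lies at or below that level.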

## 7. Data Quality Control and Filtering

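The range filter below is disabled (commented out). Its thresholds (25 to 40) are salinity bounds and do not apply to chlorophyll-a, whose values in this dataset range from about 1e-4 to 147, so no range filter is applied in this run: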

In [17]:
## Filter observational data to keep only realistic salinity values
#sel = (obsval .<= 40) .& (obsval .>= 25);  # Typical Adriatic Sea salinity range
#
## Apply the filter to all observation arrays
#obsval = obsval[sel]
#obslon = obslon[sel]
#obslat = obslat[sel]
#obsdepth = obsdepth[sel]
#obstime = obstime[sel]
#obsid = obsid[sel];
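If a range filter is wanted for chlorophyll-a, the same pattern applies with different bounds. The following is a minimal sketch on made-up values; the bounds `lower` and `upper` are hypothetical and should be chosen from the distribution of the actual observations.

```julia
# Made-up sample values for illustration (not taken from the dataset)
obsval   = [0.05, 1.2, 147.0, -0.3, 3.4]
obslon   = [12.5, 13.0, 14.2, 15.1, 16.0]
obslat   = [44.0, 43.5, 42.8, 42.0, 41.5]
obsdepth = [0.0, 10.0, 25.0, 50.0, 100.0]

# Hypothetical bounds; pick them from the distribution of the real data
lower, upper = 0.0, 50.0
sel = (obsval .>= lower) .& (obsval .<= upper)

# Apply the same selection to every observation array so they stay aligned
obsval, obslon, obslat, obsdepth = map(a -> a[sel], (obsval, obslon, obslat, obsdepth))

println("kept $(count(sel)) of $(length(sel)) observations")   # kept 3 of 5 observations
```

Filtering all arrays with one shared boolean vector avoids the arrays drifting out of sync, which would silently pair values with the wrong coordinates.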

## 8. DIVAnd Analysis Parameters Configuration

Set up the key parameters for the DIVAnd interpolation algorithm including correlation lengths and regularization:

In [18]:
# Optional: calculate observation weights based on data density
# Uncommenting the lines below gives clustered observations less weight
#@time rdiag=1.0./DIVAnd.weight_RtimesOne((obslon,obslat),(0.03,0.03));
#@show maximum(rdiag),mean(rdiag)

# Define grid dimensions for parameter arrays
sz = (length(lonr),length(latr),length(depthr));

# Set correlation lengths (influence radius) for each dimension
lenx = fill(100_000.,sz)   # 100 km correlation length in longitude direction
leny = fill(100_000.,sz)   # 100 km correlation length in latitude direction  
lenz = fill(25.,sz);       # 25 m correlation length in depth direction
len = (lenx, leny, lenz);  # Combine into tuple for DIVAnd

# Set noise-to-signal ratio (regularization parameter)
epsilon2 = 0.1;            # Controls smoothness vs data fidelity tradeoff
#epsilon2 = epsilon2 * rdiag;  # Optional: spatially varying epsilon

0.1
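To get a feel for the horizontal correlation length, it helps to express it in grid cells. The sketch below does this arithmetic for the 0.125° grid, assuming a spherical Earth with a mean radius of 6371 km (an assumption made for this illustration, not a value taken from DIVAnd).

```julia
R = 6_371_000.0            # assumed mean Earth radius in metres
dx = dy = 0.125            # grid spacing in degrees, as defined earlier
midlat = 38.0              # midpoint of latr = 30:dy:46

dy_m = deg2rad(dy) * R                  # metres per grid step in latitude
dx_m = deg2rad(dx) * R * cosd(midlat)   # metres per grid step in longitude at midlat

len_h = 100_000.0          # horizontal correlation length in metres

println("cells per correlation length (lat): ", round(len_h / dy_m, digits=1))   # 7.2
println("cells per correlation length (lon): ", round(len_h / dx_m, digits=1))   # 9.1
```

A correlation length of 100 km therefore spans roughly 7 to 9 grid cells, so the grid resolves the scales that the analysis is asked to represent.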

## 9. Output File Setup and Metadata Configuration

Configure the NetCDF output file structure and comprehensive metadata following SeaDataNet standards:

In [20]:
# Set up output directory and filename
outputdir = "./"
if !isdir(outputdir)
    mkpath(outputdir)
end
filename = joinpath(outputdir, "Water_body_$(replace(varname," "=>"_"))Med.4Danl.nc")

"./Water_body_Water_body_chlorophyll-aMed.4Danl.nc"

In [22]:
# Define comprehensive metadata for NetCDF file following SeaDataNet standards
metadata = OrderedDict(
    # Name of the project (SeaDataCloud, SeaDataNet, EMODNET-chemistry, ...)
    "project" => "SeaDataCloud",

    # URN code for the institution EDMO registry,
    # e.g. SDN:EDMO::1579
    "institution_urn" => "SDN:EDMO::1579",

    # Production group
    #"production" => "Diva group",

    # Name and emails from authors
    "Author_e-mail" => ["Your Name1 <name1@example.com>", "Other Name <name2@example.com>"],

    # Source of the observation
    "source" => "observational data from SeaDataNet and World Ocean Atlas",

    # Additional comment
    "comment" => "Duplicate removal applied to the merged dataset",

    # SeaDataNet Vocabulary P35 URN
    # http://seadatanet.maris2.nl/v_bodc_vocab_v2/search.asp?lib=p35
    # example: SDN:P35::WATERTEMP
    # NOTE: verify that this code corresponds to chlorophyll-a in the P35 list
    "parameter_keyword_urn" => "SDN:P35::EPC00001",

    # List of SeaDataNet Parameter Discovery Vocabulary P02 URNs
    # http://seadatanet.maris2.nl/v_bodc_vocab_v2/search.asp?lib=p02
    # example: ["SDN:P02::TEMP"]
    # NOTE: PSAL is the P02 code for salinity; replace it with the chlorophyll code from the P02 list
    "search_keywords_urn" => ["SDN:P02::PSAL"],

    # List of SeaDataNet Vocabulary C19 area URNs
    # SeaVoX salt and fresh water body gazetteer (C19)
    # http://seadatanet.maris2.nl/v_bodc_vocab_v2/search.asp?lib=C19
    # example: ["SDN:C19::3_1"]
    "area_keywords_urn" => ["SDN:C19::3_3"],

    "product_version" => "1.0",
    
    "product_code" => "something-to-decide",
    
    # bathymetry source acknowledgement
    # see, e.g.
    # * EMODnet Bathymetry Consortium (2016): EMODnet Digital Bathymetry (DTM).
    # https://doi.org/10.12770/c7b53704-999d-4721-b1a3-04ec60c87238
    # 
    # taken from
    # http://www.emodnet-bathymetry.eu/data-products/acknowledgement-in-publications
    #
    # * The GEBCO Digital Atlas published by the British Oceanographic Data Centre on behalf of IOC and IHO, 2003
    #
    # taken from
    # https://www.bodc.ac.uk/projects/data_management/international/gebco/gebco_digital_atlas/copyright_and_attribution/
        
    "bathymetry_source" => "The GEBCO Digital Atlas published by the British Oceanographic Data Centre on behalf of IOC and IHO, 2003",

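    # NetCDF CF standard name
    # http://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html
    # example "standard_name" = "sea_water_temperature",
    # CF standard name for chlorophyll-a
    "netcdf_standard_name" => "mass_concentration_of_chlorophyll_a_in_sea_water",

    "netcdf_long_name" => "sea water chlorophyll-a concentration",

    # NOTE: "1e-3" is the unit used for salinity; set this to the unit of the
    # chlorophyll-a observations in the source file
    "netcdf_units" => "1e-3",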

    # Abstract for the product
    "abstract" => "...",

    # This option provides a place to acknowledge various types of support for the
    # project that produced the data
    "acknowledgement" => "...",

    "documentation" => "https://doi.org/doi_of_doc",

    # Digital Object Identifier of the data product
    "doi" => "...");

In [23]:
# Convert metadata to NetCDF-compatible attributes
ncglobalattrib, ncvarattrib = SDNMetadata(metadata, filename, varname, lonr, latr)

# Remove any existing analysis file to start fresh
if isfile(filename)
    rm(filename) # delete the previous analysis
    @info "Removing file $filename"
end

## 10. Plotting Function Definition

Define a function that plots the interpolated field at each depth level for a given time step. It can be passed to `diva3d` through the `plotres` keyword, which is commented out in the final cell:

In [24]:
# Set up figure output directory
figdir = "./"

# Define a function to plot interpolation results for each time step
function plotres(timeindex,sel,fit,erri)
    tmp = copy(fit)                            # Copy the fitted data to avoid modifying original
    nx,ny,nz = size(tmp)                       # Get dimensions of the fitted data array
    
    for i in 1:nz                             # Loop through each depth level
        figure("Adriatic-Additional-Data")     # Create or select figure window
        ax = subplot(1,1,1)                   # Create subplot
        ax.tick_params("both",labelsize=6)    # Set tick parameters
        ylim(first(latr), last(latr));        # Set latitude limits from the analysis grid
        xlim(first(lonr), last(lonr));        # Set longitude limits from the analysis grid
        title("Depth: $(depthr[i]) \n Time index: $(timeindex)", fontsize=6)  # Add title with depth and time info
        
        # Create colored plot of the interpolated field; the color range is chosen automatically
        pcolor(lonr.-dx/2.,latr.-dy/2, permutedims(tmp[:,:,i], [2,1]))
        colorbar(extend="both", orientation="vertical", shrink=0.8).ax.tick_params(labelsize=8)

        # Add land mask as gray contour 
        contourf(bx,by,permutedims(b,[2,1]), levels = [-1e5,0],colors = [[.5,.5,.5]])
        aspectratio = 1/cos(mean(latr) * pi/180)  # Calculate proper aspect ratio
        gca().set_aspect(aspectratio)
        
        # Save the figure with formatted filename
        figname = varname * @sprintf("_%02d",i) * @sprintf("_%03d.png",timeindex)
        PyPlot.savefig(joinpath(figdir, figname), dpi=600, bbox_inches="tight");
        PyPlot.close_figs()                   # Close figure to free memory
    end
end

plotres (generic function with 1 method)
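The figure names produced by `plotres` combine the variable name with a zero-padded depth index and time index. A short sketch of that formatting, using only the `Printf` standard library and example index values chosen for illustration:

```julia
using Printf

varname = "Water body chlorophyll-a"
depth_index, timeindex = 3, 2   # example values for illustration

# Two-digit depth index and three-digit time index, both zero-padded
figname = varname * @sprintf("_%02d", depth_index) * @sprintf("_%03d.png", timeindex)

println(figname)   # Water body chlorophyll-a_03_002.png
```

Because `varname` contains spaces, the resulting file names do too; replacing spaces with underscores, as done for the NetCDF output name, may be preferable on some systems.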

## 11. Main DIVAnd Analysis Execution

Execute the main 3D interpolation analysis using all the configured parameters and save the results:

In [25]:
# Execute the main DIVAnd 3D analysis
@time dbinfo = diva3d((lonr,latr,depthr,TS),        # Grid coordinates and time selector
    (obslon,obslat,obsdepth,obstime), obsval,        # Observation coordinates and values
    len, epsilon2,                                    # Correlation lengths and regularization
    filename,varname,                                 # Output file and variable name
    bathname=bathname,                               # Bathymetry file for land/sea mask
    #plotres = plotres,                               # Plotting function to call during analysis
    mask = mask_edit,                                # Edited mask for analysis domain
    fitcorrlen = false,                              # Don't fit correlation lengths automatically
    niter_e = 2,                                     # Iterations for estimating the scale factor of epsilon2
    ncvarattrib = ncvarattrib,                       # NetCDF variable attributes
    ncglobalattrib = ncglobalattrib,                 # NetCDF global attributes
    surfextend = true                                # Add an extra layer on top of the surface level
    );

# Save observation metadata to the output file
DIVAnd.saveobs(filename,(obslon,obslat,obsdepth,obstime),obsid);

┌ Info: Creating netCDF file ./Water_body_Water_body_chlorophyll-aMed.4Danl.nc
└ @ DIVAnd C:\Users\nholodkov\.julia\packages\DIVAnd\4UymR\src\diva.jl:383
┌ Info: Time step 1 / 4
└ @ DIVAnd C:\Users\nholodkov\.julia\packages\DIVAnd\4UymR\src\diva.jl:436
┌ Info: scaled correlation length (min,max) in dimension 1: (100000.0, 100000.0)
└ @ DIVAnd C:\Users\nholodkov\.julia\packages\DIVAnd\4UymR\src\diva.jl:621
┌ Info: scaled correlation length (min,max) in dimension 2: (100000.0, 100000.0)
└ @ DIVAnd C:\Users\nholodkov\.julia\packages\DIVAnd\4UymR\src\diva.jl:621
┌ Info: scaled correlation length (min,max) in dimension 3: (25.0, 25.0)
└ @ DIVAnd C:\Users\nholodkov\.julia\packages\DIVAnd\4UymR\src\diva.jl:621
└ @ DIVAnd C:\Users\nholodkov\.julia\packages\DIVAnd\4UymR\src\utils.jl:18
┌ Info: number of windows: 4
└ @ DIVAnd C:\Users\nholodkov\.julia\packages\DIVAnd\4UymR\src\DIVAndgo.jl:110
└ @ DIVAnd C:\Users\nholodkov\.julia\packages\DIVAnd\4UymR\src\utils.jl:18
┌ Info: number of windows: 4


125.686676 seconds (123.22 M allocations: 137.832 GiB, 21.40% gc time, 26.68% compilation time)
