# Automatic data downloading

* In this example we show how to download data from a Jupyter notebook.
* The [PhysOcean](https://github.com/gher-ulg/PhysOcean.jl) module provides ways to automatically download data from the [World Ocean Database](https://www.nodc.noaa.gov/OC5/WOD/pr_wod.html) and from the [CMEMS](http://marine.copernicus.eu/) In-Situ TAC (Thematic Assembly Centre).

This module can be installed with:

```julia
Pkg.add("PhysOcean")
```

In [None]:
Pkg.add("PhysOcean")

Import the necessary packages

In [None]:
using PyPlot              # Visualization package
using PhysOcean           # Download data from the World Ocean Database and CMEMS
using divand              # DIVAnd (bathymetry extraction, data checks, duplicate detection)

## Settings
Define the time range and the geographical bounding box of the data to download.

In [None]:
# resolution (the resolution is only used for DIVAnd analyses)
dx = dy = 0.25   # medium size test 

# vectors defining the longitude and latitude grids
# Here longitude and latitude correspond to the Mediterranean Sea
lonr = -7:dx:37
latr = 30:dy:46

# time range of the in-situ data
timerange = [Date(2016,1,1),Date(2016,12,31)]
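As a quick sanity check of these settings, the number of grid points along each axis is (stop − start)/step + 1, and the time window can be measured in days. A minimal sketch, assuming Julia 1.x (where `Dates` must be loaded explicitly) and reusing the same values as above:

```julia
using Dates

dx = dy = 0.25
lonr = -7:dx:37
latr = 30:dy:46
timerange = [Date(2016,1,1), Date(2016,12,31)]

# number of grid points along each axis: (stop - start) / step + 1
nlon = length(lonr)
nlat = length(latr)
println("grid size: ", nlon, " x ", nlat)

# length of the time window in days (both end dates included)
ndays = Dates.value(timerange[2] - timerange[1]) + 1
println("time window: ", ndays, " days")
```

Since 2016 is a leap year, the window covers 366 days.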

In [None]:
# Name of the variable
varname = "Salinity"

Please use your own email address (!) 😉     
It is only used to notify you by email once the dataset is ready.

In [None]:
# Email for downloading the data
# Indicate here your email address
email = " "

Define the directory where the results will be saved.    
The tilde `~` will be replaced (expanded) by your home directory.      
The command `mkpath` will create this directory if necessary (including any missing parent directories).

In [None]:
basedir = expanduser("~/Downloads/WOD/Med-2016-3")
mkpath(basedir)
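To see how `mkpath` behaves without touching your home directory, here is a sketch (Julia 1.x) that uses a temporary directory instead of `~/Downloads`. It also shows that re-running the cell is harmless:

```julia
# use a temporary root instead of ~/Downloads (illustration only)
root = mktempdir()
target = joinpath(root, "WOD", "Med-2016-3")

mkpath(target)          # creates root/WOD and root/WOD/Med-2016-3
mkpath(target)          # no error if the directory already exists

println(isdir(target))
```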

## Download the data

* World Ocean Database: example of bulk data access by simulating a web user.
* Downloading can take several tens of minutes.
* SeaDataNet will provide a dedicated machine-to-machine interface during the SeaDataCloud project.

In [None]:
?WorldOceanDatabase.download

In [None]:
WorldOceanDatabase.download(lonr,latr,timerange,varname,email,basedir);

## Load data
Load the data into memory and (optionally) perform additional subsetting.

In [None]:
# load all data under basedir as a double-precision floating point variable
obsval,obslon,obslat,obsdepth,obstime,obsid = WorldOceanDatabase.load(Float64,basedir,varname);

Number of data points

In [None]:
@show size(obsval);

Check some observation IDs

In [None]:
@show obsid[1];
@show obsid[2];

With `checkobs` we get an overview of the extreme values (minimum and maximum) of each coordinate and of the observed values.

In [None]:
checkobs((obslon,obslat,obsdepth,obstime),obsval,obsid)
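To illustrate the kind of summary `checkobs` prints, here is a hypothetical helper `summarize_obs` applied to synthetic data. It is not the divand implementation of `checkobs`, only a sketch in Julia 1.x of the same idea (extrema per coordinate and for the values, plus a count of distinct identifiers):

```julia
using Dates

# Hypothetical helper (NOT the divand implementation of checkobs):
# report the minimum and maximum of each coordinate and of the values,
# plus the number of distinct observation identifiers.
function summarize_obs(coords::Tuple, names::Tuple, values, ids)
    ranges = Dict{String,Any}()
    for (name, c) in zip(names, coords)
        ranges[name] = extrema(c)
    end
    ranges["value"] = extrema(values)
    for name in (names..., "value")
        lo, hi = ranges[name]
        println(rpad(name, 6), " min: ", lo, "   max: ", hi)
    end
    println("unique ids: ", length(unique(ids)), " of ", length(ids))
    return ranges
end

# synthetic observations (illustration only)
lon_demo   = [3.1, 5.2, 12.0]
lat_demo   = [41.0, 39.5, 43.2]
depth_demo = [0.0, 10.0, 250.0]
time_demo  = [DateTime(2016,1,5), DateTime(2016,2,1), DateTime(2016,7,12)]
val_demo   = [37.8, 38.1, 38.6]
ids_demo   = ["a-1", "a-1", "b-7"]

ranges_demo = summarize_obs((lon_demo, lat_demo, depth_demo, time_demo),
                            ("lon", "lat", "depth", "time"), val_demo, ids_demo)
```

Such a summary quickly reveals bogus values (e.g. negative salinities or depths far outside the expected range).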

## Additional sub-setting
For plotting, we select a subset of the observations based on time and depth.     
For instance, the month can be extracted from the `Date` with:

In [None]:
Dates.month.(obstime)

In [None]:
# depth range (m)
depthr = [0., 20.]

# month range (January to March)
timer = [1,3]

# additional sub-setting; also discard bogus (non-positive) salinities
sel = ((obsval .> 0 )
       .& (minimum(depthr) .<= obsdepth .<= maximum(depthr))
       .& (minimum(timer) .<= Dates.month.(obstime) .<= maximum(timer)));

@show typeof(sel);
@show size(sel);
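To see how the combined mask behaves, here is the same expression applied to five synthetic observations. This sketch assumes Julia 1.x (where `Dates` must be loaded explicitly); the `_demo` names are illustrative and avoid overwriting the notebook variables:

```julia
using Dates

# synthetic observations (illustration only)
obsval_demo   = [38.2, -1.0, 37.9, 38.5, 38.0]
obsdepth_demo = [5.0, 10.0, 50.0, 15.0, 0.0]
obstime_demo  = [DateTime(2016,1,10), DateTime(2016,2,3), DateTime(2016,3,1),
                 DateTime(2016,6,20), DateTime(2016,3,31)]

depthr = [0., 20.]   # depth range (m)
timer  = [1, 3]      # months: January to March

# each broadcast comparison yields a BitVector; `.&` keeps an observation
# only if all three conditions hold
sel_demo = (obsval_demo .> 0) .&
           (minimum(depthr) .<= obsdepth_demo .<= maximum(depthr)) .&
           (minimum(timer) .<= Dates.month.(obstime_demo) .<= maximum(timer))

println(findall(sel_demo))   # [1, 5]
```

Observation 2 is rejected by its negative value, observation 3 by its depth and observation 4 by its month.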

The new variables (ending in `sel`) are a sub-selection based on the previous criteria.

In [None]:
valsel = obsval[sel]
lonsel = obslon[sel]
latsel = obslat[sel]
depthsel = obsdepth[sel]
timesel = obstime[sel]
idssel = obsid[sel];
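The six assignments above all apply the same logical mask. A compact alternative, sketched here under the assumption of Julia 1.x (where `map` over a `NamedTuple` is available), is to group the arrays in a named tuple; the field names and data are illustrative:

```julia
using Dates

# synthetic data (illustration only)
obs_demo = (val   = [38.2, -1.0, 38.0],
            lon   = [3.0, 4.0, 5.0],
            lat   = [40.0, 41.0, 42.0],
            depth = [5.0, 10.0, 0.0],
            time  = [DateTime(2016,1,10), DateTime(2016,2,3), DateTime(2016,3,31)],
            id    = ["id1", "id2", "id3"])

sel_demo = obs_demo.val .> 0

# apply the same mask to every field; map on a NamedTuple returns
# a NamedTuple with the same field names
obssel_demo = map(v -> v[sel_demo], obs_demo)

println(obssel_demo.lon)   # [3.0, 5.0]
println(obssel_demo.id)    # ["id1", "id3"]
```

This keeps the arrays aligned by construction, at the cost of departing from the separate variables used elsewhere in this notebook.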

Let's perform the check again.

In [None]:
checkobs((lonsel,latsel,depthsel,timesel),valsel,idssel)

Number of selected data points

In [None]:
length(valsel)

## Bathymetry download
The bathymetry is only used for plotting. See [06-topography](06-topography.ipynb) for details.

In [None]:
bathname = "gebco_30sec_16.nc"

if !isfile(bathname)
    download("https://b2drop.eudat.eu/s/o0vinoQutAC7eb0/download",bathname)
else
    info("Bathymetry file already downloaded")
end

bathisglobal = true

# Extract the bathymetry for plotting
bx,by,b = divand.extract_bath(bathname,bathisglobal,lonr,latr);

Create a simple plot to show the domain.

In [None]:
pcolor(bx,by,b')
#contourf(bx,by,b', levels = [-1e5,0],colors = [[.5,.5,.5]])
# compute and set the correct aspect ratio
aspect_ratio = 1/cos(mean(latr) * pi/180)
gca()[:set_aspect](aspect_ratio)
colorbar(orientation = "horizontal");
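The aspect ratio used above comes from the fact that, at latitude φ, one degree of longitude spans only cos(φ) times the distance of one degree of latitude. A minimal sketch in Julia 1.x (`cosd` takes its argument in degrees, so it is equivalent to `cos(x * pi/180)`; `aspect_from_lat` is a hypothetical name):

```julia
using Statistics

# Draw one degree of latitude 1/cos(φ) times longer than one degree of
# longitude, with φ the mean latitude of the domain.
aspect_from_lat(latr) = 1 / cosd(mean(latr))

latr = 30:0.25:46
println(mean(latr))                               # 38.0
println(round(aspect_from_lat(latr), digits = 3))
```

For the Mediterranean domain (mean latitude 38°N) the ratio is about 1.27.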

Plotting gotchas:
* `colorbar()` refers to the last added item. If the last added item is the land-sea mask, the colorbar will be all gray.
* Maps look nicer if you set the aspect ratio corresponding to the average latitude `mean(latr)`.

## Data plotting
The bathymetry is used to display a land-sea mask using the `contourf` function with 2 levels.      
The data are shown as colored circles using `scatter`.

In [None]:
contourf(bx,by,b', levels = [-1e5,0],colors = [[.5,.5,.5]])
scatter(lonsel,latsel,10,valsel); 

# compute and set the correct aspect ratio
aspect_ratio = 1/cos(mean(latr) * pi/180)
gca()[:set_aspect](aspect_ratio)
colorbar(orientation = "horizontal");
clim(36,37.7)

## Check for duplicates

There are two ways to call the function `checkduplicates`:

In [None]:
?divand.Quadtrees.checkduplicates

We load a small ODV file containing data in the same domain to test the duplicate detection.     
We use the function `ODVspreadsheet.load` available within `divand.jl`.

In [None]:
download("https://tinyurl.com/ODV-sample","small_ODV_sample.txt")

In [None]:
?ODVspreadsheet.load

In [None]:
obsval_ODV,obslon_ODV,obslat_ODV,obsdepth_ODV,obstime_ODV,obsid_ODV = ODVspreadsheet.load(Float64,["small_ODV_sample.txt"],
                           ["Water body salinity"]; nametype = :localname );

Look for duplicates, i.e. pairs of observations that are:
* within 0.01 degree (about 1 km) in longitude and latitude,
* within 0.01 m in depth,
* within 1 minute in time,
* within 0.01 psu in value.

In [None]:
dupl = divand.Quadtrees.checkduplicates((obslon_ODV,obslat_ODV,obsdepth_ODV,obstime_ODV),
    obsval_ODV,(obslon,obslat,obsdepth,obstime),
    obsval,(0.01,0.01,0.01,1/(24*60)),0.01);
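The tolerances in this call are expressed in different units. A small worked conversion in Julia 1.x, assuming a spherical Earth with a mean radius of 6371 km and, as the value `1/(24*60)` suggests, a time tolerance expressed in days:

```julia
# Assumption: spherical Earth with mean radius 6371 km
R = 6371.0

km_per_degree = 2π * R / 360                  # ≈ 111.2 km per degree of latitude
dlat_km = 0.01 * km_per_degree                 # latitude tolerance, ≈ 1.1 km
dlon_km = 0.01 * km_per_degree * cosd(38.0)    # longitude tolerance at 38°N, ≈ 0.88 km

# Assumption: time tolerance in days, so 1/(24*60) day = 1 minute
dt_minutes = 1 / (24 * 60) * 24 * 60

println("lat: ", round(dlat_km, digits = 2), " km, lon: ",
        round(dlon_km, digits = 2), " km, time: ", round(dt_minutes, digits = 3), " min")
```

The longitude tolerance is therefore somewhat tighter in kilometres than the latitude tolerance over this domain.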

In [None]:
size(obsval) == size(dupl)

* `dupl` is an array of the same length as `obsval`.
* If the i-th element of `dupl` is an empty list, then the i-th element of `obsval` is probably not a duplicate.
* Otherwise, the i-th element of `obsval` is probably a duplicate of the elements of `obsval_ODV` with the indices `dupl[i]`.

In [None]:
dupl[1]

To get a list of possible duplicates, we check for the elements of `dupl` that are not empty.

In [None]:
index = find(.!isempty.(dupl))
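The search performed by `checkduplicates` can be pictured with a brute-force version that compares every pair of observations. This is only a sketch: `naive_duplicates` is a hypothetical helper, not part of divand (whose function lives in the `Quadtrees` module precisely to avoid this O(n·m) cost), time is assumed to be a plain number of days, and the data are synthetic. In Julia 1.x, `find(.!isempty.(dupl))` is written `findall(!isempty, dupl)`:

```julia
# Hypothetical brute-force duplicate search (NOT divand's implementation):
# for each observation j of dataset 2, return the indices i of dataset 1 whose
# coordinates are all within `delta` and whose value is within `deltavalue`.
function naive_duplicates(x1::Tuple, v1, x2::Tuple, v2, delta::Tuple, deltavalue)
    dupl = [Int[] for _ in eachindex(v2)]
    for j in eachindex(v2), i in eachindex(v1)
        near = all(abs(x1[k][i] - x2[k][j]) <= delta[k] for k in eachindex(delta))
        if near && abs(v1[i] - v2[j]) <= deltavalue
            push!(dupl[j], i)
        end
    end
    return dupl
end

# synthetic data; time is a plain number of days (assumption)
lon1 = [3.0, 5.0];  lat1 = [40.0, 42.0];  dep1 = [10.0, 0.0];  t1 = [10.0, 20.0]
v1   = [38.10, 38.40]                      # dataset 1 (plays the role of ODV)

lon2 = [3.005, 6.0, 5.0];  lat2 = [40.002, 43.0, 42.0];  dep2 = [10.0, 5.0, 0.0]
t2   = [10.0, 30.0, 20.0 + 2 / (24 * 60)]  # third point is 2 minutes later
v2   = [38.105, 37.9, 38.40]               # dataset 2 (plays the role of WOD)

dupl_demo = naive_duplicates((lon1, lat1, dep1, t1), v1, (lon2, lat2, dep2, t2), v2,
                             (0.01, 0.01, 0.01, 1 / (24 * 60)), 0.01)

# Julia 1.x equivalent of find(.!isempty.(dupl))
index_demo = findall(!isempty, dupl_demo)
println(index_demo)   # [1]
```

The third point of dataset 2 matches an observation of dataset 1 in position, depth and value, but is rejected because it is 2 minutes later, beyond the 1-minute tolerance.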

Number of duplicate candidates

In [None]:
length(index)

Check the first reported duplicate

In [None]:
index_WOD = index[1]

Show its coordinates and value from the WOD dataset:

In [None]:
obslon[index_WOD],obslat[index_WOD],obsdepth[index_WOD],obstime[index_WOD],obsval[index_WOD]

It is very close to the ODV data point with the index:

In [None]:
dupl[index_WOD]

In [None]:
index_ODV = dupl[index_WOD][1]

In [None]:
obslon_ODV[index_ODV],obslat_ODV[index_ODV],
obsdepth_ODV[index_ODV],obstime_ODV[index_ODV],
obsval_ODV[index_ODV]

Indeed, it is quite likely that they are duplicates.

Combine the datasets, keeping from WOD only the points that are not duplicates.

In [None]:
newpoints = find(isempty.(dupl));
@show length(newpoints)

In [None]:
obslon_combined   = [obslon_ODV;   obslon[newpoints]];
obslat_combined   = [obslat_ODV;   obslat[newpoints]];
obsdepth_combined = [obsdepth_ODV; obsdepth[newpoints]];
obstime_combined  = [obstime_ODV;  obstime[newpoints]];
obsval_combined   = [obsval_ODV;   obsval[newpoints]];
obsids_combined   = [obsid_ODV;   obsid[newpoints]];
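The concatenation above keeps all ODV points and appends only the WOD points without a match. A toy check of this bookkeeping, as a sketch in Julia 1.x with synthetic values (the `_demo` names are chosen to avoid overwriting the notebook variables):

```julia
# synthetic example (illustration only): 2 ODV points, 3 WOD points,
# and the first WOD point was flagged as a duplicate of ODV point 1
val_ODV_demo = [38.10, 38.40]
val_WOD_demo = [38.105, 37.9, 38.40]
dupl_demo    = [[1], Int[], Int[]]

# Julia 1.x equivalent of find(isempty.(dupl)): WOD points without an ODV match
newpoints_demo = findall(isempty, dupl_demo)

# vertical concatenation: all ODV points, then only the new WOD points
val_combined_demo = [val_ODV_demo; val_WOD_demo[newpoints_demo]]

@assert length(val_combined_demo) == length(val_ODV_demo) + length(newpoints_demo)
println(val_combined_demo)   # [38.1, 38.4, 37.9, 38.4]
```

The same `newpoints` index vector must be applied to every WOD array so that coordinates, values and identifiers stay aligned.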

## CMEMS data download
The function `CMEMS.download` works in a similar way.

In [None]:
?CMEMS.download

## Exercise
1. Download data from CMEMS in the same domain and for the same time period.
2. Plot the data location on a map along with the WOD observations.
3. Check for the duplicates between the two datasets.