# How to read data files?
This notebook describes or points to modules for reading data in different file formats and from different sources.

| Format | Tool |
| ------ |:----:|
| Delimiter-separated values | [readdlm](https://docs.julialang.org/en/stable/stdlib/io-network/#Base.DataFmt.readdlm-Tuple{Any,Char,Type,Char}) |
| NetCDF | [NCDatasets.jl](https://github.com/Alexander-Barth/NCDatasets.jl) |
| ODV spreadsheet $^\star$ | [ODVspreadsheet.jl](https://github.com/gher-ulg/DIVAnd.jl/blob/master/src/ODVspreadsheet.jl) |
| GEBCO bathymetry $^\star$ | [load_mask.jl](https://github.com/gher-ulg/DIVAnd.jl/blob/master/src/load_mask.jl) |
| Big files $^\star$ | [loadbigfile](https://github.com/gher-ulg/DIVAnd.jl/blob/master/src/load_obs.jl) |
| NetCDF WOD $^\star$ | [loadobs](https://github.com/gher-ulg/DIVAnd.jl/blob/master/src/load_obs.jl) |
| Mat files | [MAT.jl](https://github.com/JuliaIO/MAT.jl) |

$^\star$ denotes functions or tools available in `DIVAnd.jl`.

In [4]:
using DIVAnd
using DelimitedFiles

## Delimiter-separated values files
This includes comma-separated values (CSV) and tab-separated values (TSV) files, among others.    
We show an example with the North Atlantic Oscillation (NAO) indices obtained from the [Climate Data Guide](https://climatedataguide.ucar.edu/) website.

In [2]:
mkpath("./data")  # create the target directory if it does not exist
download("https://climatedataguide.ucar.edu/sites/default/files/nao_station_annual.txt", 
    "./data/nao_station_annual.txt");



If we call the function without any option, the header line is parsed like a data row. Since it contains more fields than the data lines, the number of columns is deduced from it, which leads to empty data columns:

In [5]:
dataNAO = DelimitedFiles.readdlm("./data/nao_station_annual.txt");
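The padding can be reproduced on a tiny in-memory example. The header and the values below are made up for illustration: because the first line has four whitespace-separated fields and the second only two, `readdlm` returns a heterogeneous matrix in which the missing entries are filled with empty strings.

```julia
using DelimitedFiles

# Made-up two-line content: a 4-word header followed by one data row
sample = "year NAO station index\n1865 -0.66\n"
raw = readdlm(IOBuffer(sample))

println(size(raw))   # the widest line (the header) sets the number of columns
println(raw[2, :])   # the data row is padded with empty strings
```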

We therefore skip the first (header) line with the keyword argument `skipstart`:

In [8]:
dataNAO = DelimitedFiles.readdlm("./data/nao_station_annual.txt", skipstart=1);
dataNAO[1:5, :]

5×2 Array{Float64,2}:
 1865.0  -0.66
 1866.0  -0.2 
 1867.0  -3.04
 1868.0   4.14
 1869.0   0.42
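For comma-separated files, the delimiter is passed as the second argument. Instead of skipping the header, `readdlm` can also return it separately with `header=true`. A minimal sketch on a made-up CSV string (the column names `year` and `nao` are hypothetical):

```julia
using DelimitedFiles

# Made-up CSV content: one header line, then numeric rows
csv = "year,nao\n1865,-0.66\n1866,-0.2\n"

# With header=true, readdlm returns a tuple (data, header)
data, header = readdlm(IOBuffer(csv), ',', Float64; header=true)

println(header)       # 1×2 matrix holding the column names
println(data[:, 2])   # NAO values as Float64
```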

## NetCDF

The two main modules available for reading and writing netCDF files are:
1. [NetCDF.jl](https://github.com/JuliaGeo/NetCDF.jl)
2. [NCDatasets.jl](https://github.com/Alexander-Barth/NCDatasets.jl)

An example is provided in this [notebook](./03-netCDF.ipynb).
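As a quick illustration of the `NCDatasets.jl` interface, the sketch below creates a small netCDF file and reads it back. The file name, dimension, variable and attribute are hypothetical; the linked notebook remains the reference example.

```julia
using NCDatasets

fname = "example.nc"   # hypothetical file name

# Create ("c") a small file with one dimension and one variable
ds = NCDataset(fname, "c")
defDim(ds, "lon", 3)
v = defVar(ds, "temperature", Float64, ("lon",))
v.attrib["units"] = "degree_Celsius"
v[:] = [12.5, 13.0, 13.4]
close(ds)

# Re-open in read-only mode (the default) and load the variable
ds = NCDataset(fname)
println(keys(ds))                              # variable names
temp = ds["temperature"][:]                    # data as a Julia array
units = ds["temperature"].attrib["units"]      # attribute value
close(ds)
```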

## ODV spreadsheet
ODV spreadsheets are one of the standard formats in [SeaDataCloud](https://www.seadatanet.org/).    
In `DIVAnd`, the tool [ODVspreadsheet.jl](https://github.com/gher-ulg/DIVAnd.jl/blob/master/src/ODVspreadsheet.jl) is designed to read this format.

An example is provided in this [notebook](./09-ODV-data-import.ipynb).
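The linked notebook uses `ODVspreadsheet.jl`, which handles the full format. Purely to illustrate the layout, here is a minimal sketch with `DelimitedFiles`, assuming tab-separated columns, comment lines starting with `//` and a single header line. The content is made up, and this sketch is not a substitute for `ODVspreadsheet.jl`: it does not interpret quality flags or metadata.

```julia
using DelimitedFiles

# Made-up ODV-like content: comment lines start with "//",
# followed by a tab-separated header line and the data rows
odv = """
//hypothetical comment line
//another comment line
Cruise\tStation\tLongitude [degrees_east]\tPSAL [psu]
C1\tS1\t12.5\t38.1
C1\tS2\t12.7\t38.3
"""

# Drop the comment lines, keep everything else
lines = filter(l -> !startswith(l, "//"), readlines(IOBuffer(odv)))

# Parse the remaining tab-separated text, returning the header separately
data, header = readdlm(IOBuffer(join(lines, '\n')), '\t'; header=true)
```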

## Big files
The so-called big files are intermediate files used by DIVA and DIVAnd. The format is rather simple: a tab-separated text file with the following columns:
1. longitude,
2. latitude,
3. field value (e.g., temperature, salinity, chlorophyll concentration, ...), 
4. depth,
5. time,
6. measurement identifier.

In the module [load_obs.jl](https://github.com/gher-ulg/DIVAnd.jl/blob/master/src/load_obs.jl), the function `loadbigfile` reads this file format.    
In the next cell we download a *big file* containing salinity measurements (also used in other examples) and read it with `loadbigfile`.

In [9]:
fname = "./data/Salinity.bigfile"
if !isfile(fname)
    download("https://b2drop.eudat.eu/s/Bv9Fj0YGC0zp2vn/download",fname)
else
    @info("Data file already downloaded")
end
obsval,obslon,obslat,obsdepth,obstime,obsid = loadbigfile(fname);
@show(length(obsval));

┌ Info: Data file already downloaded
└ @ Main In[9]:5
┌ Info: Loading data from 'big file' ./data/Salinity.bigfile
└ @ DIVAnd /home/ctroupin/.julia/packages/DIVAnd/bRUC1/src/load_obs.jl:10


length(obsval) = 139230
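Since a big file is plain tab-separated text, it can also be parsed with `readdlm`. The sketch below uses a hypothetical helper, `read_bigfile_sketch` (not part of `DIVAnd`), on two made-up lines. It assumes the time column holds ISO 8601 strings and returns the columns in the same order as `loadbigfile` above. Reading every field as a `String` first keeps identifiers intact even when they look like numbers.

```julia
using DelimitedFiles, Dates

# Hypothetical helper (not part of DIVAnd): parse a big file from an IO
# source and return the columns in the same order as loadbigfile.
# Assumption: the time column holds ISO 8601 strings.
function read_bigfile_sketch(io::IO)
    # Read every field as a string, then convert column by column
    data = readdlm(io, '\t', String)
    lon   = parse.(Float64, data[:, 1])
    lat   = parse.(Float64, data[:, 2])
    val   = parse.(Float64, data[:, 3])
    depth = parse.(Float64, data[:, 4])
    times = DateTime.(data[:, 5])
    ids   = data[:, 6]
    return val, lon, lat, depth, times, ids
end

# Two made-up observations (lon, lat, value, depth, time, id)
bigtext = "12.5\t43.1\t38.2\t5.0\t2000-01-15T00:00:00\tobs1\n" *
          "12.6\t43.2\t38.3\t10.0\t2000-01-16T12:00:00\tobs2\n"

vals, lons, lats, depths, times, ids = read_bigfile_sketch(IOBuffer(bigtext))
```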


## Bathymetry
The General Bathymetric Chart of the Oceans ([GEBCO](https://www.gebco.net/)), distributed in netCDF format, can be read directly by `DIVAnd` with the function `load_bath`.  

First, make sure that the bathymetry file is available.

In [10]:
bathname = "gebco_30sec_16.nc"
if !isfile(bathname)
    download("https://b2drop.eudat.eu/s/ACcxUEZZi6a4ziR/download",bathname)
else
    @info("Bathymetry file already downloaded")
end

┌ Info: Bathymetry file already downloaded
└ @ Main In[10]:5


Then we define the grid on which the bathymetry is needed and call `load_bath`. The second argument (`true`) indicates that the netCDF file covers the whole globe.

In [11]:
lonr = -10:0.5:36.
latr = 37:0.5:48
bx,by,b = load_bath(bathname,true,lonr,latr);

`bx` and `by` are the longitudes and latitudes of the grid (the same values as `lonr` and `latr`).    
`b` contains the bathymetry values on that grid.
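The size of `b` follows from the grid definition: along each axis, the number of points is (last - first) / step + 1. Assuming that longitude varies along the first dimension of `b`, `size(b)` should equal `(length(lonr), length(latr))`, which is easy to check with `size(b)`. The arithmetic for the grid above:

```julia
# Same grid as above: 0.5° resolution
lonr = -10:0.5:36.
latr = 37:0.5:48

# Number of grid points along each axis: (last - first) / step + 1
nlon = length(lonr)   # (36 - (-10)) / 0.5 + 1
nlat = length(latr)   # (48 - 37) / 0.5 + 1
println((nlon, nlat))
```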

A complete example is provided in this [notebook](./06-topography.ipynb). 

## Mat files
The [MAT](https://github.com/JuliaIO/MAT.jl) module reads and writes the files created by MATLAB (`.mat` extension). We use the same `.mat` file as in [04-OI-variational-analysis-introduction](./04-OI-variational-analysis-introduction.ipynb).

In [12]:
using MAT


In [14]:
matfile = "data/dan_field.mat"
mf = matopen(matfile)



MAT.MAT_v5.Matlabv5File(IOStream(<file data/dan_field.mat>), false, #undef)

We can get a list of the variables stored in the file:

In [15]:
varnames = names(mf)



Base.KeySet for a Dict{String,Int64} with 3 entries. Keys:
  "f"
  "Fe"
  "F"

To load one of them, use `read`:

In [16]:
var1 = read(mf, "f");
@show sizeof(var1);

sizeof(var1) = 20000
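Note that `sizeof` returns the memory footprint in bytes, not the dimensions of the array. Assuming `var1` holds `Float64` values (8 bytes each), 20000 bytes correspond to 2500 elements; `size` and `length` give the shape and the element count. The stand-in array `A` below uses a hypothetical 50×50 shape, one of several shapes with 2500 elements:

```julia
# Hypothetical stand-in for var1: a 50×50 Float64 matrix
A = zeros(50, 50)

@show sizeof(A)   # total size in bytes: 2500 elements × 8 bytes
@show size(A)     # array dimensions
@show length(A)   # number of elements
```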



When we're done, don't forget to close the file.

In [17]:
close(mf)
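When the whole file fits in memory, `MAT.jl` also provides `matread`, which opens the file, reads every variable into a `Dict` and closes it in a single call; `matwrite` does the reverse. A self-contained sketch that first writes a small file (hypothetical name and contents) and then reads it back:

```julia
using MAT

fname = "example.mat"   # hypothetical file name

# Write two variables to a new .mat file
matwrite(fname, Dict("f" => [1.0 2.0; 3.0 4.0], "g" => ones(3, 2)))

# Read all variables at once into a Dict{String,Any}
vars = matread(fname)
println(sort(collect(keys(vars))))
```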