# How to read data files?
This notebook describes or points to modules for reading data in different file formats and from different sources.

$^\star$ denotes functions or tools available in `DIVAnd.jl`.

| Format | Tool |
| ------------- |:-------------:|
| Delimiter-separated values | [readdlm](https://docs.julialang.org/en/stable/stdlib/io-network/#Base.DataFmt.readdlm-Tuple{Any,Char,Type,Char}) |
| NetCDF | [NCDatasets.jl](https://github.com/Alexander-Barth/NCDatasets.jl) |
| ODV $^\star$ | [ODVspreadsheet.jl](https://github.com/gher-ulg/DIVAnd.jl/blob/master/src/ODVspreadsheet.jl) |
| GEBCO bathymetry $^\star$ | [load_mask.jl](https://github.com/gher-ulg/DIVAnd.jl/blob/master/src/load_mask.jl) |
| Big files $^\star$ | [loadbigfile](https://github.com/gher-ulg/DIVAnd.jl/blob/master/src/load_obs.jl) |
| NetCDF WOD $^\star$ | [loadobs](https://github.com/gher-ulg/DIVAnd.jl/blob/master/src/load_obs.jl) |
| Mat files | [MAT.jl](https://github.com/JuliaIO/MAT.jl) |

In [1]:
using DIVAnd



## Delimiter-separated values files 
This includes comma-separated values (CSV) and tab-separated values, among others.  
As an example, we use the North Atlantic Oscillation (NAO) index, downloaded from the Climate Data Guide website.

In [2]:
download("https://climatedataguide.ucar.edu/sites/default/files/nao_station_annual.txt", "./nao_station_annual.txt")



"./nao_station_annual.txt"

Called without options, `readdlm` deduces the number of columns from the header line (five words here), which leads to empty data columns:

In [3]:
dataNAO = readdlm("nao_station_annual.txt")

153×5 Array{Any,2}:
     "Hurrell"    "Station-Based"  "Annual"  "NAO"  "Index"
 1865           -0.66              ""        ""     ""     
 1866           -0.2               ""        ""     ""     
 1867           -3.04              ""        ""     ""     
 1868            4.14              ""        ""     ""     
 1869            0.42              ""        ""     ""     
 1870           -2.77              ""        ""     ""     
 1871           -0.85              ""        ""     ""     
 1872           -0.83              ""        ""     ""     
 1873            0.17              ""        ""     ""     
 1874            2.32              ""        ""     ""     
 1875           -2.1               ""        ""     ""     
 1876           -1.85              ""        ""     ""     
    ⋮                                                      
 2005           -1.35              ""        ""     ""     
 2006           -0.2               ""        ""     ""     
 2007            1.3

To avoid this, we skip the first line (the header) with `skipstart=1`:

In [4]:
dataNAO = readdlm("nao_station_annual.txt", skipstart=1)

152×2 Array{Float64,2}:
 1865.0  -0.66
 1866.0  -0.2 
 1867.0  -3.04
 1868.0   4.14
 1869.0   0.42
 1870.0  -2.77
 1871.0  -0.85
 1872.0  -0.83
 1873.0   0.17
 1874.0   2.32
 1875.0  -2.1 
 1876.0  -1.85
 1877.0  -0.24
    ⋮         
 2005.0  -1.35
 2006.0  -0.2 
 2007.0   1.35
 2008.0   1.72
 2009.0   0.72
 2010.0  -5.96
 2011.0   2.95
 2012.0  -0.25
 2013.0   0.9 
 2014.0   2.97
 2015.0   4.09
 2016.0   1.7 
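
Once the header is skipped, the two columns can be separated into plain vectors. The snippet below is a minimal sketch on a hypothetical three-line sample that mimics the layout of `nao_station_annual.txt`; the variable names `years` and `nao` are illustrative only. Note that from Julia 0.7 onwards `readdlm` lives in the `DelimitedFiles` standard library, so the `using` line is required there (it is not needed on Julia 0.6).

```julia
using DelimitedFiles  # standard library since Julia 0.7 (readdlm was in Base before)

# Hypothetical three-year sample mimicking the layout of nao_station_annual.txt
path, io = mktemp()
write(io, "Hurrell Station-Based Annual NAO Index\n")
write(io, "1865 -0.66\n1866 -0.20\n1867 -3.04\n")
close(io)

# Skip the one-line header; the remaining rows are purely numeric
data = readdlm(path; skipstart=1)

years = Int.(data[:, 1])   # first column: year
nao = data[:, 2]           # second column: NAO index

rm(path)                   # remove the temporary file
```

Because every remaining cell is numeric, `readdlm` returns a `Float64` matrix, which is why the years are converted back to integers explicitly.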

## NetCDF

The two main modules for reading netCDF files are:
1. [NetCDF.jl](https://github.com/JuliaGeo/NetCDF.jl)
2. [NCDatasets.jl](https://github.com/Alexander-Barth/NCDatasets.jl)

An example is provided in this [notebook](./03-netCDF.ipynb).

## ODV spreadsheet
ODV spreadsheets are one of the standard formats in [SeaDataCloud](https://www.seadatanet.org/). `DIVAnd` provides a tool called [ODVspreadsheet.jl](https://github.com/gher-ulg/DIVAnd.jl/blob/master/src/ODVspreadsheet.jl), designed to read this format.

An example is provided in this [notebook](./09-ODV-data-import.ipynb).

## Big files
The so-called big files are intermediate files used by DIVA and DIVAnd. The format is rather simple: a tab-separated file containing the following variables:
1. longitude,
2. latitude,
3. field value (e.g., temperature, salinity, chlorophyll concentration, ...), 
4. depth,
5. time,
6. measurement identifier.
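
To make the layout concrete, here is a minimal sketch that parses a single record by hand. It assumes, for illustration only, a line holding exactly the six fields listed above, in that order; the sample values, the ISO 8601 time string and the identifier are hypothetical, and real big files may differ in detail. It is not how `DIVAnd` itself reads these files.

```julia
using Dates

# Hypothetical record: six tab-separated fields in the order listed above.
# The time and identifier formats are assumptions made for this sketch.
line = "12.5\t43.25\t38.1\t10.0\t2008-03-15T12:00:00\tcruise42-station7"

fields = split(line, '\t')

obslon   = parse(Float64, fields[1])   # longitude
obslat   = parse(Float64, fields[2])   # latitude
obsval   = parse(Float64, fields[3])   # field value (e.g., salinity)
obsdepth = parse(Float64, fields[4])   # depth
obstime  = DateTime(fields[5])         # time (ISO 8601 string assumed)
obsid    = String(fields[6])           # measurement identifier
```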

In the module [load_obs.jl](https://github.com/gher-ulg/DIVAnd.jl/blob/master/src/load_obs.jl), the function `loadbigfile` reads this file format.  
In the next cell we download a *big file* containing salinity measurements (also used in other examples) and read it using `loadbigfile`.

In [5]:
fname = "Salinity.bigfile"
if !isfile(fname)
    download("https://b2drop.eudat.eu/s/Bv9Fj0YGC0zp2vn/download",fname)
else
    info("Data file already downloaded")
end
obsval,obslon,obslat,obsdepth,obstime,obsid = loadbigfile(fname);

[1m[36mINFO: [39m[22m[36mData file already downloaded
[39m[1m[36mINFO: [39m[22m[36mLoading data from 'big file' Salinity.bigfile
[39m

## Bathymetry
The General Bathymetric Chart of the Oceans ([GEBCO](https://www.gebco.net/)), distributed as a netCDF file, can be read directly with `DIVAnd` using the function `load_bath`.

First make sure we have a bathymetry file.

In [4]:
bathname = "gebco_30sec_16.nc"
if !isfile(bathname)
    download("https://b2drop.eudat.eu/s/ACcxUEZZi6a4ziR/download",bathname)
else
    info("Bathymetry file already downloaded")
end

Then we define the grid on which the bathymetry is needed and call `load_bath`.

In [5]:
lonr = -10:0.5:36.
latr = 37:0.5:48
bx,by,b = load_bath(bathname,true,lonr,latr);

`bx` and `by` are the same as `lonr` and `latr`.  
`b` contains the bathymetry values.
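
The two ranges fully determine the output grid. As a small sketch that does not require the bathymetry file, the number of grid nodes can be checked directly from the ranges; that `b` holds one value per (longitude, latitude) node is an assumption based on the description above, not something verified here.

```julia
lonr = -10:0.5:36.0   # longitudes, 0.5° step
latr = 37:0.5:48      # latitudes, 0.5° step

nlon = length(lonr)   # number of grid nodes in longitude
nlat = length(latr)   # number of grid nodes in latitude
println("Grid size: ", nlon, " × ", nlat)
```

With these ranges the grid has 93 × 23 nodes, so under the assumption above `size(b)` would be `(93, 23)`.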

A complete example is provided in this [notebook](./06-topography.ipynb). 

## Mat files
The [MAT](https://github.com/JuliaIO/MAT.jl) module is designed to read files created by MATLAB (`.mat` extension). We use the same `.mat` file as in [04-OI-variational-analysis-introduction](./04-OI-variational-analysis-introduction.ipynb).

In [1]:
using MAT

[1m[36mINFO: [39m[22m[36mRecompiling stale cache file /home/ctroupin/.julia/v0.6/lib/v0.6/Blosc.ji for module Blosc.
[39m[1m[36mINFO: [39m[22m[36mRecompiling stale cache file /home/ctroupin/.julia/v0.6/lib/v0.6/MAT.ji for module MAT.
[39m

In [3]:
matfile = "dan_field.mat"
mf = matopen(matfile)

MAT.MAT_v5.Matlabv5File(IOStream(<file dan_field.mat>), false, #undef)

We can get a list of the variables stored in the file:

In [4]:
varnames = names(mf)

Base.KeyIterator for a Dict{String,Int64} with 3 entries. Keys:
  "f"
  "Fe"
  "F"

To load one of them, use `read`:

In [6]:
var1 = read(mf, "f");
@show sizeof(var1);

sizeof(var1) = 20000


When we're done, don't forget to close the file.

In [7]:
close(mf)
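
When all variables are wanted at once, `MAT.jl` also offers `matread`, which returns a dictionary and takes care of closing the file. The sketch below is self-contained: it first writes a small temporary file with `matwrite`, so the file name and the variable names `field` and `label` are hypothetical and unrelated to `dan_field.mat`.

```julia
using MAT

# Hypothetical file and variable names, used only for this sketch
tmpfile = tempname() * ".mat"
matwrite(tmpfile, Dict("field" => [1.0 2.0; 3.0 4.0], "label" => "demo"))

# matread loads every variable into a Dict and closes the file itself
vars = matread(tmpfile)
field = vars["field"]

rm(tmpfile)   # remove the temporary file
```

The explicit `matopen` / `read` / `close` sequence shown earlier remains preferable for large files where only a few variables are needed.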