# How to read data files?
This notebook describes or points to modules for reading data in different file formats and from different sources.

| Format        | Tool           | 
| ------------- |:-------------:| 
| Delimiter-separated values | [readdlm](https://docs.julialang.org/en/stable/stdlib/io-network/#Base.DataFmt.readdlm-Tuple{Any,Char,Type,Char}), [CSV](https://juliadata.github.io/CSV.jl/stable/) |
| NetCDF        | [NCDatasets.jl](https://github.com/Alexander-Barth/NCDatasets.jl) | 
| ODV  $^\star$ | [ODVspreadsheet.jl](https://github.com/gher-uliege/DIVAnd.jl/blob/master/src/ODVspreadsheet.jl) |
| ODV netCDF $^\star$   | [NCODV.jl](https://github.com/gher-uliege/DIVAnd.jl/blob/master/src/NCODV.jl) | 
| GEBCO bathymetry $^\star$ | [load_bath.jl](https://github.com/gher-uliege/DIVAnd.jl/blob/master/src/load_bath.jl)|
| Big files $^\star$    | [loadbigfile](https://github.com/gher-uliege/DIVAnd.jl/blob/master/src/load_obs.jl) |
| NetCDF WOD $^\star$   | [loadobs](https://github.com/gher-uliege/DIVAnd.jl/blob/master/src/load_obs.jl) |
| Mat files     | [MAT.jl](https://github.com/JuliaIO/MAT.jl)| 
| GRIB          | [GRIB.jl](https://github.com/weech/GRIB.jl)|

The $^\star$ symbol denotes functions or tools available in `DIVAnd.jl`.

In [2]:
using DIVAnd
using DelimitedFiles
using CSV

┌ Info: Precompiling CSV [336ed68f-0bac-5ca0-87d4-7b16caf5d00b]
└ @ Base loading.jl:1273


## Delimiter-separated values files 
This includes comma-separated values (CSV) and tab-separated values, among others.  
We show an example with the NAO indices that we obtain from the [Climate Data Guide](https://climatedataguide.ucar.edu/) website.

In [2]:
naodatafile = "../data/nao_station_annual.txt"
if isfile(naodatafile)
    @info("File already downloaded")
else
    # file originally from https://climatedataguide.ucar.edu/sites/default/files/nao_station_annual.txt
    download("https://dox.ulg.ac.be/index.php/s/zYVldQgtso1nMZg/download", naodatafile);
end

┌ Info: File already downloaded
└ @ Main In[2]:3
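The "download only if the file is missing" pattern used in the cell above comes back several times in this notebook, so it can be wrapped in a small function. The sketch below is only an illustration: `download_if_missing` is a hypothetical helper (it is not part of `DIVAnd`), and it assumes Julia 1.6 or later, where the `Downloads` standard library is available.

```julia
using Downloads  # standard library, assumed available (Julia 1.6 or later)

# Hypothetical helper (not part of DIVAnd): fetch `url` into `path` only when
# the file is not there yet, and return the local path in both cases.
function download_if_missing(url::AbstractString, path::AbstractString)
    if isfile(path)
        @info("File already downloaded: $path")
    else
        # Create the target directory if needed, then fetch the file
        mkpath(dirname(abspath(path)))
        Downloads.download(url, path)
    end
    return path
end
```

With such a helper, the cell above would reduce to a single call taking the same URL and the same `naodatafile` path.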


If we call `readdlm` without options, the number of columns is deduced from the header line, which leads to empty data columns:

In [3]:
dataNAO = DelimitedFiles.readdlm(naodatafile)

155×5 Array{Any,2}:
     "Hurrell"    "Station-Based"  "Annual"  "NAO"  "Index"
 1865           -0.66              ""        ""     ""     
 1866           -0.2               ""        ""     ""     
 1867           -3.04              ""        ""     ""     
 1868            4.14              ""        ""     ""     
 1869            0.42              ""        ""     ""     
 1870           -2.77              ""        ""     ""     
 1871           -0.85              ""        ""     ""     
 1872           -0.83              ""        ""     ""     
 1873            0.17              ""        ""     ""     
 1874            2.32              ""        ""     ""     
 1875           -2.1               ""        ""     ""     
 1876           -1.85              ""        ""     ""     
    ⋮                                                      
 2007            1.35              ""        ""     ""     
 2008            1.72              ""        ""     ""     
 2009            0.7

To avoid this, we skip the header line using the option `skipstart=1`:

In [4]:
dataNAO = DelimitedFiles.readdlm(naodatafile, skipstart=1);
dataNAO

154×2 Array{Float64,2}:
 1865.0  -0.66
 1866.0  -0.2 
 1867.0  -3.04
 1868.0   4.14
 1869.0   0.42
 1870.0  -2.77
 1871.0  -0.85
 1872.0  -0.83
 1873.0   0.17
 1874.0   2.32
 1875.0  -2.1 
 1876.0  -1.85
 1877.0  -0.24
    ⋮         
 2007.0   1.35
 2008.0   1.72
 2009.0   0.72
 2010.0  -5.96
 2011.0   2.95
 2012.0  -0.25
 2013.0   0.9 
 2014.0   2.97
 2015.0   4.09
 2016.0   1.7 
 2017.0   1.14
 2018.0   2.83
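The effect of `skipstart` can also be reproduced without the data file. The following sketch reads a small in-memory sample; the three lines in `sample` are made up to mimic the layout of the NAO file, and the variable names are only illustrative.

```julia
using DelimitedFiles

# Hypothetical three-line sample mimicking the layout of the NAO file
sample = """
Hurrell Station-Based Annual NAO Index
1865 -0.66
1866 -0.2
"""

# Without options, the header line sets the number of columns
# and the element type falls back to Any
raw = readdlm(IOBuffer(sample))

# Skipping the header line gives a purely numeric matrix
data = readdlm(IOBuffer(sample), skipstart=1)

years = Int.(data[:, 1])   # first column: years
nao = data[:, 2]           # second column: NAO index
```

Once the matrix is numeric, the columns can be extracted by plain indexing, as done for `years` and `nao`.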

**Note:** if you have to process files in which the decimal separator is a comma instead of a dot, specific options are available in the package [`CSV`](https://juliadata.github.io/CSV.jl/stable/).
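As an illustration of the decimal-comma case, here is a minimal sketch using the `delim` and `decimal` keywords of `CSV.File`. The sample content and the column names `year` and `nao` are invented for the example, and a reasonably recent version of `CSV.jl` is assumed.

```julia
using CSV

# Hypothetical sample: semicolon as delimiter, comma as decimal separator
sample = """
year;nao
1865;-0,66
1866;-0,2
"""

# `decimal=','` tells the parser how floating-point numbers are written
rows = CSV.File(IOBuffer(sample); delim=';', decimal=',')

# Collect the second column as a vector of floats
nao = [row.nao for row in rows]
```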

## NetCDF

The two main modules available for reading and writing netCDF files are:
1. [NetCDF.jl](https://github.com/JuliaGeo/NetCDF.jl)
2. [NCDatasets.jl](https://github.com/Alexander-Barth/NCDatasets.jl)

For this workshop we will mainly use `NCDatasets.jl`, described in this [notebook](../1-Intro/03-netCDF.ipynb).
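For a quick idea of what reading and writing look like, here is a minimal round trip with `NCDatasets.jl`. It is only a sketch: the file is a temporary one, the variable name `temperature` and the dimension names `lon` and `lat` are arbitrary, and a recent version of the package (where the type is called `NCDataset`) is assumed.

```julia
using NCDatasets

# Temporary file and small hypothetical field (4 × 3 values)
fname = tempname() * ".nc"
data = [i + 10.0 * j for i = 1:4, j = 1:3]

# Write: defVar creates the variable, and the dimensions "lon" and "lat"
# are derived from the size of the array
NCDataset(fname, "c") do ds
    defVar(ds, "temperature", data, ("lon", "lat"))
end

# Read the whole variable back; the file is closed at the end of the block
temperature = NCDataset(fname) do ds
    ds["temperature"][:, :]
end
```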

### Bathymetry
The General Bathymetric Chart of the Oceans ([GEBCO](https://www.gebco.net/)), distributed in netCDF, can be read directly with `DIVAnd` using the function `load_bath`.  

First make sure we have a bathymetry file.

In [7]:
bathname = "../data/gebco_30sec_16.nc"
if !isfile(bathname)
    download("https://dox.ulg.ac.be/index.php/s/RSwm4HPHImdZoQP/download",bathname)
else
    @info("Bathymetry file already downloaded")
end

┌ Info: Bathymetry file already downloaded
└ @ Main In[7]:5


Then we have to define the grid on which we need the bathymetry and apply the function.

In [8]:
lonr = -10:0.5:36.
latr = 37:0.5:48
bx,by,b = load_bath(bathname,true,lonr,latr);

`bx` and `by` are the same as `lonr` and `latr`.  
`b` contains the bathymetry values.

A complete example is provided in this [notebook](./06-topography.ipynb). 

## ODV spreadsheet
ODV spreadsheets constitute one of the standard formats in [SeaDataCloud](https://www.seadatanet.org/).        
In `DIVAnd`, we provide:
* [ODVspreadsheet.jl](https://github.com/gher-uliege/DIVAnd.jl/blob/master/src/ODVspreadsheet.jl) to read ODV spreadsheet files, and
* [NCODV.jl](https://github.com/gher-uliege/DIVAnd.jl/blob/master/src/NCODV.jl) to read the ODV netCDF files.

An example is provided in the notebook [09-ODV-data-import.ipynb](./09-ODV-data-import.ipynb).

## Big files
The so-called big files are intermediate files used by DIVA and DIVAnd. The format is rather simple: a tab-separated file containing the following variables:
1. longitude,
2. latitude,
3. field value (e.g., temperature, salinity, chlorophyll concentration, ...), 
4. depth,
5. time,
6. measurement identifier.

In [load_obs.jl](https://github.com/gher-uliege/DIVAnd.jl/blob/master/src/load_obs.jl), the function `loadbigfile` reads this file format.  
In the next cell we download a *big file* containing salinity measurements (also used in other examples) and read it using `loadbigfile`.

In [6]:
fname = "../data/Salinity.bigfile"
if !isfile(fname)
    download("https://dox.ulg.ac.be/index.php/s/k0f7FxA7l5FIgu9/download",fname)
else
    @info("Data file already downloaded")
end
obsval,obslon,obslat,obsdepth,obstime,obsid = loadbigfile(fname);
@show(length(obsval));

┌ Info: Data file already downloaded
└ @ Main In[6]:5
┌ Info: Loading data from 'big file' ../data/Salinity.bigfile
└ @ DIVAnd /home/ctroupin/ULiege/Tools/divand.jl/src/load_obs.jl:10


length(obsval) = 139230
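To make the column layout listed above more concrete, the sketch below parses one record of a simplified, hypothetical six-column file. It is not the implementation of `loadbigfile`: the function `parse_record`, the sample line and the ISO 8601 time format are assumptions made for the example, and real big files may use a different layout for the time information.

```julia
using Dates

# Hypothetical parser for one tab-separated record with the six variables
# listed above (longitude, latitude, value, depth, time, identifier)
function parse_record(line::AbstractString)
    fields = split(line, '\t')
    return (
        lon = parse(Float64, fields[1]),
        lat = parse(Float64, fields[2]),
        value = parse(Float64, fields[3]),
        depth = parse(Float64, fields[4]),
        time = DateTime(fields[5]),   # assumes an ISO 8601 time string
        id = String(fields[6]),
    )
end

# Made-up record, for illustration only
rec = parse_record("12.5\t43.0\t38.1\t10.0\t2001-06-15T00:00:00\tcruise-A-001")
```

Note that the order of the columns in the file differs from the order of the values returned by `loadbigfile`, which starts with the field value.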


## Mat files
We use the same .mat file as in [04-OI-variational-analysis-introduction](../1-Intro/04-OI-variational-analysis-introduction.ipynb).

In [1]:
using MAT

In [3]:
matfile = "../data/dan_field.mat"
mf = matopen(matfile);

We can get a list of the variables stored in the file:

In [4]:
varnames = names(mf)

Base.KeySet for a Dict{String,Int64} with 3 entries. Keys:
  "f"
  "Fe"
  "F"

To load one of them, use `read`:

In [5]:
var1 = read(mf, "f");
@show sizeof(var1);

sizeof(var1) = 20000


When we are done, we should not forget to close the file (especially when processing a large number of files).

In [6]:
close(mf)
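An explicit `close` can be avoided in two ways: `matread` loads all the variables into a dictionary, and `matopen` used with a `do` block closes the file automatically. The sketch below writes its own temporary file with `matwrite` so that it does not depend on `dan_field.mat`; the content of that temporary file (a random 50 × 50 array stored under the name `f`) is purely hypothetical.

```julia
using MAT

# Write a small temporary .mat file holding one hypothetical variable
tmpfile = tempname() * ".mat"
matwrite(tmpfile, Dict("f" => rand(50, 50)))

# Option 1: read every variable at once into a Dict
vars = matread(tmpfile)

# Option 2: open the file in a do block and read a single variable;
# the file is closed when the block ends
f = matopen(tmpfile) do file
    read(file, "f")
end

@show size(f)     # dimensions of the array
@show sizeof(f)   # memory size in bytes (8 bytes per Float64 value)
```

Keep in mind that `sizeof` returns a size in bytes, whereas `size` returns the dimensions of the array.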

## GRIB files
The package is not (yet) registered (as of January 2020), so you need to get it from GitHub.

In [14]:
using Pkg
Pkg.add(PackageSpec(url="https://github.com/weech/GRIB.jl"))
using GRIB

[32m[1m  Updating[22m[39m git-repo `https://github.com/weech/GRIB.jl`
[?25l[2K[?25h[32m[1m  Updating[22m[39m git-repo `https://github.com/weech/GRIB.jl`
[?25l[2K[?25h[32m[1m Resolving[22m[39m package versions...
[32m[1m  Updating[22m[39m `~/Projects/Diva-Workshops/Project.toml`
[90m [no changes][39m
[32m[1m  Updating[22m[39m `~/Projects/Diva-Workshops/Manifest.toml`
[90m [no changes][39m


In [12]:
gribfile = "../data/test.grib"
if !isfile(gribfile)
    download("https://github.com/weech/GRIB.jl/raw/master/test/samples/regular_latlon_surface.grib2", 
    gribfile)
else
    @info("File already downloaded")
end

┌ Info: File already downloaded
└ @ Main In[12]:6


The module `GRIB.jl` works only on Linux and macOS, hence the check on the operating system in the next cell.

In [24]:
if !Sys.iswindows()
    GribFile(gribfile) do f
       # Get the first message from f
       msg = Message(f)
       lons, lats, values = data(msg)
       @info(length(lons))
    end
end

┌ Info: 496
└ @ Main In[24]:5


Note that GRIB support in Julia is still being developed:  
https://discourse.julialang.org/t/new-package-to-map-grib-files-to-the-unidatas-common-data-model-v4-following-the-cf-conventions/3237