# How to read data files?
This notebook describes or points to modules for reading data in different file formats and from different sources.       
The $^\star$ symbol denotes functions or tools available in `DIVAnd.jl`.

| Format        | Tool           | 
| ------------- |:-------------:| 
| Delimiter-separated values | [readdlm](https://docs.julialang.org/en/stable/stdlib/io-network/#Base.DataFmt.readdlm-Tuple{Any,Char,Type,Char}), [CSV](https://juliadata.github.io/CSV.jl/stable/) |
| NetCDF        | [NCDatasets.jl](https://github.com/Alexander-Barth/NCDatasets.jl) | 
| ODV  $^\star$ | [ODVspreadsheet.jl](https://github.com/gher-uliege/DIVAnd.jl/blob/master/src/ODVspreadsheet.jl) |
| ODV netCDF $^\star$   | [NCODV.jl](https://github.com/gher-uliege/DIVAnd.jl/blob/master/src/NCODV.jl) | 
| GEBCO bathymetry $^\star$ | [load_bath.jl](https://github.com/gher-uliege/DIVAnd.jl/blob/master/src/load_bath.jl)|
| Big files $^\star$    | [loadbigfile](https://github.com/gher-uliege/DIVAnd.jl/blob/master/src/load_obs.jl) |
| NetCDF WOD $^\star$   | [loadobs](https://github.com/gher-uliege/DIVAnd.jl/blob/master/src/load_obs.jl) |
| Mat files     | [MAT.jl](https://github.com/JuliaIO/MAT.jl)| 
| GRIB          | [GRIB.jl](https://github.com/weech/GRIB.jl)|
| GeoJSON       | [GeoJSON.jl](https://github.com/JuliaGeo/GeoJSON.jl)|
| GeoTIFF       | [TIFFDatasets](https://alexander-barth.github.io/TIFFDatasets.jl/stable/)|

In [None]:
import Pkg
Pkg.activate("../..")
Pkg.instantiate()
using DIVAnd
using DelimitedFiles
using CSV
include("../config.jl")

## Delimiter-separated values files 
These include comma-separated values (CSV) and tab-separated values (TSV), among others.    
We show an example with the NAO indices that we obtain from the [Climate Data Guide](https://climatedataguide.ucar.edu/) website.

In [None]:
download_check(naodatafile, naodatafileURL)

If we call `readdlm` without any option, the number of columns is deduced from the header, which leads to empty data columns:

In [None]:
dataNAO = DelimitedFiles.readdlm(naodatafile)

So we skip the first line, which is the header, using the option `skipstart`:

In [None]:
dataNAO = DelimitedFiles.readdlm(naodatafile, skipstart = 1);
dataNAO
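
The effect of `skipstart` can be checked on a small in-memory sample. The sketch below uses made-up, whitespace-separated values (not the real NAO file) and also shows the `header = true` option of `readdlm`, which returns the header cells as a second output instead of discarding them.

```julia
using DelimitedFiles

# Made-up sample with one header line (an assumption, not the real NAO data)
sample = "year nao\n1990 1.5\n1991 -0.3\n"

# Skip the header line; the remaining rows are parsed as numbers
data = readdlm(IOBuffer(sample), skipstart = 1)

# Alternatively, keep the header as a separate output
data2, header = readdlm(IOBuffer(sample), header = true)

@show size(data) header
```

With `skipstart` the header is simply dropped, while `header = true` is convenient when the column names are needed later.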

**Note:** if you have to process files in which the decimal separator is a comma instead of a dot, specific options are available in the package [`CSV`](https://juliadata.github.io/CSV.jl/stable/).
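
As a minimal sketch of the decimal-comma case, the example below reads a made-up, semicolon-delimited sample with `CSV.File` and its `delim` and `decimal` keywords. The sample text and the column names `year` and `nao` are assumptions chosen for illustration.

```julia
using CSV

# Made-up sample: semicolon as delimiter, comma as decimal separator
sample = "year;nao\n1990;1,5\n1991;-0,3\n"

# `decimal = ','` tells the parser how floating-point numbers are written
f = CSV.File(IOBuffer(sample); delim = ';', decimal = ',')

# Columns are accessible by name
nao = collect(f.nao)
@show nao
```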

## NetCDF

For this workshop we will mainly use [`NCDatasets.jl`](https://github.com/JuliaGeo/NCDatasets.jl), described in this [notebook](../1-Intro/1-03-netCDF.ipynb).

### Bathymetry
The General Bathymetric Chart of the Oceans ([GEBCO](https://www.gebco.net/)), distributed as a netCDF file, can be read directly with `DIVAnd` using the function `load_bath`.  

First make sure we have a bathymetry file.

In [None]:
bathname = gebco16file
download_check(gebco16file, gebco16fileURL)

Then we have to define the grid on which we need the bathymetry and apply the function.

In [None]:
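# Target grid: longitude and latitude ranges in degrees, with a 0.5° step
lonr = -10:0.5:36.0
latr = 37:0.5:48
# The second argument (`isglobal`) is true because the file covers the whole globe
bx, by, b = load_bath(bathname, true, lonr, latr);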

`bx` and `by` are the same as `lonr` and `latr`.    
`b` contains the bathymetry values.
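
The grid definition alone already tells us how many points to expect. The sketch below only inspects the two ranges and needs no bathymetry file; it assumes that `b` is stored with longitude as the first dimension and latitude as the second.

```julia
# Same ranges as in the cell above
lonr = -10:0.5:36.0
latr = 37:0.5:48

# Number of grid points along each direction
expected_size = (length(lonr), length(latr))
@show expected_size

# Under the assumption stated above, this can be compared with size(b)
```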

A complete example is provided in the notebook [`2-01-topography`](2-01-topography.ipynb).

## ODV spreadsheet
ODV spreadsheets constitute one of the standard formats defined in [SeaDataNet](https://www.seadatanet.org/).        
In `DIVAnd`, we provide:
* [ODVspreadsheet.jl](https://github.com/gher-uliege/DIVAnd.jl/blob/master/src/ODVspreadsheet.jl) to read this format, and
* [NCODV.jl](https://github.com/gher-uliege/DIVAnd.jl/blob/master/src/NCODV.jl) to read the ODV netCDF files.

An example is provided in the notebook [`2-04-ODV-data-import.ipynb`](./2-04-ODV-data-import.ipynb).

## Big files
The so-called big files are intermediate files used by DIVA and DIVAnd. The format is rather simple: a tab-separated file containing the following variables:
1. longitude,
2. latitude,
3. field value (e.g., temperature, salinity, chlorophyll concentration, ...), 
4. depth,
5. time,
6. measurement identifier.

The function `loadbigfile`, defined in [`load_obs.jl`](https://github.com/gher-uliege/DIVAnd.jl/blob/master/src/load_obs.jl), reads this file format.    
In the next cell we download a *big file* containing salinity measurements (also used in other examples) and read it using `loadbigfile`.

In [None]:
fname = salinitybigfile
download_check(salinitybigfile, salinitybigfileURL)

obsval, obslon, obslat, obsdepth, obstime, obsid = loadbigfile(fname);
@show(length(obsval));
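
To make the six-column layout concrete, here is a minimal sketch that parses two made-up records using only the standard library. It is not DIVAnd's own reader: the helper `parse_bigfile_text` is hypothetical, and the ISO 8601 time stamps are an assumption about how times are written in the file.

```julia
using Dates

# Two made-up records; columns: lon, lat, value, depth, time, id (tab-separated)
sample = "12.5\t43.2\t38.1\t10.0\t1990-01-15T12:00:00\tstation-A\n" *
         "13.0\t42.8\t38.4\t50.0\t1990-02-03T06:30:00\tstation-B\n"

# Hypothetical helper: parse the text line by line into one vector per column
function parse_bigfile_text(text)
    lon, lat, value, depth = Float64[], Float64[], Float64[], Float64[]
    time, id = DateTime[], String[]
    for line in eachline(IOBuffer(text))
        isempty(strip(line)) && continue   # ignore blank lines
        rec = split(line, '\t')
        push!(lon, parse(Float64, rec[1]))
        push!(lat, parse(Float64, rec[2]))
        push!(value, parse(Float64, rec[3]))
        push!(depth, parse(Float64, rec[4]))
        push!(time, DateTime(rec[5]))      # assumes ISO 8601 time stamps
        push!(id, String(rec[6]))
    end
    # Same output order as in the `loadbigfile` call above
    return value, lon, lat, depth, time, id
end

val, lon, lat, depth, tstamp, ident = parse_bigfile_text(sample)
@show length(val) tstamp[1] ident
```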

## Mat files
We use the same `.mat` file as in [04-OI-variational-analysis-introduction](../1-Intro/1-04-OI-variational-analysis-introduction.ipynb).

In [None]:
using MAT

In [None]:
download_check(danfile, danfileURL)
mf = matopen(danfile);

We can get a list of the variables stored in the file:

In [None]:
# `keys` lists the variable names (it replaces `names` in recent versions of MAT.jl)
varnames = keys(mf)

To load one of them, use `read`:

In [None]:
var1 = read(mf, "f");
# `size` gives the dimensions of the variable (`sizeof` would give its size in bytes)
@show size(var1);

When we're done, don't forget to close the file (especially when processing a large number of files).

In [None]:
close(mf)
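
For small files, the one-call helpers `matwrite` and `matread` from `MAT.jl` avoid having to open and close the file explicitly. The sketch below is a round trip through a temporary file; the array and the file name `example.mat` are made up for illustration, and only the variable name `"f"` mirrors the example above.

```julia
using MAT

# Made-up 3×4 array written under the variable name "f"
A = reshape(collect(1.0:12.0), 3, 4)
path = joinpath(mktempdir(), "example.mat")
matwrite(path, Dict("f" => A))

# `matread` loads all variables into a dictionary and closes the file itself
vars = matread(path)
@show size(vars["f"])
```

Reading everything at once is convenient but loads all variables into memory, so `matopen` with `read` remains the better choice for large files.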

## GRIB files

In [None]:
using GRIB

In [None]:
download_check(gribfile, gribfileURL)

The module `GRIB.jl` only works on Linux and macOS, hence the platform check in the next cell.

In [None]:
if !Sys.iswindows()
    GribFile(gribfile) do f
        # Get the first message from f
        msg = Message(f)
        lons, lats, values = data(msg)
        @info(length(lons))
    end
end

## GeoJSON
The sample file has been generated and downloaded from https://geojson.io.

In [None]:
using GeoJSON

In [None]:
download_check(geojsonfile, geojsonfileURL)

jsonbytes = read(geojsonfile);
fc = GeoJSON.read(jsonbytes)
@show typeof(fc)

The GeoJSON file contains two features, each consisting of a 2D polygon, from which we can extract the coordinates.

In [None]:
polygon1 = fc[1]
polygon1.geometry[1]

## GeoTIFF
GeoTIFF allows georeferencing information to be embedded within an image file.        
The test image was extracted from https://worldview.earthdata.nasa.gov/.

In [None]:
download_check(geotifffile, geotifffileURL)

We present two ways to read those files:
1. With [`GeoArrays`](https://www.evetion.nl/GeoArrays.jl/stable/)
2. With [`TIFFDatasets`](https://alexander-barth.github.io/TIFFDatasets.jl/stable/).

The advantage of the latter is that it works similarly to [`NCDatasets`](https://github.com/JuliaGeo/NCDatasets.jl), used to read the netCDF files.

In [None]:
using GeoArrays
geoarray = GeoArrays.read(geotifffile)
coordinates = collect(GeoArrays.coords(geoarray))
lats = [cc[2] for cc in coordinates[1, :]]
lons = [cc[1] for cc in coordinates[:, 1]]
img = reverse(geoarray.A[:, :, 1]', dims = 1);

In [None]:
using TIFFDatasets
ds = TIFFDataset(geotifffile)
lons2 = ds["lon"][:, 1]
lats2 = ds["lat"][1, :]
img2 = ds["band1"][:, :]
close(ds)