<img src="https://mintproject.github.io/MINT_USERGUIDE/Figures/mint-logo-vertical.png" width="100">

# netCDF tutorial

## Table of contents
[Purpose](#purpose)  
[Example data](#example)  
[Package requirements](#package)  
[Getting familiar with NetCDF](#netcdf)  
[Importing Variables](#var)

## <a name='purpose'>Purpose</a>

This interactive Jupyter Notebook guides the reader through the steps of opening, reading, and importing variables from a file in the NetCDF format. 

To learn more about NetCDF, visit [https://www.unidata.ucar.edu/software/netcdf/](https://www.unidata.ucar.edu/software/netcdf/).

## <a name='example'>Example data</a>

This Notebook uses a monthly file from the FLDAS FLDAS_NOAH01_C_EA_M.001 resource, which can be accessed from the MINT Data Catalog.

## <a name="package"> Package requirements </a>
This tutorial uses the [xarray](http://xarray.pydata.org/en/stable/) Python package.  
Installation instructions are available [here](http://xarray.pydata.org/en/stable/installing.html).

Import the packages:

In [None]:
import xarray as xr
import numpy as np
import pandas as pd

## <a name='netcdf'> Getting familiar with NetCDF </a>

To open a dataset:

In [None]:
# Set the file name
file = 'FLDAS_NOAH01_C_EA_M.A201501.001.nc'
# Open with xarray
nc_fid = xr.open_dataset(file)

The content of the NetCDF file is stored in a dictionary-like structure that contains:
- the dimensions of the variables within the dataset: latitude, longitude, time, and bounds
- the coordinates
- the variables in the file and their associated dimensions. In this case, the data is organized in arrays of dimension (time, Y, X)
- the file attributes: information about the file, including conventions, history, title...

In [None]:
nc_fid
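To explore these four parts without the FLDAS file at hand, the sketch below builds a small synthetic dataset that mimics the layout described above. The coordinate values, data, and `title` attribute are invented for illustration; only the variable name and the (time, Y, X) layout follow this tutorial.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the FLDAS file (all values are invented)
lon = np.array([22.0, 23.0, 24.0, 25.0])
lat = np.array([5.0, 6.0, 7.0])
time = pd.date_range("2015-01-01", periods=1)

demo = xr.Dataset(
    data_vars={"Rainf_f_tavg": (("time", "Y", "X"), np.zeros((1, 3, 4)))},
    coords={"time": time, "Y": lat, "X": lon},
    attrs={"title": "synthetic example"},
)

print(dict(demo.sizes))      # dimension names and lengths
print(list(demo.coords))     # coordinate names
print(list(demo.data_vars))  # variable names
print(demo.attrs)            # file-level attributes
```

The same four properties can be read from `nc_fid` once the real file is open.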

To get the values of the coordinates, use:

In [None]:
lon = nc_fid.coords['X'].values
lat = nc_fid.coords['Y'].values

## <a name='var'> Importing Variables </a>

Variables sit one level below the top-level dataset. To access them, use:  
`netcdfname.VarName` or `netcdfname['VarName']`. The second form is needed when the variable name contains a '.'
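The difference between the two access forms can be illustrated on a small made-up dataset. The variable names `rain` and `soil.moisture` below are hypothetical and are not taken from the FLDAS file.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset with one ordinary name and one name containing a '.'
demo = xr.Dataset(
    {
        "rain": (("Y", "X"), np.ones((2, 3))),
        "soil.moisture": (("Y", "X"), np.zeros((2, 3))),
    }
)

# Both forms return the same DataArray for an ordinary name
same = demo.rain.identical(demo["rain"])
print(same)

# Only the key form works here: demo.soil.moisture would raise AttributeError
sm = demo["soil.moisture"]
print(sm.shape)
```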

Let's look at the precipitation variable:

In [None]:
nc_fid.Rainf_f_tavg

Each variable contains:
- coordinates (same as coordinates from the file)
- values, stored in a numpy array
- attributes, including a standard name, long name, units...

To access the values, use the following command:

In [None]:
P = nc_fid.Rainf_f_tavg.values

P

You can slice the array like any NumPy array. For instance, to get all the data between 6 and 8°N and 23 and 27°E, use:

In [None]:
min_lat = 6
max_lat = 8
min_lon = 23
max_lon = 27

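# Get the bounding box indices (np.arange excludes its stop value, hence the + 1)
idx_x = np.arange(np.where(lon > min_lon)[0][0], np.where(lon < max_lon)[0][-1] + 1)
idx_y = np.arange(np.where(lat > min_lat)[0][0], np.where(lat < max_lat)[0][-1] + 1)

# Apply the latitude selection first, then the longitude selection to its result
P_slice = P[:, idx_y, :]
P_slice = P_slice[:, :, idx_x]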

P_slice
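A possible alternative is to select the box with boolean masks, which avoids computing index ranges by hand. The sketch below uses a small made-up 1-degree grid rather than the FLDAS file, so the arrays `lon`, `lat`, and `P` here are assumptions for illustration. `np.ix_` combines the masks so that all axes are indexed in one step.

```python
import numpy as np

# Made-up 1-degree grid and data with shape (time, Y, X)
lon = np.arange(20.0, 31.0)  # 20, 21, ..., 30
lat = np.arange(4.0, 11.0)   # 4, 5, ..., 10
P = np.arange(lon.size * lat.size, dtype=float).reshape(1, lat.size, lon.size)

min_lat, max_lat = 6, 8
min_lon, max_lon = 23, 27

# True where a grid point lies strictly inside the bounding box
mask_y = (lat > min_lat) & (lat < max_lat)
mask_x = (lon > min_lon) & (lon < max_lon)

# Keep every time step; np.ix_ turns the three masks into an open mesh
mask_t = np.ones(P.shape[0], dtype=bool)
P_box = P[np.ix_(mask_t, mask_y, mask_x)]
print(P_box.shape)
```

Unlike the index-range approach, the masks do not depend on the coordinates being sorted.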

To take the mean of the sliced array, ignoring any NaN values:

In [None]:
P_slice_mean = np.nanmean(P_slice)
P_slice_mean
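`np.nanmean` is used rather than `np.mean` because a single NaN makes the ordinary mean NaN. By default, xarray decodes fill values to NaN when it opens a file, so cells without data may appear as NaN. Whether the FLDAS box selected above contains such cells is not checked here. The small array below is made up to show the difference.

```python
import numpy as np

# Made-up values; NaN marks a cell with no data
values = np.array([[1.0, 2.0], [np.nan, 3.0]])

plain_mean = np.mean(values)   # NaN propagates into the result
nan_mean = np.nanmean(values)  # NaN cells are skipped
print(plain_mean, nan_mean)
```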