# Exercise 0.3 - NetCDF files (using `netCDF4`)
prepared by M.Hauser

We need to read (and write) netCDF files. Several modules can do this; one of them is the `netCDF4` library from [Unidata](http://unidata.github.io/netcdf4-python/).

In [1]:
import netCDF4 as nc
import numpy as np

We will open a netCDF file with the growing season length (GSL) from 1956 to 2005. GSL is a climate index indicating conditions favourable for plant growth. It is defined as the number of consecutive days per year with a temperature above 5 °C.

The data is described in Donat et al. ([2013](http://onlinelibrary.wiley.com/doi/10.1002/jgrd.50150/abstract)) and was obtained from http://www.climdex.org/.
We will use this dataset in a later exercise.

The data has already undergone some postprocessing; see [prepare_HadEX2_GSL](./data/prepare_HadEX2_GSL.ipynb).

Let's look at the structure of the file first.

> `ncdump` needs to be installed for this to work

In [2]:
fN = '../data/HadEX2_GSL.nc'

In [3]:
# you can also use ncdump to show the structure of the file
! ncdump -h {fN}

netcdf HadEX2_GSL {
dimensions:
	lon = 96 ;
	lat = 73 ;
	time = 50 ;
variables:
	double lon(lon) ;
		lon:_FillValue = NaN ;
	double lat(lat) ;
		lat:_FillValue = NaN ;
	int time(time) ;
		time:units = "days since 1956-01-01 00:00:00" ;
		time:calendar = "proleptic_gregorian" ;
	float GSL(time, lat, lon) ;
		GSL:_FillValue = -99.9f ;
	double trend(lat, lon) ;
		trend:_FillValue = NaN ;
	double p_val(lat, lon) ;
		p_val:_FillValue = NaN ;

// global attributes:
		:data = "Growing season length" ;
		:source = "HadEX2 (http://www.climdex.org/)" ;
		:reference = "Donat et al., 2013" ;
}


## Opening the dataset

In [4]:
ncf = nc.Dataset(fN)

print(ncf)

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4_CLASSIC data model, file format HDF5):
    data: Growing season length
    source: HadEX2 (http://www.climdex.org/)
    reference: Donat et al., 2013
    dimensions(sizes): lon(96), lat(73), time(50)
    variables(dimensions): float64 lon(lon), float64 lat(lat), int32 time(time), float32 GSL(time,lat,lon), float64 trend(lat,lon), float64 p_val(lat,lon)
    groups: 



## Print all variables in the dataset

In [5]:
# get all variables
print(ncf.variables.keys())

odict_keys(['lon', 'lat', 'time', 'GSL', 'trend', 'p_val'])


## Get a variable

You can get a variable from a netCDF file like so:

In [6]:
# get a variable from the file
ncf.variables['lon']

<class 'netCDF4._netCDF4.Variable'>
float64 lon(lon)
    _FillValue: nan
unlimited dimensions: 
current shape = (96,)
filling on

However, this does not load the data yet. What you get is a special kind of data structure called `netCDF4.Variable`, which also carries metadata. For example, the `time` variable has a `units` attribute:

In [None]:
ncf.variables['time'].units
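
The `units` string is what turns the integer time values into dates. Below is a minimal sketch of that conversion using only numpy. The units string is copied from the `ncdump` output above, while the three `days` values are made up for illustration and are not read from the file. In practice `netCDF4.num2date` does this job and also handles non-standard calendars.

```python
import numpy as np

# Units as reported by ncdump for the time variable
units = "days since 1956-01-01 00:00:00"

# Hypothetical time values (NOT read from the file), in days since the reference date
days = np.array([0, 366, 731])

# Parse the reference date out of the units string
reference = np.datetime64(units.split("since")[1].strip().replace(" ", "T"))

# Add the offsets and keep day resolution
dates = (reference + days.astype("timedelta64[D]")).astype("datetime64[D]")
print(dates)
```

This simple approach is only valid because the file uses the `proleptic_gregorian` calendar, which matches numpy's `datetime64`.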

To load the actual data (as a numpy array), you have to index it:

In [7]:
# get data of lon from the file
lon = ncf.variables['lon'][:]
# this is a numpy array
lon

array([  0.  ,   3.75,   7.5 ,  11.25,  15.  ,  18.75,  22.5 ,  26.25,
        30.  ,  33.75,  37.5 ,  41.25,  45.  ,  48.75,  52.5 ,  56.25,
        60.  ,  63.75,  67.5 ,  71.25,  75.  ,  78.75,  82.5 ,  86.25,
        90.  ,  93.75,  97.5 , 101.25, 105.  , 108.75, 112.5 , 116.25,
       120.  , 123.75, 127.5 , 131.25, 135.  , 138.75, 142.5 , 146.25,
       150.  , 153.75, 157.5 , 161.25, 165.  , 168.75, 172.5 , 176.25,
       180.  , 183.75, 187.5 , 191.25, 195.  , 198.75, 202.5 , 206.25,
       210.  , 213.75, 217.5 , 221.25, 225.  , 228.75, 232.5 , 236.25,
       240.  , 243.75, 247.5 , 251.25, 255.  , 258.75, 262.5 , 266.25,
       270.  , 273.75, 277.5 , 281.25, 285.  , 288.75, 292.5 , 296.25,
       300.  , 303.75, 307.5 , 311.25, 315.  , 318.75, 322.5 , 326.25,
       330.  , 333.75, 337.5 , 341.25, 345.  , 348.75, 352.5 , 356.25])

Note: if you only need a subset of the data you can index it here: `ncf.variables['lon'][:10]`. This only loads the first ten elements from the file.
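
The same idea extends to the three-dimensional `GSL(time, lat, lon)` variable: work out the index range from the coordinates first, then slice. The sketch below uses a plain numpy array as a stand-in for `ncf.variables['GSL']` so that it runs without the file. The name `gsl_like`, the evenly spaced latitude grid and the 30°N to 60°N band are assumptions made for illustration.

```python
import numpy as np

# Coordinates rebuilt from the dimension sizes reported by ncdump
# (assumption: latitudes are evenly spaced from -90 to 90)
lat = np.linspace(-90, 90, 73)
lon = np.arange(96) * 3.75

# Stand-in for ncf.variables['GSL'], same shape (time, lat, lon)
gsl_like = np.zeros((50, 73, 96), dtype="float32")

# Indices of the latitudes between 30°N and 60°N (inclusive)
lat_idx = np.flatnonzero((lat >= 30) & (lat <= 60))
lat_sel = slice(lat_idx[0], lat_idx[-1] + 1)

# With a netCDF4 variable, only this band would be read from disk
subset = gsl_like[:, lat_sel, :]
print(lat[lat_sel][[0, -1]], subset.shape)
```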

### Exercise

* get the values of the latitude

In [8]:
# lat = ...

# lat

### Solution

In [9]:
lat = ncf.variables['lat'][:]
lat

array([-90. , -87.5, -85. , -82.5, -80. , -77.5, -75. , -72.5, -70. ,
       -67.5, -65. , -62.5, -60. , -57.5, -55. , -52.5, -50. , -47.5,
       -45. , -42.5, -40. , -37.5, -35. , -32.5, -30. , -27.5, -25. ,
       -22.5, -20. , -17.5, -15. , -12.5, -10. ,  -7.5,  -5. ,  -2.5,
         0. ,   2.5,   5. ,   7.5,  10. ,  12.5,  15. ,  17.5,  20. ,
        22.5,  25. ,  27.5,  30. ,  32.5,  35. ,  37.5,  40. ,  42.5,
        45. ,  47.5,  50. ,  52.5,  55. ,  57.5,  60. ,  62.5,  65. ,
        67.5,  70. ,  72.5,  75. ,  77.5,  80. ,  82.5,  85. ,  87.5,
        90. ])

In [10]:
# load the trend
trend_masked = ncf.variables['trend'][:]

trend_masked

masked_array(
  data=[[--, --, --, ..., --, --, --],
        [--, --, --, ..., --, --, --],
        [--, --, --, ..., --, --, --],
        ...,
        [--, --, --, ..., --, --, --],
        [--, --, --, ..., --, --, --],
        [--, --, --, ..., --, --, --]],
  mask=[[ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        ...,
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True]],
  fill_value=nan)

`trend_masked` is also a numpy array, more precisely a masked array. A masked array holds one array with the actual data (e.g. \[0, 1, 2\]) and one array that indicates whether each value is masked (= invalid, e.g. \[True, False, False\]). This corresponds to a non-masked array that looks like \[NaN, 1, 2\].

In [11]:
# example

# np.nan is used here because the np.NaN alias was removed in NumPy 2.0
ma = np.ma.array([0., 1, 2], mask=[True, False, False], fill_value=np.nan)
ma

masked_array(data=[--, 1.0, 2.0],
             mask=[ True, False, False],
       fill_value=nan)
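
One practical consequence of the mask is that reductions skip the masked entries. A small sketch with made-up values (the names `values` and `from_nan` are only for illustration):

```python
import numpy as np

values = np.ma.array([0., 1., 2.], mask=[True, False, False])

# The masked 0 is ignored: the mean is taken over 1 and 2 only
print(values.mean())
# Number of non-masked entries
print(values.count())

# masked_invalid goes the other way: it masks NaN (and inf) entries
from_nan = np.ma.masked_invalid([np.nan, 1., 2.])
print(from_nan.mask)
```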

In [12]:
# masked arrays can be converted to NaN arrays by filling the masked entries:
trend = np.ma.filled(trend_masked, np.nan)
trend

array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]])
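
A word of caution: `np.asarray` applied to a masked array returns the underlying data and ignores the mask. For `trend` this makes no difference because the values under the mask are already NaN, but `GSL` uses `-99.9` as its fill value. The sketch below uses a made-up array (not data from the file) to show the difference to `filled`:

```python
import numpy as np

# Hypothetical example: the value under the mask is -99.9, not NaN
masked = np.ma.array([-99.9, 1., 2.], mask=[True, False, False])

# The mask is dropped, the -99.9 is still in the data
plain = np.asarray(masked)
print(plain)

# The masked entry is replaced by NaN
with_nan = masked.filled(np.nan)
print(with_nan)
```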