# Opening and reading NetCDF and GRIB files

## Table of Contents
1. [What is the purpose of this Notebook](#purpose)
2. [Example data](#data)
3. [Opening and reading NetCDF files](#netcdf)  
    3.1. [Package requirements](#packagenetCDF)  
    3.2. [Opening and closing a NetCDF file](#opennetCDF)  
    3.3. [Getting the list of attributes, dimensions, and variables](#dimensions)  
    3.4. [Importing variables](#varnetCDF)  
4. [Opening and reading GRIB files](#GRIB)  
    4.1. [Package requirements](#packageGRIB)  
    4.2. [Opening a GRIB file](#openGRIB)  
    4.3. [Reading and importing variables](#varGRIB)    

## <a name="purpose"> What is the purpose of this Notebook?</a>

This interactive Jupyter Notebook guides the reader through the steps of opening, reading, and importing variables from a file in the NetCDF or GRIB format.  

To know more about NetCDF, visit [https://www.unidata.ucar.edu/software/netcdf/](https://www.unidata.ucar.edu/software/netcdf/).  
To know more about GRIB, visit [https://www.wmo.int/pages/prog/www/WDM/Guides/Guide-binary-2.html](https://www.wmo.int/pages/prog/www/WDM/Guides/Guide-binary-2.html).

## <a name="data"> Example data</a>

This Notebook uses the same dataset, downloaded in both NetCDF and GRIB formats. Note that the data are originally in GRIB format and are converted to NetCDF upon download.  

The data consist of monthly Total Precipitation (computed from daily means) for the year 2010 from the ECMWF ERA-20CM dataset. The data can be downloaded [here](http://apps.ecmwf.int/datasets/data/era20cm-edmo/levtype=sfc/?month_years=2010&number=0&param=228.128) (using ensemble member 0). Both files are included in the folder containing this Notebook.

<div class="alert alert-warning" role="alert" style="margin: 10px">
<p><strong>NOTE</strong></p>
<p>You need to register to access these data from the ECMWF portal.</p>
</div>  

 

## <a name="netcdf"> Opening and reading NetCDF files </a>
### <a name="packagenetCDF"> Package requirements </a>

This example uses the [netCDF4 Python package](https://github.com/Unidata/netcdf4-python), which is available from PyPI: `pip install netCDF4`

### <a name="opennetCDF"> Opening and closing a NetCDF file </a>

To create a netCDF file from Python, you simply call the `Dataset` constructor. This is also the method used to open an existing netCDF file. The default mode is **mode='r'** (read-only); if the file is open for write access (**mode='w', 'r+' or 'a'**), you may write any type of data, including new dimensions, groups, variables and attributes.

netCDF files come in five flavors:

- **NETCDF3_CLASSIC** was the original netCDF binary format, and was limited to file sizes less than 2 GB.
- **NETCDF3_64BIT_OFFSET** was introduced in version 3.6.0 of the library, and extended the original binary format to allow for file sizes greater than 2 GB. **NETCDF3_64BIT** is an alias for **NETCDF3_64BIT_OFFSET**.
- **NETCDF3_64BIT_DATA** is a newer format that requires version 4.4.0 of the C library; it extends the **NETCDF3_64BIT_OFFSET** binary format to allow for unsigned/64-bit integer data types and 64-bit dimension sizes.
- **NETCDF4_CLASSIC** files use the version 4 disk format (HDF5), but omit features not found in the version 3 API. They can be read by netCDF 3 clients only if they have been relinked against the netCDF 4 library. They can also be read by HDF5 clients.
- **NETCDF4** files use the version 4 disk format (HDF5) and use the new features of the version 4 API.

The **netCDF4** module can read and write files in any of these formats. When creating a new file, the format may be specified using the `format` keyword in the `Dataset` constructor; the default format is **NETCDF4**. To see how a given file is formatted, examine its `data_model` attribute. Closing the netCDF file is accomplished via the `close` method of the `Dataset` instance ([source](http://unidata.github.io/netcdf4-python/#section1)).

In [5]:
# Open a dataset (read-only by default); test.nc is in the same folder as this Notebook
from netCDF4 import Dataset
dataset = Dataset("test.nc")
print(dataset.data_model)

NETCDF3_64BIT_OFFSET


Typing `dataset` will give you an overview of what's in the file.

In [6]:
dataset

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_64BIT_OFFSET data model, file format NETCDF3):
    Conventions: CF-1.6
    history: 2018-02-26 22:26:35 GMT by grib_to_netcdf-2.6.0: grib_to_netcdf /data/data01/scratch/_mars-atls18-a562cefde8a29a7288fa0b8b7f9413f7-u5uxdA.grib -o /data/data02/scratch/_grib2netcdf-atls00-a82bacafb5c306db76464bc7e824bb75-DDcsCC.nc -utime
    dimensions(sizes): longitude(320), latitude(161), time(12)
    variables(dimensions): float32 longitude(longitude), float32 latitude(latitude), int32 time(time), int16 tp(time,latitude,longitude)
    groups: 

In [None]:
# Close the dataset once you are finished with it (the ncdump cells below still need it open)
dataset.close()

### <a name='dimensions'> Getting the list of attributes, dimensions, and variables </a>

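The following `ncdump` function ([source](http://schubert.atmos.colostate.edu/~cslocum/netcdf_example.html)) has been updated for Python 3.5. It prints the global attributes, dimensions, and variables of an open `Dataset` and returns them as lists.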

In [7]:
import datetime as dt  # Python standard library datetime module
import numpy as np

def ncdump(nc_fid, verb=True):
    '''
    ncdump outputs dimensions, variables and their attribute information.
    The information is similar to that of NCAR's ncdump utility.
    ncdump requires a valid instance of Dataset.

    Parameters
    ----------
    nc_fid : netCDF4.Dataset
        An open netCDF4 Dataset object
    verb : bool
        Whether or not nc_attrs, nc_dims, and nc_vars are printed

    Returns
    -------
    nc_attrs : list
        A Python list of the NetCDF file global attributes
    nc_dims : list
        A Python list of the NetCDF file dimensions
    nc_vars : list
        A Python list of the NetCDF file variables
    '''
    def print_ncattr(key):
        """
        Prints the NetCDF file attributes for a given key

        Parameters
        ----------
        key : str
            a valid netCDF4.Dataset.variables key
        """
        try:
            print("\t\ttype:", repr(nc_fid.variables[key].dtype))
            for ncattr in nc_fid.variables[key].ncattrs():
                print('\t\t%s:' % ncattr,
                      repr(nc_fid.variables[key].getncattr(ncattr)))
        except KeyError:
            print("\t\tWARNING: %s does not contain variable attributes" % key)

    # NetCDF global attributes
    nc_attrs = nc_fid.ncattrs()
    if verb:
        print("NetCDF Global Attributes:")
        for nc_attr in nc_attrs:
            print('\t%s:' % nc_attr, repr(nc_fid.getncattr(nc_attr)))
    nc_dims = [dim for dim in nc_fid.dimensions]  # list of nc dimensions
    # Dimension shape information.
    if verb:
        print("NetCDF dimension information:")
        for dim in nc_dims:
            print("\tName:", dim) 
            print("\t\tsize:", len(nc_fid.dimensions[dim]))
            print_ncattr(dim)
    # Variable information.
    nc_vars = [var for var in nc_fid.variables]  # list of nc variables
    if verb:
        print("NetCDF variable information:")
        for var in nc_vars:
            if var not in nc_dims:
                print('\tName:', var)
                print("\t\tdimensions:", nc_fid.variables[var].dimensions)
                print("\t\tsize:", nc_fid.variables[var].size)
                print_ncattr(var)
    return nc_attrs, nc_dims, nc_vars

Now get the attributes, dimensions and variables contained in the netCDF file.

In [8]:
dataset_attrs, dataset_dims, dataset_vars = ncdump(dataset, verb=True)

NetCDF Global Attributes:
	Conventions: 'CF-1.6'
	history: '2018-02-26 22:26:35 GMT by grib_to_netcdf-2.6.0: grib_to_netcdf /data/data01/scratch/_mars-atls18-a562cefde8a29a7288fa0b8b7f9413f7-u5uxdA.grib -o /data/data02/scratch/_grib2netcdf-atls00-a82bacafb5c306db76464bc7e824bb75-DDcsCC.nc -utime'
NetCDF dimension information:
	Name: longitude
		size: 320
		type: dtype('float32')
		units: 'degrees_east'
		long_name: 'longitude'
	Name: latitude
		size: 161
		type: dtype('float32')
		units: 'degrees_north'
		long_name: 'latitude'
	Name: time
		size: 12
		type: dtype('int32')
		units: 'hours since 1900-01-01 00:00:0.0'
		long_name: 'time'
		calendar: 'gregorian'
NetCDF variable information:
	Name: tp
		dimensions: ('time', 'latitude', 'longitude')
		size: 618240
		type: dtype('int16')
		scale_factor: 6.624043998318357e-07
		add_offset: 0.021704342564889928
		_FillValue: -32767
		missing_value: -32767
		units: 'm'
		long_name: 'Total precipitation'


The global attributes tell us that the file follows version 1.6 of the CF (Climate and Forecast) metadata conventions (`Conventions: 'CF-1.6'`), whose standard names have been mapped to the GSN ontology. 

To learn more about the CF convention: [http://cfconventions.org](http://cfconventions.org)

The CF Standard Name Table can be viewed here: [http://cfconventions.org/Data/cf-standard-names/49/build/cf-standard-name-table.html](http://cfconventions.org/Data/cf-standard-names/49/build/cf-standard-name-table.html)
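The CF conventions also govern how time is stored: the `time` variable holds integer offsets from the reference date named in its `units` attribute (`'hours since 1900-01-01 00:00:0.0'` in the output above). A minimal standard-library sketch of the decoding follows; the offsets are hypothetical values computed from first-of-month dates, not read from `test.nc`. In practice, `netCDF4.num2date(time_var[:], time_var.units, time_var.calendar)` performs this conversion and also supports non-Gregorian CF calendars.

```python
import datetime as dt

# Reference date from the `units` attribute of `time`: 'hours since 1900-01-01 00:00:0.0'
EPOCH = dt.datetime(1900, 1, 1)

def hours_since_epoch_to_datetime(hours):
    """Convert 'hours since 1900-01-01' offsets into datetime objects.

    Python's datetime uses the proleptic Gregorian calendar, which matches the
    file's 'gregorian' calendar for all dates after 1582-10-15.
    """
    return [EPOCH + dt.timedelta(hours=int(h)) for h in hours]

# Hypothetical offsets for 1 January and 1 February 2010 (NOT read from test.nc)
offsets = [(dt.datetime(2010, m, 1) - EPOCH) // dt.timedelta(hours=1) for m in (1, 2)]
dates = hours_since_epoch_to_datetime(offsets)
print(offsets, dates)
```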

### <a name='varnetCDF'> Importing variables </a>