# Opening and reading NetCDF and GRIB File

## Table of Contents
1. [What is the purpose of this Notebook](#purpose)
2. [Example data](#data)
3. [Opening and reading NetCDF files](#netcdf)  
    3.1. [Package requirements](#packagenetCDF)  
    3.2. [Opening and closing a NetCDF file](#opennetCDF)  
    3.3. [Getting the list of attributes, dimensions, and variables](#dimensions)  
    3.4. [Importing variables](#varnetCDF) 
4. [Opening and reading datasets across multiple NetCDF files](#MFnetcdf)
    4.1. [Package requirements](#packageMFnetcdf)
    4.2. [Opening multiple NetCDF files](#opennetcdfs)
    4.3. [Printing the list of attributes, dimensions, and variables](#MFdimensions)
    4.4. [Importing Variables](#varnetCDFs)
5. [Opening and reading GRIB files](#GRIB)  
    5.1. [Package requirements](#packageGRIB)  
    5.2. [Opening a GRIB file](#openGRIB)  
    5.3. [Getting the list of variables and associated properties](#properties) 
    5.4. [Importing variables](#varGRIB)

## <a name="purpose"> What is the purpose of this Notebook?</a>

This interactive Jupyter Notebook guides the reader through the steps of opening, reading, and importing variables from a file in the NetCDF or GRIB format.  

To know more about NetCDF, visit [https://www.unidata.ucar.edu/software/netcdf/](https://www.unidata.ucar.edu/software/netcdf/).  
To know more about GRIB, visit [https://www.wmo.int/pages/prog/www/WDM/Guides/Guide-binary-2.html](https://www.wmo.int/pages/prog/www/WDM/Guides/Guide-binary-2.html).

## <a name="data"> Example data</a>

This Notebook makes use of the same dataset downloaded in both the NetCDF and GRIB format. Please note that the data is originally in GRIB format and transformed upon download.  

The data consists of monthly (obtained from daily means) Total Precipitation for the year 2010 obtained from ECMWF ERA-20CM. The data can be downloaded [here](http://apps.ecmwf.int/datasets/data/era20cm-edmo/levtype=sfc/?month_years=2010&number=0&param=228.128) (Using ensemble 0). The two files are included in the folder containing this Notebook.

<div class="alert alert-warning" role="alert" style="margin: 10px">
<p>**NOTE**</p>
<p> You need to register to access these data from the ECMWF portal</p>
</div>  

 

## <a name="netcdf"> Opening and reading NetCDF files </a>
### <a name="packagenetCDF"> Package requirements </a>

This example makes use of the [netCDF4 Python package](https://github.com/Unidata/netcdf4-python), which is available through Pypi: `pip install netcdf4`

For a single dataset, use the module `Dataset`.

### <a name="opennetCDF"> Opening and closing a NetCDF file </a>

To create a netCDF file from python, you simply call the Dataset constructor. This is also the method used to open an existing netCDF file. If the file is open for write access (**mode='w', 'r+' or 'a'**), you may write any type of data including new dimensions, groups, variables and attributes. netCDF files come in five flavors (**NETCDF3_CLASSIC**, **NETCDF3_64BIT_OFFSET**, **NETCDF3_64BIT_DATA**, **NETCDF4_CLASSIC**, and **NETCDF4**). **NETCDF3_CLASSIC** was the original netcdf binary format, and was limited to file sizes less than 2 Gb. **NETCDF3_64BIT_OFFSET** was introduced in version 3.6.0 of the library, and extended the original binary format to allow for file sizes greater than 2 Gb. **NETCDF3_64BIT_DATA** is a new format that requires version 4.4.0 of the C library - it extends the **NETCDF3_64BIT_OFFSET** binary format to allow for unsigned/64 bit integer data types and 64-bit dimension sizes. **NETCDF3_64BIT** is an alias for **NETCDF3_64BIT_OFFSET**. **NETCDF4_CLASSIC** files use the version 4 disk format (HDF5), but omits features not found in the version 3 API. They can be read by netCDF 3 clients only if they have been relinked against the netCDF 4 library. They can also be read by HDF5 clients. **NETCDF4** files use the version 4 disk format (HDF5) and use the new features of the version 4 API. The **netCDF4** module can read and write files in any of these formats. When creating a new file, the format may be specified using the format keyword in the Dataset constructor. The default format is **NETCDF4**. To see how a given file is formatted, you can examine the data_model attribute. Closing the netCDF file is accomplished via the close method of the Dataset instance. ([source](http://unidata.github.io/netcdf4-python/#section1)).

In [5]:
# Open a dataset
from netCDF4 import Dataset
dataset = Dataset("/Volumes/Data HD/Documents/MINT/Climate/netCDFTutorial/test.nc")
print(dataset.data_model)

NETCDF3_64BIT_OFFSET


Typing `dataset` will give you an overview of what's in the file.

In [6]:
dataset

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_64BIT_OFFSET data model, file format NETCDF3):
    Conventions: CF-1.6
    history: 2018-02-26 22:26:35 GMT by grib_to_netcdf-2.6.0: grib_to_netcdf /data/data01/scratch/_mars-atls18-a562cefde8a29a7288fa0b8b7f9413f7-u5uxdA.grib -o /data/data02/scratch/_grib2netcdf-atls00-a82bacafb5c306db76464bc7e824bb75-DDcsCC.nc -utime
    dimensions(sizes): longitude(320), latitude(161), time(12)
    variables(dimensions): float32 [4mlongitude[0m(longitude), float32 [4mlatitude[0m(latitude), int32 [4mtime[0m(time), int16 [4mtp[0m(time,latitude,longitude)
    groups: 

In [7]:
# Close a dataset
dataset.close()

### <a name='dimensions'> Getting the list of attributes, dimensions, and variables </a>

The following function has been updated to Python 3.5 ([source](http://schubert.atmos.colostate.edu/~cslocum/netcdf_example.html))

In [4]:
import datetime as dt  # Python standard library datetime  module
import numpy as np

def ncdump(nc_fid, verb=True):
    '''
    ncdump outputs dimensions, variables and their attribute information.
    The information is similar to that of NCAR's ncdump utility.
    ncdump requires a valid instance of Dataset.

    Parameters
    ----------
    nc_fid : netCDF4.Dataset
        A netCDF4 dateset object
    verb : Boolean
        whether or not nc_attrs, nc_dims, and nc_vars are printed

    Returns
    -------
    nc_attrs : list
        A Python list of the NetCDF file global attributes
    nc_dims : list
        A Python list of the NetCDF file dimensions
    nc_vars : list
        A Python list of the NetCDF file variables
    '''
    def print_ncattr(key):
        """
        Prints the NetCDF file attributes for a given key

        Parameters
        ----------
        key : unicode
            a valid netCDF4.Dataset.variables key
        """
        try:
            print("\t\ttype:", repr(nc_fid.variables[key].dtype))
            for ncattr in nc_fid.variables[key].ncattrs():
                print('\t\t%s:' % ncattr,\
                      repr(nc_fid.variables[key].getncattr(ncattr)))
        except KeyError:
            print("\t\tWARNING: %s does not contain variable attributes" % key)

    # NetCDF global attributes
    nc_attrs = nc_fid.ncattrs()
    if verb:
        print("NetCDF Global Attributes:")
        for nc_attr in nc_attrs:
            print('\t%s:' % nc_attr, repr(nc_fid.getncattr(nc_attr)))
    nc_dims = [dim for dim in nc_fid.dimensions]  # list of nc dimensions
    # Dimension shape information.
    if verb:
        print("NetCDF dimension information:")
        for dim in nc_dims:
            print("\tName:", dim) 
            print("\t\tsize:", len(nc_fid.dimensions[dim]))
            print_ncattr(dim)
    # Variable information.
    nc_vars = [var for var in nc_fid.variables]  # list of nc variables
    if verb:
        print("NetCDF variable information:")
        for var in nc_vars:
            if var not in nc_dims:
                print('\tName:', var)
                print("\t\tdimensions:", nc_fid.variables[var].dimensions)
                print("\t\tsize:", nc_fid.variables[var].size)
                print_ncattr(var)
    return nc_attrs, nc_dims, nc_vars

Now get the attributes, dimensions and variables contained in the netCDF file.

In [5]:
dataset_attrs, dataset_dims, dataset_vars = ncdump(dataset, verb=True)

NetCDF Global Attributes:
	Conventions: 'CF-1.6'
	history: '2018-02-26 22:26:35 GMT by grib_to_netcdf-2.6.0: grib_to_netcdf /data/data01/scratch/_mars-atls18-a562cefde8a29a7288fa0b8b7f9413f7-u5uxdA.grib -o /data/data02/scratch/_grib2netcdf-atls00-a82bacafb5c306db76464bc7e824bb75-DDcsCC.nc -utime'
NetCDF dimension information:
	Name: longitude
		size: 320
		type: dtype('float32')
		units: 'degrees_east'
		long_name: 'longitude'
	Name: latitude
		size: 161
		type: dtype('float32')
		units: 'degrees_north'
		long_name: 'latitude'
	Name: time
		size: 12
		type: dtype('int32')
		units: 'hours since 1900-01-01 00:00:0.0'
		long_name: 'time'
		calendar: 'gregorian'
NetCDF variable information:
	Name: tp
		dimensions: ('time', 'latitude', 'longitude')
		size: 618240
		type: dtype('int16')
		scale_factor: 6.624043998318357e-07
		add_offset: 0.021704342564889928
		_FillValue: -32767
		missing_value: -32767
		units: 'm'
		long_name: 'Total precipitation'


The global attibutes tells us that the name of the variables follow the CF (Climate and Forecast) metadata conventions, which have been mapped to the GSN ontology. 

To learn more about the CF convention: [http://cfconventions.org](http://cfconventions.org)

The CF standard Name Table can be viewed here: [http://cfconventions.org/Data/cf-standard-names/49/build/cf-standard-name-table.html](http://cfconventions.org/Data/cf-standard-names/49/build/cf-standard-name-table.html)

We need to build a function that looks for the standard_name automatically and matches to the GSN name. If the standard_name is not available, we will need a mapping.


### <a name='varnetCDF'> Importing Variables </a>

The following function automatically opens a file with the netCDF extension, searches the long name against a list of key variables chosen by the user or a computer, and returns the values into a dictionary with keys as long names.

In [1]:
def getNcVar(nc_file, keys):
    ''' Extract variables from a netCDF file.
    
    This function gets the variable contained in a netCDF file 
    and return them into Python nested dictionaries. The first
    dictionary's key contains the longname, while the
    second dictionary contains values, standard name (CF),
    units and the missing data flag.
    
    Args:
        nc_file (str): A name (path) of a netCDF file
        keys (list): A list of keys to fetch the variables according
            to the CF standard
    
    Returns:
        dict_out (dict): A dictionary containing the standard names as keys and
            the associated data as values.
    '''
    from netCDF4 import Dataset
    #Open the netCDF file
    nc_fid = Dataset(nc_file)
    # Get the variable names
    nc_vars = [var for var in nc_fid.variables]
    # Get the longnames for each variables
    nc_vars_longname = []
    #Get the units
    nc_vars_units =[]
    # Get the standard name
    nc_vars_standardname=[]
    #Add corrections if needed
    nc_vars_scale_factor=[]
    nc_vars_add_offset=[]
    # Check the missing value tags
    nc_vars_missing_value=[]
    
    for vars in nc_vars:
        if 'long_name' in nc_fid.variables[vars].ncattrs():
            nc_vars_longname.append(nc_fid.variables[vars].getncattr('long_name'))
        else:
            nc_vars_longname.append(vars)
        if 'units' in nc_fid.variables[vars].ncattrs():
            nc_vars_units.append(nc_fid.variables[vars].getncattr('units'))
        else:
            nc_vars_units.append('NA')
        if 'standard_name' in nc_fid.variables[vars].ncattrs():
            nc_vars_standardname.append(nc_fid.variables[vars].getncattr('standard_name'))
        else:
            nc_vars_standardname.append('NA')    
        if 'scale_factor' in nc_fid.variables[vars].ncattrs():
            nc_vars_scale_factor.append(nc_fid.variables[vars].getncattr('scale_factor'))
        else:
            nc_vars_scale_factor.append(1)
        if 'add_offset' in nc_fid.variables[vars].ncattrs():
            nc_vars_add_offset.append(nc_fid.variables[vars].getncattr('add_offset'))
        else:
            nc_vars_add_offset.append(0) 
        if 'missing_value' in nc_fid.variables[vars].ncattrs(): 
            nc_vars_missing_value.append(nc_fid.variables[vars].getncattr('missing_value'))
        else:
            nc_vars_missing_value.append('NA')
    # Check for the list against the desired variables and output.
    dict_out ={}
    for name in nc_vars_longname:
        if name in keys:
            f = {'values':[],'units':[],'missing_value':[],'standard_name':{}}
            idx = nc_vars_longname.index(name)
            f['values']=(nc_fid.variables[nc_vars[idx]][:]*nc_vars_scale_factor[idx])\
                +nc_vars_add_offset[idx]
            f['units']=nc_vars_units[idx]
            f['missing_value'] = nc_vars_missing_value[idx]
            f['standard_name'] = nc_vars_standardname[idx]
            dict_out[name] = f
               
    return dict_out

Example:

In [2]:
nc_file = "/Volumes/Data HD/Documents/MINT/Climate/netCDFTutorial/test.nc"
keys = ['latitude','longitude','time','Total precipitation'] 

dict_out = getNcVar(nc_file, keys)

dict_out

{'Total precipitation': {'missing_value': -32767,
  'standard_name': 'NA',
  'units': 'm',
  'values': array([[[0.02170434, 0.02170434, 0.02170434, ..., 0.02170434,
           0.02170434, 0.02170434],
          [0.02170434, 0.02170434, 0.02170434, ..., 0.02170434,
           0.02170434, 0.02170434],
          [0.02170434, 0.02170434, 0.02170434, ..., 0.02170434,
           0.02170434, 0.02170434],
          ...,
          [0.02170434, 0.02170434, 0.02170434, ..., 0.02170434,
           0.02170434, 0.02170434],
          [0.02170434, 0.02170434, 0.02170434, ..., 0.02170434,
           0.02170434, 0.02170434],
          [0.02170434, 0.02170434, 0.02170434, ..., 0.02170434,
           0.02170434, 0.02170434]],
  
         [[0.02170434, 0.02170434, 0.02170434, ..., 0.02170434,
           0.02170434, 0.02170434],
          [0.02170434, 0.02170434, 0.02170434, ..., 0.02170434,
           0.02170434, 0.02170434],
          [0.02170434, 0.02170434, 0.02170434, ..., 0.02170434,
           0.021

## <a name='MFnetcdf'>Opening and reading datasets across multiple NetCDF files</a>

### <a name='packageMFnetcdf'> Package requirements </a>

This example makes use of the netCDF4 Python package, which is available through Pypi: pip install netcdf4

For a dataset spanning multiple NetCDF files, use the package `MFDataset`.

### <a name='opennetcdfs'> Opening multiple NetCDF files </a>

If you want to read data from a variable that spans multiple netCDF files, you can use the `MFDataset` class to read the data as if it were contained in a single file. Instead of using a single filename to create a `Dataset` instance, create a `MFDataset` instance with either a list of filenames, or a string with a wildcard (which is then converted to a sorted list of files using the python glob module). Variables in the list of files that share the same unlimited dimension are aggregated together, and can be sliced across multiple files. To illustrate this, let's first create a bunch of netCDF files with the same variable (with the same unlimited dimension). The files must in be in **NETCDF3_64BIT_OFFSET**, **NETCDF3_64BIT_DATA**, **NETCDF3_CLASSIC** or **NETCDF4_CLASSIC** format (**NETCDF4** formatted multi-file datasets are not supported) ([source](http://unidata.github.io/netcdf4-python/#section8)).

In [8]:
from netCDF4 import MFDataset
# Just get a list of netCDF files. 
root = "/Volumes/Data HD/Documents/MINT/Climate/netCDFTutorial"
files = ["Oct2010.nc","Nov2010.nc","Dec2010.nc"]

file_names =[]
for name in files:
    file_names.append(root+"/"+name)
    
#Open the file and get the keys for this example
nc_fid = MFDataset(file_names)

nc_fid

<class 'netCDF4._netCDF4.MFDataset'>
root group (NETCDF3_64BIT_OFFSET data model, file format NETCDF3):
    Conventions: CF-1.6
    history: 2018-03-16 21:28:54 GMT by grib_to_netcdf-2.6.0: grib_to_netcdf /data/data02/scratch/_mars-atls04-a82bacafb5c306db76464bc7e824bb75-Fev7A1.grib -o /data/data02/scratch/_grib2netcdf-atls13-a82bacafb5c306db76464bc7e824bb75-9RHCwe.nc -utime
    dimensions = ('longitude', 'latitude', 'time')
    variables = ('longitude', 'latitude', 'time', 'sp', 'tcc', 'u10', 'v10', 't2m', 'd2m', 'al', 'tp', 'skt', 'fsr')
    groups = ()

### <a name='MFdimensions'> Printing the list of attributes, dimensions and variables </a>

The function dumps the netCDF attributes, dimensions and variables.

In [9]:
def MFncdump(nc_fid):
    """
    MFncdump prints dimensions, variables and their attribute info
    
    Args:
        nc_fid: a netCDF file    
    """
        
    # Global attributes
    print("NetCDF Global Attributes: ")
    for name in nc_fid.ncattrs():
        print('\t'+name+": "+getattr(nc_fid,name))

    #Dimension shape information
    print("NetCDF dimension information: ")
    nc_dims = [dim for dim in nc_fid.dimensions]
    for dim in nc_dims:
        print('\t'+dim+': ')
        for attrname in nc_fid.variables[dim].ncattrs():
            print('\t\t'+attrname+": "+getattr(nc_fid.variables[dim], attrname))
    #Variables information
    print("NetCDF variables information: ")
    for name, variable in nc_fid.variables.items():
        print('\t'+name+': ')
        for attrname in variable.ncattrs():
            print('\t\t'+attrname+': '+str(getattr(variable,attrname)))

Example:

In [10]:
MFncdump(nc_fid)

NetCDF Global Attributes: 
	Conventions: CF-1.6
	history: 2018-03-16 21:28:54 GMT by grib_to_netcdf-2.6.0: grib_to_netcdf /data/data02/scratch/_mars-atls04-a82bacafb5c306db76464bc7e824bb75-Fev7A1.grib -o /data/data02/scratch/_grib2netcdf-atls13-a82bacafb5c306db76464bc7e824bb75-9RHCwe.nc -utime
NetCDF dimension information: 
	longitude: 
		units: degrees_east
		long_name: longitude
	latitude: 
		units: degrees_north
		long_name: latitude
	time: 
		units: hours since 1900-01-01 00:00:0.0
		long_name: time
		calendar: gregorian
NetCDF variables information: 
	longitude: 
		units: degrees_east
		long_name: longitude
	latitude: 
		units: degrees_north
		long_name: latitude
	time: 
		units: hours since 1900-01-01 00:00:0.0
		long_name: time
		calendar: gregorian
	sp: 
		scale_factor: 0.22979262356370073
		add_offset: 89749.19760368821
		_FillValue: -32767
		missing_value: -32767
		units: Pa
		long_name: Surface pressure
		standard_name: surface_air_pressure
	tcc: 
		scale_factor: 1.525959474

### <a name='varnetCDFs'> Importing variables </a>

The following function automatically opens files with the netCDF extension, searches the long name against a list of key variables chosen by the user or a computer, and returns the values into a dictionary with keys as long names.

In [11]:
def getMFNcVar(nc_files, keys):
    ''' Extract variables from a dataset across multiple netCDF files.
    
    This function gets the variable contained in a netCDF file 
    and return them into Python nested dictionaries. The first
    dictionary's key contains the longname, while the
    second dictionary contains values, standard name (CF),
    units and the missing data flag.
    
    Args:
        nc_files (list): A list of netCDF files containing the dataset
        keys (list): A list of keys to fetch the variables according
            to the CF standard
    
    Returns:
        dict_out (dict): A dictionary containing the standard names as keys and
            the associated data as values.
    '''
    # Import the package
    from netCDF4 import MFDataset
    # Open the netCDF files
    nc_fid = MFDataset(nc_files)
    # Get the variable names
    nc_vars = [var for var in nc_fid.variables]
    
    #Make empty lists to collect the info
    #longname (should be using the CF conventions)
    nc_vars_longname=[]
    #Units
    nc_vars_units=[]
    # Get the standard name
    nc_vars_standardname=[]
    #Corrections
    nc_vars_scale_factor=[]
    nc_vars_add_offset=[]
    #Missing values
    nc_vars_missing_value=[]
    
    for vars in nc_vars:
        if 'long_name' in nc_fid.variables[vars].ncattrs():
            nc_vars_longname.append(getattr(nc_fid.variables[vars],'long_name'))
        else:
            nc_vars_longname.append(vars)
        if 'units' in nc_fid.variables[vars].ncattrs():
            nc_vars_units.append(getattr(nc_fid.variables[vars],'units'))
        else:
            nc_vars_units.append('NA')
        if 'standard_name' in nc_fid.variables[vars].ncattrs():
            nc_vars_standardname.append(getattr(nc_fid.variables[vars],'standard_name'))
        else:
            nc_vars_standardname.append("NA")    
        if 'scale_factor' in nc_fid.variables[vars].ncattrs():
            nc_vars_scale_factor.append(getattr(nc_fid.variables[vars],'scale_factor'))
        else:
            nc_vars_scale_factor.append(1)
        if 'add_offset' in nc_fid.variables[vars].ncattrs():
            nc_vars_add_offset.append(getattr(nc_fid.variables[vars],'add_offset'))
        else:
            nc_vars_add_offset.append(0) 
        if 'missing_value' in nc_fid.variables[vars].ncattrs(): 
            nc_vars_missing_value.append(getattr(nc_fid.variables[vars],'missing_value'))
        else:
            nc_vars_missing_value.append('NA')
    # Check for the list against the desired variables and output.
    dict_out ={}
    for name in nc_vars_longname:
        if name in keys:
            f = {'values':[],'units':[],'missing_value':[], 'standard_name':{}}
            idx = nc_vars_longname.index(name)
            f['values']=(nc_fid.variables[nc_vars[idx]][:]*nc_vars_scale_factor[idx])\
                +nc_vars_add_offset[idx]
            f['units']=nc_vars_units[idx]
            f['missing_value'] = nc_vars_missing_value[idx]
            f['standard_name'] = nc_vars_standardname[idx]
            dict_out[name] = f
    
    return dict_out  

As an example the keys will be searched again the netCDF files long name

In [13]:
from netCDF4 import MFDataset
# Just get a list of netCDF files. 
root = "/Volumes/Data HD/Documents/MINT/Climate/netCDFTutorial"
files = ["Oct2010.nc","Nov2010.nc","Dec2010.nc"]

file_names =[]
for name in files:
    file_names.append(root+"/"+name)
    
#Open the file and get the keys for this example
nc_fid = MFDataset(file_names)
keys=[]
nc_vars = [var for var in nc_fid.variables]
for vars in nc_vars:
    keys.append(getattr(nc_fid.variables[vars],'long_name'))
    
# Run the function
dict_out=getMFNcVar(file_names,keys)

dict_out

{'10 metre U wind component': {'missing_value': -32767,
  'standard_name': 'NA',
  'units': 'm s**-1',
  'values': array([[[-2.59051159, -2.59051945, -2.59052728, ..., -2.59071312,
           -2.59071436, -2.59071559],
          [-2.5905114 , -2.59051779, -2.59052422, ..., -2.59070276,
           -2.59070523, -2.59070767],
          [-2.59051201, -2.59051721, -2.59052244, ..., -2.59069344,
           -2.59069692, -2.59070036],
          ...,
          [-2.59074356, -2.59074239, -2.59074122, ..., -2.59175969,
           -2.59178496, -2.5918129 ],
          [-2.59075051, -2.59075044, -2.59075041, ..., -2.59175161,
           -2.59177928, -2.59180874],
          [-2.59075746, -2.59075853, -2.59075957, ..., -2.59174482,
           -2.5917746 , -2.59180549]],
  
         [[-2.59081699, -2.59082946, -2.590842  , ..., -2.59071712,
           -2.59071432, -2.59071156],
          [-2.59080517, -2.59081878, -2.59083235, ..., -2.59070825,
           -2.59070627, -2.59070429],
          [-2.590794

## <a name='GRIB'> Opening and reading GRIB files </a>

### <a name='packageGRIB'> Package requirement </a>

This example makes use of the [pygrib](https://github.com/jswhit/pygrib), which is available through conda: `conda install -c conda-forge pygrib`

### <a name = 'openGRIB'> Opening a GRIB file </a>

In [5]:
import pygrib

file = "/Volumes/Data HD/Documents/MINT/Climate/netCDFTutorial/test.grib"

grbs = pygrib.open(file)

### <a name='properties'> Getting the list of variables and associated properties </a>

The variables `grbs` from above contain information at each time step. Therefore, one needs to iterate over `grbs` to get the values and other associated properties at each time step.

For instance, let's take the first value packed in grbs.

In [6]:
grb = grbs[1]

grb

1:Total precipitation:m (avgfc):reduced_gg:surface:level 0:fcst time 3 hrs (avgfc):from 201001010000

To know the available metadata associated with each variable:

In [4]:
grb.keys()

['parametersVersion',
 'UseEcmfConventions',
 'GRIBEX_boustrophedonic',
 'hundred',
 'globalDomain',
 'GRIBEditionNumber',
 'eps',
 'offsetSection0',
 'section0Length',
 'totalLength',
 'editionNumber',
 'WMO',
 'productionStatusOfProcessedData',
 'section1Length',
 'wrongPadding',
 'table2Version',
 'centre',
 'centreDescription',
 'generatingProcessIdentifier',
 'gridDefinition',
 'indicatorOfParameter',
 'parameterName',
 'parameterUnits',
 'indicatorOfTypeOfLevel',
 'pressureUnits',
 'typeOfLevelECMF',
 'typeOfLevel',
 'level',
 'yearOfCentury',
 'month',
 'day',
 'hour',
 'minute',
 'second',
 'unitOfTimeRange',
 'P1',
 'P2',
 'timeRangeIndicator',
 'numberIncludedInAverage',
 'numberMissingFromAveragesOrAccumulations',
 'centuryOfReferenceTimeOfData',
 'subCentre',
 'paramIdECMF',
 'paramId',
 'cfNameECMF',
 'cfName',
 'cfVarNameECMF',
 'cfVarName',
 'unitsECMF',
 'units',
 'nameECMF',
 'name',
 'decimalScaleFactor',
 'setLocalDefinition',
 'dataDate',
 'year',
 'dataTime',
 'jul

### <a name='varGRIB'> Importing variables </a>

This function opens a file with a GRIB extension and places all the values for the variables into a dictionary that can then be exported

In [3]:
def getGribVar(grib_file, keys):
    ''' Extract variables from a GRIB file.
    
    This function gets the variable contained in a GRIB file 
    and return them into Python nested dictionaries. The first
    dictionary's key contains the longname, while the
    second dictionary contains values, the standard CF name,
    units and the missing data flag.
    
    Args:
        grib_file (str): A name (path) of a GRIB file
        keys (list): A list of keys to fetch the variables according
            to the CF standard
    
    Returns:
        dict_out (dict): A dictionary containing the standard names as keys and
            the associated data as values.
    '''
    import pygrib    
    
    grbs = pygrib.open(grib_file)
    
    dict_out={}
    for key in keys:
        vars_values = []
        vars_time = []
        for grb in grbs:
            # Grab the various metadata
            if grb.parameterName == key:
                vars_time.append(grb.validDate)
                vars_values.append((grb.values*grb.scaleValuesBy)+grb.offsetValuesBy)
        # Pack into the dictionary
        if 'latitude' not in dict_out.keys():
            lats,longs = grb.latlons()
            lats = lats[:,0]
            longs = longs[0,:]
            dict_out['latitude']={'values':lats}
            dict_out['longitude']={'values':longs}
            dict_out['time']={'values':vars_time}
        dict_out[key] = {'values':vars_values, 'units':grb.parameterUnits,'missing_values':grb.missingValue, 'standard_name':grb.cfName}    
        
        return dict_out

Example:

In [7]:
dict_out =  getGribVar(file,keys)  

dict_out

{'latitude': {'missing_values': 9999,
  'standard_name': 'unknown',
  'units': 'm',
  'values': []},
 'longitude': {'values': array([  0.   ,   1.125,   2.25 ,   3.375,   4.5  ,   5.625,   6.75 ,
           7.875,   9.   ,  10.125,  11.25 ,  12.375,  13.5  ,  14.625,
          15.75 ,  16.875,  18.   ,  19.125,  20.25 ,  21.375,  22.5  ,
          23.625,  24.75 ,  25.875,  27.   ,  28.125,  29.25 ,  30.375,
          31.5  ,  32.625,  33.75 ,  34.875,  36.   ,  37.125,  38.25 ,
          39.375,  40.5  ,  41.625,  42.75 ,  43.875,  45.   ,  46.125,
          47.25 ,  48.375,  49.5  ,  50.625,  51.75 ,  52.875,  54.   ,
          55.125,  56.25 ,  57.375,  58.5  ,  59.625,  60.75 ,  61.875,
          63.   ,  64.125,  65.25 ,  66.375,  67.5  ,  68.625,  69.75 ,
          70.875,  72.   ,  73.125,  74.25 ,  75.375,  76.5  ,  77.625,
          78.75 ,  79.875,  81.   ,  82.125,  83.25 ,  84.375,  85.5  ,
          86.625,  87.75 ,  88.875,  90.   ,  91.125,  92.25 ,  93.375,
          94