![NASA](http://www.nasa.gov/sites/all/themes/custom/nasatwo/images/nasa-logo.svg)
![DEVELOP](../../DEVELOP_logo.png)

---

# matplotlib and HDF

### Goddard Space Flight Center

#### March 1, 2017

---

---

# matplotlib

---

* [matplotlib Website](http://matplotlib.org)
* Python plotting package (2D/3D) whose `pyplot` interface was designed to feel familiar to MATLAB users
* Version 2.0 is the most recent release as of March 2017 (lots of fixes!)
* Earth science visualizations typically combine matplotlib with other packages for map projections ([Basemap](http://matplotlib.org/basemap/), [cartopy](http://scitools.org.uk/cartopy/))
* [Plotly](http://plot.ly): a JavaScript-based alternative
* [How to make beautiful data visualizations in Python with matplotlib](http://www.randalolson.com/2014/06/28/how-to-make-beautiful-data-visualizations-in-python-with-matplotlib/)

### Basic Example

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

# general form: plt.plot(x, y, fmt); the format string 'ro' means red ('r') circle marker ('o')
plt.plot(2.5, 4.1, 'ro')

_Note:_ __`%matplotlib inline`__ is an IPython "magic" command that only works in the notebook; it tells Jupyter to display plots inline, below the cell.

If you put this code in a `.py` script, remove that line (it is not valid Python) and use one of the following:

* __`plt.show()`__ if you are running a script and want to view the plot interactively.
* __`plt.savefig('filename.png')`__ if you want to save the image to a file.
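As a rough sketch, a standalone script version of the basic example might look like this. It assumes a headless machine, so it selects matplotlib's non-interactive `Agg` backend before importing `pyplot`; the output filename `point.png` is hypothetical. On a machine with a display you would drop the `matplotlib.use` line and call `plt.show()` instead of (or after) saving.

```python
# basic_plot.py: script version of the notebook example (no %matplotlib magic)
import os

import matplotlib
matplotlib.use('Agg')  # assumption: no display available, so render off-screen
import matplotlib.pyplot as plt

plt.plot(2.5, 4.1, 'ro')   # a single red circle at (2.5, 4.1)
plt.savefig('point.png')   # hypothetical output filename
plt.close()                # release the figure's memory

print(os.path.exists('point.png'))
```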

### Drawing a Line

In [None]:
x = [0,1,2,3,4,5]
y = [1,2,3,4,5,6]
plt.plot(x, y)

### Adding Features (Legend)

In [None]:
# plot with a legend; the label is the text shown for this line
plt.plot(x, y, label='a line')
plt.legend(loc=0)  # loc=0 is the same as loc='best'
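Legends are only one of the features you can add. A minimal sketch of a few other common ones (axis labels, a title, and a grid) on the same line is below; the label and title strings are arbitrary, and the `Agg` backend line is only there so it runs outside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5, 6]

fig = plt.figure()
plt.plot(x, y, label='a line')
plt.xlabel('x')                 # x-axis label
plt.ylabel('y = x + 1')         # y-axis label
plt.title('A straight line')    # title above the axes
plt.grid(True)                  # light background grid
plt.legend(loc='best')          # same as loc=0

ax = plt.gca()  # the current Axes, handy for inspecting what was set
```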

### Multiple Lines

In [None]:
import numpy as np

x = np.linspace(-np.pi, np.pi, 256, endpoint=True)
cos, sin = np.cos(x), np.sin(x)

plt.plot(x, cos, label='cosine')
plt.plot(x, sin, label='sine')
plt.legend()
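Each line can also be styled individually with keyword arguments. The sketch below reuses the cosine/sine data and sets color, line width, and line style; the particular values are just examples.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # assumption: off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 256, endpoint=True)
cos, sin = np.cos(x), np.sin(x)

fig = plt.figure()
# plt.plot returns a list of Line2D objects; unpack the single line from each call
cos_line, = plt.plot(x, cos, color='C0', linewidth=2.0, linestyle='-', label='cosine')
sin_line, = plt.plot(x, sin, color='C1', linewidth=1.0, linestyle='--', label='sine')
plt.xlim(-np.pi, np.pi)       # trim the x-axis to the data range
plt.legend(loc='upper left')
```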

### Multiple Plots, Single Figure

In [None]:
plt.figure(1)
plt.subplot(211)
plt.plot(x, cos, color='C0') # 'C0', 'C1', ... refer to colors in the default color cycle (new in 2.0)

plt.subplot(212)
plt.plot(x, sin, color='C1')
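The same two-panel layout can be built with matplotlib's object-oriented interface, which hands you each Axes explicitly instead of relying on the "current" subplot. A minimal sketch, with `sharex=True` chosen here so both panels use the same x-axis:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # assumption: off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 256, endpoint=True)

# two rows, one column of Axes that share the same x-axis
fig, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True)
ax_top.plot(x, np.cos(x), color='C0')
ax_top.set_title('cosine')
ax_bottom.plot(x, np.sin(x), color='C1')
ax_bottom.set_title('sine')
fig.tight_layout()  # keep titles from overlapping the panel above
```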

---

---

# NOTICE

---

__Run this line from your Anaconda Prompt (March 1, 2017)__
```bash
conda update hdf4
```

---

---

# Hierarchical Data Format (HDF)

---

Two common HDF-based formats:
* [HDF5](http://www.hdfgroup.org): a hierarchical file of groups (like folders) that contain datasets (arrays) and attributes (metadata). ([h5py](http://www.h5py.org))
* [netCDF](http://www.unidata.ucar.edu/software/netcdf/): a simpler data model of dimensions, variables, and attributes. netCDF-4 files are stored as HDF5 and support groups; netCDF3 and older use a different, non-HDF format without groups. ([netCDF4](http://unidata.github.io/netcdf4-python/))

### Creating Files

In [1]:
import netCDF4 as nc

f = nc.Dataset('test.nc', 'w')
f.close()

In [2]:
import h5py as h5

f = h5.File('test.h5', 'w')
f.close()

### Looking inside

---

From Anaconda Prompt:

```bash
ncdump -h filename.nc

ncdump -h filename.h5   # works for many HDF5 files; h5dump -H filename.h5 is the HDF5-native alternative
```
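You can get a similar header-style overview from Python. For HDF5, h5py's `visititems` calls a function for every group and dataset below the root. A minimal sketch, building a small hypothetical file `walk_demo.h5` first so it runs on its own:

```python
import h5py as h5

# build a tiny file so the example is self-contained (hypothetical filename)
with h5.File('walk_demo.h5', 'w') as f:
    f.create_group('forecasts')
    f.create_group('forecasts/model1')
    f.create_group('analyses')
    f.create_dataset('forecasts/model1/temp', shape=(2, 3), dtype='f4')

found = []

def show(name, obj):
    """Record and print each object's path, kind, and (for datasets) shape."""
    kind = 'dataset' if isinstance(obj, h5.Dataset) else 'group'
    found.append(name)
    print(name, kind, getattr(obj, 'shape', ''))

with h5.File('walk_demo.h5', 'r') as f:
    f.visititems(show)   # returning None from show() keeps the walk going
```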

### Structure

---

* Groups
* Dimensions
* Variables
* Attributes

### Groups

In [3]:
# netCDF
rootgrp = nc.Dataset('test.nc', 'a')
fcstgrp = rootgrp.createGroup('forecasts')
anlgrp = rootgrp.createGroup('analyses')

# groups can also be created with Unix-style paths, like nested folders
fcst1 = rootgrp.createGroup('/forecasts/model1')
fcst2 = rootgrp.createGroup('/forecasts/model2')

rootgrp.close()

In [4]:
# HDF
rootgrp = h5.File('test.h5', 'a')
fcstgrp = rootgrp.create_group('forecasts')
anlgrp = rootgrp.create_group('analyses')

# again, nested groups can be created with folder-like paths
fcst1 = rootgrp.create_group('/forecasts/model1')
fcst2 = rootgrp.create_group('/forecasts/model2')

rootgrp.close()

In [5]:
# let's look inside these:
f = nc.Dataset('test.nc', 'r')
print(f)
#print('\n')
#print(f.groups)
f.close()

print('\n')

f = h5.File('test.h5', 'r')
print(f)
#print(f.keys())
f.close()

<type 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes): 
    variables(dimensions): 
    groups: forecasts, analyses



<HDF5 file "test.h5" (mode r)>


### Dimensions

In [6]:
# netCDF (netCDF3 allows only one unlimited dimension; netCDF4 allows several)
rootgrp = nc.Dataset('test.nc', 'a')
level = rootgrp.createDimension('level', None) # or 0
time = rootgrp.createDimension('time', None) # or 0
lat = rootgrp.createDimension('lat', 73)
lon = rootgrp.createDimension('lon', 144)
rootgrp.close()

In [7]:
# HDF5 has no shared, named dimensions like netCDF; each dataset simply has a shape
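HDF5 does offer "dimension scales", which link an axis of one dataset to a coordinate dataset; as far as we know this is also how netCDF-4 represents its dimensions inside HDF5. A minimal sketch, assuming h5py 2.9 or newer (for `make_scale`) and a hypothetical file `scales_demo.h5`:

```python
import numpy as np
import h5py as h5

with h5.File('scales_demo.h5', 'w') as f:
    lat = f.create_dataset('lat', data=np.arange(-90, 91, 2.5, dtype='f4'))  # 73 values
    lat.make_scale('lat')                    # mark 'lat' as a dimension scale
    field = f.create_dataset('field', shape=(73,), dtype='f4')
    field.dims[0].attach_scale(lat)          # link axis 0 of 'field' to 'lat'
    field.dims[0].label = 'lat'              # optional human-readable axis label

with h5.File('scales_demo.h5', 'r') as f:
    axis0 = f['field'].dims[0]
    scale_name = axis0[0].name   # first scale attached to axis 0
    label = axis0.label
```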

In [8]:
# let's look inside these:
f = nc.Dataset('test.nc', 'r')
print(f.dimensions)
f.close()

OrderedDict([(u'level', <type 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0
), (u'time', <type 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0
), (u'lat', <type 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 73
), (u'lon', <type 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 144
)])


### Variables

In [10]:
import numpy as np
from numpy.random import uniform

In [11]:
# netCDF
rootgrp = nc.Dataset('test.nc', 'a')
times = rootgrp.createVariable('time', 'f8', ('time',))
levels = rootgrp.createVariable('level', 'i4', ('level',))
latitudes = rootgrp.createVariable('latitude', 'f4', ('lat',))
longitudes = rootgrp.createVariable('longitude', 'f4', ('lon',))

temp = rootgrp.createVariable('temp', 'f4', ('time','level','lat','lon',))

# write some sample data
latitudes[:] = np.arange(-90, 91, 2.5)
longitudes[:] = np.arange(-180, 180, 2.5)
levels[:] = [1000, 850, 700, 500, 300, 250, 200, 150, 100, 50]  # integers to match the 'i4' type
temp[0:5, 0:10, :, :] = uniform(size=(5,10,len(latitudes), len(longitudes)))
rootgrp.close()

In [12]:
# HDF5 stores arrays as datasets instead of variables
rootgrp = h5.File('test.h5', 'a')
# maxshape is only needed for resizable datasets (None = unlimited along that axis)
temp = rootgrp.create_dataset('temp', (5,10, 73, 144,), maxshape=(None, None, 73, 144), dtype='f4')
# write data by assigning to a slice, just like a NumPy array
temp[:] = uniform(size=(5, 10, 73, 144))
rootgrp.close()
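`maxshape` is what makes a dataset growable later with `resize`. A minimal sketch that appends one time step along axis 0 and also turns on gzip compression (h5py chunks the dataset automatically when `maxshape` is given); the file `resize_demo.h5` is hypothetical:

```python
import h5py as h5
from numpy.random import uniform

with h5.File('resize_demo.h5', 'w') as f:
    # axis 0 (time) may grow without limit; the other axes are fixed
    temp = f.create_dataset('temp', shape=(5, 10, 73, 144),
                            maxshape=(None, 10, 73, 144), dtype='f4',
                            compression='gzip')
    temp[:] = uniform(size=(5, 10, 73, 144))
    temp.resize(6, axis=0)                  # make room for one more time step
    temp[5] = uniform(size=(10, 73, 144))   # fill the new slot
    final_shape = temp.shape
```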

In [13]:
# let's look inside these:
rootgrp = nc.Dataset('test.nc', 'r')
print(rootgrp.variables.keys())
print(rootgrp.variables['temp'][0,0,[0,1,2,3],[0,1,2,3]])
rootgrp.close()

rootgrp = h5.File('test.h5', 'r')
print(rootgrp['temp'])
print(rootgrp['temp'][0, 0, 0:4, 0:4])
rootgrp.close()

[u'time', u'level', u'latitude', u'longitude', u'temp']
[[ 0.64189696  0.93760121  0.57146317  0.80668706]
 [ 0.61306697  0.62045872  0.08019656  0.50558686]
 [ 0.541444    0.43847674  0.64054775  0.87512076]
 [ 0.54317957  0.51097864  0.32800439  0.41129491]]
<HDF5 dataset "temp": shape (5, 10, 73, 144), type "<f4">
[[ 0.41212252  0.04802058  0.88053972  0.18834125]
 [ 0.06324196  0.02606479  0.8399592   0.9500618 ]
 [ 0.8730737   0.61149395  0.36391962  0.07006889]
 [ 0.80869991  0.72067267  0.46312308  0.82502639]]
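Note that the netCDF read `temp[0,0,[0,1,2,3],[0,1,2,3]]` returned a 4x4 block: netCDF4 applies integer lists to each axis independently ("orthogonal" indexing). Plain NumPy instead pairs the lists element-wise, and h5py allows only one index list per selection, which is why the HDF5 read above uses slices. A NumPy-only sketch of the difference, with `np.ix_` reproducing the netCDF4 behavior:

```python
import numpy as np

arr = np.random.uniform(size=(5, 10, 73, 144))
rows = [0, 1, 2, 3]
cols = [0, 1, 2, 3]

# NumPy pairs the two index lists: picks (0,0), (1,1), (2,2), (3,3)
diag = arr[0, 0, rows, cols]

# np.ix_ builds an outer (orthogonal) selection: every row with every column,
# matching the 4x4 result netCDF4 gives for var[0, 0, rows, cols]
block = arr[0, 0][np.ix_(rows, cols)]

print(diag.shape, block.shape)
```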


### Attributes

---

Attributes are the metadata (for example units or descriptions) attached to the file, its groups, and its variables/datasets.