![NASA](http://www.nasa.gov/sites/all/themes/custom/nasatwo/images/nasa-logo.svg)
![DEVELOP](../../DEVELOP_logo.png)

---

# Hierarchical Data Format

### Goddard Space Flight Center

#### October 27, 2016

# Purpose

---

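In programming, data usually arrives as either plain ASCII text or raw binary, and neither describes its own structure. But what if you want something structured? In Earth science, we turn to Hierarchical Data Format (HDF) files, which store arrays together with their names, shapes, and metadata.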

# Two Roads

---

* HDF: A structured file of groups, which hold fields (variables) with their dimensions.
* netCDF: A simpler layout in the same spirit. netCDF4 is stored as HDF5 and supports groups; __netCDF3 and older do not support groups__, so everything lives in a single root group of variables and dimensions.

# Creating the files

---

In [None]:
import netCDF4 as nc

f = nc.Dataset('test.nc', 'w')
f.close()

In [None]:
import h5py as h5

f = h5.File('test.h5', 'w')
f.close()

# Looking inside

---

```bash
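# -h prints the header only (dimensions, variables, attributes), no data
ncdump -h filename.nc

# ncdump can also read many HDF5 files when built with netCDF-4 support;
# h5dump -H is the HDF5-native equivalent
ncdump -h filename.h5
h5dump -H filename.h5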
```
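
The same kind of header listing can be produced from Python. The following is a rough sketch using `h5py`'s `visititems`; the file name, group names, and the `describe` helper are made up for illustration.

```python
import os
import tempfile

import h5py

# Hypothetical scratch file with a small nested layout
path = os.path.join(tempfile.mkdtemp(), 'inspect_demo.h5')
with h5py.File(path, 'w') as f:
    f.create_group('forecasts/model1')
    f.create_dataset('forecasts/model1/temp', shape=(2, 3), dtype='f4')

found = []

def describe(name, obj):
    """Record every group and dataset that visititems walks over."""
    if isinstance(obj, h5py.Dataset):
        found.append((name, 'dataset', obj.shape))
    else:
        found.append((name, 'group', None))

with h5py.File(path, 'r') as f:
    f.visititems(describe)

for name, kind, shape in found:
    print(name, kind, shape)
```

Because the callback returns nothing, the walk continues through the whole file; returning a value from it would stop the traversal early.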

# Structure

---

* Groups
* Dimensions
* Variables
* Attributes

# Groups

---

In [None]:
# netCDF
rootgrp = nc.Dataset('test.nc', 'a')
fcstgrp = rootgrp.createGroup('forecasts')
anlgrp = rootgrp.createGroup('analyses')

# we can also create them like folders
fcst1 = rootgrp.createGroup('/forecasts/model1')
fcst2 = rootgrp.createGroup('/forecasts/model2')

rootgrp.close()

In [None]:
# HDF
rootgrp = h5.File('test.h5', 'a')
fcstgrp = rootgrp.create_group('forecasts')
anlgrp = rootgrp.create_group('analyses')

# again, we can also create them like folders
fcst1 = rootgrp.create_group('/forecasts/model1')
fcst2 = rootgrp.create_group('/forecasts/model2')

rootgrp.close()

In [None]:
# let's look inside these:
f = nc.Dataset('test.nc', 'r')
print(f)
#print('\n')
#print(f.groups)
f.close()

print('\n')

f = h5.File('test.h5', 'r')
print(f)
#print(f.keys())
f.close()

# Dimensions

---

In [None]:
# netCDF (netCDF3 can only have 1 unlimited dimension)
rootgrp = nc.Dataset('test.nc', 'a')
level = rootgrp.createDimension('level', None) # or 0
time = rootgrp.createDimension('time', None) # or 0
lat = rootgrp.createDimension('lat', 73)
lon = rootgrp.createDimension('lon', 144)
rootgrp.close()

In [None]:
# HDF doesn't really have named, shared dimensions; each dataset just carries its own shape
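
The closest HDF5 counterpart to netCDF dimensions is a dimension scale: an ordinary dataset attached to an axis of another dataset. The sketch below assumes h5py 2.10 or newer (for `make_scale`); the file name and the two-dimensional `temp` array are illustrative choices, not the arrays used elsewhere in this notebook.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file
path = os.path.join(tempfile.mkdtemp(), 'scales_demo.h5')

with h5py.File(path, 'w') as f:
    temp = f.create_dataset('temp', shape=(73, 144), dtype='f4')
    lat = f.create_dataset('lat', data=np.linspace(-90.0, 90.0, 73))
    lon = f.create_dataset('lon', data=np.arange(-180.0, 180.0, 2.5))

    # Turn the coordinate datasets into dimension scales
    lat.make_scale('lat')
    lon.make_scale('lon')

    # Attach each scale to the matching axis of temp and label the axis
    temp.dims[0].attach_scale(lat)
    temp.dims[1].attach_scale(lon)
    temp.dims[0].label = 'lat'
    temp.dims[1].label = 'lon'

with h5py.File(path, 'r') as f:
    dims = f['temp'].dims
    labels = [dims[i].label for i in range(2)]
    lat_scale_shape = dims[0][0].shape  # first scale attached to axis 0

print(labels, lat_scale_shape)
```
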

In [None]:
# let's look inside these:
f = nc.Dataset('test.nc', 'r')
print(f.dimensions)
f.close()

# Variables

---

In [None]:
import numpy as np
from numpy.random import uniform

In [None]:
# netCDF
rootgrp = nc.Dataset('test.nc', 'a')
times = rootgrp.createVariable('time', 'f8', ('time',))
levels = rootgrp.createVariable('level', 'i4', ('level',))
latitudes = rootgrp.createVariable('latitude', 'f4', ('lat',))
longitudes = rootgrp.createVariable('longitude', 'f4', ('lon',))

temp = rootgrp.createVariable('temp', 'f4', ('time','level','lat','lon',))

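# write some sample data; the coordinate arrays must match the
# fixed dimension sizes (73 latitudes, 144 longitudes)
latitudes[:] = np.arange(-90, 91, 2.5)
longitudes[:] = np.arange(-180, 180, 2.5)
# 'level' is stored as i4, so use integer pressure levels
levels[:] = [1000, 850, 700, 500, 300, 250, 200, 150, 100, 50]
# writing into the unlimited 'time' and 'level' axes grows them as needed
temp[0:5, 0:10, :, :] = uniform(size=(5, 10, len(latitudes), len(longitudes)))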
rootgrp.close()

In [None]:
# HDF (has datasets)
rootgrp = h5.File('test.h5', 'a')
# maxshape is only needed for resizable datasets; None marks an axis as unlimited
temp = rootgrp.create_dataset('temp', (5,10, 73, 144,), maxshape=(None, None, 73, 144), dtype='f4')
# here, you just assign data to temp
temp[:] = uniform(size=(5, 10, 73, 144))
rootgrp.close()

In [None]:
# let's look inside these:
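rootgrp = nc.Dataset('test.nc', 'r')  # read-only is enough for inspection
print(rootgrp.variables.keys())
# netCDF4 treats index lists orthogonally, so this returns a 4x4 block
print(rootgrp.variables['temp'][0,0,[0,1,2,3],[0,1,2,3]])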
rootgrp.close()

rootgrp = h5.File('test.h5', 'r')
print(rootgrp['temp'])
print(rootgrp['temp'][0, 0, 0:4, 0:4])
rootgrp.close()

# Attributes

---

Attributes are the metadata (units, descriptions, provenance, and so on) attached to a file, a group, or a variable.