# Converting HDF5 to CSV

HDF5 is a binary, hierarchical format well suited to storing large arrays, but CSV files are plain text, which makes them easy to read and inspect. You can also load them directly into `pandas` and work with them as needed.

In this notebook, we'll explore the **January 2020 GPM data**, identify the values we want to record, and create a CSV file.

## Load libraries

We need the `h5py` package to read the HDF5 file. We'll also use the `pandas` package to build the final dataset and save it as a CSV file.

In [1]:
import h5py
import pandas as pd

## Load dataset

We have one data file inside the **data** directory. I'll read it using the `h5py` package.

In [3]:
dataset = h5py.File('data/gpm_jan_2020.HDF5', 'r')
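The cell above leaves the file open for the rest of the notebook. As an alternative, here is a minimal sketch of opening an HDF5 file with a context manager, so it is closed automatically. The file `demo.HDF5` and its contents are hypothetical stand-ins created on the fly so the example runs without the real GPM download.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical stand-in file (not the GPM data) so the sketch is runnable.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.HDF5")
with h5py.File(demo_path, "w") as f:
    f.create_group("Grid").create_dataset("lon", data=np.arange(4, dtype="f4"))

# The file is closed as soon as the `with` block exits.
with h5py.File(demo_path, "r") as demo:
    top_level_keys = list(demo.keys())
    lon_values = demo["Grid/lon"][:]  # copy into memory before the file closes

print(top_level_keys)
print(lon_values)
```

Anything needed after the block has to be copied out inside it, as `lon_values` is here, because dataset handles stop working once the file is closed.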

## Explore dataset

Once the dataset is loaded, it behaves like a Python dictionary. We'll start by looking at its keys and use them to identify the values we want to keep.

In [4]:
dataset.keys()

<KeysViewHDF5 ['Grid']>

The HDF5 file contains a single top-level group, **Grid**. Let's look at the keys inside it.

In [6]:
grid = dataset['Grid']
grid.keys()

<KeysViewHDF5 ['nv', 'lonv', 'latv', 'time', 'lon', 'lat', 'time_bnds', 'lon_bnds', 'lat_bnds', 'precipitation', 'randomError', 'gaugeRelativeWeighting', 'probabilityLiquidPrecipitation', 'precipitationQualityIndex']>
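Listing keys one level at a time works well for a shallow file like this one. For deeper files, a possible shortcut is `visititems`, which walks every group and dataset. The sketch below runs on a small hypothetical file whose names mirror three of the keys above; the shapes are made up for illustration and are not those of the GPM file.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical stand-in file; shapes are illustrative only.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_tree.HDF5")
with h5py.File(demo_path, "w") as f:
    grid = f.create_group("Grid")
    grid.create_dataset("lon", data=np.zeros(4, dtype="f4"))
    grid.create_dataset("lat", data=np.zeros(2, dtype="f4"))
    grid.create_dataset("precipitation", data=np.zeros((1, 4, 2), dtype="f4"))


def list_datasets(h5_file):
    """Return {path: shape} for every dataset found in the file."""
    found = {}

    def visit(name, obj):
        # Groups are skipped; only datasets carry a shape worth recording.
        if isinstance(obj, h5py.Dataset):
            found[name] = obj.shape

    h5_file.visititems(visit)
    return found


with h5py.File(demo_path, "r") as demo:
    shapes = list_datasets(demo)

for path, shape in shapes.items():
    print(path, shape)
```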

There are many fields in this data file. Here, I'm most interested in the `lon`, `lat`, and `precipitation` values. Let's take a brief look at each.

### Longitude

In [21]:
print("Longitude data: {}".format(grid['lon']))
print("Longitude data attributes: {}".format(list(grid['lon'].attrs)))

Longitude data: <HDF5 dataset "lon": shape (3600,), type "<f4">
Longitude data attributes: ['DimensionNames', 'Units', 'units', 'standard_name', 'LongName', 'bounds', 'axis', 'CLASS', 'REFERENCE_LIST']


The shape indicates this field has 3600 values.
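Printing the dataset shows only a handle with its shape and type. To get the numbers themselves, the dataset can be sliced into a NumPy array. The sketch below does this on a hypothetical stand-in file with 3600 `float32` longitudes, matching the shape and type printed above. The values themselves (evenly spaced cell centres) are an assumption for illustration, not read from the GPM file.

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-in file: 3600 float32 values, as in the output above.
# The evenly spaced values are an assumption, not the real GPM longitudes.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_lon.HDF5")
with h5py.File(demo_path, "w") as f:
    centres = np.linspace(-179.95, 179.95, 3600).astype("f4")
    f.create_group("Grid").create_dataset("lon", data=centres)

with h5py.File(demo_path, "r") as demo:
    lon_dset = demo["Grid/lon"]   # lazy handle, nothing read yet
    n_values = lon_dset.shape[0]  # the count shown in the dataset's repr
    lon = lon_dset[:]             # slicing reads the data into a NumPy array

print(n_values, lon.dtype, lon.min(), lon.max())
```

In the notebook itself, the equivalent would be `grid['lon'][:]` on the already opened file.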

### Latitude

In [18]:
grid['lat'].attrs['units']

b'degrees_north'

The units are degrees north, returned as a byte string. The field itself has 1800 values, which `grid['lat'].shape` would confirm.
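The `b'...'` prefix in the output means the attribute comes back as bytes rather than a regular string. If a plain string is more convenient, for example as a CSV column label, one option is a small helper like the one below. The name `attr_text` is hypothetical, and UTF-8 is assumed as the encoding.

```python
def attr_text(value):
    """Return an HDF5 attribute value as a plain Python string.

    Assumes byte strings are UTF-8 encoded (plain ASCII is a subset).
    """
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


# Byte string copied from the notebook output above.
units_text = attr_text(b"degrees_north")
print(units_text)
```

In the notebook, this would be called as `attr_text(grid['lat'].attrs['units'])`.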

### Precipitation

In [16]:
print(grid['precipitation'].attrs['units'])

b'mm/hr'