# Converting HDF5 to CSV

HDF5 is a binary format built for storing large numerical arrays efficiently, whereas CSV files are plain text that is easy to read and understand. CSV files can also be loaded directly into `pandas` with `pd.read_csv` and used as needed.

In this notebook, we'll explore the **2017.h5** file from the ERA5 test dataset, identify the values we want to record and write them to a CSV file.

## Load libraries

We need the `h5py` package to read the HDF5 file. We'll also use `numpy` to work with arrays and `pandas` to build the final dataset and save it as a CSV file.

In [3]:
import h5py
import numpy as np
import pandas as pd

## Load dataset

There is one data file inside the **/data** directory. We'll open it in read-only mode (`'r'`) with `h5py.File`.

In [3]:
dataset = h5py.File('../data/globus/ERA5_test/2017.h5', 'r')

## Explore dataset

An opened HDF5 file behaves like a Python dictionary: its contents are accessed by key. So we'll start by listing the keys and, based on what we find, identify the values we want to keep.

In [4]:
dataset.keys()

<KeysViewHDF5 ['fields']>
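`keys()` only lists the top level of a file. In files with nested groups, `visititems` walks every group and dataset recursively. The sketch below builds a small throwaway file in memory (the names `demo.h5`, `meta` and `notes` are made up for illustration) so it runs without the ERA5 data; on the real file, the hypothetical helper `list_objects` could be called as `list_objects(dataset)`.

```python
import h5py
import numpy as np


def list_objects(h5file):
    """Return (path, kind, shape) for every group and dataset below the root."""
    found = []

    def visitor(name, obj):
        if isinstance(obj, h5py.Dataset):
            found.append((name, "dataset", obj.shape))
        else:
            # Anything that is not a dataset is treated as a group here.
            found.append((name, "group", None))
        # Returning None tells h5py to keep visiting.

    h5file.visititems(visitor)
    return found


# In-memory file (nothing is written to disk) standing in for 2017.h5.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("fields", data=np.zeros((4, 3, 5, 8), dtype="f4"))
    f.create_group("meta").create_dataset("notes", data=np.arange(3))
    objects = list_objects(f)

for path, kind, shape in objects:
    print(path, kind, shape)
```

For 2017.h5 this should report a single dataset, since `keys()` shows only **fields**.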

The file contains a single top-level object, **fields**. Let's access it by key and see what kind of object it is.

In [6]:
fields = dataset['fields']
fields

<HDF5 dataset "fields": shape (1460, 21, 721, 1440), type "<f4">
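**fields** is an HDF5 dataset rather than a group, so it holds one large array instead of further key-value pairs. `h5py` reads nothing into memory until the dataset is sliced, and that matters here: multiplying out the shape by the 4 bytes of each `<f4` (little-endian float32) value gives the size of the full array.

```python
import numpy as np

shape = (1460, 21, 721, 1440)        # shape reported for the "fields" dataset
itemsize = np.dtype("<f4").itemsize  # 4 bytes per float32 value

# int64 avoids overflow on platforms where the default integer is 32-bit.
n_values = int(np.prod(shape, dtype=np.int64))
n_bytes = n_values * itemsize

print(f"{n_values:,} values, {n_bytes / 1024**3:.1f} GiB in memory")
```

That is about 118.6 GiB before any CSV text overhead, so the export is better done on slices (for example one time step and one layer) than by reading `fields[:]` in one go.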

### fields

In [12]:
print("fields data: {}".format(fields))
print("fields data attributes: {}".format(list(fields.attrs)))

fields data: <HDF5 dataset "fields": shape (1460, 21, 721, 1440), type "<f4">
fields data attributes: []


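The dataset has no attributes, so the file does not label its own dimensions. Judging by the shape, the second axis holds 21 layers of data (variables such as temperature) and the last two axes (721 × 1440) form a spatial grid.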
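Since the layers are unlabelled, one way to get a feel for them is to read a single time step (`fields[0]`, about 87 MB) and summarize each layer. The hypothetical helper `layer_summary` below is a sketch; the random array only stands in for `fields[0]` so the example runs on its own. If every layer had a mean near 0 and a standard deviation near 1, that would hint the values were normalized, which is worth knowing before writing them to CSV.

```python
import numpy as np
import pandas as pd


def layer_summary(snapshot):
    """Per-layer min, max, mean and std for an array shaped (layers, lat, lon)."""
    # Flatten each layer's grid and use float64 for stable statistics.
    flat = snapshot.reshape(snapshot.shape[0], -1).astype(np.float64)
    return pd.DataFrame({
        "min": flat.min(axis=1),
        "max": flat.max(axis=1),
        "mean": flat.mean(axis=1),
        "std": flat.std(axis=1),
    }).rename_axis("layer")


# Stand-in for fields[0]; on the real file use layer_summary(fields[0]).
rng = np.random.default_rng(0)
snapshot = rng.normal(size=(21, 10, 20)).astype(np.float32)
summary = layer_summary(snapshot)
print(summary.round(3).head())
```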

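The first axis has 1460 entries, which matches one snapshot every 6 hours over the 365 days of 2017:

In [13]: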
365*24/6

1460.0
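The file stores no timestamps, so a time index has to be reconstructed. The sketch below assumes that step 0 is 2017-01-01 00:00 and that steps are exactly 6 hours apart; both are assumptions to check against the data source.

```python
import pandas as pd

# Assumption: step 0 is 2017-01-01 00:00 and steps are 6 hours apart.
times = pd.date_range(start="2017-01-01 00:00", periods=1460,
                      freq=pd.Timedelta(hours=6))

print(times[0], times[-1])
```

Under that assumption, `times[i]` is the timestamp for `fields[i]` and could become a `time` column in the CSV.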

In [16]:
fields[0, 0].shape  # first time step, first layer: one 2D grid

(721, 1440)
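A 721 × 1440 grid matches a regular 0.25° latitude-longitude grid: 180° / 0.25° + 1 = 721 latitudes (both poles included) and 360° / 0.25° = 1440 longitudes. The file doesn't store coordinates, so the arrays below assume the ordering ERA5 commonly uses (latitude from 90° down to -90°, longitude from 0° eastward); treat that as an assumption to verify. `grid_coords` is a hypothetical helper.

```python
import numpy as np

# Assumed coordinates for a 721 x 1440 grid at 0.25 degree spacing.
lats = np.linspace(90.0, -90.0, 721)  # north to south, both poles included
lons = np.arange(1440) * 0.25         # 0.0, 0.25, ..., 359.75 degrees east


def grid_coords(row, col):
    """Latitude/longitude for index (row, col) of a fields[t, c] slice."""
    return float(lats[row]), float(lons[col])


print(grid_coords(0, 0), grid_coords(360, 720))
```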

In [17]:
fields[0, 0, 0]  # first row of that grid: all 1440 columns

array([-0.6474845, -0.6474845, -0.6474845, ..., -0.6474845, -0.6474845,
       -0.6474845], dtype=float32)
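Every value printed for `fields[0, 0, 0]` looks identical. If row 0 is latitude 90°, that is expected, since all 1440 longitudes collapse onto the North Pole. The hypothetical helper below finds the rows that are constant along the second axis; it runs here on a stand-in array, and on the real data it could be called as `constant_rows(fields[0, 0])`.

```python
import numpy as np


def constant_rows(grid):
    """Indices of rows whose values are identical across every column."""
    grid = np.asarray(grid)
    # Compare each row with its own first element and keep rows that match fully.
    return np.flatnonzero((grid == grid[:, :1]).all(axis=1))


# Stand-in for fields[0, 0]: random interior, constant first and last rows.
rng = np.random.default_rng(1)
grid = rng.normal(size=(7, 12)).astype(np.float32)
grid[0, :] = -0.5
grid[-1, :] = 1.25
print(constant_rows(grid))
```

On a grid ordered from 90° to -90°, one would expect rows 0 and 720 to show up; a different result would suggest the grid is laid out differently.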

In [18]:
fields[0, 0, 0, 0]  # a single grid point

-0.6474845
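With a single value in hand, one way to reach the CSV goal is to flatten one 2D slice into long format, with one row per grid point. This is a sketch: the coordinates assume a regular 0.25° grid running from 90° to -90° latitude and 0° to 359.75° longitude, the column names and the `slice_to_frame` helper are illustrative, and a small array stands in for `fields[0, 0]`.

```python
import io

import numpy as np
import pandas as pd


def slice_to_frame(grid, lats, lons):
    """Flatten a (lat, lon) grid into rows of (lat, lon, value)."""
    # "ij" indexing gives arrays shaped like grid, so ravel() keeps rows aligned.
    lat2d, lon2d = np.meshgrid(lats, lons, indexing="ij")
    return pd.DataFrame({
        "lat": lat2d.ravel(),
        "lon": lon2d.ravel(),
        "value": np.asarray(grid).ravel(),
    })


# Small stand-in grid. For the real data (assumed coordinates):
# grid = fields[0, 0]; lats = np.linspace(90, -90, 721); lons = np.arange(1440) * 0.25
n_lat, n_lon = 5, 8
lats = np.linspace(90.0, -90.0, n_lat)
lons = np.arange(n_lon) * (360.0 / n_lon)
grid = np.arange(n_lat * n_lon, dtype=np.float32).reshape(n_lat, n_lon)

df = slice_to_frame(grid, lats, lons)

buffer = io.StringIO()  # swap for a file path such as "slice.csv"
df.to_csv(buffer, index=False)
print(buffer.getvalue().splitlines()[:3])
```

Long format reads straight back with `pd.read_csv`; a `time` column, or one column per layer, could be added the same way for the slices that are actually needed.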