# Introduction
<hr style="border:2px solid black"> </hr>


**What?** How to create and read HDF5 files with `h5py`



# Imports
<hr style="border:2px solid black"> </hr>

In [1]:
import h5py
import numpy as np

# Creating dataset
<hr style="border:2px solid black"> </hr>

## Creating

In [2]:
d1 = np.random.random(size = (1000,20))
d2 = np.random.random(size = (1000,200))

In [3]:
hf = h5py.File('./data/example.h5', 'w')

In [4]:
hf

<HDF5 file "example.h5" (mode r+)>

In [5]:
hf.create_dataset('dataset_1', data=d1)
hf.create_dataset('dataset_2', data=d2)

<HDF5 dataset "dataset_2": shape (1000, 200), type "<f8">

In [6]:
hf

<HDF5 file "example.h5" (mode r+)>

In [7]:
hf.close()

## Reading

In [8]:
hf = h5py.File('./data/example.h5', 'r')

In [9]:
hf.keys()

<KeysViewHDF5 ['dataset_1', 'dataset_2']>

In [10]:
hf.get('dataset_1')

<HDF5 dataset "dataset_1": shape (1000, 20), type "<f8">

In [11]:
hf.close()
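
Note that `hf.get('dataset_1')` returns a dataset handle, not the values themselves. The following is a minimal sketch of loading the values into NumPy arrays. It assumes a scratch file in a temporary directory; the path and the variable names `dset`, `full` and `first_rows` are illustrative and not part of the cells above.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, so the example does not depend on ./data/ existing.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
d1 = np.random.random(size=(1000, 20))

# The context manager closes the file even if an error occurs.
with h5py.File(path, "w") as hf:
    hf.create_dataset("dataset_1", data=d1)

with h5py.File(path, "r") as hf:
    dset = hf["dataset_1"]       # h5py.Dataset handle; nothing is read yet
    full = dset[:]               # read the whole dataset into a NumPy array
    first_rows = dset[:10, :5]   # read only a slice from disk

print(type(full), full.shape, first_rows.shape)
```

Slicing must happen while the file is still open; the resulting arrays remain usable after it is closed.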

# Creating group
<hr style="border:2px solid black"> </hr>

    
- Groups are the basic container mechanism in an HDF5 file, allowing hierarchical organisation of the data.
- Groups are created similarly to datasets, and datasets are then added using the group object.



## Creating

In [12]:
d1 = np.random.random(size = (100,33))
d2 = np.random.random(size = (100,333))
d3 = np.random.random(size = (100,3333))

In [13]:
hf = h5py.File('data.h5', 'w')

In [14]:
g1 = hf.create_group('group1')

In [15]:
g1.create_dataset('data1',data=d1)
g1.create_dataset('data2',data=d2)

<HDF5 dataset "data2": shape (100, 333), type "<f8">

In [16]:
g2 = hf.create_group('group2/subfolder')

In [17]:
g2.create_dataset('data3',data=d3)

<HDF5 dataset "data3": shape (100, 3333), type "<f8">

In [18]:
group2 = hf.get('group2/subfolder')

In [19]:
group2.items()

ItemsViewHDF5(<HDF5 group "/group2/subfolder" (1 members)>)

In [20]:
group1 = hf.get('group1')

In [21]:
group1.items()

ItemsViewHDF5(<HDF5 group "/group1" (2 members)>)
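
Because groups nest like folders, a whole file can be walked without knowing its layout in advance. The sketch below rebuilds a similar hierarchy in a temporary file and lists every dataset with `visititems`. The file path, the zero-filled arrays and the helper `record` are assumptions made for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "groups.h5")

with h5py.File(path, "w") as hf:
    g1 = hf.create_group("group1")
    g1.create_dataset("data1", data=np.zeros((100, 33)))
    # The intermediate group "group2" is created automatically.
    g2 = hf.create_group("group2/subfolder")
    g2.create_dataset("data3", data=np.zeros((100, 3333)))

found = {}

def record(name, obj):
    # visititems calls this for every group and dataset below the root.
    if isinstance(obj, h5py.Dataset):
        found[name] = obj.shape

with h5py.File(path, "r") as hf:
    hf.visititems(record)
    # A dataset can also be reached directly by its full path.
    shape = hf["group2/subfolder/data3"].shape

print(found)
```

Opening the file in a `with` block also avoids leaving it open, which is what happens to `data.h5` above if `hf` is reassigned before `hf.close()` is called.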

# Compression
<hr style="border:2px solid black"> </hr>

    
- To save disk space at the cost of some read and write speed, you can compress the data by passing the `compression` argument, which can be `gzip`, `lzf` or `szip`. `gzip` is the most portable, as it is available with every HDF5 install. `lzf` is the fastest but does not compress as effectively as `gzip`. `szip` is a patent-encumbered format used in the NASA community and is not available in every HDF5 build, so avoid it unless you know your installation supports it.
- For `gzip` you can also pass the `compression_opts` argument, which sets the compression level: an integer between 0 and 9, with a default of 4.



In [22]:
hf = h5py.File('./data/example_compressed.h5', 'w')

hf.create_dataset('dataset_1', data=d1, compression="gzip", compression_opts=9)
hf.create_dataset('dataset_2', data=d2, compression="gzip", compression_opts=9)

hf.close()
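
To see the effect on disk, the file sizes can be compared directly. This is a rough sketch under stated assumptions: it uses a zero-filled array because the random floats used above compress poorly, and the helper `write_and_measure` and the file names are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

tmpdir = tempfile.mkdtemp()
# Highly repetitive data, chosen so that compression has something to remove.
data = np.zeros((1000, 200))

def write_and_measure(name, **kwargs):
    """Write `data` to a new HDF5 file and return the file size in bytes."""
    path = os.path.join(tmpdir, name)
    with h5py.File(path, "w") as hf:
        hf.create_dataset("dataset_1", data=data, **kwargs)
    return os.path.getsize(path)

size_raw = write_and_measure("raw.h5")
size_gzip = write_and_measure("gzip.h5", compression="gzip", compression_opts=9)
size_lzf = write_and_measure("lzf.h5", compression="lzf")

print(size_raw, size_gzip, size_lzf)
```

The actual ratios depend heavily on the data, so it is worth measuring with a representative sample before settling on a filter and level.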

# References
<hr style="border:2px solid black"> </hr>


- [h5py: reading and writing HDF5 files in Python](https://www.christopherlovell.co.uk/blog/2016/04/27/h5py-intro.html)



# Requirements
<hr style="border:2px solid black"> </hr>

In [23]:
%load_ext watermark
%watermark -v -iv -m

Python implementation: CPython
Python version       : 3.9.7
IPython version      : 7.29.0

Compiler    : Clang 10.0.0 
OS          : Darwin
Release     : 22.5.0
Machine     : x86_64
Processor   : i386
CPU cores   : 12
Architecture: 64bit

h5py    : 3.6.0
autopep8: 1.6.0
json    : 2.0.9
numpy   : 1.22.1

