# Reading from HDF5
___

HDF5 is the **Hierarchical Data Format**, version 5. Each file is composed of:
* *groups*, containing zero or more objects (datasets or other groups) together with metadata, i.e.
  * a *group header* with the group name and a list of attributes
  * a *group symbol table* with a list of the objects in the group
* *datasets*, or multidimensional arrays of data elements with metadata, composed of
  * a *header* with name, datatype, dataspace, and storage layout
  * a *data array* with the data
  
HDF5 files are commonly used for storing large datasets (such as model weights in deep learning libraries!).

___

Useful links:

* [HDF Group](http://www.hdfgroup.org/)
* [H5 Vignette](http://www.bioconductor.org/packages/release/bioc/vignettes/rhdf5/inst/doc/rhdf5.pdf)

In [1]:
library(rhdf5)

In [3]:
created_file = h5createFile("example.h5")

In [5]:
created_file

Very good.

In [6]:
created = h5createGroup("example.h5", "foo")
created = h5createGroup("example.h5", "bar")
created = h5createGroup("example.h5", "foo/foobar")

Each of these calls returns either `TRUE` or `FALSE`, depending on whether the operation was carried out successfully.
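Because the outcome is reported through the return value, it can be worth checking it explicitly before going further. A minimal sketch, assuming a scratch file made with `tempfile()` (the `h5file`, `file_ok`, and `group_ok` names are illustrative and not part of the notebook above):

```r
library(rhdf5)

# Scratch file so the example does not touch example.h5 (illustrative name)
h5file <- tempfile(fileext = ".h5")

file_ok  <- h5createFile(h5file)
group_ok <- h5createGroup(h5file, "foo")

# Stop early if either step reported failure
if (!isTRUE(file_ok) || !isTRUE(group_ok)) {
  stop("could not set up the HDF5 file")
}
```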

In [7]:
?h5createGroup

h5createGroup {rhdf5}, R Documentation

* `file`: The filename (character) of the file in which the dataset will be located. For advanced programmers it is possible to provide an object of class H5IdComponent representing a H5 location identifier (file or group). See H5Fcreate, H5Fopen, H5Gcreate, H5Gopen to create an object of this kind.
* `group`: The name of the new group. The name can contain a hierarchy of groupnames, e.g. 'group1/group2/newgroup', but the function will fail if the top level group do not exists.
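The help text says a nested group can only be created once its parent exists. One way to handle a deep path is to create each level in turn. This is a sketch under the assumption of a fresh scratch file; `path` and `parts` are hypothetical names used only for illustration:

```r
library(rhdf5)

h5file <- tempfile(fileext = ".h5")  # scratch file, illustrative
h5createFile(h5file)

path  <- "foo/bar/baz"               # hypothetical nested group
parts <- strsplit(path, "/", fixed = TRUE)[[1]]

# Create "foo", then "foo/bar", then "foo/bar/baz"
for (i in seq_along(parts)) {
  h5createGroup(h5file, paste(parts[seq_len(i)], collapse = "/"))
}

listing <- h5ls(h5file)
print(listing)
```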


In [8]:
h5ls("example.h5")

  group   name     otype dclass dim
0     /    bar H5I_GROUP
1     /    foo H5I_GROUP
2  /foo foobar H5I_GROUP
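The listing comes back as a data frame, so it can be filtered like any other. A small sketch, assuming the `otype` column holds the values shown above (`H5I_GROUP`, `H5I_DATASET`); the file and the dataset `foo/x` are made up for the example:

```r
library(rhdf5)

h5file <- tempfile(fileext = ".h5")  # scratch file, illustrative
h5createFile(h5file)
h5createGroup(h5file, "foo")
h5createGroup(h5file, "foo/foobar")
h5write(1:3, h5file, "foo/x")        # hypothetical dataset

listing  <- h5ls(h5file)

# Split the listing by object type
groups   <- listing[listing$otype == "H5I_GROUP", ]
datasets <- listing[listing$otype == "H5I_DATASET", ]

print(groups$name)
print(datasets$name)
```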


## Basic input and output

In [19]:
A = matrix(1:10, nrow=5, ncol=2)
h5write(A, "example.h5", "foo/A")
print(A)
h5ls("example.h5")

     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10


        group   name       otype   dclass       dim
0           /    bar   H5I_GROUP
1           /     df H5I_DATASET COMPOUND         5
2           /    foo   H5I_GROUP
3        /foo      A H5I_DATASET  INTEGER     5 x 2
4        /foo foobar   H5I_GROUP
5 /foo/foobar      B H5I_DATASET    FLOAT 5 x 2 x 2


In [20]:
B = array(seq(0.1,2.0,by=0.1),dim=c(5,2,2))
attr(B, "scale") <- "liter"
print(B)
h5write(B, "example.h5", "foo/foobar/B")
h5ls("example.h5")

, , 1

     [,1] [,2]
[1,]  0.1  0.6
[2,]  0.2  0.7
[3,]  0.3  0.8
[4,]  0.4  0.9
[5,]  0.5  1.0

, , 2

     [,1] [,2]
[1,]  1.1  1.6
[2,]  1.2  1.7
[3,]  1.3  1.8
[4,]  1.4  1.9
[5,]  1.5  2.0

attr(,"scale")
[1] "liter"


        group   name       otype   dclass       dim
0           /    bar   H5I_GROUP
1           /     df H5I_DATASET COMPOUND         5
2           /    foo   H5I_GROUP
3        /foo      A H5I_DATASET  INTEGER     5 x 2
4        /foo foobar   H5I_GROUP
5 /foo/foobar      B H5I_DATASET    FLOAT 5 x 2 x 2
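The listing does not show whether the `scale` attribute ended up in the file. If we read the rhdf5 documentation correctly, `h5write` only stores R attributes when asked to, so the sketch below attaches the attribute explicitly through the lower-level handles instead. It assumes `H5Fopen`, `H5Dopen`, `h5writeAttribute`, and `h5readAttributes` behave as in recent rhdf5 releases; the scratch file and dataset name are illustrative:

```r
library(rhdf5)

h5file <- tempfile(fileext = ".h5")  # scratch file, illustrative
h5createFile(h5file)

B <- array(seq(0.1, 2.0, by = 0.1), dim = c(5, 2, 2))
h5write(B, h5file, "B")

# Open the dataset and attach the attribute to it
fid <- H5Fopen(h5file)
did <- H5Dopen(fid, "B")
h5writeAttribute(attr = "liter", h5obj = did, name = "scale")
H5Dclose(did)
H5Fclose(fid)

# Read all attributes of the dataset back as a list
attrs <- h5readAttributes(h5file, "B")
print(attrs$scale)
```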


Or we can write to the top-level group...

In [12]:
df = data.frame(1L:5L, seq(0,1,length.out=5),
               c("ab","cde","fghi","a","s"), stringsAsFactors=FALSE)
print(df)

  X1L.5L seq.0..1..length.out...5. c..ab....cde....fghi....a....s..
1      1                      0.00                               ab
2      2                      0.25                              cde
3      3                      0.50                             fghi
4      4                      0.75                                a
5      5                      1.00                                s


In [13]:
h5write(df,"example.h5","df")
h5ls("example.h5")

        group   name       otype   dclass       dim
0           /    bar   H5I_GROUP
1           /     df H5I_DATASET COMPOUND         5
2           /    foo   H5I_GROUP
3        /foo      A H5I_DATASET  INTEGER     5 x 2
4        /foo foobar   H5I_GROUP
5 /foo/foobar      B H5I_DATASET    FLOAT 5 x 2 x 2
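The data frame above was built without column names, so R generated long ones from the expressions. A variant with explicit names may be easier to work with after reading it back. This is only a sketch: the column names `id`, `value`, and `label` are made up, and it assumes `h5read` returns compound datasets as data frames by default:

```r
library(rhdf5)

h5file <- tempfile(fileext = ".h5")  # scratch file, illustrative
h5createFile(h5file)

# Same data as above, but with explicit (hypothetical) column names
df <- data.frame(id    = 1L:5L,
                 value = seq(0, 1, length.out = 5),
                 label = c("ab", "cde", "fghi", "a", "s"),
                 stringsAsFactors = FALSE)

h5write(df, h5file, "df")
readdf <- h5read(h5file, "df")
print(names(readdf))
```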


In [14]:
readA = h5read("example.h5","foo/A")
readB = h5read("example.h5","foo/foobar/B")
readdf = h5read("example.h5","df")
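A quick way to confirm that a read returns what was written is to compare the two objects. A minimal sketch on a scratch file (the file and the `same_*` names are illustrative):

```r
library(rhdf5)

h5file <- tempfile(fileext = ".h5")  # scratch file, illustrative
h5createFile(h5file)

A <- matrix(1:10, nrow = 5, ncol = 2)
h5write(A, h5file, "A")

readA <- h5read(h5file, "A")

# Compare shape and values of the round-tripped matrix
same_dim    <- all(dim(readA) == dim(A))
same_values <- all(readA == A)
print(c(same_dim, same_values))
```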

## Writing and Reading Chunks

In [21]:
print(A)

     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10


In [16]:
h5write(c(12,13,14),"example.h5","foo/A",index=list(1:3,1))
h5read("example.h5", "foo/A")

     [,1] [,2]
[1,]   12    6
[2,]   13    7
[3,]   14    8
[4,]    4    9
[5,]    5   10
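The same `index` argument also works when reading, so only part of a dataset needs to be loaded. A sketch on a scratch file, assuming (as the rhdf5 documentation describes) that a `NULL` entry in the index list selects every element along that dimension:

```r
library(rhdf5)

h5file <- tempfile(fileext = ".h5")  # scratch file, illustrative
h5createFile(h5file)

A <- matrix(1:10, nrow = 5, ncol = 2)
h5write(A, h5file, "A")

# Overwrite rows 1 to 3 of the first column
h5write(c(12L, 13L, 14L), h5file, "A", index = list(1:3, 1))

# Read back only rows 1 to 3, all columns
top <- h5read(h5file, "A", index = list(1:3, NULL))
print(top)
```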

