http://www.pytables.org/usersguide/tutorials.html

There are eight kinds of types supported by PyTables:

* bool: Boolean (true/false) types. Supported precisions: 8 (default) bits.
* int: Signed integer types. Supported precisions: 8, 16, 32 (default) and 64 bits.
* uint: Unsigned integer types. Supported precisions: 8, 16, 32 (default) and 64 bits.
* float: Floating point types. Supported precisions: 16, 32, 64 (default) bits and extended precision floating point (see note on floating point types).
* complex: Complex number types. Supported precisions: 64 (32+32), 128 (64+64, default) bits and extended precision complex (see note on floating point types).
* string: Raw string types. Supported precisions: positive multiples of 8 bits.
* time: Date/time types. Supported precisions: 32 and 64 (default) bits.
* enum: Enumerated types. Precision depends on base type.
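As a minimal sketch of how these kinds and precisions map onto PyTables objects, the snippet below builds atoms with `Atom.from_kind`. The item size is given in bytes, so 16 bits corresponds to `itemsize=2`. Only the int, uint and float kinds are shown; the variable names are illustrative.

```python
import tables

# Kinds with a default precision can be created without an item size.
for kind in ("int", "uint", "float"):
    atom = tables.Atom.from_kind(kind)
    print(kind, "default:", atom.itemsize * 8, "bits")

int_default = tables.Atom.from_kind("int")
float_default = tables.Atom.from_kind("float")

# A non-default precision is requested through the item size in bytes.
int16_atom = tables.Atom.from_kind("int", itemsize=2)

# Strings have no default size, so the item size is given explicitly.
string_atom = tables.StringAtom(itemsize=16)
```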

In [1]:
from tables import *
import numpy as np
from array import array

Having determined our columns and their types, we can now declare a new Particle class that will contain all this information:

In [2]:
class Particle(IsDescription):
    name      = StringCol(16)   # 16-character String
    idnumber  = Int64Col()      # Signed 64-bit integer
    ADCcount  = UInt16Col()     # Unsigned short integer
    TDCcount  = UInt8Col()      # unsigned byte
    grid_i    = Int32Col()      # 32-bit integer
    grid_j    = Int32Col()      # 32-bit integer
    pressure  = Float32Col()    # float  (single-precision)
    energy    = Float64Col()    # double (double-precision)
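To check a description before creating a table, one option is to inspect the `columns` dictionary that `IsDescription` subclasses expose. The sketch below repeats the Particle declaration with explicit `tables.` prefixes and sums the column sizes; `record_size` is an illustrative name, not part of PyTables.

```python
import tables

class Particle(tables.IsDescription):
    name = tables.StringCol(16)     # 16-character string
    idnumber = tables.Int64Col()    # signed 64-bit integer
    ADCcount = tables.UInt16Col()   # unsigned short integer
    TDCcount = tables.UInt8Col()    # unsigned byte
    grid_i = tables.Int32Col()      # 32-bit integer
    grid_j = tables.Int32Col()      # 32-bit integer
    pressure = tables.Float32Col()  # single-precision float
    energy = tables.Float64Col()    # double-precision float

# Each entry maps a column name to its column definition.
for name in sorted(Particle.columns):
    col = Particle.columns[name]
    print(f"{name:10s} {col.dtype} ({col.itemsize} bytes)")

# Sum of the per-column sizes, in bytes.
record_size = sum(col.itemsize for col in Particle.columns.values())
print("bytes per record:", record_size)
```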

Next, we create a file named tutorial1.h5 in write mode ("w") and give it the title "Test file":

In [3]:
h5file = open_file("tutorial1.h5", mode = "w", title = "Test file")
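Outside a notebook session, the file can also be opened in a `with` block so that it is closed automatically. This is a sketch under the assumption that a temporary directory is an acceptable location; the path handling is illustrative and not part of the tutorial.

```python
import os
import tempfile

import tables

# Write into a temporary directory to avoid clobbering an existing file.
path = os.path.join(tempfile.mkdtemp(), "tutorial1.h5")

with tables.open_file(path, mode="w", title="Test file") as h5file:
    title_seen = h5file.title
    open_inside = bool(h5file.isopen)

# Leaving the block closes the file.
open_after = bool(h5file.isopen)
print(title_seen, open_inside, open_after)
```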

Now, to better organize our data, we will create a group called detector that branches from the root node. We will save our particle data table in this group:

In [4]:
group = h5file.create_group("/", 'detector', 'Detector information')

Let’s now create a Table (see The Table class) object as a branch off the newly-created group. We do that by calling the File.create_table() method of the h5file object:

In [7]:
table = h5file.create_table(group, 'readout', Particle, "Readout example")

We create the Table instance under group. We assign this table the node name “readout”. The Particle class declared before is the description parameter (to define the columns of the table) and finally we set “Readout example” as the Table title. With all this information, a new Table instance is created and assigned to the variable table.

In [8]:
print(h5file)

tutorial1.h5 (File) 'Test file'
Last modif.: 'Tue Dec  5 09:09:07 2017'
Object Tree: 
/ (RootGroup) 'Test file'
/detector (Group) 'Detector information'
/detector/readout (Table(0,)) 'Readout example'



In [9]:
h5file

File(filename=tutorial1.h5, title='Test file', mode='w', root_uep='/', filters=Filters(complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None))
/ (RootGroup) 'Test file'
/detector (Group) 'Detector information'
/detector/readout (Table(0,)) 'Readout example'
  description := {
  "ADCcount": UInt16Col(shape=(), dflt=0, pos=0),
  "TDCcount": UInt8Col(shape=(), dflt=0, pos=1),
  "energy": Float64Col(shape=(), dflt=0.0, pos=2),
  "grid_i": Int32Col(shape=(), dflt=0, pos=3),
  "grid_j": Int32Col(shape=(), dflt=0, pos=4),
  "idnumber": Int64Col(shape=(), dflt=0, pos=5),
  "name": StringCol(itemsize=16, shape=(), dflt=b'', pos=6),
  "pressure": Float32Col(shape=(), dflt=0.0, pos=7)}
  byteorder := 'little'
  chunkshape := (1394,)
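The description above lists the columns alphabetically (ADCcount at pos=0, pressure at pos=7) rather than in declaration order. If a specific order matters, the column constructors accept a `pos` argument. The sketch below uses a shortened, hypothetical `OrderedParticle` class to show the effect.

```python
import os
import tempfile

import tables

class OrderedParticle(tables.IsDescription):  # hypothetical example class
    name = tables.StringCol(16, pos=0)
    idnumber = tables.Int64Col(pos=1)
    pressure = tables.Float32Col(pos=2)

path = os.path.join(tempfile.mkdtemp(), "ordered.h5")

with tables.open_file(path, mode="w") as h5file:
    table = h5file.create_table("/", "readout", OrderedParticle)
    # colnames follows the positions given in the description.
    colnames = list(table.colnames)

print(colnames)
```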

The time has come to fill this table with some values. First we will get a pointer to the Row (see The Row class) instance of this table instance:

In [10]:
particle = table.row

We write data simply by assigning the Row instance the values for each row as if it were a dictionary (although it is actually an extension class), using the column names as keys.

In [38]:
for i in range(10):
    particle['name']  = 'Particle: %6d' % (i)
    particle['TDCcount'] = i % 256
    particle['ADCcount'] = (i * 256) % (1 << 16)
    particle['grid_i'] = i
    particle['grid_j'] = 10 - i
    particle['pressure'] = float(i*i)
    particle['energy'] = float(particle['pressure'] ** 4)
    particle['idnumber'] = i * (2 ** 34)
    # Insert a new particle record
    particle.append()

This code should be easy to understand. The lines inside the loop just assign values to the different columns in the Row instance particle (see The Row class). A call to its append() method writes this information to the table I/O buffer.

After we have processed all our data, we should flush the table’s I/O buffer if we want to write all this data to disk. We achieve that by calling the table.flush() method:

In [39]:
table.flush()
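The append-then-flush pattern can be checked in isolation. The following sketch uses a one-column table described by a dictionary and a temporary file (both assumptions made for brevity) and confirms the row count after flushing.

```python
import os
import tempfile

import tables

path = os.path.join(tempfile.mkdtemp(), "flush_demo.h5")

with tables.open_file(path, mode="w") as h5file:
    table = h5file.create_table("/", "readout", {"grid_i": tables.Int32Col()})
    row = table.row
    for i in range(10):
        row["grid_i"] = i
        row.append()   # goes to the table I/O buffer
    table.flush()      # writes the buffered rows to disk
    nrows = table.nrows
    values = table.col("grid_i").tolist()

print(nrows, values)
```

Running the fill loop a second time before closing the file would append ten more rows rather than overwrite the first ten.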

## Reading (and selecting) data in a table
Ok. We have our data on disk, and now we need to access it and select from specific columns the values we are interested in. See the example below:

In [40]:
table = h5file.root.detector.readout
pressure = [x['pressure'] for x in table.iterrows() if x['TDCcount'] > 3 and 20 <= x['pressure'] < 50]
pressure

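[25.0, 36.0, 49.0, 25.0, 36.0, 49.0]

Each value appears twice here, which suggests that the fill loop was run twice in this session; a single run of the loop yields [25.0, 36.0, 49.0].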

PyTables offers other, more powerful ways of performing selections, which may be more suitable if you have very large tables or need very high query speeds. They are called in-kernel and indexed queries, and you can use them through Table.where() and other related methods.

Let’s use an in-kernel selection to query the name column for the same set of cuts:

In [41]:
names = [ x['name'] for x in table.where("""(TDCcount > 3) & (20 <= pressure) & (pressure < 50)""") ]
names

[b'Particle:      5',
 b'Particle:      6',
 b'Particle:      7',
 b'Particle:      5',
 b'Particle:      6',
 b'Particle:      7']
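To compare the two selection styles side by side, the sketch below fills a reduced three-column version of the table once and runs the same cuts through a Python-side loop, `Table.where()` and `Table.read_where()`. The reduced description and the temporary file are assumptions made to keep the example short.

```python
import os
import tempfile

import tables

class Particle(tables.IsDescription):
    name = tables.StringCol(16)
    TDCcount = tables.UInt8Col()
    pressure = tables.Float32Col()

path = os.path.join(tempfile.mkdtemp(), "query_demo.h5")
condition = "(TDCcount > 3) & (20 <= pressure) & (pressure < 50)"

with tables.open_file(path, mode="w") as h5file:
    table = h5file.create_table("/", "readout", Particle)
    row = table.row
    for i in range(10):
        row["name"] = "Particle: %6d" % i
        row["TDCcount"] = i % 256
        row["pressure"] = float(i * i)
        row.append()
    table.flush()

    # Selection evaluated row by row in Python.
    python_side = [float(r["pressure"]) for r in table.iterrows()
                   if r["TDCcount"] > 3 and 20 <= r["pressure"] < 50]
    # Same cuts evaluated in-kernel.
    in_kernel = [float(r["pressure"]) for r in table.where(condition)]
    # read_where returns the matching rows as a structured array.
    selected = table.read_where(condition)
    names = [n.decode() for n in selected["name"]]

print(python_side, in_kernel, names)
```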

## Creating new array objects
In order to separate the selected data from the mass of detector data, we will create a new group columns branching off the root group. Afterwards, under this group, we will create two arrays that will contain the selected data. First, we create the group:

In [42]:
gcolumns = h5file.create_group(h5file.root, "columns", "Pressure and Name")

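We then create the first of the two arrays. The data must be wrapped with np.array rather than the standard-library array imported above, whose constructor expects a type code as its first argument and raises "TypeError: array() argument 1 must be a unicode character, not list" when given a list:

In [45]:
h5file.create_array(gcolumns, 'pressure', np.array(pressure), "Pressure column selection")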
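As a self-contained sketch of this step, the following contrasts the standard-library `array.array` constructor with `np.array` and then stores the NumPy array with `create_array`. The sample values and the temporary file path are assumptions for illustration.

```python
import array
import os
import tempfile

import numpy as np
import tables

pressure = [25.0, 36.0, 49.0]

# array.array needs a type code first, so passing only a list fails.
stdlib_error = None
try:
    array.array(pressure)
except TypeError as exc:
    stdlib_error = str(exc)

# With a type code ("d" for double) the stdlib constructor works.
typed = array.array("d", pressure)

path = os.path.join(tempfile.mkdtemp(), "array_demo.h5")
with tables.open_file(path, mode="w") as h5file:
    gcolumns = h5file.create_group(h5file.root, "columns", "Pressure and Name")
    node = h5file.create_array(gcolumns, "pressure", np.array(pressure),
                               "Pressure column selection")
    stored = node.read().tolist()
    shape = node.shape

print(stdlib_error)
print(stored, shape)
```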

## Closing the file and looking at its content
To finish this first tutorial, we use the close method of the h5file File object to close the file before exiting Python:

In [46]:
h5file.close()
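Once the file is closed, its content can be inspected by reopening it in read-only mode. The sketch below writes a small file first so that it runs on its own; the node name `samples` and the temporary path are hypothetical.

```python
import os
import tempfile

import tables

path = os.path.join(tempfile.mkdtemp(), "reopen_demo.h5")

# Write a small file and let the with block close it.
with tables.open_file(path, mode="w", title="Test file") as h5file:
    group = h5file.create_group("/", "detector", "Detector information")
    h5file.create_array(group, "samples", [1, 2, 3], "Example array")

# Reopen read-only and list every node in the object tree.
with tables.open_file(path, mode="r") as h5file:
    paths = sorted(node._v_pathname for node in h5file.walk_nodes("/"))
    samples = list(h5file.get_node("/detector/samples").read())

print(paths)
print(samples)
```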

To browse the data, check out ViTables: https://anaconda.org/conda-forge/vitables