In [37]:
import os
import numpy as np

import tables

from tables import *

In [38]:
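tmp_path = 'tmp'
os.makedirs(tmp_path, exist_ok=True)  # make sure the output directory exists before creating files in it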

# Declaring a Column Descriptor
Now, imagine that we have a particle detector and we want to create a table object in order to save the data retrieved from it. You first need to define the table: the number of columns it has, what kind of object is contained in each column, and so on.

Our particle detector has a TDC (Time to Digital Converter) counter with a dynamic range of 8 bits and an ADC (Analog to Digital Converter) with a range of 16 bits. For these values, we will define two fields in our record object called TDCcount and ADCcount. We also want to save the grid position in which the particle has been detected, so we will add two new fields called grid_i and grid_j. Our instrumentation can also obtain the pressure and energy of the particle. The resolution of the pressure gauge allows us to use a single-precision float to store pressure readings, while the energy value will need a double-precision float. Finally, to track the particle we want to assign it a name that identifies the kind of particle it is, and a unique numeric identifier. So we will add two more fields: name will be a string of up to 16 characters, and idnumber will be a 64-bit integer (to allow us to store records for extremely large numbers of particles).

Having determined our columns and their types, we can now declare a new Particle class that will contain all this information:

In [39]:
class Particle(IsDescription):
    name      = StringCol(16)   # 16-character String
    idnumber  = Int64Col()      # Signed 64-bit integer
    ADCcount  = UInt16Col()     # Unsigned short integer
    TDCcount  = UInt8Col()      # unsigned byte
    grid_i    = Int32Col()      # 32-bit integer
    grid_j    = Int32Col()      # 32-bit integer
    pressure  = Float32Col()    # float  (single-precision)
    energy    = Float64Col()    # double (double-precision)

In [40]:
class GabesRandomGroup(IsDescription):
    name        = StringCol(16)
    array       = Int8Col()
    atomic_int  = Int8Col()
    regular_int = Int8Col()

This definition class is self-explanatory. Basically, you declare a class variable for each field you need. As its value you assign an instance of the appropriate Col subclass, according to the kind of column defined (the data type, the length, the shape, etc.). See The Col class and its descendants for a complete description of these subclasses. See also Supported data types in PyTables for a list of data types supported by the Col constructor.
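
As a rough cross-check of the field widths chosen above, the following sketch packs one record with the standard library's `struct` module. It does not use PyTables at all; the format string `PARTICLE_FORMAT` and the sample values are assumptions made for illustration, and the field order is simply the order of the class definition.

```python
import struct

# Hypothetical packed layout mirroring the Particle fields
# (little-endian, no padding):
#   16s = 16-byte string   (name)
#   q   = signed 64-bit    (idnumber)
#   H   = unsigned 16-bit  (ADCcount)
#   B   = unsigned 8-bit   (TDCcount)
#   i,i = signed 32-bit    (grid_i, grid_j)
#   f   = single precision (pressure)
#   d   = double precision (energy)
PARTICLE_FORMAT = "<16sqHBiifd"

record_size = struct.calcsize(PARTICLE_FORMAT)
print(record_size)  # 47 bytes per record

# Pack and unpack one sample record to confirm every value fits its field.
packed = struct.pack(
    PARTICLE_FORMAT, b"Particle:      1", 2**34, 256, 1, 1, 9, 1.0, 1.0
)
unpacked = struct.unpack(PARTICLE_FORMAT, packed)
print(unpacked)

# Largest values an 8-bit TDC and a 16-bit ADC counter can hold.
tdc_max, adc_max = 2**8 - 1, 2**16 - 1
print(tdc_max, adc_max)  # 255 65535
```

The 47-byte total is consistent with the chunkshape of 1394 rows that PyTables reports for the readout table later in this tutorial, since 1394 × 47 = 65518 bytes, just under 64 KiB. How PyTables actually picks its chunk size is not derived here.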

From now on, we can use Particle instances as a descriptor for our detector data table. We will see later on how to pass this object to construct the table. But first, we must create a file where all the actual data pushed into our table will be saved.

# Creating a PyTables file from scratch
Use the top-level **`open_file()`** function to create a PyTables file:

In [41]:
h5file = open_file(
    os.path.join(tmp_path, 'tutorial1.h5'), 
    mode="w", 
    title="Test file",
    root_uep="/"
)

h5file

File(filename=tmp/tutorial1.h5, title='Test file', mode='w', root_uep='/', filters=Filters(complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None))
/ (RootGroup) 'Test file'

**`open_file()`** is one of the objects imported by the **`from tables import *`** statement. Here, we are saying that we want to create a new file called “tutorial1.h5” inside the tmp directory (tmp_path), in “w”rite mode and with a descriptive title string (“Test file”). This function attempts to open the file, and if successful, returns the File (see The File Class) object instance h5file. The root of the object tree is specified in the instance’s root attribute.
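
A minimal sketch of the same call written as a context manager, so that the file is closed even if an error occurs. The temporary directory and the file name `sketch.h5` are assumptions chosen so that the example does not touch tutorial1.h5.

```python
import os
import tempfile

import tables

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sketch.h5")  # hypothetical file name

    # mode="w" creates a new file; an existing file of that name is replaced.
    with tables.open_file(path, mode="w", title="Test file") as f:
        file_title = f.title          # title passed to open_file()
        root_path = f.root._v_pathname  # the root group lives at "/"
        was_open = bool(f.isopen)

    # Leaving the with-block closes the file.
    still_open = bool(f.isopen)

print(file_title, root_path, was_open, still_open)
```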

# Creating a new group
Now, to better organize our data, we will create a group called detector that branches from the root node. We will save our particle data table in this group:

In [42]:
group = h5file.create_group(
    where="/",
    name='detector',
    title='Detector information',
    filters=None,
    createparents=False)
group

/detector (Group) 'Detector information'
  children := []

In [43]:
gabes_group = h5file.create_group(
    where='/',
    name='gabes_group',
    title='gabes random group')
gabes_group

/gabes_group (Group) 'gabes random group'
  children := []

Here, we have taken the File instance h5file and invoked its File.create_group() method to create a new group called detector branching from “/” (another way to refer to the h5file.root object we mentioned above). This will create a new Group (see The Group class) object instance that will be assigned to the variable group.

# Creating a new table
Let’s now create a Table (see The **[Table class](https://www.pytables.org/usersguide/libref/structured_storage.html#tableclassdescr)**) object as a branch off the newly-created group. We do that by calling the **`File.create_table()`** method of the h5file object:

In [44]:
table = h5file.create_table(
    where=group,
    name='readout',
    description=Particle,
    title="Readout example",
    filters=None,
    expectedrows=10000,
    chunkshape=None,
    byteorder=None,
    createparents=False,
    obj=None,
    track_times=True,
)
table

/detector/readout (Table(0,)) 'Readout example'
  description := {
  "ADCcount": UInt16Col(shape=(), dflt=0, pos=0),
  "TDCcount": UInt8Col(shape=(), dflt=0, pos=1),
  "energy": Float64Col(shape=(), dflt=0.0, pos=2),
  "grid_i": Int32Col(shape=(), dflt=0, pos=3),
  "grid_j": Int32Col(shape=(), dflt=0, pos=4),
  "idnumber": Int64Col(shape=(), dflt=0, pos=5),
  "name": StringCol(itemsize=16, shape=(), dflt=b'', pos=6),
  "pressure": Float32Col(shape=(), dflt=0.0, pos=7)}
  byteorder := 'little'
  chunkshape := (1394,)

In [45]:
gabes_table = h5file.create_table(
    where=gabes_group,
    name='gabes_table',
    description=GabesRandomGroup,
    title="Gabe Table Example",
    filters=None,
    expectedrows=10000,
    chunkshape=None,
    byteorder=None,
    createparents=False,
    obj=None,
    track_times=True,
)
gabes_table

/gabes_group/gabes_table (Table(0,)) 'Gabe Table Example'
  description := {
  "array": Int8Col(shape=(), dflt=0, pos=0),
  "atomic_int": Int8Col(shape=(), dflt=0, pos=1),
  "name": StringCol(itemsize=16, shape=(), dflt=b'', pos=2),
  "regular_int": Int8Col(shape=(), dflt=0, pos=3)}
  byteorder := 'little'
  chunkshape := (3449,)

We create the Table instance under group. We assign this table the node name “readout”. The Particle class declared before is the description parameter (to define the columns of the table) and finally we set “Readout example” as the Table title. With all this information, a new Table instance is created and assigned to the variable table.
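
The description argument does not have to be an IsDescription subclass. As a sketch, the example below describes the same eight columns with a NumPy structured dtype instead. The variable name `particle_dtype`, the temporary file, and the table title are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import tables

# The same eight fields as Particle, expressed as a NumPy structured dtype.
particle_dtype = np.dtype([
    ("name", "S16"),      # 16-byte string
    ("idnumber", "i8"),   # signed 64-bit integer
    ("ADCcount", "u2"),   # unsigned 16-bit integer
    ("TDCcount", "u1"),   # unsigned 8-bit integer
    ("grid_i", "i4"),     # signed 32-bit integer
    ("grid_j", "i4"),     # signed 32-bit integer
    ("pressure", "f4"),   # single-precision float
    ("energy", "f8"),     # double-precision float
])

with tempfile.TemporaryDirectory() as tmp_dir:
    with tables.open_file(os.path.join(tmp_dir, "sketch.h5"), mode="w") as f:
        t = f.create_table("/", "readout", description=particle_dtype,
                           title="Readout sketch")
        column_names = sorted(t.colnames)
        column_types = dict(t.coltypes)  # column name -> PyTables type string
        row_size = t.rowsize             # bytes per row

print(column_names)
print(row_size)
```

An IsDescription class is usually the more readable choice when the columns are written by hand. A dtype is convenient when the layout already exists as a NumPy record array.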

If you are curious about how the object tree looks right now, simply print the File instance variable h5file, and examine the output:

In [46]:
print(h5file)

tmp/tutorial1.h5 (File) 'Test file'
Last modif.: 'Mon Sep 23 10:38:05 2019'
Object Tree: 
/ (RootGroup) 'Test file'
/detector (Group) 'Detector information'
/detector/readout (Table(0,)) 'Readout example'
/gabes_group (Group) 'gabes random group'
/gabes_group/gabes_table (Table(0,)) 'Gabe Table Example'



As you can see, a dump of the object tree is displayed. It’s easy to see the Group and Table objects we have just created. If you want more information, just type the variable containing the File instance:

In [47]:
print(h5file)
print('---------------------------------------')
h5file

tmp/tutorial1.h5 (File) 'Test file'
Last modif.: 'Mon Sep 23 10:38:05 2019'
Object Tree: 
/ (RootGroup) 'Test file'
/detector (Group) 'Detector information'
/detector/readout (Table(0,)) 'Readout example'
/gabes_group (Group) 'gabes random group'
/gabes_group/gabes_table (Table(0,)) 'Gabe Table Example'

---------------------------------------


File(filename=tmp/tutorial1.h5, title='Test file', mode='w', root_uep='/', filters=Filters(complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None))
/ (RootGroup) 'Test file'
/detector (Group) 'Detector information'
/detector/readout (Table(0,)) 'Readout example'
  description := {
  "ADCcount": UInt16Col(shape=(), dflt=0, pos=0),
  "TDCcount": UInt8Col(shape=(), dflt=0, pos=1),
  "energy": Float64Col(shape=(), dflt=0.0, pos=2),
  "grid_i": Int32Col(shape=(), dflt=0, pos=3),
  "grid_j": Int32Col(shape=(), dflt=0, pos=4),
  "idnumber": Int64Col(shape=(), dflt=0, pos=5),
  "name": StringCol(itemsize=16, shape=(), dflt=b'', pos=6),
  "pressure": Float32Col(shape=(), dflt=0.0, pos=7)}
  byteorder := 'little'
  chunkshape := (1394,)
/gabes_group (Group) 'gabes random group'
/gabes_group/gabes_table (Table(0,)) 'Gabe Table Example'
  description := {
  "array": Int8Col(shape=(), dflt=0, pos=0),
  "atomic_int": Int8Col(shape=(), dflt=0, pos=1),
  "name": S

In [48]:
group

/detector (Group) 'Detector information'
  children := ['readout' (Table)]

In [49]:
gabes_group

/gabes_group (Group) 'gabes random group'
  children := ['gabes_table' (Table)]

More detailed information is displayed about each object in the tree. Note how Particle, our table descriptor class, is printed as part of the readout table description information. In general, you can obtain much more information about the objects and their children by just printing them. That introspection capability is very useful, and I recommend that you use it extensively.

The time has come to fill this table with some values. First we will get a pointer to the Row (see The Row class) instance of this table instance:

In [50]:
particle = table.row
particle

/detector/readout.row (Row), pointing to row #0

The row attribute of table points to the Row instance that will be used to write data rows into the table. We write data simply by assigning the Row instance the values for each row as if it were a dictionary (although it is actually an extension class), using the column names as keys.

Below is an example of how to write rows:

In [51]:
for i in range(10):
    particle['name']  = 'Particle: %6d' % (i)
    particle['TDCcount'] = i % 256
    particle['ADCcount'] = (i * 256) % (1 << 16)
    particle['grid_i'] = i
    particle['grid_j'] = 10 - i
    particle['pressure'] = float(i*i)
    particle['energy'] = float(particle['pressure'] ** 4)
    particle['idnumber'] = i * (2 ** 34)
    # Insert a new particle record
    particle.append()

# Flush the table buffer
table.flush()

This is how a table is filled. PyTables knows that this table is on disk, and when you add new records, they are appended to the end of the table.

If you look carefully at the code you will see that we have used the table.row attribute to create a table row and fill it with the new values. Each time that its append() method is called, the actual row is committed to the output buffer and the row pointer is incremented to point to the next table record. When the buffer is full, the data is saved on disk, and the buffer is reused again for the next cycle.

Caveat emptor: Do not forget to always call the flush() method after a write operation, or else your tables will not be updated!
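
To see why the final flush() matters, here is a toy model of a buffered writer in plain Python. It is only a sketch of the idea described above. The class `ToyRowBuffer` and its capacity of 4 rows are hypothetical and do not reflect how PyTables implements its buffers.

```python
class ToyRowBuffer:
    """Toy stand-in for a buffered table writer (hypothetical, not PyTables code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []   # rows waiting in memory
        self.on_disk = []  # rows that have been "saved"

    def append(self, row):
        self.buffer.append(row)
        # A full buffer is written out and then reused.
        if len(self.buffer) == self.capacity:
            self.flush()

    def flush(self):
        self.on_disk.extend(self.buffer)
        self.buffer.clear()


writer = ToyRowBuffer(capacity=4)
for i in range(10):
    writer.append({"TDCcount": i % 256})

saved_before_flush = len(writer.on_disk)  # the last 2 rows are still in memory
writer.flush()
saved_after_flush = len(writer.on_disk)

print(saved_before_flush, saved_after_flush)  # 8 10
```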

Let’s have a look at the rows in the table and verify that our data has been written:

In [52]:
print('%19s'%table.colnames[6], end='')
print('%14s'%table.colnames[7], end='')
print('%14s'%table.colnames[2], end='')
print('%10s'%table.colnames[3], end='')
print('%9s'%table.colnames[4], end='')
print('%12s'%table.colnames[1], end='\n')
    
for r in table.iterrows():
    print("%-16s | %11.1f | %11.4g | %6d | %6d | %8d |" %
          (r['name'], r['pressure'], r['energy'],
           r['grid_i'], r['grid_j'], r['TDCcount']))

               name      pressure        energy    grid_i   grid_j    TDCcount
b'Particle:      0' |         0.0 |           0 |      0 |     10 |        0 |
b'Particle:      1' |         1.0 |           1 |      1 |      9 |        1 |
b'Particle:      2' |         4.0 |         256 |      2 |      8 |        2 |
b'Particle:      3' |         9.0 |        6561 |      3 |      7 |        3 |
b'Particle:      4' |        16.0 |   6.554e+04 |      4 |      6 |        4 |
b'Particle:      5' |        25.0 |   3.906e+05 |      5 |      5 |        5 |
b'Particle:      6' |        36.0 |    1.68e+06 |      6 |      4 |        6 |
b'Particle:      7' |        49.0 |   5.765e+06 |      7 |      3 |        7 |
b'Particle:      8' |        64.0 |   1.678e+07 |      8 |      2 |        8 |
b'Particle:      9' |        81.0 |   4.305e+07 |      9 |      1 |        9 |


# Reading (and selecting) data in a table
Ok. We have our data on disk, and now we need to access it and select from specific columns the values we are interested in. See the example below:

In [53]:
table = h5file.root.detector.readout
pressure = [x['pressure'] for x in table.iterrows() if x['TDCcount'] > 3 and 20 <= x['pressure'] < 50]
pressure

[25.0, 36.0, 49.0]

The first line creates a “shortcut” to the readout table deeper on the object tree. As you can see, we use the natural naming schema to access it. We also could have used the h5file.get_node() method, as we will do later on.
You will recognize the last two lines as a Python list comprehension. It loops over the rows in table as they are provided by the Table.iterrows() iterator. The iterator returns values until all the data in table is exhausted. These rows are filtered using the expression:

```python
x['TDCcount'] > 3 and 20 <= x['pressure'] < 50
```

So, we are selecting the values of the pressure column from the filtered records to create the final list and assign it to the pressure variable.
We could have used a normal for loop to accomplish the same purpose, but I find comprehension syntax to be more compact and elegant.
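
For comparison, here is the same selection written as an ordinary for loop. To keep the sketch self-contained, the list `rows` of plain dictionaries stands in for the rows that Table.iterrows() would yield; it is an assumption for illustration, filled with the same values as the table above.

```python
# Stand-in for table.iterrows(): one dictionary per row.
rows = [{"TDCcount": i % 256, "pressure": float(i * i)} for i in range(10)]

pressure = []
for x in rows:
    # Same filter as in the list comprehension.
    if x["TDCcount"] > 3 and 20 <= x["pressure"] < 50:
        pressure.append(x["pressure"])

print(pressure)  # [25.0, 36.0, 49.0]
```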
PyTables does offer other, more powerful ways of performing selections, which may be more suitable if you have very large tables or if you need very high query speeds. They are called in-kernel and indexed queries, and you can use them through Table.where() and other related methods.
Let’s use an in-kernel selection to query the name column for the same set of cuts:

In [54]:
names = [ x['name'] for x in table.where("""(TDCcount > 3) & (20 <= pressure) & (pressure < 50)""") ]
names

[b'Particle:      5', b'Particle:      6', b'Particle:      7']

In-kernel and indexed queries are not only much faster, but as you can see, they also look more compact, and are among the greatest features of PyTables, so be sure that you use them a lot. See [Condition Syntax](https://www.pytables.org/usersguide/condition_syntax.html#condition-syntax) and [Accelerating your searches](https://www.pytables.org/usersguide/optimization.html#searchoptim) for more information on in-kernel and indexed selections.
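
One of the related methods is Table.read_where(), which returns the matching values in a single call. The sketch below also passes the cut values through the condvars argument instead of writing them into the condition string. The two-column table, the temporary file, and the names `lo` and `hi` are assumptions made for illustration.

```python
import os
import tempfile

import tables

with tempfile.TemporaryDirectory() as tmp_dir:
    with tables.open_file(os.path.join(tmp_dir, "sketch.h5"), mode="w") as f:
        # A reduced, hypothetical version of the readout table.
        t = f.create_table("/", "readout", {
            "TDCcount": tables.UInt8Col(),
            "pressure": tables.Float32Col(),
        })
        row = t.row
        for i in range(10):
            row["TDCcount"] = i % 256
            row["pressure"] = float(i * i)
            row.append()
        t.flush()

        # "lo" and "hi" in the condition are looked up in condvars.
        selected = t.read_where(
            "(TDCcount > lo) & (pressure < hi)",
            condvars={"lo": 3, "hi": 50.0},
            field="pressure",
        )
        result = selected.tolist()

print(result)  # [16.0, 25.0, 36.0, 49.0]
```

Note that this condition drops the lower pressure cut used earlier, so the row with pressure 16.0 is included as well.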

**Note:**
Take special care when the query condition includes string literals. Python 2 string literals are strings of bytes, while Python 3 strings are unicode objects.
With reference to the above definition of Particle, note that the type of the “name” column does not change depending on the Python version used (of course). It always corresponds to strings of bytes.
Any condition involving the “name” column should be written using the appropriate type for string literals in order to avoid TypeErrors.
Suppose one wants to get the rows corresponding to specific particle names.
The code below will work fine in Python 2 but will fail with a TypeError in Python 3:

```python
condition = '(name == "Particle:      5") | (name == "Particle:      7")'
for record in table.where(condition):  # TypeError in Python 3
    pass  # do something with "record"
```
The reason is that in Python 3 “condition” implies a comparison between a string of bytes (the “name” column contents) and a unicode literal.
The correct way to write the condition is:
```python
condition = '(name == b"Particle:      5") | (name == b"Particle:      7")'
```
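
The bytes versus unicode distinction can be seen in plain Python 3, independently of PyTables. The short sketch below uses a hypothetical variable `stored` to stand for one value read from the “name” column, which is also why the names printed earlier carry a b'...' prefix.

```python
stored = b"Particle:      5"  # what a StringCol value looks like in Python 3

equal_to_str = stored == "Particle:      5"     # bytes never equal a str
equal_to_bytes = stored == b"Particle:      5"  # same type, same content
decoded = stored.decode("ascii")                # convert to str for display

print(equal_to_str, equal_to_bytes, decoded)
```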
That’s enough about selections for now. The next section will show you how to save these selected results to a file.

# Creating new array objects
In order to separate the selected data from the mass of detector data, we will create a new group columns branching off the root group. Afterwards, under this group, we will create two arrays that will contain the selected data. First, we create the group:

In [55]:
gcolumns = h5file.create_group(h5file.root, "columns", "Pressure and Name")

Note that this time we have specified the first parameter using natural naming (h5file.root) instead of with an absolute path string (“/”).
Now, create the first of the two Array objects we’ve just mentioned:

In [56]:
h5file.create_array(where=gcolumns, 
                    name='pressure', 
                    obj=pressure, 
                    title="Pressure column selection")

/columns/pressure (Array(3,)) 'Pressure column selection'
  atom := Float64Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'python'
  byteorder := 'little'
  chunkshape := None

We already know the first two parameters of the File.create_array() method (these are the same as the first two in create_table): they are the parent group where the Array will be created and the Array instance name. The third parameter is the object we want to save to disk. In this case, it is the Python list of pressure values we selected before (note the 'python' flavor in the output above). The fourth parameter is the title.
Now, we will save the second array. It contains the list of strings we selected before: we save this object as-is, with no further conversion:

In [57]:
h5file.create_array(gcolumns, 'name', names, "Name column selection")

/columns/name (Array(3,)) 'Name column selection'
  atom := StringAtom(itemsize=16, shape=(), dflt=b'')
  maindim := 0
  flavor := 'python'
  byteorder := 'irrelevant'
  chunkshape := None

As you can see, `File.create_array()` accepts names (which is a regular Python list) as an object parameter. Actually, it accepts a variety of different regular objects (see create_array()) as parameters. The flavor attribute (see the output above) saves the original kind of object that was saved. Based on this flavor, PyTables will be able to retrieve exactly the same object from disk later on.

Note that in these examples, the `create_array` method returns an Array instance that is not assigned to any variable. Don’t worry, this is intentional to show the kind of object we have created by displaying its representation. The Array objects have been attached to the object tree and saved to disk, as you can see if you print the complete object tree:

In [60]:
print(h5file)

tmp/tutorial1.h5 (File) 'Test file'
Last modif.: 'Mon Sep 23 10:38:09 2019'
Object Tree: 
/ (RootGroup) 'Test file'
/columns (Group) 'Pressure and Name'
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/detector (Group) 'Detector information'
/detector/readout (Table(10,)) 'Readout example'
/gabes_group (Group) 'gabes random group'
/gabes_group/gabes_table (Table(0,)) 'Gabe Table Example'



# Closing the file and looking at its content
To finish this first tutorial, we use the close method of the h5file File object to close the file before exiting Python:

In [80]:
h5file.close()

You have now created your first PyTables file with a table and two arrays. You can examine it with any generic HDF5 tool, such as h5dump or h5ls. Here is what the tutorial1.h5 looks like when read with the h5ls program. 

In [82]:
! h5ls -rd tmp/tutorial1.h5

/                        Group
/columns                 Group
/columns/name            Dataset {3}
    Data:
        (0) "Particle:      5", "Particle:      6", "Particle:      7"
/columns/pressure        Dataset {3}
    Data:
        (0) 25, 36, 49
/detector                Group
/detector/readout        Dataset {10/Inf}
    Data:
        (0) {0, 0, 0, 0, 10, 0, "Particle:      0", 0},
        (1) {256, 1, 1, 1, 9, 17179869184, "Particle:      1", 1},
        (2) {512, 2, 256, 2, 8, 34359738368, "Particle:      2", 4},
        (3) {768, 3, 6561, 3, 7, 51539607552, "Particle:      3", 9},
        (4) {1024, 4, 65536, 4, 6, 68719476736, "Particle:      4", 16},
        (5) {1280, 5, 390625, 5, 5, 85899345920, "Particle:      5", 25},
        (6) {1536, 6, 1679616, 6, 4, 103079215104, "Particle:      6", 36},
        (7) {1792, 7, 5764801, 7, 3, 120259084288, "Particle:      7", 49},
        (8) {2048, 8, 16777216, 8, 2, 137438953472, "Particle:      8", 64},
        (

Here’s the output as displayed by the “ptdump” PyTables utility (located in the utils/ directory).

In [87]:
! ptdump tmp/tutorial1.h5

/ (RootGroup) 'Test file'
/columns (Group) 'Pressure and Name'
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/detector (Group) 'Detector information'
/detector/readout (Table(10,)) 'Readout example'
/gabes_group (Group) 'gabes random group'
/gabes_group/gabes_table (Table(0,)) 'Gabe Table Example'


You can pass the -v or -d options to ptdump if you want more verbosity. Try them out!
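
A similar listing can be produced from Python by reopening a file in read-only mode and walking its nodes. The sketch below first writes a small stand-in file, so it does not depend on tutorial1.h5; the file name `sketch.h5` is an assumption, while the group and array names follow the ones used above.

```python
import os
import tempfile

import tables

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sketch.h5")  # hypothetical stand-in file

    # Write a tiny file with one group and one array, then close it.
    with tables.open_file(path, mode="w", title="Test file") as f:
        g = f.create_group("/", "columns", "Pressure and Name")
        f.create_array(g, "pressure", [25.0, 36.0, 49.0],
                       "Pressure column selection")

    # Reopen read-only and list every node in the object tree.
    with tables.open_file(path, mode="r") as f:
        listing = [node._v_pathname for node in f.walk_nodes("/")]
        values = f.root.columns.pressure.read()

print(listing)
print(values)  # the array was saved from a list, so a list comes back
```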