In [None]:
import numpy as np

from tables import *

Now it’s time for a more real-life example (i.e. with errors in the code). We will create two groups that branch directly from the root node, Particles and Events. Then, we will put three tables in each group. In Particles we will put tables based on the Particle descriptor, and in Events, tables based on the Event descriptor.

Afterwards, we will fill the tables with a number of records. Finally, we will read the newly created table `/Events/TEvent3` and select some values from it using a list comprehension.

Look at the next script (you can find it in `examples/tutorial2.py`). It appears to do all of the above, but it contains some small bugs. Note that this Particle class is not directly related to the one defined in the last tutorial; this class is simpler (note, however, the multidimensional columns called pressure and temperature).

We also introduce a new way to describe a Table: as a structured NumPy dtype (or even as a dictionary), as you can see in the Event description. See `File.create_table()` for the different kinds of descriptor objects that can be passed to this method:

# Describe a particle record (Group)

In [None]:
# Describe a particle record
class Particle(IsDescription):
    name        = StringCol(itemsize=16)  # 16-character string
    lati        = Int32Col()              # integer
    longi       = Int32Col()              # integer
    pressure    = Float32Col(shape=(2,3)) # array of floats (single-precision)
    temperature = Float64Col(shape=(2,3)) # array of doubles (double-precision)

In [None]:
# Native NumPy dtype instances are also accepted
Event = np.dtype([
    ("name"     , "S16"),
    ("TDCcount" , np.uint8),
    ("ADCcount" , np.uint16),
    ("xcoord"   , np.float32),
    ("ycoord"   , np.float32)
    ])


# And dictionaries too (this defines the same structure as above)
# Event = {
#     "name"     : StringCol(itemsize=16),
#     "TDCcount" : UInt8Col(),
#     "ADCcount" : UInt16Col(),
#     "xcoord"   : Float32Col(),
#     "ycoord"   : Float32Col(),
#     }
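
To see what the dtype-based description commits to, the `Event` dtype can be inspected with plain NumPy. This is a minimal sketch that does not touch PyTables; it relies only on NumPy's default behaviour of packing structured dtypes without padding.

```python
import numpy as np

# Same structure as the Event description above
Event = np.dtype([
    ("name", "S16"),
    ("TDCcount", np.uint8),
    ("ADCcount", np.uint16),
    ("xcoord", np.float32),
    ("ycoord", np.float32),
])

# Field names, in declaration order
print(Event.names)

# Bytes per record: 16 + 1 + 2 + 4 + 4
print(Event.itemsize)

# Per-field dtype and byte offset within one record
for field_name in Event.names:
    field_dtype, offset = Event.fields[field_name][:2]
    print(field_name, field_dtype, offset)
```

The field names printed here are the keys used later when assigning to `event[...]`.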

In [None]:
# Open a file in "w"rite mode
fileh = open_file("tmp/tutorial2.h5", mode = "w")

# Get the HDF5 root group
root = fileh.root

# Create the groups:
for groupname in ("Particles", "Events"):
    group = fileh.create_group(root, groupname)

# Create and fill the tables in Particles group

In [None]:
# Now, create and fill the tables in Particles group
gparticles = root.Particles

# Create 3 new tables
for tablename in ("TParticle1", "TParticle2", "TParticle3"):
    # Create a table
    table = fileh.create_table("/Particles", tablename, Particle, "Particles: "+tablename)
    # table = fileh.create_table(gparticles, tablename, Particle, "Particles: "+tablename)


    # Get the record object associated with the table:
    particle = table.row


    # Fill the table with 257 particles
    for i in range(257):
        # First, assign the values to the Particle record
        particle['name'] = 'Particle: %6d' % (i)
        particle['lati'] = i
        particle['longi'] = 10 - i


        ########### Detectable errors start here. Play with them!
        # particle['pressure'] = (i*np.arange(2*3)).reshape((2,4))  # Incorrect
        particle['pressure'] = (i*np.arange(2*3)).reshape((2,3)) # Correct
        ########### End of errors


        particle['temperature'] = (i**2)     # Broadcasting


        # This injects the Record values
        particle.append()


    # Flush the table buffers
    table.flush()
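
The two pressure lines and the scalar temperature assignment can be tried in isolation with NumPy alone. The sketch below assumes nothing about PyTables: it only shows that the line marked "Incorrect" already fails inside NumPy's `reshape`, and how a scalar fills a `(2, 3)` cell. The value `i = 3` is an arbitrary choice for illustration.

```python
import numpy as np

i = 3                                # arbitrary record index for illustration
values = i * np.arange(2 * 3)        # six elements: 0, 3, 6, 9, 12, 15

# Six elements fit a (2, 3) cell exactly
pressure = values.reshape((2, 3))
print(pressure.shape)

# Six elements cannot fill a (2, 4) cell, so NumPy raises ValueError
reshape_failed = False
try:
    values.reshape((2, 4))
except ValueError as exc:
    reshape_failed = True
    print("reshape failed:", exc)

# A scalar is broadcast to every element of a (2, 3) cell
temperature = np.empty((2, 3), dtype=np.float64)
temperature[...] = i ** 2
print(temperature)
```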

# Now, go for the Events group

In [None]:
# Now, go for Events:
for tablename in ("TEvent1", "TEvent2", "TEvent3"):
    # Create a table in Events group
    table = fileh.create_table(root.Events, tablename, Event, "Events: "+tablename)

    # Get the record object associated with the table:
    event = table.row

    # Fill the table with 257 events
    for i in range(257):
        # First, assign the values to the Event record
        event['name']  = 'Event: %6d' % (i)
        event['TDCcount'] = i % (1<<8)   # Correct range

        ########### Detectable errors start here. Play with them!
        # event['xcoor'] = float(i**2)     # Wrong spelling
        event['xcoord'] = float(i**2)   # Correct spelling
        # event['ADCcount'] = "sss"        # Wrong type
        event['ADCcount'] = i * 2       # Correct type
        ########### End of errors

        event['ycoord'] = float(i)**4

        # This injects the Record values
        event.append()

    # Flush the buffers
    table.flush()
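
The "Correct range" comment refers to the limits of the unsigned 8-bit `TDCcount` column. A small NumPy-only sketch shows why the modulo is needed: the loop index reaches 256, which is one past the largest value an unsigned 8-bit integer can hold.

```python
import numpy as np

# Largest value that fits in an unsigned 8-bit column
max_tdc = np.iinfo(np.uint8).max
print(max_tdc)

# The loop runs i = 0..256; the modulo keeps every value within range
counts = [i % (1 << 8) for i in range(257)]
print(max(counts) <= max_tdc)

# The last two records: 255, then the count wraps back to 0
print(counts[255], counts[256])

# ADCcount = i * 2 peaks at 512, well within the unsigned 16-bit range
max_adc = np.iinfo(np.uint16).max
print(max(i * 2 for i in range(257)) <= max_adc)
```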

# Read the records from table "/Events/TEvent3" and select some

In [None]:
# Read the records from table "/Events/TEvent3" and select some
table = root.Events.TEvent3
e = [ p['TDCcount'] for p in table if p['ADCcount'] < 20 and 4 <= p['TDCcount'] < 15 ]
p = table.read(len(table)-1, len(table))
print("Last record ==>", p)
print("Selected values ==>", e)
print("Total selected records ==> ", len(e))
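
As a cross-check on the selection, the same records can be rebuilt in memory and filtered with a NumPy boolean mask. This is a sketch under the assumption that the table holds exactly the 257 records written by the loop above; it does not read the HDF5 file.

```python
import numpy as np

Event = np.dtype([
    ("name", "S16"),
    ("TDCcount", np.uint8),
    ("ADCcount", np.uint16),
    ("xcoord", np.float32),
    ("ycoord", np.float32),
])

# Rebuild the 257 records with the same formulas as the fill loop
events = np.zeros(257, dtype=Event)
for i in range(257):
    events[i] = (
        ("Event: %6d" % i).encode("ascii"),
        i % (1 << 8),
        i * 2,
        float(i ** 2),
        float(i) ** 4,
    )

# Same condition as the list comprehension, expressed as a mask
mask = (
    (events["ADCcount"] < 20)
    & (events["TDCcount"] >= 4)
    & (events["TDCcount"] < 15)
)
selected = events["TDCcount"][mask].tolist()
print("Selected values ==>", selected)
print("Total selected records ==> ", len(selected))
```

Because `ADCcount` is `i * 2`, the first condition limits `i` to values below 10, so the upper bound on `TDCcount` never comes into play.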

In [None]:
# Finally, close the file (this also will flush all the remaining buffers!)
fileh.close()

# Using links for more convenient access to nodes
Links are special nodes that can be used to create additional paths to your existing nodes. PyTables supports three kinds of links: hard links, soft links (aka symbolic links) and external links.

Hard links let the user create additional paths to access another node in the same file, and once created, they are indistinguishable from the referred node object, except that they have different paths in the object tree. For example, if the referred node is, say, a Table object, then the new hard link will become a Table object itself. From this point on, you will be able to access the same Table object from two different paths: the original one and the new hard link path. If you delete one path to the table, you will be able to reach it via the other path.

Soft links are similar to hard links, but they keep their own personality. When you create a soft link to another node, you will get a new SoftLink object that refers to that node. However, in order to access the referred node, you need to dereference it.

Finally, external links are like soft links, with the difference that these are meant to point to nodes in external files instead of nodes in the same file. They are represented by the ExternalLink class and, like soft links, you need to dereference them in order to get access to the pointed node.

# Interactive example
Now we are going to learn how to deal with links. You can find the code used in this section in examples/links.py.

First, let’s create a file with some group structure:

In [1]:
import numpy as np

from tables import *

In [2]:
f1 = open_file('tmp/links1.h5', 'w')
g1 = f1.create_group('/', 'g1')
g2 = f1.create_group(g1, 'g2')

In [3]:
# Now, we will put some datasets on the /g1 and /g1/g2 groups:
a1 = f1.create_carray(g1, 'a1', Int64Atom(), shape=(10000,))
t1 = f1.create_table(g2, 't1', {'f1': IntCol(), 'f2': FloatCol()})

In [4]:
# We can start the party now. We are going to create a new group, say /gl, 
# where we will put our links and will start creating one hard link too:
gl = f1.create_group('/', 'gl')
ht = f1.create_hard_link(gl, 'ht', '/g1/g2/t1')  # ht points to t1
print("%s is a hard link to: %s" % (ht, t1))

/gl/ht (Table(0,)) '' is a hard link to: /g1/g2/t1 (Table(0,)) ''


You can see how we’ve created a hard link in `/gl/ht` which points to the existing table in `/g1/g2/t1`. Have a look at how the hard link is represented; it looks like a table, and actually, it is a real table. We have two different paths to access that table: the original `/g1/g2/t1` and the new one, `/gl/ht`. If we remove the original path, we can still reach the table by using the new path:

In [5]:
t1.remove()
print("table continues to be accessible in: ``%s``" % f1.get_node('/gl/ht'))

table continues to be accessible in: ``/gl/ht (Table(0,)) ''``


In [6]:
# So far so good. Now, let’s create a couple of soft links:
la1 = f1.create_soft_link(gl, 'la1', '/g1/a1')  # la1 points to a1
print("%s is a soft link to: %s" % (la1, la1.target))

lt = f1.create_soft_link(gl, 'lt', '/g1/g2/t1')  # lt points to t1
print("%s is a soft link to: %s" % (lt, lt.target))

/gl/la1 (SoftLink) -> /g1/a1 is a soft link to: /g1/a1
/gl/lt (SoftLink) -> /g1/g2/t1 (dangling) is a soft link to: /g1/g2/t1


Okay, we see how the first link `/gl/la1` points to the array `/g1/a1`. Notice how the link prints as a SoftLink, and how the referred node is stored in the target instance attribute. The second link (`/gl/lt`), pointing to `/g1/g2/t1`, has also been created successfully, but on closer inspection of its string representation, we see that it is labeled as ‘(dangling)’. Why is this? Well, you should remember that we recently removed the `/g1/g2/t1` path to access table t1. When printing it, the object knows that it points to nowhere and reports this. This is a nice way to quickly know whether a soft link points to an existing node or not.

So, let’s re-create the removed path to t1 table:

In [7]:
t1 = f1.create_hard_link(g2, 't1', '/gl/ht')
print("%s is not dangling anymore" % (lt,))

/gl/lt (SoftLink) -> /g1/g2/t1 is not dangling anymore


and the soft link is pointing to an existing node now.
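
The behaviour seen so far follows from one difference: a hard link is a second path to the same node, while a soft link only stores a path and looks it up on demand. The toy model below illustrates that idea in pure Python. It is a hypothetical sketch, not PyTables code: `ToySoftLink`, its `is_dangling` method and the `tree` dictionary are names invented for this illustration.

```python
# Toy model (hypothetical, not PyTables code): a "file" maps paths to nodes.
class ToySoftLink:
    """Stores a target path and resolves it only on demand."""

    def __init__(self, tree, target):
        self.tree = tree
        self.target = target

    def is_dangling(self):
        # The link dangles when its target path is absent from the tree
        return self.target not in self.tree

    def __call__(self):
        # Dereference: look the path up at call time
        return self.tree[self.target]


tree = {}
table = object()                      # stands in for a Table node

tree["/g1/g2/t1"] = table             # original path
tree["/gl/ht"] = table                # hard link: second path, same object
del tree["/g1/g2/t1"]                 # removing one path keeps the node alive

lt = ToySoftLink(tree, "/g1/g2/t1")   # soft link: just a stored path
dangling_before = lt.is_dangling()    # the target path no longer exists

tree["/g1/g2/t1"] = tree["/gl/ht"]    # re-create the path via the hard link
dangling_after = lt.is_dangling()     # the target path exists again

print(dangling_before, dangling_after, lt() is table)
```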

Of course, for soft links to serve any actual purpose we need a way to get the pointed node. It happens that soft links are callable, and that’s the way to get the referred nodes back:

In [8]:
plt = lt()
print("dereferenced lt node: ``%s``" % plt)

pla1 = la1()
print("dereferenced la1 node: ``%s``" % pla1)

dereferenced lt node: ``/g1/g2/t1 (Table(0,)) ''``
dereferenced la1 node: ``/g1/a1 (CArray(10000,)) ''``


Now, plt is a Python reference to the t1 table while pla1 refers to the a1 array. Easy, huh?

Let’s suppose now that a1 is an array whose access speed is critical for our application. One possible solution is to move the entire file to a faster disk, say, a solid state disk (SSD), so that access latencies can be reduced quite a lot. However, it happens that our file is too big to fit on our shiny new (although small in capacity) SSD. A solution is to copy just the a1 array into a separate file that would fit on the SSD. However, our application would then have to handle two files instead of only one, adding significantly more complexity, which is not a good thing.

External links to the rescue! As we’ve already said, external links are like soft links, but they are designed to link objects in external files. Back to our problem, let’s copy the a1 array into a different file:

In [9]:
f2 = open_file('tmp/links2.h5', 'w')
new_a1 = a1.copy(f2.root, 'a1')
f2.close()  # close the other file

In [14]:
# And now, we can remove the existing soft link and create the external link in its place:
la1.remove()
la1 = f1.create_external_link(gl, 'la1', 'links2.h5:/a1')
print("%s is an external link to: %s" % (la1, la1.target))

/gl/la1 (ExternalLink) -> links2.h5:/a1 is an external link to: links2.h5:/a1


In [17]:
# Let’s try dereferencing it:
new_a1 = la1()  # dereferencing la1 returns a1 in links2.h5
print("dereferenced la1 node:  %s" % new_a1)

dereferenced la1 node:  /a1 (CArray(10000,)) ''


In [18]:
# Well, it seems like we can access the external node. But just to make sure 
# that the node is in the other file:
print("new_a1 file:", new_a1._v_file.filename)

new_a1 file: tmp/links2.h5
