# Purpose of *PyTables*

http://www.pytables.org/

*PyTables* provides more database-like access to HDF5 data, including:

- indexing for fast searches
- fast "in-kernel" queries on dataset contents
- custom system to represent data types
- use of types in HDF5 which have no equivalent in NumPy (like enum types)
- redo/undo feature

Files written with *PyTables* are HDF5 files, but *PyTables* adds its own layer on top of HDF5 (extra attributes, index structures), so some of this data is probably only fully usable with *PyTables*.


# Comparison to h5py

From h5py's perspective: 

http://docs.h5py.org/en/latest/faq.html#what-s-the-difference-between-h5py-and-pytables

From PyTables' perspective: 

http://www.pytables.org/FAQ.html#how-does-pytables-compare-with-the-h5py-project

# Installation

Conda:

    conda install tables

Alternatives:

http://www.pytables.org/usersguide/installation.html

Again, installing from source can be tedious; independent repositories also provide pre-built binaries
for Windows.
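A quick way to check that an installation works, whichever route was used. This is a small sketch assuming a PyTables 3.x install (the API used throughout this notebook):

```python
import tables

# Version string of the installed PyTables package
print(tables.__version__)

# Also reports the versions of HDF5, NumPy, numexpr and the available
# compression libraries, which helps when debugging an installation
tables.print_versions()
```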

# Using custom data types

A shortened version of the example from the PyTables documentation:

In [18]:
import tables
from tables import IsDescription, StringCol, Int64Col, Float32Col

In [29]:
class Particle(IsDescription):
    name = StringCol(16)
    idnumber = Int64Col()
    pressure = Float32Col()
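Note that columns declared without a position are stored in alphabetical order (the attribute listing further down shows `idnumber`, `name`, `pressure`). A minimal sketch of pinning the order with the `pos` argument; `OrderedParticle` and the temporary file are hypothetical names used only for illustration:

```python
import os
import tempfile

import tables
from tables import IsDescription, StringCol, Int64Col, Float32Col


class Particle(IsDescription):
    name = StringCol(16)
    idnumber = Int64Col()
    pressure = Float32Col()


# Hypothetical variant of Particle that fixes the column order explicitly
class OrderedParticle(IsDescription):
    name = StringCol(16, pos=0)
    idnumber = Int64Col(pos=1)
    pressure = Float32Col(pos=2)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "column-order.h5")
    with tables.open_file(path, mode="w") as h5:
        plain = h5.create_table("/", "plain", Particle)
        ordered = h5.create_table("/", "ordered", OrderedParticle)
        # Without pos: alphabetical order of the attribute names
        plain_order = plain.colnames
        # With pos: the declared positions decide the order
        ordered_order = ordered.colnames

print(plain_order)    # ['idnumber', 'name', 'pressure']
print(ordered_order)  # ['name', 'idnumber', 'pressure']
```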
    

In [96]:
h5 = tables.open_file("example-pytables.h5", mode='w', title='Demonstration')

In [97]:
group = h5.create_group("/", "detector", "This is data from the detector.") # <-- 3rd argument is the group title (stored as its TITLE attribute)

In [98]:
h5.flush()

In [99]:
table = h5.create_table(group, 'readout', Particle, 'Readout example')
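`create_table` does not require an `IsDescription` subclass; a NumPy structured dtype can serve as the description too. A sketch assuming the same layout as `Particle` (the dtype and file name are illustrative); here the column order follows the dtype's field order:

```python
import os
import tempfile

import numpy as np
import tables

# Structured dtype with the same fields as the Particle description
particle_dtype = np.dtype([
    ("name", "S16"),
    ("idnumber", "<i8"),
    ("pressure", "<f4"),
])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dtype-table.h5")
    with tables.open_file(path, mode="w") as h5:
        group = h5.create_group("/", "detector", "This is data from the detector.")
        table = h5.create_table(group, "readout", particle_dtype, "Readout example")
        colnames = table.colnames
        # Maps each column name to its PyTables type string
        coltypes = dict(table.coltypes)

print(colnames)
print(coltypes)
```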

Getting the table's row accessor (a `Row` object used to fill in and append rows):

In [100]:
particle = table.row

Appending one million rows:

In [101]:
import random
for i in range(1000000):
    particle['name'] = 'part{:6d}'.format(i)
    particle['idnumber'] = i
    particle['pressure'] = 1000*random.random()
    particle.append()
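Filling rows one by one through `Row` is convenient; when the data already exists as an array, `Table.append` can also take a whole structured array in one call. A sketch with a smaller, hypothetical row count and a seeded random generator standing in for `random.random()`:

```python
import os
import tempfile

import numpy as np
import tables
from tables import IsDescription, StringCol, Int64Col, Float32Col


class Particle(IsDescription):
    name = StringCol(16)
    idnumber = Int64Col()
    pressure = Float32Col()


n = 1000  # illustrative size
rng = np.random.default_rng(0)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "batch-append.h5")
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "readout", Particle, expectedrows=n)

        # Build all rows in memory using the table's own NumPy dtype
        rows = np.zeros(n, dtype=table.dtype)
        rows["name"] = ["part{:6d}".format(i) for i in range(n)]
        rows["idnumber"] = np.arange(n)
        rows["pressure"] = 1000 * rng.random(n)

        # One append() call writes the whole block
        table.append(rows)
        table.flush()
        nrows = table.nrows

print(nrows)
```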

In [102]:
table.flush()
# h5.flush() would flush every node in the file, not only this table

# Querying data

In [114]:
table = h5.root.detector.readout
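The attribute-style access above is PyTables' "natural naming". A sketch of the equivalent path-based lookups, which are handy when node names are held in variables (file name is illustrative):

```python
import os
import tempfile

import tables
from tables import IsDescription, StringCol, Int64Col, Float32Col


class Particle(IsDescription):
    name = StringCol(16)
    idnumber = Int64Col()
    pressure = Float32Col()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nodes.h5")
    with tables.open_file(path, mode="w") as h5:
        group = h5.create_group("/", "detector", "This is data from the detector.")
        h5.create_table(group, "readout", Particle, "Readout example")

        by_attr = h5.root.detector.readout            # natural naming
        by_path = h5.get_node("/detector/readout")    # full path string
        by_parent = h5.get_node("/detector", "readout")  # parent + child name

        paths = {by_attr._v_pathname, by_path._v_pathname, by_parent._v_pathname}
        # Walk the hierarchy and collect every Table node
        table_paths = [n._v_pathname for n in h5.walk_nodes("/", classname="Table")]

print(paths)
print(table_paths)
```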

A "regular" query using a list comprehension (every row is read and tested in Python):

In [115]:
%timeit names = [x['name'] for x in table if x['pressure']>=20 and x['pressure']<50 and x['idnumber']<1000]

1 loop, best of 3: 1.43 s per loop


In [116]:
query = "(20 <= pressure) & (pressure < 50) & (idnumber<1000)"

In [117]:
%timeit names = [ x['name'] for x in table.where(query) ]

1000 loops, best of 3: 915 µs per loop


In [108]:
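names = [x['name'] for x in table.where(query)]  # %timeit does not keep variables assigned in the timed statement
len(names)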

30
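Hard-coded limits in the condition string can be replaced by variables passed through `condvars`, and `read_where` / `get_where_list` return a NumPy array or row coordinates instead of an iterator. A sketch on synthetic data (seeded random pressures, smaller row count), cross-checked against a plain NumPy mask:

```python
import os
import tempfile

import numpy as np
import tables

dtype = np.dtype([("name", "S16"), ("idnumber", "<i8"), ("pressure", "<f4")])
rng = np.random.default_rng(42)
data = np.zeros(10000, dtype=dtype)
data["idnumber"] = np.arange(len(data))
data["pressure"] = 1000 * rng.random(len(data))
data["name"] = ["part{:6d}".format(i) for i in range(len(data))]

condition = "(lo <= pressure) & (pressure < hi) & (idnumber < max_id)"
cv = {"lo": 20.0, "hi": 50.0, "max_id": 1000}  # bound into the condition

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "query.h5")
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "readout", obj=data)
        # Matching values of a single column, as a NumPy array
        names = table.read_where(condition, condvars=cv, field="name")
        # Row numbers of the matching rows
        coords = table.get_where_list(condition, condvars=cv)

# Same selection done with NumPy, for comparison
mask = (data["pressure"] >= 20) & (data["pressure"] < 50) & (data["idnumber"] < 1000)
print(len(names), np.count_nonzero(mask))
```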

During development, much effort went into optimizing these "in-kernel" queries with the hardware architecture in mind and by using different compression methods:

http://www.pytables.org/usersguide/optimization.html#searchoptim
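Compression is configured per table (or per file) with a `tables.Filters` object, and in-kernel queries run unchanged on compressed data. A sketch; the chosen `complevel=5` with zlib and shuffle is an arbitrary example setting, not a recommendation:

```python
import os
import tempfile

import numpy as np
import tables

dtype = np.dtype([("idnumber", "<i8"), ("pressure", "<f4")])
data = np.zeros(100_000, dtype=dtype)
data["idnumber"] = np.arange(len(data))
data["pressure"] = np.linspace(0, 1000, len(data), dtype=np.float32)

# Example filter settings (assumed values)
filters = tables.Filters(complevel=5, complib="zlib", shuffle=True)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "compressed.h5")
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "readout", obj=data, filters=filters)
        stored = (table.filters.complib, table.filters.complevel)
        # The query is evaluated on chunks that are decompressed on the fly
        count = len(table.get_where_list("pressure < 100"))

print(stored, count)
```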

In [109]:
table.will_query_use_indexing(query)

frozenset()

In [110]:
table.cols.pressure.create_index()
table.cols.idnumber.create_index()

1000000

In [111]:
table.will_query_use_indexing(query)

frozenset({'idnumber', 'pressure'})

In [112]:
%timeit names = [ x['name'] for x in table.where(query) ]

The slowest run took 58.59 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 913 µs per loop
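`create_index()` builds an index with default settings (`kind="medium"`, `optlevel=6`). A completely sorted index (CSI) additionally allows reading a table in the order of a column. A sketch on synthetic data; the sizes and column choice are illustrative:

```python
import os
import tempfile

import numpy as np
import tables

dtype = np.dtype([("idnumber", "<i8"), ("pressure", "<f4")])
rng = np.random.default_rng(0)
data = np.zeros(50_000, dtype=dtype)
data["idnumber"] = np.arange(len(data))
data["pressure"] = 1000 * rng.random(len(data))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "indexed.h5")
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "readout", obj=data)
        table.cols.idnumber.create_index()      # default index
        table.cols.pressure.create_csindex()    # completely sorted index
        indexed = (table.cols.idnumber.is_indexed, table.cols.pressure.is_indexed)
        uses = table.will_query_use_indexing("pressure < 10")
        # Pressure values returned in ascending order, using the CSI
        by_pressure = table.read_sorted("pressure", field="pressure")

print(indexed, uses)
```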



# Reading and writing metadata


In [118]:
table.attrs

/detector/readout._v_attrs (AttributeSet), 10 attributes:
   [CLASS := 'TABLE',
    FIELD_0_FILL := 0,
    FIELD_0_NAME := 'idnumber',
    FIELD_1_FILL := b'',
    FIELD_1_NAME := 'name',
    FIELD_2_FILL := 0.0,
    FIELD_2_NAME := 'pressure',
    NROWS := 0,
    TITLE := 'Readout example',
    VERSION := '2.7']

In [122]:
import datetime
table.attrs.laboratory = "Lab1"
table.attrs.time = datetime.datetime.now()
table.attrs.temperature = 15.4

In [123]:
table.attrs

/detector/readout._v_attrs (AttributeSet), 13 attributes:
   [CLASS := 'TABLE',
    FIELD_0_FILL := 0,
    FIELD_0_NAME := 'idnumber',
    FIELD_1_FILL := b'',
    FIELD_1_NAME := 'name',
    FIELD_2_FILL := 0.0,
    FIELD_2_NAME := 'pressure',
    NROWS := 0,
    TITLE := 'Readout example',
    VERSION := '2.7',
    laboratory := 'Lab1',
    temperature := 15.4,
    time := datetime.datetime(2017, 4, 12, 11, 26, 2, 604747)]
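User attributes persist in the file and can be read back by name, listed separately from the system attributes (`CLASS`, `TITLE`, ...), or deleted. A sketch; the attribute names `operator` and `scratch` are hypothetical:

```python
import os
import tempfile

import tables
from tables import IsDescription, StringCol, Int64Col, Float32Col


class Particle(IsDescription):
    name = StringCol(16)
    idnumber = Int64Col()
    pressure = Float32Col()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "attrs.h5")
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "readout", Particle, "Readout example")
        table.attrs.laboratory = "Lab1"
        table.attrs.temperature = 15.4
        # Method form, useful when the name is stored in a variable
        table.set_attr("operator", "hypothetical-user")
        table.attrs.scratch = 1
        del table.attrs.scratch  # attributes can be removed again

    # Reopen read-only: the attributes were stored in the file
    with tables.open_file(path, mode="r") as h5:
        table = h5.root.readout
        user_attrs = table.attrs._f_list("user")
        lab = table.attrs.laboratory
        temp = table.get_attr("temperature")
        title = table.attrs.TITLE

print(user_attrs, lab, temp, title)
```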

# Appending Data

Rows are always appended at the end of a table. Here the file is reopened in append mode (`'a'`) and ten more rows are added:

In [135]:
h5.close()
h5 = tables.open_file("example-pytables.h5", mode='a')
table = h5.root.detector.readout
particle = table.row

In [132]:
for i in range(10):
    particle['name'] = 'NEWpart{:6d}'.format(i)
    particle['idnumber'] = i
    particle['pressure'] = -1
    particle.append()
h5.flush()

In [137]:
arr = table.read()

In [139]:
arr[-12:]

array([(999998, b'part999998',  108.82778168),
       (999999, b'part999999',  321.05886841),
       (     0, b'NEWpart     0',   -1.        ),
       (     1, b'NEWpart     1',   -1.        ),
       (     2, b'NEWpart     2',   -1.        ),
       (     3, b'NEWpart     3',   -1.        ),
       (     4, b'NEWpart     4',   -1.        ),
       (     5, b'NEWpart     5',   -1.        ),
       (     6, b'NEWpart     6',   -1.        ),
       (     7, b'NEWpart     7',   -1.        ),
       (     8, b'NEWpart     8',   -1.        ),
       (     9, b'NEWpart     9',   -1.        )], 
      dtype=[('idnumber', '<i8'), ('name', 'S16'), ('pressure', '<f4')])
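`Table.nrows` grows with each append, and `read(start=..., stop=...)` or `col(...)` fetch only part of the data instead of the whole table. A sketch with a handful of hypothetical rows:

```python
import os
import tempfile

import numpy as np
import tables

dtype = np.dtype([("idnumber", "<i8"), ("pressure", "<f4")])
first = np.zeros(5, dtype=dtype)
first["idnumber"] = np.arange(5)
more = np.zeros(3, dtype=dtype)
more["idnumber"] = np.arange(100, 103)
more["pressure"] = -1

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "append.h5")
    with tables.open_file(path, mode="w") as h5:
        h5.create_table("/", "readout", obj=first)

    # Reopen in append mode and add rows at the end
    with tables.open_file(path, mode="a") as h5:
        table = h5.root.readout
        table.append(more)
        table.flush()
        nrows = table.nrows
        tail = table.read(start=nrows - 3)  # last three rows
        ids = table.col("idnumber")         # one column as a NumPy array

print(nrows, tail["idnumber"], ids)
```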

# Modifying table on disk

In [143]:
table.cols.pressure[-5:] = (-99,-99,-99,-99,-99)

In [144]:
table[-12:]

array([(999998, b'part999998',  108.82778168),
       (999999, b'part999999',  321.05886841),
       (     0, b'NEWpart     0',   -1.        ),
       (     1, b'NEWpart     1',   -1.        ),
       (     2, b'NEWpart     2',   -1.        ),
       (     3, b'NEWpart     3',   -1.        ),
       (     4, b'NEWpart     4',   -1.        ),
       (     5, b'NEWpart     5',  -99.        ),
       (     6, b'NEWpart     6',  -99.        ),
       (     7, b'NEWpart     7',  -99.        ),
       (     8, b'NEWpart     8',  -99.        ),
       (     9, b'NEWpart     9',  -99.        )], 
      dtype=[('idnumber', '<i8'), ('name', 'S16'), ('pressure', '<f4')])
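Besides assigning to column slices, `Table` has methods for in-place updates of columns and rows, and for removing rows. A sketch on a small hypothetical table; note that `stop` is exclusive and later rows move up after `remove_rows`:

```python
import os
import tempfile

import numpy as np
import tables

dtype = np.dtype([("idnumber", "<i8"), ("pressure", "<f4")])
data = np.zeros(10, dtype=dtype)
data["idnumber"] = np.arange(10)
data["pressure"] = -1

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modify.h5")
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "readout", obj=data)
        # Same effect as table.cols.pressure[5:10] = -99
        table.modify_column(start=5, stop=10,
                            column=np.full(5, -99, dtype=np.float32),
                            colname="pressure")
        # Overwrite rows 0 and 1 completely
        table.modify_rows(start=0, stop=2,
                          rows=np.array([(100, 1.5), (101, 2.5)], dtype=dtype))
        # Delete rows 2 and 3
        removed = table.remove_rows(2, 4)
        table.flush()
        result = table.read()

print(removed, result)
```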

In [133]:
h5.close()

# More Features...

Not everything is shown here; for example, it is also possible to define *nested structures*:

http://www.pytables.org/usersguide/tutorials.html#dealing-with-nested-structures-in-tables
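A minimal sketch of a nested description, loosely following the linked tutorial; the nested `position` column with `x`/`y` fields is a hypothetical example. Nested fields are addressed with `/`-separated paths in `Row` and via natural naming under `table.cols`:

```python
import os
import tempfile

import tables
from tables import IsDescription, StringCol, Int64Col, Float32Col


class Particle(IsDescription):
    name = StringCol(16, pos=0)
    idnumber = Int64Col(pos=1)

    # Nested description: becomes a compound column named "position"
    class position(IsDescription):
        _v_pos = 2
        x = Float32Col(pos=0)
        y = Float32Col(pos=1)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nested.h5")
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "readout", Particle)
        row = table.row
        for i in range(3):
            row["name"] = "part{:6d}".format(i)
            row["idnumber"] = i
            row["position/x"] = float(i)
            row["position/y"] = 2.0 * i
            row.append()
        table.flush()

        colnames = table.colnames        # top-level columns
        pathnames = table.colpathnames   # leaf columns with full paths
        ys = table.cols.position.y[:]
        # Nested columns can enter a query through condvars
        matches = [r["idnumber"] for r in
                   table.where("y > 1", condvars={"y": table.cols.position.y})]

print(colnames, pathnames, ys, matches)
```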
        

# Future


Future plans include:

- using *h5py* as the foundation for *PyTables*
- implementing column-wise tables in *PyTables*

Volunteers are needed:

https://github.com/numfocus/volunteer-opportunities/blob/master/pytables-projects.md