btable-py

A Python interface for the BTable serialization format, providing fast, compact binary serialization for large, sparse, labeled 2D numeric datasets ('binary tables').

A BTable is a binary, on-disk representation of a sparse matrix. The format is inspired by Compressed Row Storage (CRS) and saves space by storing only the indices and values of nonzero cells. It is strictly row-oriented for efficient row-by-row iteration; it is not a library for matrix computation or linear algebra.
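To make the CRS idea concrete, here is a minimal in-memory sketch in plain Python. It is not the BTable on-disk layout or this library's API; the helper names `to_crs` and `iter_rows` are hypothetical and only illustrate how storing nonzero cells plus row boundaries supports row-by-row iteration.

```python
def to_crs(rows):
    """Encode dense rows as three CRS-style arrays (illustrative only)."""
    values = []       # nonzero values, in row-major order
    col_indices = []  # column index of each stored value
    row_ptr = [0]     # row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0.0:
                values.append(value)
                col_indices.append(col)
        row_ptr.append(len(values))
    return values, col_indices, row_ptr


def iter_rows(values, col_indices, row_ptr):
    """Yield each row as a list of (column index, value) pairs."""
    for start, end in zip(row_ptr, row_ptr[1:]):
        yield list(zip(col_indices[start:end], values[start:end]))


rows = [[5.0, 3.0, 1.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
values, col_indices, row_ptr = to_crs(rows)
sparse_rows = list(iter_rows(values, col_indices, row_ptr))
```

Only four of the nine cells are stored here, and the all-zero third row costs nothing beyond its entry in `row_ptr`.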

Note that BTables are not a drop-in replacement for every dataset stored as, e.g., CSV: the efficiency gain grows with the sparsity of the dataset. For a pathological, fully nonzero dataset, a BTable can occupy much more space than the equivalent CSV!
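The trade-off can be estimated with a rough back-of-the-envelope sketch. The byte widths below (a 4-byte count per row, plus a 4-byte index and an 8-byte float per nonzero cell) are assumptions chosen for illustration, not the actual BTable layout, and both helper functions are hypothetical.

```python
import struct

# Assumed per-row layout (NOT the real BTable format): a 4-byte count,
# then one (4-byte column index, 8-byte float) pair per nonzero cell.
COUNT_SIZE = struct.calcsize("<I")
CELL_SIZE = struct.calcsize("<Id")


def sparse_row_size(row):
    """Bytes needed for one row under the assumed sparse layout."""
    nonzero = sum(1 for value in row if value != 0.0)
    return COUNT_SIZE + nonzero * CELL_SIZE


def csv_row_size(row):
    """Bytes needed for the same row as one line of CSV text."""
    return len(",".join(str(value) for value in row)) + 1  # +1 for the newline


dense_row = [1.0] * 100            # every cell is nonzero
sparse_row = [0.0] * 99 + [1.0]    # a single nonzero cell

print(sparse_row_size(dense_row), csv_row_size(dense_row))
print(sparse_row_size(sparse_row), csv_row_size(sparse_row))
```

Under these assumptions the fully nonzero row is roughly three times larger in the sparse layout than as CSV text, while the mostly-zero row is many times smaller.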

Examples

import btable

# Writing a table
labels = ["login", "view_item", "purchase"]
rows = [[5.0, 3.0, 1.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
btable.write("/path/to/my_table.btable", labels, rows)

# Reading a table
bt = btable.BTable("/path/to/my_table.btable")

print(bt.labels)

for row in bt.rows():
    # Process individual row...
    print(row[0:])