mtbl: immutable sorted string table library

mtbl is a C library implementation of the Sorted String Table (SSTable)
data structure, based on the SSTable implementation in the open source
Google LevelDB library. An SSTable is a
file containing an immutable mapping of keys to values. Keys are stored in
sorted order, with an index at the end of the file allowing keys to be
located quickly.

mtbl is not a database library. It does not provide an updateable
key-value data store, but rather exposes primitives for creating,
searching and merging SSTable files. Unlike databases which use the
SSTable data structure internally as part of their data store, management
of SSTable files -- creation, merging, deletion, combining of search
results from multiple SSTables -- is left to the discretion of the mtbl
library user.

mtbl SSTable files consist of a sequence of data blocks containing sorted
key-value pairs, where keys and values are arbitrary byte arrays. Data
blocks are optionally compressed using zlib or the Snappy library. The
data blocks are followed by an
index block, allowing for fast searches over the keyspace.
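To illustrate why a trailing index makes searches fast, here is a minimal
sketch of an index lookup. It is not the mtbl on-disk format or API: the
struct index_entry layout and the names key_cmp and index_find_block are
hypothetical, and the index is assumed to be already loaded into memory as
an array holding the last key of each data block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory view of one index entry: the last key stored in a
 * data block and the file offset at which that block starts. */
struct index_entry {
    const unsigned char *last_key;
    size_t               len_key;
    size_t               block_offset;
};

/* Compare two byte-array keys lexicographically. A key that is a strict
 * prefix of another sorts before it. */
int key_cmp(const unsigned char *a, size_t len_a,
            const unsigned char *b, size_t len_b)
{
    size_t n = len_a < len_b ? len_a : len_b;
    int r = (n > 0) ? memcmp(a, b, n) : 0;
    if (r != 0)
        return r;
    return (len_a > len_b) - (len_a < len_b);
}

/* Binary-search the index for the first block whose last key is >= key.
 * Returns that entry's position, or n_entries if the key sorts after the
 * last key in the file and so cannot be present. */
size_t index_find_block(const struct index_entry *idx, size_t n_entries,
                        const unsigned char *key, size_t len_key)
{
    size_t lo = 0, hi = n_entries;

    assert(idx != NULL || n_entries == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (key_cmp(idx[mid].last_key, idx[mid].len_key, key, len_key) < 0)
            lo = mid + 1;   /* whole block sorts before the key */
        else
            hi = mid;       /* this block or an earlier one may hold it */
    }
    return lo;
}
```

Under these assumptions a lookup touches O(log n) index entries and then
reads a single data block, rather than scanning the whole file.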

The basic mtbl interface is the writer, which receives a sequence of
key-value pairs in sorted order with no duplicate keys, and writes them to
data blocks in the SSTable output file. An index containing offsets to data
blocks and the last key in each data block is buffered in memory until the
writer object is closed, at which point the index is written to the end of
the SSTable file. This allows SSTable files to be written in a single pass
with sequential I/O operations only.

Once written, SSTable files can be searched using the mtbl reader
interface. Searches can retrieve key-value pairs based on an exact key
match, a key prefix match, or a key range. Results are retrieved using a
simple iterator interface.
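The writer contract described above (keys strictly increasing, index
entries buffered until close) can be sketched with a toy in-memory writer.
This is an illustration only, not the mtbl writer API: every name here
(toy_writer, toy_writer_add, BLOCK_ENTRIES, and so on) is hypothetical,
values are omitted, and an entry count stands in for a file offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_KEY        64  /* toy limit on key length */
#define MAX_BLOCKS     16  /* toy limit on buffered index entries */
#define BLOCK_ENTRIES   2  /* entries per data block in this sketch */

struct toy_index_entry {
    unsigned char last_key[MAX_KEY];
    size_t        len_key;
    size_t        first_entry;  /* stands in for the block's file offset */
};

struct toy_writer {
    unsigned char prev_key[MAX_KEY];
    size_t        len_prev;
    int           have_prev;
    size_t        n_entries;    /* entries accepted so far */
    size_t        block_start;  /* first entry of the currently open block */
    struct toy_index_entry index[MAX_BLOCKS];
    size_t        n_index;
};

/* Lexicographic comparison of byte-array keys. */
int toy_key_cmp(const unsigned char *a, size_t len_a,
                const unsigned char *b, size_t len_b)
{
    size_t n = len_a < len_b ? len_a : len_b;
    int r = (n > 0) ? memcmp(a, b, n) : 0;
    return r != 0 ? r : (len_a > len_b) - (len_a < len_b);
}

void toy_writer_init(struct toy_writer *w)
{
    assert(w != NULL);
    memset(w, 0, sizeof(*w));
}

/* Close the open block: buffer an index entry holding its last key. */
void toy_finish_block(struct toy_writer *w)
{
    struct toy_index_entry *e = &w->index[w->n_index++];
    if (w->len_prev > 0)
        memcpy(e->last_key, w->prev_key, w->len_prev);
    e->len_key = w->len_prev;
    e->first_entry = w->block_start;
    w->block_start = w->n_entries;
}

/* Accept a key only if it sorts strictly after the previous one.
 * Returns 0 on success, -1 if the key is rejected. */
int toy_writer_add(struct toy_writer *w,
                   const unsigned char *key, size_t len_key)
{
    if (len_key > MAX_KEY || w->n_index == MAX_BLOCKS)
        return -1;
    if (w->have_prev &&
        toy_key_cmp(key, len_key, w->prev_key, w->len_prev) <= 0)
        return -1;  /* duplicate or out-of-order key */
    if (len_key > 0)
        memcpy(w->prev_key, key, len_key);
    w->len_prev = len_key;
    w->have_prev = 1;
    w->n_entries++;
    if (w->n_entries - w->block_start == BLOCK_ENTRIES)
        toy_finish_block(w);
    return 0;
}

/* Flush any partial block; a real writer would now append the buffered
 * index to the end of the file. Returns the number of index entries. */
size_t toy_writer_close(struct toy_writer *w)
{
    if (w->n_entries > w->block_start)
        toy_finish_block(w);
    return w->n_index;
}
```

Because each key is compared only against its predecessor and the index is
held back until close, nothing already emitted ever needs to be revisited,
which is what allows a single sequential pass.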

The mtbl library also provides two utility interfaces which facilitate a
sort-and-merge workflow for bulk data loading. The sorter interface
receives arbitrarily ordered key-value pairs and provides them in sorted
order, buffering to disk as needed. The merger interface reads from
multiple SSTables simultaneously and provides the key-value pairs from the
combined inputs in sorted order. Since mtbl does not allow duplicate keys
in an SSTable file, both the sorter and merger interfaces require a
caller-provided merge function which will be called to merge multiple
values for the same key. These interfaces also make use of sequential I/O
operations only.
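The role of the caller-provided merge function can be sketched with a
two-input merge over sorted arrays. This is a simplified illustration, not
the mtbl merger API: the names kv, merge_fn, sum_values and merge_sorted
are hypothetical, keys are NUL-terminated strings rather than arbitrary
byte arrays, and values are plain integers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified key-value pair for this sketch. */
struct kv {
    const char *key;
    long        val;
};

/* Hypothetical merge callback: combine two values that share a key. */
typedef long (*merge_fn)(const char *key, long val0, long val1);

/* Example policy: treat values as counters and add them. */
long sum_values(const char *key, long val0, long val1)
{
    (void)key;  /* this policy does not depend on the key */
    return val0 + val1;
}

/* Merge two sorted, duplicate-free inputs into out, which must have room
 * for na + nb entries. Keys present in both inputs are passed to the
 * merge callback so the output has no duplicate keys either.
 * Returns the number of entries written. */
size_t merge_sorted(const struct kv *a, size_t na,
                    const struct kv *b, size_t nb,
                    merge_fn merge, struct kv *out)
{
    size_t i = 0, j = 0, n = 0;

    assert(merge != NULL);
    while (i < na && j < nb) {
        int c = strcmp(a[i].key, b[j].key);
        if (c < 0) {
            out[n++] = a[i++];
        } else if (c > 0) {
            out[n++] = b[j++];
        } else {
            out[n].key = a[i].key;
            out[n].val = merge(a[i].key, a[i].val, b[j].val);
            n++; i++; j++;
        }
    }
    while (i < na)
        out[n++] = a[i++];
    while (j < nb)
        out[n++] = b[j++];
    return n;
}
```

Each input is consumed front to back exactly once, which is why a merge of
this shape needs only sequential reads. A different policy, such as keeping
the newer value, would be a different callback with the same shape.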