Currently, TreeFiles store blobs/chunks in the same file that nodes are written to. When compacting a database, all of the blobs that are alive must be transferred to the new file.
Over time, this wastes a lot of IOPS if your application never deletes data. In this day and age, a common way to operate is to "store everything" and only delete once it becomes a problem.
The main idea of this issue is simple:
- Change all of the tree file operations to use a new trait, ChunkStorage, to write non-node chunks. This may require adding a new parameter to each operation.
- Allow specifying a ChunkStorage implementation when creating a TreeFile/Roots instance.
- If no ChunkStorage is specified, chunks should be written inline, as they are today.
- The ChunkStorage implementation can use 63 bits of information to record where the chunk is stored. The 64th bit will be used by Nebari to mark that the chunk is stored externally.
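The 63-bit/64th-bit split could look something like the following sketch. The names `EXTERNAL_BIT`, `encode_external`, and `decode_external` are illustrative, not part of Nebari's actual API:

```rust
/// High bit reserved by Nebari to flag an externally stored chunk
/// (hypothetical constant name for illustration).
const EXTERNAL_BIT: u64 = 1 << 63;

/// Wraps a 63-bit value supplied by a ChunkStorage implementation,
/// setting the high bit so it can be told apart from an in-file offset.
fn encode_external(storage_value: u64) -> u64 {
    debug_assert!(storage_value & EXTERNAL_BIT == 0, "only 63 bits available");
    storage_value | EXTERNAL_BIT
}

/// Returns Some(storage_value) if the position refers to external storage,
/// or None if it is a plain in-file offset.
fn decode_external(position: u64) -> Option<u64> {
    if position & EXTERNAL_BIT != 0 {
        Some(position & !EXTERNAL_BIT)
    } else {
        None
    }
}
```

Positions without the high bit set pass through unchanged, so existing inline chunks would keep working.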
The hard part will be compaction. Nebari doesn't keep track of chunks. Currently, compaction copies data when it's referenced and skips it otherwise. To achieve the goal of "not rewriting everything", the ChunkStorage implementation needs to receive enough information to determine on its own how to compact itself, or to opt not to. At this time, I'm not sure of a good way to solve this.
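One possible shape for that hand-off, purely as a sketch: Nebari reports which chunk ids are still referenced during compaction, and the implementation decides what to do. The trait and method names below are assumptions for illustration, not a proposed final API:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical sketch of the proposed trait.
trait ChunkStorage {
    /// Persist a chunk, returning a 63-bit value identifying where it lives.
    fn write_chunk(&mut self, data: &[u8]) -> u64;
    /// Read a previously written chunk by that value.
    fn read_chunk(&self, id: u64) -> Option<Vec<u8>>;
    /// Called during tree compaction with the ids that are still referenced;
    /// the implementation may reorganize itself, or do nothing to opt out.
    fn retain_live(&mut self, live: &[u64]);
}

/// Minimal in-memory implementation, for illustration only.
struct MemoryChunks {
    chunks: HashMap<u64, Vec<u8>>,
    next: u64,
}

impl MemoryChunks {
    fn new() -> Self {
        Self { chunks: HashMap::new(), next: 0 }
    }
}

impl ChunkStorage for MemoryChunks {
    fn write_chunk(&mut self, data: &[u8]) -> u64 {
        let id = self.next;
        self.next += 1;
        self.chunks.insert(id, data.to_vec());
        id
    }

    fn read_chunk(&self, id: u64) -> Option<Vec<u8>> {
        self.chunks.get(&id).cloned()
    }

    fn retain_live(&mut self, live: &[u64]) {
        // Drop everything the tree no longer references.
        let keep: HashSet<u64> = live.iter().copied().collect();
        self.chunks.retain(|id, _| keep.contains(id));
    }
}
```

The open question remains how Nebari would gather the set of live ids cheaply, given that it doesn't track chunks today.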
More intelligent compaction can be achieved by using TreeFile to implement ChunkStorage. While this causes extra overhead, the TreeFile could return unique "chunk IDs" that are stable, but the actual location on disk can be moved around. This is where the idea of "tiered" storage comes in, as this TreeFile could do many things including:
- Embed statistics about the read frequency of each key, allowing compaction to group frequently used data closer together, or to move infrequently accessed keys to slower storage.
- Subdivide storage into segments that can be defragmented independently.
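The key property of this approach is the indirection: callers hold a stable chunk id while the storage layer stays free to move data around. A minimal sketch of that indirection, with all names hypothetical:

```rust
use std::collections::HashMap;

/// Illustrative only: hands out stable chunk ids while remaining free to
/// relocate the underlying data (e.g. during defragmentation or tiering).
struct RelocatableStore {
    next_id: u64,
    /// Maps a stable chunk id to its current, movable on-disk location.
    locations: HashMap<u64, u64>,
}

impl RelocatableStore {
    fn new() -> Self {
        Self { next_id: 0, locations: HashMap::new() }
    }

    /// Assigns a stable id to a chunk currently stored at `offset`.
    fn store(&mut self, offset: u64) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.locations.insert(id, offset);
        id
    }

    /// Moves a chunk to a new location without invalidating its id.
    fn relocate(&mut self, id: u64, new_offset: u64) {
        self.locations.insert(id, new_offset);
    }

    /// Resolves a stable id to the chunk's current location.
    fn location(&self, id: u64) -> Option<u64> {
        self.locations.get(&id).copied()
    }
}
```

The extra lookup is the overhead mentioned above; in exchange, compaction and tiering become internal concerns of the storage layer.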