Currently, TreeFiles store blobs/chunks in the same file that nodes are written to. When compacting a database, all of the blobs that are alive must be transferred to the new file.
Over time, this wastes a lot of IOPS if your application never deletes data. In this day and age, a common way to operate is to "store everything" and only delete once it becomes a problem.
The main idea of this issue is simple:
- Change all of the tree file operations to use a new trait, ChunkStorage, to write non-node chunks. This may require adding a new parameter to each operation.
- Allow specifying a ChunkStorage implementation when creating a TreeFile/Roots instance.
- If no ChunkStorage is specified, chunks should be written inline, as they are today.
- The ChunkStorage implementation can use 63 bits of information to record where the chunk is stored. The 64th bit will be used by Nebari to mark that the chunk is stored externally.
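The 63-bit/64th-bit split could look something like the following sketch. The names `EXTERNAL_BIT`, `encode_external`, and `decode_external` are illustrative, not part of Nebari's actual API:

```rust
/// High bit reserved by Nebari to flag an externally stored chunk
/// (hypothetical constant name for illustration).
const EXTERNAL_BIT: u64 = 1 << 63;

/// Wraps a 63-bit value supplied by a ChunkStorage implementation,
/// setting the high bit so it can be told apart from an in-file offset.
fn encode_external(storage_value: u64) -> u64 {
    debug_assert!(storage_value & EXTERNAL_BIT == 0, "only 63 bits available");
    storage_value | EXTERNAL_BIT
}

/// Returns Some(storage_value) if the position refers to external storage,
/// or None if it is a plain in-file offset.
fn decode_external(position: u64) -> Option<u64> {
    if position & EXTERNAL_BIT != 0 {
        Some(position & !EXTERNAL_BIT)
    } else {
        None
    }
}
```

Positions without the high bit set pass through unchanged, so existing inline chunks would keep working.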
The hard part will be compaction. Nebari doesn't keep track of chunks. Currently, compaction copies data when it's referenced and skips it otherwise. To achieve the goal of "not rewriting everything", the ChunkStorage implementation needs to receive enough information to determine on its own how to compact itself, or to opt not to. At this time, I'm not sure of a good way to solve this.
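One possible shape for that hand-off, purely as a sketch: Nebari reports which chunk ids are still referenced during compaction, and the implementation decides what to do. The trait and method names below are assumptions for illustration, not a proposed final API:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical sketch of the proposed trait.
trait ChunkStorage {
    /// Persist a chunk, returning a 63-bit value identifying where it lives.
    fn write_chunk(&mut self, data: &[u8]) -> u64;
    /// Read a previously written chunk by that value.
    fn read_chunk(&self, id: u64) -> Option<Vec<u8>>;
    /// Called during tree compaction with the ids that are still referenced;
    /// the implementation may reorganize itself, or do nothing to opt out.
    fn retain_live(&mut self, live: &[u64]);
}

/// Minimal in-memory implementation, for illustration only.
struct MemoryChunks {
    chunks: HashMap<u64, Vec<u8>>,
    next: u64,
}

impl MemoryChunks {
    fn new() -> Self {
        Self { chunks: HashMap::new(), next: 0 }
    }
}

impl ChunkStorage for MemoryChunks {
    fn write_chunk(&mut self, data: &[u8]) -> u64 {
        let id = self.next;
        self.next += 1;
        self.chunks.insert(id, data.to_vec());
        id
    }

    fn read_chunk(&self, id: u64) -> Option<Vec<u8>> {
        self.chunks.get(&id).cloned()
    }

    fn retain_live(&mut self, live: &[u64]) {
        // Drop everything the tree no longer references.
        let keep: HashSet<u64> = live.iter().copied().collect();
        self.chunks.retain(|id, _| keep.contains(id));
    }
}
```

The open question remains how Nebari would gather the set of live ids cheaply, given that it doesn't track chunks today.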
More intelligent compaction can be achieved by using TreeFile to implement ChunkStorage. While this causes extra overhead, the TreeFile could return unique "chunk IDs" that are stable, but the actual location on disk can be moved around. This is where the idea of "tiered" storage comes in, as this TreeFile could do many things including:
- Embed statistics about the read frequency of each key, allowing compaction to group frequently used data closer together, or to move infrequently accessed keys to slower storage.
- Subdivide storage into segments that can be defragmented independently.
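The key property of this approach is the indirection: callers hold a stable chunk id while the storage layer stays free to move data around. A minimal sketch of that indirection, with all names hypothetical:

```rust
use std::collections::HashMap;

/// Illustrative only: hands out stable chunk ids while remaining free to
/// relocate the underlying data (e.g. during defragmentation or tiering).
struct RelocatableStore {
    next_id: u64,
    /// Maps a stable chunk id to its current, movable on-disk location.
    locations: HashMap<u64, u64>,
}

impl RelocatableStore {
    fn new() -> Self {
        Self { next_id: 0, locations: HashMap::new() }
    }

    /// Assigns a stable id to a chunk currently stored at `offset`.
    fn store(&mut self, offset: u64) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.locations.insert(id, offset);
        id
    }

    /// Moves a chunk to a new location without invalidating its id.
    fn relocate(&mut self, id: u64, new_offset: u64) {
        self.locations.insert(id, new_offset);
    }

    /// Resolves a stable id to the chunk's current location.
    fn location(&self, id: u64) -> Option<u64> {
        self.locations.get(&id).copied()
    }
}
```

The extra lookup is the overhead mentioned above; in exchange, compaction and tiering become internal concerns of the storage layer.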