Reduce scope of compression dictionary to single SST #4952
Conversation
This does not include charging memory usage to block cache. I need to catch up on testing before adding another feature. Hopefully we can add that feature in a separate PR.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
I am not sure about an automated test. We can try something like the following, though I find it hard to imagine it'll be worth its complexity/maintenance.
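(The snippet this comment refers to is not preserved in this page. As an illustration only, and not the author's actual proposal, the idea such a test would check -- that a dictionary trained on an SST's own blocks shrinks similar blocks -- can be sketched self-contained, with a toy prefix-matching "compressor" standing in for zstd:)

```cpp
// Toy stand-in for dictionary compression: bytes matching the dictionary's
// prefix collapse to a one-byte back-reference marker. Real zstd dictionary
// compression is far richer; this only models "shared context helps".
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

size_t CompressedSize(const std::string& data, const std::string& dict) {
  size_t match = 0;
  while (match < data.size() && match < dict.size() &&
         data[match] == dict[match]) {
    ++match;
  }
  size_t literals = data.size() - match;
  return (match > 0 ? 1 : 0) + literals;  // marker byte + remaining literals
}

// The property a real test would assert via SST file sizes: with a per-SST
// dictionary "trained" on the SST's own blocks, similar blocks compress
// smaller than with no dictionary at all.
bool DictionaryHelps(const std::vector<std::string>& blocks) {
  const std::string& dict = blocks.front();  // toy "training": first block
  size_t with_dict = 0, without_dict = 0;
  for (const auto& b : blocks) {
    with_dict += CompressedSize(b, dict);
    without_dict += CompressedSize(b, "");
  }
  return with_dict < without_dict;
}
```

A real RocksDB test would instead set `CompressionOptions::max_dict_bytes`, write similar keys into one SST, and compare on-disk file sizes with and without the dictionary.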
LGTM except for a few minor comments. Thanks @ajkr for the improvement.
@ajkr has updated the pull request (from 67c5b15 to 17ce894).
Our previous approach was to train one compression dictionary per compaction, using the first output SST to train a dictionary, and then applying it on subsequent SSTs in the same compaction. While this was great for minimizing CPU/memory/I/O overhead, it did not achieve good compression ratios in practice. In our most promising potential use case, moderate reductions in a dictionary's scope make a major difference on compression ratio.
So, this PR changes the compression dictionary to be scoped per-SST, accepting the tradeoff of higher memory and CPU usage during table building. Important changes include:

- `BlockBasedTableBuilder` has a new state when dictionary compression is in use: `kBuffered`. In that state it accumulates uncompressed data in memory whenever `Add` is called.
- Upon `BlockBasedTableBuilder::Finish`, a `BlockBasedTableBuilder` moves to the `kUnbuffered` state. The transition (`EnterUnbuffered()`) involves sampling the buffered data, training a dictionary, and compressing/writing out all buffered data. In the `kUnbuffered` state, a `BlockBasedTableBuilder` behaves the same as before -- blocks are compressed/written out as soon as they fill up.
- Training inputs are bounded by `max_dict_bytes` or `zstd_max_train_bytes`. The dictionary trainer is supposed to work better when we pass it real units of compression. Previously we were passing 64-byte KV samples, which was not realistic.

Test Plan: with `max_dict_bytes` and `zstd_max_train_bytes` set:
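(The `kBuffered`/`kUnbuffered` flow described above can be sketched as a self-contained toy model. The state and method names mirror the PR; the buffering, block sizing, "training", and "compression" here are simplified stand-ins, not RocksDB's actual `BlockBasedTableBuilder`:)

```cpp
// Toy model of the two-phase table builder: buffer uncompressed blocks,
// then train a dictionary and flush everything on Finish().
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class ToyTableBuilder {
 public:
  enum class State { kBuffered, kUnbuffered };

  explicit ToyTableBuilder(size_t block_size) : block_size_(block_size) {}

  // In kBuffered, Add() only accumulates uncompressed data in memory.
  void Add(const std::string& kv) {
    current_block_ += kv;
    if (current_block_.size() >= block_size_) FlushBlock();
  }

  // Finish() drives the kBuffered -> kUnbuffered transition.
  void Finish() {
    if (!current_block_.empty()) FlushBlock();
    EnterUnbuffered();
  }

  State state() const { return state_; }
  const std::string& dictionary() const { return dictionary_; }
  size_t written_blocks() const { return written_.size(); }

 private:
  void FlushBlock() {
    buffered_blocks_.push_back(current_block_);
    current_block_.clear();
  }

  void EnterUnbuffered() {
    // Sample whole data blocks -- the PR's point about passing the trainer
    // real units of compression rather than 64-byte KV fragments. Toy
    // "training": just take the first block as the dictionary.
    if (!buffered_blocks_.empty()) dictionary_ = buffered_blocks_.front();
    // Compress/write out all buffered data (stand-in for real compression).
    for (const auto& block : buffered_blocks_) {
      written_.push_back("compressed(" + block + ")");
    }
    buffered_blocks_.clear();
    state_ = State::kUnbuffered;
    // From here on, blocks would be compressed/written as they fill up.
  }

  State state_ = State::kBuffered;
  size_t block_size_;
  std::string current_block_;
  std::vector<std::string> buffered_blocks_;
  std::vector<std::string> written_;
  std::string dictionary_;
};
```

The memory/CPU tradeoff the description mentions is visible here: nothing is written until `Finish()`, so the whole SST's uncompressed data is held in `buffered_blocks_` at once.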