-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate blob file writing with the flush logic #7345
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@ltamasi has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ltamasi for the PR. Left a few minor comments.
@@ -67,6 +70,7 @@ BlobFileBuilder::BlobFileBuilder( | |||
assert(fs_); | |||
assert(immutable_cf_options_); | |||
assert(file_options_); | |||
assert(blob_file_paths_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also assert blob_file_paths_.empty()
? Is there a case when it is not empty here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will add another assertion to that effect.
@ltamasi has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@@ -79,6 +81,7 @@ class BlobFileBuilder { | |||
std::string column_family_name_; | |||
Env::IOPriority io_priority_; | |||
Env::WriteLifeTimeHint write_hint_; | |||
std::vector<std::string>* blob_file_paths_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: since we assert blob_file_paths_.empty() && blob_file_additions_.empty()
, would it be easier to make these two members belong to BlobFileBuilder
?
Furthermore, in non-debug mode, the assertion becomes no-op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I didn't make these members is that they outlive BlobFileBuilder
and I wanted to avoid copying them (or having to introduce unconventional "read once" type of accessors that transfer them to the caller).
… blobs during flush
This reverts commit b5399afac76647e59e1e5dfae0ba882d860d50e7.
This reverts commit 669ec298be35a95cdbb4bd8037d2fe2da507c904.
…nd clean them up on failure
…BlobError dtor to prevent object lifecycle issues
@ltamasi has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Similarly to how #7345 integrated blob file writing into the flush process, the patch adds support for writing blob files to the compaction logic. Namely, if `enable_blob_files` is set, large values encountered during compaction are extracted to blob files and replaced with blob indexes. The resulting blob files are then logged to the MANIFEST as part of the compaction job's `VersionEdit` and added to the `Version` alongside any table files written by the compaction. Any errors during blob file building fail the compaction job. There will be a separate follow-up patch to perform blob garbage collection during compactions. In addition, the patch continues to chip away at the mess around computing various compaction related statistics by eliminating some code duplication and by making the `num_output_files` and `bytes_written` stats more consistent for flushes, compactions, and recovery. Pull Request resolved: #7573 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D24404696 Pulled By: ltamasi fbshipit-source-id: 21216af3a172ad3ce8f85d11cd30923784ae426c
Summary: The patch adds support for writing blob files during flush by integrating `BlobFileBuilder` with the flush logic, most importantly, `BuildTable` and `CompactionIterator`. If `enable_blob_files` is set, large values are extracted to blob files and replaced with references. The resulting blob files are then logged to the MANIFEST as part of the flush job's `VersionEdit` and added to the `Version`, similarly to table files. Errors related to writing blob files fail the flush, and any blob files written by such jobs are immediately deleted (again, similarly to how SST files are handled). In addition, the patch extends the logging and statistics around flushes to account for the presence of blob files (e.g. `InternalStats::CompactionStats::bytes_written`, which is used for calculating write amplification, now considers the blob files as well). Pull Request resolved: facebook#7345 Test Plan: Tested using `make check` and `db_bench`. Reviewed By: riversand963 Differential Revision: D23506369 Pulled By: ltamasi fbshipit-source-id: 646885f22dfbe063f650d38a1fedc132f499a159
Summary: Similarly to how facebook#7345 integrated blob file writing into the flush process, the patch adds support for writing blob files to the compaction logic. Namely, if `enable_blob_files` is set, large values encountered during compaction are extracted to blob files and replaced with blob indexes. The resulting blob files are then logged to the MANIFEST as part of the compaction job's `VersionEdit` and added to the `Version` alongside any table files written by the compaction. Any errors during blob file building fail the compaction job. There will be a separate follow-up patch to perform blob garbage collection during compactions. In addition, the patch continues to chip away at the mess around computing various compaction related statistics by eliminating some code duplication and by making the `num_output_files` and `bytes_written` stats more consistent for flushes, compactions, and recovery. Pull Request resolved: facebook#7573 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D24404696 Pulled By: ltamasi fbshipit-source-id: 21216af3a172ad3ce8f85d11cd30923784ae426c
Summary:
The patch adds support for writing blob files during flush by integrating
BlobFileBuilder
with the flush logic, most importantly,BuildTable
andCompactionIterator
. Ifenable_blob_files
is set, large values are extractedto blob files and replaced with references. The resulting blob files are then
logged to the MANIFEST as part of the flush job's
VersionEdit
andadded to the
Version
, similarly to table files. Errors related to writingblob files fail the flush, and any blob files written by such jobs are immediately
deleted (again, similarly to how SST files are handled). In addition, the patch
extends the logging and statistics around flushes to account for the presence
of blob files (e.g.
InternalStats::CompactionStats::bytes_written
, which isused for calculating write amplification, now considers the blob files as well).
Test Plan:
Tested using
make check
anddb_bench
.