Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable blob caching for MultiGetBlob in RocksDB #10272

Closed
wants to merge 21 commits into from

Conversation

gangliao
Copy link
Contributor

@gangliao gangliao commented Jun 28, 2022

Summary:

  • Enabled blob caching for MultiGetBlob in RocksDB
  • Refactored MultiGetBlob logic and interface in RocksDB
  • Cleaned up Version::MultiGetBlob() and moved 'blob'-related code snippets into BlobSource
  • Add End-to-end test cases in db_blob_basic_test and also add unit tests in blob_source_test

This task is a part of #10156

Summary:

- Enabled blob caching for MultiGetBlob in RocksDB
- Refactored MultiGetBlob logic and interface in RocksDB
- Cleaned up Version::MultiGetBlob() and moved 'blob'-related code snippets into BlobSource

This task is a part of facebook#10156
@gangliao gangliao mentioned this pull request Jun 28, 2022
14 tasks
table/multiget_context.h Outdated Show resolved Hide resolved
table/multiget_context.h Outdated Show resolved Hide resolved
db/blob/blob_source.h Outdated Show resolved Hide resolved
Copy link
Contributor

@ltamasi ltamasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks @gangliao !

db/blob/blob_file_reader_test.cc Outdated Show resolved Hide resolved
db/blob/blob_file_reader_test.cc Outdated Show resolved Hide resolved
db/blob/blob_file_reader_test.cc Outdated Show resolved Hide resolved
db/blob/blob_source.cc Outdated Show resolved Hide resolved
db/blob/blob_source.cc Outdated Show resolved Hide resolved
db/blob/blob_source.h Outdated Show resolved Hide resolved
db/blob/blob_source.cc Outdated Show resolved Hide resolved
db/version_set.cc Outdated Show resolved Hide resolved
db/version_set.cc Outdated Show resolved Hide resolved
table/multiget_context.h Outdated Show resolved Hide resolved
db/blob/blob_source.h Outdated Show resolved Hide resolved
db/blob/blob_source.h Outdated Show resolved Hide resolved
db/blob/blob_source.cc Outdated Show resolved Hide resolved
db/blob/blob_file_reader.h Outdated Show resolved Hide resolved
db/blob/blob_source.cc Outdated Show resolved Hide resolved
db/version_set.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@ltamasi ltamasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @gangliao ! Looks pretty awesome 🎉

if (!IsValidBlobOffset(offset, key_size, value_size, file_size_)) {
*blob_req->status = Status::Corruption("Invalid blob offset");
*blob_reqs[i]->status = Status::Corruption("Invalid blob offset");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a bit more to this: with this change, it is no longer guaranteed that read_reqs[i] corresponds to blob_reqs[i] because we only create FSReadRequests for blobs that pass the offset and compression type checks. We would also want to make sure we do not overwrite these Corruptions with something else when we loop over blob_reqs below

Copy link
Contributor Author

@gangliao gangliao Jun 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense; this is a tricky situation, and we can likely use unordered map or even a vector to save the original index. In that case, we can still identify the mapping between them

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just brainstorming here but we might also be able to use a bitmask to track which requests failed this initial validation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be overthinking. We don't need any auxiliary structure since that info already in blob_req[i]->status

db/blob/blob_file_reader.h Outdated Show resolved Hide resolved
db/blob/blob_source.cc Outdated Show resolved Hide resolved
db/blob/blob_source_test.cc Outdated Show resolved Hide resolved
db/version_set.cc Show resolved Hide resolved
db/version_set.cc Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@ltamasi ltamasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @gangliao !

@facebook-github-bot
Copy link
Contributor

@gangliao has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Jul 9, 2022
Summary:
Update HISTORY.md for blob cache.  Implementation can be found from Github issue #10156 (or Github PRs #10155, #10178, #10225, #10198, and #10272).

Pull Request resolved: #10328

Reviewed By: riversand963

Differential Revision: D37732514

Pulled By: gangliao

fbshipit-source-id: 4c942a41c07914bfc8db56a0d3cf4d3e53d5963f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants