Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort per-file blob read requests by offset #8953

Closed

Conversation

riversand963
Copy link
Contributor

RandomAccessFileReader::MultiRead() tries to merge requests in direct IO, assuming input IO requests are
sorted by offsets.

Add a test in direct IO mode.

Test plan:
make check

@riversand963 riversand963 force-pushed the multiget-blob-sort-offset branch 2 times, most recently from 9e0da46 to 4c37b0e Compare September 24, 2021 22:03
@riversand963 riversand963 marked this pull request as ready for review September 24, 2021 22:06
@facebook-github-bot
Copy link
Contributor

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@ltamasi ltamasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for the fix @riversand963 !

blobs_in_file.begin(), blobs_in_file.end(),
[](const BlobReadRequest& lhs, const BlobReadRequest& rhs) -> bool {
assert(lhs.first.file_number() == rhs.first.file_number());
return lhs.first.offset() <= rhs.first.offset();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably does not make any practical difference but I think it would be nice to use strictly less here

db/blob/db_blob_basic_test.cc Show resolved Hide resolved
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->SetCallBack(
"RandomAccessFileReader::MultiRead:AlignedReqs", [&](void* arg) {
auto* aligned_reqs = reinterpret_cast<std::vector<FSReadRequest>*>(arg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static_cast would be sufficient here

db/blob/db_blob_basic_test.cc Show resolved Hide resolved
Comment on lines 309 to 310
SyncPoint::GetInstance()->DisableProcessing();
SyncPoint::GetInstance()->ClearAllCallBacks();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor but it would be nice to move these to right after the MultiGet call so they are executed even in the case of assertion failures.

@@ -127,6 +127,189 @@ TEST_F(DBBlobBasicTest, MultiGetBlobs) {
}
}

TEST_F(DBBlobBasicTest, MultiGetWithDirectIO) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this awesome unit test @riversand963 !

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. You must reimport the pull request before landing.

@riversand963
Copy link
Contributor Author

Thanks @ltamasi for the review!

@facebook-github-bot
Copy link
Contributor

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@riversand963 riversand963 deleted the multiget-blob-sort-offset branch September 25, 2021 15:33
riversand963 added a commit that referenced this pull request Sep 28, 2021
Summary:
`RandomAccessFileReader::MultiRead()` tries to merge requests in direct IO, assuming input IO requests are
sorted by offsets.

Add a test in direct IO mode.

Pull Request resolved: #8953

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D31183546

Pulled By: riversand963

fbshipit-source-id: 5d043ec68e2daa47a3149066150afd41ee3d73e6
@pdillinger
Copy link
Contributor

But you didn't add a comment on BlobFileReader::MultiGetBlob (nor MultiRead?) that they expect sorted inputs?

@riversand963
Copy link
Contributor Author

But you didn't add a comment on BlobFileReader::MultiGetBlob (nor MultiRead?) that they expect sorted inputs?

@pdillinger oops, we should have added comment and checks for MultiGetblob(). For the record, MultiRead() does have a comment about sorted inputs. Maybe we should add checks to emphasize it as well.

@riversand963
Copy link
Contributor Author

#8972

@pdillinger
Copy link
Contributor

For the record, MultiRead() does have a comment about sorted inputs.

I was looking at FSRandomAccessFile::MultiRead, which still does not have such a comment. (When I approved #8972 I was not in a position to verify this claim.)

@ltamasi
Copy link
Contributor

ltamasi commented Sep 30, 2021

For the record, MultiRead() does have a comment about sorted inputs.

I was looking at FSRandomAccessFile::MultiRead, which still does not have such a comment. (When I approved #8972 I was not in a position to verify this claim.)

It's RandomAccessFileReader::MultiRead that relies on the requests being sorted (that's where the merging happens). I don't think FSRandomAccessFile itself has this precondition

@pdillinger
Copy link
Contributor

Implementations of FSRandomAccessFile::MultiRead are provided by both RocksDB and advanced users. Evidently confusion ensues when it is not clear whether sorting is required for the caller or whether the callee must deal with unsorted requests; "ambiguity means the callee must sort it out" is evidently not good enough. (Sorry if my meta-point "sortedness requirements should be clear" was confused by suggesting we add a comment like "inputs must be sorted.")

I don't think ...

Seems you agree it's ambiguous.

@pdillinger
Copy link
Contributor

... or whether the callee must deal with unsorted requests

Especially if we currently always provide sorted requests but make no promise to continue that in the future.

@ltamasi
Copy link
Contributor

ltamasi commented Sep 30, 2021

I don't think ...

Seems you agree it's ambiguous.

No, what I meant above is that I didn't bother to check :) I did now and neither FSRandomAccessFile::MultiRead nor PosixRandomAccessFile::MultiRead has such a requirement.

Personally, I'm perfectly fine with not having a comment for non-requirements. As a developer, I would assume that as long as I'm not told explicitly that requests have to be sorted, they don't have to be.

@pdillinger
Copy link
Contributor

A non-requirement on the caller is a requirement for the callee. Many implicit requirements (e.g. nullability) are often obvious and relatively low-risk (crash). Sortedness is tricky and high-risk (corruption).

Our job is not to have a plausible claim of "someone else's problem" but to mitigate the risk of problems. :) We have a history of people depending on our implementation details (here "you always passed in a sorted list before") and things breaking when we change implementation details.

@riversand963
Copy link
Contributor Author

I am trying to understand where the confusion and risk is.

  • If you use FSRandomAccessFile, then calling FSRandomAccessFile::MultiRead() does not require sorted input.
  • If you use RandomAccessFileReader::MultiRead(), then you need to sort the input, as specified in the comment and asserted in debug mode.

Besides, RandomAccessFileReader is not a public API. RandomAccessFileReader is a user of and wrapper on top of FSRandomAccessFile. Sorting input is a requirement only at the layer of RandomAccessFileReader() to merge requests in direct IO. Therefore, we can add comment about sorted input here, but not for FSRandomAccessFile::MultiRead.

facebook-github-bot pushed a commit that referenced this pull request Nov 3, 2021
Summary:
* Clarify that RocksDB is not exception safe on many of our callback
and extension interfaces
* Clarify FSRandomAccessFile::MultiRead implementations must accept
non-sorted inputs (see #8953)
* Clarify ConcurrentTaskLimiter and SstFileManager are not (currently)
extensible interfaces
* Mark WriteBufferManager as `final`, so it is then clearly not a
callback interface, even though it smells like one
* Clarify TablePropertiesCollector Status returns are mostly ignored

Pull Request resolved: #9080

Test Plan: comments only (except WriteBufferManager final)

Reviewed By: ajkr

Differential Revision: D31968782

Pulled By: pdillinger

fbshipit-source-id: 11b648ce3ce3c5e5bdc02d2eafc7ea4b864bd1d2
yoori pushed a commit to yoori/rocksdb that referenced this pull request Nov 26, 2023
Summary:
`RandomAccessFileReader::MultiRead()` tries to merge requests in direct IO, assuming input IO requests are
sorted by offsets.

Add a test in direct IO mode.

Pull Request resolved: facebook/rocksdb#8953

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D31183546

Pulled By: riversand963

fbshipit-source-id: 5d043ec68e2daa47a3149066150afd41ee3d73e6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants