Allow MultiGet users to limit cumulative value size #6826
Conversation
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
include/rocksdb/options.h
Outdated
// It limits the maximum cumulative value size of the keys in batch while
// reading through MultiGet. Once the cumulative value size exceeds this value
// then all the remaining keys are returned with status Aborted.
// Default: 0.
If you keep 0 as a special value, this comment needs to mention that.
Although our internal checks would be more verbose, I suggest instead using std::numeric_limits<uint64_t>::max() so that the behavior is continuous across the range of settings. (Though this case isn't as weird as -1 for files_size_error_margin.)
Maybe a more descriptive name like "value_size_soft_limit"? I could foresee us adding a hard "value_size_limit" that would apply to Get and MultiGet. A soft limit (allowed to exceed but none after that) obviously doesn't make sense for Get.
Thank you. Yes, it makes sense to keep the behavior consistent across the range of settings. I will make the changes.
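The soft- versus hard-limit distinction discussed above ("allowed to exceed but none after that" versus aborting the crossing key too) can be sketched in plain C++. This is a simplified illustration with hypothetical names, not actual RocksDB code:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Per-key outcome of a size-limited batch lookup (hypothetical names).
enum class KeyStatus { kOk, kAborted };

// Soft limit: the key that crosses the limit still succeeds; only the
// keys after it are aborted.
std::vector<KeyStatus> ApplySoftLimit(const std::vector<uint64_t>& value_sizes,
                                      uint64_t limit) {
  std::vector<KeyStatus> statuses;
  uint64_t cumulative = 0;
  bool exceeded = false;
  for (uint64_t size : value_sizes) {
    if (exceeded) {
      statuses.push_back(KeyStatus::kAborted);
      continue;
    }
    cumulative += size;
    statuses.push_back(KeyStatus::kOk);
    exceeded = cumulative > limit;
  }
  return statuses;
}

// Hard limit: the key that causes the cumulative size to exceed the
// limit is aborted too, along with everything after it.
std::vector<KeyStatus> ApplyHardLimit(const std::vector<uint64_t>& value_sizes,
                                      uint64_t limit) {
  std::vector<KeyStatus> statuses;
  uint64_t cumulative = 0;
  bool exceeded = false;
  for (uint64_t size : value_sizes) {
    if (!exceeded) {
      cumulative += size;
      exceeded = cumulative > limit;
    }
    statuses.push_back(exceeded ? KeyStatus::kAborted : KeyStatus::kOk);
  }
  return statuses;
}
```

For three 3-byte values and a limit of 5, the soft limit aborts only the third key, while the hard limit also aborts the second key, the one that crossed the limit.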
db/memtable.cc
Outdated
if (read_options.value_size > 0) {
  if (range->GetValueSize() + iter->value->size() >
      read_options.value_size) {
    while (iter != temp_range.end()) {
range may not cover all the keys in the MultiGet batch, just the ones that could be present in the memtable. It would be better to move the logic to abort the request and set the status of the remaining keys into MultiGetContext, so it can set the status for all remaining keys.
I'm using MultiGetContext::value_size to hold the cumulative value size of the keys processed so far. VersionSet::MultiGet then checks MultiGetContext::value_size (updated by MemTable::MultiGet) and sets the remaining keys' status to Aborted if the size exceeds the limit. This way both memtable keys and SST file keys are covered.
But it makes sense to not even call VersionSet::MultiGet if the size has already been exceeded, and have MemTable::MultiGet update the remaining keys. I will change the logic to do so. Thanks.
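The two-pass flow described here, where the memtable pass accumulates value sizes and the SST-file pass is skipped entirely once the limit is exceeded, might be sketched as follows. These are hypothetical simplified names, not the real MemTable::MultiGet / VersionSet::MultiGet signatures:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class LookupStatus { kOk, kAborted };

// Shared batch state, playing the role MultiGetContext::value_size plays
// in the discussion above (hypothetical simplified type).
struct BatchState {
  uint64_t value_size = 0;             // cumulative size of fetched values
  std::vector<LookupStatus> statuses;  // one entry per key processed
};

// Stand-in for the memtable pass: each found key adds its value size,
// and once the running total exceeds the limit the remaining keys are
// aborted without fetching their values.
void MemtablePass(const std::vector<uint64_t>& found_value_sizes,
                  uint64_t limit, BatchState* state) {
  for (uint64_t size : found_value_sizes) {
    if (state->value_size > limit) {
      state->statuses.push_back(LookupStatus::kAborted);
      continue;
    }
    state->value_size += size;
    state->statuses.push_back(LookupStatus::kOk);
  }
}

// The SST pass is only entered when the memtable pass stayed within the
// limit, mirroring "not even call VersionSet::MultiGet if size exceeds".
bool ShouldSearchSstFiles(const BatchState& state, uint64_t limit) {
  return state.value_size <= limit;
}
```

With three 3-byte memtable hits and a limit of 5, the third key is aborted in the memtable pass and the SST pass is skipped.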
db/version_set.cc
Outdated
if (read_options.value_size > 0) {
  if (file_range.GetValueSize() + iter->value->size() >
      read_options.value_size) {
    while (iter != file_range.end()) {
Ditto
15211a2 to 0f1e6f2 (Compare)
@akankshamahajan15 has updated the pull request. Re-import the pull request
for (auto range_iter = range->begin(); range_iter != range->end();
     ++range_iter) {
  range->MarkKeyDone(range_iter);
  *(range_iter->s) = Status::Aborted();
}
This code iterates over all the remaining keys in the range and sets their status to Aborted. I added it here because it was simple to do by iterating, whereas in MultiGetContext I would have needed to change index_ as well, and the code got complicated there.
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
0f1e6f2 to 5b0d3c8 (Compare)
@akankshamahajan15 has updated the pull request. Re-import the pull request
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Looks good in relation to my earlier review comments. :) Deferring to Anand for final review.
db/db_impl/db_impl.cc
Outdated
}

// Set all remaining keys status to abort
if (keys_left && s != Status::TimedOut() &&
We can just combine this block with the one in lines 2357-2364 as the logic is the same.
db/version_set.cc
Outdated
  range->MarkKeyDone(range_iter);
  *(range_iter->s) = Status::Aborted();
}
iter = file_range.end();
Can we set s = Status::Aborted() and use that to break out of the nested loops? Or maybe checking for file_range.empty() would work, like in line 2045.
5b0d3c8 to d459955 (Compare)
@akankshamahajan15 has updated the pull request. Re-import the pull request
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
d459955 to 15d9fda (Compare)
@akankshamahajan15 has updated the pull request. Re-import the pull request
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Addressed comments
Looks great! Some comments inline.
@@ -1960,7 +1962,8 @@ void Version::MultiGet(const ReadOptions& read_options, MultiGetRange* range,
     return;
   }
   uint64_t batch_size = 0;
-  for (auto iter = file_range.begin(); iter != file_range.end(); ++iter) {
+  for (auto iter = file_range.begin(); s.ok() && iter != file_range.end();
Nit: No need to check Status here since there is a break on line 2040.
The break on line 2040 exits the outer loop (over the different SST files); this one is for the inner loop that iterates over the keys in one SST file.
Oh that's right! Sorry, misread the code.
@@ -929,6 +929,16 @@ void MemTable::MultiGet(const ReadOptions& read_options, MultiGetRange* range,

if (found_final_value) {
  iter->value->PinSelf();
  range->AddValueSize(iter->value->size());
  if (range->GetValueSize() > read_options.value_size_soft_limit) {
Should this block be after range->MarkKeyDone(iter) is called? Otherwise, we'll discard the value for this key.
I am not marking done the key whose size exceeds the value_size here, so that its status can be set to Aborted in the iteration that marks the remaining keys Aborted. But its value size is still added via range->AddValueSize.
Logic: for example, if the 4th key's value exceeds the value_size limit, its value size is still added so that range->GetValueSize() exceeds the limit, but its status is marked Aborted. This way, in DBImpl::MultiGet, I know the limit has already been exceeded because range->GetValueSize() > read_options.value_size_soft_limit. If I don't let range->value_size_ exceed the limit, I can't exit the loop in DBImpl::MultiGet, since I can't tell whether the limit has been exceeded or not.
In that case, should the option name be changed to value_size_limit, since it's a hard limit rather than a soft limit?
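The bookkeeping being debated here — adding the crossing key's value size so a later stage can detect the overflow, while reporting that key as Aborted — can be sketched with a hypothetical stand-in (not the real MultiGetContext class):

```cpp
#include <cassert>
#include <cstdint>

// Minimal stand-in for the value-size tracking in MultiGetContext
// (hypothetical; the real class also tracks keys and statuses).
class ValueSizeTracker {
 public:
  void AddValueSize(uint64_t size) { value_size_ += size; }
  uint64_t GetValueSize() const { return value_size_; }

 private:
  uint64_t value_size_ = 0;
};

// Mirrors the check under discussion: the crossing key's size is added
// first, so that a later stage (e.g. DBImpl::MultiGet) can detect the
// overflow purely from GetValueSize() > limit, even though the crossing
// key itself is reported as Aborted.
bool AddAndCheckExceeded(ValueSizeTracker* tracker, uint64_t value_size,
                         uint64_t limit) {
  tracker->AddValueSize(value_size);
  return tracker->GetValueSize() > limit;
}
```

With a limit of 5, a first 3-byte value stays under the limit, and a second 3-byte value pushes the tracker to 6, which later stages can observe without any extra flag.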
@@ -2060,12 +2068,23 @@ void Version::MultiGet(const ReadOptions& read_options, MultiGetRange* range,
    nullptr /* result_operand */, true);
if (LIKELY(iter->value != nullptr)) {
  iter->value->PinSelf();
  range->AddValueSize(iter->value->size());
  if (range->GetValueSize() > read_options.value_size_soft_limit) {
Should this be after calling range->MarkKeyDone(iter)?
Same as above
ASSERT_TRUE(s[15].IsAborted());
ASSERT_TRUE(s[17].IsAborted());

// 6 kv pairs * 3 bytes per value (i.e. 18)
I suggest adding another call to MultiGet with just memtable keys to test the abort logic in MemTable::MultiGet. Also, I suggest adding a few merge values.
Sure. I will add them. Thank you!
15d9fda to 888550c (Compare)
@akankshamahajan15 has updated the pull request. Re-import the pull request
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Added new test cases: 1. One with a Merge operation. 2. One that reads all keys from memory while aborting a few.
@@ -1356,6 +1356,13 @@ struct ReadOptions {
// processing a batch
std::chrono::microseconds deadline;

// It limits the maximum cumulative value size of the keys in batch while
// reading through MultiGet. Once the cumulative value size exceeds this
The comment can probably be reworded a little to match the behavior - "cumulative size is not allowed to exceed the limit".
The comment is still a little confusing, as it may be interpreted to mean that the cumulative value size of successful keys can exceed value_size_hard_limit. Maybe we can say: "Once a key causes the cumulative size to exceed this hard limit, that key and all remaining keys are returned with status Aborted."
888550c to 9d91df6 (Compare)
@akankshamahajan15 has updated the pull request. Re-import the pull request
9d91df6 to 33056e2 (Compare)
@akankshamahajan15 has updated the pull request. Re-import the pull request
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
LGTM. Just a couple of minor comments.
db/version_set.cc
Outdated
@@ -1933,13 +1933,15 @@ void Version::MultiGet(const ReadOptions& read_options, MultiGetRange* range,
    &storage_info_.file_indexer_, user_comparator(), internal_comparator());
FdWithKeyRange* f = fp.GetNextFile();

Status s = Status::OK();
Nit: The default Status constructor initializes it to OK.
Did we determine that hard limit was better for our internal customers than soft limit? (Soft limit seems more "efficient" in principle because when we have a k-v that crosses the limit, we've already done almost all of the work to look it up, and many applications will likely repeat the lookup in a subsequent call if we throw it away.)
Internal discussion landed on going with the soft limit.
33056e2 to 2ce5af6 (Compare)
Changed the implementation of value_size to soft-limit.
@akankshamahajan15 has updated the pull request. Re-import the pull request
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
LGTM. HISTORY.md comment needs to be updated.
HISTORY.md
Outdated
@@ -19,6 +19,7 @@
### New Feature
* sst_dump to add a new --readahead_size argument. Users can specify read size when scanning the data. Sst_dump also tries to prefetch tail part of the SST files so usually some number of I/Os are saved there too.
* Generate file checksum in SstFileWriter if Options.file_checksum_gen_factory is set. The checksum and checksum function name are stored in ExternalSstFileInfo after the sst file write is finished.
* Add a value_size_hard_limit in read options which limits the cumulative value size of keys read in batches in MultiGet. Once the cumulative value size of found keys exceeds read_options.value_size_hard_limit, all the remaining keys are returned with status Abort without further finding their values. By default the value_size_hard_limit is std::numeric_limits<uint64_t>::max().
Change value_size_hard_limit to value_size_soft_limit.
Summary:
1. Add a value_size_soft_limit in read options which limits the cumulative value size of keys read in batches. Once the cumulative size exceeds read_options.value_size_soft_limit, all the remaining keys are returned with status Aborted without further fetching any key.
2. Add 3 unit test cases.
Test Plan: 1. make check -j64 2. Add new unit test cases
Reviewers:
Subscribers:
Tasks: T64971827
Tags:
2ce5af6 to 1a13645 (Compare)
@akankshamahajan15 has updated the pull request. Re-import the pull request
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@akankshamahajan15 merged this pull request in bcefc59.
Summary: 1. Add a value_size in read options which limits the cumulative value size of keys read in batches. Once the size exceeds read_options.value_size, all the remaining keys are returned with status Aborted without further fetching any key.
2. Add a unit test case, MultiGetBatchedValueSizeSimple, that reads keys from memory and SST files.
Test Plan: 1. make check -j64
2. Add a new unit test case
Reviewers:
Subscribers:
Tasks: T64971827
Tags: