Allow MultiGet users to limit cumulative value size #6826

Closed

Conversation

akankshamahajan15 (Contributor)

Summary:
1. Add a value_size option in read options which limits the cumulative value size of keys read in batches. Once the size exceeds read_options.value_size, all the remaining keys are returned with status Aborted, without fetching any further keys.
2. Add a unit test case, MultiGetBatchedValueSizeSimple, that reads keys from memory and SST files.

Test Plan: 1. make check -j64
2. Add a new unit test case

Reviewers:

Subscribers:

Tasks: T64971827

Tags:
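
For context, a minimal usage sketch of the feature (not code from this PR; it uses the batched MultiGet API and the option's final name, value_size_soft_limit, which the review below settles on):

#include <vector>

#include "rocksdb/db.h"

void ReadBatch(rocksdb::DB* db, const std::vector<rocksdb::Slice>& keys) {
  rocksdb::ReadOptions ro;
  ro.value_size_soft_limit = 1024;  // cumulative value budget, in bytes
  std::vector<rocksdb::PinnableSlice> values(keys.size());
  std::vector<rocksdb::Status> statuses(keys.size());
  db->MultiGet(ro, db->DefaultColumnFamily(), keys.size(), keys.data(),
               values.data(), statuses.data());
  for (size_t i = 0; i < keys.size(); ++i) {
    if (statuses[i].IsAborted()) {
      // The budget was exhausted before this key; retry it in a later batch.
    }
  }
}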

@facebook-github-bot left a comment

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

// It limits the maximum cumulative value size of the keys in batch while
// reading through MultiGet. Once the cumulative value size exceeds this value
// then all the remaining keys are returned with status Aborted.
// Default: 0.
Contributor

If you keep 0 as a special value, this comment needs to mention that.

Although our internal checks would be more verbose, I suggest instead using std::numeric_limits<uint64_t>::max() so that the behavior is continuous across the range of settings. (Though this case isn't as weird as -1 for files_size_error_margin.)

Maybe a more descriptive name like "value_size_soft_limit"? I could foresee us adding a hard "value_size_limit" that would apply to Get and MultiGet. A soft limit (allowed to exceed but none after that) obviously doesn't make sense for Get.
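
A compilable sketch of what that combination of suggestions might look like (my illustration, not the PR's actual diff):

#include <cstdint>
#include <limits>

struct ReadOptions {
  // (other fields elided)
  // Soft cap on the cumulative size of values returned by one MultiGet
  // batch; keys remaining after the cap is crossed get Status::Aborted.
  // Default: no limit, so behavior is continuous across the whole range.
  uint64_t value_size_soft_limit = std::numeric_limits<uint64_t>::max();
};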

Contributor Author

Thank you. Yes, it makes sense to be consistent across the range of settings. I will make the changes.

db/memtable.cc (outdated)
if (read_options.value_size > 0) {
if (range->GetValueSize() + iter->value->size() >
read_options.value_size) {
while (iter != temp_range.end()) {
Contributor

range may not cover all the keys in the MultiGet batch, just the ones that could be present in the memtable. It would be better to move the logic that aborts the request and sets the status of the remaining keys into MultiGetContext, so it can set the status for all remaining keys.
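
A hypothetical, self-contained sketch of that suggestion (the stand-in types and the AbortRemaining name are mine; the real MultiGetContext lives in table/multiget_context.h):

#include <vector>

struct Status {
  enum Code { kOk, kAborted };
  Code code = kOk;
  static Status Aborted() { Status s; s.code = kAborted; return s; }
};

struct KeyContext {
  Status* s;
};

struct MultiGetRange {
  std::vector<KeyContext> keys;
  void MarkKeyDone(KeyContext& /*k*/) { /* the real impl clears a skip mask */ }
  // Because the context sees the whole batch, this helper can abort every
  // unserved key, whether it would have been read from the memtable or from
  // SST files.
  void AbortRemaining() {
    for (auto& k : keys) {
      MarkKeyDone(k);
      *(k.s) = Status::Aborted();
    }
  }
};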

Contributor Author

I'm using MultiGetContext::value_size to hold the cumulative value size of the keys processed so far. VersionSet::MultiGet then checks MultiGetContext::value_size (updated by MemTable::MultiGet) and sets the remaining keys' status to Aborted if the size exceeds the limit. This way both memtable keys and SST file keys are covered.

But it makes sense to not even call VersionSet::MultiGet if the size has already been exceeded, and to have MemTable::MultiGet update the remaining keys itself. I will change the logic to do so. Thanks.

if (read_options.value_size > 0) {
if (file_range.GetValueSize() + iter->value->size() >
read_options.value_size) {
while (iter != file_range.end()) {
Contributor

Ditto

db/db_impl/db_impl.cc (comment resolved)
@facebook-github-bot

@akankshamahajan15 has updated the pull request.

Comment on lines +935 to +941
for (auto range_iter = range->begin(); range_iter != range->end();
++range_iter) {
range->MarkKeyDone(range_iter);
*(range_iter->s) = Status::Aborted();
}
Contributor Author

This code iterates over all remaining keys in the range and sets their status to Aborted. I added it here because it was simple to do by iterating, whereas in MultiGetContext I would have needed to change index_ as well, and the code got complicated there.

@facebook-github-bot left a comment

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@akankshamahajan15 has updated the pull request.

@facebook-github-bot left a comment

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@pdillinger (Contributor)

Looks good in relation to my earlier review comments. :) Deferring to Anand for final review.

db/db_impl/db_impl.cc (comment resolved)
}

// Set all remaining keys status to abort
if (keys_left && s != Status::TimedOut() &&
Contributor

We can just combine this block with the one in lines 2357-2364 as the logic is the same.

range->MarkKeyDone(range_iter);
*(range_iter->s) = Status::Aborted();
}
iter = file_range.end();
Contributor

Can we set s = Status::Aborted() and use that to break out of the nested loops? Or maybe checking for file_range.empty() would work, like in line 2045.
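
A minimal, self-contained illustration of that suggestion (made-up Status and data types, not the PR's code): letting every loop condition observe the aborted status exits both loops without fast-forwarding the inner iterator by hand.

#include <vector>

struct Status {
  enum Code { kOk, kAborted };
  Code code = kOk;
  bool ok() const { return code == kOk; }
  static Status Aborted() { Status s; s.code = kAborted; return s; }
};

int main() {
  // Outer loop over "files", inner loop over the value sizes in one file.
  std::vector<std::vector<int>> value_sizes = {{3, 5}, {7, 11}};
  const int limit = 10;
  int cumulative = 0;
  Status s;
  for (auto file = value_sizes.begin(); s.ok() && file != value_sizes.end();
       ++file) {
    for (auto it = file->begin(); s.ok() && it != file->end(); ++it) {
      cumulative += *it;
      if (cumulative > limit) {
        s = Status::Aborted();  // both loop conditions observe this and exit
      }
    }
  }
  return s.ok() ? 0 : 1;
}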

db/version_set.cc (comment resolved)
@facebook-github-bot

@akankshamahajan15 has updated the pull request.

@facebook-github-bot left a comment

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@akankshamahajan15 has updated the pull request.

@facebook-github-bot left a comment

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@akankshamahajan15 (Contributor, Author)

Addressed comments

@anand1976 left a comment

Looks great! Some comments inline.

@@ -1960,7 +1962,8 @@ void Version::MultiGet(const ReadOptions& read_options, MultiGetRange* range,
return;
}
uint64_t batch_size = 0;
for (auto iter = file_range.begin(); iter != file_range.end(); ++iter) {
for (auto iter = file_range.begin(); s.ok() && iter != file_range.end();
Contributor

Nit: No need to check Status here since there is a break on line 2040.

Contributor Author

The break on line 2040 exits the outer loop (over different SST files); this check is for the inner loop that iterates over the keys in one SST file.

Contributor

Oh that's right! Sorry, misread the code.

@@ -929,6 +929,16 @@ void MemTable::MultiGet(const ReadOptions& read_options, MultiGetRange* range,

if (found_final_value) {
iter->value->PinSelf();
range->AddValueSize(iter->value->size());
if (range->GetValueSize() > read_options.value_size_soft_limit) {
Contributor

Should this block be after range->MarkKeyDone(iter) is called? Otherwise, we'll discard the value for this key.

Contributor Author

I am not marking done the key whose size exceeds value_size here, so that its status can be set to Aborted in the loop that marks the remaining keys Aborted. Its value size is still added via range->AddValueSize, though.

Logic: for example, if the 4th key's value makes the cumulative size exceed the value_size limit, its size is still added so that range->GetValueSize() exceeds the limit, but its status is marked Aborted. That way, in DBImpl::MultiGet I know the limit has already been exceeded because range->GetValueSize() > read_options.value_size_soft_limit. If I didn't let range->value_size_ exceed the limit, I couldn't exit the loop in DBImpl::MultiGet, since I wouldn't know whether the limit had been exceeded or not.
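
A worked example of that accounting (made-up sizes; this matches the hard-limit behavior described here, which is later relaxed to a soft limit):

limit = 10 bytes; value sizes in batch order: 4, 4, 4, 4
  key 1: cumulative 4  (<= 10) -> value returned
  key 2: cumulative 8  (<= 10) -> value returned
  key 3: cumulative 12 (> 10)  -> size still counted, status Aborted
  key 4: never looked up       -> status Aborted
Because the cumulative size (12) now exceeds the limit, DBImpl::MultiGet can
detect the condition from range->GetValueSize() alone, with no extra flag.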

Contributor

In that case, should the option name be changed to value_size_limit, since it's a hard limit rather than a soft limit?

@@ -2060,12 +2068,23 @@ void Version::MultiGet(const ReadOptions& read_options, MultiGetRange* range,
nullptr /* result_operand */, true);
if (LIKELY(iter->value != nullptr)) {
iter->value->PinSelf();
range->AddValueSize(iter->value->size());
if (range->GetValueSize() > read_options.value_size_soft_limit) {
Contributor

Should this be after calling range->MarkKeyDone(iter)?

Contributor Author

Same as above

ASSERT_TRUE(s[15].IsAborted());
ASSERT_TRUE(s[17].IsAborted());

// 6 kv pairs * 3 bytes per value (i.e. 18)
Contributor

I suggest adding another call to MultiGet with just memtable keys to test the abort logic in MemTable::MultiGet. Also, I suggest adding a few merge values.

Contributor Author

Sure. I will add them. Thank you!

@facebook-github-bot

@akankshamahajan15 has updated the pull request.

@facebook-github-bot left a comment

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@akankshamahajan15 (Contributor, Author)

Added new test cases: 1. one with a Merge operation, 2. one that reads all keys from memory while aborting a few.


@@ -1356,6 +1356,13 @@ struct ReadOptions {
// processing a batch
std::chrono::microseconds deadline;

// It limits the maximum cumulative value size of the keys in batch while
// reading through MultiGet. Once the cumulative value size exceeds this
Contributor

The comment can probably be reworded a little to match the behavior - "cumulative size is not allowed to exceed the limit".

Contributor

The comment is still a little confusing as it may be interpreted that the cumulative value size of successful keys can exceed value_size_hard_limit. Maybe we can say "Once a key causes the cumulative size to exceed this hard limit, that key and all remaining keys are returned with status Aborted.".

@facebook-github-bot

@akankshamahajan15 has updated the pull request.

@facebook-github-bot

@akankshamahajan15 has updated the pull request.

@facebook-github-bot left a comment

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@anand1976 left a comment

LGTM. Just a couple of minor comments.


@@ -1933,13 +1933,15 @@ void Version::MultiGet(const ReadOptions& read_options, MultiGetRange* range,
&storage_info_.file_indexer_, user_comparator(), internal_comparator());
FdWithKeyRange* f = fp.GetNextFile();

Status s = Status::OK();
Contributor

Nit: Default Status constructor initializes it to OK.

@pdillinger (Contributor)

Did we determine that hard limit was better for our internal customers than soft limit? (Soft limit seems more "efficient" in principle because when we have a k-v that crosses the limit, we've already done almost all of the work to look it up, and many applications will likely repeat the lookup in a subsequent call if we throw it away.)

@pdillinger (Contributor)

> Did we determine that hard limit was better for our internal customers than soft limit? […]

Internal discussion landed on going with the soft limit.
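
For example (made-up sizes), with a 10-byte limit and three values of 4 bytes each: a hard limit returns keys 1-2 and aborts key 3 even though its value was already fetched, while a soft limit returns keys 1-3 (the cumulative size, 12, may cross the limit once) and aborts only keys after the crossing one.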

@akankshamahajan15 (Contributor, Author)

Changed the implementation of value_size to a soft limit.

@facebook-github-bot

@akankshamahajan15 has updated the pull request.

@facebook-github-bot left a comment

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@anand1976 left a comment

LGTM. HISTORY.md comment needs to be updated.

HISTORY.md (outdated)
@@ -19,6 +19,7 @@
### New Feature
* sst_dump to add a new --readahead_size argument. Users can specify read size when scanning the data. Sst_dump also tries to prefetch tail part of the SST files so usually some number of I/Os are saved there too.
* Generate file checksum in SstFileWriter if Options.file_checksum_gen_factory is set. The checksum and checksum function name are stored in ExternalSstFileInfo after the sst file write is finished.
* Add a value_size_hard_limit in read options which limits the cumulative value size of keys read in batches in MultiGet. Once the cumulative value size of found keys exceeds read_options.value_size_hard_limit, all the remaining keys are returned with status Abort without further finding their values. By default the value_size_hard_limit is std::numeric_limits<uint64_t>::max().
Contributor

Change value_size_hard_limit to value_size_soft_limit.

Summary:
1. Add a value_size_soft_limit in read options which limits the cumulative value size of keys read in batches. Once the cumulative size exceeds read_options.value_size_soft_limit, all the remaining keys are returned with status Aborted, without fetching any further keys.
2. Add 3 unit test cases.

Test Plan:
1. make check -j64
2. Add new unit test cases

Reviewers:

Subscribers:

Tasks: T64971827

Tags:
@facebook-github-bot

@akankshamahajan15 has updated the pull request.

@facebook-github-bot left a comment

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot left a comment

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@akankshamahajan15 merged this pull request in bcefc59.
