-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fast path for detecting unchanged prefix_extractor #9407
Conversation
Summary: A proposed alternate fix to performance regression associated with SliceTransform::AsString, vs. facebook#9401 Test Plan: performance testing TODO
@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think this logic is the same as the original. The original checked to see if the input prefix extractor was the same one as the one used in building the table. This new logic checks to see if the input prefix extractor is the same one as used by the last operation.
What happens if you build a table with one extractor and then do multiple operations using a different extractor? It seems like the results could be different with this new code compared to the original.
No it doesn't. known_good_prefix_extractor_instance_id is only assigned non-zero in BlockBasedTable::Open, and only if |
I considered getting fancy with atomics to optimize more cases, but decided to keep it simple. |
@pdillinger has updated the pull request. You must reimport the pull request before landing. |
@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@mrambacher pointed out that we can safely do pointer comparison (rather than instance id stuff) if we plumb through |
@pdillinger has updated the pull request. You must reimport the pull request before landing. |
@pdillinger has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this change better, though I am worried there are race conditions if the MutableCFOptions.prefix_extractor changes (though these are probably not new).
I am not a fan of storing a reference to the shared_ptr (as opposed to a copy), as I think the reference will cause problems on updates.
Also, can you run and post the same performance test results with this change as the earlier one?
@@ -211,7 +211,8 @@ class ForwardLevelIterator : public InternalIterator { | |||
Status status_; | |||
InternalIterator* file_iter_; | |||
PinnedIteratorsManager* pinned_iters_mgr_; | |||
const SliceTransform* prefix_extractor_; | |||
// Kept alive by ForwardIterator::sv_->mutable_cf_options | |||
const std::shared_ptr<const SliceTransform>& prefix_extractor_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a const std::shared_ptr & or just a shared_ptr? I am thinking that if someone changed the mutable option, the value stored here would also change whereas it would not if it was a copy (and the ownership/keep alive would not appy).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, SuperVersion::mutable_cf_options, Version::mutable_cf_options_ and Compaction::mutable_cf_options_ are actually immutable. For this change, they are generally referenced in a context where something is holding on the that owning entity. For example, if an iterator doesn't hold on to a SuperVersion, it can't guarantee other data stays around.
The UAF case I ran into above is an iterator refresh. I should probably look for more similar to that.
Copying shared_ptr in too many places would likely negate most of the performance gains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The UAF case I ran into above is an iterator refresh. I should probably look for more similar to that.
I inspected ArenaWrappedDBIter::Refresh and it looks safe. It re-loads everything including prefix_extractor after changing SuperVersion.
file_options.compaction_readahead_size, | ||
allow_unprepared_value); | ||
result = table_reader->NewIterator( | ||
options, prefix_extractor.get(), arena, skip_filters, caller, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a potential bug to me (not necessarily new/related to this) but it seems like the prefix_extractor can go out of scope while this operation is active potentially causing a crash. What happens if an Iterator is open and the prefix_extractor is changed through a mutable option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to come from a SuperVersion, Version, or Compaction
@@ -451,7 +452,7 @@ Status TableCache::Get(const ReadOptions& options, | |||
} | |||
if (s.ok()) { | |||
get_context->SetReplayLog(row_cache_entry); // nullptr if no cache. | |||
s = t->Get(options, k, get_context, prefix_extractor, skip_filters); | |||
s = t->Get(options, k, get_context, prefix_extractor.get(), skip_filters); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the same race as above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Owned & immutable by Version for life of Get
@@ -1011,7 +1012,7 @@ class LevelIterator final : public InternalIterator { | |||
// `prefix_extractor_` may be non-null even for total order seek. Checking | |||
// this variable is not the right way to identify whether prefix iterator | |||
// is used. | |||
const SliceTransform* prefix_extractor_; | |||
const std::shared_ptr<const SliceTransform>& prefix_extractor_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue as before. Should this be a ref or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has several other const ref fields. Best I can tell, the lifetime of LevelIterator is bounded within a function call (Version::OverlapWithLevelIterator) or to a Compaction (VersionSet::MakeInputIterator).
@@ -564,7 +564,7 @@ Status VersionEditHandler::LoadTables(ColumnFamilyData* cfd, | |||
cfd->internal_stats(), | |||
version_set_->db_options_->max_file_opening_threads, | |||
prefetch_index_and_filter_in_cache, is_initial_load, | |||
cfd->GetLatestMutableCFOptions()->prefix_extractor.get(), | |||
cfd->GetLatestMutableCFOptions()->prefix_extractor, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a case of using prefix_extractor from ColumnFamilyData, but this appears to only be called under DB::Open so not actually mutable
@@ -544,7 +544,7 @@ class Repairer { | |||
InternalIterator* iter = table_cache_->NewIterator( | |||
ropts, file_options_, cfd->internal_comparator(), t->meta, | |||
nullptr /* range_del_agg */, | |||
cfd->GetLatestMutableCFOptions()->prefix_extractor.get(), | |||
cfd->GetLatestMutableCFOptions()->prefix_extractor, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a case of using prefix_extractor from ColumnFamilyData, but under Repairer, it should not be mutable (from SetOptions).
@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is potentially dangerous how the prefix extractor is being passed/stored but trust that your analysis is complete and accurate.
I agree there's non-zero risk, and I don't like having the internal APIs in a precarious state either. There is an up side however: if we are to stand by SetOptions as a feature, these changes make it easier to surface UAF bugs involving SetOptions on prefix_extractor because more of them can show without SetOptions--like the one I fixed here in ForwardIterator. |
Summary: Fixes a major performance regression in 6.26, where
extra CPU is spent in SliceTransform::AsString when reads involve
a prefix_extractor (Get, MultiGet, Seek). Common case performance
is now better than 6.25.
This change creates a "fast path" for verifying that the current prefix
extractor is unchanged and compatible with what was used to
generate a table file. This fast path detects the common case by
pointer comparison on the current prefix_extractor and a "known
good" prefix extractor (if applicable) that is saved at the time the
table reader is opened. The "known good" prefix extractor is saved
as another shared_ptr copy (in an existing field, however) to ensure
the pointer is not recycled.
When the prefix_extractor has changed to a different instance but
same compatible configuration (rare, odd), performance is still a
regression compared to 6.25, but this is likely acceptable because
of the oddity of such a case. The performance of incompatible
prefix_extractor is essentially unchanged.
Also fixed a minor case (ForwardIterator) where a prefix_extractor
could be used via a raw pointer after being freed as a shared_ptr,
if replaced via SetOptions.
Test Plan:
Performance
Populate DB with
TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=10000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=12
Running head-to-head comparisons simultaneously with
TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -use_existing_db -readonly -benchmarks=seekrandom -num=10000000 -duration=20 -disable_wal=1 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=12
Below each is compared by ops/sec vs. baseline which is version 6.25 (multiple baseline runs because of variable machine load)
v6.26: 4833 vs. 6698 (<- major regression!)
v6.27: 4737 vs. 6397 (still)
New: 6704 vs. 6461 (better than baseline in common case)
Disabled fastpath: 4843 vs. 6389 (e.g. if prefix extractor instance changes but is still compatible)
Changed prefix size (no usable filter) in new: 787 vs. 5927
Changed prefix size (no usable filter) in new & baseline: 773 vs. 784