Eliminate unnecessary (slow) block cache Ref()ing in MultiGet #9899

pdillinger · 2022-04-23T05:48:26Z

Summary: When MultiGet() determines that multiple query keys can be
served by examining the same data block in block cache (one Lookup()),
each PinnableSlice referring to data in that data block needs to hold
on to the block in cache so that they can be released at arbitrary
times by the API user. Historically this is accomplished with extra
calls to Ref() on the Handle from Lookup(), with each PinnableSlice
cleanup calling Release() on the Handle, but this creates extra
contention on the block cache for the extra Ref()s and Release()es,
especially because they hit the same cache shard repeatedly.

In the case of merge operands (possibly more cases?), the problem was
compounded by doing an extra Ref()+eventual Release() for each merge
operand for a key reusing a block (which could be the same key!), rather
than one Ref() per key. (Note: the non-shared case with biter was
already one per key.)

This change optimizes MultiGet not to rely on these extra, contentious
Ref()+Release() calls by instead, in the shared block case, wrapping
the cache Release() cleanup in a refcounted object referenced by the
PinnableSlices, such that after the last wrapped reference is released,
the cache entry is Release()ed. Relaxed atomic refcounts should be
much faster than mutex-guarded Ref() and Release(), and much less prone
to a performance cliff when MultiGet() does a lot of block sharing.
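
To make the idea above concrete, here is a minimal, self-contained sketch of wrapping a single cache Release() in a refcounted object. It is not the code from this PR: `FakeCacheHandle`, `SharedRelease`, and `ReleaseCallsForSharedBlock` are hypothetical names, and the acquire-release ordering on the decrement is an assumption of this sketch (only the increment is relaxed here).

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a block cache handle. Release() only counts
// calls here; a real cache would typically take a shard mutex.
struct FakeCacheHandle {
  int release_calls = 0;
  void Release() { ++release_calls; }
};

// Refcounted wrapper around exactly one cache Release().
class SharedRelease {
 public:
  explicit SharedRelease(FakeCacheHandle* h) : handle_(h) {}

  // Taking another reference needs no ordering with other memory accesses.
  void Ref() { refs_.fetch_add(1, std::memory_order_relaxed); }

  // The last reference to go away releases the cache entry once.
  void Unref() {
    if (refs_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
      handle_->Release();
      delete this;
    }
  }

 private:
  ~SharedRelease() = default;  // destroyed only through Unref()
  FakeCacheHandle* handle_;
  std::atomic<int> refs_{1};  // the creator holds the first reference
};

// Simulates `num_slices` pinned values sharing one block and returns how
// many times the cache handle ended up being released.
int ReleaseCallsForSharedBlock(int num_slices) {
  FakeCacheHandle handle;
  SharedRelease* shared = new SharedRelease(&handle);
  std::vector<SharedRelease*> slices;
  for (int i = 0; i < num_slices; ++i) {
    shared->Ref();  // one cheap atomic increment per pinned value
    slices.push_back(shared);
  }
  shared->Unref();  // the creator drops its own reference
  for (SharedRelease* s : slices) {
    s->Unref();  // values may be released in any order
  }
  return handle.release_calls;
}
```

However many values share the block, the cache sees a single Release(); the per-value cost is one atomic increment and one atomic decrement.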

Note that I did not use std::shared_ptr, because that would require an
extra indirection object (shared_ptr itself new/delete) in order to
associate a ref increment/decrement with a Cleanable cleanup entry. (If
I assumed it was the size of two pointers, I could do some hackery to
make it work without the extra indirection, but that's too fragile.)
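
For illustration, this is roughly what the std::shared_ptr route would look like with a cleanup callback that takes two `void*` arguments. `MiniCleanable`, `Block`, `PinWithSharedPtr`, and `DemoSharedPtrPinning` are hypothetical stand-ins written for this sketch, not RocksDB classes; the point is only that each registration needs its own heap-allocated shared_ptr copy.

```cpp
#include <cassert>
#include <memory>
#include <vector>

using CleanupFunction = void (*)(void* arg1, void* arg2);

// Hypothetical miniature of a Cleanable: runs registered callbacks on
// destruction.
class MiniCleanable {
 public:
  MiniCleanable() = default;
  MiniCleanable(const MiniCleanable&) = delete;
  MiniCleanable& operator=(const MiniCleanable&) = delete;
  ~MiniCleanable() {
    for (const Entry& e : cleanups_) e.fn(e.arg1, e.arg2);
  }
  void RegisterCleanup(CleanupFunction fn, void* arg1, void* arg2) {
    cleanups_.push_back({fn, arg1, arg2});
  }

 private:
  struct Entry {
    CleanupFunction fn;
    void* arg1;
    void* arg2;
  };
  std::vector<Entry> cleanups_;
};

// Stand-in for a cached block; counts destructions through `released`.
struct Block {
  explicit Block(int* r) : released(r) {}
  ~Block() { ++*released; }
  int* released;
};

// A void* cannot carry a shared_ptr by value, so every registration
// allocates a separate shared_ptr object and deletes it in the callback.
void PinWithSharedPtr(MiniCleanable* target, const std::shared_ptr<Block>& b) {
  auto* copy = new std::shared_ptr<Block>(b);  // the extra indirection object
  target->RegisterCleanup(
      [](void* arg1, void*) {
        delete static_cast<std::shared_ptr<Block>*>(arg1);
      },
      copy, nullptr);
}

// Returns how many times the block was destroyed after two holders let go.
int DemoSharedPtrPinning() {
  int released = 0;
  {
    std::shared_ptr<Block> block = std::make_shared<Block>(&released);
    MiniCleanable slice_a;
    MiniCleanable slice_b;
    PinWithSharedPtr(&slice_a, block);
    PinWithSharedPtr(&slice_b, block);
    block.reset();
    if (released != 0) return -1;  // still pinned by both slices
  }
  return released;
}
```

An intrusive count stored in the shared object avoids that per-registration new/delete, because the callback argument can point straight at the shared object.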

Some details:

* Fixed (removed) extra block cache tracing entries in cases of cache
  entry reuse in MultiGet, but it's likely that in some other cases traces
  are missing (XXX comment inserted)
* Moved existing implementations for cleanable.h from iterator.cc to
  new cleanable.cc
* Improved API comments on Cleanable
* Added a public SharedCleanablePtr class to cleanable.h in case others
  could benefit from the same pattern (potentially many Cleanables and/or
  smart pointers referencing a shared Cleanable)
* Added a typedef for MultiGetContext::Mask
* Some variable renaming for clarity

Test Plan:
* Added unit tests for SharedCleanablePtr.
* Greatly enhanced ability of existing tests to detect cache use-after-free.
  * Release PinnableSlices from MultiGet as they are read rather than in
    bulk (in db_test_util wrapper).
  * In ASAN build, default to using a trivially small LRUCache for block_cache
    so that entries are immediately erased when unreferenced. (Updated two
    tests that depend on caching.) New ASAN testsuite running time seems
    OK to me.

If I introduce a bug into my implementation where we skip the shared
cleanups on block reuse, ASAN detects the bug in
db_basic_test *MultiGet*. If I remove either of the above testing
enhancements, the bug is not detected.

Consider for follow-up work: manipulate or randomize ordering of
PinnableSlice use and release from MultiGet db_test_util wrapper. But in
typical cases, natural ordering gives pretty good functional coverage.

Performance test:
In the extreme (but possible) case of MultiGetting the same or adjacent keys
in a batch, throughput can improve by an order of magnitude.
./db_bench -benchmarks=multireadrandom -db=/dev/shm/testdb -readonly -num=5 -duration=10 -threads=20 -multiread_batched -batch_size=200
Before ops/sec, num=5: 1,384,394
Before ops/sec, num=500: 6,423,720
After ops/sec, num=500: 10,658,794
After ops/sec, num=5: 16,027,257

Also note that previously, with high parallelism, having query keys
concentrated in a single block was worse than spreading them out a bit. Now
concentrated in a single block is faster than spread out, which is hopefully
consistent with natural expectation.

Random query performance: with num=1000000, over 999 x 10s runs running before & after simultaneously (each -threads=12):
Before: multireadrandom [AVG 999 runs] : 1088699 (± 7344) ops/sec; 120.4 (± 0.8 ) MB/sec
After: multireadrandom [AVG 999 runs] : 1090402 (± 7230) ops/sec; 120.6 (± 0.8 ) MB/sec
Possibly better, possibly in the noise.
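
As a quick check on the figures above, the ratios can be worked out directly. The helper names `Speedup` and `WithinNoise` are made up for this sketch; the throughput numbers are copied from the results listed above.

```cpp
#include <cassert>
#include <cmath>

// Ratio of "after" to "before" throughput.
double Speedup(double before_ops, double after_ops) {
  return after_ops / before_ops;
}

// True if the before/after difference is smaller than the reported spread.
bool WithinNoise(double before_ops, double after_ops, double spread) {
  return std::fabs(after_ops - before_ops) < spread;
}

// num=5: 1,384,394 -> 16,027,257 ops/sec, about 11.6x.
double SpeedupNum5() { return Speedup(1384394.0, 16027257.0); }

// num=500: 6,423,720 -> 10,658,794 ops/sec, about 1.66x.
double SpeedupNum500() { return Speedup(6423720.0, 10658794.0); }

// Random queries: the averages differ by 1,703 ops/sec, which is less than
// the reported spread of roughly 7,300 ops/sec on either average.
bool RandomCaseWithinNoise() {
  return WithinNoise(1088699.0, 1090402.0, 7230.0);
}
```

So the num=5 case improves by a little over an order of magnitude, the num=500 case by about two thirds, and the random-query difference is smaller than the run-to-run spread.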


ajkr · 2022-04-24T00:39:49Z

util/cleanable.cc

+  return ptr_;  // implicit upcast
+}
+
+void SharedCleanablePtr::RegisterCopyWith(Cleanable *target) {


I'll try using this with the GetMergeOperands() PinnableSlices. It'd clean up https://github.com/ajkr/rocksdb/blob/8fc4fb31ad1793ec3ed43209c2ca274bd6fa6ff4/db/db_impl/db_impl.cc#L1993-L2011 (prototype code).

ajkr · 2022-04-24T02:57:18Z

table/block_based/block_based_table_reader.cc

+        Cleanable* const value_pinner = biter;
+        if (biter->IsValuePinned()) {


Should value_pinner be nullptr when !biter->IsValuePinned()?

Thanks for catching!

facebook-github-bot · 2022-04-25T20:44:27Z

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-04-25T22:16:31Z

@pdillinger has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-04-25T22:29:37Z

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

anand1976

Great catch! The fix LGTM. HISTORY.md needs to be updated.

anand1976 · 2022-04-26T05:42:52Z

table/block_based/block_based_table_reader.cc

+            assert(biter->HasCleanups());
+            shared_cleanable.Allocate();
+            biter->DelegateCleanupsTo(&*shared_cleanable);
+            shared_cleanable.RegisterCopyWith(biter);


I was a bit confused about how this PR works until I looked closely at these 2 lines. Essentially, we're swapping the cleanup functions of biter and shared_cleanable (sort of). The cleanup of biter (UnrefWrapper) is then delegated to the value PinnableSlice. The shared_cleanable cleanup function (ForceReleaseCachedEntry) releases the cache handle.

I'll add to HISTORY.md and some comments here.
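
The hand-off described in this thread can be sketched with small stand-ins. Everything below (`MiniCleanable`, `SharedMini`, `UnrefShared`, the free function `RegisterCopyWith`, `ReleaseHandle`, `DemoDelegation`) is hypothetical and only mirrors the shape of the calls quoted above; it is not the RocksDB implementation.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

using CleanupFunction = void (*)(void* arg1, void* arg2);

// Hypothetical miniature Cleanable (not the RocksDB class).
class MiniCleanable {
 public:
  MiniCleanable() = default;
  MiniCleanable(const MiniCleanable&) = delete;
  MiniCleanable& operator=(const MiniCleanable&) = delete;
  ~MiniCleanable() {
    for (const Entry& e : cleanups_) e.fn(e.arg1, e.arg2);
  }
  void RegisterCleanup(CleanupFunction fn, void* arg1, void* arg2) {
    cleanups_.push_back({fn, arg1, arg2});
  }
  // Moves all pending cleanups to `other`, leaving this object with none.
  void DelegateCleanupsTo(MiniCleanable* other) {
    for (const Entry& e : cleanups_) other->cleanups_.push_back(e);
    cleanups_.clear();
  }
  bool HasCleanups() const { return !cleanups_.empty(); }

 private:
  struct Entry {
    CleanupFunction fn;
    void* arg1;
    void* arg2;
  };
  std::vector<Entry> cleanups_;
};

// A heap-allocated cleanable with an intrusive reference count.
struct SharedMini : MiniCleanable {
  std::atomic<int> refs{1};  // the creator holds the first reference
};

// Cleanup callback: drop one reference, destroy on the last one.
void UnrefShared(void* arg1, void*) {
  auto* shared = static_cast<SharedMini*>(arg1);
  if (shared->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) delete shared;
}

// Gives `target` one reference on `shared`, dropped when `target` cleans up.
void RegisterCopyWith(SharedMini* shared, MiniCleanable* target) {
  shared->refs.fetch_add(1, std::memory_order_relaxed);
  target->RegisterCleanup(&UnrefShared, shared, nullptr);
}

// Stand-in for releasing the cache handle: counts calls.
void ReleaseHandle(void* arg1, void*) { ++*static_cast<int*>(arg1); }

int DemoDelegation() {
  int releases = 0;
  {
    MiniCleanable value_a;  // stand-ins for two PinnableSlices
    MiniCleanable value_b;
    {
      MiniCleanable biter;  // stand-in for the block iterator
      biter.RegisterCleanup(&ReleaseHandle, &releases, nullptr);
      assert(biter.HasCleanups());
      SharedMini* shared = new SharedMini;
      biter.DelegateCleanupsTo(shared);    // shared now owns the cache release
      RegisterCopyWith(shared, &biter);    // biter only holds a reference
      RegisterCopyWith(shared, &value_b);  // a later key reusing the block
      biter.DelegateCleanupsTo(&value_a);  // first value inherits biter's ref
      UnrefShared(shared, nullptr);        // creator drops its reference
    }
    if (releases != 0) return -1;  // both values still pin the block
  }
  return releases;  // the handle is released once, by the last value
}
```

After the two quoted calls, the iterator's only cleanup is "drop one reference", which is cheap to hand to a value; the real cache release runs once, when the shared object is destroyed.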

facebook-github-bot · 2022-04-26T15:53:42Z

@pdillinger has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-04-26T15:54:31Z

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-04-26T18:26:41Z

@pdillinger has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-04-26T18:36:10Z

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-04-26T20:12:51Z

@pdillinger has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-04-26T20:18:30Z

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


Summary: There was a bug in the MultiGet enhancement in #9899 with data block hash index, which was not caught because data block hash index was never added to stress tests. This change fixes both issues.

Fixes #10186

I intend to pick this into the 7.4.0 release candidate

Pull Request resolved: #10220

Test Plan: Failure quickly reproduces in crash test with kDataBlockBinaryAndHash, and does not seem to with the fix. Reproducing the failure with a unit test I believe would be too tricky and fragile to be worthwhile.

Reviewed By: anand1976
Differential Revision: D37315647
Pulled By: pdillinger
fbshipit-source-id: 9f648265bba867275edc752f7a56611a59401cba


pdillinger requested a review from anand1976 April 23, 2022 05:48

facebook-github-bot added the CLA Signed label Apr 23, 2022

pdillinger added 2 commits April 23, 2022 08:08

Missing file

2e2377c

Merge branch 'main' of github.com:facebook/rocksdb into no_extra_ref_…

ad42f08

…in_multiget

ajkr reviewed Apr 24, 2022


pdillinger added 4 commits April 25, 2022 10:57

Re-instate null value_pinner case; other fixes

acaf22c

Add testing

439e93f

Missing file 'make format'

f9fca75

Remove unsafe assert

c97e04f

Fix some lints

027c31c

Merge remote-tracking branch 'origin/main' into no_extra_ref_in_multiget

a2ccf9c

anand1976 approved these changes Apr 26, 2022


pdillinger added 2 commits April 26, 2022 08:52

Add to HISTORY, more comments

989530a

Merge remote-tracking branch 'origin/main' into no_extra_ref_in_multiget

f593cc5

More cleanup and testing of SharedCleanablePtr

e534bc8

Suppress clang-analyze FP

eef05bb

facebook-github-bot closed this in 9d0cae7 Apr 27, 2022

ronag mentioned this pull request Jun 16, 2022

shared_cleanable.get() assert #10186

Closed

pdillinger mentioned this pull request Jun 21, 2022

Add data block hash index to crash test, fix MultiGet issue #10220

Closed
