
Cache fragmented range tombstone list for mutable memtables #10547

Closed · wants to merge 7 commits

Conversation

cbi42 (Member) commented Aug 21, 2022

Summary: Each read from a memtable used to read and fragment all of its range tombstones into a FragmentedRangeTombstoneList. #10380 improved this inefficiency by caching a FragmentedRangeTombstoneList with each immutable memtable. This PR extends the caching to mutable memtables. The fragmented range tombstone list can be constructed in either the read path (this PR) or the write path (#10584). With both implementations, each DeleteRange() invalidates the cache; the difference is where the cache is re-constructed. A CoreLocalArray is used to store the cache with each memtable so that multi-threaded reads can be efficient. More specifically, each core holds a shared_ptr to a shared_ptr pointing to the current cache. Each read thread updates only the reference count in its core-local shared_ptr, and this is needed only when reading from mutable memtables.
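
For illustration, a minimal sketch of this core-local pattern (a sketch only: `CoreLocalCache` and `FragmentedList` are hypothetical stand-ins for RocksDB's CoreLocalArray and FragmentedRangeTombstoneList, and the memory orders shown are a plausible choice rather than the PR's exact ones):

```cpp
#include <atomic>
#include <memory>
#include <vector>

struct FragmentedList {};  // stand-in for FragmentedRangeTombstoneList

class CoreLocalCache {
 public:
  explicit CoreLocalCache(size_t num_cores) {
    auto initial = std::make_shared<FragmentedList>();
    for (size_t i = 0; i < num_cores; ++i) {
      slots_.push_back(
          std::make_shared<std::shared_ptr<FragmentedList>>(initial));
    }
  }

  // Read fast path: an atomic load on the core-local inner shared_ptr
  // bumps a refcount without contending with readers on other cores.
  std::shared_ptr<FragmentedList> Get(size_t core_id) const {
    return std::atomic_load_explicit(slots_[core_id].get(),
                                     std::memory_order_acquire);
  }

  // Invalidation (called from DeleteRange): publish a fresh cache to
  // every core's slot; in-flight readers keep their old list alive
  // through the reference they already hold.
  void Publish(std::shared_ptr<FragmentedList> fresh) {
    for (auto& slot : slots_) {
      std::atomic_store_explicit(slot.get(), fresh,
                                 std::memory_order_release);
    }
  }

 private:
  // Each core owns a shared_ptr to a shared_ptr to the current cache.
  std::vector<std::shared_ptr<std::shared_ptr<FragmentedList>>> slots_;
};
```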

The choice between the write path and the read path is not an easy one: both are improvements over no caching in the current implementation, but they favor different operations and could cause a regression in the other (read vs. write). The write path caching in #10584 leads to a cleaner implementation, but I chose read path caching here to avoid a significant regression in write performance when a single memtable holds a considerable number of range tombstones (the benchmark numbers below suggest >1000 with concurrent writers). Note that even though the fragmented range tombstone list is constructed only by DeleteRange() operations, its construction can block other writes from proceeding and hence affects overall write performance; the sketch below shows where the cost lands in each design.
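
A minimal sketch contrasting the two designs (assumptions: `Memtable`, `Tombstone`, `FragmentedList`, and `Fragment` below are illustrative stand-ins, not RocksDB's types; locking is collapsed to a single mutex and the core-local machinery above is omitted for brevity):

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

struct Tombstone {
  std::string start, end;
};
struct FragmentedList {
  std::vector<Tombstone> fragments;
};

// Stand-in for the real fragmentation step, which sorts overlapping
// range tombstones and splits them into non-overlapping fragments.
std::shared_ptr<FragmentedList> Fragment(const std::vector<Tombstone>& raw) {
  auto out = std::make_shared<FragmentedList>();
  out->fragments = raw;  // the real code would sort and split here
  return out;
}

struct Memtable {
  std::vector<Tombstone> raw_tombstones;
  std::shared_ptr<FragmentedList> cache;  // nullptr == invalidated
  std::mutex write_mu;

  // Write path caching (#10584): rebuild inside DeleteRange(), so the
  // O(#tombstones) fragmentation cost is paid while holding the write
  // mutex and can block other writers.
  void DeleteRangeWritePath(Tombstone t) {
    std::lock_guard<std::mutex> lock(write_mu);
    raw_tombstones.push_back(std::move(t));
    cache = Fragment(raw_tombstones);
  }

  // Read path caching (this PR): DeleteRange() only invalidates, which
  // is cheap; the first read after an invalidation rebuilds the list.
  void DeleteRangeReadPath(Tombstone t) {
    std::lock_guard<std::mutex> lock(write_mu);
    raw_tombstones.push_back(std::move(t));
    cache = nullptr;
  }

  std::shared_ptr<FragmentedList> Read() {
    if (cache == nullptr) {
      cache = Fragment(raw_tombstones);  // cost shifted onto one reader
    }
    return cache;
  }
};
```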

Test plan:

  • TestGet() in the stress test is updated in #10553 (Update TestGet() to verify against expected state) to compare Get() results against the expected state: ./db_stress_branch --readpercent=57 --prefixpercent=4 --writepercent=25 -delpercent=5 --iterpercent=5 --delrangepercent=4
  • Perf benchmark: tested read and write performance where a memtable has 0, 1, 10, 100, and 1000 range tombstones:
./db_bench --benchmarks=fillrandom,readrandom --writes_per_range_tombstone=200 --max_write_buffer_number=100 --min_write_buffer_number_to_merge=100 --writes=200000 --reads=100000 --disable_auto_compactions --max_num_range_tombstones=1000

Write perf regressed since the cost of constructing the fragmented range tombstone list is shifted from every read to a single write. 6cbe5d8 is included in the last column as a reference to show the performance impact on multi-threaded reads if CoreLocalArray is not used.

micros/op averaged over 5 runs: first 4 columns are for fillrandom, last 4 columns are for readrandom.

| # range tombstones | fillrandom: main | write path caching | read path caching | memtable V3 (#10308) | readrandom: main | write path caching | read path caching | memtable V3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 6.35 | 6.15 | 5.82 | 6.12 | 2.24 | 2.26 | 2.03 | 2.07 |
| 1 | 5.99 | 5.88 | 5.77 | 6.28 | 2.65 | 2.27 | 2.24 | 2.5 |
| 10 | 6.15 | 6.02 | 5.92 | 5.95 | 5.15 | 2.61 | 2.31 | 2.53 |
| 100 | 5.95 | 5.78 | 5.88 | 6.23 | 28.31 | 2.34 | 2.45 | 2.94 |
| 100 (25 threads) | 52.01 | 45.85 | 46.18 | 47.52 | 35.97 | 3.34 | 3.34 | 3.56 |
| 1000 | 6.0 | 7.07 | 5.98 | 6.08 | 333.18 | 2.86 | 2.7 | 3.6 |
| 1000 (25 threads) | 52.6 | 148.86 | 79.06 | 45.52 | 473.49 | 3.66 | 3.48 | 4.38 |
  • Benchmarked readwhilewriting performance from #10552 (Optionally issue DeleteRange in *whilewriting benchmarks), with 100 range tombstones written: ./db_bench --benchmarks=readwhilewriting --writes_per_range_tombstone=500 --max_write_buffer_number=100 --min_write_buffer_number_to_merge=100 --writes=100000 --reads=500000 --disable_auto_compactions --max_num_range_tombstones=10000 --finish_after_writes

readrandom micros/op:

| threads | main | write path caching | read path caching | memtable V3 |
| --- | --- | --- | --- | --- |
| single thread | 48.28 | 1.55 | 1.52 | 1.96 |
| 25 threads | 64.3 | 2.55 | 2.67 | 2.64 |

@cbi42 changed the title from "Cache tombstone read path" to "Cache fragmented range tombstone list for live memtables" Aug 21, 2022
facebook-github-bot (Contributor) commented: @cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

cbi42 (Member, Author) commented Aug 21, 2022

TODO: the stress test change should be a separate PR.

@cbi42 changed the title from "Cache fragmented range tombstone list for live memtables" to "Cache fragmented range tombstone list for mutable memtables" Aug 22, 2022
@cbi42 removed the WIP (work in progress) label Aug 22, 2022
@cbi42 requested a review from ajkr August 23, 2022 03:00
facebook-github-bot pushed a commit that referenced this pull request Aug 23, 2022
Summary:
Optionally issue DeleteRange in `*whilewriting` benchmarks. This happens in `BGWriter` and uses similar logic as in `DoWrite` to issue DeleteRange operations. I added this when I was benchmarking #10547, but this should be an independent PR.

Pull Request resolved: #10552

Test Plan: ran some benchmarks with various delete range options, e.g. `./db_bench --benchmarks=readwhilewriting --writes_per_range_tombstone=100 --writes=200000 --reads=1000000 --disable_auto_compactions --max_num_range_tombstones=10000`

Reviewed By: ajkr

Differential Revision: D38927020

Pulled By: cbi42

fbshipit-source-id: 31ee20cb8127f7173f0816ea0cc2a204ec02aad6

ajkr (Contributor) left a comment:

LGTM, thanks!

Review comments on db/range_tombstone_fragmenter.h and db/memtable.cc were marked resolved.
facebook-github-bot pushed a commit that referenced this pull request Sep 15, 2022
Summary:
Fix a data race introduced in #10547 (P5295241720), first reported by pdillinger. The race is between the `std::atomic_load_explicit` in NewRangeTombstoneIteratorInternal and the `std::atomic_store_explicit` in MemTable::Add(), both operating on `cached_range_tombstone_`. P5295241720 shows that `atomic_store_explicit` initializes a mutex that `atomic_load_explicit` could be trying to `lock()` at the same time. This fix moves the initialization to the memtable constructor.

Pull Request resolved: #10680

Test Plan: `USE_CLANG=1 COMPILE_WITH_TSAN=1 make -j24 whitebox_crash_test`

Reviewed By: ajkr

Differential Revision: D39528696

Pulled By: cbi42

fbshipit-source-id: ee740841044438e18ad2b8ea567444dd542dd8e2
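
For illustration, a hedged sketch of the shape of that fix (`MemtableLike` and `FragmentedList` are hypothetical stand-ins, not the actual MemTable code): constructing every core-local shared_ptr slot in the constructor, before any concurrent reader exists, guarantees that later atomic loads and stores only ever touch fully initialized objects.

```cpp
#include <atomic>
#include <memory>
#include <vector>

struct FragmentedList {};  // stand-in for the cached tombstone list

class MemtableLike {
 public:
  explicit MemtableLike(size_t num_cores) : slots_(num_cores) {
    // The fix: initialize each core-local slot here, in the
    // constructor, rather than lazily on the first DeleteRange()/Add().
    // No other thread can see `this` yet, so plain assignment is safe.
    auto empty = std::make_shared<FragmentedList>();
    for (auto& slot : slots_) {
      slot = empty;
    }
  }

  // Later reads may run concurrently with stores from the write path;
  // both now operate on fully constructed shared_ptr objects.
  std::shared_ptr<FragmentedList> Load(size_t core) const {
    return std::atomic_load_explicit(&slots_[core],
                                     std::memory_order_acquire);
  }

  void Store(size_t core, std::shared_ptr<FragmentedList> fresh) {
    std::atomic_store_explicit(&slots_[core], std::move(fresh),
                               std::memory_order_release);
  }

 private:
  std::vector<std::shared_ptr<FragmentedList>> slots_;
};
```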
ajkr added a commit to ajkr/rocksdb that referenced this pull request Sep 19, 2022