Provide support for subcompactions with user-defined timestamps #10344

akankshamahajan15 commented Jul 12, 2022

Summary: The subcompaction logic currently picks file boundaries as subcompaction boundaries. This is incompatible with user-defined timestamps for two reasons.
Issue 1: ReadOptions.iterate_lower_bound and ReadOptions.iterate_upper_bound contain timestamps, but BlockBasedTableIterator expects bounds without timestamps. The resulting wrong comparison causes the end key to be returned as a user key, which triggers an assertion failure.
Issue 2: Two keys that differ only by user timestamp may be processed by two different subcompactions (and thus two different CompactionIterator state machines), which in turn can cause data correctness issues.

This PR provides support for re-enabling subcompactions with user-defined timestamps.
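
To make the two issues concrete, here is a minimal, self-contained sketch of the general idea: strip the timestamp from each candidate boundary, then route every key by its timestamp-free user key. This is not the code from this PR. The helper names (`StripTimestamp`, `MakeBoundaries`, `SubcompactionIndex`) are hypothetical, and the sketch assumes a fixed-size timestamp suffix and plain bytewise key ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: drop a fixed-size timestamp suffix from a key.
std::string StripTimestamp(const std::string& key_with_ts, std::size_t ts_size) {
  assert(key_with_ts.size() >= ts_size);
  return key_with_ts.substr(0, key_with_ts.size() - ts_size);
}

// Hypothetical helper: turn file-boundary keys (which carry timestamps) into
// sorted, de-duplicated, timestamp-free subcompaction boundaries.
std::vector<std::string> MakeBoundaries(const std::vector<std::string>& file_bounds,
                                        std::size_t ts_size) {
  std::vector<std::string> out;
  out.reserve(file_bounds.size());
  for (const auto& key : file_bounds) {
    out.push_back(StripTimestamp(key, ts_size));
  }
  std::sort(out.begin(), out.end());
  // Two file boundaries that differ only by timestamp collapse into one.
  out.erase(std::unique(out.begin(), out.end()), out.end());
  return out;
}

// Index of the range that owns a key. Range 0 holds keys below boundaries[0];
// range i (i > 0) holds keys in [boundaries[i-1], boundaries[i]).
std::size_t SubcompactionIndex(const std::vector<std::string>& boundaries,
                               const std::string& key_with_ts,
                               std::size_t ts_size) {
  const std::string user_key = StripTimestamp(key_with_ts, ts_size);
  return static_cast<std::size_t>(
      std::upper_bound(boundaries.begin(), boundaries.end(), user_key) -
      boundaries.begin());
}
```

Because both the boundaries and the lookup key are compared without timestamps, all versions of the same user key fall into the same range, which is the property Issue 2 requires.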

Test Plan: Added new unit test

  • Without the fix for Issue 1, the unit test MultipleSubCompactions fails with error:
db_with_timestamp_compaction_test: ./db/compaction/clipping_iterator.h:247: void rocksdb::ClippingIterator::AssertBounds(): Assertion `!valid_ || !end_ || cmp_->Compare(key(), *end_) < 0' failed.
Received signal 6 (Aborted)
#0   /usr/local/fbcode/platform009/lib/libc.so.6(gsignal+0x100) [0x7f8fbbbfe530]
Aborted (core dumped)

Ran stress test
make crash_test_with_ts -j32

akankshamahajan15 force-pushed the user_defined_timestamp branch 2 times, most recently from 082b3b6 to f9c357a, July 27, 2022 20:51
akankshamahajan15 changed the title from "[WIP] Support subcompactions with user-defined timestamps" to "Fix boundaries for subcompactions with user-defined timestamps" Jul 27, 2022
@facebook-github-bot

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

akankshamahajan15 changed the title from "Fix boundaries for subcompactions with user-defined timestamps" to "Provide support for subcompactions with user-defined timestamps" Jul 28, 2022

riversand963 left a comment

Thanks @akankshamahajan15 for the PR! A few minor comments.

It would be good if we can add a comment to ReadOptions::iterate_upper_bound and ReadOptions::iterate_lower_bound saying that they do not include user-defined timestamp.

Review threads (resolved):
db/compaction/compaction_job.cc
db/compaction/compaction_job.cc (outdated)
db/compaction/compaction_job.cc (outdated)
db/compaction/compaction_job.cc (outdated)

riversand963 left a comment

A few more minor comments. Otherwise, LGTM.
I don't think this PR has a risk of perf regression, but you can use the db_bench waitforcompaction benchmark to verify.

Review threads (resolved):
db/db_with_timestamp_compaction_test.cc
db/db_with_timestamp_compaction_test.cc (outdated)
@akankshamahajan15

> A few more minor comments. Otherwise, LGTM. I don't think this PR has a risk of perf regression, but you can use the db_bench waitforcompaction benchmark to verify.

For db_bench, do I need to pass any other arguments?

 ./db_bench -db=/tmp/rocksdb_bench_test  -benchmarks="waitforcompaction"
Set seed to 1659117620245278 because --seed was 0
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
Integrated BlobDB: blob cache disabled
RocksDB:    version 7.6.0
Date:       Fri Jul 29 11:00:20 2022
CPU:        32 * Intel Xeon Processor (Skylake)
CPUCache:   16384 KB
Keys:       16 bytes each (+ 0 bytes user-defined timestamp)
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: Snappy
Compression sampling rate: 0
Memtablerep: SkipListFactory
Perf Level: 1
------------------------------------------------
waitforcompaction(/tmp/rocksdb_bench_test): started
waitforcompaction(/tmp/rocksdb_bench_test): finished
waitforcompaction(/tmp/rocksdb_bench_test): started
waitforcompaction(/tmp/rocksdb_bench_test): finished

It doesn't print the time taken.

Summary: The subcompaction logic currently picks file boundaries as
subcompaction boundaries. This is not compatible with user-defined
timestamps, as ReadOptions.iterate_lower_bound and ReadOptions.iterate_upper_bound
contain timestamps.

Test Plan: Added new unit test

@riversand963

Hmm, how about the "compactall" benchmark?

TEST_TMPDIR=/dev/shm/ ./db_bench -disable_auto_compactions=1 -benchmarks=fillrandom -threads=32
TEST_TMPDIR=/dev/shm/ ./db_bench -disable_auto_compactions=1 -use_existing_db=1 -benchmarks=compactall -subcompactions=3

The compactall benchmark does not report elapsed time either, but you can either add some timing code or simply use the Linux `time` command.
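
As an illustration of the "add some code" option, the sketch below wraps an arbitrary call and returns its wall-clock duration in nanoseconds. It is a generic timing helper, not db_bench code. `ElapsedNanos` is a hypothetical name, and the callable stands in for whatever triggers the compaction.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <utility>

// Hypothetical helper: run `fn` once and return the elapsed time in nanoseconds.
template <typename Fn>
std::uint64_t ElapsedNanos(Fn&& fn) {
  // steady_clock is monotonic, so the interval is not affected by
  // wall-clock adjustments while the work is running.
  const auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(fn)();
  const auto end = std::chrono::steady_clock::now();
  assert(end >= start);
  return static_cast<std::uint64_t>(
      std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count());
}
```

A caller would pass a lambda around the work to be measured and print the returned value. The `time` command needs no code change but also counts process startup and database open.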

akankshamahajan15 commented Jul 31, 2022

db_bench results (time in nanoseconds):
Without changes:

TEST_TMPDIR=/dev/shm/ ./db_bench -disable_auto_compactions=1 -benchmarks=fillrandom -threads=32
TEST_TMPDIR=/dev/shm/ ./db_bench --disable_auto_compactions=1 -use_existing_db=1 -benchmarks=compactall -subcompactions=3
Set seed to 1659290798535829 because --seed was 0
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
Integrated BlobDB: blob cache disabled
RocksDB:    version 7.6.0
Date:       Sun Jul 31 11:06:40 2022
CPU:        32 * Intel Xeon Processor (Skylake)
CPUCache:   16384 KB
Keys:       16 bytes each (+ 0 bytes user-defined timestamp)
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: Snappy
Compression sampling rate: 0
Memtablerep: SkipListFactory
Perf Level: 1
------------------------------------------------
Time taken by CompactAll: 4632565417

With changes:

TEST_TMPDIR=/dev/shm/ ./db_bench -disable_auto_compactions=1 -benchmarks=fillrandom -threads=32
TEST_TMPDIR=/dev/shm/ ./db_bench --disable_auto_compactions=1 -use_existing_db=1 -benchmarks=compactall -subcompactions=3
Set seed to 1659292183163276 because --seed was 0
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
Integrated BlobDB: blob cache disabled
RocksDB:    version 7.6.0
Date:       Sun Jul 31 11:29:43 2022
CPU:        32 * Intel Xeon Processor (Skylake)
CPUCache:   16384 KB
Keys:       16 bytes each (+ 0 bytes user-defined timestamp)
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: Snappy
Compression sampling rate: 0
Memtablerep: SkipListFactory
Perf Level: 1
------------------------------------------------
Time taken by CompactAll: 4685684075
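
For a quick comparison of the two CompactAll timings reported above, the relative difference can be computed as in the sketch below. `RelativeChangePercent` is a hypothetical helper. Each number comes from a single run, so the result is only a rough indication.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: percentage change of `changed_ns` relative to `baseline_ns`.
// A positive result means the changed build took longer.
double RelativeChangePercent(std::uint64_t baseline_ns, std::uint64_t changed_ns) {
  assert(baseline_ns > 0);
  const double base = static_cast<double>(baseline_ns);
  const double changed = static_cast<double>(changed_ns);
  return (changed - base) / base * 100.0;
}
```

Calling `RelativeChangePercent(4632565417ULL, 4685684075ULL)` with the two values above gives a difference of roughly 1.1%.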
