Fix FIFO causing overlapping seqnos in L0 files due to overlapped seqnos between ingested files and memtable's #10777
Conversation
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@hx235 has updated the pull request. You must reimport the pull request before landing.
LGTM! Please mention both bug fixes in HISTORY.md and remove unrelated info from the description
```cpp
// SST files of seqnos overlap with memtables' seqnos. Such SST file can
// exist when it's ingested to L0.
if (prev_file != nullptr &&
    f->fd.largest_seqno <= earliest_memtable_seqno) {
```
Should it check prev_file's largest_seqno?
Update: fixed a corner case where we didn't pass in `earliest_memtable_seqno` in the path of FIFO's `CompactRange()`.
@ajkr - added UT. I noticed the merged PR e267909#diff-d8fb3d50749aa69b378de447e3d9cf2f48abe0281437f010b5d61365a7b813fd. I took a look and don't think it will impact our PR, but I will still stress-run over the weekend before landing, just in case.
```cpp
// And in this case, Flush() will return Status::Corruption() caught by
// `force_consistency_checks=1`
EXPECT_EQ(3, NumTableFilesAtLevel(0));
EXPECT_OK(Flush());
```
Whether this should fail or not without the fix is an open question (see my other complaints about the existence of the smallest_seqno validation). It might be interesting to check that Get(Key(1)) returned the wrong result before the fix (without the smallest_seqno validation) and the correct result after the fix.
oh right - good idea. Will fix this.
Fixed in #10776
```cpp
    "PickCompactionToWarm",
    "CompactRange", "CompactFile"));
```

```cpp
TEST_P(DBCompactionTestFIFOCheckConsistencyWithParam,
```
Nice test case!
Summary:
**Context/Summary:** #10777 mistakenly added a history entry under the 7.8 release, but the PR is not included in 7.8. This mistake happened because rebase-and-merge did not detect a conflict when "## Unreleased" was changed to "## 7.8.0 (10/22/2022)".

Pull Request resolved: #10898
Test Plan: make check
Reviewed By: akankshamahajan15
Differential Revision: D40861001
Pulled By: hx235
fbshipit-source-id: b2310c95490f6ebb90834a210c965a74c9560b51
Revert "Fix FIFO causing overlapping seqnos in L0 files due to overlapped seqnos between ingested files and memtable's (facebook#10777)"

This reverts commit fc74abb.
…rash test (#11063)

Summary:
**Context/Summary:** #10777 was reverted (#10999) due to an internal blocker and replaced with a better fix, #10922. However, the revert also reverted the `Options::CompactionOptionsFIFO::allow_compaction` stress/crash coverage added by that PR. It's useful coverage because setting `Options::CompactionOptionsFIFO::allow_compaction=true` will [increase](https://github.com/facebook/rocksdb/blob/7.8.fb/db/version_set.cc#L3255) the compaction score of L0 files for FIFO and then trigger more FIFO compaction. This speeds up discovery of bugs related to FIFO compaction like #10955. To see the speedup, compare the failure occurrence in the following command with `Options::CompactionOptionsFIFO::allow_compaction=true/false`:

```
--fifo_allow_compaction=1 --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_concurrent_memtable_write=0 --allow_data_in_errors=True --async_io=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=8.869062094789008 --bottommost_compression_type=none --bytes_per_sync=0 --cache_index_and_filter_blocks=1 --cache_size=8388608 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=1 --checkpoint_one_in=1000000 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=3 --compaction_style=2 --compaction_ttl=0 --compression_max_dict_buffer_bytes=8589934591 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=xpress --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --db_write_buffer_size=1048576 --delpercent=4 --delrangepercent=1
--destroy_db_initially=1 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=1 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=1 --file_checksum_impl=xxh64 --flush_one_in=1000000 --format_version=4 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=10 --index_type=2 --ingest_external_file_one_in=100 --initial_auto_readahead_size=16384 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=False --log2_keys_per_lock=10 --long_running_snapshots=0 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=524288 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=25000000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_prefix_bloom_size_ratio=0.01 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=1 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=0 --mock_direct_io=True --nooverwritepercent=1 --num_file_reads_for_auto_readahead=2 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=40000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=7 --prefixpercent=5 --prepopulate_block_cache=0 --preserve_internal_time_seconds=3600 --progress_reports=0 --read_fault_one_in=1000 --readahead_size=0 --readpercent=15 --recycle_log_file_num=1 --reopen=0 --ribbon_starting_level=999 --secondary_cache_fault_one_in=0 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0
--stats_dump_period_sec=0 --subcompactions=2 --sync=0 --sync_fault_injection=0 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=1 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=1 --use_direct_reads=1 --use_full_merge_v1=1 --use_merge=0 --use_multiget=0 --use_put_entity_one_in=0 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=1 --writepercent=65
```

Therefore this PR adds it back to the stress/crash test.

Pull Request resolved: #11063
Test Plan: Rehearsal stress test to make sure the stress/crash test is stable
Reviewed By: ajkr
Differential Revision: D42283650
Pulled By: hx235
fbshipit-source-id: 132e6396ab6e24d8dcb8fe51c62dd5211cdf53ef
Context:
Same as #5958 (comment), but apply the fix to the FIFO compaction case.

Repro:

Summary:
FIFO only does intra-L0 compaction in the following four cases. For the other cases, FIFO drops data instead of compacting it, which is irrelevant to the overlapping-seqno issue we are solving:

1. Total size < `compaction_options_fifo.max_table_files_size` and `compaction_options_fifo.allow_compaction == true`, via `FindIntraL0Compaction` (https://github.com/facebook/rocksdb/pull/5958/files#diff-c261f77d6dd2134333c4a955c311cf4a196a08d3c2bb6ce24fd6801407877c89R56). We also enable `fifo.allow_compaction` in the stress test to surface the overlapping-seqno issue we are fixing here.
2. `compaction_options_fifo.age_for_warm > 0`, where `earliest_mem_seqno` also needs to be passed in. Covering the `age_for_warm` option is worth a separate PR to deal with db_stress compatibility, so we manually tested this path for this PR.
3. `CompactRange()` and 4. `CompactFile()`: since `SanitizeCompactionInputFiles()` will be called before `CompactionPicker::CompactFiles`, we simply replicate the idea in Fix corruption with intra-L0 on ingested files #5958 (comment) in `SanitizeCompactionInputFiles()`. To simplify the implementation, we return `Status::Aborted()` on encountering a seqno-overlapped file when doing compaction to L0, instead of skipping the file and proceeding with the compaction.

Some additional clean-up included in this PR:
- Rename `earliest_memtable_seqno` to `earliest_mem_seqno` for consistent naming
- Pass `earliest_mem_seqno` in related APIs
- Make `earliest_mem_seqno` constant and required

Test:
- `TEST_P(DBCompactionTestFIFOCheckConsistencyWithParam, FlushAfterIntraL0CompactionWithIngestedFile)` corresponding to the above 4 cases, which will fail accordingly without the fix