
Revert PR 10777 "Fix FIFO causing overlapping seqnos in L0 files due to overla…" #10999

Closed

Conversation

@hx235 (Contributor) commented Nov 29, 2022

Context/Summary:

This reverts commit fc74abb and the related HISTORY record.

The issue with PR 10777, and with any approach that relies on earliest_mem_seqno like #5958 (comment), is that the earliest seqno of each CF's memtable always starts at 0 upon Recover(). It is only updated to the correct value when there is unflushed data from the previous DB session or when a new memtable is created. If neither happens, the check sees a false-positive overlapping seqno, and PR 10777 throws something like https://github.com/facebook/rocksdb/blob/main/db/compaction/compaction_picker.cc#L1002-L1004. See Test below for a repro.
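
To illustrate the failure mode, here is a minimal standalone sketch (illustrative names, not the actual RocksDB code) of why an earliest memtable seqno of 0 makes every pre-existing L0 file look like an overlap:

```
// Minimal sketch (illustrative, not actual RocksDB code) of the false
// positive: after Recover() with no unflushed data and no new memtable,
// the memtable's earliest seqno is still its initial value of 0.
#include <cstdint>
#include <iostream>

using SequenceNumber = uint64_t;

// PR 10777's check, simplified: reject compaction inputs whose seqnos are
// not strictly older than the earliest seqno in the memtable.
bool OverlapsWithMemtable(SequenceNumber file_largest_seqno,
                          SequenceNumber earliest_mem_seqno) {
  return file_largest_seqno >= earliest_mem_seqno;
}

int main() {
  // Right after Reopen(): no unflushed data, no new memtable created yet,
  // so earliest_mem_seqno was never advanced past its initial 0.
  SequenceNumber earliest_mem_seqno = 0;

  // Any L0 file written in the previous session has seqnos > 0, so it
  // trivially satisfies `>= 0` and is flagged as overlapping even though
  // the memtable is empty.
  SequenceNumber file_largest_seqno = 4;  // e.g. from four Put()s before reopen
  std::cout << std::boolalpha
            << OverlapsWithMemtable(file_largest_seqno, earliest_mem_seqno)
            << std::endl;  // prints "true": the false positive
  return 0;
}
```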

A quick fix like 3e583a8 might work, but that solution needs more testing.

Luckily, #10922, a more elegant and complete solution to the overlapping-seqno problem these PRs aim to solve, does not have the above problem. It is already being pursued and is in review, so we can simply revert PR 10777 and focus on getting PR 10922 to land.

Test:

  • Repro
```
TEST_F(DBCompactionTest, Repro) {
  Options options = CurrentOptions();
  options.force_consistency_checks = true;
  options.compression = kNoCompression;
  options.compaction_style = CompactionStyle::kCompactionStyleLevel;
  options.disable_auto_compactions = true;

  DestroyAndReopen(options);
  ASSERT_OK(Put("key1", "v"));
  ASSERT_OK(Put("key2", "v"));
  ASSERT_OK(Flush());
  ASSERT_EQ(1, NumTableFilesAtLevel(0));

  ASSERT_OK(Put("key3", "v"));
  ASSERT_OK(Put("key4", "v"));
  ASSERT_OK(Flush());
  ASSERT_EQ(2, NumTableFilesAtLevel(0));

  Reopen(options);

  ColumnFamilyMetaData cf_meta_data;
  db_->GetColumnFamilyMetaData(&cf_meta_data);
  ASSERT_EQ(2, cf_meta_data.levels[0].files.size());
  std::vector<std::string> input_files;
  for (const auto& file : cf_meta_data.levels[0].files) {
    input_files.push_back(file.name);
  }
  // A normal CompactFiles() that should pass, but now fails due to the
  // false positive introduced by PR #10777.
  Status s = db_->CompactFiles(CompactionOptions(), input_files, 0);
  EXPECT_TRUE(s.IsAborted());
  EXPECT_TRUE(s.ToString().find(
                  "has overlapping seqnos with earliest memtable seqnos") !=
              std::string::npos);
}
```
  • make check

…pped seqnos between ingested files and memtable's (facebook#10777)"

This reverts commit fc74abb.
@facebook-github-bot (Contributor) commented

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

hx235 added a commit to hx235/rocksdb that referenced this pull request Nov 29, 2022
…to overla…" (facebook#10999)

Summary:
**Context/Summary:**

This reverts commit fc74abb and the related HISTORY record.

The issue with PR 10777, and with any approach using earliest_mem_seqno like facebook#5958 (comment), is that the earliest seqno of each CF's memtable does not get persisted and always starts at 0 upon Recover(). Later, when creating a new memtable in a certain CF, we use the last seqno of the whole DB (not that CF's last seqno from the previous DB session) for this CF. This leads to false-positive overlapping seqnos, and PR 10777 throws something like https://github.com/facebook/rocksdb/blob/main/db/compaction/compaction_picker.cc#L1002-L1004

Luckily, a more elegant and complete solution to the overlapping-seqno problem these PRs aim to solve does not have the above problem; see facebook#10922. It is already being pursued and is in review, so we can simply revert this PR and focus on getting PR 10922 to land.

Pull Request resolved: facebook#10999

Test Plan: make check

Reviewed By: anand1976

Differential Revision: D41572604

Pulled By: hx235

fbshipit-source-id: 9d9bdf594abd235e2137045cef513ca0b14e0a3a
facebook-github-bot pushed a commit that referenced this pull request Jan 3, 2023
…rash test (#11063)

Summary:
**Context/Summary:**
#10777 was reverted (#10999) due to an internal blocker and replaced with a better fix, #10922. However, the revert also removed the `Options::CompactionOptionsFIFO::allow_compaction` stress/crash test coverage added by that PR.

This coverage is useful because setting `Options::CompactionOptionsFIFO::allow_compaction=true` will [increase](https://github.com/facebook/rocksdb/blob/7.8.fb/db/version_set.cc#L3255) the compaction score of L0 files for FIFO and thus trigger more FIFO compactions, which speeds up discovery of FIFO-compaction bugs like #10955. To see the speedup, compare the failure occurrence of the following command with `Options::CompactionOptionsFIFO::allow_compaction=true/false` (a sketch of the scoring follows the command):

```
--fifo_allow_compaction=1 --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_concurrent_memtable_write=0 --allow_data_in_errors=True --async_io=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=8.869062094789008 --bottommost_compression_type=none --bytes_per_sync=0 --cache_index_and_filter_blocks=1 --cache_size=8388608 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=1 --checkpoint_one_in=1000000 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=3 --compaction_style=2 --compaction_ttl=0 --compression_max_dict_buffer_bytes=8589934591 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=xpress --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --db_write_buffer_size=1048576 --delpercent=4 --delrangepercent=1 --destroy_db_initially=1 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=1 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=1 --file_checksum_impl=xxh64 --flush_one_in=1000000 --format_version=4 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=10 --index_type=2 --ingest_external_file_one_in=100 --initial_auto_readahead_size=16384 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=False --log2_keys_per_lock=10 --long_running_snapshots=0 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=524288 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=25000000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_prefix_bloom_size_ratio=0.01 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=1 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=0 --mock_direct_io=True --nooverwritepercent=1 --num_file_reads_for_auto_readahead=2 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=40000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=7 --prefixpercent=5 --prepopulate_block_cache=0 --preserve_internal_time_seconds=3600 --progress_reports=0 --read_fault_one_in=1000 --readahead_size=0 --readpercent=15 --recycle_log_file_num=1 --reopen=0 --ribbon_starting_level=999 --secondary_cache_fault_one_in=0  --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=0 --subcompactions=2 --sync=0 --sync_fault_injection=0 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=1 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=1 --use_direct_reads=1 --use_full_merge_v1=1 
--use_merge=0 --use_multiget=0 --use_put_entity_one_in=0 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=1 --writepercent=65
```
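
For intuition, here is a simplified standalone sketch of how `allow_compaction` raises FIFO's L0 score. This is an assumption based on the linked version_set.cc logic; names, signatures, and thresholds are illustrative, not the exact RocksDB implementation:

```
// Simplified sketch (assumed from the linked version_set.cc code, not the
// exact RocksDB implementation) of FIFO's L0 compaction score.
#include <algorithm>
#include <cstddef>
#include <iostream>

double FifoL0Score(size_t total_l0_bytes, size_t max_table_files_size,
                   size_t num_l0_files, int level0_file_num_compaction_trigger,
                   bool allow_compaction) {
  // Base score: how close L0 is to its configured size cap.
  double score = static_cast<double>(total_l0_bytes) / max_table_files_size;
  if (allow_compaction) {
    // With allow_compaction=true, the file count also feeds the score, so
    // many small files make the level eligible long before the size cap.
    score = std::max(score,
                     static_cast<double>(num_l0_files) /
                         level0_file_num_compaction_trigger);
  }
  return score;  // a score >= 1 makes L0 eligible for compaction
}

int main() {
  // 8 small files far below a 1 GiB size cap, with a 4-file trigger:
  std::cout << FifoL0Score(1 << 20, 1 << 30, 8, 4, false) << "\n";  // ~0.001
  std::cout << FifoL0Score(1 << 20, 1 << 30, 8, 4, true) << "\n";   // 2
  return 0;
}
```

Under this scoring, `allow_compaction=false` leaves the score driven only by total size, so short crash-test runs with small files rarely schedule a FIFO compaction at all.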

Therefore, this PR adds it back to the stress/crash test.

Pull Request resolved: #11063

Test Plan: Rehearsal stress test to make sure stress/crash test is stable

Reviewed By: ajkr

Differential Revision: D42283650

Pulled By: hx235

fbshipit-source-id: 132e6396ab6e24d8dcb8fe51c62dd5211cdf53ef