Remove false-postive assertion in FaultInjectionTestFS::RenameFile#12828
Remove false-postive assertion in FaultInjectionTestFS::RenameFile#12828hx235 wants to merge 1 commit into
FaultInjectionTestFS::RenameFile#12828Conversation
|
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
| fault_fs_guard->DropRandomUnsyncedFileData(&rand); | ||
| } | ||
| } | ||
| fault_fs_guard->ResetState(); |
There was a problem hiding this comment.
This would reset the state of other files created during DB open, including the data written to WAL that are not synced. This may not work when reopen > 0, see the TODO above.
There was a problem hiding this comment.
This will be an interesting discussion and I'm happy to discuss more:
1 I think the TODO can be removed now since we sync WAL before reopen when FLAGS_reopen > 0
rocksdb/db_stress_tool/db_stress_test_base.cc
Lines 3771 to 3790 in 1f589a3
2 If we look at ResetState(), it includes dir_to_new_files_since_last_sync_ and db_file_state_ (on a side note, I also think it should include open_managed_files_ so if agreed, I can add that).
db_file_state_ is used for (a) reading un-synced data since the recent 9f4c597 and (b) knowing what unsync data to be manually dropped with functions DropXXX()
dir_to_new_files_since_last_sync_ is only used for knowing what files to be manually deleted DeleteFilesCreatedAfterLastDirSync()
They are internal states in our testing infra.
3 Because we shared one FS object across different DB opens, these two internal states from last Open can affect the next Open. A problematic way is described in the PR.
I propose to make these internal states stay within each Open so we don't have such affect. Also, I wonder if we should reset the FS object every time we open.... though it's a bit of bigger change so will keep it as follow-up later.
In terms of side effect of such ResetState(), since we manually drop data before we reset the status, the manual deletion purpose of dir_to_new_files_since_last_sync_ and db_file_state_ can still be served
For reading un-synced data, do we expect to read unsync data from last Open in the current Open?? It looks like we does now by coincident but I think it's not expected cc @pdillinger to confirm. If it's not expected, then ResetState() across Open will not defeat serving the purpose of reading un-synced data.
There was a problem hiding this comment.
-
Yeah, the TODO and the parameter
reopencan be removed (it's actually replaced by your TODO). -
do we expect to read unsync data from last Open in the current Open
I think this is expected when unsynced data is not dropped.
If we do ResetState() after each DB close, then we are essentially dropping unsynced data after each DB close (it probably does not matter much here since you pointed out that we do WAL sync before doing DB reopen).
There was a problem hiding this comment.
(on a side note, I also think it should include open_managed_files_ so if agreed, I can add that).
the TODO and the parameter reopen can be removed (it's actually replaced by your TODO).
Fixed
|
@hx235 has updated the pull request. You must reimport the pull request before landing. |
|
@hx235 has updated the pull request. You must reimport the pull request before landing. |
|
@hx235 has updated the pull request. You must reimport the pull request before landing. |
|
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
@hx235 has updated the pull request. You must reimport the pull request before landing. |
|
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
Suggestion from @cbi42 also includes removing the assertion involved. Thinking about it |
|
@hx235 has updated the pull request. You must reimport the pull request before landing. |
FaultInjectionTestFS::RenameFile
|
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
…acebook#12828) Summary: **Context/Summary:** The assertion `tlist.find(tdn.second) == tlist.end()` https://github.com/facebook/rocksdb/blame/9eebaf11cbd875435b572f05f0378ecdb761cc74/utilities/fault_injection_fs.cc#L1003 can catch us false positive. Some context (1) When fault injection is enabled and db open fails because of that, crash test will retry open without injected error in order to proceed with a clean open: https://github.com/facebook/rocksdb/blob/9eebaf11cbd875435b572f05f0378ecdb761cc74/db_stress_tool/db_stress_test_base.cc#L3559 https://github.com/facebook/rocksdb/blob/9eebaf11cbd875435b572f05f0378ecdb761cc74/db_stress_tool/db_stress_test_base.cc#L3586-L3639 (2) a. `FaultInjectionTestFS::dir_to_new_files_since_last_sync` records files that are created but not yet synced. b. When we create CURRENT, we will first create a temp file and rename it as "CURRENT". As part of the renaming, we will [assert](https://github.com/facebook/rocksdb/blame/9eebaf11cbd875435b572f05f0378ecdb761cc74/utilities/fault_injection_fs.cc#L1003) `FaultInjectionTestFS::dir_to_new_files_since_last_sync ` doesn't already have a file named `CURRENT`. Suppose the following sequence of events happened: (1) 1st open, with metadata write error 1. As part of creating CURRENT file, added "CURRENT" to `FaultInjectionTestFS::dir_to_new_files_since_last_sync_` https://github.com/facebook/rocksdb/blob/9eebaf11cbd875435b572f05f0378ecdb761cc74/utilities/fault_injection_fs.cc#L735 2. `SyncDir()` here https://github.com/facebook/rocksdb/blob/9eebaf11cbd875435b572f05f0378ecdb761cc74/file/filename.cc#L412 failed with injected metadata write error. Therefore, "CURRENT" file didn't get removed from `FaultInjectionTestFS::dir_to_new_files_since_last_sync_` as it would if `SyncDir()` succeeded https://github.com/facebook/rocksdb/blob/9eebaf11cbd875435b572f05f0378ecdb761cc74/utilities/fault_injection_fs.h#L344 (2) 2st open 1. Attempted to create a CURRENT file and failed during renaming since `FaultInjectionTestFS::dir_to_new_files_since_last_sync_` already had a file called CURRENT. So will fail ``` assertion failed - tlist.find(tdn.second) == tlist.end() ``` This PR fixed this by removing the assertion. It used to catch us some missing sync of some directory (e.,g facebook#10573) so we will keep thinking about a better way to catch that. Pull Request resolved: facebook#12828 Test Plan: Command constantly failed before the fix but passed after the PR running for 10 minutes ``` python3 tools/db_crashtest.py --simple blackbox --interval=10 --WAL_size_limit_MB=1 --WAL_ttl_seconds=60 --acquire_snapshot_one_in=100 --adaptive_readahead=1 --adm_policy=2 --advise_random_on_open=1 --allow_concurrent_memtable_write=1 --allow_data_in_errors=True --allow_fallocate=1 --async_io=0 --auto_readahead_size=1 --avoid_flush_during_recovery=0 --avoid_flush_during_shutdown=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=0 --bgerror_resume_retry_interval=100 --block_align=0 --block_protection_bytes_per_key=8 --block_size=16384 --bloom_before_level=1 --bloom_bits=10 --bottommost_compression_type=lz4hc --bottommost_file_compaction_delay=86400 --bytes_per_sync=0 --cache_index_and_filter_blocks=1 --cache_index_and_filter_blocks_with_high_priority=0 --cache_size=8388608 --cache_type=tiered_auto_hyper_clock_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=0 --charge_filter_construction=0 --charge_table_reader=0 --check_multiget_consistency=0 --check_multiget_entity_consistency=0 --checkpoint_one_in=10000 --checksum_type=kCRC32c --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000 --compact_range_one_in=1000000 --compaction_pri=3 --compaction_readahead_size=0 --compaction_ttl=1 --compress_format_version=1 --compressed_secondary_cache_ratio=0.5 --compressed_secondary_cache_size=0 --compression_checksum=0 --compression_max_dict_buffer_bytes=15 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=zstd --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=65536 --continuous_verification_interval=0 --daily_offpeak_time_utc= --data_block_index_type=1 --db_write_buffer_size=0 --default_temperature=kHot --default_write_temperature=kUnknown --delete_obsolete_files_period_micros=30000000 --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --detect_filter_construct_corruption=1 --disable_file_deletions_one_in=10000 --disable_manual_compaction_one_in=10000 --disable_wal=0 --dump_malloc_stats=0 --enable_checksum_handoff=1 --enable_compaction_filter=0 --enable_custom_split_merge=0 --enable_do_not_compress_roles=0 --enable_index_compression=1 --enable_memtable_insert_with_hint_prefix_extractor=0 --enable_pipelined_write=0 --enable_sst_partitioner_factory=1 --enable_thread_tracking=1 --enable_write_thread_adaptive_yield=0 --error_recovery_with_no_fault_injection=1 --exclude_wal_from_write_fault_injection=1 --fail_if_options_file_error=1 --fifo_allow_compaction=0 --file_checksum_impl=crc32c --fill_cache=1 --flush_one_in=1000000 --format_version=3 --get_all_column_family_metadata_one_in=1000000 --get_current_wal_file_one_in=0 --get_live_files_apis_one_in=1000000 --get_properties_of_all_tables_one_in=100000 --get_property_one_in=100000 --get_sorted_wal_files_one_in=0 --hard_pending_compaction_bytes_limit=2097152 --high_pri_pool_ratio=0 --index_block_restart_interval=2 --index_shortening=0 --index_type=2 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --inplace_update_support=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --key_may_exist_one_in=100000 --last_level_temperature=kWarm --level_compaction_dynamic_level_bytes=0 --lock_wal_one_in=10000 --log_file_time_to_roll=60 --log_readahead_size=16777216 --long_running_snapshots=1 --low_pri_pool_ratio=0.5 --lowest_used_cache_tier=1 --manifest_preallocation_size=0 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=16384 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=1000000 --max_key_len=3 --max_log_file_size=0 --max_manifest_file_size=1073741824 --max_sequential_skip_in_iterations=1 --max_total_wal_size=0 --max_write_batch_group_size_bytes=16 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=2097152 --memtable_insert_hint_per_batch=0 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.1 --memtable_protection_bytes_per_key=8 --memtable_whole_key_filtering=0 --memtablerep=skip_list --metadata_charge_policy=1 --metadata_read_fault_one_in=32 --metadata_write_fault_one_in=0 --min_write_buffer_number_to_merge=2 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=-1 --open_metadata_read_fault_one_in=0 --open_metadata_write_fault_one_in=8 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=100000000 --optimize_filters_for_hits=0 --optimize_filters_for_memory=0 --optimize_multiget_for_io=1 --paranoid_file_checks=1 --partition_filters=1 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=1000 --prefix_size=5 --prefixpercent=5 --prepopulate_block_cache=1 --preserve_internal_time_seconds=0 --progress_reports=0 --promote_l0_one_in=0 --read_amp_bytes_per_bit=32 --read_fault_one_in=0 --readahead_size=524288 --readpercent=45 --recycle_log_file_num=0 --reopen=0 --report_bg_io_stats=0 --reset_stats_one_in=1000000 --sample_for_compression=0 --secondary_cache_fault_one_in=32 --secondary_cache_uri= --set_options_one_in=0 --skip_stats_update_on_db_open=0 --snapshot_hold_ops=100000 --soft_pending_compaction_bytes_limit=68719476736 --sqfc_name=foo --sqfc_version=1 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=10 --stats_history_buffer_size=1048576 --strict_bytes_per_sync=1 --subcompactions=2 --sync=0 --sync_fault_injection=1 --table_cache_numshardbits=6 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=2 --uncache_aggressiveness=1582 --universal_max_read_amp=4 --unpartitioned_pinning=0 --use_adaptive_mutex=0 --use_adaptive_mutex_lru=1 --use_attribute_group=1 --use_delta_encoding=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=0 --use_multi_cf_iterator=1 --use_multi_get_entity=1 --use_multiget=0 --use_put_entity_one_in=1 --use_sqfc_for_range_queries=1 --use_timed_put_one_in=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000 --verify_compression=1 --verify_db_one_in=10000 --verify_file_checksums_one_in=1000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=1 --write_fault_one_in=8 --writepercent=35 ``` Reviewed By: cbi42 Differential Revision: D59241548 Pulled By: hx235 fbshipit-source-id: 5bb49e6a94943273f47578a2caf3d08ca5b67e5f
Context/Summary:
The assertion
tlist.find(tdn.second) == tlist.end()https://github.com/facebook/rocksdb/blame/9eebaf11cbd875435b572f05f0378ecdb761cc74/utilities/fault_injection_fs.cc#L1003 can catch us false positive.Some context
(1) When fault injection is enabled and db open fails because of that, crash test will retry open without injected error in order to proceed with a clean open:
rocksdb/db_stress_tool/db_stress_test_base.cc
Line 3559 in 9eebaf1
rocksdb/db_stress_tool/db_stress_test_base.cc
Lines 3586 to 3639 in 9eebaf1
(2)
a.
FaultInjectionTestFS::dir_to_new_files_since_last_syncrecords files that are created but not yet synced.b. When we create CURRENT, we will first create a temp file and rename it as "CURRENT". As part of the renaming, we will assert
FaultInjectionTestFS::dir_to_new_files_since_last_syncdoesn't already have a file namedCURRENT.Suppose the following sequence of events happened:
(1) 1st open, with metadata write error
FaultInjectionTestFS::dir_to_new_files_since_last_sync_rocksdb/utilities/fault_injection_fs.cc
Line 735 in 9eebaf1
SyncDir()hererocksdb/file/filename.cc
Line 412 in 9eebaf1
FaultInjectionTestFS::dir_to_new_files_since_last_sync_as it would ifSyncDir()succeededrocksdb/utilities/fault_injection_fs.h
Line 344 in 9eebaf1
(2) 2st open
FaultInjectionTestFS::dir_to_new_files_since_last_sync_already had a file called CURRENT. So will failThis PR fixed this by removing the assertion. It used to catch us some missing sync of some directory (e.,g #10573) so we will keep thinking about a better way to catch that.
Test:
Command constantly failed before the fix but passed after the PR running for 10 minutes