-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Protect existing files in FaultInjectionTest{Env,FS}::ReopenWritableFile()
#8995
Conversation
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@ajkr has updated the pull request. You must reimport the pull request before landing. |
FaultInjectionTest{Env,FS}::ReopenWritableFile()
FaultInjectionTest{Env,FS}::ReopenWritableFile()
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the fixes @ajkr ! LGTM, I just have a couple of questions
# Setting `nooverwritepercent > 0` is only possible because we do not vary | ||
# the random seed, so the same keys are chosen by every run for disallowing | ||
# overwrites. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to mention this in the db_stress
help as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, not noticing this took me on a long detour.
utilities/fault_injection_env.cc
Outdated
} else if (!exists) { | ||
// It was created by this `Env` just now. | ||
should_track = true; | ||
open_files_.insert(fname); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we would want to add the file to open_files_
even if should_track
is false
? (The same question applies to FaultInjectionTestFS
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, when we don't use TestWritableFile
wrapper, open_files_
won't be maintained:
rocksdb/utilities/fault_injection_env.cc
Line 401 in 2e09a54
open_files_.erase(state.filename_); |
From the perspective of this Env
it doesn't need to know an unmanaged file is open, I guess? Maybe we should rename it open_managed_files_
.
utilities/fault_injection_fs.cc
Outdated
// We preserve contents of overwritten files up to a size threshold. | ||
// We could keep previous file in another name, but we need to worry about | ||
// garbage collect the those files. We do it if it is needed later. | ||
// We ignore I/O errors here for simplicity. | ||
std::string previous_contents = kNewFileNoOverwrite; | ||
if (target()->FileExists(t, IOOptions(), nullptr).ok()) { | ||
uint64_t file_size; | ||
if (target()->GetFileSize(t, IOOptions(), &file_size, nullptr).ok() && | ||
file_size < 1024) { | ||
ReadFileToString(target(), t, &previous_contents).PermitUncheckedError(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, do we force-create the link when there is already a file with the new name? (If my understanding is correct, that's the scenario when we would potentially need the previous contents.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, seems our LinkFile()
doesn't actually overwrite. I will just use kNewFileNoOverwrite
instead. We could also carry over the value of dir_to_new_files_since_last_sync_[sdn.first][sdn.second]
but I find that hard to think about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
utilities/fault_injection_fs.cc
Outdated
// We preserve contents of overwritten files up to a size threshold. | ||
// We could keep previous file in another name, but we need to worry about | ||
// garbage collect the those files. We do it if it is needed later. | ||
// We ignore I/O errors here for simplicity. | ||
std::string previous_contents = kNewFileNoOverwrite; | ||
if (target()->FileExists(t, IOOptions(), nullptr).ok()) { | ||
uint64_t file_size; | ||
if (target()->GetFileSize(t, IOOptions(), &file_size, nullptr).ok() && | ||
file_size < 1024) { | ||
ReadFileToString(target(), t, &previous_contents).PermitUncheckedError(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, seems our LinkFile()
doesn't actually overwrite. I will just use kNewFileNoOverwrite
instead. We could also carry over the value of dir_to_new_files_since_last_sync_[sdn.first][sdn.second]
but I find that hard to think about.
# Setting `nooverwritepercent > 0` is only possible because we do not vary | ||
# the random seed, so the same keys are chosen by every run for disallowing | ||
# overwrites. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, not noticing this took me on a long detour.
utilities/fault_injection_env.cc
Outdated
} else if (!exists) { | ||
// It was created by this `Env` just now. | ||
should_track = true; | ||
open_files_.insert(fname); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, when we don't use TestWritableFile
wrapper, open_files_
won't be maintained:
rocksdb/utilities/fault_injection_env.cc
Line 401 in 2e09a54
open_files_.erase(state.filename_); |
From the perspective of this Env
it doesn't need to know an unmanaged file is open, I guess? Maybe we should rename it open_managed_files_
.
@ajkr has updated the pull request. You must reimport the pull request before landing. |
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
…File()` (facebook#8995) Summary: `FaultInjectionTest{Env,FS}::ReopenWritableFile()` functions were accidentally deleting WALs from previous `db_stress` runs causing verification to fail. They were operating under the assumption that `ReopenWritableFile()` would delete any existing file. It was a reasonable assumption considering the `{Env,FileSystem}::ReopenWritableFile()` documentation stated that would happen. The only problem was neither the implementations we offer nor the "real" clients in RocksDB code followed that contract. So, this PR updates the contract as well as fixing the fault injection client usage. The fault injection change exposed that `ExternalSSTFileBasicTest.SyncFailure` was relying on a fault injection `Env` dropping unsynced data written by a regular `Env`. I changed that test to make its `SstFileWriter` use fault injection `Env`, and also implemented `LinkFile()` in fault injection so the unsynced data is tracked under the new name. Pull Request resolved: facebook#8995 Test Plan: - Verified it fixes the following failure: ``` $ ./db_stress --clear_column_family_one_in=0 --column_families=1 --db=/dev/shm/rocksdb_crashtest_whitebox --delpercent=5 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --iterpercent=0 --key_len_percent_dist=1,30,69 --max_key=100000 --max_key_len=3 --nooverwritepercent=1 --ops_per_thread=1000 --prefixpercent=0 --readpercent=60 --reopen=0 --target_file_size_base=1048576 --test_batches_snapshots=0 --write_buffer_size=1048576 --writepercent=35 --value_size_mult=33 -threads=1 ... $ ./db_stress --avoid_flush_during_recovery=1 --clear_column_family_one_in=0 --column_families=1 --db=/dev/shm/rocksdb_crashtest_whitebox --delpercent=5 --destroy_db_initially=0 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --iterpercent=10 --key_len_percent_dist=1,30,69 --max_bytes_for_level_base=4194304 --max_key=100000 --max_key_len=3 --nooverwritepercent=1 --open_files=-1 --open_metadata_write_fault_one_in=8 --open_write_fault_one_in=16 --ops_per_thread=1000 --prefix_size=-1 --prefixpercent=0 --readpercent=50 --sync=1 --target_file_size_base=1048576 --test_batches_snapshots=0 --write_buffer_size=1048576 --writepercent=35 --value_size_mult=33 -threads=1 ... Verification failed for column family 0 key 000000000000001300000000000000857878787878 (1143): Value not found: NotFound: Crash-recovery verification failed :( ... ``` - `make check -j48` Reviewed By: ltamasi Differential Revision: D31495388 Pulled By: ajkr fbshipit-source-id: 7886ccb6a07cb8b78ad7b6c1c341ccf40bb68385
…File()` (#8995) Summary: `FaultInjectionTest{Env,FS}::ReopenWritableFile()` functions were accidentally deleting WALs from previous `db_stress` runs causing verification to fail. They were operating under the assumption that `ReopenWritableFile()` would delete any existing file. It was a reasonable assumption considering the `{Env,FileSystem}::ReopenWritableFile()` documentation stated that would happen. The only problem was neither the implementations we offer nor the "real" clients in RocksDB code followed that contract. So, this PR updates the contract as well as fixing the fault injection client usage. The fault injection change exposed that `ExternalSSTFileBasicTest.SyncFailure` was relying on a fault injection `Env` dropping unsynced data written by a regular `Env`. I changed that test to make its `SstFileWriter` use fault injection `Env`, and also implemented `LinkFile()` in fault injection so the unsynced data is tracked under the new name. Pull Request resolved: facebook/rocksdb#8995 Test Plan: - Verified it fixes the following failure: ``` $ ./db_stress --clear_column_family_one_in=0 --column_families=1 --db=/dev/shm/rocksdb_crashtest_whitebox --delpercent=5 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --iterpercent=0 --key_len_percent_dist=1,30,69 --max_key=100000 --max_key_len=3 --nooverwritepercent=1 --ops_per_thread=1000 --prefixpercent=0 --readpercent=60 --reopen=0 --target_file_size_base=1048576 --test_batches_snapshots=0 --write_buffer_size=1048576 --writepercent=35 --value_size_mult=33 -threads=1 ... $ ./db_stress --avoid_flush_during_recovery=1 --clear_column_family_one_in=0 --column_families=1 --db=/dev/shm/rocksdb_crashtest_whitebox --delpercent=5 --destroy_db_initially=0 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --iterpercent=10 --key_len_percent_dist=1,30,69 --max_bytes_for_level_base=4194304 --max_key=100000 --max_key_len=3 --nooverwritepercent=1 --open_files=-1 --open_metadata_write_fault_one_in=8 --open_write_fault_one_in=16 --ops_per_thread=1000 --prefix_size=-1 --prefixpercent=0 --readpercent=50 --sync=1 --target_file_size_base=1048576 --test_batches_snapshots=0 --write_buffer_size=1048576 --writepercent=35 --value_size_mult=33 -threads=1 ... Verification failed for column family 0 key 000000000000001300000000000000857878787878 (1143): Value not found: NotFound: Crash-recovery verification failed :( ... ``` - `make check -j48` Reviewed By: ltamasi Differential Revision: D31495388 Pulled By: ajkr fbshipit-source-id: 7886ccb6a07cb8b78ad7b6c1c341ccf40bb68385
FaultInjectionTest{Env,FS}::ReopenWritableFile()
functions were accidentally deleting WALs from previousdb_stress
runs causing verification to fail. They were operating under the assumption thatReopenWritableFile()
would delete any existing file. It was a reasonable assumption considering the{Env,FileSystem}::ReopenWritableFile()
documentation stated that would happen. The only problem was neither the implementations we offer nor the "real" clients in RocksDB code followed that contract. So, this PR updates the contract as well as fixing the fault injection client usage.The fault injection change exposed that
ExternalSSTFileBasicTest.SyncFailure
was relying on a fault injectionEnv
dropping unsynced data written by a regularEnv
. I changed that test to make itsSstFileWriter
use fault injectionEnv
, and also implementedLinkFile()
in fault injection so the unsynced data is tracked under the new name.Test Plan:
make check -j48