-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce FaultInjectionTestFS to test fault File system instead of Env #6414
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Do we still need error_handler_test.cc? It seems redundant to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a migration of FaultInjectionTestEnv
to the new FileSystem
interface? If so can we delete the old Env
-based implementation?
test_util/fault_injection_test_fs.cc
Outdated
} | ||
|
||
// A basic file truncation function suitable for this test. | ||
IOStatus TestFSTruncate(FileSystem* fs, const std::string& filename, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is following the existing pattern for dropping unsynced data. But I never understood what's the reason to write unsynced data to a file rather than buffer it in process memory. The latter avoids confusion of persisting data and un-persisting it later.
It would also make the fault injection FileSystem
useful for simulating power loss crash-recovery. db_crashtest.py
works by repeatedly running and killing db_stress
processes and verifying correctness. If the db_stress
buffers its unsynced data in process memory, killing the process yields the same result (lost unsynced data) as if we had crashed the whole machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr Thanks for the comments. I'm a little confused. So here, each file has a state_ and we use pos_at_last_sync_ to remember the position of last sync. Append can happen after the previous sync (post_). If DropUnsyncedData is called, we need to preserve the synced data. So it read out the data from 0 to pos_at_last_sync_ and write to a new file, rename it to the original one. So the file only contains the synced data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree it works for preserving only the synced data after a call to DropUnsyncedData()
. I just think buffering Append()
ed data in process memory and waiting until Sync()
to write it to a file would be more straightforward, and would make this FileSystem
useful for simulating power loss failure in our existing process crash-recovery tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr Oh I see. Correct me if my understanding is wrong. You mean, when people call TestFSWritableFile->Append, instead of directly calling target_->append, we can create a buffer (e.g., attached in file state) to temporally hold the data. Only when TestFSWritableFile->Sync is called, we really call target_->append to really write it to a file (similar as what WritableFileWrite did).
I think the using a write buffer might make the process a little bit complex. The buffer size is preallocated. If the data being appended is over the buffer size, we need to do real target_->append, or create a new buffer. It increase the complexity. If it is the former case, data in the file is more than we expected (append is called earlier than a real sync). If we continues increase the buffer size, managing the memory is not so easy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this once before btw: https://github.com/ajkr/cockroach/blob/1c540424696a73adcafd5a773d220fbafb68bcbf/c-deps/libroach/rocksdbutils/env_sync_fault_injection.cc#L60-L104. IIRC the locking was needed when DBOptions::manual_wal_flush = true
as FlushWAL(true /*sync*/)
calls may be concurrent with Append()
. That has extra logic for simulating failed fsyncs (calling exit()
and corrupting the buffered data) that can be ignored for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the using a write buffer might make the process a little bit complex. The buffer size is preallocated. If the data being appended is over the buffer size, we need to do real target_->append, or create a new buffer. It increase the complexity. If it is the former case, data in the file is more than we expected (append is called earlier than a real sync). If we continues increase the buffer size, managing the memory is not so easy.
I'd just use std::string
and buffer everything until Sync()
is called; I don't think this will be used in cases where one file's unsynced data will cause OOM. Maybe in the future if it makes its way into db_stress
and we run with weird parameters like huge files. But IMO that'd be a sign we're getting good use out of this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr Thanks for the reference! I think I can follow your logic to make the change. In this way, DropUnsyncedData function can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the using a write buffer might make the process a little bit complex. The buffer size is preallocated. If the data being appended is over the buffer size, we need to do real target_->append, or create a new buffer. It increase the complexity. If it is the former case, data in the file is more than we expected (append is called earlier than a real sync). If we continues increase the buffer size, managing the memory is not so easy.
I'd just use
std::string
and buffer everything untilSync()
is called; I don't think this will be used in cases where one file's unsynced data will cause OOM. Maybe in the future if it makes its way intodb_stress
and we run with weird parameters like huge files. But IMO that'd be a sign we're getting good use out of this feature.
Yeah. I forget this is just a testing, using std::string is enough. It will handle memory allocation and copy during append.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I forget this is just a testing, using std::string is enough. It will handle memory allocation and copy during append.
You have a point though that the existing solution is more scalable than my suggestion. I didn't think of that before.
@ajkr Thanks for the reference! I think I can follow your logic to make the change. In this way, DropUnsyncedData function can be removed.
Oh interesting, I was initially thinking it'd be implemented by clearing the string
. But I haven't looked at the test case so am not sure; maybe deleting it entirely is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting, I was initially thinking it'd be implemented by clearing the
string
. But I haven't looked at the test case so am not sure; maybe deleting it entirely is fine.
I mean, remove the current read->rewrite->rename logic. Just clean the write buffer. Also, remove the pos_at_last_sync_.
Since we want to replace storage related from Env to FileSystem, so I think we can remove error_handler_test.cc. |
08831ed
to
51ff83c
Compare
a77f657
to
ddb9e78
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks for the review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@zhichao-cao merged this pull request in e62fe50. |
…nv (#6414) Summary: In the current code base, we can use FaultInjectionTestEnv to simulate the env issue such as file write/read errors, which are used in most of the test. The PR facebook/rocksdb#5761 introduce the File System as a new Env API. This PR implement the FaultInjectionTestFS, which can be used to simulate when File System has issues such as IO error. user can specify any IOStatus error as input, such that FS corresponding actions will return certain error to the caller. A set of ErrorHandlerFSTests are introduced for testing Pull Request resolved: facebook/rocksdb#6414 Test Plan: pass make asan_check, pass error_handler_fs_test. Differential Revision: D20252421 Pulled By: zhichao-cao fbshipit-source-id: e922038f8ce7e6d1da329fd0bba7283c4b779a21 Signed-off-by: Changlong Chen <levisonchen@live.cn>
…nv (#6414) Summary: In the current code base, we can use FaultInjectionTestEnv to simulate the env issue such as file write/read errors, which are used in most of the test. The PR facebook/rocksdb#5761 introduce the File System as a new Env API. This PR implement the FaultInjectionTestFS, which can be used to simulate when File System has issues such as IO error. user can specify any IOStatus error as input, such that FS corresponding actions will return certain error to the caller. A set of ErrorHandlerFSTests are introduced for testing Pull Request resolved: facebook/rocksdb#6414 Test Plan: pass make asan_check, pass error_handler_fs_test. Differential Revision: D20252421 Pulled By: zhichao-cao fbshipit-source-id: e922038f8ce7e6d1da329fd0bba7283c4b779a21 Signed-off-by: Changlong Chen <levisonchen@live.cn>
…nv (#6414) Summary: In the current code base, we can use FaultInjectionTestEnv to simulate the env issue such as file write/read errors, which are used in most of the test. The PR facebook/rocksdb#5761 introduce the File System as a new Env API. This PR implement the FaultInjectionTestFS, which can be used to simulate when File System has issues such as IO error. user can specify any IOStatus error as input, such that FS corresponding actions will return certain error to the caller. A set of ErrorHandlerFSTests are introduced for testing Pull Request resolved: facebook/rocksdb#6414 Test Plan: pass make asan_check, pass error_handler_fs_test. Differential Revision: D20252421 Pulled By: zhichao-cao fbshipit-source-id: e922038f8ce7e6d1da329fd0bba7283c4b779a21 Signed-off-by: Changlong Chen <levisonchen@live.cn>
…nv (#6414) Summary: In the current code base, we can use FaultInjectionTestEnv to simulate the env issue such as file write/read errors, which are used in most of the test. The PR facebook/rocksdb#5761 introduce the File System as a new Env API. This PR implement the FaultInjectionTestFS, which can be used to simulate when File System has issues such as IO error. user can specify any IOStatus error as input, such that FS corresponding actions will return certain error to the caller. A set of ErrorHandlerFSTests are introduced for testing Pull Request resolved: facebook/rocksdb#6414 Test Plan: pass make asan_check, pass error_handler_fs_test. Differential Revision: D20252421 Pulled By: zhichao-cao fbshipit-source-id: e922038f8ce7e6d1da329fd0bba7283c4b779a21 Signed-off-by: Changlong Chen <levisonchen@live.cn>
…nv (#6414) Summary: In the current code base, we can use FaultInjectionTestEnv to simulate the env issue such as file write/read errors, which are used in most of the test. The PR facebook/rocksdb#5761 introduce the File System as a new Env API. This PR implement the FaultInjectionTestFS, which can be used to simulate when File System has issues such as IO error. user can specify any IOStatus error as input, such that FS corresponding actions will return certain error to the caller. A set of ErrorHandlerFSTests are introduced for testing Pull Request resolved: facebook/rocksdb#6414 Test Plan: pass make asan_check, pass error_handler_fs_test. Differential Revision: D20252421 Pulled By: zhichao-cao fbshipit-source-id: e922038f8ce7e6d1da329fd0bba7283c4b779a21 Signed-off-by: Changlong Chen <levisonchen@live.cn>
In the current code base, we can use FaultInjectionTestEnv to simulate the env issue such as file write/read errors, which are used in most of the test. The PR #5761 introduce the File System as a new Env API. This PR implement the FaultInjectionTestFS, which can be used to simulate when File System has issues such as IO error. user can specify any IOStatus error as input, such that FS corresponding actions will return certain error to the caller.
A set of ErrorHandlerFSTests are introduced for testing
Test plan: pass make asan_check, pass error_handler_fs_test.