Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce FaultInjectionTestFS to test fault File system instead of Env #6414

Closed
wants to merge 9 commits into from

Conversation

zhichao-cao
Copy link
Contributor

In the current code base, we can use FaultInjectionTestEnv to simulate the env issue such as file write/read errors, which are used in most of the test. The PR #5761 introduce the File System as a new Env API. This PR implement the FaultInjectionTestFS, which can be used to simulate when File System has issues such as IO error. user can specify any IOStatus error as input, such that FS corresponding actions will return certain error to the caller.

A set of ErrorHandlerFSTests are introduced for testing

Test plan: pass make asan_check, pass error_handler_fs_test.

Copy link
Contributor

@anand1976 anand1976 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Do we still need error_handler_test.cc? It seems redundant to me.

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a migration of FaultInjectionTestEnv to the new FileSystem interface? If so can we delete the old Env-based implementation?

}

// A basic file truncation function suitable for this test.
IOStatus TestFSTruncate(FileSystem* fs, const std::string& filename,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is following the existing pattern for dropping unsynced data. But I never understood what's the reason to write unsynced data to a file rather than buffer it in process memory. The latter avoids confusion of persisting data and un-persisting it later.

It would also make the fault injection FileSystem useful for simulating power loss crash-recovery. db_crashtest.py works by repeatedly running and killing db_stress processes and verifying correctness. If the db_stress buffers its unsynced data in process memory, killing the process yields the same result (lost unsynced data) as if we had crashed the whole machine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajkr Thanks for the comments. I'm a little confused. So here, each file has a state_ and we use pos_at_last_sync_ to remember the position of last sync. Append can happen after the previous sync (post_). If DropUnsyncedData is called, we need to preserve the synced data. So it read out the data from 0 to pos_at_last_sync_ and write to a new file, rename it to the original one. So the file only contains the synced data.

Copy link
Contributor

@ajkr ajkr Feb 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree it works for preserving only the synced data after a call to DropUnsyncedData(). I just think buffering Append()ed data in process memory and waiting until Sync() to write it to a file would be more straightforward, and would make this FileSystem useful for simulating power loss failure in our existing process crash-recovery tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajkr Oh I see. Correct me if my understanding is wrong. You mean, when people call TestFSWritableFile->Append, instead of directly calling target_->append, we can create a buffer (e.g., attached in file state) to temporally hold the data. Only when TestFSWritableFile->Sync is called, we really call target_->append to really write it to a file (similar as what WritableFileWrite did).

I think the using a write buffer might make the process a little bit complex. The buffer size is preallocated. If the data being appended is over the buffer size, we need to do real target_->append, or create a new buffer. It increase the complexity. If it is the former case, data in the file is more than we expected (append is called earlier than a real sync). If we continues increase the buffer size, managing the memory is not so easy.

Copy link
Contributor

@ajkr ajkr Feb 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this once before btw: https://github.com/ajkr/cockroach/blob/1c540424696a73adcafd5a773d220fbafb68bcbf/c-deps/libroach/rocksdbutils/env_sync_fault_injection.cc#L60-L104. IIRC the locking was needed when DBOptions::manual_wal_flush = true as FlushWAL(true /*sync*/) calls may be concurrent with Append(). That has extra logic for simulating failed fsyncs (calling exit() and corrupting the buffered data) that can be ignored for now.

Copy link
Contributor

@ajkr ajkr Feb 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the using a write buffer might make the process a little bit complex. The buffer size is preallocated. If the data being appended is over the buffer size, we need to do real target_->append, or create a new buffer. It increase the complexity. If it is the former case, data in the file is more than we expected (append is called earlier than a real sync). If we continues increase the buffer size, managing the memory is not so easy.

I'd just use std::string and buffer everything until Sync() is called; I don't think this will be used in cases where one file's unsynced data will cause OOM. Maybe in the future if it makes its way into db_stress and we run with weird parameters like huge files. But IMO that'd be a sign we're getting good use out of this feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajkr Thanks for the reference! I think I can follow your logic to make the change. In this way, DropUnsyncedData function can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the using a write buffer might make the process a little bit complex. The buffer size is preallocated. If the data being appended is over the buffer size, we need to do real target_->append, or create a new buffer. It increase the complexity. If it is the former case, data in the file is more than we expected (append is called earlier than a real sync). If we continues increase the buffer size, managing the memory is not so easy.

I'd just use std::string and buffer everything until Sync() is called; I don't think this will be used in cases where one file's unsynced data will cause OOM. Maybe in the future if it makes its way into db_stress and we run with weird parameters like huge files. But IMO that'd be a sign we're getting good use out of this feature.

Yeah. I forget this is just a testing, using std::string is enough. It will handle memory allocation and copy during append.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I forget this is just a testing, using std::string is enough. It will handle memory allocation and copy during append.

You have a point though that the existing solution is more scalable than my suggestion. I didn't think of that before.

@ajkr Thanks for the reference! I think I can follow your logic to make the change. In this way, DropUnsyncedData function can be removed.

Oh interesting, I was initially thinking it'd be implemented by clearing the string. But I haven't looked at the test case so am not sure; maybe deleting it entirely is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting, I was initially thinking it'd be implemented by clearing the string. But I haven't looked at the test case so am not sure; maybe deleting it entirely is fine.

I mean, remove the current read->rewrite->rename logic. Just clean the write buffer. Also, remove the pos_at_last_sync_.

@zhichao-cao
Copy link
Contributor Author

This is great! Do we still need error_handler_test.cc? It seems redundant to me.

Since we want to replace storage related from Env to FileSystem, so I think we can remove error_handler_test.cc.

Copy link
Contributor

@anand1976 anand1976 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zhichao-cao
Copy link
Contributor Author

LGTM

Thanks for the review!

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@zhichao-cao merged this pull request in e62fe50.

levichen94 pushed a commit to bytedance/terarkdb that referenced this pull request Jun 18, 2021
…nv (#6414)

Summary:
In the current code base, we can use FaultInjectionTestEnv to simulate the env issue such as file write/read errors, which are used in most of the test. The PR facebook/rocksdb#5761 introduce the File System as a new Env API. This PR implement the FaultInjectionTestFS, which can be used to simulate when File System has issues such as IO error. user can specify any IOStatus error as input, such that FS corresponding actions will return certain error to the caller.

A set of ErrorHandlerFSTests are introduced for testing
Pull Request resolved: facebook/rocksdb#6414

Test Plan: pass make asan_check, pass error_handler_fs_test.

Differential Revision: D20252421

Pulled By: zhichao-cao

fbshipit-source-id: e922038f8ce7e6d1da329fd0bba7283c4b779a21
Signed-off-by: Changlong Chen <levisonchen@live.cn>
levichen94 pushed a commit to bytedance/terarkdb that referenced this pull request Jun 18, 2021
…nv (#6414)

Summary:
In the current code base, we can use FaultInjectionTestEnv to simulate the env issue such as file write/read errors, which are used in most of the test. The PR facebook/rocksdb#5761 introduce the File System as a new Env API. This PR implement the FaultInjectionTestFS, which can be used to simulate when File System has issues such as IO error. user can specify any IOStatus error as input, such that FS corresponding actions will return certain error to the caller.

A set of ErrorHandlerFSTests are introduced for testing
Pull Request resolved: facebook/rocksdb#6414

Test Plan: pass make asan_check, pass error_handler_fs_test.

Differential Revision: D20252421

Pulled By: zhichao-cao

fbshipit-source-id: e922038f8ce7e6d1da329fd0bba7283c4b779a21
Signed-off-by: Changlong Chen <levisonchen@live.cn>
mm304321141 pushed a commit to bytedance/terarkdb that referenced this pull request Jun 23, 2021
…nv (#6414)

Summary:
In the current code base, we can use FaultInjectionTestEnv to simulate the env issue such as file write/read errors, which are used in most of the test. The PR facebook/rocksdb#5761 introduce the File System as a new Env API. This PR implement the FaultInjectionTestFS, which can be used to simulate when File System has issues such as IO error. user can specify any IOStatus error as input, such that FS corresponding actions will return certain error to the caller.

A set of ErrorHandlerFSTests are introduced for testing
Pull Request resolved: facebook/rocksdb#6414

Test Plan: pass make asan_check, pass error_handler_fs_test.

Differential Revision: D20252421

Pulled By: zhichao-cao

fbshipit-source-id: e922038f8ce7e6d1da329fd0bba7283c4b779a21
Signed-off-by: Changlong Chen <levisonchen@live.cn>
levichen94 pushed a commit to bytedance/terarkdb that referenced this pull request Jun 24, 2021
…nv (#6414)

Summary:
In the current code base, we can use FaultInjectionTestEnv to simulate the env issue such as file write/read errors, which are used in most of the test. The PR facebook/rocksdb#5761 introduce the File System as a new Env API. This PR implement the FaultInjectionTestFS, which can be used to simulate when File System has issues such as IO error. user can specify any IOStatus error as input, such that FS corresponding actions will return certain error to the caller.

A set of ErrorHandlerFSTests are introduced for testing
Pull Request resolved: facebook/rocksdb#6414

Test Plan: pass make asan_check, pass error_handler_fs_test.

Differential Revision: D20252421

Pulled By: zhichao-cao

fbshipit-source-id: e922038f8ce7e6d1da329fd0bba7283c4b779a21
Signed-off-by: Changlong Chen <levisonchen@live.cn>
levichen94 pushed a commit to bytedance/terarkdb that referenced this pull request Sep 14, 2021
…nv (#6414)

Summary:
In the current code base, we can use FaultInjectionTestEnv to simulate the env issue such as file write/read errors, which are used in most of the test. The PR facebook/rocksdb#5761 introduce the File System as a new Env API. This PR implement the FaultInjectionTestFS, which can be used to simulate when File System has issues such as IO error. user can specify any IOStatus error as input, such that FS corresponding actions will return certain error to the caller.

A set of ErrorHandlerFSTests are introduced for testing
Pull Request resolved: facebook/rocksdb#6414

Test Plan: pass make asan_check, pass error_handler_fs_test.

Differential Revision: D20252421

Pulled By: zhichao-cao

fbshipit-source-id: e922038f8ce7e6d1da329fd0bba7283c4b779a21
Signed-off-by: Changlong Chen <levisonchen@live.cn>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants