-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix async_io failures in case there is error in reading data #10890
Conversation
23f0f02
to
35dcc41
Compare
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
35dcc41
to
17d1516
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
// In case because of some error curr_ got empty, abort IO for second as well. | ||
// Otherwise data might not align if more data needs to be read in curr_ which | ||
// might overlap with second buffer. | ||
if (!DoesBufferContainData(curr_) && bufs_[second].async_read_in_progress_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the check that fixed the issue of memory corruption.
@@ -268,16 +266,37 @@ void FilePrefetchBuffer::UpdateBuffersIfNeeded(uint64_t offset) { | |||
bufs_[second].buffer_.Clear(); | |||
} | |||
|
|||
{ | |||
// In case buffers do not align, reset second buffer. This can happen in | |||
// case readahead_size is set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can it happen if readahead_size
is set? except for the fixed vs exponential size increase, prefetching behaves the same in both cases right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This happened in case of db_stress because reads might not be sequential if readahead_size is set and we go for prefetching anyway. I tried adding logging but couldn't find the exact reason as to what exact condition is causing this issue. But the issue was buffers were not aligned. One had old data (related to prev_offset_) and other had new data (related to offset called).
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: Fix memory corruption error in scans if async_io is enabled. Memory corruption happened if data is overlapping between two buffers. If there is IOError while reading the data, it leads to empty buffer and other buffer already in progress of async read goes again for reading causing the error. Fix: Added check to abort IO in second buffer if curr_ got empty. This PR also fixes db_stress failures which happened when buffers are not aligned. Pull Request resolved: #10890 Test Plan: - Ran make crash_test -j32 with async_io enabled. - Ran benchmarks to make sure there is no regression. Reviewed By: anand1976 Differential Revision: D40881731 Pulled By: akankshamahajan15 fbshipit-source-id: 39fcf2134c7b1bbb08415ede3e1ef261ac2dbc58
Summary: Fix memory corruption error in scans if async_io is enabled. Memory corruption happened if data is overlapping between two buffers. If there is IOError while reading the data, it leads to empty buffer and other buffer already in progress of async read goes again for reading causing the error. Fix: Added check to abort IO in second buffer if curr_ got empty. This PR also fixes db_stress failures which happened when buffers are not aligned. Pull Request resolved: #10890 Test Plan: - Ran make crash_test -j32 with async_io enabled. - Ran benchmarks to make sure there is no regression. Reviewed By: anand1976 Differential Revision: D40881731 Pulled By: akankshamahajan15 fbshipit-source-id: 39fcf2134c7b1bbb08415ede3e1ef261ac2dbc58
Summary: The db_stress code uses a wrapper Env on top of the raw/fault injection Env. The wrapper, DbStressEnvWrapper, is a legacy Env and thus has a default implementation of ReadAsync that just does a sync read. As a result, the ReadAsync implementations of PosixFileSystem and other file systems weren't being tested. Also, the ReadAsync interface wasn't implemented in FaultInjectionTestFS. This change implements the necessary interfaces in FaultInjectionTestFS and derives DbStressEnvWrapper from FileSystemWrapper rather than EnvWrapper. Pull Request resolved: #11037 Test Plan: Run db_stress standalone and crash test. With this change, db_stress is able to repro the bug fixed in #10890. Reviewed By: akankshamahajan15 Differential Revision: D42061290 Pulled By: anand1976 fbshipit-source-id: 7f0331fd15ee33fb4f7f0f4b22b206fe801ba074
Summary: Fix memory corruption error in scans if async_io is enabled. Memory corruption happened if data is overlapping between two buffers. If there is IOError while reading the data, it leads to empty buffer and other buffer already in progress of async read goes again for reading causing the error. Fix: Added check to abort IO in second buffer if curr_ got empty. This PR also fixes db_stress failures which happened when buffers are not aligned. Pull Request resolved: facebook/rocksdb#10890 Test Plan: - Ran make crash_test -j32 with async_io enabled. - Ran benchmarks to make sure there is no regression. Reviewed By: anand1976 Differential Revision: D40881731 Pulled By: akankshamahajan15 fbshipit-source-id: 39fcf2134c7b1bbb08415ede3e1ef261ac2dbc58
Summary: Fix memory corruption error in scans if async_io is enabled. Memory corruption happened if data is overlapping between two buffers. If there is IOError while reading the data, it leads to empty buffer and other buffer already in progress of async read goes again for reading causing the error. Fix: Added check to abort IO in second buffer if curr_ got empty. This PR also fixes db_stress failures which happened when buffers are not aligned. Pull Request resolved: facebook/rocksdb#10890 Test Plan: - Ran make crash_test -j32 with async_io enabled. - Ran benchmarks to make sure there is no regression. Reviewed By: anand1976 Differential Revision: D40881731 Pulled By: akankshamahajan15 fbshipit-source-id: 39fcf2134c7b1bbb08415ede3e1ef261ac2dbc58
Summary: Fix memory corruption error in scans if async_io is enabled. Memory corruption happened if data is overlapping between two buffers. If there is IOError while reading the data, it leads to empty buffer and other buffer already in progress of async read goes again for reading causing the error. Fix: Added check to abort IO in second buffer if curr_ got empty. This PR also fixes db_stress failures which happened when buffers are not aligned. Pull Request resolved: facebook/rocksdb#10890 Test Plan: - Ran make crash_test -j32 with async_io enabled. - Ran benchmarks to make sure there is no regression. Reviewed By: anand1976 Differential Revision: D40881731 Pulled By: akankshamahajan15 fbshipit-source-id: 39fcf2134c7b1bbb08415ede3e1ef261ac2dbc58
Summary: Fix memory corruption error in scans if async_io is enabled. Memory corruption happened if data is overlapping between two buffers. If there is IOError while reading the data, it leads to empty buffer and other buffer already in progress of async read goes again for reading causing the error. Fix: Added check to abort IO in second buffer if curr_ got empty. This PR also fixes db_stress failures which happened when buffers are not aligned. Pull Request resolved: facebook/rocksdb#10890 Test Plan: - Ran make crash_test -j32 with async_io enabled. - Ran benchmarks to make sure there is no regression. Reviewed By: anand1976 Differential Revision: D40881731 Pulled By: akankshamahajan15 fbshipit-source-id: 39fcf2134c7b1bbb08415ede3e1ef261ac2dbc58
Summary: Fix memory corruption error in scans if async_io is enabled. Memory corruption happened if data is overlapping between two buffers. If there is IOError while reading the data, it leads to empty buffer and other buffer already in progress of async read goes again for reading causing the error.
Fix: Added check to abort IO in second buffer if curr_ got empty.
This PR also fixes db_stress failures which happened when buffers are not aligned.
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags: