Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure writes to WAL tail during FlushWAL(true /* sync */) will be synced #10560

Closed
wants to merge 5 commits into from

Conversation

ajkr
Copy link
Contributor

@ajkr ajkr commented Aug 23, 2022

WAL append and switch can both happen between FlushWAL(true /* sync */)'s sync operations and its call to MarkLogsSynced(). We permit this since locks need to be released for the sync operations. Such an appended/switched WAL is both inactive and incompletely synced at the time MarkLogsSynced() processes it.

Prior to this PR, MarkLogsSynced() assumed all inactive WALs were fully synced and removed them from consideration for future syncs. That was wrong in the scenario described above and led to the latest append(s) never being synced. This PR changes MarkLogsSynced() to only remove inactive WALs from consideration for which all flushed data has been synced.

Test Plan: repro unit test for the scenario described above. Without this PR, it fails on "key2" not found

@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ajkr ajkr requested a review from pdillinger August 23, 2022 21:32
@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ajkr ajkr requested a review from riversand963 August 24, 2022 20:16
@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ajkr ajkr requested a review from siying August 25, 2022 16:49
if (immutable_db_options_.track_and_verify_wals_in_manifest &&
wal.GetPreSyncSize() > 0) {
synced_wals->AddWal(wal.number, WalMetadata(wal.GetPreSyncSize()));
}
logs_to_free_.push_back(wal.ReleaseWriter());
it = logs_.erase(it);
if (wal.GetPreSyncSize() == wal.writer->file()->GetFlushedSize()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an assertion as below?

if (wal.GetPreSyncSize() == wal.writer->file()->GetFlushedSize()) {
   // Fully synced
   logs_to_free_.push_back(wal.ReleaseWriter());
   it = logs_.erase(it);
} else if (wal.GetPreSyncSize() < wal.writer->file()->GetFlushedSize()) {
  wal.FinishSync();
  ++it;
} else {
  assert(false);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, though slightly differently as that gives me fear of infinite loop

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajkr It might be noob question. How is it ensured that no other thread is writing to this wal file at this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like wal.number < logs_.back().number tells us the log is inactive because a newer one exists, and we only append to the newest (active) WAL.

Copy link
Contributor Author

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

if (immutable_db_options_.track_and_verify_wals_in_manifest &&
wal.GetPreSyncSize() > 0) {
synced_wals->AddWal(wal.number, WalMetadata(wal.GetPreSyncSize()));
}
logs_to_free_.push_back(wal.ReleaseWriter());
it = logs_.erase(it);
if (wal.GetPreSyncSize() == wal.writer->file()->GetFlushedSize()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, though slightly differently as that gives me fear of infinite loop

@ajkr ajkr force-pushed the fix-sync-wal-tail branch from 8e48be3 to 6c3d5ae Compare August 25, 2022 18:51
@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Connor1996 pushed a commit to Connor1996/rocksdb that referenced this pull request Mar 25, 2024
…synced (facebook#10560)

Summary:
WAL append and switch can both happen between `FlushWAL(true /* sync */)`'s sync operations and its call to `MarkLogsSynced()`. We permit this since locks need to be released for the sync operations. Such an appended/switched WAL is both inactive and incompletely synced at the time `MarkLogsSynced()` processes it.

Prior to this PR, `MarkLogsSynced()` assumed all inactive WALs were fully synced and removed them from consideration for future syncs. That was wrong in the scenario described above and led to the latest append(s) never being synced. This PR changes `MarkLogsSynced()` to only remove inactive WALs from consideration for which all flushed data has been synced.

Pull Request resolved: facebook#10560

Test Plan: repro unit test for the scenario described above. Without this PR, it fails on "key2" not found

Reviewed By: riversand963

Differential Revision: D38957391

Pulled By: ajkr

fbshipit-source-id: da77175eba97ff251a4219b227b3bb2d4843ed26
Connor1996 pushed a commit to Connor1996/rocksdb that referenced this pull request Mar 25, 2024
…synced (facebook#10560)

Summary:
WAL append and switch can both happen between `FlushWAL(true /* sync */)`'s sync operations and its call to `MarkLogsSynced()`. We permit this since locks need to be released for the sync operations. Such an appended/switched WAL is both inactive and incompletely synced at the time `MarkLogsSynced()` processes it.

Prior to this PR, `MarkLogsSynced()` assumed all inactive WALs were fully synced and removed them from consideration for future syncs. That was wrong in the scenario described above and led to the latest append(s) never being synced. This PR changes `MarkLogsSynced()` to only remove inactive WALs from consideration for which all flushed data has been synced.

Pull Request resolved: facebook#10560

Test Plan: repro unit test for the scenario described above. Without this PR, it fails on "key2" not found

Reviewed By: riversand963

Differential Revision: D38957391

Pulled By: ajkr

fbshipit-source-id: da77175eba97ff251a4219b227b3bb2d4843ed26
Connor1996 added a commit to tikv/rocksdb that referenced this pull request Mar 26, 2024
…synced (facebook#10560) (#357)

Summary:
WAL append and switch can both happen between `FlushWAL(true /* sync */)`'s sync operations and its call to `MarkLogsSynced()`. We permit this since locks need to be released for the sync operations. Such an appended/switched WAL is both inactive and incompletely synced at the time `MarkLogsSynced()` processes it.

Prior to this PR, `MarkLogsSynced()` assumed all inactive WALs were fully synced and removed them from consideration for future syncs. That was wrong in the scenario described above and led to the latest append(s) never being synced. This PR changes `MarkLogsSynced()` to only remove inactive WALs from consideration for which all flushed data has been synced.

Pull Request resolved: facebook#10560

Test Plan: repro unit test for the scenario described above. Without this PR, it fails on "key2" not found

Reviewed By: riversand963

Differential Revision: D38957391

Pulled By: ajkr

fbshipit-source-id: da77175eba97ff251a4219b227b3bb2d4843ed26

Co-authored-by: Andrew Kryczka <andrewkr@fb.com>
@@ -1532,20 +1532,28 @@ void DBImpl::MarkLogsSynced(uint64_t up_to, bool synced_dir,
auto& wal = *it;
assert(wal.IsSyncing());

if (logs_.size() > 1) {
if (wal.number < logs_.back().number) {
// Inactive WAL
if (immutable_db_options_.track_and_verify_wals_in_manifest &&
Copy link

@mittalrishabh mittalrishabh Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 1537 to 1541 should be moved to if block at line 1541

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the WAL tracking was designed to allow a WAL's synced size to grow over multiple manifest entries. For some time we were even tracking the active WAL's synced size so this was a requirement. If you are confident this logic causes errors, though, I will look deeper into this.

Connor1996 pushed a commit to Connor1996/rocksdb that referenced this pull request Mar 26, 2024
…synced (facebook#10560)

Summary:
WAL append and switch can both happen between `FlushWAL(true /* sync */)`'s sync operations and its call to `MarkLogsSynced()`. We permit this since locks need to be released for the sync operations. Such an appended/switched WAL is both inactive and incompletely synced at the time `MarkLogsSynced()` processes it.

Prior to this PR, `MarkLogsSynced()` assumed all inactive WALs were fully synced and removed them from consideration for future syncs. That was wrong in the scenario described above and led to the latest append(s) never being synced. This PR changes `MarkLogsSynced()` to only remove inactive WALs from consideration for which all flushed data has been synced.

Pull Request resolved: facebook#10560

Test Plan: repro unit test for the scenario described above. Without this PR, it fails on "key2" not found

Reviewed By: riversand963

Differential Revision: D38957391

Pulled By: ajkr

fbshipit-source-id: da77175eba97ff251a4219b227b3bb2d4843ed26
Connor1996 added a commit to tikv/rocksdb that referenced this pull request Mar 27, 2024
#358)

* Ensure writes to WAL tail during `FlushWAL(true /* sync */)` will be synced (facebook#10560)

Summary:
WAL append and switch can both happen between `FlushWAL(true /* sync */)`'s sync operations and its call to `MarkLogsSynced()`. We permit this since locks need to be released for the sync operations. Such an appended/switched WAL is both inactive and incompletely synced at the time `MarkLogsSynced()` processes it.

Prior to this PR, `MarkLogsSynced()` assumed all inactive WALs were fully synced and removed them from consideration for future syncs. That was wrong in the scenario described above and led to the latest append(s) never being synced. This PR changes `MarkLogsSynced()` to only remove inactive WALs from consideration for which all flushed data has been synced.

Pull Request resolved: facebook#10560

Test Plan: repro unit test for the scenario described above. Without this PR, it fails on "key2" not found

Reviewed By: riversand963

Differential Revision: D38957391

Pulled By: ajkr

fbshipit-source-id: da77175eba97ff251a4219b227b3bb2d4843ed26

* comment out test

Signed-off-by: Connor1996 <zbk602423539@gmail.com>

---------

Signed-off-by: Connor1996 <zbk602423539@gmail.com>
Co-authored-by: Andrew Kryczka <andrewkr@fb.com>
@mittalrishabh
Copy link

mittalrishabh commented Apr 12, 2024

@ajkr If the memtable is switched after the population of log_context->writer but before a writer writes to the Write-Ahead Log , is there a chance of a race condition occurring? If this race condition does occur, the presync size will be the same as the flush size, resulting in the release of the log while writer is still writing to it.

facebook-github-bot pushed a commit that referenced this pull request Apr 12, 2024
Summary:
Our `FileSystem` for simulating unsynced data loss should not sync during `Close()` because it masks bugs where we forgot to sync as long as we closed the file.

Pull Request resolved: #12528

Test Plan:
Peeled back #10560 fix and verified it is caught much faster now (few seconds vs. ???) with command like

```
$ TEST_TMPDIR=./ python3 tools/db_crashtest.py blackbox --disable_wal=0 --max_key=1000 --write_buffer_size=131072 --max_bytes_for_level_base=524288 --target_file_size_base=131072 --interval=3 --sync_fault_injection=1 --enable_blob_files=0 --manual_wal_flush_one_in=10 --sync_wal_one_in=0 --get_live_files_one_in=0 --get_sorted_wal_files_one_in=0 --backup_one_in=0 --checkpoint_one_in=0 --write_fault_one_in=0 --read_fault_one_in=0 --open_write_fault_one_in=0 --compact_range_one_in=0 --compact_files_one_in=0 --open_read_fault_one_in=0 --get_property_one_in=0 --writepercent=100 -readpercent=0 -prefixpercent=0 -delpercent=0 -delrangepercent=0 -iterpercent=0
```

Reviewed By: anand1976

Differential Revision: D56033250

Pulled By: ajkr

fbshipit-source-id: 6bbf480d79a06c46f08f6214010937f6654af5ca
@mittalrishabh
Copy link

@ajkr do you think the scenario described above is valid ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants