Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug for WalManager with compressed WAL #10130

Closed
wants to merge 4 commits into from

Conversation

akankshamahajan15
Copy link
Contributor

@akankshamahajan15 akankshamahajan15 commented Jun 7, 2022

Summary: RocksDB uses WalManager to manage WAL files. In WalManager::ReadFirstLine(), the assumption is that reading the first record of a valid WAL file will return OK status and set the output sequence to non-zero value.
This assumption has been broken by WAL compression which writes a kSetCompressionType record which is not associated with any sequence number.
Consequently, WalManager::GetSortedWalsOfType() will skip these WALs and not return them to caller, e.g. Checkpoint, Backup, causing the operations to fail.

Test Plan:
- Newly Added test

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @akankshamahajan15 for the fix! Left a few comments.

db/db_basic_test.cc Outdated Show resolved Hide resolved
db/db_basic_test.cc Outdated Show resolved Hide resolved
db/wal_manager.cc Show resolved Hide resolved
Comment on lines +511 to +514
// In case of wal_compression, it writes a `kSetCompressionType` record
// which is not associated with any sequence number. As result for an empty
// file, GetSortedWalsOfType() will skip these WALs causing the operations
// to fail.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As result for an empty file, GetSortedWalsOfType() will skip these WALs causing the operations to fail.

With this PR's fix, is this still true? No operation will fail, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will rewrite the comment to mention operations will not fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An empty file skipped by GetSortedWalsOfType() should not fail because, even if we enable WAL tracking in manifest, we do not log a WAL of sync size 0 to the MANIFEST. Therefore, when we recover later, we won't be looking for such WALs.

@akankshamahajan15
Copy link
Contributor Author

@riversand963 I was thinking of running the crash tests with --wal_compression=zstd. Do I need to set any other option or if this PR is not enough to run the crash test?

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Comment on lines +511 to +514
// In case of wal_compression, it writes a `kSetCompressionType` record
// which is not associated with any sequence number. As result for an empty
// file, GetSortedWalsOfType() will skip these WALs causing the operations
// to fail.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An empty file skipped by GetSortedWalsOfType() should not fail because, even if we enable WAL tracking in manifest, we do not log a WAL of sync size 0 to the MANIFEST. Therefore, when we recover later, we won't be looking for such WALs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants