Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a bug caused by secondary not skipping the beginning of new MANIFEST #5472

Closed

Conversation

riversand963
Copy link
Contributor

While the secondary is replaying after the primary, the primary may switch to a new MANIFEST. The secondary is already able to detect and follow the primary to the new MANIFEST. However, the current implementation has a bug, described as follows.
The new MANIFEST's first records have been generated by VersionSet::WriteSnapshot to describe the current state of the column families and the db as of the MANIFEST creation. Since the secondary instance has already finished recovering upon start, there is no need for the secondary to process these records. Actually, if the secondary were to replay these records, the secondary may end up adding the same SST files again to each column family, causing consistency checks done by VersionBuilder to fail. Therefore, we record the number of records to skip at the beginning of the new MANIFEST and ignore them.

Test plan (on dev server)

$make clean && make -j32 all
$./db_secondary_test

All existing unit tests must pass as well.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. Re-import the pull request

@HaoyuHuang
Copy link
Contributor

As discussed offline, it is possible that the primary switches to a new manifest but haven't wrote any record yet. There is a race condition produced by the following sequence of operations.

  1. The secondary calls ReadAndApply and switches to the new manifest. It updates number_of_edits_to_skip and exists the function since there is no record to read. The updates to number_of_edits_to_skip are gone since it is a local variable.
  2. The primary writes some records that should be skipped.
  3. The secondary calls ReadAndApply again and won't skip these records since number_of_edits_to_skip is 0.

One possible change is to make number_of_edits_to_skip a member variable.

Thanks!

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. Re-import the pull request

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@riversand963
Copy link
Contributor Author

As discussed offline, it is possible that the primary switches to a new manifest but haven't wrote any record yet. There is a race condition produced by the following sequence of operations.

  1. The secondary calls ReadAndApply and switches to the new manifest. It updates number_of_edits_to_skip and exists the function since there is no record to read. The updates to number_of_edits_to_skip are gone since it is a local variable.
  2. The primary writes some records that should be skipped.
  3. The secondary calls ReadAndApply again and won't skip these records since number_of_edits_to_skip is 0.

One possible change is to make number_of_edits_to_skip a member variable.

Thanks!

Actually I gave it a second thought. Since POSIX rename is atomic, the new MANIFEST should already contain the first a few records by the time it is visible to secondary. Neverthless, I am going to keep current change.

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. Re-import the pull request

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. Re-import the pull request

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@riversand963 riversand963 deleted the dedup_version_storage_files branch June 18, 2019 18:32
@facebook-github-bot
Copy link
Contributor

@riversand963 merged this pull request in f287f8d.

merryChris pushed a commit to merryChris/rocksdb that referenced this pull request Nov 18, 2019
…EST (facebook#5472)

Summary:
While the secondary is replaying after the primary, the primary may switch to a new MANIFEST. The secondary is already able to detect and follow the primary to the new MANIFEST. However, the current implementation has a bug, described as follows.
The new MANIFEST's first records have been generated by VersionSet::WriteSnapshot to describe the current state of the column families and the db as of the MANIFEST creation. Since the secondary instance has already finished recovering upon start, there is no need for the secondary to process these records. Actually, if the secondary were to replay these records, the secondary may end up adding the same SST files **again** to each column family, causing consistency checks done by VersionBuilder to fail. Therefore, we record the number of records to skip at the beginning of the new MANIFEST and ignore them.

Test plan (on dev server)
```
$make clean && make -j32 all
$./db_secondary_test
```
All existing unit tests must pass as well.
Pull Request resolved: facebook#5472

Differential Revision: D15866771

Pulled By: riversand963

fbshipit-source-id: a1eec4837fb2ad13059398efb0f437e74fd53bed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants