Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Secondary Instance will lost manifest data when calling TryCatchUpWithPrimary #7815

Closed
CoderSong2015 opened this issue Dec 28, 2020 · 4 comments
Assignees

Comments

@CoderSong2015
Copy link

Expected behavior

Secondary Instance could read all primary's manifest data after calling TryCatchUpWithPrimary()

Actual behavior

Secondary Instance will lost manifest data when the MANIFEST file switch twice before TryCatchUpWithPrimary() called

Steps to reproduce the behavior

Calling TryCatchUpWithPrimary after primary instance switch MANIFEST twice.

@riversand963

@riversand963
Copy link
Contributor

Thanks @CoderSong2015. This is a bug due to current (hacky) way to handle the primary's MANIFEST switch.
After the primary switches to the new MANIFEST, the first certain number of VersionEdits in the file is a summary of current state, while VersionEdits corresponding to newer updates follow. When it calls TryCatchUp, the secondary has already consumed the summary edits and must skip them. We currently do not have a good way of distinguishing between them, thus used the hack to skip a certain number of edits at the beginning of the new MANIFEST. This hack breaks when switching MANIFEST multiple (>1) times.
I have been thinking of writing a special version edits to denote the end of summary edits (see #7058), but this makes it impossible to roll back to older releases (<6.0). Feel free to contribute if you are interested and we can discuss an ideal way of fixing it.

@riversand963 riversand963 self-assigned this Feb 22, 2021
@riversand963
Copy link
Contributor

I will submit a fix for this. cc @CoderSong2015

facebook-github-bot pushed a commit that referenced this issue Mar 10, 2021
Summary:
This PR

- adds a class `ManifestTailer` that inherits from `VersionEditHandlerPointInTime`. `ManifestTailer::Iterate()` can be called multiple times to tail the primary instance's MANIFEST and apply the changes to the secondary,
- updates the implementation of `ReactiveVersionSet::ReadAndApply` to use this class,
- removes unused code in version_set.cc,
- updates existing tests, e.g. removing deleted sync points from unit tests,
- adds a new test to address the bug in #7815.

Pull Request resolved: #7998

Test Plan:
make check
Existing and newly-added tests in version_set_test.cc and db_secondary_test.cc

Reviewed By: jay-zhuang

Differential Revision: D26926641

Pulled By: riversand963

fbshipit-source-id: 8d4dd15db0ba863c213f743e33b5a207e948c980
@CoderSong2015
Copy link
Author

@riversand963 Thanks very much.
I have fixed it on my own project. My way is that loading the full new MANIFEST file when MANIFEST file changed. I think it's not very frequently for rocksdb to change the manifest file so that the secondary instance could load full manifest every time and then it will not lost manifest data.

@riversand963
Copy link
Contributor

This is the approach I took in #7998. Should we close this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants