Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PMMR size calculation on restart can be wrong #371

Closed
ignopeverell opened this issue Nov 22, 2017 · 7 comments
Closed

PMMR size calculation on restart can be wrong #371

ignopeverell opened this issue Nov 22, 2017 · 7 comments
Labels

Comments

@ignopeverell
Copy link
Contributor

On restart, we compute the size of the PMMR by checking the size of the underlying storage file. However in case of rewinds, the storage isn't truncated, which can lead to a corruption if:

  • We're on branch A that extends the PMMR size by 10 (for example)
  • Branch B has more work, making us switch
  • Branch B extends the PMMR by only 5
  • Node is restarted
@antiochp
Copy link
Member

After our discussion - this makes sense to me now. The file has got out of sync with our rewound view of it before the restart...

@yeastplume
Copy link
Member

yeastplume commented Dec 1, 2017

Just on this, I think we should probably fix this as it may lead to very subtle and difficult-to-track errors.

What do you think the best way to approach this is? Slamming it into the PMMR file would be counterproductive, but perhaps a metadata store file beside the PMMR file that just keeps track of the expected size (for now)? We'd also need to make sure that updating this persisted size and updates/rewinds to the sum tree is an atomic transaction, so they don't get out of sync.. think this issue is slightly harder than it appears on the surface

@ignopeverell
Copy link
Contributor Author

Yes, and there's the issue of pruning too (not activated yet). I don't think we can get perfect atomicity but we can try to limit the window with proper use of OS sync. I like the idea of the metadata file

@sesam
Copy link
Contributor

sesam commented Dec 1, 2017 via email

@antiochp
Copy link
Member

Is this something we want to tackle for testnet2?
This and #544 seem to be the two issue outstanding that can cause corrupted sumtrees and/or consensus failures due to sumtree differences.

@ignopeverell
Copy link
Contributor Author

I've been thinking about this some more, it's actually very unlikely to occur. So definitely not as critical as #544, even though it would still be nice to have.

@yeastplume
Copy link
Member

#735 should hopefeully address this.. will find out in testnet2 :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants