Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VoteListener doesn't retry votes even if they don't land in a bank #6675

Closed
carllin opened this issue Nov 1, 2019 · 8 comments
Closed

VoteListener doesn't retry votes even if they don't land in a bank #6675

carllin opened this issue Nov 1, 2019 · 8 comments
Assignees

Comments

@carllin
Copy link
Contributor

@carllin carllin commented Nov 1, 2019

Problem

The vote listener will pull a vote off of gossip, record the timestamp of that vote, and then attempt to run the vote through banking stage. If the vote doesn't land, the vote listener will not reattempt to play that vote again.

@sagar-solana

This comment has been minimized.

Copy link
Contributor

@sagar-solana sagar-solana commented Nov 1, 2019

@aeyakovenko Can you help propose the right solution to this?

Carl describes the problem pretty well.

The simple fix is to do the following:

  • Track vote arrival in Gossip per peer
  • Keep retrying the latest vote of each peer through banking

The problem with this is that we should stop retrying the vote if it made it into a bank successfully. Otherwise we just end up with a bunch of duplicate signature errors (nbd?) until a new vote comes through or the validator that submitted it goes offline.

We could also cache these in banking_stage (ick) or track the results (since votes are usually all on the same thread) and keep retrying until it succeeds. But then there's no way to clean up the vote if it's genuinely bad.

Also I should add that we don't know just how much of a problem the current behavior is. But it doesn't seem correct.

@aeyakovenko

This comment has been minimized.

Copy link
Member

@aeyakovenko aeyakovenko commented Nov 1, 2019

@carllin is that the bug!? I think for now we can grab all the votes from gossip and retry. The duplicate signature errors are nbd.

@aeyakovenko

This comment has been minimized.

Copy link
Member

@aeyakovenko aeyakovenko commented Nov 1, 2019

How would the votes ever fail and succeed later?

@sagar-solana

This comment has been minimized.

Copy link
Contributor

@sagar-solana sagar-solana commented Nov 1, 2019

So we don't know if this is an issue. It's just how the code is behaving right now.
For this to ever happen, the leader has to lead on two forks. Where maybe they try all the votes on the wrong fork and then never retry when they lead on the other fork.
Is that possible @aeyakovenko ?

@aeyakovenko

This comment has been minimized.

Copy link
Member

@aeyakovenko aeyakovenko commented Nov 1, 2019

@sagar-solana @carllin

  1. leader creates a bank, submit the votes
  2. partition occurs
  3. new fork that doesn't contain the previous votes is presented
  4. next bank doesn't include the previous votes

I think that would cause it. not sure that is what we saw though.

@sagar-solana

This comment has been minimized.

Copy link
Contributor

@sagar-solana sagar-solana commented Nov 1, 2019

Agreed. Not sure this is exactly what happened but we should patch it.

@carllin

This comment has been minimized.

Copy link
Contributor Author

@carllin carllin commented Nov 1, 2019

yeah, just covering bases here. Root cause still pending more analysis.

@mvines

This comment has been minimized.

Copy link
Member

@mvines mvines commented Nov 6, 2019

Fixed by #6695

@mvines mvines closed this Nov 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

4 participants
You can’t perform that action at this time.