-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update lagging validators on blob reads #2220
base: main
Are you sure you want to change the base?
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. Join @andresilva91 and the rest of your teammates on Graphite |
396f3f8
to
f96dc7a
Compare
22affac
to
48c1f52
Compare
246dfc5
to
c017110
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! But so far this addresses only a part of the lagging validator situation: We also need to be able to re-propose another owner's ValidatedBlock
proposal in a later round, even if neither we nor a quorum of validators have the blob. (Maybe that's a separate issue, for a different PR.)
@@ -1328,6 +1335,59 @@ where | |||
message.action = MessageAction::Reject; | |||
continue; | |||
} | |||
} else if let ChainError::ExecutionError( | |||
ExecutionError::SystemError(SystemExecutionError::BlobNotFoundOnRead(blob_id)), | |||
_, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably also try to fetch the missing blobs in the ChainExecutionContext::IncomingMessage(index)
case, and reject that message only after this has failed. (Doesn't apply in this PR yet, because messages can't read blobs yet.)
So we could leave stage_block_execution_and_discard_failing_messages
as it is, and move this logic into an inner self.stage_block_execution(block)
call (which in turn calls self.client.local_node.stage_block_execution(block.clone())
). But as discussed, we will ultimately want to fetch the blobs without restarting execution from the beginning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this all then for a separate PR after messages are able to read blobs then? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you prefer, but we have to make sure we remember it. The new system operation is meant to simulate what in production will actually be user operations and incoming messages. So I'm a bit worried about any code that only makes the operation work, but not messages, because then our tests give us a false sense of security.
c017110
to
3400e87
Compare
9e103fd
to
265a1a4
Compare
Graphite Automations"Assign reviewers" took an action on this PR • (07/10/24)6 reviewers were added to this PR based on Andre da Silva's automation. |
265a1a4
to
7983873
Compare
_, | ||
) = &**chain_error | ||
{ | ||
self.update_lagging_validators(*blob_id).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to update the lagging validators if our own local node is missing the blob? It would be clearer if synchronize_from_validators
really only synchronized from, not to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, but AFAIU in our test we try to confirm a block that was validated by a different client, from a client without the blob, and also with lagging validators. How should we deal with this case if we're not updating the lagging validators and getting the blob while processing the certificate? I just couldn't find an alternative, but I'm probably missing something 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect the second client to only update the lagging validators when they send back an error about the missing blob.
I think this |
7983873
to
8fba8a5
Compare
You mean the existing one on |
Motivation
We need to update lagging validators if they're not aware of blobs that have already been published.
Proposal
A few things had to be done here:
ReadBlob
SystemOperation
to be able to test this without creating or modifying existing applications.hashed_certificate_values
andhashed_blobs
toprocess_validated_block
. This was an existing bug because we check for missing blobs there, but didn't pass the information along on a retry.synchronize_chain_state
andtry_synchronize_chain_state_from
to the client, so that we could properly call the lagging validator code on chain synchronization as well.LocalValidatorClient
wasn't properly handling certificates: ifNoConfirm
and we're trying to handle a validated certificate, it wouldn't actually call the handle code like it should.Test Plan
3 tests were written (more to come in following PRs)