Update lagging validators on blob reads #2220

andresilva91 · 2024-07-04T15:13:09Z

Motivation

We need to update lagging validators if they're not aware of blobs that have already been published.

Proposal

A few things had to be done here:

Wrote the actual code to update the lagging validators. It is initially used in two places: when we first stage block execution, and when we synchronize the chain state from the validators.
Created a test-only ReadBlob SystemOperation, so that this can be tested without creating new applications or modifying existing ones.
Added hashed_certificate_values and hashed_blobs to process_validated_block. This fixes an existing bug: we checked for missing blobs there, but did not pass them along on a retry.
Moved synchronize_chain_state and try_synchronize_chain_state_from to the client, so that the lagging-validator code is also called during chain synchronization.
Fixed a bug where our test LocalValidatorClient did not handle certificates properly: in the NoConfirm case, when handling a validated certificate, it never actually called the handling code.
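A minimal sketch of the retry fix in the third item, using a simplified, hypothetical model (the `Worker`, `ProcessError` and `process_with_retry` names and the `u64` blob IDs are illustrative assumptions, not the linera-core API). The first attempt reports which blobs are missing; the retry has to pass the fetched blobs along, rather than calling `process_validated_block` again with nothing new.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the real linera types; illustration only.
type BlobId = u64;
type Blob = Vec<u8>;

#[derive(Debug, PartialEq)]
enum ProcessError {
    // The listed blobs were neither in storage nor passed in with the request.
    BlobsNotFound(Vec<BlobId>),
}

// Hypothetical worker that only knows the blobs in its own storage.
struct Worker {
    storage: HashMap<BlobId, Blob>,
}

impl Worker {
    // Checks that every blob the block reads is available, either in storage
    // or in `hashed_blobs`, and stores the provided blobs on success.
    fn process_validated_block(
        &mut self,
        required: &[BlobId],
        hashed_blobs: &[(BlobId, Blob)],
    ) -> Result<(), ProcessError> {
        let provided: HashSet<BlobId> = hashed_blobs.iter().map(|(id, _)| *id).collect();
        let missing: Vec<BlobId> = required
            .iter()
            .copied()
            .filter(|id| !self.storage.contains_key(id) && !provided.contains(id))
            .collect();
        if !missing.is_empty() {
            return Err(ProcessError::BlobsNotFound(missing));
        }
        // Persist the blobs that came with the request so later reads succeed.
        for (id, blob) in hashed_blobs {
            self.storage.insert(*id, blob.clone());
        }
        Ok(())
    }
}

// Caller side: on `BlobsNotFound`, fetch the missing blobs and retry once,
// this time passing them along with the request.
fn process_with_retry(
    worker: &mut Worker,
    required: &[BlobId],
    fetch: impl Fn(BlobId) -> Option<Blob>,
) -> Result<(), ProcessError> {
    match worker.process_validated_block(required, &[]) {
        Err(ProcessError::BlobsNotFound(missing)) => {
            let mut fetched = Vec::new();
            for id in missing {
                match fetch(id) {
                    Some(blob) => fetched.push((id, blob)),
                    // Nobody has this blob: give up and report it.
                    None => return Err(ProcessError::BlobsNotFound(vec![id])),
                }
            }
            worker.process_validated_block(required, &fetched)
        }
        other => other,
    }
}
```

Passing the blobs explicitly keeps the second attempt from failing for exactly the same reason as the first.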

Test Plan

Three tests were written (more will follow in subsequent PRs).

andresilva91 · 2024-07-04T15:13:23Z

Make BlobId into enum #2231
Update lagging validators on blob reads #2220 👈
read_blob system API #2193
main

This stack of pull requests is managed by Graphite.

afck

Nice! But so far this addresses only a part of the lagging validator situation: We also need to be able to re-propose another owner's ValidatedBlock proposal in a later round, even if neither we nor a quorum of validators have the blob. (Maybe that's a separate issue, for a different PR.)

afck · 2024-07-08T13:08:00Z

linera-core/src/client.rs

@@ -1328,6 +1335,59 @@ where
                        message.action = MessageAction::Reject;
                        continue;
                    }
+                } else if let ChainError::ExecutionError(
+                    ExecutionError::SystemError(SystemExecutionError::BlobNotFoundOnRead(blob_id)),
+                    _,


We should probably also try to fetch the missing blobs in the ChainExecutionContext::IncomingMessage(index) case, and reject that message only after this has failed. (Doesn't apply in this PR yet, because messages can't read blobs yet.)

So we could leave stage_block_execution_and_discard_failing_messages as it is, and move this logic into an inner self.stage_block_execution(block) call (which in turn calls self.client.local_node.stage_block_execution(block.clone())). But as discussed, we will ultimately want to fetch the blobs without restarting execution from the beginning.

Is this all for a separate PR, then, once messages are able to read blobs? 🤔

If you prefer, but we have to make sure we remember it. The new system operation is meant to simulate what in production will actually be user operations and incoming messages. So I'm a bit worried about any code that only makes the operation work, but not messages, because then our tests give us a false sense of security.
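A rough sketch of the ordering suggested above for the `ChainExecutionContext::IncomingMessage(index)` case: try to fetch the missing blob first, and reject the message only if that fails. All types and functions here (`IncomingMessage`, `ExecError`, `stage_and_discard_failing_messages`, and the `fetch_blob` callback standing in for a download from validators) are hypothetical simplifications. This version restarts execution from the beginning after every fetch or rejection, which is the simpler approach, not the one the comment says we ultimately want.

```rust
use std::collections::HashSet;

// Hypothetical, simplified model; not the linera-core API.
type BlobId = u64;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Action {
    Accept,
    Reject,
}

#[derive(Debug)]
struct IncomingMessage {
    reads_blob: Option<BlobId>,
    action: Action,
}

#[derive(Debug, PartialEq)]
enum ExecError {
    // The message at `index` tried to read a blob that is not available locally.
    BlobNotFoundOnRead { index: usize, blob_id: BlobId },
}

// Executes the accepted messages in order, failing on the first missing blob.
fn stage_block_execution(
    messages: &[IncomingMessage],
    local_blobs: &HashSet<BlobId>,
) -> Result<(), ExecError> {
    for (index, message) in messages.iter().enumerate() {
        if message.action == Action::Reject {
            continue;
        }
        if let Some(blob_id) = message.reads_blob {
            if !local_blobs.contains(&blob_id) {
                return Err(ExecError::BlobNotFoundOnRead { index, blob_id });
            }
        }
    }
    Ok(())
}

// Re-runs execution from the start until it succeeds: a missing blob is first
// fetched, and the offending message is rejected only if fetching fails.
fn stage_and_discard_failing_messages(
    messages: &mut [IncomingMessage],
    local_blobs: &mut HashSet<BlobId>,
    fetch_blob: impl Fn(BlobId) -> bool,
) {
    loop {
        match stage_block_execution(messages, local_blobs) {
            Ok(()) => return,
            Err(ExecError::BlobNotFoundOnRead { index, blob_id }) => {
                if fetch_blob(blob_id) {
                    local_blobs.insert(blob_id);
                } else {
                    messages[index].action = Action::Reject;
                }
            }
        }
    }
}
```

The loop terminates because each iteration either adds a blob that was missing or rejects a message that was still accepted.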

graphite-app · 2024-07-10T14:05:11Z

Graphite Automations

"Assign reviewers" took an action on this PR • (07/10/24)

6 reviewers were added to this PR based on Andre da Silva's automation.

afck · 2024-07-11T12:26:51Z

linera-core/src/client.rs

+                                _,
+                            ) = &**chain_error
+                            {
+                                self.update_lagging_validators(*blob_id).await?;


Why do we need to update the lagging validators if our own local node is missing the blob? It would be clearer if synchronize_from_validators really only synchronized from, not to them.

I see your point, but AFAIU in our test a client that doesn't have the blob tries to confirm a block that was validated by a different client, while some validators are also lagging. How should we deal with this case if we don't update the lagging validators and get the blob while processing the certificate? I just couldn't find an alternative, but I'm probably missing something 🤔

I'd expect the second client to only update the lagging validators when they send back an error about the missing blob.
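A minimal sketch of the flow suggested in the last comment, where the client pushes blobs to a validator only after that validator reports them missing. The `Validator`, `NodeError` and `send_certificate` names are hypothetical, and the blobs come from the client's local store for simplicity; in the test described above, the client would first have to download the blob from an up-to-date validator.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical model of the suggested flow; not the linera-core API.
type BlobId = u64;
type Blob = Vec<u8>;

#[derive(Debug, PartialEq)]
enum NodeError {
    // The validator reports which blobs it is missing.
    BlobsNotFound(Vec<BlobId>),
}

struct Validator {
    blobs: HashSet<BlobId>,
}

impl Validator {
    // Accepts a certificate only if it already has every blob the block reads.
    fn handle_certificate(&self, required: &[BlobId]) -> Result<(), NodeError> {
        let missing: Vec<BlobId> = required
            .iter()
            .copied()
            .filter(|id| !self.blobs.contains(id))
            .collect();
        if missing.is_empty() {
            Ok(())
        } else {
            Err(NodeError::BlobsNotFound(missing))
        }
    }

    // Stores an uploaded blob (only its ID matters in this model).
    fn handle_blob(&mut self, id: BlobId, _blob: &[u8]) {
        self.blobs.insert(id);
    }
}

// Sends the certificate; only if the validator answers with `BlobsNotFound`
// does the client upload those blobs and retry once.
fn send_certificate(
    validator: &mut Validator,
    required: &[BlobId],
    local_blobs: &HashMap<BlobId, Blob>,
) -> Result<(), NodeError> {
    match validator.handle_certificate(required) {
        Err(NodeError::BlobsNotFound(missing)) => {
            for id in &missing {
                match local_blobs.get(id) {
                    Some(blob) => validator.handle_blob(*id, blob),
                    // We can't help this validator; pass the error through.
                    None => return Err(NodeError::BlobsNotFound(missing.clone())),
                }
            }
            validator.handle_certificate(required)
        }
        other => other,
    }
}
```

This keeps synchronization from validators one-directional: a validator is only updated in response to its own error.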

linera-core/src/unit_tests/test_utils.rs

andresilva91 · 2024-07-12T17:45:58Z

I think this test_re_propose_validated test has only been passing because of the NoConfirm bug 🤔

afck · 2024-07-13T09:25:17Z

You mean the existing one on main? If I only apply your test_utils changes, test_re_propose_validated still passes for me on main.

andresilva91 mentioned this pull request Jul 4, 2024 in read_blob system API #2193 (merged)

andresilva91 force-pushed the 07-04-update_lagging_validators_on_blob_reads branch from 396f3f8 to f96dc7a July 4, 2024 16:16

andresilva91 force-pushed the 06-25-read_blob_system_api branch 3 times, most recently from 22affac to 48c1f52 July 8, 2024 12:42

andresilva91 force-pushed the 07-04-update_lagging_validators_on_blob_reads branch 3 times, most recently from 246dfc5 to c017110 July 8, 2024 13:09

afck reviewed Jul 8, 2024

andresilva91 force-pushed the 07-04-update_lagging_validators_on_blob_reads branch from c017110 to 3400e87 July 8, 2024 13:53

Base automatically changed from 06-25-read_blob_system_api to main July 8, 2024 14:50

andresilva91 force-pushed the 07-04-update_lagging_validators_on_blob_reads branch 4 times, most recently from 9e103fd to 265a1a4 July 10, 2024 13:44

andresilva91 marked this pull request as ready for review July 10, 2024 14:02

graphite-app bot requested review from afck, jvff, christos-h, ma2bd, MathieuDutSik and Twey July 10, 2024 14:02

andresilva91 force-pushed the 07-04-update_lagging_validators_on_blob_reads branch from 265a1a4 to 7983873 July 10, 2024 14:24

andresilva91 mentioned this pull request Jul 10, 2024 in Make BlobId into enum #2231 (draft)

afck reviewed Jul 11, 2024

linera-chain/src/data_types.rs (two outdated, resolved threads)

afck reviewed Jul 11, 2024

Commits: ebe0fca (Update lagging validators on blob reads), 8fba8a5 (Fix test)

andresilva91 force-pushed the 07-04-update_lagging_validators_on_blob_reads branch from 7983873 to 8fba8a5 July 12, 2024 18:17
