derivation: remove `EngineQueue` #10643

protolambda · 2024-05-24T12:50:03Z

Description

Removal of the engine-queue.

This decouples the execution-engine from the L2 derivation-pipeline, which now only produces L2 block attributes.

The "sync deriver" (the syncStep function in earlier PRs) encapsulates the synchronous sequence of sub-derivers, so the old behavior is preserved.

The DerivationPipeline no longer knows about anything related to the EngineController, which makes it possible to schedule it asynchronously to the other derivation work.
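The idea of a "synchronous sequence of sub-derivers" can be sketched roughly as follows. This is a minimal illustration, not the op-node code: the `subDeriver` interface, the `stage` type, the stage names, and the convention that `io.EOF` means "nothing to do" are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// subDeriver is a hypothetical interface for one unit of derivation work.
// Assumed convention: Proceed returns io.EOF when it has nothing to do.
type subDeriver interface {
	Proceed() error
}

// syncStep tries each sub-deriver in priority order and yields as soon as
// one of them did work (nil) or failed (a non-EOF error).
func syncStep(derivers ...subDeriver) error {
	for _, d := range derivers {
		if err := d.Proceed(); !errors.Is(err, io.EOF) {
			return err
		}
	}
	return io.EOF // every sub-deriver was idle
}

// stage is a toy sub-deriver with a fixed amount of pending work.
type stage struct {
	name    string
	pending int
	log     *[]string
}

func (s *stage) Proceed() error {
	if s.pending == 0 {
		return io.EOF
	}
	s.pending--
	*s.log = append(*s.log, s.name)
	return nil
}

// trace runs syncStep until all stages are idle and returns the order in
// which the (hypothetical) stages did their work.
func trace() []string {
	var log []string
	payloads := &stage{"unsafe-payloads", 1, &log}
	attrs := &stage{"safe-attributes", 2, &log}
	pipe := &stage{"derivation-pipeline", 1, &log}
	for {
		if err := syncStep(payloads, attrs, pipe); err != nil {
			break
		}
	}
	return log
}

func main() {
	fmt.Println(trace())
}
```

Because the pipeline stage in this sketch never touches the other stages, it could in principle be driven from a separate scheduler, which is the property the description above is after.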

Builds on top of #10642

Tests

Minor test changes:

Fixed sequencer action test: since derivation traverses L1 to find when L1 reorgs, it needs to traverse the old chain before it can report a reorg. If the reorg is detected upon a sequencing action, then the derivation will be reset, but not applied to the engine, until the derivation itself runs.

Additional context

Completes phase 1a of op-node derivers design-doc

Metadata

Fix https://github.com/ethereum-optimism/protocol-quest/issues/272

op-node/rollup/driver/state.go

op-node/rollup/derive/pipeline.go

semgrep-app · 2024-05-24T12:54:41Z

Semgrep found 2 golang_fmt_errorf_no_params findings:

op-chain-ops/upgrades/l1.go
- L704 - Triage
- L707 - Triage

No fmt.Errorf invocations without fmt arguments allowed
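For readers unfamiliar with this rule, here is a hedged illustration of what it usually asks for. The function and messages are hypothetical and are not taken from op-chain-ops/upgrades/l1.go: a constant message goes through `errors.New`, and `fmt.Errorf` is reserved for messages that actually have format arguments.

```go
package main

import (
	"errors"
	"fmt"
)

// A constant message has no format arguments, so errors.New is enough.
// (fmt.Errorf("deployment not found") is what the rule would flag.)
var errNoDeployment = errors.New("deployment not found")

// lookup is a hypothetical helper; fmt.Errorf is appropriate here because
// the message is built from arguments and wraps the sentinel error.
func lookup(name string, known map[string]bool) error {
	if !known[name] {
		return fmt.Errorf("lookup %q: %w", name, errNoDeployment)
	}
	return nil
}

func main() {
	known := map[string]bool{"known": true}
	fmt.Println(lookup("known", known))
	fmt.Println(lookup("missing", known))
}
```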

Semgrep found 2 invalid-usage-of-modified-variable findings:

op-conductor/conductor/service.go
- L668-678 - Triage
- L668-678 - Triage

Variable unsafeInNode is likely modified and later used on error. In some cases this could result in panics due to a nil dereference

Semgrep found 1 todos_require_linear finding:

packages/core-utils/src/common/bn.ts
- L17 - Triage

Please create a GitHub ticket for this TODO.

op-node/rollup/derive/pipeline.go

protolambda · 2024-05-30T15:31:01Z

Might need a rebase to fix the finalize-test flake

protolambda · 2024-05-30T15:37:25Z

Trying to debug what is wrong with op-program-compat here. I did have to make some changes to fit in the separation of ResetEngine from derivation. Regular tests pass, but something is still off about the FP program tests.

protolambda · 2024-05-30T19:04:37Z

Rebased on the rebased base

ajsutton · 2024-05-31T01:24:27Z

The op-program-compat test is failing because it's now reading a preimage that it didn't read before. We can regenerate that test if needed and it will grab the extra preimages required, but it's slightly suspicious that the behaviour changed. It is in the L2 state trie reads though, and we have seen that hash map order can sometimes cause an extra read there in the case of a deletion (not an issue in cannon, but it can be when running natively). It's odd that we haven't hit it for that test before though, given how much it's run since it went in. @Inphi any chance you can look at the failing test and quickly tell whether it is this random difference or not?

It could just be a new read op-node is doing, which isn't an issue though (fault proofs use a fixed version of op-program, so repeatability isn't a concern as long as this new behaviour is deterministic).
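As background on the "hash map order" concern: Go deliberately randomizes map iteration order, so any code that does I/O while ranging over a map can touch data in a different order from run to run. The sketch below only shows the general language behaviour and the usual remedy (sorting the keys first). It is not a claim about how op-geth or op-program handle this, and the map contents are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of m in a fixed (lexicographic) order, so that
// work driven by the map happens in the same sequence on every run.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Hypothetical set of pending deletions keyed by a short address prefix.
	deletions := map[string]int{"0xc2": 3, "0x9b": 1, "0xaa": 2}

	// Ranging over the map directly would visit entries in an unspecified
	// order; ranging over the sorted keys is repeatable.
	for _, k := range sortedKeys(deletions) {
		fmt.Println(k, deletions[k])
	}
}
```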

ajsutton · 2024-05-31T01:38:11Z

Ok I'm hitting that test failure locally as well with the verify-mainnet-genesis case (cd op-program && make verify-mainnet-genesis). That test runs with --l1.head "0x4903424f6cc2cfba7c2bf8c8f48ca46721c963fa64b411cfee3697b781e3e5f1" --l2.start "105235063" --l2.end "105235064" which is the very first bedrock block. So I'm guessing we do have an additional read in that specific case now.

ajsutton · 2024-05-31T04:04:02Z

Code looks sane, though I need to dig into it better to follow the actual flow.

I've captured fresh test data for the mainnet-genesis case - on this branch it winds up with 167758 preimages and on develop it has 167728. It quite consistently fails with the existing data and passes with the new data. So it is requesting something new, but it's not going crazy requesting a ton of extra data which is good.

Looking at the difference in logs I think it's not stopping derivation when it should and winds up importing a bunch more blocks. The old log stops at:

Derivation complete: reached L2 block" head=0x1e90d8496358cbe4998efcd3657d243bcdfdcf0f13127afa146210b9ef7576af:105235064

but the new one stops at:

Derivation complete: reached L2 block" head=0x02609db31f70efb1b906cc572d23ac9ef79f8a5cb8e3baff11b824720b55ebd5:105237751

And it's while importing one of those additional blocks that it winds up needing the extra L2 state nodes. It ultimately works because op-program will get the output root at the claimed block even if derivation went past there (required for span batches), but we are doing more work than we need to, which I'd like to understand.

I've put the old (from develop) and new (from this branch) logs in a gist. I'd guess though that we're now either stepping through a full L1 block in each "Step" call the op-program driver makes, or not updating the safe head until the L1 block has been fully processed, which would have the same result. The last "Advancing bq origin" message we get in the new logs is indeed before we import any blocks, which supports the argument that we're applying all batches from the L1 block in a single "step".

Inphi · 2024-05-31T05:05:15Z

I'll take a closer look later. But from glancing at the stacktrace, I see that the missing trie node was fetched during deletion:

github.com/ethereum/go-ethereum/core/state.(*StateDB).getDeletedStateObject(0xc0111dcf00, {0x9b, 0xbf, 0xb9, 0x91, 0x90, 0x62, 0xc2, 0x9a, 0x5e, ...})
        /go/pkg/mod/github.com/ethereum-optimism/op-geth@v1.101315.1/core/state/statedb.go:595 +0x39a
github.com/ethereum/go-ethereum/core/state.(*StateDB).getStateObject(...)
        /go/pkg/mod/github.com/ethereum-optimism/op-geth@v1.101315.1/core/state/statedb.go:550
github.com/ethereum/go-ethereum/core/state.(*StateDB).GetNonce(0x5?, {0x9b, 0xbf, 0xb9, 0x91, 0x90, 0x62, 0xc2, 0x9a, 0x5e, ...})
        /go/pkg/mod/github.com/ethereum-optimism/op-geth@v1.101315.1/core/state/statedb.go:293 +0x26
github.com/ethereum/go-ethereum/core.(*StateTransition).preCheck(0xc0113ee630)
        /go/pkg/mod/github.com/ethereum-optimism/op-geth@v1.101315.1/core/state_transition.go:314 +0x90
github.com/ethereum/go-ethereum/core.(*StateTransition).innerTransitionDb(0xc0113ee630)
        /go/pkg/mod/github.com/ethereum-optimism/op-geth@v1.101315.1/core/state_transition.go:449 +0x51
github.com/ethereum/go-ethereum/core.(*StateTransition).TransitionDb(0xc0113ee630)
        /go/pkg/mod/github.com/ethereum-optimism/op-geth@v1.101315.1/core/state_transition.go:412 +0xe5
github.com/ethereum/go-ethereum/core.ApplyMessage(0x9ac2629091b9bf9b?, 0xc3d7c993cd5ae15e?, 0xaa314db1?)
        /go/pkg/mod/github.com/ethereum-optimism/op-geth@v1.101315.1/core/state_transition.go:193 +0x57
github.com/ethereum/go-ethereum/core.applyTransaction(0xc01104f450, 0xc0001ce300, 0x11000000?, 0xc0111dcf00, 0xc010545280, {0xc2, 0x94, 0x72, 0xc9, 0xa8, ...}, ...)

It seems similar to the Go map iteration order issue.

ajsutton · 2024-06-01T00:01:22Z

Thanks @Inphi. I dug a bit deeper after my initial comment, and it is because op-program is now processing more blocks than before: these state accesses happen because one of those blocks contains a transaction that accesses them. So thankfully not the iteration order issue.

ajsutton

This looks good to me - we just need to put in a fix for op-program not stopping at the target block. I'd suggest just having the if head.number >= d.targetBlockNum check from the op-program driver.go run only if there isn't an error from the pipeline, rather than on NotEnoughData. That actually picks up the new block slightly earlier even in the old code and avoids depending on implementation details - typically a nil return value means the Step made some progress, so we should check if we've reached the target block.
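A rough sketch of that suggestion, under stated assumptions: the `pipeline` interface, `ErrNotEnoughData`, `fakePipeline`, and the `driver` type below are simplified stand-ins invented for the example, not the real op-program types. The point is only where the target check sits: after a nil result from the pipeline, not in the not-enough-data branch.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// ErrNotEnoughData stands in for the pipeline's "call me again" signal.
var ErrNotEnoughData = errors.New("not enough data")

// pipeline is a hypothetical, minimal view of the derivation pipeline.
type pipeline interface {
	Step() error
	SafeHead() uint64
}

type driver struct {
	p              pipeline
	targetBlockNum uint64
}

// step returns io.EOF once derivation should stop.
func (d *driver) step() error {
	err := d.p.Step()
	switch {
	case errors.Is(err, io.EOF):
		return io.EOF // L1 data exhausted
	case errors.Is(err, ErrNotEnoughData):
		return nil // no progress yet, keep stepping
	case err != nil:
		return fmt.Errorf("pipeline step failed: %w", err)
	}
	// nil error: the step made progress, so check the target right away.
	if d.p.SafeHead() >= d.targetBlockNum {
		return io.EOF
	}
	return nil
}

// fakePipeline advances the head by one block on three steps out of four
// and reports ErrNotEnoughData on every fourth step.
type fakePipeline struct {
	head  uint64
	calls int
}

func (f *fakePipeline) Step() error {
	f.calls++
	if f.calls%4 == 0 {
		return ErrNotEnoughData
	}
	f.head++
	return nil
}

func (f *fakePipeline) SafeHead() uint64 { return f.head }

// derive steps the driver until it stops and returns the final head.
func derive(start, target uint64) uint64 {
	p := &fakePipeline{head: start}
	d := &driver{p: p, targetBlockNum: target}
	for {
		if err := d.step(); err != nil {
			break
		}
	}
	return p.head
}

func main() {
	fmt.Println(derive(105235063, 105235064)) // prints 105235064
}
```

With this fake pipeline, checking the target only in the `ErrNotEnoughData` branch would let the head run a few blocks past the target before stopping, which mirrors the overshoot described in the logs above.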

op-e2e/actions/l2_verifier.go

op-program/client/driver/driver.go

codecov · 2024-06-04T14:10:22Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.92%. Comparing base (70912c0) to head (7f93249).

❗ Current head 7f93249 differs from pull request most recent head aadb641

Please upload reports for the commit aadb641 to get more accurate results.

Additional details and impacted files

@@             Coverage Diff              @@
##           develop   #10643       +/-   ##
============================================
+ Coverage    54.65%   81.92%   +27.27%     
============================================
  Files           37       10       -27     
  Lines         2944     1079     -1865     
  Branches       415        0      -415     
============================================
- Hits          1609      884      -725     
+ Misses        1303      163     -1140     
  Partials        32       32

Flag	Coverage Δ
cannon-go-tests	`81.92% <ø> (+2.31%)`	⬆️
chain-mon-tests	`?`
sdk-tests	`?`

see 30 files with indirect coverage changes

protolambda · 2024-06-04T14:10:46Z

Rebased on develop. Reviewing the op-program issue / comments above now.

semgrep-app · 2024-06-04T17:44:08Z

Semgrep found 5 sol-style-return-arg-fmt findings:

packages/contracts-bedrock/scripts/PeripheryDeployConfig.s.sol
- L95 - Triage
- L187 - Triage
- L191 - Triage
packages/contracts-bedrock/scripts/DeployPeriphery.s.sol
- L537 - Triage
- L546 - Triage

Named return arguments to functions must be appended with an underscore (_)

protolambda · 2024-06-04T19:54:03Z

Rebased on develop, trying to deal with Fjord flakes. At least the op-program tests seem to be passing.

ajsutton

LGTM.

* op-node: remove engine queue
  (squashed) remove debug line
* op-node: test VerifyNewL1Origin
* op-node: engine-queue removal review fixes

protolambda self-assigned this May 24, 2024

semgrep-app bot reviewed May 24, 2024

op-node/rollup/driver/state.go (outdated, resolved)

semgrep-app bot reviewed May 24, 2024

op-node/rollup/derive/pipeline.go (outdated, resolved)

protolambda force-pushed the no-engine-queue branch from 232d425 to 3517849 Compare May 29, 2024 20:29

semgrep-app bot reviewed May 29, 2024

op-node/rollup/derive/pipeline.go (outdated, resolved)

protolambda force-pushed the no-engine-queue branch 2 times, most recently from 0bbb879 to 82b6f03 Compare May 30, 2024 15:10

protolambda changed the title ~~derivation: remove EngineQueue (work in progress)~~ derivation: remove EngineQueue May 30, 2024

protolambda force-pushed the no-engine-queue branch from 82b6f03 to ead738a Compare May 30, 2024 15:12

protolambda marked this pull request as ready for review May 30, 2024 15:31

protolambda requested review from a team and ajsutton as code owners May 30, 2024 15:31

protolambda force-pushed the attributes-handler branch from 9059f6d to fbe198b Compare May 30, 2024 19:01

protolambda force-pushed the no-engine-queue branch from 7f93249 to 103bd56 Compare May 30, 2024 19:04

ajsutton reviewed Jun 2, 2024

op-e2e/actions/l2_verifier.go (outdated, resolved)

op-program/client/driver/driver.go (outdated, resolved)

op-program/client/driver/driver.go (resolved)

protolambda mentioned this pull request Jun 3, 2024

op-node: extract unsafe-block processing from derivation code-path #10599

Merged

protolambda force-pushed the attributes-handler branch 2 times, most recently from 3f5004d to 3bc8330 Compare June 4, 2024 12:49

Base automatically changed from attributes-handler to develop June 4, 2024 13:29

protolambda force-pushed the no-engine-queue branch from 103bd56 to eae42d1 Compare June 4, 2024 14:09

protolambda added 3 commits June 4, 2024 21:53

op-node: remove engine queue

07a8675

(squashed) remove debug line

op-node: test VerifyNewL1Origin

934459c

op-node: engine-queue removal review fixes

aadb641

protolambda force-pushed the no-engine-queue branch from a4cf28b to aadb641 Compare June 4, 2024 19:53

protolambda requested a review from ajsutton June 4, 2024 21:36

ajsutton approved these changes Jun 4, 2024

protolambda added this pull request to the merge queue Jun 5, 2024

Merged via the queue into develop with commit 4a91c9a Jun 5, 2024
64 checks passed

protolambda deleted the no-engine-queue branch June 5, 2024 13:50