Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eth/stagedsync: fixes for mining on devnet #8874

Closed
wants to merge 5 commits into from

Conversation

manav2401
Copy link
Contributor

@manav2401 manav2401 commented Dec 1, 2023

Due to the recent changes with the snapshots, there were some issues with mining on devnets. This PR attempts to fix that but also needs some more discussion before merging.

Issue 1
With the latest devel, when erigon tries to mine with bor consensus, the engine.Prepare call fails as it tries to get the 0th span required for it to proceed. On further investigation, looks like it is reaching our to eth/stagedsync/chain_reader.go:BorSpan for the span which simply panics. Ideally it should reach out to the chain reader in consensus which has the implementation of loading span from snapshots. I wanted to know why panic was implemented in this function in the first place? For now, I have added the implementation.

Issue 2
With this change, it no longer panics but now is unable to find span in snapshot. This is expected because in the mining stages, the MiningCreateBlock stage is before BorHeimdall and hence when we start from genesis, we don't have the span loaded. This PR moves the BorHeimdall stage to 1st stage.

Issue 3
On moving this stage to the top, we no longer have access to the header and hence we can't validate it against whitelisted milestones. For now, this PR comments that check but it needs some workaround. A simple suggestion might be to skip this check while mining and only rely on syncing validation as it anyways will be checked when the block is actually processed post mining through the same stage.

Moving this stage to top also requires some more small changes to handle the 0th span which are done in this PR.

Edit: Do not merge, this breaks a few more things, trying to test it out on a devnet.

CC @AskAlexSharov

return fmt.Errorf("attempting to mine %d, which is behind current head: %d", minedHeadNumber, headNumber)
}
}
// if mine {
Copy link
Contributor Author

@manav2401 manav2401 Dec 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need some discussion around this (more details in the PR description - see Issue 3).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can safely skip this checks as the block anyways goes through this stage (with mine = false) as if it's syncing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manav2401 bor heimdall stage is called with mine=true in the mining stage loop. There are 2 separate loops - syncing and mining. My understanding is that when we are in the mine mode we need to get the header of the mined block from the mining state using cfg.miningState.MiningBlock.Header. However now that the mine block stage has been moved after the bor heimdall stage that means the mined header will not be available? We need the header in order to fetch and persist the state sync events I believe. This makes me think that we would need 2 stages - 1 before the mine block stage to fetch and persist spans and 1 after the mine block to fetch the state sync events. What are your thoughts & is my understanding so far correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(note I believe currently we are not correctly handling the state sync events for the mine=true case as the mined header is not correctly passed to the fetchAndWriteBorEvents function)

Copy link
Contributor

@taratorio taratorio Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking about it more maybe splitting is not necessary since in iteration N the bor heimdall stage will process state sync events for the block created in iteration N-1 since the header for that block will be available

however if we need to persist state sync events then we may need to fix that bug I mentioned with passing the mined header to fetchAndWriteBorEvents - but this is something that 1) I'm not sure if is a requirement for the block producer (do you happen to know?) and 2) can be done in another PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @taratorio, you're correct that we need header for processing certain things in the bor heimdall stage. I was initially of the view that we'll anyways pass through the same stage when syncing (because when we mine block N, the data like header, bodies, exec results are cached and then it's again sent to all the sync stages where most of them just directly use the cached values and persists the data in DB and perform the remaining operations). That time we do have a header available and that should be sufficient to perform all the validations.

Re. you questions, I don't think we need the whole header looking at the fetchAndWriteBorEvents function. But, I am a bit confused because I didn't check that part of the code and looks like writing these things (not snapshot, but writing in contracts) should happen via bor consensus and not an external bor heimdall stage. At least, that's how it's done in bor. In this function, we could definitely persist the data (i.e. the state sync events) in snapshot and then bor consensus should be able to utilise it when required. Wdyt?

nextSpanId = snapshotLastSpanId + 1
}
var endSpanID uint64
if headNumber > zerothSpanEnd {
endSpanID = 2 + (headNumber-zerothSpanEnd)/spanLength
} else {
endSpanID = 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially, we can simply set this to 1 so that on the first block itself, it fetches both 0th and 1st span.

@@ -206,12 +202,14 @@ func BorHeimdallForward(
nextSpanId = binary.BigEndian.Uint64(k) + 1
}
snapshotLastSpanId := cfg.blockReader.(LastFrozen).LastFrozenSpanID()
if snapshotLastSpanId+1 > nextSpanId {
if s.BlockNumber != 0 && snapshotLastSpanId+1 > nextSpanId {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed because even though we don't have 0th span, the function LastFrozenSpanID will return 0 which makes this check assume that we have 0th span and need the 1st span. This is not the case when we start to mine from scratch in devnets.

@manav2401
Copy link
Contributor Author

Also, not sure why lint is failing. go mod tidy doesn't change anything in go.mod/go.sum at least locally.

@AskAlexSharov
Copy link
Collaborator

@manav2401 merge devel

@taratorio
Copy link
Contributor

taratorio commented Jan 9, 2024

this has been fixed as part of the refactor to split apart bor<->heimdall sync stage responsibilities and bor<->heimdall mining stage responsibilities #9149 - @manav2401 thank you for your help with this

@taratorio taratorio closed this Jan 9, 2024
@manav2401 manav2401 deleted the manav/mining_span_fix branch January 9, 2024 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants