Tracking: Staged sync #40

Closed
14 of 20 tasks
onbjerg opened this issue Oct 11, 2022 · 9 comments
Assignees
Labels
A-staged-sync Related to staged sync (pipelines and stages) C-tracking-issue An issue that collects information about a broad development initiative

Comments

@onbjerg
Member

onbjerg commented Oct 11, 2022

Stage abstraction

This abstraction should be mostly done, pending changes related to how the database abstractions evolve - e.g. instead of taking a raw MDBX transaction, we will likely receive another type in the future.
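As a rough illustration of the shape described above, here is a minimal sketch of a stage trait that is generic over the transaction type, so the raw MDBX transaction could later be swapped for another type. All names here (`Stage`, `ExecInput`, `ExecOutput`, `UnwindInput`, `MemTx`, `ToyStage`) are assumptions made for the example, not the project's actual API, and `MemTx` is only an in-memory stand-in for a database transaction.

```rust
use std::collections::BTreeMap;

/// Hypothetical alias; the real type may differ.
pub type BlockNumber = u64;

/// Hypothetical input to `execute`: how far the stage may go and where it left off.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ExecInput {
    pub target: BlockNumber,
    pub progress: BlockNumber,
}

/// Hypothetical output of `execute`: the new progress and whether the target was reached.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ExecOutput {
    pub progress: BlockNumber,
    pub done: bool,
}

/// Hypothetical input to `unwind`: the block to roll back to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UnwindInput {
    pub unwind_to: BlockNumber,
}

/// A stage generic over the transaction type `Tx` (assumption: the concrete
/// type is not fixed yet, so the trait does not name MDBX directly).
pub trait Stage<Tx> {
    fn id(&self) -> &'static str;
    fn execute(&mut self, tx: &mut Tx, input: ExecInput) -> Result<ExecOutput, String>;
    fn unwind(&mut self, tx: &mut Tx, input: UnwindInput) -> Result<BlockNumber, String>;
}

/// In-memory stand-in for a database transaction (illustration only).
#[derive(Default)]
pub struct MemTx {
    pub rows: BTreeMap<BlockNumber, String>,
}

/// Toy stage that writes one row per block, at most `batch` blocks per call.
pub struct ToyStage {
    pub batch: u64,
}

impl Stage<MemTx> for ToyStage {
    fn id(&self) -> &'static str {
        "Toy"
    }

    fn execute(&mut self, tx: &mut MemTx, input: ExecInput) -> Result<ExecOutput, String> {
        // Process a bounded batch so the pipeline regains control between calls.
        let end = input.target.min(input.progress + self.batch);
        for n in (input.progress + 1)..=end {
            tx.rows.insert(n, format!("block {n}"));
        }
        Ok(ExecOutput { progress: end, done: end == input.target })
    }

    fn unwind(&mut self, tx: &mut MemTx, input: UnwindInput) -> Result<BlockNumber, String> {
        // Drop every row above the unwind point.
        tx.rows.retain(|n, _| *n <= input.unwind_to);
        Ok(input.unwind_to)
    }
}
```

A pipeline would call `execute` repeatedly until `done` is true, and call `unwind` on a reorg or a failed validation.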

Pipeline

  • Better unwind priorities (@onbjerg): The current unwind priority system is based on Akula's method, but it can and should be simplified to prevent footgunning
  • Error and skip events (@onbjerg): The pipeline emits events that are currently only used for testing, but may be useful later on for metrics or other things. In some cases Ran and Unwound events are emitted with "special" values that denote that a stage either failed or was skipped. We should just add events for these cases
  • Commit intervals (@onbjerg): Currently data is committed to the database every time a stage returns from Stage::execute, but realistically this behavior should be tuneable to only commit meaningful progress
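
The last two items can be sketched roughly as follows: dedicated `Skipped` and `Error` events instead of `Ran`/`Unwound` carrying special values, and a tuneable policy that decides when progress is worth committing. Every name here (`PipelineEvent`, `StageOutcome`, `event_for`, `CommitPolicy`, `min_blocks`) is hypothetical, and the threshold rule is only one possible definition of "meaningful progress".

```rust
/// Hypothetical pipeline events with explicit variants for skips and failures.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PipelineEvent {
    Ran { stage: &'static str, progress: u64, done: bool },
    Unwound { stage: &'static str, progress: u64 },
    Skipped { stage: &'static str },
    Error { stage: &'static str, reason: String },
}

/// Hypothetical outcome of a single `Stage::execute` call.
pub enum StageOutcome {
    Progress { progress: u64, done: bool },
    Skipped,
    Failed(String),
}

/// Maps an outcome to exactly one event, so listeners never decode sentinel values.
pub fn event_for(stage: &'static str, outcome: StageOutcome) -> PipelineEvent {
    match outcome {
        StageOutcome::Progress { progress, done } => PipelineEvent::Ran { stage, progress, done },
        StageOutcome::Skipped => PipelineEvent::Skipped { stage },
        StageOutcome::Failed(reason) => PipelineEvent::Error { stage, reason },
    }
}

/// Hypothetical commit policy: commit once at least `min_blocks` of progress
/// are uncommitted, or when the stage is done and has anything uncommitted.
pub struct CommitPolicy {
    pub min_blocks: u64,
}

impl CommitPolicy {
    pub fn should_commit(&self, last_committed: u64, progress: u64, done: bool) -> bool {
        let uncommitted = progress.saturating_sub(last_committed);
        uncommitted > 0 && (done || uncommitted >= self.min_blocks)
    }
}
```

With `min_blocks: 100`, a stage that returns after 50 blocks would not trigger a commit unless it also reports that it is done.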

Tooling

  • Benchmarking helpers: We want to benchmark stages, so we will probably end up needing some utilities to make that easier
  • Profiling: We want insight into what the stages are doing to find paths to optimize. Currently we use tracing to mark out spans and emit events - we might be able to leverage this info in conjunction with e.g. tracing_tracy to be able to use Tracy. However, there may be tools that are better suited for profiling in our case.
  • Metrics: While not only a thing for staged sync (we need them in general), tools to expose metrics should be provided as well.
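
For the benchmarking helpers, a minimal sketch using only the standard library might look like the following. `bench` and `BenchReport` are hypothetical names; a real setup would more likely build on an existing benchmarking harness, and this only shows the idea of keeping per-run setup (for example, populating a test database) out of the measured time.

```rust
use std::time::{Duration, Instant};

/// Hypothetical summary of a benchmark run.
pub struct BenchReport {
    pub iterations: u32,
    pub total: Duration,
}

impl BenchReport {
    /// Mean time per iteration (returns zero if no iterations ran).
    pub fn mean(&self) -> Duration {
        self.total / self.iterations.max(1)
    }
}

/// Runs `run` `iterations` times, building fresh state with `setup` before
/// each run. Only the time spent inside `run` is measured.
pub fn bench<S, F: FnMut(&mut S)>(
    iterations: u32,
    mut setup: impl FnMut() -> S,
    mut run: F,
) -> BenchReport {
    let mut total = Duration::ZERO;
    for _ in 0..iterations {
        let mut state = setup();
        let start = Instant::now();
        run(&mut state);
        total += start.elapsed();
    }
    BenchReport { iterations, total }
}
```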

Stages

Initially we will use the good learnings from Akula, which is based on good learnings from Silkworm and Erigon, and essentially delineate the stages around the same boundaries as they have. As we progress, we might need more stages than listed here (or fewer).

For the more complex stages I propose we create separate tracking issues that link back to this one.

  • HeaderDownload (@rkrasiuk): Downloads headers over P2P
  • TotalGasIndex [1]: Builds an index of BlockNumber -> TotalGas. Seems to mostly be used for reporting.
  • BlockHashes [1]: Builds an index of BlockHash -> BlockNumber from the BlockNumber -> BlockHash table built in the HeaderDownload stage
  • BodyDownload: Downloads block bodies and saves a minimal structure containing ommers, the first transaction ID in the block and the number of transactions. Also builds a table of TxId -> Tx.
  • TotalTxIndex [1]: Builds an index of BlockNumber -> TotalTx. Seems to only be used for reporting in the next stage.
  • SenderRecovery: Recovers sender addresses in each transaction
  • Execution: Executes blocks
  • HashState: Hashes accounts and account storage
  • Interhashes: Builds trie hashes
  • AccountHistoryIndex [1]: Builds indexes related to account histories/changesets
  • StorageHistoryIndex [1]: Builds indexes related to storage histories/changesets
  • TxLookup [1]: Builds an index of TxHash -> BlockNumber, used in the RPC to look up transactions by hash.
  • CallTraceIndex [1]: Builds indexes that specify where an account has been the origin or destination of a message
  • Finish: Sets the chain tip (used in the RPC to figure out what our latest synced block is)
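
To make the BodyDownload description more concrete, here is a sketch of the minimal per-block structure and the TxId -> Tx table it mentions. `StoredBlockBody`, `store_body` and the byte-vector stand-ins for ommers and transactions are assumptions for illustration; the actual encodings and table layout are not specified above.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

/// Hypothetical sequential transaction ID.
pub type TxId = u64;

/// Minimal per-block record: ommers, the first transaction ID in the block,
/// and the number of transactions. Ommers are opaque bytes in this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StoredBlockBody {
    pub ommers: Vec<Vec<u8>>,
    pub first_tx_id: TxId,
    pub tx_count: u64,
}

impl StoredBlockBody {
    /// IDs of the block's transactions in the TxId -> Tx table.
    pub fn tx_ids(&self) -> Range<TxId> {
        self.first_tx_id..self.first_tx_id + self.tx_count
    }
}

/// Appends a block's transactions to the TxId -> Tx table and returns the
/// minimal record to store for the block itself.
pub fn store_body(
    next_tx_id: &mut TxId,
    tx_table: &mut BTreeMap<TxId, Vec<u8>>,
    ommers: Vec<Vec<u8>>,
    transactions: Vec<Vec<u8>>,
) -> StoredBlockBody {
    let first_tx_id = *next_tx_id;
    let tx_count = transactions.len() as u64;
    for tx in transactions {
        tx_table.insert(*next_tx_id, tx);
        *next_tx_id += 1;
    }
    StoredBlockBody { ommers, first_tx_id, tx_count }
}
```

Storing only `first_tx_id` and `tx_count` keeps the block record small, while the transactions themselves live once in the shared table.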

Footnotes

  1. These stages are generally what I would categorize as indexes, which we may be able to generalize somewhat.
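
One way the index stages might be generalized, sketched under the assumption that several of them (BlockHashes, TxLookup) simply invert an existing table over a block range: `build_inverse_index` and `unwind_inverse_index` are hypothetical helpers, and `BTreeMap` stands in for database tables.

```rust
use std::collections::BTreeMap;

/// Inverts `source` (K -> V) into `index` (V -> K) for keys in `from..=to`.
/// Returns the number of entries written. For BlockHashes, K would be the
/// block number and V the block hash.
pub fn build_inverse_index<K: Ord + Copy, V: Ord + Clone>(
    source: &BTreeMap<K, V>,
    from: K,
    to: K,
    index: &mut BTreeMap<V, K>,
) -> usize {
    if from > to {
        // Nothing to do; also avoids the panic `range` raises on inverted bounds.
        return 0;
    }
    let mut written = 0;
    for (key, value) in source.range(from..=to) {
        index.insert(value.clone(), *key);
        written += 1;
    }
    written
}

/// Removes every index entry that points above `unwind_to`.
pub fn unwind_inverse_index<K: Ord + Copy, V: Ord>(index: &mut BTreeMap<V, K>, unwind_to: K) {
    index.retain(|_, key| *key <= unwind_to);
}
```

The unwind here scans the whole index, which is fine for a sketch; a real stage would more likely walk the source table for the unwound range and delete the matching keys.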

@onbjerg onbjerg assigned onbjerg and rkrasiuk and unassigned onbjerg and rkrasiuk Oct 11, 2022
@onbjerg onbjerg added C-tracking-issue An issue that collects information about a broad development initiative A-staged-sync Related to staged sync (pipelines and stages) labels Oct 11, 2022
@gakonst
Member

gakonst commented Oct 12, 2022

Great breakdown, agree on all. @rkrasiuk can you pls open an issue for Headers stage? And let's open them one by one as we pursue them. @rakita opened #39 which is relevant to general eth testing of stages, Dragan do you want to open another specific one for how you're going to be approaching the Executor + Execution Stage?

@gakonst
Member

gakonst commented Oct 14, 2022

@onbjerg Noticing there's a bunch of stages in erigon not present here, WDYT? e.g. https://github.com/ledgerwatch/erigon/tree/devel/eth/stagedsync#stage-15-transaction-pool-stage

@onbjerg
Member Author

onbjerg commented Oct 14, 2022

It seems we're only missing 2? Transpilation stage, which we can't have because we don't have anything like TEVM, and the txpool stage, but we talked about having block building be a separate part since it's a bit more involved (might be custom, flashbots etc etc) so I don't think having that stage makes sense for us. Instead the block building part will just push down the block elsewhere through the pipeline

@rakita
Collaborator

rakita commented Oct 14, 2022

I added an additional tracking issue and reworded the existing one:

  • Tracking: Eth chain tests #39: I would need mocks of the databases and p2p to pass through all stages. I am assuming that there will be some minor modifications to stages (e.g. simplifying the header stage), but I am not sure atm of their extent. But I like the idea of using chain tests to cover all stages.
  • Tracking: Execution/Validation of blocks #72: It is good to have execution and validation in one place. And there are additional functionalities that this module can provide (such as building blocks and executing transactions). Utilities/functionalities would be aligned with the needs of stages.

@onbjerg
Member Author

onbjerg commented Oct 14, 2022

I think in terms of #72 that would be in the consensus engine mostly, no? Or at least part of it @rakita

@rakita
Collaborator

rakita commented Oct 14, 2022

@onbjerg there are things that are common to all consensus types, so those can live in reth-executor. I am not sure atm whether consensus is going to call execution for additional verification or execution is going to call consensus; we can see this later.

@onbjerg
Member Author

onbjerg commented Oct 18, 2022

@rakita My point is - should these commonalities not be in a consensus crate (or a consensus-traits crate) instead of the execution crate? From what I've seen from e.g. Akula and Erigon, the stage calls consensus and the VM itself does not

@rakita
Collaborator

rakita commented Oct 18, 2022

I am not sure, to be honest. Consensus should contain only the different consensus engines. By common things I mean block building, roots, execution etc.; I would like to separate them into a standalone crate to have them in one place.

I see your point. On the stage side, to not complicate things, maybe it is best to just use one trait, Consensus, and put any function that stages would need there; Consensus can just use whatever it needs internally.
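
A rough sketch of that last idea, where stages depend only on a single `Consensus` trait and the engine behind it decides what to use internally. The `Header` fields, `ConsensusError` variants, `ToyConsensus` and `validate_range` are all made up for illustration and are not taken from any existing crate.

```rust
/// Simplified header with just enough fields for the example (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Header {
    pub number: u64,
    pub parent_hash: u64,
    pub hash: u64,
    pub gas_used: u64,
    pub gas_limit: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConsensusError {
    NonSequentialNumber,
    ParentMismatch,
    GasUsedExceedsLimit,
}

/// The one trait stages would depend on; anything a stage needs goes here.
pub trait Consensus {
    fn validate_header(&self, header: &Header, parent: &Header) -> Result<(), ConsensusError>;
}

/// Toy engine performing only checks that would be common to all engines.
pub struct ToyConsensus;

impl Consensus for ToyConsensus {
    fn validate_header(&self, header: &Header, parent: &Header) -> Result<(), ConsensusError> {
        if header.number != parent.number + 1 {
            return Err(ConsensusError::NonSequentialNumber);
        }
        if header.parent_hash != parent.hash {
            return Err(ConsensusError::ParentMismatch);
        }
        if header.gas_used > header.gas_limit {
            return Err(ConsensusError::GasUsedExceedsLimit);
        }
        Ok(())
    }
}

/// Stage-side helper: validates each header against its predecessor and
/// reports the number of the first offending block.
pub fn validate_range<C: Consensus + ?Sized>(
    consensus: &C,
    headers: &[Header],
) -> Result<(), (u64, ConsensusError)> {
    for pair in headers.windows(2) {
        consensus
            .validate_header(&pair[1], &pair[0])
            .map_err(|err| (pair[1].number, err))?;
    }
    Ok(())
}
```

Because `validate_range` accepts `?Sized` implementors, a stage could hold a `&dyn Consensus` and stay agnostic of the concrete engine.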

@onbjerg onbjerg moved this to Todo in Reth Tracker Jan 4, 2023
@onbjerg onbjerg moved this from Todo to In Progress in Reth Tracker Jan 4, 2023
@onbjerg onbjerg moved this from In Progress to Tracking in Reth Tracker Jan 4, 2023
@onbjerg
Member Author

onbjerg commented Jan 23, 2023

Closing this as it is out of date - the stages have been shuffled around/merged/split etc. We need a few more stages, but those are handled in separate issues.

@onbjerg onbjerg closed this as completed Jan 23, 2023
@github-project-automation github-project-automation bot moved this from Tracking to Done in Reth Tracker Jan 23, 2023
anonymousGiga added a commit to anonymousGiga/reth that referenced this issue Feb 20, 2024
anonymousGiga added a commit to anonymousGiga/reth that referenced this issue Feb 20, 2024
yutianwu pushed a commit to yutianwu/reth that referenced this issue Jul 1, 2024
AshinGau added a commit to AshinGau/reth that referenced this issue Oct 13, 2024
AshinGau added a commit to AshinGau/reth that referenced this issue Oct 13, 2024

4 participants