Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: 1 body per request vs multiple bodies per request #226

Closed
onbjerg opened this issue Nov 18, 2022 · 5 comments
Closed

Discussion: 1 body per request vs multiple bodies per request #226

onbjerg opened this issue Nov 18, 2022 · 5 comments
Labels
A-devp2p Related to the Ethereum P2P protocol A-staged-sync Related to staged sync (pipelines and stages) C-discussion A discussion about the direction and design of the project

Comments

@onbjerg
Copy link
Member

onbjerg commented Nov 18, 2022

Creating this issue to unblock #220 and continue the discussion here.

The question is: What are the tradeoffs between requesting 1 body per request to a peer vs requesting multiple bodies per request to a peer?

Currently:

  • The body downloader requests 1 body per call to the P2P layer, but does multiple calls at the same time
  • This is done because the downloader returns bodies in order of block number, not the order in which requests are fulfilled
  • This means that the downloader assumes that the body returned is valid (but as soon as it is handed over to the stage, it is validated)

Alternatively:

  • We request multiple bodies per call to the P2P layer
  • There is no order guarantee on the bodies returned by the P2P layer, so we have to match ourselves
  • A way to match is to calculate the transactions root of the body, however, in order to match this root with a header, we need to store a mapping of TxRoot => HeaderHash, and the downloader would need a RO transaction to the database (which is not the case currently)

In other words: Is the added complexity (extra database calls, more computation) worth the upside? What is the upside?

Worth noting:

  • If we compute the transaction root as soon as the request is fulfilled, we might compute the transaction root for a large number of bodies we later discard, if they are after an invalid body. This is not currently the case, since the transaction root is calculated in the order the blocks are handed over to the stage
  • The alternative requires a new table for only that part of sync (it is not used by e.g. RPC)
@onbjerg onbjerg added A-staged-sync Related to staged sync (pipelines and stages) A-devp2p Related to the Ethereum P2P protocol C-discussion A discussion about the direction and design of the project labels Nov 18, 2022
@Rjected
Copy link
Member

Rjected commented Nov 18, 2022

I am mainly worried about potential DoS vectors - a malicious node could send us a list of transactions that are not associated with any known header, we should not do anything with the bodies until we map it to a header. Overall though I think requesting multiple bodies per p2p message is alright.

The bodies stage already fetches headers for each body we request, so we can use the txroots from each header to create an in-memory map TxRoot => Header (or TxRoot||OmmersRoot => Header), instead of creating a new db table?

There are a few advantages to computing roots:

  • we can immediately validate peer responses
    • definition of the validity of a BlockBodies message: all transactions and ommers returned match a header whose hash is in our request
    • having headers beforehand makes this process very easy and mainly cpu-bound (just calculating transaction and ommers root)
    • we can kick malicious/faulty peers as soon as we know a response is invalid
  • we may not need to worry about ordering, since we can use the above map to determine if a body corresponds to a hash from our request

Does this make sense? cc @mattsse @rakita @rkrasiuk

@gakonst
Copy link
Member

gakonst commented Nov 19, 2022

As @Rjected suggested in Discord:

  1. Let's assume NOTHING about what peers respond to us with.
  2. Let's define the validity rules.
  3. Let's add tests for literally any kind of raw data being fed back to us.

@rakita
Copy link
Collaborator

rakita commented Nov 19, 2022

I want to move the discussion, from single or multi bodies request (as it does not matter a lot) to where to check transaction roots is it in the body stage or later?

multi bodies request is probably more optimal on network bandwidth as fewer requests are sent/received but it does not matter and can be switched later if we find it is a big trouble.

In both cases, we "trust" the peer f that they are delivering the requested body.
@onbjerg how does the unwind happen body stage downloads all bodies, sender recovery recovers all signature, but this is checked in execution and f it fails you would unwind all recovered transaction until that block, even good ones?

and it is more safer to ask block body by its hash,

Other than that complexity, you will not have information about peer that was malicious/faulty to disconnect him.

@gakonst
Copy link
Member

gakonst commented Nov 25, 2022

#247 will make this much easier

@onbjerg
Copy link
Member Author

onbjerg commented Jan 4, 2023

We are now doing multiple bodies per request

@onbjerg onbjerg closed this as completed Jan 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-devp2p Related to the Ethereum P2P protocol A-staged-sync Related to staged sync (pipelines and stages) C-discussion A discussion about the direction and design of the project
Projects
Archived in project
Development

No branches or pull requests

4 participants