This repository has been archived by the owner on Aug 13, 2021. It is now read-only.

[EPIC] Ethereum header chain over swarm #9

Open
6 tasks
pipermerriam opened this issue May 26, 2019 · 5 comments

Comments

@pipermerriam

pipermerriam commented May 26, 2019

Rationale

The header chain for the mainnet is needed by every Ethereum client. The data is effectively append-only and regularly accessed. This makes it a prime candidate for storage on the Swarm network.

Owner

@pipermerriam

Stakeholder Point of Contact

Description

  • a mechanism for header data to be stored in Swarm
  • a mechanism for swarm to learn about new headers
  • a mechanism for ethereum to retrieve headers by their native ethereum hash (not swarm chunk hash)

Storage

For swarm to store and serve headers, they need to behave like other chunks. This only requires an extra validator that uses Keccak-256 addressing (the original Keccak variant used by Ethereum, which differs from the standardized SHA3-256 in its padding).

For swarm to access header chunks, the localstore needs to be prepopulated. The following options are viable and non-exclusive:

  • manual import
  • Ethereum node pushes data to Swarm nodes/network
  • Swarm node pulls data from Ethereum node/network

The latter option is easiest if we subsume the pull mechanism (swarm nodes requesting data from ethereum clients) under the protocol with which swarm and eth clients communicate.

Communication

For ethereum nodes to retrieve this data, they will need a communication channel with swarm nodes. Obvious options include:

  • use the HTTP-based public swarm gateway
  • use the external JSON-RPC API exposed by a swarm node
  • use the DevP2P network to talk directly to swarm nodes (probably over a new sub-protocol like bzz-eth)

Candidate for DevP2P communication

One way for nodes to communicate would be over devp2p. This has the following benefits:

  • it does not require users to run both an Ethereum and a Swarm node to benefit from this functionality, since both nodes already connect to this network.
  • it allows for bidirectional communication, so eth nodes can inform swarm nodes about new block hashes.

The following commands define version 1 of a new sub protocol identified by the string bzz-eth.

This p2p protocol is somewhat special in that it is asymmetrical, i.e., the two peers do not send the same types of messages.
In particular the swarm nodes never send NewBlockHeaders, only receive them.

Handshake (0x00)

This MUST be the first message sent (under this protocol) after a p2p connection has been established.

[serves_headers: uint8|bool]

TODO: I removed the head from this since it seems like swarm nodes shouldn't be required to track the chain head, and the NewBlockHeaders message serves as a mechanism for eth nodes to broadcast this information. Consider adding an Announce message to serve as a more concrete way to update a peer about stateful protocol information (like chain head).

  • serves_headers: boolean indicating whether this node can be expected to serve requests for headers.

If later we find that swarm nodes do not always need new headers announced, a serve_new_headers: uint8|bool field could be added.
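A sketch of how a peer might encode the Handshake payload and enforce the first-message rule, assuming serves_headers travels as the RLP integer 0 or 1 inside a one-item list. `BzzEthSession` is a hypothetical name, and dropping a peer that repeats the Handshake is an assumption the proposal does not state.

```python
from typing import Optional

# Message codes from this proposal, before any devp2p sub-protocol offset is applied.
HANDSHAKE = 0x00
NEW_BLOCK_HEADERS = 0x01
GET_BLOCK_HEADERS = 0x02
BLOCK_HEADERS = 0x03


def encode_handshake(serves_headers: bool) -> bytes:
    # RLP of a one-item list: integer 1 encodes as the byte 0x01, 0 as the empty string 0x80.
    return bytes([0xC1, 0x01 if serves_headers else 0x80])


def decode_handshake(payload: bytes) -> bool:
    if len(payload) != 2 or payload[0] != 0xC1 or payload[1] not in (0x01, 0x80):
        raise ValueError("malformed Handshake payload")
    return payload[1] == 0x01


class BzzEthSession:
    """Hypothetical per-peer state enforcing 'Handshake must be the first message'."""

    def __init__(self) -> None:
        self.peer_serves_headers: Optional[bool] = None  # None until Handshake arrives

    def on_message(self, msg_code: int, payload: bytes) -> None:
        if self.peer_serves_headers is None:
            if msg_code != HANDSHAKE:
                raise ConnectionError("first bzz-eth message must be Handshake")
            self.peer_serves_headers = decode_handshake(payload)
        elif msg_code == HANDSHAKE:
            raise ConnectionError("Handshake sent twice")  # assumed policy

    def can_request_headers(self) -> bool:
        return self.peer_serves_headers is True
```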

NewBlockHeaders (0x01)

[[hash: B_32, number: uint256], ...]
  • hash: the block hash
  • number: the block number corresponding to the provided block hash.

Advertise headers that the connected peer may be interested in. For a given session with a peer, no block hash should be sent more than once (never re-advertise the same block hash).
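The once-per-session rule can be enforced with a per-peer set of already-advertised hashes. This is a hypothetical helper (`NewHeaderAnnouncer` is not an existing class), assuming one instance per connected peer that is discarded when the session ends:

```python
from typing import Iterable, List, Set, Tuple


class NewHeaderAnnouncer:
    """Hypothetical per-session filter so each block hash is advertised to a peer at most once."""

    def __init__(self) -> None:
        self._advertised: Set[bytes] = set()

    def to_advertise(self, headers: Iterable[Tuple[bytes, int]]) -> List[Tuple[bytes, int]]:
        """Return the (hash, number) pairs not yet sent in this session and mark them as sent."""
        fresh = []
        for block_hash, number in headers:
            if block_hash in self._advertised:
                continue  # never re-advertise the same block hash
            self._advertised.add(block_hash)
            fresh.append((block_hash, number))
        return fresh
```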

If later we find that swarm nodes do not always need new headers announced, a GetNewHeaders message could be introduced.

GetBlockHeaders (0x02)

[request_id: uint32, hashes: [hash_0: B_32, hash_1: B_32, ...]]
  • request_id: any 32-bit unsigned integer; it is echoed back in the matching BlockHeaders response.
  • hashes: array of 32-byte hashes.

Request a set of headers referenced by their ethereum hashes.
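For concreteness, here is a minimal RLP encoder and the resulting GetBlockHeaders payload, assuming the message body is the RLP list shown above. The function names are illustrative; a real client would use its existing RLP library. The message code itself is carried by the devp2p framing and is not part of this payload.

```python
def _length_prefix(length: int, offset: int) -> bytes:
    # Short form for payloads under 56 bytes; long form also encodes the length's byte count.
    if length < 56:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Minimal RLP encoder for non-negative ints, byte strings and (nested) lists."""
    if isinstance(item, int):
        if item < 0:
            raise ValueError("RLP cannot encode negative integers")
        # Big-endian with no leading zero bytes; 0 becomes the empty string.
        item = item.to_bytes((item.bit_length() + 7) // 8, "big")
    if isinstance(item, (bytes, bytearray)):
        if len(item) == 1 and item[0] < 0x80:
            return bytes(item)  # a single low byte is its own encoding
        return _length_prefix(len(item), 0x80) + bytes(item)
    if isinstance(item, list):
        payload = b"".join(rlp_encode(element) for element in item)
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError(f"cannot RLP-encode {type(item).__name__}")


def encode_get_block_headers(request_id: int, hashes: list) -> bytes:
    """RLP payload for GetBlockHeaders: [request_id: uint32, [hash_0, hash_1, ...]]."""
    if not 0 <= request_id < 2 ** 32:
        raise ValueError("request_id must fit in a uint32")
    if any(len(h) != 32 for h in hashes):
        raise ValueError("each hash must be 32 bytes")
    return rlp_encode([request_id, list(hashes)])
```

With two hashes the inner list is 66 bytes, so it switches to the long-form list prefix.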

BlockHeaders (0x03)

[request_id: uint32, headers: [header_0, header_1, ...]]
  • request_id: the request_id from the GetBlockHeaders message.
  • headers: array of RLP-encoded block headers.

Response to GetBlockHeaders. headers must contain only headers that were requested, each RLP encoded. No ordering is enforced on the response headers.

TODO: discuss semantics of multi-response. Maybe add nonce to response. Maybe enforce uniqueness across response headers.

Note that it is allowed to send several BlockHeaders responses to the same request. This way, the swarm node can send whatever it has as soon as it has it and serve the eth client with minimal latency.

Note that it is allowed to send a BlockHeaders message with an empty headers array. This indicates to the requesting eth client that the peer has no more headers available out of the requested batch. Although this cannot be enforced, it is prudent to send it so that the eth client can mark the request context as closed and fire alternative requests for the outstanding headers. This becomes more relevant once requests are no longer free, since it lets the client control the trade-off between cost and concurrency.
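The multi-response and empty-array semantics imply some bookkeeping on the eth client side. The sketch below is one possible shape for it, assuming headers are matched to requests by hashing each RLP-encoded header (Keccak-256 in practice, passed in as the hypothetical `header_hash` callback) and assuming duplicate headers are rejected, which the TODO above leaves open. `HeaderRequestTracker` is a hypothetical name.

```python
from typing import Callable, Dict, Iterable, List, Set


class _PendingRequest:
    def __init__(self, hashes: Iterable[bytes]) -> None:
        self.outstanding: Set[bytes] = set(hashes)
        self.closed = False


class HeaderRequestTracker:
    """Hypothetical eth-client bookkeeping for GetBlockHeaders / BlockHeaders exchanges."""

    def __init__(self, header_hash: Callable[[bytes], bytes]) -> None:
        self._header_hash = header_hash  # in practice: Keccak-256 of the RLP-encoded header
        self._next_id = 0
        self._requests: Dict[int, _PendingRequest] = {}

    def start(self, hashes: List[bytes]) -> int:
        request_id = self._next_id
        self._next_id = (self._next_id + 1) % 2 ** 32  # request_id is a uint32
        self._requests[request_id] = _PendingRequest(hashes)
        return request_id

    def on_block_headers(self, request_id: int, headers: List[bytes]) -> Dict[bytes, bytes]:
        request = self._requests[request_id]  # KeyError for an id we never issued
        if not headers:
            request.closed = True  # peer has nothing more for this batch
            return {}
        if request.closed:
            raise ValueError("headers received after the request was closed")
        hashed = [(self._header_hash(raw), raw) for raw in headers]
        keys = [key for key, _ in hashed]
        # Validate the whole message before mutating state (assumed uniqueness rule).
        if len(set(keys)) != len(keys) or not set(keys) <= request.outstanding:
            raise ValueError("unrequested or duplicate header")
        request.outstanding -= set(keys)
        if not request.outstanding:
            request.closed = True
        return dict(hashed)

    def missing(self, request_id: int) -> Set[bytes]:
        """Hashes still outstanding; once closed, these can be re-requested elsewhere."""
        return set(self._requests[request_id].outstanding)
```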

Context

Ethereum node implementation notes

It's worth noting that Ethereum clients that want to retrieve this data will need to learn about the latest headers from a separate mechanism such as other ETH peers, since it will not be possible to request headers by their block number. Once an ETH peer has a recent header that it trusts, it can use the parent_hash field to walk backwards to the genesis block. At a later stage of this work, swarm nodes will need to be able to do the same, see https://hackmd.io/oj9_cT2KQimMdIPe_W_ejQ#

It seems that a reasonable algorithm for syncing the header chain when connected to both a set of ETH peers and a set of BZZ peers would be to use the ETH peers to construct a "header skeleton" which is the header chain with large gaps, and then to use the bzz-eth peers to fill the gaps.
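The gap-filling half of that algorithm can be sketched as a backwards walk over parent_hash links, since bzz-eth peers can only be asked for headers by hash. The `Header` record and the `fetch` callback (standing in for a GetBlockHeaders round trip to bzz-eth peers) are hypothetical, and the sketch assumes fetched headers have already been hash-checked.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Header:
    number: int
    hash: bytes
    parent_hash: bytes


def fill_gap(lower: Header, upper: Header,
             fetch: Callable[[bytes], Optional[Header]]) -> List[Header]:
    """Return the headers strictly between two skeleton headers, oldest first."""
    filled = []
    cursor = upper
    while cursor.number > lower.number + 1:
        parent = fetch(cursor.parent_hash)
        if parent is None:
            raise LookupError(f"no bzz-eth peer returned {cursor.parent_hash.hex()}")
        if parent.hash != cursor.parent_hash or parent.number != cursor.number - 1:
            raise ValueError("fetched header does not link to its child")
        filled.append(parent)
        cursor = parent
    if cursor.parent_hash != lower.hash:
        raise ValueError("gap does not connect to the lower skeleton header")
    filled.reverse()
    return filled


def fill_skeleton(skeleton: Iterable[Header],
                  fetch: Callable[[bytes], Optional[Header]]) -> List[Header]:
    """Expand a skeleton (trusted headers obtained from ETH peers) into a contiguous chain."""
    points = sorted(skeleton, key=lambda h: h.number)
    chain = [points[0]]
    for lower, upper in zip(points, points[1:]):
        chain.extend(fill_gap(lower, upper, fetch))
        chain.append(upper)
    return chain
```

Gaps are independent of each other, so a client could fill them concurrently against different bzz-eth peers; within a gap the walk is sequential because each request depends on the previous header's parent_hash.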

Swarm node data validation notes

Swarm nodes will want to validate headers they receive. The things that can be validated are:

  • bytes are a valid RLP encoded header.
  • keccak(rlp-encoded-header-bytes) matches the expected content hash.
  • ethash validation of the proof-of-work seal.
  • recursive validation via lookup of parent header using parent_hash field
    • this should probably have a maximum depth limit since naively tracing back to genesis is probably undesirable.

For the POC, the RLP and keccak checks are likely adequate to catch obvious bugs.
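The checks listed above could be combined as follows. This is a sketch under stated assumptions: `keccak`, `decode` and `lookup` are hypothetical callbacks (a Keccak-256 function, an RLP header decoder that raises `ValueError` on malformed input, and a local-store lookup by hash), the ethash seal check is omitted, and `max_depth=16` is an arbitrary bound chosen for illustration.

```python
from typing import Callable, NamedTuple, Optional


class DecodedHeader(NamedTuple):
    parent_hash: bytes
    number: int


def validate_header(
    expected_hash: bytes,
    raw: bytes,
    keccak: Callable[[bytes], bytes],
    decode: Callable[[bytes], DecodedHeader],
    lookup: Callable[[bytes], Optional[bytes]],
    max_depth: int = 16,
) -> bool:
    # 1. Bytes must decode as a header (decode raises ValueError otherwise).
    try:
        current = decode(raw)
    except ValueError:
        return False
    # 2. Content hash must match the hash the header was requested or announced under.
    if keccak(raw) != expected_hash:
        return False
    # 3. Ethash seal validation would go here; omitted in this sketch.
    # 4. Bounded walk over locally known ancestors instead of tracing back to genesis.
    for _ in range(max_depth):
        parent_raw = lookup(current.parent_hash)
        if parent_raw is None:
            break  # ancestor not stored locally; stop without failing
        if keccak(parent_raw) != current.parent_hash:
            return False
        try:
            parent = decode(parent_raw)
        except ValueError:
            return False
        if parent.number != current.number - 1:
            return False
        current = parent
    return True
```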

Issues

No external issues at this time

Dependencies

Swarm needs to support a new hash type that is the keccak(raw-binary-data) so that Ethereum nodes are able to use the hashes it has available to request data and the Swarm nodes are able to know what data is being requested.

Timeline

  • Phase 0
    1. ability to do the subprotocol handshake (currently there is no defined handshake logic but maybe there should be an announce message)
  • Phase 1 (ethereum node only)
    1. Ethereum node advertises headers via NewBlockHeaders.
    2. Upon connection, Ethereum node advertises the latest 256 headers from its header chain.
    3. Ethereum node is able to respond to GetBlockHeaders requests.
  • Phase 2
    1. Swarm node fetches, validates and stores header chain based on announced headers from Ethereum node.
    2. Swarm node able to respond to GetBlockHeaders requests sent by Ethereum nodes.
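Phase 2 step 1 starts with turning announcements into requests. A hypothetical Swarm-side helper might skip hashes already in the local store or already in flight and batch the rest into GetBlockHeaders requests; `AnnouncedHeaderFetcher` is an illustrative name and `max_batch` is an assumed cap, since the proposal does not limit request size.

```python
from typing import Dict, Iterable, List, Set, Tuple


class AnnouncedHeaderFetcher:
    """Hypothetical helper turning NewBlockHeaders announcements into GetBlockHeaders requests."""

    def __init__(self, local_store: Dict[bytes, bytes], max_batch: int = 64) -> None:
        self.local_store = local_store    # keccak hash -> RLP header bytes already held
        self.max_batch = max_batch        # assumed cap on hashes per request
        self.pending: Set[bytes] = set()  # requested but not yet validated and stored
        self._next_id = 0

    def on_new_block_headers(
        self, announced: Iterable[Tuple[bytes, int]]
    ) -> List[Tuple[int, List[bytes]]]:
        """Return (request_id, hashes) pairs to send as GetBlockHeaders."""
        missing = []
        for block_hash, _number in announced:
            if block_hash in self.local_store or block_hash in self.pending:
                continue
            self.pending.add(block_hash)
            missing.append(block_hash)
        requests = []
        for start in range(0, len(missing), self.max_batch):
            requests.append((self._next_id, missing[start:start + self.max_batch]))
            self._next_id = (self._next_id + 1) % 2 ** 32  # request_id is a uint32
        return requests

    def on_validated_header(self, block_hash: bytes, raw: bytes) -> None:
        """Store a header once it has passed validation."""
        self.pending.discard(block_hash)
        self.local_store[block_hash] = raw
```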

The Trinity client team should be able to deliver on each of these phases in the 1-2 week time frame which suggests that if the Swarm team can deliver on a similar timeline we should be able to have a fully working POC of this within a 4 week timeline.

Acceptance criteria

  • An ETH peer with access to a trusted recent header can populate its header chain with data pulled from Swarm nodes.
  • A Swarm node can stay reasonably up-to-date with the Ethereum header chain as well as applying reasonable validation to new header data.
  • Swarm nodes can retrieve an Ethereum header that has been referenced by its ethereum hash from other swarm nodes or fall back to other ETH nodes.

To implement an easy test harness, we will assume the swarm node is connected to at least 2 eth clients:

  • a light (fast syncing) node; and
  • a full node, which will serve new headers upon request to the swarm node
@pipermerriam
Author

pipermerriam commented May 26, 2019

I'd propose the following approach.

  • Phase 0
    1. ability to do the sub-protocol handshake (currently there is no defined handshake logic but maybe there should be an announce message)
  • Phase 1 (ethereum node only)
    1. Ethereum node advertises headers via NewHeaders.
    2. Upon connection, Ethereum node advertises the latest 256 headers from its header chain.
    3. Ethereum node is able to respond to GetHeaders requests.
  • Phase 2
    1. Swarm node fetches, validates and stores header chain based on announced headers from Ethereum node.
    2. Swarm node able to respond to GetHeaders requests sent by Ethereum nodes.

The Trinity client team should be able to deliver on each of these phases in the 1-2 week time frame which suggests that if the Swarm team can deliver on a similar timeline we should be able to have a fully working POC of this within a 4 week timeline.

@pipermerriam
Author

One issue to resolve is how a client knows what type of node it is connected to. At the beginning we can just sniff the capabilities broadcast on connection for eth or bzz, but before we formalize this we will need an Announce-style message with flags like serves_headers to signal whether the connected node is willing to respond with headers.
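The interim capability-sniffing approach could look like the following, assuming peers are classified only by the sub-protocol names in the (name, version) capability pairs they advertise over devp2p; `peer_roles` and the role labels are hypothetical.

```python
from typing import Iterable, Set, Tuple


def peer_roles(capabilities: Iterable[Tuple[str, int]]) -> Set[str]:
    """Classify a peer from its devp2p (name, version) capability pairs."""
    names = {name for name, _version in capabilities}
    roles = set()
    if "eth" in names:
        roles.add("ethereum")  # candidate source of trusted recent headers
    if "bzz" in names:
        roles.add("swarm")     # candidate for fetching headers by hash
    if "bzz-eth" in names:
        roles.add("bzz-eth")   # actually speaks the proposed sub-protocol
    return roles
```

Capability names only say which protocols a peer speaks, not whether it is willing to serve headers, which is why an explicit serves_headers flag is still needed.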

@pipermerriam
Author

Still a little soon for it, but at some point we should migrate this to a markdown document that we can open pull requests against, so that we can track changes. This is probably only needed once we have the two clients talking to each other.

@pipermerriam
Author

@zelig https://notes.ethereum.org/k1yEcw1gSo-iCNmEOmpUUg

There you go. I won't be surprised if something is broken, but it should be a good starting point for getting our nodes connected to each other.

@jmozah
Collaborator

jmozah commented Aug 22, 2019

ethersphere/swarm#1685


4 participants