Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Networking] Add hello message #1531

Closed
mcdee opened this issue Dec 17, 2019 · 14 comments
Closed

[Networking] Add hello message #1531

mcdee opened this issue Dec 17, 2019 · 14 comments

Comments

@mcdee
Copy link
Contributor

mcdee commented Dec 17, 2019

At current when two peers first connect they exchange some information about the current state of their chains with the status message, but there is some information that can be useful to help peers know how to interact with each other. This should not be added to the status message, as the information is static per connection and sending it with every status update would be wasteful. Instead, a new handshake message hello is proposed.

An initial stab at the contents of a hello message might be:

  • genesis state root (32 bytes)
  • genesis fork version (4 bytes)
  • client name/semver e.g. prysm/1.2.3 (bytes)
  • supported version(s) of the networking spec (repeated uint32)

This provides a basis for understanding if the two peers could talk to each other, even if either or both of them have no state information beyond genesis. It also allows for peers to tweak their behaviour depending on known idiosyncrasies in particular client/spec versions.

@AgeManning
Copy link
Contributor

If all clients supported the identify libp2p protocol: https://github.com/libp2p/specs/tree/master/identify
Would this be sufficient information for what you are chasing.

Is the fork version already in Status?

Perhaps we could add the genesis state root to the agent_version in identify?

I ask, because it's something most libp2p-compatible clients would support, and already gives us a clients' supported libp2p protocols and versions.

@mcdee
Copy link
Contributor Author

mcdee commented Dec 17, 2019

Thanks for the link I was unaware of this function. There are two areas to understand for suitability:

  1. when/how is this triggered, how much control does the application have?
  2. can we fit the data we want in to the structures supplied?

I'll have a read through and see exactly how this works and report back.

@AgeManning
Copy link
Contributor

The application can configure an interval which periodically updates the information. It's also used to help peers discover their external IP's as well as each other's capabilities.

The agent_version is configurable and is an arbitrary string, I believe of arbitrary length. The protocols and versions are the libp2p protocols we support, in addition to the stream multiplexers and potentially the encryption protocols.

For us, it would tell us, which RPC protocols we support, which versions and encodings and also whether we support gossipsub or not.

There might also be some relevant information about this here: https://docs.libp2p.io/concepts/nat/

@mcdee
Copy link
Contributor Author

mcdee commented Dec 17, 2019

Looking at https://github.com/libp2p/go-libp2p/blob/master/p2p/protocol/identify/id.go it appears the only thing that can be set by applications is the agent_version value. Although we could use this as a compound field it would be, in my opinion, fragile compared to a well-formed message.

@mcdee mcdee changed the title [Networknig] Add hello message [Networking] Add hello message Dec 17, 2019
@dryajov
Copy link

dryajov commented Dec 17, 2019

@AgeManning I would say that the info provided by identify is mostly internal to libp2p, the way AgentVersion has been used so far is to identify the libp2p implementation, e.g. go-libp2p/3.3.4. This is free form and we could overload it to identify the Eth2 client, but this isn't how it's been used so far and I would be inclined to say that it makes more sense in the context of libp2p itself rather than any application specific information.

Another issue with using identify for this purpose, is the fact that client implementations/version can rely on the same libp2p implementation, in that case it's very hard to tell whether they are still compatible at the application layer. In addition, we rely on multistream-select for protocol selection, which is aware of versioning and AFAIK the current spec relies on that to version individual endpoints. The drawback here is that it requires additional roundtrips to select compatible protocols, if this information is communicated with the Hello message, those roundtrips wouldn't be necessary.

With all this, I see a lot of value in having a Hello message as proposed in this this issue. We can always add this info to the Status message, but it seems like that is exchanged on a periodic basis, and the info outlined here is static, as @mcdee states, in this case, sending it only once, makes the most sense.

@AgeManning
Copy link
Contributor

For clarity, I'm not against a Hello message, I think it would be useful.

Some interesting points:

I would say that the info provided by identify is mostly internal to libp2p, the way AgentVersion has been used so far is to identify the libp2p implementation, e.g. go-libp2p/3.3.4. This is free form and we could overload it to identify the Eth2 client, but this isn't how it's been used so far and I would be inclined to say that it makes more sense in the context of libp2p itself rather than any application specific information.

Substrate, for example, use the client version here.

Another issue with using identify for this purpose, is the fact that client implementations/version can rely on the same libp2p implementation, in that case it's very hard to tell whether they are still compatible at the application layer. In addition, we rely on multistream-select for protocol selection, which is aware of versioning and AFAIK the current spec relies on that to version individual endpoints. The drawback here is that it requires additional roundtrips to select compatible protocols, if this information is communicated with the Hello message, those roundtrips wouldn't be necessary.

The agent-version would tell us if they are compatible at the application layer. Agree that attempting to connect to each end-point makes unnecessary round-trips. Which is why identify tells us all the protocols (and versions) in the same round-trip that a Hello message would do.

The only difference I see between a Hello message and Identify is the extra two bits of information, genesis state-root and genesis-fork version. Identify also gives us external IP's to help with NATing but in principle we have this information from discv5.

With this being said, I also don't particularly like the idea of using identify if we think a more purpose built protocol suits us better.

Is the fork-version in the current Status message not sufficient?

The reason we want to include genesis state-root is for peers that say they are on the same fork version, but for some reason have an incorrect genesis state? In the current implementation, requesting a block from such a peer, would be invalid and the peer would be dropped. Having the genesis state-root given in advance would prevent this step correct? Is there another purpose I'm missing?

@dryajov
Copy link

dryajov commented Dec 18, 2019

Thanks for the perspective @AgeManning!

For clarity, I'm not against a Hello message, I think it would be useful.

Great, I think it's useful too.

With regards to identify, traditionally this info has been internal to the libp2p stack, some implementations don't even expose it to external apps.

Which is why identify tells us all the protocols (and versions) in the same round-trip that a Hello message would do.

I would call this a happy coincidence (perhaps by design) that we've chosen a scheme of one protocol string per endpoint and it works in our favor because identify does communicate the protocols it knows about. But should the app know about the version of mplex we're on, or what other protocols are mounted by the stack by perhaps other unrelated parts of the app? What I mean is that we seem to be overloading internals for application use.

Just leaving the above for perspective, because given the below quote, we seem to agree.

With this being said, I also don't particularly like the idea of using identify if we think a more purpose built protocol suits us better.

The reason we want to include genesis state-root is for peers that say they are on the same fork version, but for some reason have an incorrect genesis state? In the current implementation, requesting a block from such a peer, would be invalid and the peer would be dropped. Having the genesis state-root given in advance would prevent this step correct? Is there another purpose I'm missing?

What does fork mean in the context of Eth2? Is it a protocol upgrade? If that's the case, would there be a situation where there is just no fork active, i.e. no upgrades made so far? What would the fork value be in that case nil/0? If that's the case, then the genesis field makes sense because we can readily determine if the client is a mainnet client, or some random experiment or testnet peer...

@arnetheduck
Copy link
Contributor

Ideally, clients will have some notion of what they're connecting to before they make the connection - thus it would be good to include some of this information in the discovery protocol instead (fork/genesis).

re multistream, the intent is to move to a newer version that cuts down on roundtrips and makes it cheaper overall - negotiating protocols and versions is in the domain of what libp2p should take care of for us and would ideally be solved there instead of also doing it in the application layer.

status is generally cheap (compared to block data). adding a new message with special semantics I feel should be quantified with numbers at this point - for example in terms of overhead compared to other data flowing on the network. when looking at those numbers, it would be interesting to discuss and provide guidelines for when status should / needs to be used - this discussion should probably include sync as well, since syncing is the main use case when one might want to refresh the remote status information.

genesis_fork_version is included in genesis_state_root, why would both be needed? worth keeping in mind also that head_fork_version is not static.

semver has been suggested and discarded several times in the network protocol setting, due to the difficulty of providing good semantics around it.

@mcdee
Copy link
Contributor Author

mcdee commented Dec 18, 2019

Ideally, clients will have some notion of what they're connecting to before they make the connection - thus it would be good to include some of this information in the discovery protocol instead (fork/genesis).

libp2p's identify protocol has a single text field that is available to the protocol layer; without making it a seriously compound field I don't see that we can use it today. If it was expanded by the lib2p folks to include additional user-defined fields it would be a possibility.

re multistream, the intent is to move to a newer version that cuts down on roundtrips and makes it cheaper overall - negotiating protocols and versions is in the domain of what libp2p should take care of for us and would ideally be solved there instead of also doing it in the application layer.

This would be a nice way of doing things, but again doesn't exist today.

...adding a new message with special semantics I feel should be quantified with numbers at this point...

Well for any long-running network the overhead of hello would tend to 0, whereas the overhead of status would likely remain constant. But if it is considered the additional network burden is acceptable there is no technical reason why the information could not be added to the status message.

genesis_fork_version is included in genesis_state_root, why would both be needed?

It allows a node a very fast check to see if the peer even has the possibility of being compatible, and does not require the connecting peer to have built the genesis state.

semver has been suggested and discarded several times in the network protocol setting, due to the difficulty of providing good semantics around it.

The requirement is for a value that can be used to uniquely identify a software version. It could be something else (e.g. hash of last commit) if preferred. However, note this has nothing to do with the supported protocols themselves; these are provided in the last entry as an array of uint32

@axic
Copy link
Member

axic commented Dec 18, 2019

I'd also suggest to read https://eips.ethereum.org/EIPS/eip-2124 as it may give some good ideas regarding compatibility negotiation. This may be a well solved problem in Eth2 (I don't know), but getting a second perspective from Eth1 my not hurt.

@arnetheduck
Copy link
Contributor

arnetheduck commented Dec 18, 2019

libp2p's identify protocol

discovery is the process before libp2p identify comes into play: the selection of peers to connect to - we're using discv5 for this purpose, and the idea would be to add custom ENR records with information like this, so that you don't have to "ping"/connect to a client to know which network they're on / what protocols they support, as suggested in the EIP linked by @axic (thanks!)

of being compatible

the semantics of fork_version are somewhat underspecified but generally, the genesis_state and genesis_fork_version are not that interesting: consider ETH and ETC - they'd have the same genesis information but you want to know which is which when connecting.

supported protocols

the requests are versioned individually and negotiated by libp2p: https://github.com/ethereum/eth2.0-specs/blob/dev/specs/networking/p2p-interface.md#protocol-identification - there is no such thing as a "global" spec version - there was considerable discussion on this point earlier on.

@mcdee
Copy link
Contributor Author

mcdee commented Jan 13, 2020

libp2p's identify protocol

discovery is the process before libp2p identify comes into play: the selection of peers to connect to - we're using discv5 for this purpose, and the idea would be to add custom ENR records with information like this, so that you don't have to "ping"/connect to a client to know which network they're on / what protocols they support, as suggested in the EIP linked by @axic (thanks!)

What is the current state of discv5? If it's ready to go and we can add node version there that would be great. Although if we are going to continue to use libp2p's discovery mechanism as well there would remain the requirement for a separate message.

of being compatible

the semantics of fork_version are somewhat underspecified but generally, the genesis_state and genesis_fork_version are not that interesting: consider ETH and ETC - they'd have the same genesis information but you want to know which is which when connecting.

Although they are not enough to guarantee that we're on the correct network they are enough to quickly decide that we're not on the correct network in the negative case. At the point an Ethereum 2 node first starts up and before it connects to any peers this information is all it knows, so it can be enough for the node to decide if it at least wants to try syncing with a potential peer.

@djrtwo
Copy link
Contributor

djrtwo commented Aug 3, 2020

I'm closing this issue. A hello protocol will not be introduced prior to phase 0. If there is a compelling reason based on phase 0 production issues to add this in subsequent phases, we'll open this up again then

@djrtwo djrtwo closed this as completed Aug 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants