Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EIP-778: Ethereum Node Records #43

Closed
fjl opened this issue Sep 20, 2018 · 47 comments
Closed

EIP-778: Ethereum Node Records #43

fjl opened this issue Sep 20, 2018 · 47 comments
Labels

Comments

@fjl
Copy link
Collaborator

fjl commented Sep 20, 2018

https://eips.ethereum.org/EIPS/eip-778

This issue is for discussions about ENR.

@fjl fjl mentioned this issue Sep 21, 2018
5 tasks
@Matthalp-zz
Copy link

Matthalp-zz commented Mar 14, 2019

Node records can be relayed through a future version of the node discovery protocol. They can also be relayed through arbitrary other mechanisms such as DNS, ENS, a devp2p subprotocol, etc.

Does it make sense to approve something before we know how it will work out in practice? IMHO this would be more well-suited with a concrete use case, such as a new discovery message type or two (that would hopefully be reused for v5).

@Matthalp-zz
Copy link

seq: The sequence number, a 64 bit integer

Was there a reason to make it signed versus unsigned? The latter seems safer IMHO

@Matthalp-zz
Copy link

The resulting 64-byte signature is encoded as the concatenation of the r and s signature values.

Just want to confirm that you intentionally left out the v (recovery id) parameter because that's pretty obvious (since it's either 0 or 1)?

@Matthalp-zz
Copy link

Matthalp-zz commented Mar 14, 2019

Node records improve cryptographic agility and handling of protocol upgrades.

I believe this is referring to "identity schemes"? But if not would you be able to provide some examples to help me understand?

@Matthalp-zz
Copy link

The remainder of the record consists of arbitrary key/value pairs, which must be sorted by key.

Why do they have to be sorted? Do keys also have to be unique?

@Matthalp-zz
Copy link

Although it can be found in the discovery v4 specification, it would be good to list the key-value pairs for the "v4" discovery team. One interesting thing to note is that by ordering the keys per the specification in here the RLP encoding is no longer backward compatible.

@Matthalp-zz
Copy link

Playing devil's advocate, two high level questions:

(1) Is the process for making network-level protocol changes so slow that this level of generality is needed?
(2) What (urgent) use case(s) is this enabling?

@fjl
Copy link
Collaborator Author

fjl commented Mar 14, 2019

MHO this would be more well-suited with a concrete use case, such as a new discovery message type or two (that would hopefully be reused for v5).

Was there a reason to make it signed versus unsigned? The latter seems safer IMHO.

Unsigned integer is implied, but I can add the word 'unsigned' to be explicit.

Just want to confirm that you intentionally left out the v (recovery id) parameter because that's pretty obvious (since it's either 0 or 1)?

The recovery id is not needed here and isn't part of the signature. The signature can be verified without the recovery id by checking it against the public key contained in the record itself (they're self-signed). I decided this way because another identity scheme might not use secp256k1 keys at all, but nodes may still list a secp256k1 key to be compatible with the RLPx transport.

I believe this is referring to "identity schemes"? But if not would you be able to provide some examples to help me understand?

Node records can be signed with any kind of key, and can contain key material for multiple transports using different cryptosystems. Example:

Let's say you want to provide two transport options: RLPx and libp2p secio. You want you node's 'identity key' to be an ed25519 key. You'd define an identity scheme that uses ed25519 and create a record that looks like this:

[ 
  <sig>, <seq>, 
  "ed25519", <your static ed25519 public key>, 
  "id", "ed25519", 
  "ip", <your IP>,
  "secp256k1", <your static secp256k1 public key>,
  "tcp", <TCP port>,
  "transports", ["rlpx", "secio"]
]

That record would provide sufficient information to connect with either transport and to derive a node ID for use with the DHT.

Please note that identity schemes aren't meant to be something you define all the time, it's more something that allows us to plan ahead when the specific hash function or signature scheme used is falling out of favor or attacks are discovered. An example for the process I had in mind:

A collision attack on the keccak256 hash function is discovered and we decide that it's time to move on and use a different hash function. We'd define a new identity scheme in an EIP and implement it in clients, but don't use it yet. As time passes, all participants in the DHT upgrade their client software and support the new scheme. When a significant fraction of the DHT supports the scheme, we can enable it in clients.

Why do they have to be sorted? Do keys also have to be unique?

Being sorted just ensures there is a canonical encoding. Keys are supposed to be unique, I'll add that to the spec. There is also a nice practical benefit: if keys are guaranteed to be sorted, we can just verify that they're sorted once when validating the record and then use binary search to access key/value pairs. That's what the Go implementation of ENR does.

Although it can be found in the discovery v4 specification, it would be good to list the key-value pairs for the "v4" discovery team. One interesting thing to note is that by ordering the keys per the specification in here the RLP encoding is no longer backward compatible.

The encoding is not backwards compatible either way and that's fine. Discovery v4 is fine with the default keys in the EIP and doesn't need any extra keys (it actually doesn't need the "tcp" key).

Is the process for making network-level protocol changes so slow that this level of generality is needed? What (urgent) use case(s) is this enabling?

I think it makes sense to agree on a standard format for node information that is the same across multiple discovery mechanisms. We've kind-of designed ourselves into a particular corner with discovery v4 and RLPx and I want implementations to be able to explore alternative transports and discovery mechanisms while still being compatible with the main DHT.

My most urgent use case is replacing RLPx with a saner transport, e.g. a transport built from libp2p components. That could happen pretty much immediately after the discovery v4 extension is adopted.

One thing ENRs solve by having an authenticated sequence number is the problem of node endpoint changes even in discovery v4. If we learn about node A from another node B, it's generally impossible to know how fresh the information about A shared by B is. Having the sequence number really helps because we'll always be able to know whether our local node's knowledge about A is fresher than what B knows. This property translates across mechanisms: if we found A in DNS previously, we can check whether the DHTs knowledge about A is newer without actually talking to A directly.

Another use case is relaying devp2p capabilities and eth network ID through discovery. That way we could dial smarter and stop connecting to every node we find just to check whether they're participating in mainnet, another chain, etc.

A third use case that people have requested a lot is TOR support. Running an ethereum node on TOR
is impossible because discovery uses UDP and leaks the IP. With ENR, you could just omit the "ip" key and put an onion service address as the node's primary endpoint. If that node would get listed in, say, a DNS discovery node list others would connect to that node using TOR without ever knowing the IP.

@fjl fjl added the eip label Mar 14, 2019
@Matthalp-zz
Copy link

Thank you for the thoughtful response! It addresses most of the points I raised.

What do you think about adding the Discovery V4 ENR support (EIP 868) at the same time as this EIP? That way it is not only approved, but immediately applicable?

Further, I think the thing missing is how V4 will handle routing with the ENR knowledge. For example, it should ideally now help connect peers with shared capabilities.

We should also do an inventory of all the potential fields being discussed for various things and ensure they can fit within 300 bytes. I'm worried that the key-value capability may encourage a lot more key-value pairs than can be held within these limits. There might need to be some more analysis to ensure this is a good size and if not how to address that.

@fjl
Copy link
Collaborator Author

fjl commented Mar 14, 2019

Yes it would make a lot of sense to include EIP-868 at the same time. I really just want consensus on the basic ENR format for now because nobody has commented on it for more than a year.

@fjl
Copy link
Collaborator Author

fjl commented Mar 14, 2019

Regarding discv4 and ENR-based connectivity: there is no way to modify the protocol in a backwards-compatible way that relays ENRs because we always have to assume that there will be nodes in the DHT which do not support EIP-868 and you can't relay records through those.

Full ENR relay is going to happen with discv5 which isn't backwards compatible and doesn't have this restriction.

@Matthalp-zz
Copy link

It seems like the additional enr-seq field to the ping/pong packets can be used to signal whether there is v4 support for ENR? That field will be ignored on non-ENR compliant v4 peers because of the liberal parsing adopted in EIP 8.

@fjl
Copy link
Collaborator Author

fjl commented Mar 14, 2019

Yes, that's in the EIP. But we can't really change any other message or require that nodes start sending ENRs in Neighbors replies. The best we can do is fetching the ENR via UDP before connecting. But even if the peer doesn't support EIP-868 it might still be a good peer so it's not very useful. It is very useful to avoid dialing someone on mainnet if you're looking for testnet peers though.

@Matthalp-zz
Copy link

Sure, but we can define some additional messages that can be exchanged amongst peers that have agreed that they both support ENR. Peers would use the existing DHT but there may be some auxiliary metadata stored for the ones who support ENR, which would enable two high-level message pairs I think would be useful are:

GetENRNeighbors for doing a version of GetNeighbors for peers that support ENR
QueryENR which would allow a peer to request a list of peers that meet some sort of criteria from within the DHT.

I do understand that this is giving preference to some peers on the discovery layer. However, I view this as an incentive for peers to adopt ENR support. Nonetheless, it's all worth thinking through.

@esaulpaugh
Copy link

EIP 778 made a good test case for my RLP library (Java) esaulpaugh/headlong@1c58784

@gumb0
Copy link
Member

gumb0 commented Mar 28, 2019

Some nitpicks to the spec.

content = rlp(seq) || rlp(k) || rlp(v) || ...
signature = rlp(sign(rlp_list(content)))

This notation is a bit confusing. I think by || you meant concatenating values to make a list, but it can be confused with byte concatenation. If I understand it correctly, I'd express it with something like

content   = rlp_list(seq, k, v, ...)
signature = sign(content)
record    = rlp_list(signature, seq, k, v, ...)

ip | IP address, 4 or 16 bytes

Should we specify the byte order? Also for TCP & UDP ports.

@esaulpaugh
Copy link

why is content wrapped in a singleton list anyway

@gumb0
Copy link
Member

gumb0 commented Apr 2, 2019

Are sequence numbers 0-based?

@gumb0
Copy link
Member

gumb0 commented Apr 2, 2019

To derive a node address, take the keccak256 hash of the uncompressed public key.

What address does this refer to? v4 enode-address includes all 512 bytes of public key. (is this Kademlia-address for distance calculation?)

@fjl
Copy link
Collaborator Author

fjl commented Apr 3, 2019

@esaulpaugh there is no particular reason for content being a list. One way to rationalize it is protection against (hypothetical) length-extension attacks.

@fjl
Copy link
Collaborator Author

fjl commented Apr 3, 2019

ethereum/EIPs#1906 contains some edits to make the spec clearer

@fjl
Copy link
Collaborator Author

fjl commented Apr 3, 2019

@gumb0 You can start sequence number at zero or one. It doesn't really matter because the only rule is that that sequence number must increase when the record is updated.

@esaulpaugh
Copy link

esaulpaugh commented Apr 4, 2019

@fjl The record format seems very strange. content looks like it's wrapped in its own list for signing but when encoded in record it is not.

Is it not more conventional and intuitive to have the record format simply be signature(x) || x, similar to Encrypt-then-MAC's ciphertext || MAC(ciphertext)? On that note, how about even x || signature(x)? Currently, content has to either be encoded two different ways (or just once if you're careful to avoid copying the extraneous (variable-length) list prefix over to record).

I like the Encrypt-then-MAC style because it's all about reducing the attack surface. And I'm inclined to suggest that content and record should not be RLP lists at all but rather be concatenations (i.e. RLP sequences). Should save a few bytes as well. Thoughts?

@fjl
Copy link
Collaborator Author

fjl commented Apr 10, 2019

@esaulpaugh Records are often embedded into another RLP structure. This means a node record should be a single RLP value. If you are worried that encoding the record will be inefficient, please note that it's fine to just encode the elements of content once, then add a list header for signing and a different one when creating the final record. Turning a pre-encoded blob into an RLP list is cheap because it's literally just adding a length prefix.

@fjl
Copy link
Collaborator Author

fjl commented Apr 10, 2019

I'm leaning towards closing this discussion soon because the EIP was accepted a while ago and some people have started implementing it. It'll be too late for changes once records are deployed in a protocol.

If you are unhappy about anything in the spec, speak up soon.

@fjl fjl added this to the Discovery v5 milestone Apr 11, 2019
@esaulpaugh
Copy link

Well, then can we change the nomenclature? What am I supposed to call seq, k, v, ... in my code? I can't call it content because content is [seq, k, v, ...]. Rather than calling it contentContent can we rename content to contentList and call seq, k, v, ... content?

@fjl
Copy link
Collaborator Author

fjl commented Apr 24, 2019

@esaulpaugh I used elements in the Go implementation :)

@atoulme
Copy link
Contributor

atoulme commented Apr 28, 2019

Are the pre-defined keys mandatory?

@atoulme
Copy link
Contributor

atoulme commented Apr 28, 2019

It looks like the signature requires the secp256k1 key to be present for validation. What if I want to use a different curve? (I want to use SSB with an Edwards curve).

@fjl
Copy link
Collaborator Author

fjl commented Apr 30, 2019

The pre-defined keys are not mandatory. I put them in the EIP so we'd have some common ground regarding encoding of basic info such as IP and port.

Should I add a notice to the EIP saying the keys aren't mandatory?

@fjl
Copy link
Collaborator Author

fjl commented Apr 30, 2019

About secp256k1 keys: the "secp256k1" key is required if you use the "v4" identity scheme. You can define a custom identity scheme that uses an ed25519 key. I think Ethereum should start out with "v4" id scheme because all implementations already have the crypto code/libraries for that.

We definitely need an agreed-upon definition of a scheme using ed25519 because it will help with integration of libp2p hosts. All participants in the DHT need to understand the schemes which are in active use, otherwise they can't verify each other's records.

@atoulme
Copy link
Contributor

atoulme commented Apr 30, 2019

OK I get the subtlety now. I have implement ENRs yesterday here: apache/incubator-tuweni#8
I think I could reframe this as "ENR for v4 scheme" then.

I can propose a couple of alternative identity schemes so it looks less than "v4" is the one and only way to propose identity.

@fjl
Copy link
Collaborator Author

fjl commented May 27, 2019

New PR to update the EIP: ethereum/EIPs#2087

The change adds a text encoding (base64) and splits IP address keys into "ip" for IPv4
and "ip6" for IPv6. The IP address change is needed because nodes should really provide
both addresses if they have them.

@fjl
Copy link
Collaborator Author

fjl commented Jun 4, 2019

One more: ethereum/EIPs#2097

I expect this to be the last update to EIP-778.

@fjl
Copy link
Collaborator Author

fjl commented Aug 4, 2019

Closing this issue because EIP-778 and EIP-878 are now deployed in several ethereum clients.

@shemnon
Copy link

shemnon commented Dec 7, 2020

@flj Do the sequence numbers need to be adjacent? Could they be the unix second from epoch for when the record was made just as easily as an increasing number?

@fjl
Copy link
Collaborator Author

fjl commented Dec 7, 2020

You can use the timestamp because the only requirement is that they need to be monotonically increasing. But please be careful with unix times, they might not be monotonic in call cases.

@fulldecent
Copy link

  • Can this please be implemented in a competing second client before this is published as final status?

@atoulme
Copy link
Contributor

atoulme commented May 14, 2021

As pointed here, this is now implemented in Tuweni: #43 (comment)

Besu also implemented support for it.

@fulldecent
Copy link

Thank you, can this please be noted in the EIP text?

@atoulme
Copy link
Contributor

atoulme commented May 14, 2021

I’ll send a PR in a few hours, sure thing.

@shemnon
Copy link

shemnon commented May 14, 2021

@fulldecent Besu implementes this, interop verified as we show up on the DNS discovery list that the EF generates.

I support moving it to final as-is.

@esaulpaugh
Copy link

https://github.com/esaulpaugh/headlong/blob/master/src/main/java/com/esaulpaugh/headlong/rlp/Record.java is my impl. Haven't tested for interoperability with any other implementations.

@fulldecent
Copy link

I think this spec is good. And clearly it is implemented.

When the next spec comes along with a low quality implementation details, for example like the Ethereum Yellow Paper, I would like to be able to point to EIP-778 and say "you should have multiple competing implementations like these guys before you go to final".

@atoulme
Copy link
Contributor

atoulme commented May 20, 2021

Sorry, just finally got around to sending a PR with the implementations. I guess you should review there @fulldecent , and let me know.

@fulldecent
Copy link

I thank you, this looks great

@atoulme
Copy link
Contributor

atoulme commented May 21, 2021

Please make sure to review over there: ethereum/EIPs#3582
Please see and attend to Micah's comment. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants
@fjl @atoulme @shemnon @fulldecent @Matthalp-zz @gumb0 @esaulpaugh and others