relay/DCUtR: Add Direct Connection Upgrade through Relay protocol #173

vyzo · 2019-05-29T13:34:25Z

~~still early draft, but it's an important subject that needs to gain some momentum.~~

In this specification, we describe a synchronization protocol for direct
connectivity with hole punching that eschews signaling servers and utilizes
existing relay connections instead.

Status: Ready for review.

raulk

A really good start, @vyzo! Happy to sit on the Interest Group for this.

raulk · 2019-05-29T17:05:43Z

relay/DCUtR.md

+     obtained from the `Connect` message.
+   - Upon expiry of the timer, `B` starts a direct dial to `A` using the addresses obtained
+     from the `Connect` message.
+6. If the connection is successful, then it is prioritized over the relay connection, which


We need to cover the stream migration procedure in this spec.

We don't have any yet...

Yes, it needs to be specified, otherwise this whole thing is incomplete. We can incubate it in this spec and then spin it off.

We don't necessarily need any stream migration for the protocol to work.
We can simply open all new streams in the direct connection and garbage collect the relay connection when it no longer has any streams.
Or we can just close it after a grace period and force new streams to be created in the direct connection.

> We can simply open all new streams in the direct connection

This behaviour needs to be specified. Failing to specify the overall choreography makes this spec unactionable: "We have now established a direct connection, now what?"

Maybe it's a naming thing. "Upgrade" implies the existing connection will evolve. If all we're intending to cover is the signalling and synchronisation, then this spec should be named accordingly.

Referencing the stream migration protocol here: #328.

relay/DCUtR.md

addressed.

albrow · 2019-05-29T21:53:43Z

relay/DCUtR.md

+The protocol starts with the completion of a relay connection from `A`
+to `B`.  Upon observing the new connection, the inbound peer (here `B`)
+checks the addresses advertised by `A` via identify. If that set
+includes public addresses, then `A` _may_ be reachable by a direct


Isn't it possible that A may also be directly reachable at a private address if A and B are on the same local network?

Yes, it is possible, but in that case `A` would already have been dialed directly, since private addresses are still advertised alongside relay addresses.

I think @albrow has a point. @vyzo: while that should be the case, if we want to be resilient and robust, this protocol should not make assumptions about how any other part of the system behaves. Usually those implicit assumptions make systems brittle.

Luckily our spec lifecycle process allows us to add this topic as an active discussion:

> To facilitate open progress tracking and observability, as the Working Draft
> evolves, the author(s) SHOULD assemble a checklist of items that are pending
> specification, explicitly stating which items are compulsory for promoting the
> spec to a Candidate Recommendation.

from: https://github.com/libp2p/specs/blob/master/00-framework-01-spec-lifecycle.md

Not making this assumption will make us dial private addresses in vain multiple times.
We already have a problem with that.

At best, we can consider dialing them in the bidirectional part of the protocol.

Also, if A is public and B is private, we can't possibly be behind the same NAT.

Furthermore, for the bidirectional part of the protocol we could check the public address of the other node. If that doesn't match our own, we can't possibly be behind the same NAT and dialing private addrs is pointless.

It would be nice to avoid dialing private addrs if we can avoid it though. Perhaps we could still exchange them, but in a separate field. Then they can be ignored unless your public address matches the other node and you infer that you're behind the same NAT. Or your implementation may be able to always ignore them, since they would have been dialed previously.

Anyway, I agree that we could punt on this for this round and discuss when we promote to candidate rec.

raulk · 2019-05-30T12:04:41Z

@vyzo

@albrow and the 0x guys pointed us to the Trickle ICE spec, which seems like a relevant background read for us. I do expect our specs to be influenced by ICE – as a real-life, successful technology for coordinating hole punching between any two peers.

We should aim to reference ICE WG material to back up our ideas and routines.

vyzo · 2019-05-30T12:22:28Z

@raulk there is a reference to the ICE RFC already.

raulk · 2019-05-30T13:14:16Z

@vyzo great, I missed that. What are the parallels between the algo we propose and the ICE procedure?

vyzo · 2019-05-30T14:17:58Z

> What are the parallels between the algo we propose and the ICE procedure?

It's like ICE without a signalling server, combined with a distributed STUN: we rely on public peers to tell us our observed addresses instead of using STUN servers.
Also note that ICE mainly caters to UDP, while we very much care about TCP.

yusefnapora · 2019-05-30T14:22:57Z

I think the Trickle ICE spec @raulk linked to is an iteration on ICE that exchanges candidates incrementally instead of sending them all at once. Apparently this lets you start testing connectivity sooner.

dryajov · 2019-12-17T17:50:42Z

related previous discussion - #64

mxinden

I am proposing to merge this specification in its current state as a Working Draft.

Reviews welcome!

Below I am highlighting the two most notable recent changes.

mxinden · 2021-08-17T16:41:51Z

relay/DCUtR.md

+If the unilateral connection upgrade attempt fails or if `A` is itself a NATed
+peer that doesn't advertise public address, then `B` initiates the direct
+connection upgrade protocol as follows:
+1. `B` opens a stream to `A` using the `/libp2p/dcutr` protocol.


Note the protocol name /libp2p/dcutr.

mxinden · 2021-08-17T16:42:23Z

relay/DCUtR.md

+6. On failure go back to step (2), reusing the same stream opened in (1).
+   Inbound peers (here `B`) SHOULD retry twice (thus a total of 3 attempts)
+   before considering the upgrade as failed.


Note the added retry logic. Also see FAQ further below for additional reasoning.

marten-seemann · 2021-08-20T12:19:13Z

relay/DCUtR.md

+relies on the two peers synchronizing and simultaneously opening
+connections to each other to their predicted external address. It
+works well for UDP, with an estimated 80% success rate, and reasonably
+well for TCP, with an estimated 60% success rate.


Didn't we see much better numbers than this?

I think they have been in the same ballpark, but I might be mistaken. Unfortunately, I am unable to access the data from Project Flare phase 1. Either the data or my access seems to have been removed.

@vyzo do you know more here?

Discussion continued on #173 (comment).

marten-seemann · 2021-08-21T09:46:50Z

relay/DCUtR.md

+6. On failure go back to step (2), reusing the same stream opened in (1).
+   Inbound peers (here `B`) SHOULD retry twice (thus a total of 3 attempts)
+   before considering the upgrade as failed.


That change makes a lot of sense to me.

vyzo · 2021-08-23T11:49:49Z

We had a much better experimental success rate: 90% for UDP/QUIC. Maybe NATs have become better behaved in the 10-15 years since the literature was written, or at least our sample had better-behaved ones.

mxinden · 2021-08-23T11:59:12Z

Thanks @vyzo.

cab60cc removes the concrete (outdated) success rate statements. My reasoning is the following: the results of Project Flare Phase 1 were convincing enough that we consider Project Flare worth finishing. We can only measure the real success rates once the protocols are widely deployed. Rather than keeping outdated numbers from early on in this spec, I am in favor of removing them, possibly bringing them back once we have more data.

mxinden · 2021-08-23T14:44:20Z

Thanks to the many people involved here! 🙏

wngr · 2021-09-26T17:14:51Z

relay/DCUtR.md

+5. Simultaneous Connect. The two nodes follow the steps below in parallel for
+   every address obtained from the `Connect` message:
+   - For a TCP address:
+      - Upon receiving the `Sync`, `A` immediately dials the address to `B`.
+      - Upon expiry of the timer, `B` dials the address to `A`.
+      - This will result in a TCP Simultaneous Connect. For the purpose of all
+        protocols run on top of this TCP connection, `A` is assumed to be the
+        client and `B` the server.
+   - For a QUIC address:
+      - Upon receiving the `Sync`, `A` immediately dials the address to `B`.
+      - Upon expiry of the timer, `B` starts to send UDP packets filled with
+        random bytes to `A`'s address. Packets should be sent repeatedly in
+        random intervals between 10 and 200 ms.
+      - This will result in a QUIC connection where `A` is the client and `B` is
+        the server.


From what I see, this whole mechanism would also fit nicely for upgrading the relay connection to a direct WebRTC connection, if the peers were allowed to exchange their SDP data here.
Would you be open to amending the spec?
(cc @mxinden)

Yes, good point. We had this in mind, but as you said, it isn't mentioned anywhere. Given that the protocol uses protocol buffers, we could easily extend the messages to include additional data such as SDP payloads, or derive an SDP payload based on the information exchanged through the protocol.

Unfortunately there is no uniform way of speaking WebRTC across the many libp2p libraries (yet). In addition there is no specification yet (see #220 and #159). This is not to say that the project is not interested in adding WebRTC support in the future. Quite the opposite (see https://github.com/libp2p/specs/blob/master/connections/hole-punching.md and https://github.com/libp2p/specs/blob/master/ROADMAP.md#-unprecedented-global-connectivity).

With the above in mind, I am not sure whether it makes much sense to extend this paragraph with a section on WebRTC quite yet.

@wngr what do you think?

I think DCUtR would be a great way to add support for upgrading relayed connections to a direct WebRTC connection; it just feels like the right abstraction, and the alternative proposals so far appear inferior. I acknowledge that the big downside of this approach is that it requires a valid TLS certificate for the peer offering a WS endpoint, but I think that is a pill that can be swallowed, and in any case it is orthogonal to the relayed connection upgrade.
In other words, I think DCUtR is the right way to add support for upgrades to WebRTC (or allow exchanging arbitrary payloads here?), and I don't want to let the current opportunity window slide ;-).

(By the way, I hacked on an experimental webrtc transport for rust-libp2p which supports both browser apis (through wasm) and native; signalling is currently done via p2p-webrtc-star.)

> In other words, I think DCUtR is the right way to add support for upgrades to WebRTC (or allow exchanging arbitrary payloads here?), and I don't want to let the current opportunity window slide ;-).

👍

> (By the way, I hacked on an experimental webrtc transport for rust-libp2p which supports both browser apis (through wasm) and native; signalling is currently done via p2p-webrtc-star.)

🚀 that is great to hear. Mind opening a work-in-progress pull request on rust-libp2p @wngr?

My current WIP is at https://github.com/wngr/libp2p-webrtc; however, I really want to replace the WS signalling server with a libp2p relay node. This is why I started adding my own custom (behaviour, transport) tuple on top of rust-libp2p, which is very similar to dcutr at a higher level.
What's the state of your dcutr branch? Maybe it makes more sense to prototype it on top of that?

> What's the state of your dcutr branch? Maybe it makes more sense to prototype it on top of that?

You could leverage both libp2p/rust-libp2p#2059 and libp2p/rust-libp2p#2076. If my understanding of WebRTC and SDP is correct, it only needs to exchange a payload. If so, you could (at least for now) extend the Protobuf definition of the DCUtR protocol by a single field for that payload.

Happy to talk through this in person if that is preferred. Feel free to reach out via mail @wngr.

initial DCUtR draft (727f8b1)

vyzo requested review from Stebalien, raulk and whyrusleeping May 29, 2019 13:34

raulk reviewed May 29, 2019

raulk mentioned this pull request May 29, 2019

Add WebRTC Signaling Protocol Spec #159 (closed)

raulk previously requested changes May 29, 2019

vyzo added 2 commits May 29, 2019 21:08

add paragraph about stream migration (9db77f0)

add boilerplate (fee2b99)

vyzo requested a review from raulk May 29, 2019 18:13

fix formatting. (97e5d61)

albrow reviewed May 29, 2019

vyzo mentioned this pull request May 30, 2019

Support Direct Connection Upgrade through Relay libp2p/go-libp2p#651 (closed)

Demi-Marie mentioned this pull request Jun 5, 2019

Make NAT traversal more user-friendly paritytech/polkadot-sdk#567 (closed)

Stebalien mentioned this pull request Jan 21, 2020

Content Filter Plugin ipfs/kubo#6842 (closed)

Stebalien mentioned this pull request Apr 1, 2020

[RFC] Changing the default ports ipfs/kubo#7053 (closed)

This was referenced Nov 4, 2020

NAT traversal: Implement connection upgrade protocol for NAT hole punching using Relays libp2p/go-libp2p#1013 (closed)

NAT traversal: QUIC Hole Punching libp2p/go-libp2p#1015 (closed)

NAT traversal: Support for WebRTC transport in libp2p libp2p/go-libp2p#1018 (closed)

aarshkshah1992 reviewed Nov 5, 2020

aarshkshah1992 reviewed Jan 8, 2021

vasco-santos mentioned this pull request Jan 13, 2021

Question: Browser to browser direct libp2p/js-libp2p-webrtc-direct#98 (closed)

aarshkshah1992 mentioned this pull request Jan 14, 2021

Support for Hole punching libp2p/go-libp2p-swarm#233 (merged)

mxinden changed the title ~~RFC: Direct Connection Upgrade through Relay~~ relay/DCUtR: Add specification for Direct Connection Upgrade through Relay protocol Aug 17, 2021

mxinden added 6 commits August 17, 2021 14:26

relay/DCUtR: Wrap at 80 chars (9958df2)

relay/DCUtR: Add mxinden to interest group (4b7c1ce)

relay/DCUtR: Document retry logic (fe64a21)

relay/DCUtR: Remove implementation specific event emission (db9475e)

relay/DCUtR: Remove note on obs address sending (2d8b38f)

relay/DCUtR: Update date (6530d45)

mxinden reviewed Aug 17, 2021

mxinden changed the title ~~relay/DCUtR: Add specification for Direct Connection Upgrade through Relay protocol~~ relay/DCUtR: Add Direct Connection Upgrade through Relay protocol Aug 19, 2021

marten-seemann reviewed Aug 21, 2021

marten-seemann reviewed Aug 22, 2021

mxinden and others added 5 commits August 22, 2021 21:55

relay/DCUtR: Add Marten to interest group (0076c69)

Co-authored-by: Marten Seemann <martenseemann@gmail.com>

relay/DCUtR: Assign roles and describe hole punching on QUIC (#361) (b420064)

relay/DCUtR: Stress that one should connect to all addresses (af0b9bb)

relay/DCUtR: Fix typo (6f475de)

relay/DCUtR: Mention addressing specification for ObsAddrs field (17f6275)

mxinden reviewed Aug 23, 2021

mxinden added 4 commits August 23, 2021 12:02

relay/DCUtR: Do not reuse same stream on retry (5943d3b)

relay/DCUtR: Reword steps for each address (6f558f1)

relay/DCUtR: Detail on success case (f7b43df)

relay/DCUtR: Inline Sync reasoning (85f567d)

relay/DCUtR: Remove concrete success rates (cab60cc)

marten-seemann approved these changes Aug 23, 2021

relay/DCUtR: Reword (8001cd9)

Co-authored-by: Marten Seemann <martenseemann@gmail.com>

mxinden merged commit 689e5cb into master Aug 23, 2021

wngr reviewed Sep 26, 2021
