core: introduce a unified `Transport::dial` interface #4226

thomaseizinger · 2023-07-20T10:58:41Z

This issue describes a design proposal on how we can modify the Transport trait to accommodate for the needs of AutoNATv2 and at the same time, can simplify address translation.

In a nutshell, we need a way to control the port for outgoing connections.

Context

Socket implementations for TCP and UDP - on some operating systems - support a thing called "port reuse". That is, instead of allocating a new port for an outgoing connection, they can create a socket on a port that is already in use for listening. This allows for techniques like hole punching to work: NAT devices will allow for ingress traffic from an IP that we have previously sent a TCP packet too.

In AutoNATv2, we want to explicitly avoid this. To test whether an address is publicly reachable we need to avoid accidental hole-punching. By connecting to an AutoNAT server, we might establish a mapping for ingress traffic in a NAT device and for that particular peer, we appear to be reachable when in reality, the port is not forwarded and a connection attempt by any other peer will fail.

To support AutoNATv2 as a NetworkBehaviour, it needs to create a connection to the autonat server where we explicitly want a new port. Currently, we have the dial and dial_as_listener functions. We could now add a dial_with_new_port function but this doesn't seem to scale particularly well.

In addition, there is also issue #3953 about address translation. For address translation to work correctly, we need to know whether an outgoing connection created an ephemeral port or is reusing an existing one.

The proposal

We can solve both of these problems by merging the two existing dial functions on Transport and instead expose a single dial that supports two configuration options:

Whether to reuse and existing port / use an ephemeral one (The two are equivalent and the opposite of each other, not sure which one is best to model)
Which role to final connection should assume

The default for these options would be:

Reuse an existing port
Be the dialer on the final connection

The exciting thing is that we can use this model to solve several problems:

The dcutr behaviour, which is responsible for orchestrating the hole punch, can specify that the outgoing connection should reuse the listening port and/or which role to assume.
The autonat behaviour can explicitly request to use an ephemeral port for the connection it establishes to the autonat server
Currently, users have to know that they need to activate port reuse for the TCP transport in order for hole punching to work. With the above design, this requirement just goes away, making things easier to setup.
It should build the foundation for removing address_translation from Transport because the Swarm will know, which connections were established with a resused port and can thus perform the mapping correctly.
The "unclean" design of having multiple dial functions on Transport

Further considerations

With the reuse of an existing port as the default, new connections will fail if we don't have a listening port yet. We don't want outgoing connections to fail by default, meaning the transports should probably fall back to creating a new socket just for dialing in that case. This however means that each new connection should expose, whether it is using an ephemeral or an existing port and the "config" passed into Transport::dial is a mere "request" to the transport that might not get fulfilled.

This would allow Swarm to correctly identify, whether a connection did end up reusing a port and can thus correctly implement address_translation which is required for identify to report the correct observed address.

The text was updated successfully, but these errors were encountered:

dariusc93 · 2023-07-24T21:11:06Z

Havent had the chance to look over AutoNATv2 spec so i cannot comment to much on that part yet, but I do like the idea of merging both functions together and simplifying it here in Transport

mxinden · 2023-08-02T12:38:51Z

To support AutoNATv2 as a NetworkBehaviour, it needs to create a connection to the autonat server where we explicitly want a new port.

The probing connection from the AutoNAT server to the local node needs to use a new port.

mxinden · 2023-08-02T12:45:29Z

Currently, users have to know that they need to activate port reuse for the TCP transport in order for hole punching to work. With the above design, this requirement just goes away, making things easier to setup.

Indeed, this is a nice benefit!

With the reuse of an existing port as the default, new connections will fail if we don't have a listening port yet. We don't want outgoing connections to fail by default, meaning the transports should probably fall back to creating a new socket just for dialing in that case. This however means that each new connection should expose, whether it is using an ephemeral or an existing port and the "config" passed into Transport::dial is a mere "request" to the transport that might not get fulfilled.

Note that libp2p-quic already creates a new socket for a dial in case there is no listening socket yet.

I am not sure port-reuse should be a best-effort request. E.g. in the case of libp2p-autonat no-port-reuse is an absolute requirement and in the case of libp2p-dcutr port-reuse is an absolute requirement. Falling back to one in case the other is not available doesn't make sense in these two protocols.

This would allow Swarm to correctly identify, whether a connection did end up reusing a port and can thus correctly implement address_translation which is required for identify to report the correct observed address.

I am fine with exploring the removal of address_translation from Transport moving it into Swarm. That said, especially since it is a refactoring only, I would prefer this to happen in a consecutive step, to decrease the scope of the core work here, namely to allow the configuration of port-reuse per connection.

Will remove the decision-pending label as I think this issue is actionable as is. Thank you @thomaseizinger for writing this down!

thomaseizinger · 2023-08-02T13:23:09Z

I am not sure port-reuse should be a best-effort request. E.g. in the case of libp2p-autonat no-port-reuse is an absolute requirement and in the case of libp2p-dcutr port-reuse is an absolute requirement. Falling back to one in case the other is not available doesn't make sense in these two protocols.

Creating a new port binding will always work so the AutoNAT case is handled. I agree with you that making it best-effort is probably not good and only complicates the interfaces.

In the case of dcutr, we should probably check whether we actually have a listening interface and if we don't then we shouldn't even attempt to hole punch! That will solve this problem.

libp2p/specs#562 has some interesting conversation around this problem too. go and nim both don't treat it as a connection-level setting but rather have an algorithm on how they decide what to do for new connections.

I think we can copy that reasonably well with this approach. We can't access the transport directly but making it a connection-level setting allows us to pick the default in the Swarm in a smart way: If we don't have an external address, we reuse port bindings by default. If we have an external address, we create new port bindings by default.

Thoughts?

mxinden · 2023-08-20T10:11:01Z

If we have an external address, we create new port bindings by default.

An AutoNAT client needs to suggest external address candidates to the AutoNAT server in order for the server to dial the client on those candidates. Thus far we plan to collect the candidates via libp2p-identify.

With the above strategy, namely to use a new port binding when one has an external address, libp2p-identify would no longer produce valid candidates. Thus once one has discovered one external address, one will not discover new external address candidates.

As a result, unless I am missing something, I suggest to always reuse port bindings default.

thomaseizinger · 2023-08-21T09:48:55Z

Thus once one has discovered one external address, one will not discover new external address candidates.

That is only if you don't establish connections where you override the port binding strategy. We can only reuse a port binding if we have a listen address, thus the default would be even more nuanced: Always reuse port bindings unless we don't have a listen address, then the default should be to create a new one. (The Swarm can track whether we have a listen address).

We could also be even smarter and track, how many connections we have to that peer already. If the user explicitly requests a new connection (DialCondition::Always) but doesn't specify a port binding strategy, then we could switch to a new port binding in case we already have a connection.

How about the following:

Swarm tracks how many listen addresses we have
If we have 0 listen addresses, we allocate a new port by default
If we have at least one listen address, we reuse an existing port by default
As an optimisation for the future: If we already have a connection for this 5 tuple, we allocate a new port by default

mxinden · 2023-09-01T15:25:04Z

If we have 0 listen addresses, we allocate a new port by default

👍 if I am not mistaken that is the status quo in libp2p-tcp and libp2p-quic.

If we have at least one listen address, we reuse an existing port by default

👍 status quo on libp2p-quic. And status quo on libp2p-tcp (when port-reuse enabled).

As an optimisation for the future: If we already have a connection for this 5 tuple, we allocate a new port by default

Works for me, though as you say, "in the future", i.e. I don't think this should be a priority.

thomaseizinger · 2023-09-03T08:45:44Z

As an optimisation for the future: If we already have a connection for this 5 tuple, we allocate a new port by default

Works for me, though as you say, "in the future", i.e. I don't think this should be a priority.

Do we currently fail to create a new connection when port-reuse is enabled and we already have one? If yes, then I am happy to keep that behaviour in the first iteration.

mxinden · 2023-09-04T19:34:26Z

Do we currently fail to create a new connection when port-reuse is enabled and we already have one?

QUIC: Not needed. I.e. given that QUIC has connection IDs it can run multiple connections on the same 5-tuple.
TCP: We don't fail within rust-libp2p. Instead the OS returns an error when trying to establish a new connection with the same 5-tuple as an existing connection.

umgefahren · 2023-09-27T11:36:03Z

How about the following:

Swarm tracks how many listen addresses we have

If we have 0 listen addresses, we allocate a new port by default

If we have at least one listen address, we reuse an existing port by default

As an optimisation for the future: If we already have a connection for this 5 tuple, we allocate a new port by default

So the desired new interface for Transport has a configuration option, that has two options:

Reuse port / create a new one / smart behavior (on which we default and which you describe in the quote)
Dialer / Listener (defaulting to dialer)

And just to make sure I understand everything correctly:
After designing this, I need to implement a way that a Behavior can pass these options to the transport.

mxinden · 2023-09-27T19:25:09Z

Exactly right @umgefahren.

After designing this, I need to implement a way that a Behavior can pass these options to the transport.

You can extend DialOpts.

rust-libp2p/swarm/src/dial_opts.rs

Lines 29 to 48 in 91fb6a1

    
           /// Options to configure a dial to a known or unknown peer. 
        
           /// 
        
           /// Used in [`Swarm::dial`](crate::Swarm::dial) and 
        
           /// [`ToSwarm::Dial`](crate::behaviour::ToSwarm::Dial). 
        
           /// 
        
           /// To construct use either of: 
        
           /// 
        
           /// - [`DialOpts::peer_id`] dialing a known peer 
        
           /// 
        
           /// - [`DialOpts::unknown_peer_id`] dialing an unknown peer 
        
           #[derive(Debug)] 
        
           pub struct DialOpts { 
        
               peer_id: Option<PeerId>, 
        
               condition: PeerCondition, 
        
               addresses: Vec<Multiaddr>, 
        
               extend_addresses_through_behaviour: bool, 
        
               role_override: Endpoint, 
        
               dial_concurrency_factor_override: Option<NonZeroU8>, 
        
               connection_id: ConnectionId, 
        
           }

thomaseizinger · 2023-09-27T20:17:01Z

How about the following:

Swarm tracks how many listen addresses we have

If we have 0 listen addresses, we allocate a new port by default

If we have at least one listen address, we reuse an existing port by default

As an optimisation for the future: If we already have a connection for this 5 tuple, we allocate a new port by default

So the desired new interface for Transport has a configuration option, that has two options:
* Reuse port / create a new one / smart behavior (on which we default and which you describe in the quote)

I would have made the transports "dumb" and only pass two options on:

create new port
use existing port

The smart behaviour would live in libp2p-swarm based on how many listen addresses we have. We might have to differentiate by transport here (tcp vs udp vs webrtc).

Uplifting this behaviour to the swarm will allow us to do #3953. For that, we need to know whether a given connection allocated a new port or not.

I don't want to unnecessarily blow up the scope of this but we should work towards the correct API in my opinion.

umgefahren · 2023-09-27T21:23:05Z

I have written up a first implementation of the new design. Please give feedback and I would propose continuing the design discussion on #4568.

Resolves: #4226. Resolves: #3953. Resolves: #3889. Pull-Request: #4568.

thomaseizinger mentioned this issue Jul 20, 2023

Implementation considerations for dialing interface libp2p/specs#562

Closed

thomaseizinger added the decision-pending Marks issues where a decision is pending before we can move forward. label Jul 20, 2023

b-zee mentioned this issue Jul 21, 2023

feat(transport): Implement dial_with_new_port to prevent port reuse #3910

Closed

4 tasks

mxinden mentioned this issue Jul 31, 2023

AutoNATv2 implementation for rust-libp2p filecoin-project/devgrants#1421

Closed

mxinden removed the decision-pending Marks issues where a decision is pending before we can move forward. label Aug 2, 2023

mxinden mentioned this issue Aug 7, 2023

fix(quic): improve listener socket selection when dialing #4259

Open

mxinden mentioned this issue Aug 18, 2023

feat: add rust-libp2p-server to monorepo #4311

Merged

4 tasks

thomaseizinger mentioned this issue Aug 18, 2023

enhancement(Identify): add a behavior event that reports the list of locally supported protocols whenever that changes #4332

Closed

thomaseizinger mentioned this issue Sep 20, 2023

AutoNAT v2 #4524

Open

umgefahren mentioned this issue Sep 27, 2023

refactor(*): Transport redesign #4568

Merged

4 tasks

mergify bot closed this as completed in #4568 Aug 3, 2024

mergify bot pushed a commit that referenced this issue Aug 3, 2024

refactor(*): Transport redesign

079b2d6

Resolves: #4226. Resolves: #3953. Resolves: #3889. Pull-Request: #4568.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

core: introduce a unified `Transport::dial` interface #4226

core: introduce a unified `Transport::dial` interface #4226

thomaseizinger commented Jul 20, 2023

dariusc93 commented Jul 24, 2023

mxinden commented Aug 2, 2023

mxinden commented Aug 2, 2023

thomaseizinger commented Aug 2, 2023

mxinden commented Aug 20, 2023

thomaseizinger commented Aug 21, 2023

mxinden commented Sep 1, 2023

thomaseizinger commented Sep 3, 2023

mxinden commented Sep 4, 2023

umgefahren commented Sep 27, 2023 •

edited

Loading

mxinden commented Sep 27, 2023

thomaseizinger commented Sep 27, 2023 •

edited

Loading

umgefahren commented Sep 27, 2023

core: introduce a unified Transport::dial interface #4226

core: introduce a unified Transport::dial interface #4226

Comments

thomaseizinger commented Jul 20, 2023

Context

The proposal

Further considerations

dariusc93 commented Jul 24, 2023

mxinden commented Aug 2, 2023

mxinden commented Aug 2, 2023

thomaseizinger commented Aug 2, 2023

mxinden commented Aug 20, 2023

thomaseizinger commented Aug 21, 2023

mxinden commented Sep 1, 2023

thomaseizinger commented Sep 3, 2023

mxinden commented Sep 4, 2023

umgefahren commented Sep 27, 2023 • edited Loading

mxinden commented Sep 27, 2023

thomaseizinger commented Sep 27, 2023 • edited Loading

umgefahren commented Sep 27, 2023

core: introduce a unified `Transport::dial` interface #4226

core: introduce a unified `Transport::dial` interface #4226

umgefahren commented Sep 27, 2023 •

edited

Loading

thomaseizinger commented Sep 27, 2023 •

edited

Loading