Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: introduce a unified Transport::dial interface #4226

Closed
Tracked by #4524
thomaseizinger opened this issue Jul 20, 2023 · 13 comments · Fixed by #4568
Closed
Tracked by #4524

core: introduce a unified Transport::dial interface #4226

thomaseizinger opened this issue Jul 20, 2023 · 13 comments · Fixed by #4568

Comments

@thomaseizinger
Copy link
Contributor

This issue describes a design proposal on how we can modify the Transport trait to accommodate for the needs of AutoNATv2 and at the same time, can simplify address translation.

In a nutshell, we need a way to control the port for outgoing connections.

Context

Socket implementations for TCP and UDP - on some operating systems - support a thing called "port reuse". That is, instead of allocating a new port for an outgoing connection, they can create a socket on a port that is already in use for listening. This allows for techniques like hole punching to work: NAT devices will allow for ingress traffic from an IP that we have previously sent a TCP packet too.

In AutoNATv2, we want to explicitly avoid this. To test whether an address is publicly reachable we need to avoid accidental hole-punching. By connecting to an AutoNAT server, we might establish a mapping for ingress traffic in a NAT device and for that particular peer, we appear to be reachable when in reality, the port is not forwarded and a connection attempt by any other peer will fail.

To support AutoNATv2 as a NetworkBehaviour, it needs to create a connection to the autonat server where we explicitly want a new port. Currently, we have the dial and dial_as_listener functions. We could now add a dial_with_new_port function but this doesn't seem to scale particularly well.

In addition, there is also issue #3953 about address translation. For address translation to work correctly, we need to know whether an outgoing connection created an ephemeral port or is reusing an existing one.

The proposal

We can solve both of these problems by merging the two existing dial functions on Transport and instead expose a single dial that supports two configuration options:

  • Whether to reuse and existing port / use an ephemeral one (The two are equivalent and the opposite of each other, not sure which one is best to model)
  • Which role to final connection should assume

The default for these options would be:

  • Reuse an existing port
  • Be the dialer on the final connection

The exciting thing is that we can use this model to solve several problems:

  • The dcutr behaviour, which is responsible for orchestrating the hole punch, can specify that the outgoing connection should reuse the listening port and/or which role to assume.
  • The autonat behaviour can explicitly request to use an ephemeral port for the connection it establishes to the autonat server
  • Currently, users have to know that they need to activate port reuse for the TCP transport in order for hole punching to work. With the above design, this requirement just goes away, making things easier to setup.
  • It should build the foundation for removing address_translation from Transport because the Swarm will know, which connections were established with a resused port and can thus perform the mapping correctly.
  • The "unclean" design of having multiple dial functions on Transport

Further considerations

With the reuse of an existing port as the default, new connections will fail if we don't have a listening port yet. We don't want outgoing connections to fail by default, meaning the transports should probably fall back to creating a new socket just for dialing in that case. This however means that each new connection should expose, whether it is using an ephemeral or an existing port and the "config" passed into Transport::dial is a mere "request" to the transport that might not get fulfilled.

This would allow Swarm to correctly identify, whether a connection did end up reusing a port and can thus correctly implement address_translation which is required for identify to report the correct observed address.

@dariusc93
Copy link
Contributor

Havent had the chance to look over AutoNATv2 spec so i cannot comment to much on that part yet, but I do like the idea of merging both functions together and simplifying it here in Transport

@mxinden
Copy link
Member

mxinden commented Aug 2, 2023

To support AutoNATv2 as a NetworkBehaviour, it needs to create a connection to the autonat server where we explicitly want a new port.

The probing connection from the AutoNAT server to the local node needs to use a new port.

@mxinden
Copy link
Member

mxinden commented Aug 2, 2023

Currently, users have to know that they need to activate port reuse for the TCP transport in order for hole punching to work. With the above design, this requirement just goes away, making things easier to setup.

Indeed, this is a nice benefit!

With the reuse of an existing port as the default, new connections will fail if we don't have a listening port yet. We don't want outgoing connections to fail by default, meaning the transports should probably fall back to creating a new socket just for dialing in that case. This however means that each new connection should expose, whether it is using an ephemeral or an existing port and the "config" passed into Transport::dial is a mere "request" to the transport that might not get fulfilled.

Note that libp2p-quic already creates a new socket for a dial in case there is no listening socket yet.

I am not sure port-reuse should be a best-effort request. E.g. in the case of libp2p-autonat no-port-reuse is an absolute requirement and in the case of libp2p-dcutr port-reuse is an absolute requirement. Falling back to one in case the other is not available doesn't make sense in these two protocols.

This would allow Swarm to correctly identify, whether a connection did end up reusing a port and can thus correctly implement address_translation which is required for identify to report the correct observed address.

I am fine with exploring the removal of address_translation from Transport moving it into Swarm. That said, especially since it is a refactoring only, I would prefer this to happen in a consecutive step, to decrease the scope of the core work here, namely to allow the configuration of port-reuse per connection.

Will remove the decision-pending label as I think this issue is actionable as is. Thank you @thomaseizinger for writing this down!

@mxinden mxinden removed the decision-pending Marks issues where a decision is pending before we can move forward. label Aug 2, 2023
@thomaseizinger
Copy link
Contributor Author

I am not sure port-reuse should be a best-effort request. E.g. in the case of libp2p-autonat no-port-reuse is an absolute requirement and in the case of libp2p-dcutr port-reuse is an absolute requirement. Falling back to one in case the other is not available doesn't make sense in these two protocols.

Creating a new port binding will always work so the AutoNAT case is handled. I agree with you that making it best-effort is probably not good and only complicates the interfaces.

In the case of dcutr, we should probably check whether we actually have a listening interface and if we don't then we shouldn't even attempt to hole punch! That will solve this problem.

libp2p/specs#562 has some interesting conversation around this problem too. go and nim both don't treat it as a connection-level setting but rather have an algorithm on how they decide what to do for new connections.

I think we can copy that reasonably well with this approach. We can't access the transport directly but making it a connection-level setting allows us to pick the default in the Swarm in a smart way: If we don't have an external address, we reuse port bindings by default. If we have an external address, we create new port bindings by default.

Thoughts?

@mxinden
Copy link
Member

mxinden commented Aug 20, 2023

If we have an external address, we create new port bindings by default.

An AutoNAT client needs to suggest external address candidates to the AutoNAT server in order for the server to dial the client on those candidates. Thus far we plan to collect the candidates via libp2p-identify.

With the above strategy, namely to use a new port binding when one has an external address, libp2p-identify would no longer produce valid candidates. Thus once one has discovered one external address, one will not discover new external address candidates.

As a result, unless I am missing something, I suggest to always reuse port bindings default.

@thomaseizinger
Copy link
Contributor Author

Thus once one has discovered one external address, one will not discover new external address candidates.

That is only if you don't establish connections where you override the port binding strategy. We can only reuse a port binding if we have a listen address, thus the default would be even more nuanced: Always reuse port bindings unless we don't have a listen address, then the default should be to create a new one. (The Swarm can track whether we have a listen address).

We could also be even smarter and track, how many connections we have to that peer already. If the user explicitly requests a new connection (DialCondition::Always) but doesn't specify a port binding strategy, then we could switch to a new port binding in case we already have a connection.

How about the following:

  • Swarm tracks how many listen addresses we have
  • If we have 0 listen addresses, we allocate a new port by default
  • If we have at least one listen address, we reuse an existing port by default
  • As an optimisation for the future: If we already have a connection for this 5 tuple, we allocate a new port by default

@mxinden
Copy link
Member

mxinden commented Sep 1, 2023

If we have 0 listen addresses, we allocate a new port by default

👍 if I am not mistaken that is the status quo in libp2p-tcp and libp2p-quic.

If we have at least one listen address, we reuse an existing port by default

👍 status quo on libp2p-quic. And status quo on libp2p-tcp (when port-reuse enabled).

As an optimisation for the future: If we already have a connection for this 5 tuple, we allocate a new port by default

Works for me, though as you say, "in the future", i.e. I don't think this should be a priority.

@thomaseizinger
Copy link
Contributor Author

As an optimisation for the future: If we already have a connection for this 5 tuple, we allocate a new port by default

Works for me, though as you say, "in the future", i.e. I don't think this should be a priority.

Do we currently fail to create a new connection when port-reuse is enabled and we already have one? If yes, then I am happy to keep that behaviour in the first iteration.

@mxinden
Copy link
Member

mxinden commented Sep 4, 2023

Do we currently fail to create a new connection when port-reuse is enabled and we already have one?

  • QUIC: Not needed. I.e. given that QUIC has connection IDs it can run multiple connections on the same 5-tuple.
  • TCP: We don't fail within rust-libp2p. Instead the OS returns an error when trying to establish a new connection with the same 5-tuple as an existing connection.

@umgefahren
Copy link
Contributor

umgefahren commented Sep 27, 2023

How about the following:

  • Swarm tracks how many listen addresses we have
  • If we have 0 listen addresses, we allocate a new port by default
  • If we have at least one listen address, we reuse an existing port by default
  • As an optimisation for the future: If we already have a connection for this 5 tuple, we allocate a new port by default

So the desired new interface for Transport has a configuration option, that has two options:

  • Reuse port / create a new one / smart behavior (on which we default and which you describe in the quote)
  • Dialer / Listener (defaulting to dialer)

And just to make sure I understand everything correctly:
After designing this, I need to implement a way that a Behavior can pass these options to the transport.

@mxinden
Copy link
Member

mxinden commented Sep 27, 2023

Exactly right @umgefahren.

After designing this, I need to implement a way that a Behavior can pass these options to the transport.

You can extend DialOpts.

/// Options to configure a dial to a known or unknown peer.
///
/// Used in [`Swarm::dial`](crate::Swarm::dial) and
/// [`ToSwarm::Dial`](crate::behaviour::ToSwarm::Dial).
///
/// To construct use either of:
///
/// - [`DialOpts::peer_id`] dialing a known peer
///
/// - [`DialOpts::unknown_peer_id`] dialing an unknown peer
#[derive(Debug)]
pub struct DialOpts {
peer_id: Option<PeerId>,
condition: PeerCondition,
addresses: Vec<Multiaddr>,
extend_addresses_through_behaviour: bool,
role_override: Endpoint,
dial_concurrency_factor_override: Option<NonZeroU8>,
connection_id: ConnectionId,
}

@thomaseizinger
Copy link
Contributor Author

thomaseizinger commented Sep 27, 2023

How about the following:

  • Swarm tracks how many listen addresses we have
  • If we have 0 listen addresses, we allocate a new port by default
  • If we have at least one listen address, we reuse an existing port by default
  • As an optimisation for the future: If we already have a connection for this 5 tuple, we allocate a new port by default

So the desired new interface for Transport has a configuration option, that has two options:

* Reuse port / create a new one / smart behavior (on which we default and which you describe in the quote)

I would have made the transports "dumb" and only pass two options on:

  • create new port
  • use existing port

The smart behaviour would live in libp2p-swarm based on how many listen addresses we have. We might have to differentiate by transport here (tcp vs udp vs webrtc).

Uplifting this behaviour to the swarm will allow us to do #3953. For that, we need to know whether a given connection allocated a new port or not.

I don't want to unnecessarily blow up the scope of this but we should work towards the correct API in my opinion.

@umgefahren
Copy link
Contributor

I have written up a first implementation of the new design. Please give feedback and I would propose continuing the design discussion on #4568.

@mergify mergify bot closed this as completed in #4568 Aug 3, 2024
mergify bot pushed a commit that referenced this issue Aug 3, 2024
Resolves: #4226.
Resolves: #3953.
Resolves: #3889.

Pull-Request: #4568.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants