feat(swarm)!: Allow `NetworkBehaviour`s to manage incoming connections #3099

thomaseizinger · 2022-11-09T02:11:01Z

Description

Previously, ConnectionHandler employed the prototype pattern via the IntoConnectionHandler abstraction. This allowed the Swarm to construct an instance of IntoConnectionHandler before the connection was established. This abstraction is however unnecessary. We can model all existing usecases by delaying the call to NetworkBehavour::new_handler until the connection is established. Not only does this delete a lot of code, it also makes several APIs simpler:

NetworkBehaviours can track state for future connections within themselves (indexed by PeerId) and pass it into the ConnectionHandler upon construction. This removes the need for passing along a handler in NetworkBehaviourAction::Dial.
Removing Handler from NetworkBehaviourAction::Dial also avoids the need to pass the handler back into the behaviour in inject_dial_failure.
By making NetworkBehaviour::new_handler fallible, NetworkBehaviours can implement almost arbitrary connection management policies by denying the construction of a ConnectionHandler for a newly established connection.

Resolves #2824.

Links to any relevant issues

Open Questions

Change checklist

I have performed a self-review of my own code
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
A changelog entry has been made in the appropriate crates

protocols/dcutr/src/behaviour.rs

rkuhn · 2022-11-13T12:24:36Z

The answer also depends on what inject_dial_failure shall mean, and to whom. Currently it is called upon any failure for all composed behaviours, but that won’t work anymore if every behaviour then expects its own payload — the dial was caused by one behaviour, with only one specific payload. So the composed behaviour needs to dispatch the failure to only exactly that sub-behaviour which requested the dial. While this makes some amount of sense, it is also weird, because dial failures are not specific to a particular sub-behaviour, their scope is the target peer.

rkuhn · 2022-11-13T12:26:16Z

BTW: Swarm::dial could also be kept in this case, and it could carry its own payload that is returned to the swarm poller upon dial failure — and none of the composed behaviours are informed in this case.

thomaseizinger · 2022-11-13T21:05:40Z

The answer also depends on what inject_dial_failure shall mean, and to whom. Currently it is called upon any failure for all composed behaviours, but that won’t work anymore if every behaviour then expects its own payload — the dial was caused by one behaviour, with only one specific payload. So the composed behaviour needs to dispatch the failure to only exactly that sub-behaviour which requested the dial. While this makes some amount of sense, it is also weird, because dial failures are not specific to a particular sub-behaviour, their scope is the target peer.

That is right! This is more tricky than I thought. If I wanted to inject a DialPayload on every on_new_outbound call, I'd have to make that one up for all other behaviours that are composed because I need a handler for every behaviour.

rkuhn · 2022-11-14T08:12:51Z

My primary worry right now is not the structure of IntoConnectionHandler, so I don’t have good inputs there, but the proposed DialSource enum sounds very useful to me!

thomaseizinger · 2022-11-14T08:37:19Z

I just had another idea which I think is much better:

The reason for passing a handler along with Dial was to enable users to directly inject a message into the new connection.

Previously, that required really complex tracking of state in the behaviour because of the IntoConnectionHandler abstraction and the delayed access to PeerId.

If we remove this abstraction, it is really easy for NetworkBehaviours to track initial state for a new connection: Just make a HashMap indexed by PeerId and pick out the data whenever new_outbound_handler gets called!

That is functionally equivalent to what we have today and makes several APIs a lot simpler.

thomaseizinger · 2022-11-15T01:10:32Z

swarm/src/lib.rs

+        let supported_protocols = todo!();
+
+        // let supported_protocols = self
+        //     .behaviour
+        //     .new_handler()
+        //     .inbound_protocol()
+        //     .protocol_info()
+        //     .into_iter()
+        //     .map(|info| info.protocol_name().to_vec())
+        //     .collect();


This turns out to be difficult.

@mxinden I remember you saying that PollParameters was always something that didn't appeal to you. Did you ever think about alternative design? This might be a good time to bring them up :)

I am now initialising these to an empty vector and they are updated every time a new connection is made.

@mxinden I remember you saying that PollParameters was always something that didn't appeal to you.

Yes, very much. I would like to not have PollParameters.

I am now initialising these to an empty vector and they are updated every time a new connection is made.

In my eyes that is the better behavior.

@mxinden I remember you saying that PollParameters was always something that didn't appeal to you.

Yes, very much. I would like to not have PollParameters.

Tracking this here: #3124

thomaseizinger · 2022-11-15T08:19:31Z

swarm/src/behaviour.rs

    ///
-    /// # Example carrying state in the handler


Should we replace this example? Now that new_handler gives you a PeerId, I think it is quite easy to realize that you can store data for a future connection in the behaviour and just pass it into the handler here.

NetworkBehaviours can track state for future connections within themselves (indexed by PeerId) and pass it into the ConnectionHandler upon construction. This removes the need for passing along a handler in NetworkBehaviourAction::Dial.

Say there are two NetworkBehaviour implementations, each requesting a new connection to a new peer. In such case, when NetworkBehaviour::inject_connection_established is called, neither of them knows whether this new connection corresponds to their dial request.

We could still allow the user to provide user data via DialOpts. For the case of Swarm::dial and NetworkBehaviour::inject_dial_failure we could wrap this in an Option.

I am not sure whether we need to design for the above race condition. Your proposal might be just fine. What do folks think?

In ipfs-embed I validate advertised peer addresses by dialling them — and closing such connections upon success. This is only reliably possible if I can detect whether the connection resulted from my own dialling attempt. OTOH, another cause may have resulted in dialling that same peer with that same address for other reasons, which would then possibly have said “nah, attempt is already underway”. But all of this would happen in addition to existing connections, so it might not be a huge problem.

Do you need to actively close them? Assuming reasonable keep_alive implementations of your ConnectionHandlers, the connection should get closed automatically if it is not in use.

OTOH, another cause may have resulted in dialling that same peer with that same address for other reasons, which would then possibly have said “nah, attempt is already underway”.

Can this be expressed with PeerCondition::NotDialing?

Note that new_handler also gives you access to ConnectedPoint. So in addition to storing data in your behaviour based on PeerId, you could also index it by the dialed address. new_handler being called for an address you wanted to validate IS the validation that a connection was made to this address. If you have a dedicated AddressValidationBehaviour, that behaviour could then straight up deny that connection which instantly closes it again.

thomaseizinger · 2022-12-06T09:56:19Z

TODO:

Always box denied reason to avoid generating an enum
Figure out how to manage pending connections

…nied` reason" This reverts commit 19348e9.

thomaseizinger · 2022-12-08T04:39:09Z

Here is a proposal regarding pending connection management:

Remove addresses_of_peer function

Add:

fn new_outgoing_connection(Option<PeerId>, &[Multiaddr]) -> Result<Vec<Multiaddr>, Box<dyn std::error::Error + Send + 'static>>

fn new_incoming_connection(local_addr: Multiaddr, send_back_addr: Multiaddr) -> Result<(), Box<dyn std::error::Error + Send + 'static>>

The new_outgoing_connection replaces the addresses_of_peer function. It is given the PeerId - if known - and the addresses we already have that we will try to dial. It can decline the dial by returning an error or provide even more addresses to use.

The new_incoming_connection is given the local and remote's address. It doesn't return anything usually as there is no information needed to upgrade a connection.

I am reasonably happy with this design. We are only extending the NetworkBehaviour by one function, all three new_ functions follow a similar naming pattern and function signature.

Thoughts @mxinden?

mergify · 2022-12-09T14:46:15Z

This pull request has merge conflicts. Could you please resolve them @thomaseizinger? 🙏

elenaf9

Very much in favor of this! Did not do a full review yet.

elenaf9 · 2022-12-10T19:41:45Z

swarm/src/behaviour.rs

+pub struct ConnectionClosed<'a, Handler> {
    pub peer_id: PeerId,
    pub connection_id: ConnectionId,
    pub endpoint: &'a ConnectedPoint,
-    pub handler: <Handler as IntoConnectionHandler>::Handler,
+    pub handler: Handler,


Out of scope for this PR, but I do wonder if we really need the handler field on ConnectionClosed.
The handler property was added in #2191 together with adding it to Dial, DialFailure and ListenFailure. If I understand it correctly, the main motivation behind it was:

it allows one to attach state to a dial request, thus not having to keep track of it within a NetworkBehaviour implementation

Per #3099 (comment) attaching a state to a dial is not needed anymore. What's the use-case of still returning the handler in ConnectionClosed? As far as I can tell, none of our behaviours ever use it.

Motivation behind the question is that it would be great if we could get rid of the duplication between SwarmEvent and FromSwarm.
Until now the main difference was that FromSwarm also contains the handler, which should not be exposed in the SwarmEvent¹: #2423 (comment). If we could remove it here, we might be able to do something like:

enum SwarmEvent<TBehaviourOutEvent> { Behaviour(TBehaviourOutEvent), FromSwarm(FromSwarmEvent) }

Footnotes

There are also some other differences; FromSwarm contains ConnectionIds, and the NewExternalAddr and ExpiredExternalAddr variants. But I don't see see a drawback in exposing them to the user. ↩

Related: #3046

thomaseizinger · 2022-12-10T20:37:28Z

Any ideas on how to make this less breaking would be gladly appreciated! :)

thomaseizinger · 2022-12-10T20:41:57Z

Here is a proposal regarding pending connection management:
Remove addresses_of_peer function
Add:
fn new_outgoing_connection(Option<PeerId>, &[Multiaddr]) -> Result<Vec<Multiaddr>, Box<dyn std::error::Error + Send + 'static>>
fn new_incoming_connection(local_addr: Multiaddr, send_back_addr: Multiaddr) -> Result<(), Box<dyn std::error::Error + Send + 'static>>
The new_outgoing_connection replaces the addresses_of_peer function. It is given the PeerId - if known - and the addresses we already have that we will try to dial. It can decline the dial by returning an error or provide even more addresses to use.

The new_incoming_connection is given the local and remote's address. It doesn't return anything usually as there is no information needed to upgrade a connection.

I am reasonably happy with this design. We are only extending the NetworkBehaviour by one function, all three new_ functions follow a similar naming pattern and function signature.

Thoughts @mxinden?

One slight variation on this: new_handler could now be called new_established_connection to be more consistent with the two others. We could also split it into new_established_inbound_connection and new_established_outbound_connection. It is a bit odd at the moment that for pending connections, the function is split by inbound and outbound but for the established one it is not.

thomaseizinger · 2022-12-12T09:27:16Z

I think we can limit the breaking changes of this PR to removing the handler from Dial. Everything else should be possible to do in a non-breaking way.

mxinden · 2022-12-12T12:16:30Z

swarm/CHANGELOG.md

@@ -96,6 +97,10 @@

 - Update `rust-version` to reflect the actual MSRV: 1.62.0. See [PR 3090].

+- Remove `IntoConnectionHandler` abstraction and change the signature of `NetworkBehaviour::new_handler` to accept `PeerId` and `ConnectedPoint`.


What do you think of a small before and after code snippet to help folks upgrade?

mxinden · 2022-12-12T12:17:03Z

swarm/src/behaviour.rs

-    fn new_handler(&mut self) -> Self::ConnectionHandler;
+    fn new_handler(
+        &mut self,
+        peer: &PeerId,


Suggested change

peer: &PeerId,

peer: PeerId,

Why provide a reference to a Copy type?

mxinden · 2022-12-12T12:22:26Z

One slight variation on this: new_handler could now be called new_established_connection to be more consistent with the two others.

Sounds good to me.

We could also split it into new_established_inbound_connection and new_established_outbound_connection. It is a bit odd at the moment that for pending connections, the function is split by inbound and outbound but for the established one it is not.

Do I understand correctly that both methods would have the same signature and that we don't foresee the two signatures to diverge any time soon? If so, I suggest not splitting it. While nice for the sake of consistency, I don't think it justifies the additional complexity.

mxinden · 2022-12-12T12:27:56Z

protocols/relay/src/v2/client.rs

@@ -96,6 +99,8 @@ pub struct Client {
    /// connection.
    directly_connected_peers: HashMap<PeerId, Vec<ConnectionId>>,

+    initial_events: HashMap<PeerId, handler::In>,


Suggested change

initial_events: HashMap<PeerId, handler::In>,

/// [`handler::In`] event to be provided to a handler in `NetworkBehaviour::new_handler` once the corresponding connection is established.

initial_events: HashMap<PeerId, handler::In>,

What do you think?

mxinden · 2022-12-12T12:33:31Z

protocols/relay/src/v2/client.rs

+            if let Some(event) = self.initial_events.remove(peer) {
+                #[allow(deprecated)]
+                handler.inject_event(event)
+            }


Injecting events through two ways, i.e. (1) at creation and (2) regularly during the lifetime of the handler via Swarm, seems inconsistent to me. That said, I can not think of a better way. I suggest keeping as is.

We could hold the event back here and instead listen for the ConnectionEstablished event and then emit an event later?

mxinden · 2022-12-12T12:43:32Z

protocols/relay/src/v2/client.rs

+                log::debug!(
+                    "Established relayed instead of direct connection to {:?}, \
+                     dropping initial in event {:?}.",
+                    peer,
+                    event


How do we know that there is no second direct connection in progress of establishment to the given peer? How do we know that this event was predetermined for this connection and not another connection still being established?

This is rather complex. I rewrote this comment 4 times. I might be missing something.

We can index the HashMap by the dialed address instead of the peer ID. That should fix things I guess?

mxinden · 2022-12-12T12:58:23Z

protocols/relay/src/v2/client.rs

+                        self.initial_events
+                            .insert(relay_peer_id, handler::In::Reserve { to_listener });
+
                        NetworkBehaviourAction::Dial {
                            opts: DialOpts::peer_id(relay_peer_id)
                                .addresses(vec![relay_addr])
                                .extend_addresses_through_behaviour()


Note that some NetworkBehaviour might add a relayed address here.

The relay protocol should never be spoken on top of a relayed connection. We should never do nested relaying as otherwise one could build circular relayed connections that could be misused for a DOS attack. Thus returning a dummy::ConnectionHandler on relayed connections is critical.

That said, we might be establishing both a relayed and a direct connection to a given peer in parallel. How do we know which one the initial_events event was destined for?

Not sure how to prevent this. The solution you do above, namely to drop the initial_events event in case it turns out to be a relayed connection covers the case where there is only a single connection attempt in progress. What about the case where both (relayed and direct) are in progress?

I only 1-to-1 ported the code that already existed, as far as I know, no functionality should currently change in this PR.

thomaseizinger · 2022-12-12T20:42:25Z

We could also split it into new_established_inbound_connection and new_established_outbound_connection. It is a bit odd at the moment that for pending connections, the function is split by inbound and outbound but for the established one it is not.

Do I understand correctly that both methods would have the same signature and that we don't foresee the two signatures to diverge any time soon? If so, I suggest not splitting it. While nice for the sake of consistency, I don't think it justifies the additional complexity.

If split, I would inline the variants of the ConnectedPoint enum:

rust-libp2p/core/src/connection.rs

Line 116 in f8f19ba

pub enum ConnectedPoint {

So no, the signatures would not be the same.

divagant-martian · 2022-12-12T22:08:09Z

Sorry for the delay. As far as lighthouse is concerned, this would work really well in particular having the ConnectedPoint, since most of the time we decide based on percentage of inbound and outbound connections. Regarding the possibly generic error, I see why it would be a nice to have, but at least in our case that can be directly handled when creating the handler and wouldn't really need the error reported by the swarm.

Also of course, thanks for all these efforts!

thomaseizinger · 2022-12-14T06:43:48Z

@mxinden: Instead of a new DialId or DialToken, we could also create the ConnectionId as part of the dial attempt. The Pool would later allocate one anyway so we could actually directly use that instead. This gives us one identifier for the entire connection from the moment a dial is issued up until the connection gets closed.

thomaseizinger · 2022-12-14T07:01:36Z

@mxinden: Instead of a new DialId or DialToken, we could also create the ConnectionId as part of the dial attempt. The Pool would later allocate one anyway so we could actually directly use that instead. This gives us one identifier for the entire connection from the moment a dial is issued up until the connection gets closed.

I think with that idea, we might be able to ship the removal of Dial::Handler as a separate patch. That would be nice.

thomaseizinger · 2022-12-19T05:53:09Z

@mxinden: Instead of a new DialId or DialToken, we could also create the ConnectionId as part of the dial attempt. The Pool would later allocate one anyway so we could actually directly use that instead. This gives us one identifier for the entire connection from the moment a dial is issued up until the connection gets closed.

I think with that idea, we might be able to ship the removal of Dial::Handler as a separate patch. That would be nice.

Turns out we can't but overall the version in #3254 is a lot less breaking which is nice :)

thomaseizinger added 3 commits November 9, 2022 12:45

Introduce NetworkBehaviour::DialPayload

d501019

WIP Migrate production code

77c85f6

Deprecate new_handler

118ffb6

thomaseizinger requested a review from mxinden November 9, 2022 02:11

thomaseizinger commented Nov 9, 2022

View reviewed changes

protocols/dcutr/src/behaviour.rs Outdated Show resolved Hide resolved

This comment was marked as outdated.

Sign in to view

thomaseizinger mentioned this pull request Nov 9, 2022

swarm/: Support generic connection management #2824

Closed

This comment was marked as outdated.

Sign in to view

WIP: Completely remove IntoConnectionHandler

377c351

thomaseizinger changed the title ~~WIP: Introduce NetworkBehaviour::DialPayload~~ refactor!(swarm): Remove IntoConnectionHandler abstraction Nov 14, 2022

thomaseizinger changed the title ~~refactor!(swarm): Remove IntoConnectionHandler abstraction~~ refactor(swarm)!: Remove IntoConnectionHandler abstraction Nov 15, 2022

thomaseizinger commented Nov 15, 2022

View reviewed changes

thomaseizinger added 7 commits November 15, 2022 13:27

Set supported protocols upon connection establishment

9972eb4

Remove TODOs

fb9cdbc

Fix bad boolean logic

a96780d

Merge branch 'master' into 2824-remove-into-connection-handler

3c9d98e

Fix gossipsub tests

900edef

Fix clippy warning

d1eea3a

Update docs

98bf927

thomaseizinger commented Nov 15, 2022

View reviewed changes

Reduce diff

1cef941

thomaseizinger marked this pull request as ready for review November 15, 2022 08:28

This comment was marked as outdated.

Sign in to view

thomaseizinger mentioned this pull request Dec 1, 2022

Removing the handler field from FromSwarm::ConnectionClosed in favor of ConnectionHandler::poll_close #3046

Closed

Revert "Allow each NetworkBehaviour to have their own `ConnectionDe…

a269e9d

…nied` reason" This reverts commit 19348e9.

thomaseizinger force-pushed the 2824-remove-into-connection-handler branch from eed846b to 7369cc1 Compare December 7, 2022 01:53

Always box cause for denied connection

dcb4f96

thomaseizinger force-pushed the 2824-remove-into-connection-handler branch from 7369cc1 to dcb4f96 Compare December 7, 2022 01:54

elenaf9 reviewed Dec 10, 2022

View reviewed changes

thomaseizinger mentioned this pull request Dec 12, 2022

swarm/: Expose time to establish connection #2745

Closed

mxinden reviewed Dec 12, 2022

View reviewed changes

This was referenced Dec 16, 2022

swarm/: Support generic connection management through NetworkBehaviour #2828

Closed

refactor(swarm): express dial logic linearly #3253

Merged

feat(swarm)!: allow NetworkBehaviours to manage connections #3254

Merged

thomaseizinger closed this Dec 19, 2022

thomaseizinger mentioned this pull request Dec 22, 2022

refactor(swarm)!: don't be generic over Transport #3272

Merged

4 tasks

mxinden mentioned this pull request Jan 13, 2023

feat(p2p): always connected peers n0-computer/beetle#256

Open

thomaseizinger mentioned this pull request Jan 16, 2023

refactor!: Move ConnectionId and PendingPoint to libp2p-swarm #3221

Closed

2 tasks

thomaseizinger deleted the 2824-remove-into-connection-handler branch February 24, 2023 14:48

		@@ -96,6 +97,10 @@

		- Update `rust-version` to reflect the actual MSRV: 1.62.0. See [PR 3090].

		- Remove `IntoConnectionHandler` abstraction and change the signature of `NetworkBehaviour::new_handler` to accept `PeerId` and `ConnectedPoint`.

	initial_events: HashMap<PeerId, handler::In>,
	/// [`handler::In`] event to be provided to a handler in `NetworkBehaviour::new_handler` once the corresponding connection is established.
	initial_events: HashMap<PeerId, handler::In>,

feat(swarm)!: Allow NetworkBehaviours to manage incoming connections #3099

feat(swarm)!: Allow NetworkBehaviours to manage incoming connections #3099

Conversation

thomaseizinger commented Nov 9, 2022 • edited

Description

Links to any relevant issues

Open Questions

Change checklist

This comment was marked as outdated.

This comment was marked as outdated.

rkuhn commented Nov 13, 2022

rkuhn commented Nov 13, 2022

thomaseizinger commented Nov 13, 2022

This comment was marked as outdated.

This comment was marked as outdated.

rkuhn commented Nov 14, 2022

thomaseizinger commented Nov 14, 2022

This comment was marked as outdated.

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thomaseizinger Nov 23, 2022 • edited

Choose a reason for hiding this comment

This comment was marked as outdated.

thomaseizinger commented Dec 6, 2022 • edited

thomaseizinger commented Dec 8, 2022

mergify bot commented Dec 9, 2022

elenaf9 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Footnotes

Choose a reason for hiding this comment

thomaseizinger commented Dec 10, 2022

thomaseizinger commented Dec 10, 2022

thomaseizinger commented Dec 12, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mxinden commented Dec 12, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thomaseizinger commented Dec 12, 2022

divagant-martian commented Dec 12, 2022

thomaseizinger commented Dec 14, 2022

thomaseizinger commented Dec 14, 2022

thomaseizinger commented Dec 19, 2022

feat(swarm)!: Allow `NetworkBehaviour`s to manage incoming connections #3099

feat(swarm)!: Allow `NetworkBehaviour`s to manage incoming connections #3099

thomaseizinger commented Nov 9, 2022 •

edited

thomaseizinger Nov 23, 2022 •

edited

thomaseizinger commented Dec 6, 2022 •

edited