Local nodes keep disconnecting. #4272

Closed
arkpar opened this issue Dec 2, 2019 · 16 comments

Labels
I3-bug The node fails to follow expected behavior.

Comments
@arkpar (Member) commented Dec 2, 2019

Two nodes running on localhost keep disconnecting from each other. The disconnect is apparently initiated by libp2p, not by sync or a reputation change. The logs contain no useful information on why the disconnect happened.

Node 1 started as:

polkadot -d /tmp/polkadot/ -lpeerset=trace,sync=trace,sub-libp2p=trace,libp2p=trace --out-peers 0

Node 2 started as:

polkadot -d /tmp/polkadot2 --reserved-nodes /ip4/127.0.0.1/tcp/30333/p2p/QmYiJFSrLQdvWJLLF64RnWGvk92zSjaN49EP9rtNLDbNEo --reserved-only -l peerset=trace,sub-libp2p=trace,libp2p=trace

The nodes stay connected for about 10 seconds before the disconnect and reconnect happens.

  • Nodes should stay connected.
  • Logs should have disconnect reason.
@arkpar arkpar added the I3-bug The node fails to follow expected behavior. label Dec 2, 2019
@tomaka (Contributor) commented Dec 3, 2019

For what it's worth, disabling the discovery mechanism fixes the issue. It's still unclear to me what is happening.

@romanb romanb self-assigned this Jan 9, 2020
@romanb (Contributor) commented Jan 13, 2020

What seems to happen is the following:

  1. Node 2 connects to node 1.
  2. Both nodes also perform periodic (Kusama?) DHT discovery queries at exponentially increasing intervals, starting at a few seconds and capped at 60 seconds.
  3. It appears that local nodes connecting to the (Kusama?) DHT put their local address into the DHT, i.e. /ip4/127.0.0.1/tcp/30333. This is obviously necessary for nodes in the same local network to discover and talk to each other through the DHT, but these are also seen by remote nodes.
  4. As a result, Node 2 will discover a node with a different peer ID from Node 1 but with address /ip4/127.0.0.1/tcp/30333. It will try to connect to it as part of the DHT lookups.
  5. Thus Node 2 will make another connection attempt to Node 1, thinking it is some other node (i.e. expecting a different peer ID this time). It still has the existing connection to Node 1 for the moment.
  6. Node 1 receives the incoming connection from Node 2, replacing the existing connection (i.e. dropping it) due to the single-connection-per-peer policy. The reason(s) for always taking the new connection over the old, even if a node has the role of listener in both connections (i.e. it is not a "simultaneous connect" scenario) are not entirely clear to me (@tomaka?).
  7. Node 2 finishes setting up the new connection accepted by Node 1, but then discovers that the expected peer ID does not match the actual peer ID, so it drops this connection (MITM protection). (Usually, Node 2 then quickly gets a BrokenPipe error on the old connection through the StreamMuxer.)
  8. Node 1 gets a ConnectionReset error on the new connection.

This sequence repeats more or less ad infinitum, since Node 2's discovery keeps encountering peer IDs that differ from Node 1's but carry the address /ip4/127.0.0.1/tcp/30333, which it then tries to connect to during DHT lookups.

If I'm not mistaken, this seems to be an interesting way for a node C to directly influence the connectivity between some nodes A and B, e.g. by advertising public addresses of A in the DHT under its own peer ID. When node B picks these up during lookups, it would disturb the connection between A and B in the above manner. This may be primarily a pitfall of the single-connection-per-node policy. It could possibly be prevented by establishing a preference for the old (existing) connection when a node receives a second connection as a listener while it is already the listener on the existing connection, but I'm not entirely clear about all the possible consequences at the moment.
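As a purely illustrative, self-contained Rust sketch of the interaction just described (made-up types and names, not the actual rust-libp2p code), the two policies at play look roughly like this: the listener's single-connection rule replaces the old connection, and the dialer's MITM check then closes the new one:

use std::collections::HashMap;

// Illustration only: not the actual rust-libp2p types or logic.
type PeerId = &'static str;
type ConnId = u32;

/// Node 1's side: single-connection-per-peer policy where a new incoming
/// connection always replaces (drops) the existing one.
#[derive(Default)]
struct SingleConnPolicy {
    established: HashMap<PeerId, ConnId>,
}

impl SingleConnPolicy {
    /// Returns the connection that gets dropped, if any.
    fn accept_incoming(&mut self, peer: PeerId, new_conn: ConnId) -> Option<ConnId> {
        self.established.insert(peer, new_conn) // old value (if any) is the dropped connection
    }
}

/// Node 2's side: MITM protection. The dialer expected a certain peer ID
/// (taken from the DHT record) and closes the connection on a mismatch.
fn dialer_keeps_connection(expected: PeerId, actual: PeerId) -> bool {
    expected == actual
}

fn main() {
    let mut node1 = SingleConnPolicy::default();

    // Step 1: the initial, wanted connection between Node 2 and Node 1.
    assert_eq!(node1.accept_incoming("node2", 1), None);

    // Steps 4-6: Node 2 dials /ip4/127.0.0.1/tcp/30333 again because the DHT
    // returned that address under some *other* peer ID. Node 1 sees a second
    // incoming connection from Node 2 and, under the policy above, drops the old one.
    let dropped = node1.accept_incoming("node2", 2);
    assert_eq!(dropped, Some(1)); // the healthy connection is gone

    // Step 7: Node 2 notices the peer ID mismatch and closes the new connection too.
    assert!(!dialer_keeps_connection("some-other-peer-id", "node1"));

    // Net result: both connections are gone, and the cycle repeats on the next DHT lookup.
    println!("old connection dropped: {:?}", dropped);
}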

@tomaka (Contributor) commented Jan 13, 2020

The reason(s) for always taking the new connection over the old, even if a node has the role of listener in both connections (i.e. it is not a "simultaneous connect" scenario) are not entirely clear to me (@tomaka?).

The reason is that typically when a node opens a new connection, it's because the old one is dead.

Example situation: Node 1 and Node 2 are connected. Node 2 loses its Internet connection, realizes it, and kills all existing sockets. No FIN is actually being sent because of no Internet access. Node 2 then gains back its Internet connection and tries to re-connect to Node 1. Node 1 isn't aware that the previous connection is dead.
In this situation, the new connection is the right choice.

@tomaka (Contributor) commented Jan 13, 2020

In my opinion the solution is to handle multiple simultaneous connections per node (libp2p/rust-libp2p#912).
There have already been several tricky issues caused by the decision of enforcing a unique connection, proving that it probably wasn't a good idea.

@arkpar (Member, Author) commented Jan 14, 2020

Node 1 isn't aware that the previous connection is dead.

Doesn't libp2p have keep-alive or ping protocol to handle this?

@tomaka (Contributor) commented Jan 14, 2020

Doesn't libp2p have keep-alive or ping protocol to handle this?

It does, but it takes something like 30 seconds to trigger.

I'm not actually sure that my scenario above is realistic, but the general idea is that we expect that when a node opens a second connection it is because the existing one is unusable.

@romanb (Contributor) commented Jan 16, 2020

Doesn't libp2p have keep-alive or ping protocol to handle this?

It does, but it takes something like 30 seconds to trigger.

I'm not actually sure that my scenario above is realistic, but the general idea is that we expect that when a node opens a second connection it is because the existing one is unusable.

Note though that even when permitting multiple connections per peer, which I'm currently looking into, you will want a configurable limit (per peer). In a sense, the current single-connection-per-peer policy can be seen as a hard-coded limit of 1. Whatever the limit, I don't think it is a good idea to enforce it by dropping existing connections in favor of new ones at the lower networking layers. Rather, timely detection of broken connections is up to the application protocols (or to timeouts configured on a lower-layer protocol), and it is the protocol's requirements that define what exactly "timely" is supposed to mean. The ping protocol can be aptly configured and used for this purpose, if desired.
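As a rough, self-contained sketch of what such a configurable per-peer limit could look like (made-up names, not the libp2p API), with the current behaviour corresponding to a limit of 1 and the new connection being refused rather than the old one being dropped:

use std::collections::HashMap;

type PeerId = String;
type ConnId = u64;

/// Hypothetical pool enforcing a configurable number of established
/// connections per peer. New connections beyond the limit are refused;
/// existing connections are never dropped to make room.
struct Pool {
    limit_per_peer: usize,
    established: HashMap<PeerId, Vec<ConnId>>,
}

impl Pool {
    fn new(limit_per_peer: usize) -> Self {
        Self { limit_per_peer, established: HashMap::new() }
    }

    /// Returns `true` if the new connection is admitted.
    fn try_add(&mut self, peer: PeerId, conn: ConnId) -> bool {
        let conns = self.established.entry(peer).or_default();
        if conns.len() >= self.limit_per_peer {
            false // refuse the new connection; keep the existing ones
        } else {
            conns.push(conn);
            true
        }
    }
}

fn main() {
    // A limit of 1 mirrors the old single-connection-per-peer policy,
    // except that the *new* connection is the one turned away.
    let mut pool = Pool::new(1);
    assert!(pool.try_add("node2".into(), 1));
    assert!(!pool.try_add("node2".into(), 2));
    println!("second connection refused, first one kept");
}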

romanb pushed a commit to romanb/rust-libp2p that referenced this issue Feb 4, 2020
Instead of trying to enforce a single connection per peer,
which involves quite a bit of additional complexity e.g.
to prioritise simultaneously opened connections and can
have other undesirable consequences [1], we now
make multiple connections per peer a feature.

The gist of these changes is as follows:

The concept of a "node" with an implicit 1-1 correspondence
to a connection has been replaced with the "first-class"
concept of a "connection". The code from `src/nodes` has moved
(with varying degrees of modification) to `src/connection`.
A `HandledNode` has become a `Connection`, a `NodeHandler` a
`ConnectionHandler`, the `CollectionStream` was the basis for
the new `connection::Pool`, and so forth.

Conceptually, a `Network` contains a `connection::Pool` which
in turn internally employs the `connection::Manager` for
handling the background `connection::manager::Task`s, one
per connection, as before. These are all considered implementation
details. On the public API, `Peer`s are managed as before through
the `Network`, except now the API has changed with the shift of focus
to (potentially multiple) connections per peer. The `NetworkEvent`s have
accordingly also undergone changes.

The Swarm APIs remain largely unchanged, except for the fact that
`inject_replaced` is no longer called. It may now practically happen
that multiple `ProtocolsHandler`s are associated with a single
`NetworkBehaviour`, one per connection. If implementations of
`NetworkBehaviour` rely somehow on communicating with exactly
one `ProtocolsHandler`, this may cause issues, but it is unlikely.

[1]: paritytech/substrate#4272
romanb pushed a commit to romanb/rust-libp2p that referenced this issue Feb 4, 2020
romanb pushed a commit to romanb/rust-libp2p that referenced this issue Feb 6, 2020
romanb pushed a commit to romanb/rust-libp2p that referenced this issue Feb 7, 2020
@arkpar (Member, Author) commented Feb 7, 2020

How exactly will allowing multiple connections solve this issue? This is a fairly straightforward scenario, where none of the peers misbehaves or loses connectivity. Why would we want multiple connections here?

As a result, Node 2 will discover a node with a different peer ID from Node 1 but with address /ip4/127.0.0.1/tcp/30333. It will try to connect to it as part of the DHT lookups.

It seems to me that it should not attempt dialing an address that's already connected in the first place.

@romanb (Contributor) commented Feb 8, 2020

How exactly will allowing multiple connections solve this issue? This is a fairly straightforward scenario, where none of the peers misbehaves or loses connectivity. Why would we want multiple connections here?

As I explained in an earlier comment, the immediate cause of this issue is that the "listener" closes its existing connection, preferring the new connection over the old in an attempt to enforce a single connection per peer (the "dialer" then, upon discovering the peer ID mismatch, closes the new connection as well, and the connect/disconnect dance begins). While my first reaction was that it doesn't seem right for new connections to be preferred over old ones in this scheme, and I'd rather swap that around, @tomaka had some concerns about doing that. In any case, removing the single-connection-per-peer policy is a strictly more general and desirable solution, not just in light of this issue. In this particular scenario the "listener" then no longer has to choose between the connections: the old connection remains unaffected, and the "dialer" eventually closes its new connection attempt upon discovering the peer ID mismatch.

As a result, Node 2 will discover a node with a different peer ID from Node 1 but with address /ip4/127.0.0.1/tcp/30333. It will try to connect to it as part of the DHT lookups.

It seems to me that it should not attempt dialing an address that's already connected in the first place.

In general and at the level of libp2p, I don't think it is desirable to disallow multiple connections to the same address. Of course - and I think that is what you are referring to - in the context of a specific protocol, like Kademlia, one could argue that connections should be uniquely identified by such an address. However, the (logical) overlay network of Kademlia only operates opaquely on a uniformly distributed keyspace, which also contains the node / peer IDs. Kademlia uniquely identifies peers only by these IDs; the addresses to connect to are a secondary implementation artifact. In the scenario here, Kademlia sees a different peer ID for the same address, i.e. another peer that supposedly also has that address (among others, possibly). While you could argue that it should disregard the peer ID in this case, seeing that it already has a connection to the same address, even though with a different peer ID, I'm really not sure this is a good idea. If multiple peers are seen advertising the same address, who is to decide which one is "right", i.e. which to connect to, and which connection to keep and which to ignore?
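A toy illustration of that last point (made-up names, not the libp2p-kad API): the routing information is keyed by peer ID, so two distinct peer IDs advertising the same address are simply two entries, and nothing at this layer says which one is "right":

use std::collections::{HashMap, HashSet};

type PeerId = &'static str;
type Multiaddr = &'static str;

fn main() {
    // Toy model: Kademlia identifies peers by peer ID; addresses are just
    // attached metadata. Two distinct peer IDs can therefore both advertise
    // /ip4/127.0.0.1/tcp/30333, and a lookup treats them as two different
    // peers to dial, even though that address is already connected.
    let mut routing_table: HashMap<PeerId, HashSet<Multiaddr>> = HashMap::new();

    routing_table.entry("QmNode1...").or_default().insert("/ip4/127.0.0.1/tcp/30333");
    routing_table.entry("QmOther...").or_default().insert("/ip4/127.0.0.1/tcp/30333");

    // Both entries are kept: nothing here decides which advertisement is "right".
    let advertisers: Vec<_> = routing_table
        .iter()
        .filter(|(_, addrs)| addrs.contains("/ip4/127.0.0.1/tcp/30333"))
        .map(|(peer, _)| *peer)
        .collect();
    println!("peers advertising the same address: {:?}", advertisers);
}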

@arkpar (Member, Author) commented Feb 10, 2020

Example situation: Node 1 and Node 2 are connected. Node 2 loses its Internet connection, realizes it, and kills all existing sockets. No FIN is actually being sent because of no Internet access. Node 2 then gains back its Internet connection and tries to re-connect to Node 1. Node 1 isn't aware that the previous connection is dead.
In this situation, the new connection is the right choice.

I'd argue that the new connection should not replace the existing one. Dead connections will eventually be dropped because we have keep-alive or ping protocols. Waiting 30 seconds to restore connectivity is fine for substrate.

If multiple peers are seen advertising the same address, who is to decide which one is "right", i.e. which to connect to, and which connection to keep and which to ignore?

You don't drop existing connections. Otherwise it's an attack vector.

I'd like to clarify that the "multiple connections" being discussed are meant to support connections to the same address with different node IDs. Multiple connections to the same address/node_id still won't be allowed, right?

Regarding multiple connections to the same node ID, we probably don't want that in substrate/polkadot. The proposed use case, "let's allow the second connection because the first one might actually be dead", sounds like a hack. What if the first one never closes after all? It looks like we are struggling with managing connections even now, when duplicates are not allowed. This looks like it will introduce a lot of unneeded complexity for no good reason.

Additionally, could some authentication mechanism be introduced at the Kademlia layer? Devp2p discovery would not propagate unconfirmed addresses. "Confirmed" here means that there was a signed UDP ping/pong exchange with that address first.
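Sketching the idea (purely hypothetical, not an existing libp2p or devp2p API; all names invented): an address learned from discovery would only enter the set that gets advertised after a successful signed ping/pong with that address:

use std::collections::HashSet;

type Multiaddr = String;

/// Illustrative address book that only propagates addresses which have been
/// "confirmed", e.g. by a successful signed ping/pong exchange with that address.
#[derive(Default)]
struct AddressBook {
    unconfirmed: HashSet<Multiaddr>,
    confirmed: HashSet<Multiaddr>,
}

impl AddressBook {
    /// An address learned from discovery starts out unconfirmed.
    fn learn(&mut self, addr: Multiaddr) {
        if !self.confirmed.contains(&addr) {
            self.unconfirmed.insert(addr);
        }
    }

    /// Called once a (hypothetical) signed ping/pong with `addr` succeeded.
    fn mark_confirmed(&mut self, addr: &str) {
        if self.unconfirmed.remove(addr) {
            self.confirmed.insert(addr.to_string());
        }
    }

    /// Only confirmed addresses are handed out to other peers.
    fn advertised(&self) -> impl Iterator<Item = &Multiaddr> {
        self.confirmed.iter()
    }
}

fn main() {
    let mut book = AddressBook::default();
    book.learn("/ip4/127.0.0.1/tcp/30333".to_string());
    assert_eq!(book.advertised().count(), 0); // not propagated yet
    book.mark_confirmed("/ip4/127.0.0.1/tcp/30333");
    assert_eq!(book.advertised().count(), 1); // propagated only after confirmation
    println!("confirmed addresses: {:?}", book.advertised().collect::<Vec<_>>());
}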

@romanb (Contributor) commented Feb 10, 2020

Example situation: Node 1 and Node 2 are connected. Node 2 loses its Internet connection, realizes it, and kills all existing sockets. No FIN is actually being sent because of no Internet access. Node 2 then gains back its Internet connection and tries to re-connect to Node 1. Node 1 isn't aware that the previous connection is dead.
In this situation, the new connection is the right choice.

I'd argue that the new connection should not replace the existing one. Dead connections will eventually be dropped because we have keep-alive or ping protocols. Waiting 30 seconds to restore connectivity is fine for substrate.

I agree, hence my first reaction was to change that, as I mentioned at the end of my first comment. The same thing (i.e. not dropping the existing connection) also happens with libp2p-core permitting multiple connections.

If multiple peers are seen advertising the same address, who is to decide which one is "right", i.e. which to connect to, and which connection to keep and which to ignore?

You don't drop existing connections. Otherwise it's an attack vector.

Sure, I hinted at the same thing at the end of my first comment. I think we are on the same page here.

I'd like to clarify that the "multiple connections" being discussed are meant to support connections to the same address with different node IDs. Multiple connections to the same address/node_id still won't be allowed, right?

I see no reason for a general-purpose networking library like libp2p to disallow that, so yes, that will be allowed.

Regarding multiple connections to the same node ID, we probably don't want that in substrate/polkadot. The proposed use case, "let's allow the second connection because the first one might actually be dead", sounds like a hack. What if the first one never closes after all? It looks like we are struggling with managing connections even now, when duplicates are not allowed. This looks like it will introduce a lot of unneeded complexity for no good reason.

It's fine if substrate/polkadot do not intentionally make use of multiple connections per peer. Indeed, in libp2p/rust-libp2p#1440 even libp2p-swarm retains these semantics. Nevertheless, two peers may connect to each other "simultaneously" and that is the part where trying to enforce a single connection per peer at all times adds complexity that is removed in libp2p/rust-libp2p#1440. If even a temporary second connection is undesirable for a specific application protocol, it is up to that protocol to decide which connection to close and when.
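A tiny, self-contained sketch of that last sentence (invented names, not Substrate's actual code): the application keeps one primary connection per peer and simply closes any temporary extra connection the networking layer reports:

use std::collections::HashMap;

type PeerId = &'static str;
type ConnId = u64;

/// Hypothetical application-level bookkeeping: the networking layer may report
/// several established connections per peer (e.g. after a simultaneous dial),
/// and it is the application protocol that decides which one to keep.
#[derive(Default)]
struct Protocol {
    primary: HashMap<PeerId, ConnId>,
}

impl Protocol {
    /// Returns a connection to close, if this peer already has a primary one.
    fn on_connection_established(&mut self, peer: PeerId, conn: ConnId) -> Option<ConnId> {
        if self.primary.contains_key(&peer) {
            Some(conn) // keep the existing connection, close the new one
        } else {
            self.primary.insert(peer, conn);
            None
        }
    }
}

fn main() {
    let mut proto = Protocol::default();
    assert_eq!(proto.on_connection_established("peer-a", 1), None);
    // A simultaneous dial produced a second connection; the protocol closes it
    // instead of the networking layer dropping the old one.
    assert_eq!(proto.on_connection_established("peer-a", 2), Some(2));
    println!("redundant connection scheduled for close");
}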

Additionally, could some authentication mechanism be introduced at the Kademlia layer? Devp2p discovery would not propagate unconfirmed addresses. "Confirmed" here means that there was a signed UDP ping/pong exchange with that address first.

That would need to be laid out in more detail for me to make an informed comment. In general, I have expressed a desire in the past to allow better curation of Kademlia's k-buckets through the public API offered by libp2p-kad, i.e. to provide more control over which peers and addresses are in the routing table (and thus advertised to others) at any time. That may or may not already be sufficient to implement such a use case. There is also some related work proposed in libp2p/rust-libp2p#1352, though that is primarily a means of prioritizing entries in already full k-buckets.

romanb pushed a commit to romanb/rust-libp2p that referenced this issue Feb 13, 2020
@ghost commented Feb 23, 2020

We are seeing this behavior in our private network as well, due to IP address reuse among the nodes. Here's a way of reproducing it using Docker:

  1. First create an internal docker network for this:
docker network create \
  --internal \
  --subnet 172.19.0.0/16 \
  --opt "com.docker.network.bridge.name=substrate" \
  substrate
  2. Start nodes alice, bob, and charlie:
docker run --rm --name alice --network substrate --ip 172.19.1.1 \
  parity/substrate:2.0.0-646e7fb \
  --chain local \
  --validator \
  --node-key 0000000000000000000000000000000000000000000000000000000000000001 \
  --alice \
  --no-mdns
docker run --rm --name bob --network substrate --ip 172.19.1.2 \
  parity/substrate:2.0.0-646e7fb \
  --chain local \
  --validator \
  --node-key 0000000000000000000000000000000000000000000000000000000000000002 \
  --bob \
  --no-mdns \
  --bootnodes /ip4/172.19.1.1/tcp/30333/p2p/QmRpheLN4JWdAnY7HGJfWFNbfkQCb6tFf4vvA6hgjMZKrR
docker run --rm --name charlie --network substrate --ip 172.19.1.3 \
  parity/substrate:2.0.0-646e7fb \
  --chain local \
  --validator \
  --node-key 0000000000000000000000000000000000000000000000000000000000000003 \
  --charlie \
  --no-mdns \
  --bootnodes /ip4/172.19.1.1/tcp/30333/p2p/QmRpheLN4JWdAnY7HGJfWFNbfkQCb6tFf4vvA6hgjMZKrR
  3. Ctrl-C to stop bob and charlie.

  4. Start charlie again, but this time using bob's old IP address (172.19.1.2):

docker run --rm --name charlie --network substrate --ip 172.19.1.2 \
  parity/substrate:2.0.0-646e7fb \
  --chain local \
  --validator \
  --node-key 0000000000000000000000000000000000000000000000000000000000000003 \
  --charlie \
  --no-mdns \
  --bootnodes /ip4/172.19.1.1/tcp/30333/p2p/QmRpheLN4JWdAnY7HGJfWFNbfkQCb6tFf4vvA6hgjMZKrR

Then charlie will repeatedly try to connect to alice; the connection stays up for a very short moment and then gets dropped.

And alice will repeatedly try to find bob at 172.19.1.2, but every time it sees charlie's peer ID instead, so the connection gets dropped. This is a problem on its own, too, as bob will never be at that IP again.

tomaka added a commit to libp2p/rust-libp2p that referenced this issue Mar 4, 2020
* Allow multiple connections per peer in libp2p-core.
@romanb romanb mentioned this issue Mar 17, 2020
@tomaka (Contributor) commented Apr 9, 2020

This should have been fixed by #5278, although I haven't verified that it actually is.

@arkpar (Member, Author) commented Apr 9, 2020

It is now much worse:

2020-04-09 11:55:41.460 main-tokio- TRACE sync  Connecting QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ
2020-04-09 11:55:41.460 main-tokio- TRACE sync  New peer QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ Status { version: 6, min_supported_version: 3, roles: FULL, best_number: 1181553, best_hash: 0x06ba23f8e56cd2b99ef5998bb5eab05bd6ed81afc76699b9f313abcaf59e92d1, genesis_hash: 0xb0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe, chain_status: [] }
2020-04-09 11:55:41.460 main-tokio- DEBUG sync  Connected QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ
2020-04-09 11:55:45.808 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:55:45.824 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:55:51.164 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:55:51.264 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:55:53.20 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:55:53.30 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:55:53.148 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:55:56.144 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:55:56.233 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:56:03.729 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:56:03.872 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:56:03.904 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:56:05.132 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected

The peer is reported as disconnected immediately after connecting. After that, the TCP connection stays open and block requests from that peer still come through, so the other side still considers the connection to be active.

Also, there are multiple disconnect notifications.

Update: Apparently this is resolved by #5595

@romanb romanb removed their assignment May 7, 2020
@h4x3rotab (Contributor) commented

Does this affect nodes behind NAT? E.g. if we have two nodes running behind a NAT router, they share the same public IP address, but of course with different port numbers and peer IDs. Is there any reference for how the DHT stores the peer ID?

@tomaka (Contributor) commented May 3, 2021

Closing as stale and probably resolved.

@tomaka tomaka closed this as completed May 3, 2021
santos227 pushed a commit to santos227/rustlib that referenced this issue Jun 20, 2022