Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure inject_dial_failure is called #1549

Merged
merged 9 commits into from Apr 17, 2020
Merged

Conversation

tomaka
Copy link
Member

@tomaka tomaka commented Apr 17, 2020

Context: paritytech/substrate#5684

There is an assumption right now in Substrate's code (and probably other) that, after we call dial or emit a DialPeer, the inject_dial_failure method is called if we know that we won't be able to reach the given node.

This is not the case right now in the situation where addresses_of_peer returns an empty list. In that situation, we ask the Swarm to connect to a node, and don't get any event whatsoever about that node ever again.

This PR modifies dial to emit the event if necessary, and simplifies the handling of DialPeer to rely more on the behaviour of dial.

In generally, I wish it was precisely documented and explained what happens in which situation, and that it is possible in any situation to track the state of each connection attempt, but that's a later change.
While I think it's be reasonable to ask @romanb to handle and review that, he's unfortunately on vacation right now.

@tomaka tomaka requested review from twittner and mxinden April 17, 2020 14:30
@tomaka
Copy link
Member Author

tomaka commented Apr 17, 2020

I've reverted a change in the last commit because it makes the libp2p-kad tests fail.

To me the change is correct: if we emit a DialPeer with a condition that evaluates to false, we should emit an inject_dial_failure.
While I think the change is correct, an even better change would be to make this non-ambiguous in the future.

I'm testing whether this fixes paritytech/substrate#5684.

Copy link
Contributor

@twittner twittner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

swarm/src/lib.rs Outdated Show resolved Hide resolved
@tomaka
Copy link
Member Author

tomaka commented Apr 17, 2020

To me the change is correct: if we emit a DialPeer with a condition that evaluates to false, we should emit an inject_dial_failure.
While I think the change is correct, an even better change would be to make this non-ambiguous in the future.

After investigating a bit, I was mistaken. The problem is that my change would call inject_dial_failure even if we're already connected to that node.
I pushed another commit that does it properly.

Co-Authored-By: Toralf Wittner <tw@dtex.org>
@romanb
Copy link
Contributor

romanb commented Apr 17, 2020

After investigating a bit, I was mistaken. The problem is that my change would call inject_dial_failure even if we're already connected to that node.

Why would that not be correct? I was actually aware of the current behaviour described in this PR w.r.t. inject_dial_failure (See the second paragraph of the PR description of #1506), but I thought this was actually pre-existing behaviour, i.e. predating #1440 - am I mistaken and I introduced this behaviour? I was quite sure that e.g. inject_dial_failure was never called when DialPeer was emitted with the local peer ID in any earlier libp2p version, but I may have missed something.

swarm/src/lib.rs Outdated
},
Peer::Connected(_) => {},
Peer::Disconnected(..) | Peer::Local => {
this.behaviour.inject_dial_failure(&peer_id);
Copy link
Contributor

@romanb romanb Apr 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really not sure about calling inject_dial_failure when the dialing condition was not met. I would only expect a call to inject_dial_failure if my behaviour emitted DialPeer and the dialing condition was actually met, since by specifying the condition I'm saying under which conditions I actually want to dial and wouldn't expect to be informed about a dial failure if I didn't actually intend to dial as per my condition. Admittedly, as I wrote in #1506, if a behaviour actually wants to know if the dialing condition is not met because it would otherwise still keep some state, another callback may be needed then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it would be ok if we clearly document that inject_dial_failure is called also if the dialing condition is not met, i.e. such that every DialPeer is guaranteed to be paired with inject_connection_established or inject_dial_failure, but then there should be no exception for when the peer is already connected, again because the behaviour may otherwise keep some state around.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can revert that.

@tomaka
Copy link
Member Author

tomaka commented Apr 17, 2020

am I mistaken and I introduced this behaviour? I was quite sure that e.g. inject_dial_failure was never called when DialPeer was emitted with the local peer ID, but I may have missed something.

I don't know exactly when this got changed, but in libp2p 0.16.2, when addresses_of_peer returned an empty list, we would indeed call inject_dial_failure:

For the local peer ID, this has indeed never been the case, but it makes sense to me for it to fail as well.

This reverts commit 80c0d3d.
@romanb
Copy link
Contributor

romanb commented Apr 17, 2020

am I mistaken and I introduced this behaviour? I was quite sure that e.g. inject_dial_failure was never called when DialPeer was emitted with the local peer ID, but I may have missed something.

I don't know when, but in libp2p 0.16.2, when addresses_of_peer returned an empty list, we would indeed call inject_dial_failure: [..]

I see, thanks for digging that out. This was a regression then, my apologies.

@tomaka
Copy link
Member Author

tomaka commented Apr 17, 2020

No worries! Is that PR good for you?

@tomaka tomaka merged commit 87b5efb into libp2p:master Apr 17, 2020
@tomaka tomaka deleted the fix-dial branch April 17, 2020 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants