Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

swarm/src/lib: Continue polling network when behaviour is blocked #2304

Merged
merged 4 commits into from Oct 26, 2021

Conversation

mxinden
Copy link
Member

@mxinden mxinden commented Oct 20, 2021

With #2248 a connection task
awaits sending an event to the behaviour before polling for new events
from the behaviour 1.

When Swarm::poll is unable to deliver an event to a connection task it
returns Poll::Pending even though (a) polling Swarm::network might be
able to make progress (network_not_ready being false) and (b) it
does not register a waker to be woken up 2.

In combination this can lead to a deadlock where a connection task waits
to send an event to the behaviour and Swarm::poll returns
Poll::Pending failing to send an event to the connection task, not
registering a waiker in order to be polled again.

With this commit Swarm::poll will only return Poll::Pending, when
failing to deliver an event to a connection task, if the network is
unable to make progress (i.e. network_not_ready being true).

In the long-run Swarm::poll should likely be redesigned, prioritizing
the behaviour over the network, given the former is the control plane
and the latter potentially yields new work from the outside.

With libp2p#2248 a connection task
`await`s sending an event to the behaviour before polling for new events
from the behaviour [1].

When `Swarm::poll` is unable to deliver an event to a connection task it
returns `Poll::Pending` even though (a) polling `Swarm::network` might be
able to make progress (`network_not_ready` being `false`) and (b) it
does not register a waker to be woken up [2].

In combination this can lead to a deadlock where a connection task waits
to send an event to the behaviour and `Swarm::poll` returns
`Poll::Pending` failing to send an event to the connection task, not
registering a waiker in order to be polled again.

With this commit `Swarm::poll` will only return `Poll::Pending`, when
failing to deliver an event to a connection task, if the network is
unable to make progress (i.e. `network_not_ready` being `true`).

In the long-run `Swarm::poll` should likely be redesigned, prioritizing
the behaviour over the network, given the former is the control plane
and the latter potentially yields new work from the outside.

[1]: https://github.com/libp2p/rust-libp2p/blob/ca1b7cf043b4264c69b19fe75de488330a7a1f2f/core/src/connection/pool/task.rs#L224-L232
[2]: https://github.com/libp2p/rust-libp2p/blob/ca1b7cf043b4264c69b19fe75de488330a7a1f2f/swarm/src/lib.rs#L756-L783
@mxinden
Copy link
Member Author

mxinden commented Oct 20, 2021

@AgeManning could you validate whether this fixes the stalls you see on #2290?

@AgeManning
Copy link
Contributor

This corrects the stalls I have been seeing on #2290 🎉

@mxinden mxinden marked this pull request as ready for review October 22, 2021 09:55
@mxinden
Copy link
Member Author

mxinden commented Oct 26, 2021

I am sorry for the delay here. I was sick for a week. I hope to be able to cut a new release candidate tomorrow.

@mxinden mxinden merged commit 4ef5738 into libp2p:master Oct 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants