Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

network: Detect early that NotificationOutSubstream was closed by the remote #13396

Merged
merged 5 commits into from
Feb 17, 2023

Conversation

dmitry-markin
Copy link
Contributor

@dmitry-markin dmitry-markin commented Feb 16, 2023

Detect that NotificationOutSubstream was closed by the remote even if we don't write data into it.
Fixes paritytech/polkadot-sdk#526.

Description

In Notifications we use two duplex substreams for outgoing and incoming notifications. After the initial handshake, each substream becames a unidirectional stream of data. Unfortunately, Sink::poll_flush called on libp2p socket does not produce an error if the substream was dropped on the remote until we write some data into it. In order to overcome this limitation, this PR adds polling of the out_substream for incoming messages (which should be none) in <NotificationOutSubstream as Sink>::poll_flush to detect that the stream has terminated (was dropped/closed by the remote).

It may be worth filing a bug to libp2p to expose that the stream was dropped (reset) by the remote via Sink::poll_flush even if no data was written into it before the flushing.

@dmitry-markin dmitry-markin added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit labels Feb 16, 2023
Copy link
Contributor

@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@altonen altonen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test case showing that it detects the closed substream correctly without writing to it would be nice

Co-authored-by: Anton <anton.kalyaev@gmail.com>
@bkchr
Copy link
Member

bkchr commented Feb 16, 2023

It may be worth filing a bug to libp2p to expose that the stream was dropped (reset) by the remote via Sink::poll_flush even if no data was written into it before the flushing.

Sounds like a good idea!

@dmitry-markin
Copy link
Contributor Author

Test case showing that it detects the closed substream correctly without writing to it would be nice

Done! As a bonus we got a test for poll_process() :)

@dmitry-markin
Copy link
Contributor Author

It may be worth filing a bug to libp2p to expose that the stream was dropped (reset) by the remote via Sink::poll_flush even if no data was written into it before the flushing.

Sounds like a good idea!

Actually, I observed the same behavior on a raw tokio socket, so probably libp2p just mimics sockets with its substreams.

@dmitry-markin
Copy link
Contributor Author

It may be worth filing a bug to libp2p to expose that the stream was dropped (reset) by the remote via Sink::poll_flush even if no data was written into it before the flushing.

Sounds like a good idea!

Actually, I observed the same behavior on a raw tokio socket, so probably libp2p just mimics sockets with its substreams.

To be honest, I'm not quite sure now that extending poll_flush() API to detect the stream termination was the best idea — may be we should add a different method poll_terminated() and call it on all substreams from Notifications?

CC @altonen

@altonen
Copy link
Contributor

altonen commented Feb 17, 2023

When and where would you call poll_terminated()? I'm in favor of having this functionality either way but I feel as though having a separate call for just polling termination is not the right way to go, unless I missed something. Having said that, there's an underlying deeper problem both with the handshakes and block request handling that we need to address ASAP, which hopefully will alleviate these problems we have with peering mismatches and syncing.

@dmitry-markin
Copy link
Contributor Author

When and where would you call poll_terminated()? I'm in favor of having this functionality either way but I feel as though having a separate call for just polling termination is not the right way to go, unless I missed something. Having said that, there's an underlying deeper problem both with the handshakes and block request handling that we need to address ASAP, which hopefully will alleviate these problems we have with peering mismatches and syncing.

Okay, I will just add a comment to Notifications that poll_flush in NotificationsOutSubstream carries information about substream termination, so we need to call poll_flush on all substreams, and not only on those we have written some data to (this is already the case).

@dmitry-markin dmitry-markin merged commit dfa654d into master Feb 17, 2023
@dmitry-markin dmitry-markin deleted the dm-peering-mismatch branch February 17, 2023 10:09
dmitry-markin added a commit that referenced this pull request Feb 17, 2023
dmitry-markin added a commit that referenced this pull request Feb 17, 2023
rossbulat pushed a commit that referenced this pull request Feb 19, 2023
rossbulat pushed a commit that referenced this pull request Feb 19, 2023
rossbulat pushed a commit that referenced this pull request Feb 19, 2023
… was closed by the remote (#13396)" (#13409)"

This reverts commit 2075262.
rossbulat pushed a commit that referenced this pull request Feb 19, 2023
ltfschoen pushed a commit to ltfschoen/substrate that referenced this pull request Feb 22, 2023
ltfschoen pushed a commit to ltfschoen/substrate that referenced this pull request Feb 22, 2023
ark0f pushed a commit to gear-tech/substrate that referenced this pull request Feb 27, 2023
ark0f pushed a commit to gear-tech/substrate that referenced this pull request Feb 27, 2023
Ank4n pushed a commit that referenced this pull request Feb 28, 2023
Ank4n pushed a commit that referenced this pull request Feb 28, 2023
ukint-vs pushed a commit to gear-tech/substrate that referenced this pull request Apr 10, 2023
ukint-vs pushed a commit to gear-tech/substrate that referenced this pull request Apr 10, 2023
nathanwhit pushed a commit to nathanwhit/substrate that referenced this pull request Jul 19, 2023
nathanwhit pushed a commit to nathanwhit/substrate that referenced this pull request Jul 19, 2023
github-merge-queue bot pushed a commit to paritytech/polkadot-sdk that referenced this pull request Apr 9, 2024
…3983)

This PR brings the fix
paritytech/substrate#13396 to polkadot-sdk.

In the past, due to insufficient inbound slot count on polkadot &
kusama, this fix led to low peer count. The situation has improved since
then after changing the default ratio between `--in-peers` &
`--out-peers`.

Nevertheless, it's expected that the reported total peer count with this
fix is going to be lower than without it. This should be seen as the
correct number of working connections reported, as opposed to also
reporting already closed connections, and not as lower count of working
connections with peers.

This PR also removes the peer eviction mechanism, as closed substream
detection is a more granular way of detecting peers that stopped syncing
with us.

The burn-in has been already performed as part of testing these changes
in #3426.

---------

Co-authored-by: Aaro Altonen <a.altonen@hotmail.com>
github-merge-queue bot pushed a commit to paritytech/polkadot-sdk that referenced this pull request Apr 9, 2024
…3983)

This PR brings the fix
paritytech/substrate#13396 to polkadot-sdk.

In the past, due to insufficient inbound slot count on polkadot &
kusama, this fix led to low peer count. The situation has improved since
then after changing the default ratio between `--in-peers` &
`--out-peers`.

Nevertheless, it's expected that the reported total peer count with this
fix is going to be lower than without it. This should be seen as the
correct number of working connections reported, as opposed to also
reporting already closed connections, and not as lower count of working
connections with peers.

This PR also removes the peer eviction mechanism, as closed substream
detection is a more granular way of detecting peers that stopped syncing
with us.

The burn-in has been already performed as part of testing these changes
in #3426.

---------

Co-authored-by: Aaro Altonen <a.altonen@hotmail.com>
girazoki pushed a commit to moondance-labs/polkadot-sdk that referenced this pull request Apr 23, 2024
…aritytech#3983)

This PR brings the fix
paritytech/substrate#13396 to polkadot-sdk.

In the past, due to insufficient inbound slot count on polkadot &
kusama, this fix led to low peer count. The situation has improved since
then after changing the default ratio between `--in-peers` &
`--out-peers`.

Nevertheless, it's expected that the reported total peer count with this
fix is going to be lower than without it. This should be seen as the
correct number of working connections reported, as opposed to also
reporting already closed connections, and not as lower count of working
connections with peers.

This PR also removes the peer eviction mechanism, as closed substream
detection is a more granular way of detecting peers that stopped syncing
with us.

The burn-in has been already performed as part of testing these changes
in paritytech#3426.

---------

Co-authored-by: Aaro Altonen <a.altonen@hotmail.com>
girazoki pushed a commit to moondance-labs/polkadot-sdk that referenced this pull request Apr 23, 2024
…aritytech#3983)

This PR brings the fix
paritytech/substrate#13396 to polkadot-sdk.

In the past, due to insufficient inbound slot count on polkadot &
kusama, this fix led to low peer count. The situation has improved since
then after changing the default ratio between `--in-peers` &
`--out-peers`.

Nevertheless, it's expected that the reported total peer count with this
fix is going to be lower than without it. This should be seen as the
correct number of working connections reported, as opposed to also
reporting already closed connections, and not as lower count of working
connections with peers.

This PR also removes the peer eviction mechanism, as closed substream
detection is a more granular way of detecting peers that stopped syncing
with us.

The burn-in has been already performed as part of testing these changes
in paritytech#3426.

---------

Co-authored-by: Aaro Altonen <a.altonen@hotmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Peering mismatch
4 participants