Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Wait for all notifications protocols to be open before reporting opening #6821

Merged
merged 9 commits into from
Aug 23, 2020

Conversation

tomaka
Copy link
Contributor

@tomaka tomaka commented Aug 5, 2020

This is the next step towards #5670

Before this PR:

  • When we establish a connection to a node, we try to open a legacy substream and all the notifications protocols substreams.
  • As soon as the legacy substream is open, we report on the API level that we are "connected" to this node, even when the notifications substreams haven't been opened yet.
  • If the API user tries to send a notification on a notifications protocol substream while it isn't open, this notification is instead sent on the legacy substream. This is done for backwards-compatibility reasons, in the situation where the remote doesn't support notifications substreams.

After this PR:

  • (same as before)
  • We only report the connection on the API level after the legacy substream is open and that all the notifications substreams have been either opened or refused.
  • (same as before)

While this change is quite small, it eliminates the possibility of using the legacy substream for notifications with a remote that does support notifications substreams.

The next step, after this PR, would be to report the connection on the API level after all the notifications substreams have been either opened or refused, no matter whether the legacy substream is open. Then, afterwards, to no longer open a legacy substream.

@tomaka tomaka added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Aug 5, 2020
@tomaka tomaka requested review from romanb and mxinden August 5, 2020 09:27
@tomaka tomaka added the A1-needs_burnin Pull request needs to be tested on a live validator node before merge. DevOps is notified via matrix label Aug 5, 2020
@tomaka
Copy link
Contributor Author

tomaka commented Aug 5, 2020

Err... the notifications_back_pressure test just failed on CI.

@tomaka
Copy link
Contributor Author

tomaka commented Aug 6, 2020

bot rebase

@ghost
Copy link

ghost commented Aug 6, 2020

Rebasing.

@tomaka
Copy link
Contributor Author

tomaka commented Aug 6, 2020

Did the burnin on my node. Nothing unusual or suspicious happened.

@tomaka tomaka removed the A1-needs_burnin Pull request needs to be tested on a live validator node before merge. DevOps is notified via matrix label Aug 6, 2020
@tomaka
Copy link
Contributor Author

tomaka commented Aug 6, 2020

Err... the notifications_back_pressure test just failed on CI.

I've rebased the PR over #6826, which I expect to fix the test.

@romanb
Copy link
Contributor

romanb commented Aug 6, 2020

The next step, after this PR, would be to report the connection on the API level after all the notifications substreams have been either opened or refused, no matter whether the legacy substream is open. Then, afterwards, to no longer open a legacy substream.

At the end, isn't it desirable to report the notification protocols as open/closed individually and independently of each other? Such that the user of the API doesn't even know whether a notification protocol is implemented as a substream or connection. It shouldn't be necessary to have the notion of "reporting a connection open/closed" on the API, should it? (I didn't look at the diff yet but wanted to do that now).

@tomaka
Copy link
Contributor Author

tomaka commented Aug 6, 2020

At the end, isn't it desirable to report the notification protocols as open/closed individually and independently of each other? Such that the user of the API doesn't even know whether a notification protocol is implemented as a substream or connection. It shouldn't be necessary to have the notion of "reporting a connection open/closed" on the API, should it? (I didn't look at the diff yet but wanted to do that now).

It's indeed the objective.

There are unfortunately some questions concerning the handshake.
The legacy substream sends a handshake whose content is reported on the API level. The Role of the peer, in particular, is also reported to other parts of Substrate such as GrandPa.
I've made sure a while ago that we were sending this handshake when opening the block-announces/transactions notifications substream, but I don't know yet how to change the various layers within sc-network to properly account for these changes. We might have to special-case the block-announces/transactions protocol.

@tomaka
Copy link
Contributor Author

tomaka commented Aug 6, 2020

To be clear: the public-facing API of sc-network already hides all the implementation details of notifications substream. A user of the API is only informed of individual susbtreams being open and has no way to know whether there exists a connection with a certain peer.

All the uncertainties and potential refactoring only concerns the internals of sc-network.

@tomaka
Copy link
Contributor Author

tomaka commented Aug 20, 2020

The tricky part of this PR is that we can now be in an "intermediate stage" where some substreams are open, but we haven't notified the upper layers of it yet.

It is important, in this situation, to not poll the open substreams for incoming notifications until we have notified the upper layers, otherwise we have no choice but to discard these incoming notifications.

This is the reason why the tests were failing earlier, and should now be fixed.

@tomaka tomaka mentioned this pull request Aug 20, 2020
Co-authored-by: Max Inden <mail@max-inden.de>
@mxinden
Copy link
Contributor

mxinden commented Aug 20, 2020

For what it is worth while cargo test -p sc-network gossip::tests::basic_works -- --nocapture; do :; done has not timed out the last 15 minutes.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants