[FLIP] Network API change #1306

Merged: 4 commits merged from smnzhu/network-api-flip into master on Oct 7, 2021
Conversation

Contributor

@synzhu commented Sep 17, 2021

This assumption is no longer true today. The access node, for example, may communicate with multiple (unstaked) consensus followers. It's conceivable that in the future there will be even more cases where communication with unstaked parties may happen (for example, execution nodes talking to the DPS).

Currently, a [`Message`](https://github.com/onflow/flow-go/blob/698c77460bc33d1a8ee8a154f7fe4877bc518a02/network/message/message.proto) which is sent over the network contains many unnecessary fields which can be deduced by the receiver of the message. The only exceptions to this are the `Payload` field (which contains the actual message data) and the `TargetIDs` field (which is used by `Multicast`).
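
For illustration, here is a minimal Go sketch of the direction described in this paragraph; the struct below is an assumption based on the prose above, not the actual `message.proto` definition.

```go
// Sketch only: a trimmed-down wire message in the spirit of the FLIP.
// Everything except the payload (and the TargetIDs needed by Multicast)
// is assumed to be deducible by the receiver from the transport layer.
package message

// Message is illustrative; it is not the real network/message/message.proto.
type Message struct {
	// Payload carries the encoded application-level message data.
	Payload []byte
	// TargetIDs lists the intended recipients of a Multicast, so that a
	// receiver can tell whether the message is addressed to it.
	TargetIDs [][]byte
}
```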

Contributor

Does this mean excluding the OriginID from the message? If that is the case, I would be concerned about breaking the self-certifying feature of messages.

Contributor Author

Please see my response to that comment.

Contributor

A couple of things on top of that comment:

  • Besides the libp2p-based message signing, which is specifically a network-level whole-message signature by the source of the message, we do not have protocol-level message signing today.
  • Unless and until we mandate whole-message signatures, which we currently do not, messages are not self-certifying through any feature of the protocol.
  • libp2p, however, does mandate this and does provide whole-message self-certification.
  • We also have legitimate places where we sign components of a message (e.g. block proposal signatures from the block proposal emitter, or votes for a block within a QC), generally using the staking key, and not always from the emitter (e.g. a QC). I do not believe that's because the staking keys are more authoritative.

Beyond the fact that I fail to see what additional guarantee protocol-level whole-message signing would bring us:

  • Adding message signatures (and verifying them) using the networking key, on top of what already exists in libp2p, is an order of magnitude more expensive in both CPU and bandwidth than any correspondence we may compute between a PeerID and a FlowID.
  • Adding message signatures (and verifying them) using the staking key is an order of magnitude more expensive in CPU than doing the same thing with the networking key.

Contributor

Summary of my discussion with @yhassanzadeh13:

  • I think we're on the same page on this, but the FLIP, as it is, fails to capture a nuance. Let me try to explain:
  • There is, and should remain, a space in the application for messages whose application-level sender is not the application-level creator (and application-level signer) of the claimed artefact (including, but not limited to, votes on a block retransmitted by the next view's leader).
  • The FLIP should outline that this application-level indication of origination is legitimate, but should be handled within the application's message payload.
  • Indeed, the point of the FLIP is that we're not removing an already-used claim to application-level origin.
  • The fact of the matter is, we're suggesting that in the following:
libp2p.msg { From, msg.Payload { OriginID, msg { <application-level protobuf> }, targetID? }, SigningKey, Signature }

From, for a valid message, is always equal (modulo format conversion) to OriginID. Hence the OriginID information is redundant.

  • But the use case for application-level origination is still valid. So we should also explain that, when we need one, we should (and do) encapsulate it in the application data, in this way:
libp2p.msg { From, msg.Payload { msg { <application-level part 1> { OriginID, sub-msg, OriginSignature }, <application-level part 2> }, targetID? }, SigningKey, Signature }
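
To make the second layout concrete, here is a hypothetical Go sketch of keeping application-level origination inside the payload (e.g. votes retransmitted by the next view's leader); all type and field names here are illustrative, not flow-go's actual ones.

```go
// Sketch only: application-level origination carried inside the payload,
// independent of whichever peer relays the message on the network.
package sketch

// Identifier and Signature are stand-ins for flow-go's real types.
type Identifier [32]byte
type Signature []byte

// Vote is an example of "application-level part 1": the artefact keeps its
// own creator and signature, regardless of who forwards it.
type Vote struct {
	OriginID  Identifier // the node that created and signed the vote
	BlockID   Identifier
	View      uint64
	OriginSig Signature // signature by OriginID over the vote contents
}

// Proposal is an example of an application message whose network-level sender
// (the next view's leader) is not the creator of the embedded votes.
type Proposal struct {
	Block []byte
	Votes []Vote // each vote remains attributable to its own origin
}
```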

Contributor

Thanks for summarizing the discussion, @huitseeker; it perfectly addresses the concern I initially raised.

### Motivations
- Having a strict separation between the public and private networks provides better safety by preventing messages from unintentionally crossing between the two networks, and makes it easier to implement mechanisms for message prioritization / rate-limiting on staked nodes which participate in both.
- Passing `Context`s gives the network layer the ability to cancel the processing of a network message. This can be leveraged to implement [timeouts](https://pkg.go.dev/context#WithTimeout), but may also be useful for other situations. For example, if the network layer becomes aware that a certain peer has become unreachable, it can cancel the processing of any sync requests from that peer.
- Since existing calls to `Multicast` only target 3 peers, changing the implementation to use direct connections instead of gossip will reduce traffic on the network and make it more efficient.
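
As a concrete illustration of the last bullet, here is a minimal Go sketch of driving `Multicast` over direct one-to-one sends; the `Unicaster` interface and `Identifier` type are assumptions for the sketch, not flow-go's actual API.

```go
// Sketch only: Multicast as a loop of direct sends instead of gossip.
package sketch

import (
	"context"
	"errors"
)

// Identifier stands in for flow.Identifier; Unicaster is a hypothetical
// interface with a one-to-one send primitive in the spirit of Unicast.
type Identifier [32]byte

type Unicaster interface {
	Unicast(ctx context.Context, payload []byte, target Identifier) error
}

// Multicast delivers the payload to each target over a direct connection
// instead of publishing it to the whole topic.
func Multicast(ctx context.Context, u Unicaster, payload []byte, targets ...Identifier) error {
	var errs error
	for _, t := range targets {
		if err := u.Unicast(ctx, payload, t); err != nil {
			errs = errors.Join(errs, err) // collect failures (errors.Join needs Go 1.20+)
		}
	}
	return errs
}
```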
Contributor

Based on benchmarking we have done, using pubsub instead of multicast not only does not degrade performance, it also reduces the number of goroutines and the memory usage of nodes by at least 15%. Unicast is an expensive operation on the underlying network, since it requires establishing a TCP connection to the receiver if one does not already exist. Establishing and maintaining such connections seems more expensive than running topic validators. Also note that the type of connections and their lifecycle are quite different in libp2p unicast vs. pubsub: the former uses streams (TCP-based in our case) and delegates lifecycle management to the developer, while the latter relies on RPC with the lifecycle managed by libp2p. Furthermore, as we move toward BFT, we should limit the attack surface as well as the number of connections one node can make to another node, especially on the unstaked network.

Contributor Author

I'm not sure what size of network the benchmarking was done with, but the message overhead of pubsub versus direct connections scales with the number of nodes on the network, so the larger the network, the more of an issue this becomes.

Imagine a network with 10,000 nodes. Under the current implementation, if one of those nodes sends a multicast targeting three other nodes, all 10,000 nodes will need to receive and process that message, and roughly 9,997 of them will drop it after it reaches the application-level message validator (note we cannot use libp2p topic validators for this), while only 3 of them actually process it. This seems very unnecessary.

> Furthermore, as we move toward BFT, we should limit the attack surface as well as the number of connections one node can make to another node, especially on the unstaked network.

This will need to be dealt with anyway, whether or not we change the multicast implementation. As long as there are valid scenarios where Unicast is used, we will need a mechanism to rate-limit these connections.

Contributor

@yhassanzadeh13 Are there any artefacts from this benchmark? Where could we read about the data you collected?

Contributor

@huitseeker yes, please find the report and data here; the links in the report will lead you to Grafana dashboards containing the experiment data.

Contributor

Your argument is valid @smnzhu, and by the way, the experimentation was on the order of ~150 nodes. When both scalability and security are a concern, maybe we need to revisit the solution with the following problem definition:

  • Currently, we drive multicast on top of pubsub, which causes every node subscribed to the topic to receive the message. In practice, however, we multicast to a limited number of nodes (~3); this results both in bandwidth overhead, as non-targeted nodes receive the message only to drop it, and in an attack surface for malicious peers to flood the system.

  • Driving a multicast over independent unicasts seems to resolve the above problem; however, it has its own downsides: see this and this benchmarking. So my point is that solving the multicast issue by switching to unicast may lead us into bigger performance and security problems. Moreover, whichever way we drive it, we should distinguish between unicast and pubsub delivery based on the size of the target ID set; otherwise, one multicast may cause hundreds of unicasts in the underlying network (a sketch of such a dispatch follows below).
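
A minimal Go sketch of the size-based dispatch mentioned in the last bullet; the `Unicaster`/`Publisher` interfaces and the threshold value are assumptions for illustration, not existing flow-go code.

```go
// Sketch only: choose delivery path based on the size of the target set.
package sketch

import "context"

// Identifier stands in for flow.Identifier; both interfaces are hypothetical.
type Identifier [32]byte

type Unicaster interface {
	Unicast(ctx context.Context, payload []byte, target Identifier) error
}

type Publisher interface {
	Publish(ctx context.Context, topic string, payload []byte) error
}

// unicastThreshold is an illustrative cut-off, not a tuned value.
const unicastThreshold = 5

// Send uses direct unicasts for small target sets and falls back to pubsub
// when a multicast would otherwise fan out into many unicasts.
func Send(ctx context.Context, uni Unicaster, pub Publisher, topic string, payload []byte, targets []Identifier) error {
	if len(targets) <= unicastThreshold {
		for _, t := range targets {
			if err := uni.Unicast(ctx, payload, t); err != nil {
				return err
			}
		}
		return nil
	}
	// Large fan-out: let gossip handle delivery; receivers filter on TargetIDs.
	return pub.Publish(ctx, topic, payload)
}
```

Whether 3, 5, or some other number is the right cut-off would have to come out of benchmarking like the one referenced above.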

Contributor

@jwinkler2083233 @simonhf
I'd love to have your input on the above: this FLIP is suggesting a change in the implementation of multicast messages (messages destined for a plurality of recipients) from a pubsub gossip message to several unicast deliveries.

The gist of the benchmarks above is that this change would cost us in terms of resources (connections, memory and goroutines). What do you think of our ability to:

  • attempt that change? Do we have some margin for maneuver today?
  • track the performance impact of this change?

More generally, do you have alternative suggestions? @JeffreyDoyle I know you were suggesting QUIC in https://github.com/dapperlabs/flow-go/issues/5708, where QUIC has real multicast delivery. Any direction we should explore there?

Contributor

@huitseeker thanks for the notification!

> Unicast is an expensive operation on the underlying network since it requires establishing a TCP connection to the receiver if one does not already exist. Establishing and maintaining such connections seems more expensive than running topic validators.

I agree with this comment in general for a naive implementation; however, if the implementation focuses on the following aspects, then I believe it is possible to reap the benefits and make it the least expensive option:

  • Pre-establish all TCP connections and only re-open them when absolutely necessary.
  • Do not tie one or more goroutines to each TCP connection.
  • Pre-allocate all the main data structures associated with sending and receiving over a TCP connection.

In this scenario, the number of goroutines stays the same whether there are 1,000, 10,000, or 1M TCP connections. There is no overhead for establishing or closing a TCP connection during node operation (though there will be stragglers due to network issues and bugs), because all TCP connections are pre-established at startup or before use. If all lower-level data structures per TCP connection (e.g. input and output buffers to read packets into) are pre-allocated, then only a fixed bump in RAM usage will be noticed at node start, with no extra GC pressure and no dependence on the number of TCP connections.

In a previous job (around 2008) I worked on a TCP socket server daemon which used these techniques, with data structures for up to 1M TCP connections pre-created at daemon startup, and which used the Linux kernel's epoll mechanism to decouple TCP connections from threads. Although we don't need anywhere near 1M TCP connections, this serves as proof that such a model works and is technically possible. Back then a Linux kernel could handle 1M concurrent open TCP connections without too much fuss, but it could only establish about 30k new TCP connections per second regardless of RAM and CPUs, due to internal kernel locks, etc. I would expect these limits to have improved considerably since then, and numbers like 30k are likely well above anything we need anyway :-)

But how to implement such a model in Golang and/or libp2p? This would require more research... There seems to be no technical reason why Golang cannot achieve the same result. However, existing packages may not have been designed that way, or, even if flexible enough, may not be used that way. And although Golang is capable of having 1M goroutines, just because it's capable doesn't mean it's a good idea... :-)
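
As a rough illustration of the first and third aspects (avoiding a goroutine per connection in Go would need epoll-style multiplexing beyond this sketch), here is a hypothetical Go sketch of pre-dialing peers at startup with pre-allocated per-connection buffers; the addresses, buffer size, and types are made up for the example.

```go
// Sketch only: pre-established connections with pre-allocated buffers.
package sketch

import (
	"net"
	"time"
)

// peerConn bundles a pre-established connection with pre-allocated buffers,
// so that steady-state sends and receives allocate nothing.
type peerConn struct {
	conn net.Conn
	rbuf []byte
	wbuf []byte
}

// dialAll pre-establishes one TCP connection per peer address at startup;
// failed dials are simply skipped here, though a real node would retry.
func dialAll(addrs []string, bufSize int) map[string]*peerConn {
	pool := make(map[string]*peerConn, len(addrs))
	for _, addr := range addrs {
		c, err := net.DialTimeout("tcp", addr, 5*time.Second)
		if err != nil {
			continue
		}
		pool[addr] = &peerConn{
			conn: c,
			rbuf: make([]byte, bufSize),
			wbuf: make([]byte, bufSize),
		}
	}
	return pool
}
```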

Member

@AlexHentschel left a comment

Looks good.


> While this may seem to break the general rule of not storing `Context`s in structs, storing `Context`s in structs which are being passed like parameters is one of the exceptions to this rule. See [this](https://github.com/golang/go/issues/22602#:~:text=While%20we%27ve%20told,documentation%20and%20examples.) and [this](https://medium.com/@cep21/how-to-correctly-use-context-context-in-go-1-7-8f2c0fafdf39#:~:text=The%20one%20exception%20to%20not%20storing%20a%20context%20is%20when%20you%20need%20to%20put%20it%20in%20a%20struct%20that%20is%20used%20purely%20as%20a%20message%20that%20is%20passed%20across%20a%20channel.%20This%20is%20shown%20in%20the%20example%20below.). The idea is that `Context`s should not be **stored** but should **flow** through the program, which is what they do in this usecase.

When the message is dequeued, the engine should check the `Context` to see whether the message is still valid before processing it. At this point, we will have two distinct `Context`s in scope:
Member

I would suggest we carefully differentiate between message validity and whether a message might be obsolete. Here is my suggestion:

  • Validity is an objective property independent of time. Criteria for message validity include: are all the required fields set? Is the signature valid? etc.
    • For all messages, the protocol generally has rigid criteria for message validity that all have to be satisfied. Invalid messages must be dropped, and the sender potentially slashed (depending on the violated protocol criterion).
  • However, for most messages, the recipient also has a notion of "is it still necessary for me to process this message?". Maybe we could term this notion "obsolete" or "outdated" (open to other suggestions).
    • In contrast to message validity, obsolescence is often defined heuristically. Obsolete messages may be dropped but don't have to be.

Suggested change:
- When the message is dequeued, the engine should check the `Context` to see whether the message is still valid before processing it. At this point, we will have two distinct `Context`s in scope:
+ When the message is dequeued, the engine should check the `Context` to see whether the message might already be obsolete before processing it. At this point, we will have two distinct `Context`s in scope:

Contributor Author

This is a great distinction, thanks :)
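
For concreteness, a minimal Go sketch of the pattern discussed in this thread: the `Context` flows with the queued message and is checked for obsolescence on dequeue, with the engine's own `Context` also in scope. The types and names are illustrative, not flow-go's actual ones.

```go
// Sketch only: a Context that flows with a queued network message.
package sketch

import "context"

// Message is a hypothetical queued network message; the Context travels with
// it so the network layer can cancel processing (e.g. via a timeout).
type Message struct {
	Ctx     context.Context
	Payload []byte
}

// processLoop drains the queue. Two Contexts are in scope: engineCtx governs
// the engine's own lifecycle, while msg.Ctx indicates whether the message has
// become obsolete (timed out, peer unreachable, etc.).
func processLoop(engineCtx context.Context, queue <-chan Message, handle func([]byte)) {
	for {
		select {
		case <-engineCtx.Done():
			return
		case msg, ok := <-queue:
			if !ok {
				return // queue closed
			}
			if msg.Ctx.Err() != nil {
				continue // obsolete: dropped here, though per the discussion it doesn't have to be
			}
			handle(msg.Payload)
		}
	}
}
```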

@codecov-commenter commented Sep 29, 2021

Codecov Report

Merging #1306 (e4bb033) into master (28862e1) will decrease coverage by 0.03%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #1306      +/-   ##
==========================================
- Coverage   55.33%   55.29%   -0.04%     
==========================================
  Files         516      516              
  Lines       32167    32168       +1     
==========================================
- Hits        17798    17787      -11     
- Misses      11979    11993      +14     
+ Partials     2390     2388       -2     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 55.29% <100.00%> (-0.04%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| engine/execution/computation/computer/computer.go | 84.40% <100.00%> (+0.07%) ⬆️ |
| ...sus/approvals/assignment_collector_statemachine.go | 42.30% <0.00%> (-9.62%) ⬇️ |
| ...ngine/common/synchronization/finalized_snapshot.go | 68.75% <0.00%> (-4.17%) ⬇️ |
| admin/command_runner.go | 77.58% <0.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Member

@AlexHentschel left a comment

🎉

Contributor

@huitseeker left a comment

The FLIP has shaped up wonderfully; there's just the matter of the multicast change, where I'd still love to get feedback from @simonhf / @jwinkler2083233.


Contributor

@jwinkler2083233 commented:
> The FLIP has shaped up wonderfully; there's just the matter of the multicast change, where I'd still love to get feedback from @simonhf / @jwinkler2083233.

I haven't seen evidence of true multicasting, with multiple intended recipients per packet. Logically, the difference then comes down to the management of individual connections. If we move from pubsub to more unicast-style management, the onus of managing the connection list shifts to other parts of the code. Ultimately, it gives us more flexibility.
Regarding memory management, resource management, speed of connections, latencies, etc., this libp2p library is what we've been working with, so we have to make the best of it. I suspect that, within a year, operational pressures will mean we have to start hardening the networking layer and use the native C++ version of libp2p to get finer-grained control. We're already seeing strain on the gRPC connections.

Contributor

@huitseeker left a comment

This LGTM, while highlighting the connection-pooling suggestions from @simonhf and the suggestions from @jwinkler2083233, to serve as implementation notes.

Contributor Author

@synzhu commented Oct 7, 2021

bors merge

bors bot commented Oct 7, 2021

@bors bors bot merged commit a099492 into master Oct 7, 2021
@bors bors bot deleted the smnzhu/network-api-flip branch October 7, 2021 02:43