# [FLIP] Network API change #1306

Merged 4 commits on Oct 7, 2021
`flips/network-api.md` (93 additions, 0 deletions)
# Network Layer API (Core Protocol)

| Status         | Proposed                                                   |
| :------------- | :--------------------------------------------------------- |
| **FLIP #**     | [1306](https://github.com/onflow/flow-go/pull/1306)        |
| **Author(s)**  | Simon Zhu (simon.zhu@dapperlabs.com)                        |
| **Sponsor**    | Simon Zhu (simon.zhu@dapperlabs.com)                        |
| **Updated**    | 9/16/2021                                                   |

## Objective

Refactor the networking layer to split it into separate APIs for the public and private networks, allowing us to enforce a strict separation in the code between these two networks.

Enable registering a custom message ID function for the gossip layer.

## Current Implementation

When the network layer receives a message, it will pass the message to the [`Engine`](https://github.com/onflow/flow-go/blob/7763000ba5724bb03f522380e513b784b4597d46/network/engine.go) registered on
the corresponding channel by [calling the engine's `Process` method](https://github.com/onflow/flow-go/blob/d31fd63eb651ed9faf0f677e9934baef6c4d9792/network/p2p/network.go#L406), passing it the Flow ID of the message sender.
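
For reference, a condensed sketch of the current interface described above (paraphrased from the linked `engine.go`; only the two methods relevant to this FLIP are shown, and the exact method set on the real interface may differ):

```golang
type Engine interface {
    // Submit queues an event originating from originID for asynchronous processing.
    Submit(channel Channel, originID flow.Identifier, event interface{})
    // Process handles an event originating from originID synchronously and returns any error.
    Process(channel Channel, originID flow.Identifier, event interface{}) error
}
```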

[`Multicast`](https://github.com/onflow/flow-go/blob/4ddc17d1bee25c2ab12ceabcf814b702980fdebe/network/conduit.go#L82) is implemented by including a [`TargetIDs`](https://github.com/onflow/flow-go/blob/4ddc17d1bee25c2ab12ceabcf814b702980fdebe/network/message/message.proto#L12) field inside the message, which is published to a specific topic on the underlying gossip network. Upon receiving a new message on the gossip network, nodes must first [validate](https://github.com/onflow/flow-go/blob/4ddc17d1bee25c2ab12ceabcf814b702980fdebe/network/validator/targetValiator.go) that they are one of the intended recipients of the message before processing it.
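
For illustration, a simplified sketch of the receive-side check performed by the linked target validator (in the real implementation the target IDs arrive as raw bytes in the protobuf message; they are shown here as `flow.Identifier`s for brevity):

```golang
package validator

import "github.com/onflow/flow-go/model/flow"

// isTargeted reports whether the local node appears in a received message's
// TargetIDs, i.e. whether it is one of the intended recipients of a Multicast.
func isTargeted(localID flow.Identifier, targetIDs []flow.Identifier) bool {
    for _, id := range targetIDs {
        if id == localID {
            return true
        }
    }
    return false
}
```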

### Potential problems

The current network layer API was designed with the assumption that all messages sent and received either target or originate from staked Flow nodes. This is why an engine's [`Process`](https://github.com/onflow/flow-go/blob/master/network/engine.go#L28) method accepts a Flow ID identifying the message sender, and outgoing messages [must specify Flow ID(s)](https://github.com/onflow/flow-go/blob/master/network/conduit.go#L62) as targets.

This assumption no longer holds today. The Access Node, for example, may communicate with multiple (unstaked) consensus followers, and it is conceivable that in the future there will be even more cases where communication with unstaked parties happens (for example, Execution Nodes talking to DPS).

Currently, a [`Message`](https://github.com/onflow/flow-go/blob/698c77460bc33d1a8ee8a154f7fe4877bc518a02/network/message/message.proto) which is sent over the network contains many fields that are unnecessary, because they can be deduced by the receiver of the message. The only exceptions are the `Payload` field (which contains the actual message data) and the `TargetIDs` field (which is used by `Multicast`).

**Contributor:**

Does this mean excluding the `OriginID` from the message? If that is the case, I would have this concern about breaking the self-certifying feature of messages.

**Contributor Author:**

Please see my response to that comment.

**Contributor:**

A couple of things on top of that comment:

- Besides the libp2p-based message signing, which is specifically a network-level whole-message signature by the source of the message, we do not have protocol-level message signing today.
- Unless and until we mandate whole-message signatures, which we currently do not, the messages are not self-certifying through any feature of the protocol.
- But libp2p does mandate this, and does provide whole-message self-certification.
- We also have legitimate places where we have signatures for components of the message (e.g. block proposal signatures from the block proposal emitter, or votes for a block within a QC), in general using the staking key, and not always from the emitter (e.g. QC). I do not believe that's because the staking keys are more authoritative.

Besides the fact that I fail to understand any additional guarantee that protocol-level whole-message signing would bring us:

- Adding message signatures (and verifying them) using the networking key (on top of what already exists in libp2p) is an order of magnitude more expensive in each of CPU and bandwidth than any correspondence we may compute between a PeerID and a Flow ID.
- Adding message signatures (and verifying them) using the staking key is an order of magnitude more expensive in CPU than doing the same with the networking key.

**Contributor:**

Summary of my discussion with @yhassanzadeh13:

- I think we're on the same page on this, but the FLIP, as it is, fails to capture a nuance. Let me try to explain:
- There is, and should remain, a space in the application for messages whose application-level sender is not the application-level creator (and application-level signer) of the claimed artefact (including, but not limited to, votes on a block when retransmitted by the next view's leader).
- The FLIP should outline that this application-level indication of origination is legitimate, but should be handled within the application's message payload.
- Indeed, the point of the FLIP is that we're not removing an already-used claim to application-level origin.
- The fact of the matter is that we're suggesting that, in the following:

  `libp2p.msg { From, msg.Payload { OriginID, msg { <application-level protobuf> }, targetID? }, SigningKey, Signature }`

  `From`, for a valid message, is always equal (modulo format conversion) to `OriginID`. Hence the `OriginID` information is redundant.
- But the use case for application-level origination is still valid. So we should also explain that, should we need one, we should (and do) encapsulate it in the application data, in this way:

  `libp2p.msg { From, msg.Payload { msg { <application-level part 1> { OriginID, sub-msg, OriginSignature }, <application-level part 2> }, targetID? }, SigningKey, Signature }`

**Contributor:**


Thanks for summarizing the discussions @huitseeker, it perfectly addresses the concern I initially explained.

However, all of the existing calls to `Multicast` only target a very small number of recipients (3, to be exact), which means there is a lot of noise on the network: nodes waste CPU cycles processing messages only to ignore them once they realize they are not among the intended recipients.

## Proposal

We should split the existing network layer API into two distinct APIs / packages for the public and private network, and the `Engine` API should be modified so that the [`Process`](https://github.com/onflow/flow-go/blob/master/network/engine.go#L28) and [`Submit`](https://github.com/onflow/flow-go/blob/master/network/engine.go#L20) methods receive a `Context` as the first argument:

* Private network
```golang
type Engine interface {
    Submit(ctx context.Context, channel Channel, originID flow.Identifier, event interface{})
    Process(ctx context.Context, channel Channel, originID flow.Identifier, event interface{}) error
}

type Conduit interface {
    Publish(event interface{}, targetIDs ...flow.Identifier) error
    Unicast(event interface{}, targetID flow.Identifier) error
    Multicast(event interface{}, num uint, targetIDs ...flow.Identifier) error
}
```
* Public network
```golang
type Engine interface {
    Submit(ctx context.Context, channel Channel, senderPeerID peer.ID, event interface{})
    Process(ctx context.Context, channel Channel, senderPeerID peer.ID, event interface{}) error
}

type Conduit interface {
    Publish(event interface{}, targetIDs ...peer.ID) error
    Unicast(event interface{}, targetID peer.ID) error
    Multicast(event interface{}, num uint, targetIDs ...peer.ID) error
}
```

Various types of request-scoped data may be included in the `Context` as [values](https://pkg.go.dev/context#WithValue). For example, if a message sent on the public network originates from a staked node, that node's Flow ID may be included as a value. Once engine-side message queues are standardized as described in [FLIP 343](https://github.com/onflow/flow/pull/343), the given `Context` can be placed in the message queue along with the message itself in a wrapper struct:

```golang
type Message struct {
    ctx   context.Context
    event interface{}
}
```

> While this may seem to break the general rule of not storing `Context`s in structs, storing `Context`s in structs which are being passed like parameters is one of the exceptions to this rule. See [this](https://github.com/golang/go/issues/22602#:~:text=While%20we%27ve%20told,documentation%20and%20examples.) and [this](https://medium.com/@cep21/how-to-correctly-use-context-context-in-go-1-7-8f2c0fafdf39#:~:text=The%20one%20exception%20to%20not%20storing%20a%20context%20is%20when%20you%20need%20to%20put%20it%20in%20a%20struct%20that%20is%20used%20purely%20as%20a%20message%20that%20is%20passed%20across%20a%20channel.%20This%20is%20shown%20in%20the%20example%20below.). The idea is that `Context`s should not be **stored** but should **flow** through the program, which is what they do in this use case.
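
To make the request-scoped values concrete, here is a hypothetical sketch of how the network layer could attach, and an engine could read, the Flow ID of a staked sender on the public network. The key type and helper names are assumptions for illustration, not part of this proposal:

```golang
package p2p

import (
    "context"

    "github.com/onflow/flow-go/model/flow"
)

// ctxKey is an unexported key type so that values set here cannot collide
// with Context values set by other packages.
type ctxKey int

const originFlowIDKey ctxKey = iota

// WithOriginFlowID records the staked sender's Flow ID on the message Context.
func WithOriginFlowID(ctx context.Context, id flow.Identifier) context.Context {
    return context.WithValue(ctx, originFlowIDKey, id)
}

// OriginFlowID returns the sender's Flow ID, if the network layer attached one.
func OriginFlowID(ctx context.Context) (flow.Identifier, bool) {
    id, ok := ctx.Value(originFlowIDKey).(flow.Identifier)
    return id, ok
}
```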

When the message is dequeued, the engine should check the `Context` to see whether the message might already be obsolete before processing it. At this point, we will have two distinct `Context`s in scope:
* The message `Context`
* The `Context` of the goroutine which is dequeuing / processing the message

These can be combined into a [single context](https://github.com/teivah/onecontext) which can be used by the message processing business logic, so that the processing can be cancelled either by the network or by the engine. This will allow us to deprecate [`engine.Unit`](https://github.com/onflow/flow-go/blob/master/engine/unit.go), which uses a single `Context` for the entire engine.
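
As an illustration only, the dequeue-and-process step might look like the sketch below. It assumes the wrapper struct shown above; the helper names `mergedContext` and `handleMessage` are hypothetical, and a library such as [onecontext](https://github.com/teivah/onecontext) could replace the hand-rolled merge:

```golang
package engine

import "context"

// Message is the wrapper struct proposed above: the network-supplied Context
// travels through the engine's queue together with the event itself.
type Message struct {
    ctx   context.Context
    event interface{}
}

// mergedContext returns a child of the message Context (so request-scoped
// values such as an origin Flow ID remain readable) that is additionally
// cancelled when the engine's own Context is cancelled. The returned cancel
// function must be called to release the watcher goroutine.
func mergedContext(msgCtx, engineCtx context.Context) (context.Context, context.CancelFunc) {
    ctx, cancel := context.WithCancel(msgCtx)
    go func() {
        select {
        case <-engineCtx.Done():
            cancel()
        case <-ctx.Done():
        }
    }()
    return ctx, cancel
}

// handleMessage shows how a dequeued message might be processed.
func handleMessage(engineCtx context.Context, msg Message, process func(context.Context, interface{}) error) error {
    // Skip messages that are already obsolete, e.g. because the network layer
    // cancelled them when the sender became unreachable or a timeout fired.
    if err := msg.ctx.Err(); err != nil {
        return err
    }

    ctx, cancel := mergedContext(msg.ctx, engineCtx)
    defer cancel()

    // The business logic observes cancellation from either the network layer
    // or the engine through the single merged Context.
    return process(ctx, msg.event)
}
```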

There are certain types of messages (e.g. block proposals) which may transit between the private and public networks via relay nodes (e.g. Access Nodes). Libp2p's [default message ID function](https://github.com/libp2p/go-libp2p-pubsub/blob/0c7092d1f50091ae88407ba93103ac5868da3d0a/pubsub.go#L1040-L1043) will treat a message originating from one network, relayed to the other network by `n` distinct relay nodes, as `n` distinct messages, causing unacceptable message duplication / traffic amplification. In order to prevent this, we will need to define a [custom message ID function](https://pkg.go.dev/github.com/libp2p/go-libp2p-pubsub#WithMessageIdFn) which returns the hash of the message [`Payload`](https://github.com/onflow/flow-go/blob/698c77460bc33d1a8ee8a154f7fe4877bc518a02/network/message/message.proto#L13).

In order to avoid making the message ID function deserialize the `Message` to access the `Payload`, we need to remove all other fields from the `Message` protobuf, so that the message ID function can simply take the hash of the pubsub [`Data`](https://github.com/libp2p/go-libp2p-pubsub/blob/0c7092d1f50091ae88407ba93103ac5868da3d0a/pb/rpc.pb.go#L145) field without needing to do any deserialization.
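
A minimal sketch of such a custom message ID function, assuming the `Message` protobuf has been reduced so the pubsub `Data` field can be hashed directly. The SHA-256 choice and function names are illustrative, not prescribed by this FLIP:

```golang
package p2p

import (
    "crypto/sha256"
    "encoding/hex"

    pubsub "github.com/libp2p/go-libp2p-pubsub"
    pb "github.com/libp2p/go-libp2p-pubsub/pb"
)

// flowMessageID derives the pubsub message ID from a hash of the payload, so a
// message relayed into the other network by n distinct relay nodes still maps
// to a single ID and is deduplicated by gossipsub.
func flowMessageID(pmsg *pb.Message) string {
    h := sha256.Sum256(pmsg.GetData())
    return hex.EncodeToString(h[:])
}

// messageIDOption installs the custom message ID function on the pubsub instance.
func messageIDOption() pubsub.Option {
    return pubsub.WithMessageIdFn(flowMessageID)
}
```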

The `Multicast` implementation will need to be changed to make direct connections to the target peers instead of sending messages with a `TargetIDs` field via gossip.
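
A rough sketch of how this could look, purely as an assumption (the `unicaster` interface and function name are hypothetical, and error handling is simplified): `Multicast` would pick `num` targets and send to each over a direct connection rather than publishing to the topic.

```golang
package p2p

import (
    "fmt"
    "math/rand"

    "github.com/onflow/flow-go/model/flow"
)

// unicaster abstracts the direct one-to-one send; in the real conduit this
// would be the existing Unicast path.
type unicaster interface {
    Unicast(event interface{}, targetID flow.Identifier) error
}

// multicastViaUnicast sends event to num randomly chosen targets over direct
// connections instead of gossiping a message carrying a TargetIDs field.
func multicastViaUnicast(u unicaster, event interface{}, num uint, targetIDs ...flow.Identifier) error {
    if uint(len(targetIDs)) < num {
        return fmt.Errorf("not enough targets: have %d, need %d", len(targetIDs), num)
    }

    // Choose num distinct targets uniformly at random.
    failed := 0
    for _, i := range rand.Perm(len(targetIDs))[:num] {
        if err := u.Unicast(event, targetIDs[i]); err != nil {
            failed++
        }
    }
    if failed > 0 {
        return fmt.Errorf("multicast partially failed: %d of %d sends errored", failed, num)
    }
    return nil
}
```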

### Motivations
- Having a strict separation between the public and private networks provides better safety by preventing unintended passage of messages between the two networks, and makes it easier to implement mechanisms for message prioritization / rate-limiting on staked nodes which participate in both.
- Passing `Context`s gives the network layer the ability to cancel the processing of a network message. This can be leveraged to implement [timeouts](https://pkg.go.dev/context#WithTimeout), but may also be useful for other situations. For example, if the network layer becomes aware that a certain peer has become unreachable, it can cancel the processing of any sync requests from that peer.
- Since existing calls to `Multicast` only target 3 peers, changing the implementation to use direct connections instead of gossip will reduce traffic on the network and make it more efficient.

**Contributor:**

Based on benchmarking we have done, using pubsub instead of multicast not only does not degrade performance but also reduces the goroutine count and memory usage of nodes by at least 15%. Unicast is an expensive operation on the underlying network, since it requires establishing a TCP connection to the receiver if one does not exist. Establishing and maintaining such connections seems more expensive than running topic validators. Also note that the type of connections and their lifecycle are quite different in libp2p vs pubsub: the former utilizes streams (i.e., TCP-based in our case) and delegates lifecycle management to the developer, while the latter relies on RPC with the lifecycle managed by libp2p. Also, going BFT, we should limit the surface as well as the number of connections one node can make to another node, especially on the unstaked network.

**Contributor Author:**

I'm not sure what size of network the benchmarking was done with, but the message overhead from using pubsub vs. direct connections scales with the number of nodes on the network, so the larger the network, the more of an issue this will be.

Imagine a network with 10,000 nodes. Under the current implementation, if one of those nodes sends a multicast targeting three other nodes, all 10,000 nodes will need to receive and process that message, and 9,997 of them will drop it after it reaches the application-level message validator (note we cannot use libp2p topic validators for this), while only 3 of them actually process the message. This seems very unnecessary.

> Also going BFT, we should limit the surface as well as number of connections one node can make to another node, especially on the unstaked network.

This will need to be dealt with anyway, whether or not we choose to change the multicast implementation. As long as there are any valid scenarios where Unicast is used, we will need a mechanism to rate-limit these connections.

**Contributor:**

@yhassanzadeh13 Are there any artefacts from this benchmark? Where could we read about the data you collected?

**Contributor:**

@huitseeker yes, please find the report and data here; the links in the report will lead you to Grafana dashboards containing the experimentation data.

**Contributor:**


Your argument is valid @smnzhu, and by the way, the experimentation was on the order of ~150 nodes. When both scalability and security are a concern, maybe we need to refactor the solution with the following problem definition:

- Currently, we drive multicast on top of pubsub, which causes every node subscribed to the topic to receive the message. However, in practice we multicast to a limited number of nodes (~3). This results in both bandwidth overhead, as non-targeted nodes receive the message only to drop it, and an attack surface for malicious peers to flood the system.
- Driving a multicast over independent unicasts seems to resolve the above problem; however, it adds its own cons: see this and this benchmarking. So my point is that solving the multicast issue by switching to unicast may lead us to bigger performance and security problems. Moreover, even if we choose to go this way, we also need to be cautious: we should distinguish whether to drive on unicast or pubsub based on the size of the target ID set; otherwise, one multicast may cause hundreds of unicasts in the underlying network.

**Contributor:**

@jwinkler2083233 @simonhf
I'd love to have your input on the above: this FLIP is suggesting a change in the implementation of multicast messages (messages destined for a plurality of recipients) from a pubsub gossip message to several unicast deliveries.

The gist of the benchmarks above is that this change would cost us in terms of resources (connections, memory and goroutines). What do you think of our ability to:

- attempt that change? Do we have some margin of maneuver today?
- track the performance changes involved in this change?

More generally, do you have alternative suggestions? @JeffreyDoyle I know you were suggesting QUIC in https://github.com/dapperlabs/flow-go/issues/5708, where QUIC has real multicast delivery. Any direction we should explore there?

**Contributor:**

@huitseeker thanks for the notification!

> Unicast is an expensive operation on the underlying network since it requires establishing a TCP connection to the receiver in case one does not exist. Establishing and maintaining such connections seem more expensive than running topic validators.

I agree with this comment in general for a naive implementation. However, if the implementation focuses on the following aspects, then I believe it is possible to reap the benefits and make it the least expensive option:

- Pre-establish all TCP connections and only re-open them when absolutely necessary.
- Do not tie one or more goroutines to each TCP connection.
- Pre-allocate all main data structures associated with sending and receiving via a TCP connection.

In this scenario, the number of goroutines stays the same whether there are 1,000, 10,000, or 1M TCP connections. There is no overhead for establishing or closing TCP connections during node operation (though there will be stragglers due to network issues and bugs), because all TCP connections are pre-established at startup or before use. If all lower-level data structures per TCP connection (e.g. input and output buffers to read packets into) are pre-allocated, then only a fixed bump in RAM usage will be noticed upon node start, with no extra GC work that depends on the number of TCP connections.

In a previous job (around 2008) I worked on a TCP connection socket server daemon which used these techniques with data structures for up to 1M TCP connections pre-created upon daemon startup, and using the Linux kernel's epoll mechanism which decouples TCP connections from threads. Although we don't need anywhere near 1M TCP connections, this serves as proof that such a model works and is technically possible. Back then a Linux kernel could handle opening 1M concurrent TCP connections without too much fuss, but it could only establish about 30k new TCP connections per second regardless of RAM and CPUs due to internal kernel locks, etc. I would expect these types of limitations to be much better in the meantime, and numbers like 30k are likely well above numbers that we need to use anyway :-)

But how to implement such a model in Golang and/or libp2p? This would require more research... There seems to be no technical reason why Golang cannot achieve the same result. However, existing packages may not have been designed that way -- or, if more flexible, may not have been used that way. Although Golang is capable of having 1M goroutines, just because it's capable doesn't mean it's a good idea... :-)

- While `engine.Unit` provides some useful functionalities, it also uses the anti-pattern of [storing a `Context` inside a struct](https://github.com/onflow/flow-go/blob/b50f0ffe054103a82e4aa9e0c9e4610c2cbf2cc9/engine/unit.go#L117), something which is [specifically advised against](https://pkg.go.dev/context#:~:text=Do%20not%20store%20Contexts%20inside%20a%20struct%20type%3B%20instead%2C%20pass%20a%20Context%20explicitly%20to%20each%20function%20that%20needs%20it.%20The%20Context%20should%20be%20the%20first%20parameter%2C%20typically%20named%20ctx%3A) by [the developers of Go](https://go.dev/blog/context-and-structs#TOC_2.). Here is an [example](https://go.dev/blog/context-and-structs#:~:text=Storing%20context%20in%20structs%20leads%20to%20confusion) illustrating some of the problems with this approach.

## Implementation (TODO)