FLIP for Handing Messages from Networking Layer to Engines #343
Conversation
Great writeup 👏 . Generally very much in favour of this change. A few suggestions and comments noted below.
Networking Layer Behaviour
Something you point out here is that the application layer is better positioned to manage queued messages. I think we should take the full step here and say the engine must manage message queueing, and implementations of `MessageProcessor` must be non-blocking.

This makes the separation of concerns more clear: the networking layer passes messages to engines as quickly as it can, does not store or queue messages, and will drop a message if the `MessageProcessor` doesn't accept it; the engine is responsible for all queueing, management of the message queue, and message processing. I also think this is necessary to fully solve the problem you point out:

> If one of the engines (e.g. `Engine C`) gets overwhelmed with a large number of messages, this engine will likely delay messages for all other engines, as the workers will soon all be blocked by the overwhelmed engine.

Allowing non-blocking `MessageProcessor`s isn't sufficient to fix this -- we need to enforce that they are non-blocking. Explicit use of channels in the API would make this fairly simple to implement:
```go
// in networking layer
channel := messageProcessor.RecvChan()
select {
case channel <- msg:
case <-time.After(timeout):
	log.Warn().Str("msg_type", msg.Type()).Msg("dropped message")
}
```
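To make the division of labour concrete, here is a minimal sketch of the engine side of this contract. All type and method names (`Engine`, `Message`, `Process`) are assumptions for illustration, not the actual Flow networking API: the engine owns a bounded queue and a non-blocking `Process`, so the networking layer's workers never block.

```go
package main

import (
	"fmt"
	"sync"
)

// All names here are illustrative, not the actual Flow networking API.
type Message struct {
	Type    string
	Payload []byte
}

// Engine owns a bounded queue so that Process never blocks the
// networking layer's workers.
type Engine struct {
	queue chan Message
	wg    sync.WaitGroup
}

func NewEngine(capacity, workers int) *Engine {
	e := &Engine{queue: make(chan Message, capacity)}
	for i := 0; i < workers; i++ {
		e.wg.Add(1)
		go e.worker()
	}
	return e
}

// Process is non-blocking: it either enqueues the message or reports
// that it was dropped, leaving the drop decision observable to the caller.
func (e *Engine) Process(msg Message) bool {
	select {
	case e.queue <- msg:
		return true
	default:
		return false // queue full: networking layer logs and drops
	}
}

func (e *Engine) worker() {
	defer e.wg.Done()
	for msg := range e.queue {
		_ = msg // a real engine would dispatch on msg.Type here
	}
}

// Close stops the workers after the queue is drained.
func (e *Engine) Close() {
	close(e.queue)
	e.wg.Wait()
}

func main() {
	eng := NewEngine(4, 2)
	accepted := 0
	for i := 0; i < 4; i++ {
		if eng.Process(Message{Type: "block-proposal"}) {
			accepted++
		}
	}
	eng.Close()
	fmt.Println("accepted:", accepted) // prints: accepted: 4
}
```

With this shape, the networking layer's only responsibility is calling `Process` and logging when it returns false.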
Interface Changes
I think we should remove the `network.Engine` interface and the `SubmitLocal` and `Process` functions that are copied into every engine to satisfy it. These are rarely used and only exist because `network.Engine` was originally created with more methods than it needed. I think we should replace the instances of in-process engines communicating through these methods with context-specific methods (i.e. rather than calling `engine.ProcessLocal(block)`, call `engine.HandleBlock(block)`).

Minor suggestion: make `MessageProcessor` an interface rather than a function type to discourage the use of anonymous functions.
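As a hedged sketch of that suggestion (the interface shape and names below are assumptions, not the final FLIP API): defining `MessageProcessor` as an interface means each engine must declare a named method, rather than handing the networking layer an anonymous closure.

```go
package main

import "fmt"

// Hypothetical sketch of MessageProcessor as an interface rather than a
// function type; names are assumptions, not the final FLIP API.
type Message struct{ Type string }

type MessageProcessor interface {
	// Process must return quickly; implementations queue internally.
	Process(originID string, msg Message) error
}

// A concrete engine has to declare the method explicitly, which
// discourages ad-hoc anonymous handlers scattered through the codebase.
type ConsensusEngine struct {
	received int
}

func (c *ConsensusEngine) Process(originID string, msg Message) error {
	c.received++ // a real engine would enqueue the message here instead
	return nil
}

func main() {
	var p MessageProcessor = &ConsensusEngine{}
	if err := p.Process("node-1", Message{Type: "vote"}); err == nil {
		fmt.Println("processed") // prints: processed
	}
}
```

A function type would admit `func(string, Message) error` literals at every registration site; an interface nudges each processor toward a named, testable type.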
Thanks for the write up.
Following on what Jordan mentioned, the network layer will try to push the incoming message to the …

I can remove the …`Process` methods (from the interface) and replace them with context-specific methods.
Thanks @jordanschalm and @vishalchangrani for the great feedback and detailed suggestions. Updated the FLIP and incorporated your suggestions:
Regarding the suggestion to use a channel as part of the API:
I feel that channels are fairly restrictive in such high-level APIs. Some thoughts:
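One way to frame the trade-off around channels in the API: the channel can stay an internal implementation detail behind a non-blocking method, at the cost of a little boilerplate per engine. The sketch below uses assumed names (`Engine`, `ErrQueueFull`), not the actual Flow API.

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative names only; not the actual Flow API.
type Message struct{ Type string }

var ErrQueueFull = errors.New("engine queue full")

// Instead of exposing a raw channel (e.g. RecvChan) to the networking
// layer, the engine exposes a non-blocking method and keeps the channel
// as a private implementation detail.
type Engine struct {
	inbound chan Message
}

func NewEngine(capacity int) *Engine {
	return &Engine{inbound: make(chan Message, capacity)}
}

func (e *Engine) Process(msg Message) error {
	select {
	case e.inbound <- msg:
		return nil
	default:
		return ErrQueueFull // caller (networking layer) decides to drop/log
	}
}

func main() {
	eng := NewEngine(1)
	fmt.Println(eng.Process(Message{Type: "a"})) // prints: <nil>
	// No worker is draining the queue in this sketch, so the second
	// send overflows the capacity-1 queue.
	fmt.Println(eng.Process(Message{Type: "b"})) // prints: engine queue full
}
```

This keeps the non-blocking guarantee while leaving the API surface in terms of methods and errors rather than channel operations.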
@AlexHentschel - agree with what you say - we would need boilerplate code if we use channels for sure. However, I would like to suggest that the …

Thanks for putting this proposal together 👍
💯
Great write up and improvement! 👍 I like the idea to simplify the network interface. Just have some comments:
Some clarification: the idea is to have some bound on the invocation of …

But I agree with Alex's point that this is restrictive. Given that it's a private API, I'm happy to not enforce the non-blocking requirement the way I suggested. We could detect misbehaving engines with metrics on the timing of …
Spoke with @yhassanzadeh13 and @noisygerman and we think it will be best to retain the message queue at the networking layer even after the queuing logic has been added to the individual engines because:
However, once the engines have the queue implemented, the networking queue can:
totally on-board with retaining the queue at the networking level.
Not sure what the utility of more workers would be(?). When engines are non-blocking (and queueing messages internally), a few workers should suffice to deliver the messages to the engines. I am worried that an excessive number of workers might actually obfuscate problems at the engine level, e.g. if an engine has a bug and blocks a worker, we would only see this very rarely when there are 1000 workers.
FLIP 343 to generalize the API through which the Networking Layer hands messages to Engines