[FLIP] Component API #1167

Merged
merged 26 commits into master from smnzhu/runnable-flip on Oct 9, 2021

Conversation

Contributor

@synzhu synzhu commented Aug 19, 2021

FLIP 1167

TODO

  • Add implementation steps

@synzhu
Contributor Author

synzhu commented Aug 19, 2021

This touches on two somewhat distinct issues, but they are related at a high level so I've included them together for now to give the bigger picture.

They can be separated later on into two FLIPs.

@codecov-commenter

codecov-commenter commented Aug 19, 2021

Codecov Report

Merging #1167 (e70f781) into master (08b9420) will decrease coverage by 0.04%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #1167      +/-   ##
==========================================
- Coverage   55.32%   55.27%   -0.05%     
==========================================
  Files         516      516              
  Lines       32181    32181              
==========================================
- Hits        17803    17789      -14     
- Misses      11988    12004      +16     
+ Partials     2390     2388       -2     
Flag Coverage Δ
unittests 55.27% <ø> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...sus/approvals/assignment_collector_statemachine.go 42.30% <0.00%> (-7.70%) ⬇️
...ngine/common/synchronization/finalized_snapshot.go 68.75% <0.00%> (-4.17%) ⬇️
admin/command_runner.go 77.58% <0.00%> (-1.73%) ⬇️
engine/collection/synchronization/engine.go 62.90% <0.00%> (-1.08%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data


### Potential problems
#### Non-idempotency of `ReadyDoneAware`
The current `ReadyDoneAware` interface is misleading, as by the name one might expect that it is only used to check the state of a component. However, in almost all current implementations the `Ready` method is used to both start the component *and* check when it has started up, and similarly for the `Done` method.
Contributor

method is used to both start the component and check when it has started up

I'm not sure I agree. It's used to start the component and returns a way to await the startup finishing; it's not intended as a check. The check is done by reading the returned channel.
I do agree that the Ready name can be a bit misleading though. Start?

Contributor Author

@synzhu synzhu Aug 19, 2021

I think this poses a few difficulties.

First, if multiple components want to wait for the startup of another to complete (i.e., a common dependency), then with this approach we would need to pass the channel returned from Ready to all of those components. This also means that we cannot construct those components until we've started the dependency, because we cannot get the ready channel until we call Ready to start it. Ideally, we don't want this restriction, because it forces certain steps to run sequentially when they could potentially run in parallel.

Example: A needs to wait for B to be ready before it is ready, but it also needs to start its own subcomponent C and wait for that to be ready as well. With the approach in the above paragraph, we need to wait for B to be started before we can even construct A (which then constructs and starts C). Ideally, we should be able to construct A as soon as B is constructed, so that it can construct and start C. This means the startup code of B and C can run in parallel.

Now, what if these components also want to wait for the dependency's shutdown to complete later on? Then we have no way of doing this, because we only get a done channel when we call Done to shutdown the dependency. But these components have already been created by then, meaning they certainly don't have this done channel, and any mechanism we come up with of injecting it afterwards is probably gonna be ugly.

If instead we have a ReadyDoneAware interface which is idempotent, we can pass this to each of the components and they can use it to wait for the dependency to be started up or fully shut down.
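
For illustration, here is a minimal sketch (not taken from the FLIP; the `A`/`NewA` names are hypothetical) of how an idempotent `ReadyDoneAware` handle for a dependency B could be handed to a dependent at construction time, before B has even been started:

```golang
// ReadyDoneAware only reports state; it never starts or stops anything.
type ReadyDoneAware interface {
	Ready() <-chan struct{} // closed once startup has completed
	Done() <-chan struct{}  // closed once shutdown has completed
}

// A is constructed with B's state handle, so B's startup and the
// construction/startup of A's own subcomponents can proceed in parallel.
type A struct {
	b     ReadyDoneAware
	ready chan struct{}
}

func NewA(b ReadyDoneAware) *A {
	return &A{b: b, ready: make(chan struct{})}
}

func (a *A) start() {
	go func() {
		// ... construct and start A's own subcomponent C here ...
		<-a.b.Ready() // wait for the dependency; this does not (re)start it
		close(a.ready)
	}()
}
```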

Member

I thought about another solution, that might be easier to implement.

Since the goal is to make ReadyDoneAware idempotent, we could achieve this by making unit.Ready and unit.Done idempotent, and each engine could wrap its Ready work with unit.Ready.

How to make unit.Ready idempotent?
We could cache the ready channel. If Ready is called more than once, then the cached ready channel will be returned. This way the second call will return the same ready channel and won't trigger additional initialization work.

In fact, it doesn't make sense to me that a module would initialize more than once, and I don't think such a case exists now.

So what I'm proposing is to just adjust unit.Ready:

func (u *Unit) Ready(checks ...func()) <-chan struct{} {
	// Concurrent calls block on the lock, so only the first caller spawns the
	// goroutine that runs the `checks`; later callers get the same cached
	// channel, which is closed once all checks have finished.
	u.Lock()
	defer u.Unlock()

	if u.ready != nil {
		return u.ready
	}

	ready := make(chan struct{})
	go func() {
		for _, check := range checks {
			check()
		}
		close(ready)
	}()

	u.ready = ready
	return ready
}

In your example, A could be constructed with a B whose Ready has already been called by its owner, and A can call B's Ready again, which will not trigger another initialization but will wait on the previous initialization to finish.

Contributor Author

I think this is one valid solution, and I agree it may be easier to implement given the current state of the codebase. However, I would argue that if I were to design the API from scratch today, I would design it the way described in this FLIP, for all the reasons described in the "Motivations" section. Using contexts to start / stop components is so much more idiomatic in Go, and makes the APIs much more self-explanatory. It also makes the separation of concerns more explicit.

#### Non-idempotency of `ReadyDoneAware`
The current `ReadyDoneAware` interface is misleading, as by the name one might expect that it is only used to check the state of a component. However, in almost all current implementations the `Ready` method is used to both start the component *and* check when it has started up, and similarly for the `Done` method.

This introduces issues of concurrency safety, as most implementations do not properly handle the case where the `Ready` or `Done` methods are called more than once. See [this example](https://github.com/onflow/flow-go/pull/1026).
Contributor

Defensive programming isn't the standard Go approach, and I would argue it's not needed here.
A component should have a clearly defined lifecycle, and the owner should manage that lifecycle.
If there is a need for synchronization, this can be provided externally.

Contributor Author

My point here is more about the semantics of ReadyDoneAware.

You are correct that the owner of a component should manage the lifecycle. The problem here is that we have not separated lifecycle management from state observation, because ReadyDoneAware currently implements both.

While only one owner should be able to start a component, multiple other components may want to wait for it to be ready. This is where it becomes a problem if calling Ready also starts the component each time.

```golang
type Runnable interface {
	// Run will start the component
	Run(context.Context)
}
```
Contributor

This might be a JVM talking, but Runnable and Run seem unrelated to starting a component. Wouldn't simply Start be better? Or Initialise? Also, does this function return immediately or after the component has finished initialisation?
I see no added benefit over using Ready() (with this or another name). The returned channel gives great flexibility, including options to build notification or multiple-wait systems.

Contributor Author

@synzhu synzhu Aug 19, 2021

Sure, I'm not tied to the name Runnable haha 😛.

Also, does this function return immediately or after the component has finished initialisation?

Let me add a better comment for this. It should say "Run will start the component and returns immediately".

I see no added benefit over using Ready() (with this or another name). The returned channel gives great flexibility, including options to build notification or multiple-wait systems.

The benefits are listed below. We are not getting rid of a returned channel! The channel will just not be returned from this particular method, but from the Ready method (below) 🙂

This approach gives us strictly more flexibility than having a single Ready method that is responsible for both starting the component and waiting for startup to complete.
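
As an illustration of this separation, here is a minimal sketch (assumed names; not the FLIP's final API) where the owner holds the full component and is the only caller of Run, while dependents receive only the state-observation view:

```golang
import "context"

type Runnable interface {
	Run(context.Context) // starts the component and returns immediately
}

type ReadyDoneAware interface {
	Ready() <-chan struct{} // closed once startup has completed
	Done() <-chan struct{}  // closed once shutdown has completed
}

type Component interface {
	Runnable
	ReadyDoneAware
}

// The owner starts the component and can wait for startup to complete.
func owner(ctx context.Context, c Component) {
	c.Run(ctx)
	<-c.Ready()
}

// Dependents only get the ReadyDoneAware view: they can wait, but not start or stop.
func dependent(dep ReadyDoneAware) {
	<-dep.Ready()
	// ... proceed once the dependency is up ...
}
```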

Contributor

I can see that one of the reasons this FLIP was proposed was to handle the non-idempotency of ReadyDoneAware. However, it seems the problem is bleeding into a separate domain by introducing Runnable. I don't think the solution fully addresses the problem defined by the FLIP. In my opinion, nothing prevents a Runnable component from being Run several times. By accepting distinct contexts, Run does not exhibit idempotent behavior.

Hypothetically, even introducing the capability of binding the same Run function to several distinct contexts looks less safe than the existing ReadyDoneAware non-idempotency vulnerability, as we still fully delegate to the developer the responsibility to treat Run as non-idempotent. This can cause more serious safety vulnerabilities.

So, while checking the readiness of a component is idempotent in this solution, the way the Run method is defined extends the vulnerability surface.

- If context propagation is done properly, there is no need to worry about any cleanup code in the `Done` method.
- This allows us to separate the capability to check a component's state from the capability to start / stop it. We may want to give multiple other components the capability to check its state, without giving them the capability to start or stop it. Here is an [example](https://github.com/onflow/flow-go/blob/b50f0ffe054103a82e4aa9e0c9e4610c2cbf2cc9/engine/common/splitter/network/network.go#L112) of where this would be useful.
- This provides a clearer way of defining ownership of components, and hence may potentially eliminate the need to deal with concurrency-safety altogether. Whoever creates a component should be responsible for starting it, and therefore they should be the only one with access to its `Runnable` interface. If each component only has a single parent that is capable of starting it, then we should never run into concurrency issues.
- This enables us to define components which start up upon creation, and don't need to be explicitly `Run()` at a later point in time. Such components would only implement the `ReadyDoneAware` interface.
Contributor

I'm probably missing something, but if a component starts up immediately and has no startup/shutdown code, why even bother with ReadyDoneAware? It can just be a simple Go object without a special lifecycle, right?
We use .Module() in NodeBuilder, which is also badly named, as it is just a callback; nothing modular there.

Contributor Author

@synzhu synzhu Aug 19, 2021

So this would refer to a component which does have startup / shutdown code, but the startup code isn't explicitly triggered by a call to Run (or Start, which is probably a better name as you suggested).

In other words, it could look like this:

ctx, cancel := context.WithCancel(context.TODO())

// create and start component
component := NewComponent(ctx)

// wait for it to be ready
<-component.Ready()

// stop component
cancel()

// wait for it to be done
<-component.Done()

#### Motivations
- `Context`s are the standard way of doing goroutine lifecycle management in Go, and adhering to standards helps eliminate confusion and ambiguity for anyone interacting with the `flow-go` codebase. This is especially true now that we are beginning to provide APIs and interfaces for third parties to interact with the codebase (e.g. DPS).
- Even to someone unfamiliar with our codebase (but familiar with Go idioms), it is clear how a method signature like `Run(context.Context)` will behave. A method signature like `Ready()` is not so clear.
- If context propagation is done properly, there is no need to worry about any cleanup code in the `Done` method.
Contributor

I understand the idea, but ReadyDoneAware was plugged in because cancelling a context doesn't provide a way for a component to signal that cleanup has completed.
It's especially useful where components rely on being shut down in order, for example Ledger, WAL, and Checkpointers.

Contributor Author

@synzhu synzhu Aug 19, 2021

I understand the idea, but ReadyDoneAware was plugged in because cancelling a context doesn't provide a way for a component to signal that cleanup has completed.

Yes, and this is precisely why I'm not suggesting we get rid of ReadyDoneAware. We should keep it, as a way for components to signal that they have finished startup, or finished shutdown / cleanup. However, it should not also be responsible for actually starting or stopping the component.

@m4ksio
Contributor

m4ksio commented Aug 19, 2021

I have only reviewed the proposed changes regarding ReadyDoneAware. While I agree there are certain drawbacks, I feel that the proposed rework would increase implied complexity by pushing component readiness as a status all components need to be aware of.
There are places that would benefit from the proposed approach, but I think this can be fixed by adding extra checks/propagation on top of the existing system and only applying them where needed.

@synzhu
Contributor Author

synzhu commented Aug 19, 2021

I have only reviewed the proposed changes regarding ReadyDoneAware. While I agree there are certain drawbacks, I feel that the proposed rework would increase implied complexity by pushing component readiness as a status all components need to be aware of.
There are places that would benefit from the proposed approach, but I think this can be fixed by adding extra checks/propagation on top of the existing system and only applying them where needed.

@m4ksio Thanks for the review :)

I've answered all of your comments above and I hope I've addressed them clearly, but happy to discuss more.

I feel that the proposed rework would increase implied complexity by pushing component readiness as a status all components need to be aware of.

This sentence is a little hard to understand for me, can you explain?

@synzhu synzhu marked this pull request as ready for review August 27, 2021 03:46
Contributor

@yhassanzadeh13 yhassanzadeh13 left a comment

Thanks for the write-up. Added a few comments, would be happy to discuss in detail.

```golang
type Runnable interface {
	// Run will start the component
	Run(context.Context)
}
```
Contributor

I can see that one of the reasons this FLIP was proposed was to handle the non-idempotency of ReadyDoneAware. However, it seems the problem is bleeding into a separate domain by introducing Runnable. I don't think the solution fully addresses the problem defined by the FLIP. In my opinion, nothing prevents a Runnable component from being Run several times. By accepting distinct contexts, Run does not exhibit idempotent behavior.

Hypothetically, even introducing the capability of binding the same Run function to several distinct contexts looks less safe than the existing ReadyDoneAware non-idempotency vulnerability, as we still fully delegate to the developer the responsibility to treat Run as non-idempotent. This can cause more serious safety vulnerabilities.

So, while checking the readiness of a component is idempotent in this solution, the way the Run method is defined extends the vulnerability surface.

case <-ctx.Done():
return
default:
// do work...
Contributor

Not sure if this is a reliable way of life-cycle management for Flow. I acknowledge the Golang best practices around contexts that you also elaborated. The problem is that with this approach we keep delegating the life-cycle management of sub-components to themselves, which is in contrast to having the life-cycle managed by the parent component.

For example, assume the // do work part here is a long-running task. Canceling the context wouldn't kill that task until it returns and the subsequent iteration of for starts. However, the parent component called ctx.Cancel() assumes all sub-components have been terminated immediately. Right?

The problem is that a faulty component that hits a deadlock may never get terminated in this approach, and there is no way for the parent component to be aware of it.

Contributor Author

For example, assume the // do work part here is a long-running task. Canceling the context wouldn't kill that task until it returns and the subsequent iteration of for starts. However, the parent component called ctx.Cancel() assumes all sub-components have been terminated immediately. Right?

No, it doesn't assume that all sub-components have been terminated immediately. That's what the Done method is for. You use the channel returned from Done to actually wait for shutdown to complete; cancelling the context just triggers the shutdown.
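
A minimal sketch of this pattern (hypothetical names, not flow-go code): cancellation only requests shutdown, while the returned channel reports when the worker loop has actually exited:

```golang
import "context"

// runWorker starts a worker loop and returns a channel that is closed only
// once the loop has fully exited.
func runWorker(ctx context.Context) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for {
			select {
			case <-ctx.Done():
				return
			default:
				// do work...
			}
		}
	}()
	return done
}

// Parent usage:
//   ctx, cancel := context.WithCancel(context.Background())
//   done := runWorker(ctx)
//   cancel() // triggers shutdown
//   <-done   // waits for shutdown to actually complete
```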

Finally, we should either deprecate or modify the behavior of [`engine.Unit`](https://github.com/onflow/flow-go/blob/master/engine/unit.go).

#### Motivations
- The `OriginID` field really provides no use. The value of this field is set by the sender of the message, so we certainly cannot rely on its correctness. We can validate it by [checking it against the peer ID of the sender](https://github.com/onflow/flow-go/pull/1163), but this is actually unnecessary since the peer ID is enough to tell us who the sender is. We can extract the sender's network key from their peer ID, and if a downstream engine needs to know the Flow ID corresponding to this network key, we can determine this from the protocol state. If we cannot find a corresponding Flow ID, then the sender is not a staked node.
Contributor

Wouldn't it be easier to keep the OriginID as part of the payload, require every message to be signed by its OriginID, and then, on the recipient side, verify the signature on the payload against the verification key of the OriginID?

The advantage of this signature checking is that it reduces the heavy lifting of the proposed cross-referencing between peer ID and flow ID on the recipient side to just a signature verification (which we literally need to do). It also lets the application layer run under the assumption that incoming messages have already been checked for authenticity and integrity, and reduces the redundancy that decoupling authenticity checking (at the network layer) from integrity checking (at the application layer) introduces in the proposed solution.

Moreover, we keep an incoming message a self-contained data structure by including its OriginID. To account for unstaked senders, we can make OriginID a *flow.Identifier, and let unstaked senders emit a message with a nil value for OriginID, which is a compliant representation as we won't assume a flow.Identifier for unstaked nodes.

P.S.: I think most of our message types are literally meant to have a valid signature. So the only networking-level compromise we make is to send an extra 32 bytes of OriginID, while in return we get more coherent authenticity and integrity checking for our messages.

Contributor Author

The advantage of this signature checking is that it reduces the heavy lifting of the proposed cross-referencing between peer ID and flow ID on the recipient side to just a signature verification (which we literally need to do). It also lets the application layer run under the assumption that incoming messages have already been checked for authenticity and integrity, and reduces the redundancy that decoupling authenticity checking (at the network layer) from integrity checking (at the application layer) introduces in the proposed solution.

I don't think the cross-referencing between peer and flow ID is actually a lot of work; in fact, with a hashmap it would just be an (approximately) constant-time lookup. LibP2P already does signature verification, so I'm not sure why we would want to do it again at the application level, as this seems redundant. I don't think the proposed approach introduces more redundancy, because authenticity and integrity checking are completely separate, so the two stages are doing two separate but necessary things. We would be introducing redundancy by checking signatures at the application layer, as this combines authenticity and integrity checking into a single step, but authentication is already done by libp2p.

Contributor Author

Moreover, we keep an incoming message a self-contained data structure by including its OriginID. To account for unstaked senders, we can make OriginID a *flow.Identifier, and let unstaked senders emit a message with a nil value for OriginID, which is a compliant representation as we won't assume a flow.Identifier for unstaked nodes.

We can do this, but how will a receiver of a message send a response to an unstaked node with only a nil origin ID?

Contributor

LibP2P already does signature verification, so I'm not sure why we would want to do it again at the application level, as this seems redundant

This application-layer signature verification is not redundant:

When a peer sends message M, it signs it with its networking key, and this signature is verified at any intermediary pubsub node as well as at the receiver. This signature verification is done at the libp2p level against the networking key of the sender. The aim of the signature verification provided by libp2p is solely to ensure integrity of the end-to-end delivery of the message.

The reason we can't rely only on the libp2p signature verification is that it does not account for the authenticity of the message: Alice receiving a message from Bob can ensure the integrity of the message sent by Bob, but not its authenticity, i.e., whether it was actually created by Bob. Moving down the BFT road and relying only on signature verification at the networking layer, messages fail to be attributable to their creator and lack non-repudiation. Going back to our example, when Alice receives message M from Bob, she cannot verify whether Bob created that message, or Eve, or anyone else. In other words, the origin of the message remains unclear from the cryptographic point of view.

That is why we also have a protocol-level key (i.e., the staking key), and at the protocol level messages are supposed to be signed by the staking key of their creator. In this way, no matter who Alice receives message M from, she can always check it against the public staking key of the creator to ensure it originated from that creator. This later on enables Alice to submit message M as evidence of a malicious behavior originating from Eve (i.e., slashing), and anyone can verify the correctness of Alice's claim by (1) checking whether the content of M represents a misbehavior, and (2) checking whether M includes a signature on its content that is verifiable against Eve's public staking key. This is just an example of a vital protocol feature of Flow that relying solely on signature verification at the libp2p level fails to provide. Another example is the general provider scenario, where we aim for nodes to be able to cache some entities they receive (e.g., ChunkDataPack) and provide them to other nodes requesting them on the same topic. In this case, a verification node receiving a ChunkDataPack from another verification node needs to make sure that this ChunkDataPack has been created by a staked execution node, and this assurance is provided by checking the content of the ChunkDataPack for a signature on its payload that is valid against the public key of its origin ID, i.e., the execution node that created that ChunkDataPack.

Now, the reason including the OriginID in message M is crucial is that it enables any node receiving M to look up the public staking key of M's origin from the protocol state and check the signature of the message against the OriginID's public key.
Including the OriginID together with the signature in the messages makes them self-certifying, i.e., the message contains all the forensic information needed for any party to verify it later on. Note that the libp2p signature and protocol-level signatures are different. The latter is part of the protocol-level message.

I don't think the cross-referencing between peer and flow ID is actually a lot of work; in fact, with a hashmap it would just be an (approximately) constant-time lookup.

Looking up from a hash table is a constant-time operation with respect to the number of entries in the hash table. However, it is not necessarily cheap, as it includes a hash operation which can be computation-heavy. I admit that we have hundreds of places where we do hashmap lookups, and I am not against adding such a lookup at the networking layer. But as we go BFT, and especially on the unstaked network, we need to be careful about adding extra operations at the networking layer. Such operations can be exploited by adversarial nodes to perform DDoS attacks, i.e., floods of messages that cause additional hashmap lookups. To the best of my knowledge, our security approach at the networking layer is more proactive than reactive, especially when it comes to availability-based attacks such as flooding. So the more we can spare the networking layer from extra bookkeeping, the more secure it is against such DDoS attacks. Of course, DDoS prevention and rate limiting are possible, but they have their own overheads as well.

Contributor Author

The reason we can't rely only on the libp2p signature verification is that it does not account for the authenticity of the message: Alice receiving a message from Bob can ensure the integrity of the message sent by Bob, but not its authenticity, i.e., whether it was actually created by Bob. Moving down the BFT road and relying only on signature verification at the networking layer, messages fail to be attributable to their creator and lack non-repudiation. Going back to our example, when Alice receives message M from Bob, she cannot verify whether Bob created that message, or Eve, or anyone else. In other words, the origin of the message remains unclear from the cryptographic point of view.

Actually, the cool thing is that libp2p verifies both :)

ReceivedFrom is for integrity, and From is for authenticity.

In your example, ReceivedFrom would be Bob, and From would be the origin of the message. The entire message envelope is signed by the origin and the signature is included along with the message, and so libp2p is able to verify both of these :)
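
For concreteness, a minimal sketch (not flow-go code; the import paths and validator wiring are assumptions) of where the two fields show up in a go-libp2p-pubsub message:

```golang
import (
	"context"

	"github.com/libp2p/go-libp2p-core/peer"
	pubsub "github.com/libp2p/go-libp2p-pubsub"
)

// validate illustrates the distinction discussed above: ReceivedFrom is the peer
// that delivered the message on this hop, while GetFrom() is the original
// publisher whose envelope signature libp2p has already verified.
func validate(ctx context.Context, from peer.ID, msg *pubsub.Message) bool {
	bearer := msg.ReceivedFrom // e.g. Bob, who relayed the message
	origin := msg.GetFrom()    // e.g. the node that created and signed it
	// ... check that both bearer and origin map to authorized Flow identities ...
	_, _ = bearer, origin
	return true
}
```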

Contributor

"Verifies both" is a bit of a simplification. For the nitty-gritty:

  • libp2p as configured in the Flow codebase today requires and checks a message signature from the source of a message (the From) at every carrier of the message,
  • libp2p also has transport encryption and authentication on every direct p2p link, using a Noise key exchange with an XX pattern, which guarantees that the ReceivedFrom field is the PeerID of the actor on the other end of the direct link (i.e. the bearer of the message upon delivery).
  • Checking that this ReceivedFrom is the Flow node we expect on the other end of the line, or a fortiori a valid Flow node, is left as an exercise to the out-of-libp2p implementer. There are places where we do this upon link establishment (e.g. subscription validator, readConnection & its callback).
  • this last part (bearer is as-expected) could be done more systematically (and cheaply) by allowing libp2p to be configured with different key exchange patterns (a KK comes to mind)

But overall this is correct:

  • libp2p as configured today enforces non-repudiation for all messages, because it forces whole-message signing at the source
  • as long as the correspondence between staking key and networking key is one-to-one and non-repudiable (i.e. enforced by the protocol), there is no doubt as to the origin of a message, and that network signature can (and in the short-to-middle term future, will) be exhibited as slashable evidence.

@synzhu
Contributor Author

synzhu commented Aug 30, 2021

I can see that one of the reasons this FLIP was proposed was to handle the non-idempotency of ReadyDoneAware. However, it seems the problem is bleeding into a separate domain by introducing Runnable. I don't think the solution fully addresses the problem defined by the FLIP. In my opinion, nothing prevents a Runnable component from being Run several times. By accepting distinct contexts, Run does not exhibit idempotent behavior.

Hypothetically, even introducing the capability of binding the same Run function to several distinct contexts looks less safe than the existing ReadyDoneAware non-idempotency vulnerability, as we still fully delegate to the developer the responsibility to treat Run as non-idempotent. This can cause more serious safety vulnerabilities.

So, while checking the readiness of a component is idempotent in this solution, the way the Run method is defined extends the vulnerability surface.

@yhassanzadeh13 If we really want to make Run idempotent, we can easily do so using something like sync.Once. Then both Run and readiness checking will be idempotent.

However, as @m4ksio and @zhangchiqing have both noted, defensive programming isn't the standard Go approach and is not actually needed in most cases.

The reason why non-idempotency was an issue in the first place is that there was no separation between the readiness-checking capability and the starting capability for a component. Readiness checking is the one that actually needs to be idempotent, because this capability must be available to multiple other components and readiness checking may be done multiple times. However, starting a component doesn't necessarily need to be idempotent, because a component only needs to be started once. Because these two currently aren't separated, we have to make them either idempotent or non-idempotent together. By separating them, we can make the necessary one idempotent (readiness checking), while avoiding the complexity of introducing idempotency for the one where it's not necessary (starting the component).
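
For reference, a minimal sketch (hypothetical type, not the FLIP's final design) of making the start operation idempotent with sync.Once, as mentioned above:

```golang
import (
	"context"
	"sync"
)

type component struct {
	startOnce sync.Once
	ready     chan struct{}
}

func newComponent() *component {
	return &component{ready: make(chan struct{})}
}

// Start is idempotent: only the first call runs the startup logic.
func (c *component) Start(ctx context.Context) {
	c.startOnce.Do(func() {
		go func() {
			// ... startup work, wired to ctx for cancellation ...
			close(c.ready)
		}()
	})
}

// Ready is idempotent by construction: it only observes state.
func (c *component) Ready() <-chan struct{} {
	return c.ready
}
```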

@AlexHentschel
Member

Summary of discussion

Alex & Simon; Sept 10, 2021

  • Networking layer must provide strict admission control for staked nodes

    • when some node attempts to open a connection, the networking layer must check
      that this is an authorized connection and refuse it otherwise
    • this must happen at the point of the handshake
    • only messages from authorized peers should be decoded (!)
      Otherwise this is a potential attack vector
  • For gossip network: messages should be de-duplicated based on payload alone

    • message origin should not factor into whether a message is considered a duplicate or not
    • for example, consider a block proposal from Alice. If Bob re-broadcasts the proposal, we need to consider it a duplicate. Otherwise, 1/3 of the nodes being byzantine (about 40 consensus nodes) could increase message processing overhead 40x by just re-broadcasting all blocks. (A sketch of payload-based de-duplication follows at the end of this summary.)
    • this also allows us to properly separate networks between staked nodes and unstaked nodes, without having to worry about repetition of the same message (e.g block) broadcasted by multiple access nodes
  • There should be a strong separation between network of staked nodes and network for unstaked nodes

    • Minimum separation on an engine level: Engines should deal with messages from either staked or unstaked nodes, but not both. Processing both in the same engine is an invitation for liveness attacks, because messages from the unstaked nodes can then starve liveness-critical messages from staked nodes of processing resources.
    • better: complete process separation (different OS processes for processing messages from staked vs unstaked nodes)
    • ideally (eventual goal): hardware separation

    In all cases, we only require one identifier:
    flowID (for staked nodes) or network-derived peerID (unstaked nodes)

  • For messages from staked nodes, we would like an Inspector object:

    • the networking layer for the staked network must guarantee message integrity and authenticity
    • if the recipient finds that something is wrong with the message, we can always slash the message origin
    • proof that the origin sent this message would be generated on demand
    • could be an extension of context:
      • provide proof that the peer really sent the message
      • we can also include the peer.ID of the sender in this inspector object
      • it makes it possible for requests from staked nodes to be canceled

For API safety, it would probably make sense to have different APIs for the staked and unstaked network. Thereby, the API reflects that messages from both networks should not be mixed.
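
As referenced above, a minimal sketch (hypothetical, not flow-go code) of de-duplicating gossip messages based on payload alone, so that a re-broadcast by a different sender is still recognized as a duplicate:

```golang
import (
	"crypto/sha256"
	"sync"
)

type deduplicator struct {
	mu   sync.Mutex
	seen map[[32]byte]struct{}
}

func newDeduplicator() *deduplicator {
	return &deduplicator{seen: make(map[[32]byte]struct{})}
}

// isDuplicate keys only on the payload, so the origin of the message does not
// factor into whether it is considered a duplicate.
func (d *deduplicator) isDuplicate(payload []byte) bool {
	key := sha256.Sum256(payload)
	d.mu.Lock()
	defer d.mu.Unlock()
	if _, ok := d.seen[key]; ok {
		return true
	}
	d.seen[key] = struct{}{}
	return false
}
```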

@synzhu
Contributor Author

synzhu commented Sep 13, 2021

Once we have message queues implemented on all engines according to onflow/flow#343, we should be able to combine the context passed into engine.Process with the context passed into any threads processing the queue, using something like https://github.com/teivah/onecontext
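
For illustration, a hand-rolled sketch of the underlying idea (this is not the onecontext API): the merged context is cancelled as soon as either parent context is cancelled.

```golang
import "context"

// mergeCancel returns a context that is cancelled when either a or b is cancelled.
// Illustrative only; a library like onecontext also merges values and deadlines.
func mergeCancel(a, b context.Context) (context.Context, context.CancelFunc) {
	ctx, cancel := context.WithCancel(a) // a's cancellation propagates automatically
	go func() {
		select {
		case <-b.Done():
			cancel() // propagate b's cancellation
		case <-ctx.Done():
			// merged context finished; stop watching to avoid a goroutine leak
		}
	}()
	return ctx, cancel
}
```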

Contributor

@huitseeker huitseeker left a comment

Requesting changes to keep track of this as needing updates from #1275 (and possibly #1355 too).

@synzhu synzhu requested a review from huitseeker October 4, 2021 05:40
Contributor

@huitseeker huitseeker left a comment

Thanks a bunch, this is definitely going in the right direction.

@synzhu synzhu requested a review from huitseeker October 6, 2021 19:15
Contributor

@huitseeker huitseeker left a comment

Excellent! Thanks a lot!

s.errChan <- err
close(s.errChan)
} else {
// Another thread, possibly from the same component, has already thrown
Contributor

Nit: indentation

Contributor

@peterargue peterargue left a comment

thanks @smnzhu this looks great

@synzhu
Contributor Author

synzhu commented Oct 8, 2021

bors merge

bors bot added a commit that referenced this pull request Oct 8, 2021
1167: [FLIP] Component API r=smnzhu a=smnzhu

[FLIP 1167](https://github.com/onflow/flow-go/blob/smnzhu/runnable-flip/flips/component-interface.md)

### TODO
- [ ] Add implementation steps

Co-authored-by: Simon Zhu <simon.zsiyan@gmail.com>
@bors
Contributor

bors bot commented Oct 8, 2021

Build failed:

@synzhu
Contributor Author

synzhu commented Oct 9, 2021

bors retry

bors bot added a commit that referenced this pull request Oct 9, 2021
1167: [FLIP] Component API r=smnzhu a=smnzhu

[FLIP 1167](https://github.com/onflow/flow-go/blob/smnzhu/runnable-flip/flips/component-interface.md)

### TODO
- [ ] Add implementation steps

Co-authored-by: Simon Zhu <simon.zsiyan@gmail.com>
@bors
Contributor

bors bot commented Oct 9, 2021

Canceled.

@synzhu
Contributor Author

synzhu commented Oct 9, 2021

bors merge

@bors
Contributor

bors bot commented Oct 9, 2021

@bors bors bot merged commit cdd3ee6 into master Oct 9, 2021
@bors bors bot deleted the smnzhu/runnable-flip branch October 9, 2021 02:30
bors bot added a commit that referenced this pull request Oct 12, 2021
1355: [Network] Middleware component r=smnzhu a=smnzhu

* Refactor middleware to implement the `Component` interface.
* Introduces new `ComponentManager` struct to help implement `Component` interface
* Various refactoring in network layer and scaffold to enable the changes above.

### TODO

- [x] As mentioned in #1167 (comment), we should probably explicitly throw an error when `Start` is called multiple times, instead of simply ignoring subsequent calls
- [x] Update the godoc for Startable to reflect this
- [x] Add tests for ComponentManager

Co-authored-by: Simon Zhu <simon.zsiyan@gmail.com>
bors bot added a commit that referenced this pull request Oct 12, 2021
1355: [Network] Middleware component r=smnzhu a=smnzhu

* Refactor middleware to implement the `Component` interface.
* Introduces new `ComponentManager` struct to help implement `Component` interface
* Various refactoring in network layer and scaffold to enable the changes above.

### TODO

- [x] As mentioned in #1167 (comment), we should probably explicitly throw an error when `Start` is called multiple times, instead of simply ignoring subsequent calls
- [x] Update the godoc for Startable to reflect this
- [x] Add tests for ComponentManager

Co-authored-by: Simon Zhu <simon.zsiyan@gmail.com>