[Verification] rolls-in persistent architecture #728
…yahya/5208-fetcher-engine
@@ -94,9 +94,6 @@ func (e *Engine) WithChunkConsumerNotifier(notifier module.ProcessingNotifier) {

 // Ready initializes the engine and returns a channel that is closed when the initialization is done
 func (e *Engine) Ready() <-chan struct{} {
-	if e.chunkConsumerNotifier == nil {
-		e.log.Fatal().Msg("missing chunk consumer notifier callback in verification fetcher engine")
-	}
The reason for this change is to resolve a cyclic dependency that causes the node to crash on startup:
- In cmd/verification/main, the fetcher engine is both created and started before the chunkConsumer (which has the chunkConsumerNotifier).
- When the chunkConsumer is created, it sets the chunkConsumerNotifier of the fetcher engine.
- So starting the fetcher engine before chunkConsumer initialization crashes the node with a fatal error.
Hence, we drop this check and rely on the fact that, as with any other component of a node, a missing chunkConsumerNotifier leads the node to crash with a panic upon processing the first chunk. This is the same strategy we take for other components, e.g., network: if it is left unassigned, the node panics the first time it tries to send a message.
The other solution would be to decouple component initialization from startup in the scaffold, but I am not sure of its side effects on other nodes (worth considering in a separate PR).
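For readers following along, the ordering problem described above can be sketched roughly as follows. This is a simplified illustration, not the actual engine: `ProcessingNotifier`, `Engine`, `CheckReady`, `ProcessChunk`, and `printNotifier` are hypothetical stand-ins, and the check returns an error here where the real code logs a fatal message.

```go
package main

import "fmt"

// ProcessingNotifier is a hypothetical stand-in for module.ProcessingNotifier.
type ProcessingNotifier interface {
	Notify(chunkID string)
}

// Engine is a simplified, hypothetical fetcher engine.
type Engine struct {
	notifier ProcessingNotifier
}

// WithChunkConsumerNotifier wires the notifier in after construction.
func (e *Engine) WithChunkConsumerNotifier(n ProcessingNotifier) {
	e.notifier = n
}

// CheckReady mimics the startup check: it fails if the notifier is not wired yet.
// (The real engine logs a fatal message instead of returning an error.)
func (e *Engine) CheckReady() error {
	if e.notifier == nil {
		return fmt.Errorf("missing chunk consumer notifier")
	}
	return nil
}

// ProcessChunk uses the notifier. Without the startup check, a nil notifier
// panics here; the panic is recovered only to keep this sketch running.
func (e *Engine) ProcessChunk(chunkID string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic while processing %s: %v", chunkID, r)
		}
	}()
	e.notifier.Notify(chunkID)
	return nil
}

// printNotifier is a hypothetical notifier that just prints the chunk ID.
type printNotifier struct{}

func (printNotifier) Notify(chunkID string) { fmt.Println("processed", chunkID) }

func main() {
	e := &Engine{}
	// The engine is started before the consumer exists, so the check fails.
	fmt.Println("check before wiring:", e.CheckReady())
	// With the check removed, the failure moves to the first chunk.
	fmt.Println("chunk before wiring:", e.ProcessChunk("c1"))

	e.WithChunkConsumerNotifier(printNotifier{})
	fmt.Println("check after wiring:", e.CheckReady())
	fmt.Println("chunk after wiring:", e.ProcessChunk("c1"))
}
```

The sketch shows the trade-off under discussion: removing the check does not remove the failure, it only defers it from startup to the first processed chunk.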
This seems like it will introduce a race condition at startup where sometimes the node will crash. I don't think the network component has the same behaviour (it may error before the mesh network is set up but it won't crash the node). It would be better to avoid this if we can.
What we do for some other engines with a similar dependency structure (e.g. epochmgr, compliance) is embed the dependency within the engine and have the engine manage the lifecycle of the dependency. Then both the dependency and the engine can be instantiated in one Component block in the scaffold.
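A minimal sketch of that pattern, assuming a simplified lifecycle interface, might look like the following. `ReadyDoneAware`, `Dependency`, `NewEngine`, and `component` are hypothetical names local to this example; they do not reproduce the scaffold's actual API.

```go
package main

import "fmt"

// ReadyDoneAware is a local, hypothetical lifecycle interface.
type ReadyDoneAware interface {
	Ready() <-chan struct{}
	Done() <-chan struct{}
}

// closed returns an already-closed channel, signalling "finished".
func closed() <-chan struct{} {
	ch := make(chan struct{})
	close(ch)
	return ch
}

// Dependency is a hypothetical component the engine needs.
type Dependency struct{}

func (d *Dependency) Ready() <-chan struct{} { return closed() }
func (d *Dependency) Done() <-chan struct{}  { return closed() }

// Engine embeds its dependency and owns its lifecycle.
type Engine struct {
	dep *Dependency
}

// NewEngine takes the dependency at construction time, so it cannot be
// missing when the engine starts.
func NewEngine(dep *Dependency) *Engine {
	return &Engine{dep: dep}
}

// Ready starts the dependency first, then reports the engine as ready.
func (e *Engine) Ready() <-chan struct{} {
	ready := make(chan struct{})
	go func() {
		<-e.dep.Ready()
		close(ready)
	}()
	return ready
}

// Done stops the dependency first, then reports the engine as done.
func (e *Engine) Done() <-chan struct{} {
	done := make(chan struct{})
	go func() {
		<-e.dep.Done()
		close(done)
	}()
	return done
}

// component mimics a single scaffold block: build the component, then start it.
func component(name string, build func() ReadyDoneAware) ReadyDoneAware {
	c := build()
	<-c.Ready()
	fmt.Println(name, "is ready")
	return c
}

func main() {
	// Both the dependency and the engine are created in one block.
	c := component("fetcher engine", func() ReadyDoneAware {
		return NewEngine(&Dependency{})
	})
	<-c.Done()
	fmt.Println("fetcher engine is done")
}
```

Because the dependency is passed in at construction and started from the engine's own `Ready`, there is no window in which the engine runs without it.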
Looks good besides the dependency issue I commented on yesterday. I just noticed that it appeared as a single comment rather than a review.
Looks good
e6a748e to 892598a
@@ -97,7 +97,9 @@ func (e *Engine) Ready() <-chan struct{} {
 	if e.chunkConsumerNotifier == nil {
 		e.log.Fatal().Msg("missing chunk consumer notifier callback in verification fetcher engine")
 	}
-	return e.unit.Ready()
+	return e.unit.Ready(func() {
+		<-e.requester.Ready()
Should do <-e.requester.Done() in the Done method so teardown is handled as well
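A rough sketch of the symmetric startup and teardown being suggested is below. The `unit` type is a simplified, hypothetical stand-in for the engine's unit helper (assumed here to accept callbacks in both `Ready` and `Done`), and `Requester` is a toy dependency that only records its lifecycle calls.

```go
package main

import (
	"fmt"
	"sync"
)

// unit is a simplified, hypothetical stand-in for the engine's unit helper.
type unit struct{}

// runAll runs fns in the background and closes the channel when all returned.
func runAll(fns []func()) <-chan struct{} {
	ch := make(chan struct{})
	go func() {
		defer close(ch)
		for _, fn := range fns {
			fn()
		}
	}()
	return ch
}

func (unit) Ready(checks ...func()) <-chan struct{} { return runAll(checks) }
func (unit) Done(actions ...func()) <-chan struct{} { return runAll(actions) }

// Requester is a hypothetical embedded dependency that records its lifecycle.
type Requester struct {
	mu     sync.Mutex
	events []string
}

// record stores the event and returns an already-closed channel.
func (r *Requester) record(ev string) <-chan struct{} {
	r.mu.Lock()
	r.events = append(r.events, ev)
	r.mu.Unlock()
	ch := make(chan struct{})
	close(ch)
	return ch
}

func (r *Requester) Ready() <-chan struct{} { return r.record("requester ready") }
func (r *Requester) Done() <-chan struct{}  { return r.record("requester done") }

// Engine owns the requester and drives both its startup and its teardown.
type Engine struct {
	unit      unit
	requester *Requester
}

// Ready waits for the requester to be ready before the engine reports ready.
func (e *Engine) Ready() <-chan struct{} {
	return e.unit.Ready(func() {
		<-e.requester.Ready()
	})
}

// Done waits for the requester to shut down before the engine reports done.
func (e *Engine) Done() <-chan struct{} {
	return e.unit.Done(func() {
		<-e.requester.Done()
	})
}

func main() {
	e := &Engine{requester: &Requester{}}
	<-e.Ready()
	<-e.Done()
	fmt.Println(e.requester.events) // prints "[requester ready requester done]"
}
```

Without the `Done` counterpart, the requester would be started by the engine but never stopped by it, leaving teardown to whoever else happens to hold a reference.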
fixed in 1cd712e
Codecov Report
@@ Coverage Diff @@
## master #728 +/- ##
=======================================
Coverage 56.44% 56.45%
=======================================
Files 426 426
Lines 25049 25052 +3
=======================================
+ Hits 14140 14142 +2
- Misses 8995 8998 +3
+ Partials 1914 1912 -2
This PR refactors cmd/verification/main.go to roll in the new verification architecture and replace the current engines with the new ones. However, we let the old architecture stay on master passively for a while. This allows rolling back the architecture by (mainly) rolling back the main file. We will clean up the old components later, in this issue: https://github.com/dapperlabs/flow-go/issues/5548