[Network] Synchronize ready and done #1026
Codecov Report
Coverage Diff

| | master | #1026 | +/- |
|---|---|---|---|
| Coverage | 54.85% | 54.75% | -0.11% |
| Files | 286 | 286 | |
| Lines | 18913 | 18916 | +3 |
| Hits | 10375 | 10357 | -18 |
| Misses | 7140 | 7160 | +20 |
| Partials | 1398 | 1399 | +1 |
This looks like a great catch, but just for my understanding:
- under what conditions is the middleware start called twice?
- insofar as it is, why do we use two independent uint flags (rather than one) to check whether it's running? Would we fail a necessary restart, or a shutdown-after-restart, in a situation where started == stopped == 1?
@huitseeker The middleware start would have been called twice whenever The convention here is that a
There is no way to "restart" (i.e., get from the "stopped" state to the "starting up" state). Instead, one should just create a new instance of the module. However, thanks for the comment, because I've now changed the code to account for some edge cases that I didn't think were necessary before (08c887a, c6ef6bf). In particular, the logic will now behave as follows:
network/p2p/network.go
ready             chan struct{}
done              chan struct{}
startupCommenced  bool
shutdownCommenced bool
Please add godoc for these fields explaining why we need them and what they are used for.
Thanks for the fix. As a hotfix to avoid race conditions, the PR looks good. However, I have concerns about the maintainability of the code, as several member fields are added to the `Network` solely to manage its start-stop lifecycle. I think these components (i.e., `ready`, `done`, `startupCommenced`, and `shutdownCommenced`) can be encapsulated into a separate (sub)module that manages the lifecycle of the `Network`. For example, the current implementation uses the `stateTransition` lock solely to protect the `ready` and `done` channels and the `startupCommenced` and `shutdownCommenced` flags. It is clear that we must never use that lock for any other purpose. Over time, however, the concept of state transition can be misunderstood or misused by other developers, posing correctness issues for the code: e.g., someone uses `stateTransition` as the lock to update the state of the topology, which can cause liveness issues. Hence, I strongly recommend encapsulating this lifecycle synchronization logic into a separate (sub)module. You may do that either as part of this PR or in another issue (please feel free to open one and assign it to me).
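To make the suggestion concrete, here is a minimal sketch of what such a lifecycle (sub)module could look like. Everything in it is an assumption for illustration: the `LifecycleManager` type, its `OnStart`/`OnStop`/`Started`/`Stopped` methods, and the choice to skip the shutdown callback when startup never commenced are hypothetical and not taken from this PR. The point is only that the lock and the two flags become private to one small type, so nothing else can reuse them.

```go
package main

import (
	"fmt"
	"sync"
)

// LifecycleManager is a hypothetical helper that owns all start/stop state.
type LifecycleManager struct {
	mu                sync.Mutex // guards the two flags below and nothing else
	startupCommenced  bool
	shutdownCommenced bool
	ready             chan struct{} // closed once startup has finished
	done              chan struct{} // closed once shutdown has finished
}

func NewLifecycleManager() *LifecycleManager {
	return &LifecycleManager{
		ready: make(chan struct{}),
		done:  make(chan struct{}),
	}
}

// OnStart runs startup at most once, and never after shutdown has begun.
func (lm *LifecycleManager) OnStart(startup func()) {
	lm.mu.Lock()
	defer lm.mu.Unlock()
	if lm.startupCommenced || lm.shutdownCommenced {
		return
	}
	lm.startupCommenced = true
	go func() {
		startup()
		close(lm.ready)
	}()
}

// OnStop runs shutdown at most once. If startup commenced, it waits for
// startup to finish first; if it never did, the callback is skipped
// (an assumption of this sketch).
func (lm *LifecycleManager) OnStop(shutdown func()) {
	lm.mu.Lock()
	defer lm.mu.Unlock()
	if lm.shutdownCommenced {
		return
	}
	lm.shutdownCommenced = true
	started := lm.startupCommenced
	go func() {
		if started {
			<-lm.ready
			shutdown()
		}
		close(lm.done)
	}()
}

// Started and Stopped expose the channels read-only.
func (lm *LifecycleManager) Started() <-chan struct{} { return lm.ready }
func (lm *LifecycleManager) Stopped() <-chan struct{} { return lm.done }

func main() {
	lm := NewLifecycleManager()
	starts, stops := 0, 0
	for i := 0; i < 3; i++ {
		lm.OnStart(func() { starts++ })
	}
	<-lm.Started()
	for i := 0; i < 3; i++ {
		lm.OnStop(func() { stops++ })
	}
	<-lm.Stopped()
	fmt.Println(starts, stops)
}
```

With something like this, the `Network` would hold a single lifecycle field and its `Ready` and `Done` methods would delegate to it, so there is no free-standing lock left for other code to borrow.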
✨
…w/flow-go into smnzhu/synchronize-network
This PR fixes a bug with the existing network implementation.

The `middleware.Start` method is not idempotent, and calling it twice results in attempting to bind to the same address twice, which results in the error:

This is a problem because the splitter network relies on an underlying network implementation, and it will wait for the underlying network to be ready inside its own `Ready` method. However, the underlying network already has its `Ready` method called as part of the node startup.

Regardless of how many times `Ready` and `Done` are called, we should only call `middleware.Start` and `middleware.Stop` once.
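For illustration only, the "call it once no matter how many times `Ready` is called" requirement can be sketched with `sync.Once`. This is not the code in this PR (which uses a lock and explicit flags); `fakeMiddleware`, `network`, `newNetwork`, and the error text are hypothetical stand-ins, and only the start path is shown.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// fakeMiddleware stands in for the real middleware. Like the real one,
// its Start is not idempotent: a second call fails.
type fakeMiddleware struct {
	started bool
	starts  int
}

func (m *fakeMiddleware) Start() error {
	if m.started {
		// Illustrative text only; stands in for the real bind error.
		return errors.New("middleware already started")
	}
	m.started = true
	m.starts++
	return nil
}

// network is a hypothetical wrapper whose Ready is safe to call repeatedly.
type network struct {
	mw        *fakeMiddleware
	startOnce sync.Once
	ready     chan struct{}
	startErr  error // written before ready is closed, read only after
}

func newNetwork(mw *fakeMiddleware) *network {
	return &network{mw: mw, ready: make(chan struct{})}
}

// Ready starts the middleware on the first call only. Every caller gets the
// same channel, which is closed once startup has finished.
func (n *network) Ready() <-chan struct{} {
	n.startOnce.Do(func() {
		go func() {
			n.startErr = n.mw.Start()
			close(n.ready)
		}()
	})
	return n.ready
}

func main() {
	mw := &fakeMiddleware{}
	n := newNetwork(mw)

	<-n.Ready() // called by node startup
	<-n.Ready() // called again by the splitter network

	fmt.Println("starts:", mw.starts, "err:", n.startErr)
}
```

A `Done` method could be guarded the same way with a second `sync.Once`, although that alone does not order shutdown after startup, which is one reason a lock plus explicit flags may be preferable.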