[18.03] [manager/dispatcher] Replace call to isRunning() to isRunningLocked() in dispatcher Heartbeat() #2702
… in dispatcher Heartbeat()

Signed-off-by: Anshul Pundir <anshul.pundir@docker.com>
(cherry picked from commit caee4da)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Interesting: when backporting this to the 17.06 branch, I got a conflict that indicates this may not be needed. I suspect a patch was applied to the 17.06 branch but never backported to master:

diff --cc manager/dispatcher/dispatcher.go
index f201f5fc,407ced0b..00000000
--- a/manager/dispatcher/dispatcher.go
+++ b/manager/dispatcher/dispatcher.go
@@@ -1095,12 -1145,9 +1095,18 @@@ func (d *Dispatcher) Heartbeat(ctx cont
d.rpcRW.RLock()
defer d.rpcRW.RUnlock()
++<<<<<<< HEAD
+ // Its OK to call isRunning() here instead of isRunningLocked()
+ // because of the rpcRW readlock above.
+ // TODO(anshul) other uses of isRunningLocked() can probably
+ // also be removed.
+ if !d.isRunning() {
+ return nil, grpc.Errorf(codes.Aborted, "dispatcher is stopped")
++=======
+ // TODO(anshul) Explore if its possible to check context here without locking.
+ if _, err := d.isRunningLocked(); err != nil {
+ return nil, status.Errorf(codes.Aborted, "dispatcher is stopped")
++>>>>>>> caee4da2... [manager/dispatcher] Replace call to isRunning() to isRunningLocked() in dispatcher Heartbeat()
}
nodeInfo, err := ca.RemoteNode(ctx)

Oh right, it's 17.06 that's missing that patch; not sure why it conflicts.

OK, it looks to be #2519 (master), cherry-picked into 17.06 through #2524. Digging further.
Also getting the same failures as in #2700; may be an actual issue after all?
So for the 17.06 branch, #2519 (master) was:
And on the 18.03 branch:
So, current status:

17.06:

// Its OK to call isRunning() here instead of isRunningLocked()
// because of the rpcRW readlock above.
// TODO(anshul) other uses of isRunningLocked() can probably
// also be removed.
if !d.isRunning() {
	return nil, grpc.Errorf(codes.Aborted, "dispatcher is stopped")
}

18.03:

// Its OK to call isRunning() here instead of isRunningLocked()
// because of the rpcRW readlock above.
// TODO(anshul) other uses of isRunningLocked() can probably
// also be removed.
if !d.isRunning() {
	return nil, status.Errorf(codes.Aborted, "dispatcher is stopped")
}

master (and 18.06):

// TODO(anshul) Explore if its possible to check context here without locking.
if _, err := d.isRunningLocked(); err != nil {
	return nil, status.Errorf(codes.Aborted, "dispatcher is stopped")
}

@anshulpundir I could use some input here on which of the two is the right one; if it's what's in 17.06/18.03, then there's something to revert/update on master.
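To make the difference between the two variants concrete, here is a minimal sketch. It assumes a hypothetical, stripped-down `dispatcher` type in which a mutex `mu` guards the `ctx` field; the method names mirror the snippets above, but the bodies are assumptions rather than swarmkit's code, and the real methods return a gRPC status error (`codes.Aborted`) instead of the plain `errStopped` used here. The point is the contract: `isRunning()` expects the caller to already hold `mu`, while `isRunningLocked()` acquires `mu` itself and hands back a snapshot of the context.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// dispatcher is a hypothetical stand-in for the swarmkit Dispatcher,
// keeping only the fields needed to show the locking contract.
type dispatcher struct {
	mu     sync.Mutex      // guards ctx and cancel
	ctx    context.Context // nil until run() is called
	cancel context.CancelFunc
}

// errStopped is hypothetical; the real code returns a gRPC status error.
var errStopped = errors.New("dispatcher is stopped")

// isRunning reads d.ctx without taking d.mu, so callers must already hold d.mu.
func (d *dispatcher) isRunning() bool {
	if d.ctx == nil {
		return false
	}
	select {
	case <-d.ctx.Done():
		return false
	default:
		return true
	}
}

// isRunningLocked takes d.mu itself, so any goroutine may call it.
// It returns a snapshot of the context so callers never read d.ctx directly.
func (d *dispatcher) isRunningLocked() (context.Context, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if !d.isRunning() {
		return nil, errStopped
	}
	return d.ctx, nil
}

// run assigns a fresh context while holding d.mu.
func (d *dispatcher) run() {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.ctx, d.cancel = context.WithCancel(context.Background())
}

// stop cancels the current context while holding d.mu.
func (d *dispatcher) stop() {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.cancel != nil {
		d.cancel()
	}
}

func main() {
	d := &dispatcher{}
	_, err := d.isRunningLocked()
	fmt.Println(err) // dispatcher is stopped
	d.run()
	_, err = d.isRunningLocked()
	fmt.Println(err) // <nil>
	d.stop()
	_, err = d.isRunningLocked()
	fmt.Println(err) // dispatcher is stopped
}
```

In this sketch the 17.06/18.03 variant is only safe if something else already serializes the caller with `run()`/`stop()` on `mu`; the master variant does not depend on that.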
@thaJeztah I believe there was a change made originally to
Codecov Report
@@               Coverage Diff               @@
##           bump_v18.03    #2702      +/-   ##
===============================================
- Coverage        61.82%   61.47%   -0.36%
===============================================
  Files              134      134
  Lines            21820    21820
===============================================
- Hits             13491    13413      -78
- Misses            6888     6970      +82
+ Partials          1441     1437       -4
cyli left a comment:
LGTM. I think this can be merged, but I'm leaving it in case there is still some doubt about 17.06. Please merge at will, @thaJeztah.
Backport of #2664 for 18.03.
Cherry-pick was clean; no conflicts.
We noticed repeated CI failures pointing to data races caused by unlocked access to the dispatcher context in Heartbeat(). Go also provides no guarantees for reading, without holding the lock, a value that is otherwise only modified while holding that lock.
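The 17.06/18.03 comment relies on the `rpcRW` read lock, but a read lock only orders a read against writers of that same lock. The sketch below illustrates this under a stated assumption: the context is assigned and cancelled under a separate mutex `mu`, while RPC handlers hold only `rpcRW.RLock()`. The `node` type and its methods are hypothetical, not swarmkit's. Because `start()` writes `ctx` without touching `rpcRW`, an unlocked read inside `heartbeat()` would race with it; taking `mu` for the read (the `isRunningLocked()` approach) removes the race.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// node is a hypothetical model showing which lock protects which field.
type node struct {
	mu     sync.Mutex   // guards ctx and cancel
	rpcRW  sync.RWMutex // RPCs hold RLock; shutdown takes Lock to drain them
	ctx    context.Context
	cancel context.CancelFunc
}

var errStopped = errors.New("dispatcher is stopped")

// start assigns n.ctx under mu only; rpcRW is not involved, so an RPC
// holding rpcRW.RLock is NOT excluded from running concurrently with it.
func (n *node) start() {
	n.mu.Lock()
	n.ctx, n.cancel = context.WithCancel(context.Background())
	n.mu.Unlock()
}

// stop cancels under mu, then waits for in-flight RPCs via rpcRW.Lock.
func (n *node) stop() {
	n.mu.Lock()
	if n.cancel != nil {
		n.cancel()
	}
	n.mu.Unlock()

	n.rpcRW.Lock() // blocks until every RLock holder has released
	n.rpcRW.Unlock()
}

// heartbeat holds rpcRW.RLock, which does not order it against start().
// Reading n.ctx here without mu would be a data race that the race
// detector can report; copying it out under mu avoids that.
func (n *node) heartbeat() error {
	n.rpcRW.RLock()
	defer n.rpcRW.RUnlock()

	n.mu.Lock()
	ctx := n.ctx
	n.mu.Unlock()

	if ctx == nil || ctx.Err() != nil {
		return errStopped
	}
	return nil
}

func main() {
	n := &node{}
	var wg sync.WaitGroup
	// Concurrent heartbeats while the node is repeatedly started and stopped;
	// run with `go run -race` to confirm no race is reported.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				_ = n.heartbeat()
			}
		}()
	}
	for i := 0; i < 100; i++ {
		n.start()
		n.stop()
	}
	wg.Wait()
	fmt.Println(n.heartbeat()) // dispatcher is stopped
}
```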
This will likely have a performance impact, which we will evaluate; some of the dispatcher code may need to be rewritten accordingly. But correctness comes first.
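The performance concern, together with the TODO on master about checking the context without locking, points at one possible direction. The following is only a sketch of that idea, not what swarmkit does: a hypothetical `lockFreeDispatcher` publishes its context through `sync/atomic`'s `atomic.Value`, so the hot-path check takes no dispatcher mutex, while `start()`/`stop()` remain serialized by one. The `ctxBox` wrapper is an assumption added because `atomic.Value` panics if values of different concrete types are stored.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
)

// ctxBox gives every stored value the same concrete type, as atomic.Value requires.
type ctxBox struct{ ctx context.Context }

// lockFreeDispatcher is hypothetical: readers load the context atomically.
type lockFreeDispatcher struct {
	mu     sync.Mutex // serializes start/stop against each other
	cancel context.CancelFunc
	cur    atomic.Value // holds a ctxBox once start() has run
}

func (d *lockFreeDispatcher) start() {
	d.mu.Lock()
	defer d.mu.Unlock()
	ctx, cancel := context.WithCancel(context.Background())
	d.cancel = cancel
	d.cur.Store(ctxBox{ctx})
}

func (d *lockFreeDispatcher) stop() {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.cancel != nil {
		d.cancel()
	}
}

// runningCtx is the hot-path check: one atomic load plus ctx.Err(),
// without taking the dispatcher's mutex.
func (d *lockFreeDispatcher) runningCtx() (context.Context, bool) {
	box, ok := d.cur.Load().(ctxBox) // ok is false before the first Store
	if !ok || box.ctx.Err() != nil {
		return nil, false
	}
	return box.ctx, true
}

func main() {
	d := &lockFreeDispatcher{}
	_, ok := d.runningCtx()
	fmt.Println(ok) // false
	d.start()
	_, ok = d.runningCtx()
	fmt.Println(ok) // true
	d.stop()
	_, ok = d.runningCtx()
	fmt.Println(ok) // false
}
```

One trade-off: the check is no longer serialized with `stop()`, so a heartbeat may see the context just before it is cancelled. That window exists with the mutex version too, because cancellation can land right after `isRunningLocked()` returns, so handlers still need to cope with a context cancelled mid-call.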