Fix reapTime logic in NetworkDB #1944
Conversation
Force-pushed from f3a3aca to d2c57f4
Codecov Report
@@            Coverage Diff             @@
##           master    #1944      +/-   ##
==========================================
- Coverage      38%   37.94%   -0.07%
==========================================
  Files         137      137
  Lines       27149    27308     +159
==========================================
+ Hits        10317    10361      +44
- Misses      15562    15674     +112
- Partials     1270     1273       +3
==========================================
Continue to review full report at Codecov.
Force-pushed from b388729 to 78f80e7
		return false
	}

	params := strings.Split(path[1:], "/")
Are we 100% sure the path will have 4 / in it?
Why 4? It is actually 3, like: fmt.Sprintf("/%s/%s/%s", nid, tname, key)
Also, the insertion and deletion are now contained in single functions: createOrUpdateEntry and deleteEntry.
Sorry, I meant 3.
Yes, this is the current expectation in the code, and as @fcrisciani mentioned it is all organized now under the same functions, so this assumption is valid.
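For reference, here is a minimal standalone sketch of the key layout discussed above (the nid/tname/key values are made up for illustration): splitting the path built by fmt.Sprintf("/%s/%s/%s", nid, tname, key) yields exactly three components once the leading slash is dropped.

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	nid, tname, key := "net1", "table1", "key1"

	// Keys are laid out as "/<networkID>/<tableName>/<key>", so splitting the
	// path with the leading slash stripped always yields three components.
	path := fmt.Sprintf("/%s/%s/%s", nid, tname, key)
	params := strings.Split(path[1:], "/")
	fmt.Println(len(params), params) // 3 [net1 table1 key1]
}
```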
networkdb/cluster.go
Outdated
	okTable, okNetwork := nDB.deleteEntry(nid, tname, key)
	if !okTable {
		logrus.Errorf("Could not delete entry in table %s with network id %s and key %s as it does not exist", tname, nid, key)
I'm having some trouble reading the message. Could we replace "it" with the table here, and with the network a couple of lines down?
I actually left the original messages; I just had to change the bool on top. Anyway, "it" here refers to the entry itself, not the table.
Maybe a better rephrasing could be: Table delete failed, entry with key:%s does not exist in the table:%s network:%s
WDYT?
@fcrisciani yeah, sounds good.
OK, changing it.
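A minimal sketch of the reworded logging, assuming the two booleans returned by deleteEntry mean "entry found in table" and "network found"; reportDeleteResult is a hypothetical helper, and the wording of the network-missing message is an assumption since the thread only settled the entry-missing one.

```go
package main

import "github.com/sirupsen/logrus"

// reportDeleteResult is a hypothetical helper, not networkdb code: it only
// illustrates the reworded messages discussed above, driven by the two
// booleans returned by deleteEntry (entry found, network found).
func reportDeleteResult(okTable, okNetwork bool, nid, tname, key string) {
	if !okTable {
		logrus.Errorf("Table delete failed, entry with key:%s does not exist in the table:%s network:%s", key, tname, nid)
	}
	if !okNetwork {
		// Wording for the network-missing case is an assumption made here.
		logrus.Errorf("Table delete failed, network:%s is not present for table:%s key:%s", nid, tname, key)
	}
}

func main() {
	reportDeleteResult(false, true, "net1", "table1", "key1")
}
```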
networkdb/delegate.go
Outdated
	// This case can happen if the cluster is running different versions of the engine where the old version does not have the
	// field. If that is not the case, this can be a BUG
	if e.deleting && e.reapTime == 0 {
		logrus.Warnf("handleTableEvent object %+v has a 0 reapTime, is the cluster running the same docker engine version?", tEvent)
		e.reapTime = reapInterval
Is this change of behavior expected here? Shouldn't we log when e.reapTime == 0, but still do e.reapTime = reapInterval if it is not?
Nope, this is actually the point of this patch: the reapTime should be the one passed in the proto (line 237).
If that is 0 and the entry is marked for deletion, we are either in front of a bug (nodes do not set the value properly) or we are dealing with an old client that does not set it because it does not have the new proto.
The old behavior was such that no matter when you got the notification, the clock was being reset to the max value of 30 min.
Except in the very unfortunate circumstance that the ResidualReapTime populated by the peer has in fact gone down to exactly 0 seconds at the time of populating the message (https://github.com/docker/libnetwork/pull/1944/files#diff-ced1ed36c3141cb15bdf5acafed78d67R583). If that happens in such a rare situation, we will reset this back to 30 seconds and the event will get cleaned up in 30 seconds (or, in an even rarer case, we get another genuine 0-second ResidualReapTime).
@fcrisciani is that correct?
BTW, yes. This is the actual point of this patch, and I think it is a good idea to get this done.
@mavenugo your point is correct only for the case of the truncation of the number. I will add an extra check that the truncated number is still strictly greater than 0.
The improvement here is that the lock is taken only once, instead of following the previous pattern of saving the keys to delete and then racing for the lock again to delete them. So right now the only case in this patch where the value turns out to be 0 is when the residual time is 0.xxx seconds, which truncates to 0.
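A minimal standalone sketch of the truncation issue described above, assuming the residual reap time is propagated as whole seconds and that the fallback interval is the 30-minute maximum mentioned earlier; names and values are illustrative, not the exact networkdb code.

```go
package main

import (
	"fmt"
	"time"
)

// reapInterval mirrors the 30-minute maximum mentioned above; the exact
// constant name and value in networkdb are assumptions here.
const reapInterval = 30 * time.Minute

// residualSeconds sketches the propagation discussed above: the sender sends
// the remaining reap time truncated to whole seconds, so a residual of
// 0.xxx seconds truncates to 0.
func residualSeconds(remaining time.Duration) int32 {
	return int32(remaining / time.Second)
}

func main() {
	fmt.Println(residualSeconds(900 * time.Millisecond)) // 0: indistinguishable from "not set"
	fmt.Println(residualSeconds(12 * time.Second))       // 12

	// Receiver-side guard (sketch): if an entry is marked for deletion but the
	// propagated residual time is 0, fall back to the full interval and warn.
	deleting, received := true, int32(0)
	reapTime := time.Duration(received) * time.Second
	if deleting && reapTime == 0 {
		reapTime = reapInterval
	}
	fmt.Println(reapTime) // 30m0s
}
```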
LGTM (with the log format update)
Force-pushed from 78f80e7 to 3350cb8
	msg:   buf,
	id:    tEvent.NetworkID,
	tname: tEvent.TableName,
	key:   tEvent.Key,
-	node:  nDB.config.NodeName,
+	node:  tEvent.NodeName,
Being a rebroadcast event, and based on the current logic of depending on the residual reap timer, what is the purpose of this change?
If I understood Flavio's explanation correctly, it's not strictly related to this PR; it's an issue he saw while he was looking at the code.
Without this change, when the event was rebroadcast, the original node emitting the event was replaced with the one rebroadcasting it. So you could receive an event saying it originated from one node when it actually came from another one.
The node is simply rebroadcasting the message of another node; at the top of the function there is the check that skips messages whose owner matches the current node.
Without this change the ownership of the entry gets messed up and can cause issues on deletion.
OK, this is a bit subtle and I think good test coverage will help mitigate the risk.
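A minimal sketch of the rebroadcast behavior described above, using simplified stand-in types (tableEvent, queuedMessage, and handleTableEvent here are illustrative, not the real networkdb types): events originated by the local node are skipped, and the rebroadcast keeps the original owner instead of stamping the local node name on the message.

```go
package main

import "fmt"

// Simplified stand-ins; the field names mirror the diff above but the
// surrounding code is hypothetical.
type tableEvent struct {
	NetworkID, TableName, Key, NodeName string
}

type queuedMessage struct {
	id, tname, key, node string
}

func handleTableEvent(localNode string, ev tableEvent) *queuedMessage {
	// Skip events that this node itself originated (the check mentioned above).
	if ev.NodeName == localNode {
		return nil
	}
	// Rebroadcast while preserving the original owner, instead of replacing
	// it with the local node name.
	return &queuedMessage{
		id:    ev.NetworkID,
		tname: ev.TableName,
		key:   ev.Key,
		node:  ev.NodeName, // not localNode
	}
}

func main() {
	out := handleTableEvent("node-b", tableEvent{"net1", "t1", "k1", "node-a"})
	fmt.Printf("%+v\n", *out) // owner stays node-a
}
```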
networkdb/networkdb.go
Outdated
		defer nDB.RUnlock()
		return len(nDB.networkNodes[nid])
	},
	RetransmitMult: 4,
Nit: please reorganize this code so that not-so-important changes don't pop up. The only real change in this blob is the addition of entriesNumber; the rest of it is purely code reorganization. It is very distracting to see such changes in a PR with other sensitive changes.
networkdb/networkdb.go
Outdated
@@ -141,6 +141,9 @@ type network struct {

	// Number of gossip messages sent related to this network during the last stats collection period
	qMessagesSent int

	// Number of entries on the network
	entriesNumber int
I guess this is used only for debugging purposes... It would be good to write a comment reflecting the purpose and we should not be tempted to use this as a reference counter for any logic in the future.
will add extra comments
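A sketch of the kind of comment being asked for, shown as an abbreviated fragment of the struct; the exact wording that lands in networkdb may differ.

```go
// Fragment of the network struct, abbreviated for illustration.
type network struct {
	// ...other fields elided...

	// Number of gossip messages sent related to this network during the last stats collection period
	qMessagesSent int

	// Number of entries on the network. Kept only for monitoring/debugging
	// purposes and the periodic stats logs; it is not meant to be used as a
	// reference counter by any other logic.
	entriesNumber int
}
```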
@fcrisciani overall the changes LGTM. Only a few minor questions/comments.
- Added remainingReapTime field in the table event. Without it, a node that did not have a state for the element was marking the element for deletion, setting the max reapTime. This created the possibility of the entry being resynced between nodes forever, defeating the purpose of the reap time itself.
- On broadcast of the table event the node owner was rewritten with the local node name; this was not correct because the owner should remain the original one of the message.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
- Changed the loop to be per network. The previous implementation took a ReadLock to update the reapTime, but now, with the residualReapTime, the bulkSync also uses the same ReadLock, creating possible issues with concurrent reads and updates of the value. The new logic fetches the list of networks and proceeds with the cleanup network by network, locking the database and releasing it after each network. This should ensure fair locking and avoid keeping the database blocked for too long.
  Note: the ticker does not guarantee that the reap logic runs precisely every reapTimePeriod; the documentation says that if the routine takes too long it will skip ticks. If the process itself slows down, the lifetime of the deleted entries may increase. This should still not be a big problem, because the residual reap time is now propagated among all the nodes: a slower node will let the deleted entry be repropagated multiple times, but the state will remain consistent.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
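A minimal standalone sketch of the per-network cleanup loop described in the commit message, using a simplified stand-in type; the real NetworkDB fields and reap logic differ.

```go
package main

import (
	"fmt"
	"sync"
)

// db is a simplified stand-in for NetworkDB, just enough to show the locking
// pattern described above.
type db struct {
	sync.RWMutex
	networks map[string][]string // networkID -> entry keys (illustrative)
}

func (d *db) reapTableEntries() {
	// Snapshot the network IDs under a short read lock.
	d.RLock()
	nids := make([]string, 0, len(d.networks))
	for nid := range d.networks {
		nids = append(nids, nid)
	}
	d.RUnlock()

	// Clean up network by network, locking the database and releasing it
	// after each network, so it is never held for the whole sweep.
	for _, nid := range nids {
		d.Lock()
		d.networks[nid] = d.networks[nid][:0] // placeholder for the real reap logic
		d.Unlock()
	}
}

func main() {
	d := &db{networks: map[string][]string{"net1": {"k1", "k2"}, "net2": {"k3"}}}
	d.reapTableEntries()
	fmt.Println(d.networks)
}
```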
Force-pushed from 3350cb8 to 3feb3aa
Thanks for addressing the comments. LGTM
LGTM
Make sure that the network is garbage collected after the entries. Deleting entries requires that the network is still present.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
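A tiny sketch of the ordering constraint from this commit message, with illustrative function names: entries are reaped before networks, because deleting an entry requires its network to still be present.

```go
package main

import "fmt"

// reaper and its methods are illustrative stand-ins, not the real networkdb code.
type reaper struct{}

func (r *reaper) reapTableEntries() { fmt.Println("1: reap table entries") }
func (r *reaper) reapNetworks()     { fmt.Println("2: reap networks") }

// reapState runs entry reaping before network reaping, matching the ordering
// described in the commit message.
func (r *reaper) reapState() {
	r.reapTableEntries()
	r.reapNetworks()
}

func main() {
	(&reaper{}).reapState()
}
```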
Force-pushed from 3537443 to fbba555
New commit LGTM
Fix for networkDB garbage collection (PR: moby/libnetwork#1944). Added extra logs to monitor the networkDB status and the number of entries per network.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Upstream-commit: 0404342
Component: engine