Delete node attachments when node is removed #2409
Conversation
manager/controlapi/node.go (outdated)

    for _, task := range tasks {
        // if the task is an attachment, then we just delete it. the
        // allocator will do the heavy lifting.
        // basically, GetAttachment will return the an attachment if that's
remove an/the
manager/controlapi/node.go (outdated)

    @@ -313,6 +333,8 @@ func (s *Server) RemoveNode(ctx context.Context, request *api.RemoveNodeRequest)
            return err
        }

        removeNodeAttachments(t, request.NodeID)
We should check error and bubble it up. Probably fail the node remove.
How are the changes tested?
Force-pushed from 72ba4d7 to 8b9ef83.
They're not yet. I need to write a unit test for the
Force-pushed from 8b9ef83 to a806a0a.
Added unit test.
Force-pushed from a806a0a to a2fd247.
Codecov Report
    @@            Coverage Diff             @@
    ##           master    #2409      +/-   ##
    ==========================================
    + Coverage   60.51%   60.76%    +0.24%
    ==========================================
      Files         128      128
      Lines       26339    26361       +22
    ==========================================
    + Hits        15940    16018       +78
    + Misses       9004     8938       -66
    - Partials     1395     1405       +10
    // if the task is an attachment, then we just delete it. the allocator
    // will do the heavy lifting. basically, GetAttachment will return the
    // attachment if that's the kind of runtime, or nil if it's not.
    if task.Spec.GetAttachment() != nil {
Spoke with @dperny offline but capturing here for everyone else: I guess it makes sense to enforce a model where the task reaper is the only place where tasks are deleted. We could mark these tasks as orphaned and let them be cleaned out by the reaper, with the caveat that the network attachments are removable for orphaned tasks.
I agree with marking the tasks Orphaned when the associated node is deleted. I think that's exactly the right behavior.

This should be done in the orchestrators, which control task lifecycle. For example, in the replicated orchestrator:

    func (r *Orchestrator) handleTaskEvent(ctx context.Context, event events.Event) {
        switch v := event.(type) {
        case api.EventDeleteNode:
            r.restartTasksByNodeID(ctx, v.Node.ID)

We would want to modify this code to set the old task's state (not desired state) to Orphaned. The code here might be a little hard to follow, because the tasks get added to a map that's eventually processed as a batch, with those tasks getting passed to Restart. We could potentially have a separate map for tasks that need to become orphaned, and pass them to the restart manager in the same way, but then set the state to Orphaned after calling Restart.
The global orchestrator would need similar changes.
For network attachment tasks, I'm not entirely sure what the best way is. We could create a simple orchestrator for those that just watches for node deletion events. I think that's the cleanest way, but as a simpler kludge we could handle this in the network allocator.
I don't think any changes in the task reaper or network allocator are necessary, because once a task's state is set to Orphaned, its network resources are supposed to be freed. But it's definitely worth confirming that this works as expected after the node has been deleted.
I see that when we remove a node, we delete all tasks for that node from the store. What is the reason not to keep the history around in that case? Since the service may still be around, doesn't it make sense to keep the task history from that node? @aaronlehmann
It was done for global service tasks because otherwise these tasks would stay in the store forever. The task reaper keeps a certain number per node, so there's no provision for removing all the tasks from dead nodes. I'm not sure I see a better way than deleting them immediately, because with the node no longer in the system, we'll never know when the task has truly shut down and is finally safe to delete. Possibly we could use the orphaned state, as discussed above, though if we wanted to be really careful, we would set that state after a delay passes, like we do for unresponsive nodes.
For reference, here's the commit that made the change: 56463e4
Force-pushed from a2fd247 to 4e11431.
I think the correct approach to take is to have the node delete operation also mark all of the node's attachments as
When a node is removed, delete all of its attachment tasks, so that any networks being used by those tasks can be successfully removed. Provides a workaround to the state where a node with attachments is somehow removed from the cluster while attached to a network, preventing the network from being removed. Does not fix many other related bugs. Includes a unit test for the function that removes node attachment tasks. Signed-off-by: Drew Erny <drew.erny@docker.com>
Force-pushed from 4e11431 to 0c7b2fc.
Applied this patch to a dev build of the docker daemon, and I can confirm it does work: when a node is removed, the tasks are removed with it and the network can be successfully removed.
Keeping the task lifecycle within the orchestrator does make sense to me, although this does create a crash-consistency problem: when a node is removed, the orchestrator changes the desired state of the deleted node's tasks outside the node-remove transaction.
We can either live with that small window where it's possible to lose the deletion of tasks from a removed node, or we could allow task state transactions outside the orchestrator. Maybe the first approach is better, and it can be supplemented by another way to clean up the store in the background.
Since we're pressed for time, I am OK with treating the current approach as a short-term fix and working on the longer-term fix, possibly a separate lightweight orchestrator for the network attachments.
Thoughts? @dperny @aaronlehmann @nishanttotla
    @@ -313,6 +336,10 @@ func (s *Server) RemoveNode(ctx context.Context, request *api.RemoveNodeRequest)
            return err
        }

        if err := removeNodeAttachments(tx, request.NodeID); err != nil {
It probably makes sense to add a comment here to say why we're doing this.
After a quick discussion between @dperny, @andrewhsu, and me, we decided to merge this. Speaking with @dperny, this should serve as the workaround for the most common scenario where the network is not removable because a serviceless task is running on a node that is gone. @dperny will provide a writeup.
Does it make sense to run the e2e suite before merging this?
I'd also suggest opening another issue in swarmkit for the long term fix capturing some of the discussion from this PR.
When a node is removed, delete all of its attachment tasks, so that any networks being used by those tasks can be successfully removed. Provides a workaround to the state where a node with attachments is somehow removed from the cluster while attached to a network, preventing the network from being removed. Does not fix many other related bugs. Includes a unit test for the function that removes node attachment tasks. Cherry picks moby#2409 to 17.03. Cherry-pick applies cleanly. (cherry picked from commit 0c7b2fc) Signed-off-by: Drew Erny <drew.erny@docker.com>
Upgrade swarmkit dependency. Changes:
moby/swarmkit@ce5f7b8a (HEAD -> master, origin/master, origin/HEAD) Merge pull request moby/swarmkit#2411 from crunchywelch/2401-arm64_support
moby/swarmkit@b0856099 Merge pull request moby/swarmkit#2423 from thaJeztah/new-misty-handle
moby/swarmkit@2bd294fc Update Misty's GitHub handle
moby/swarmkit@0769c605 Comments for orphaned state/task reaper. (moby/swarmkit#2421)
moby/swarmkit@de950a7e Generic resource cli (moby/swarmkit#2347)
moby/swarmkit@312be598 Provide custom gRPC dialer to override default proxy dialer (moby/swarmkit#2419)
moby/swarmkit@4f12bf79 Merge pull request moby/swarmkit#2415 from cheyang/master
moby/swarmkit@8f9f7dc1 add pid limits
moby/swarmkit@da5ee2a6 Merge pull request moby/swarmkit#2409 from dperny/workaround-attachments
moby/swarmkit@0c7b2fc2 Delete node attachments when node is removed
moby/swarmkit@9d702763 normalize "aarch64" architectures to "arm64"
moby/swarmkit@28f91d8...ce5f7b8
Signed-off-by: Marcus Martins <marcus@docker.com>
When a node is removed, delete all of its attachment tasks, so that any networks being used by those tasks can be successfully removed.
Provides a workaround to the state where a node with attachments is somehow removed from the cluster while attached to a network, preventing the network from being removed. Does not fix many other related bugs.
Signed-off-by: Drew Erny <drew.erny@docker.com>