
Delete node attachments when node is removed #2409

Merged

dperny merged 1 commit into moby:master on Oct 18, 2017

Conversation

dperny
Collaborator

@dperny dperny commented Oct 12, 2017

When a node is removed, delete all of its attachment tasks, so that any networks being used by those tasks can be successfully removed.

Provides a workaround to the state where a node with attachments is somehow removed from the cluster while attached to a network, preventing the network from being removed. Does not fix many other related bugs.

Signed-off-by: Drew Erny drew.erny@docker.com
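The cleanup the PR describes — on node removal, drop the node's attachment tasks so the allocator can free their networks — can be sketched roughly as follows. The types and the standalone `removeNodeAttachments` helper here are simplified stand-ins for illustration, not the real swarmkit `api` package or store transaction:

```go
package main

import "fmt"

// Simplified stand-ins for swarmkit's api.Task / api.TaskSpec.
type Attachment struct{ NetworkID string }

type TaskSpec struct{ Attachment *Attachment } // nil for non-attachment runtimes

type Task struct {
	ID     string
	NodeID string
	Spec   TaskSpec
}

// removeNodeAttachments returns the task list with the removed node's
// attachment tasks deleted; regular tasks and other nodes are untouched.
func removeNodeAttachments(tasks []Task, nodeID string) []Task {
	var kept []Task
	for _, task := range tasks {
		// Delete only attachment tasks belonging to the removed node;
		// the allocator then frees their network resources.
		if task.NodeID == nodeID && task.Spec.Attachment != nil {
			continue
		}
		kept = append(kept, task)
	}
	return kept
}

func main() {
	tasks := []Task{
		{ID: "t1", NodeID: "node-1", Spec: TaskSpec{Attachment: &Attachment{NetworkID: "net-1"}}},
		{ID: "t2", NodeID: "node-1"}, // regular task, kept
		{ID: "t3", NodeID: "node-2", Spec: TaskSpec{Attachment: &Attachment{NetworkID: "net-1"}}},
	}
	for _, t := range removeNodeAttachments(tasks, "node-1") {
		fmt.Println(t.ID)
	}
}
```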

for _, task := range tasks {
// if the task is an attachment, then we just delete it. the
// allocator will do the heavy lifting.
// basically, GetAttachment will return the an attachment if that's
Contributor

remove an/the

@@ -313,6 +333,8 @@ func (s *Server) RemoveNode(ctx context.Context, request *api.RemoveNodeRequest)
return err
}

removeNodeAttachments(t, request.NodeID)
Contributor

We should check error and bubble it up. Probably fail the node remove.

@marcusmartins

How are the changes tested?

@dperny dperny force-pushed the workaround-attachments branch from 72ba4d7 to 8b9ef83 Compare October 13, 2017 16:55
@dperny
Collaborator Author

dperny commented Oct 13, 2017

They're not yet. I need to write a unit test for the removeNodeAttachments function. That's why I broke it out. It's just a pain because it touches the store so much.

@dperny dperny force-pushed the workaround-attachments branch from 8b9ef83 to a806a0a Compare October 13, 2017 18:38
@dperny
Collaborator Author

dperny commented Oct 13, 2017

Added unit test

@dperny dperny force-pushed the workaround-attachments branch from a806a0a to a2fd247 Compare October 13, 2017 20:24
@codecov

codecov bot commented Oct 13, 2017

Codecov Report

Merging #2409 into master will increase coverage by 0.24%.
The diff coverage is 59.09%.

@@            Coverage Diff             @@
##           master    #2409      +/-   ##
==========================================
+ Coverage   60.51%   60.76%   +0.24%     
==========================================
  Files         128      128              
  Lines       26339    26361      +22     
==========================================
+ Hits        15940    16018      +78     
+ Misses       9004     8938      -66     
- Partials     1395     1405      +10

// if the task is an attachment, then we just delete it. the allocator
// will do the heavy lifting. basically, GetAttachment will return the
// attachment if that's the kind of runtime, or nil if it's not.
if task.Spec.GetAttachment() != nil {
Contributor

Spoke with @dperny offline but capturing here for everyone else: I guess it makes sense to enforce a model where the task reaper is the only place where tasks are deleted. We could mark these tasks as orphaned and let them be cleaned out by the reaper, with the caveat that the network attachments are removable for orphaned tasks.

Collaborator

I agree with marking the tasks Orphaned when the associated node is deleted. I think that's exactly the right behavior.

This should be done in the orchestrators, which is what controls task lifecycle. For example, in the replicated orchestrator:

func (r *Orchestrator) handleTaskEvent(ctx context.Context, event events.Event) {
        switch v := event.(type) {
        case api.EventDeleteNode:
                r.restartTasksByNodeID(ctx, v.Node.ID)

We would want to modify this code to set the old task's state (not desired state) to Orphaned. The code here might be a little hard to follow, because the tasks get added to a map that's eventually processed as a batch, with those tasks getting passed to Restart. We could potentially have a separate map for tasks that need to become orphaned, and pass them to the restart manager in the same way, but then set the state to Orphaned after calling Restart.

The global orchestrator would need similar changes.

For network attachment tasks, I'm not entirely sure what the best way is. We could create a simple orchestrator for those that just watches for node deletion events. I think that's the cleanest way, but as a simpler kludge we could handle this in the network allocator.

I don't think any changes in the task reaper or network allocator are necessary, because once a task's state is set to Orphaned, its network resources are supposed to be freed. But it's definitely worth confirming that this works as expected after the node has been deleted.
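The orchestrator change described above — on a node-delete event, set the old tasks' observed state (not desired state) to Orphaned so their network resources can be freed — might look roughly like this. All names here (`Orchestrator`, `EventDeleteNode`, the state constants) are simplified stand-ins, not the real swarmkit types, and the real code batches these updates through a store transaction:

```go
package main

import "fmt"

type TaskState int

const (
	TaskStateRunning TaskState = iota
	TaskStateOrphaned
)

type Task struct {
	ID           string
	NodeID       string
	State        TaskState // observed state: this is what gets set to Orphaned
	DesiredState TaskState // left untouched, per the suggestion above
}

type EventDeleteNode struct{ NodeID string }

type Orchestrator struct {
	tasks []*Task
}

// handleEvent mirrors the shape of the handleTaskEvent switch quoted above:
// on node deletion, mark that node's tasks Orphaned so the allocator can
// free their network resources. A real orchestrator would also queue each
// task for the restart manager here.
func (o *Orchestrator) handleEvent(ev interface{}) {
	switch v := ev.(type) {
	case EventDeleteNode:
		for _, t := range o.tasks {
			if t.NodeID == v.NodeID {
				t.State = TaskStateOrphaned
			}
		}
	}
}

func main() {
	o := &Orchestrator{tasks: []*Task{
		{ID: "t1", NodeID: "node-1"},
		{ID: "t2", NodeID: "node-2"},
	}}
	o.handleEvent(EventDeleteNode{NodeID: "node-1"})
	for _, t := range o.tasks {
		fmt.Println(t.ID, t.State == TaskStateOrphaned)
	}
}
```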

Contributor

@anshulpundir anshulpundir Oct 16, 2017

I see that when we remove a node, we delete all tasks for that node from the store. What is the reason not to keep the history around in that case? Since the service may still be around, doesn't it make sense to keep the task history from that node? @aaronlehmann

Collaborator

It was done for global service tasks because otherwise these tasks would stay in the store forever. The task reaper keeps a certain number per node, so there's no provision for removing all the tasks from dead nodes. I'm not sure I see a better way than deleting them immediately, because with the node no longer in the system, we'll never know when a task has truly shut down and is finally safe to delete. Possibly we could use the orphaned state, as discussed above, though if we wanted to be really careful, we would set that state only after a delay, like we do for unresponsive nodes.

For reference, here's the commit that made the change: 56463e4

@dperny dperny force-pushed the workaround-attachments branch from a2fd247 to 4e11431 Compare October 13, 2017 22:25
@dperny
Collaborator Author

dperny commented Oct 17, 2017

I think the correct approach is to have the node delete operation also mark all of the node's attachments as ORPHANED. Done this way, a node cannot be removed without its attachments being orphaned, and an orphaned attachment cannot exist as long as its node is present. This prevents the issue where we might fail in between deleting a node and orphaning its tasks, which means we don't have to reconcile the state of attachments (yet, at least; we will still need a permanent fix to cover all of the other situations in which an attachment can become "stuck").
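The appeal of doing both operations in one store transaction is that either everything happens or nothing does, leaving no window where the node is gone but its attachments survive. A toy sketch of that shape, with an in-memory store and illustrative names (`Store`, `RemoveNode`) that are not the real swarmkit API:

```go
package main

import (
	"errors"
	"fmt"
)

type Task struct {
	ID         string
	NodeID     string
	Attachment bool // true for network-attachment tasks
	Orphaned   bool
}

// Store is a toy in-memory stand-in for swarmkit's store.
type Store struct {
	nodes map[string]bool
	tasks map[string]*Task
}

// Update runs fn as a transaction. (Real swarmkit transactions roll back
// on error; this toy simply validates before mutating.)
func (s *Store) Update(fn func(*Store) error) error { return fn(s) }

// RemoveNode deletes the node and orphans its attachment tasks within the
// same transaction, so a node cannot be removed without its attachments
// being orphaned.
func RemoveNode(s *Store, nodeID string) error {
	return s.Update(func(tx *Store) error {
		if !tx.nodes[nodeID] {
			return errors.New("node not found: " + nodeID)
		}
		delete(tx.nodes, nodeID)
		for _, t := range tx.tasks {
			if t.NodeID == nodeID && t.Attachment {
				t.Orphaned = true
			}
		}
		return nil
	})
}

func main() {
	s := &Store{
		nodes: map[string]bool{"node-1": true},
		tasks: map[string]*Task{
			"t1": {ID: "t1", NodeID: "node-1", Attachment: true},
			"t2": {ID: "t2", NodeID: "node-1"},
		},
	}
	if err := RemoveNode(s, "node-1"); err != nil {
		panic(err)
	}
	fmt.Println(s.tasks["t1"].Orphaned, s.tasks["t2"].Orphaned)
}
```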

When a node is removed, delete all of its attachment tasks, so that any
networks being used by those tasks can be successfully removed.

Provides a workaround to the state where a node with attachments is
somehow removed from the cluster while attached to a network, preventing
the network from being removed. Does not fix many other related bugs.

Includes a unit test for the function that removes node attachment
tasks.

Signed-off-by: Drew Erny <drew.erny@docker.com>
@dperny dperny force-pushed the workaround-attachments branch from 4e11431 to 0c7b2fc Compare October 17, 2017 22:46
@nishanttotla nishanttotla self-requested a review October 18, 2017 18:32
@dperny
Collaborator Author

dperny commented Oct 18, 2017

Applied this patch to a dev build of the Docker daemon and I can confirm it works: when a node is removed, its tasks are removed with it and the network can then be successfully removed.

Contributor

@anshulpundir anshulpundir left a comment

Keeping the task lifecycle within the orchestrator does make sense to me, although it creates a crash-consistency problem: when a node is removed, the orchestrator changes the desired state of the deleted node's tasks outside the node-remove transaction.

We can either live with that small window where it's possible to lose task deletion for a removed node, or we could allow task state transactions outside the orchestrator. Maybe the first approach is better, supplemented by some other way to clean up the store in the background.

Since we're pressed for time, I am OK with treating the current approach as a short-term fix and working on the longer-term fix, possibly a separate lightweight orchestrator for the network attachments.

Thoughts? @dperny @aaronlehmann @nishanttotla

@@ -313,6 +336,10 @@ func (s *Server) RemoveNode(ctx context.Context, request *api.RemoveNodeRequest)
return err
}

if err := removeNodeAttachments(tx, request.NodeID); err != nil {
Contributor

It probably makes sense to add a comment here to say why we're doing this.

Contributor

@anshulpundir anshulpundir left a comment

After a quick discussion between @dperny, @andrewhsu, and me, we decided to merge this. Speaking with @dperny: this should serve as the workaround for the most common scenario, where the network is not removable because a serviceless task is running on a node that is gone. @dperny will provide a writeup.

Does it make sense to run the e2e suite before merging this?

I'd also suggest opening another issue in swarmkit for the long term fix capturing some of the discussion from this PR.

@dperny dperny merged commit da5ee2a into moby:master Oct 18, 2017
dperny added a commit to dperny/swarmkit-1 that referenced this pull request Oct 23, 2017
When a node is removed, delete all of its attachment tasks, so that any
networks being used by those tasks can be successfully removed.

Provides a workaround to the state where a node with attachments is
somehow removed from the cluster while attached to a network, preventing
the network from being removed. Does not fix many other related bugs.

Includes a unit test for the function that removes node attachment
tasks.

Cherry picks moby#2409 to 17.03. Cherry-pick applies cleanly.

(cherry picked from commit 0c7b2fc)

Signed-off-by: Drew Erny <drew.erny@docker.com>
dperny added a commit to dperny/swarmkit-1 that referenced this pull request Oct 23, 2017 (same cherry-pick message as above)
marcusmartins added a commit to marcusmartins/docker that referenced this pull request Nov 3, 2017
Upgrade swarmkit dependency.

Changes:
moby/swarmkit@ce5f7b8a (HEAD -> master, origin/master, origin/HEAD) Merge pull request moby/swarmkit#2411 from crunchywelch/2401-arm64_support
moby/swarmkit@b0856099 Merge pull request moby/swarmkit#2423 from thaJeztah/new-misty-handle
moby/swarmkit@2bd294fc Update Misty's GitHub handle
moby/swarmkit@0769c605 Comments for orphaned state/task reaper. (moby/swarmkit#2421)
moby/swarmkit@de950a7e Generic resource cli (moby/swarmkit#2347)
moby/swarmkit@312be598 Provide custom gRPC dialer to override default proxy dialer (moby/swarmkit#2419)
moby/swarmkit@4f12bf79 Merge pull request moby/swarmkit#2415 from cheyang/master
moby/swarmkit@8f9f7dc1 add pid limits
moby/swarmkit@da5ee2a6 Merge pull request moby/swarmkit#2409 from dperny/workaround-attachments
moby/swarmkit@0c7b2fc2 Delete node attachments when node is removed
moby/swarmkit@9d702763 normalize "aarch64" architectures to "arm64"

moby/swarmkit@28f91d8...ce5f7b8

Signed-off-by: Marcus Martins <marcus@docker.com>