
Foreground deletion for daemonset doesn't work #64313

Closed
soltysh opened this issue May 25, 2018 · 3 comments · Fixed by #64797

soltysh commented May 25, 2018

/kind bug
/priority critical-urgent

What happened:
In #63979 I'm removing all the reapers; unfortunately, when invoking foreground deletion on a DaemonSet, the delete operation hangs and the controller indefinitely tries to manage the removed DS.

What you expected to happen:
A foreground deletion properly removes the DaemonSet and all of its dependents.

How to reproduce it (as minimally and precisely as possible):
Invoke a foreground delete operation directly against the kube API (i.e. without using the kubectl delete command).

Anything else we need to know?:
This behavior means that consumers of the API (other than kubectl) are not able to remove the DS, due to this custom logic present in the DS reaper. A working fix that passed the tests with the aforementioned PR is in #64306, but I'm not 100% sure it's correct.

This is an issue that needs fixing before 1.11 goes out.

/assign @kow3ns @janetkuo
/sig apps

/cc @deads2k

@soltysh soltysh added this to the v1.11 milestone May 25, 2018
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. sig/apps Categorizes an issue or PR as relevant to SIG Apps. labels May 25, 2018

soltysh commented May 25, 2018

@kubernetes/sig-apps-bugs fyi


janetkuo commented Jun 6, 2018

I filed #64797 to fix it.

As mentioned in the PR, this is because the DaemonSet controller races with the garbage collector when it tries to update the status of the DaemonSet being deleted.

Here's what happens when kubectl delete ds hangs:

  1. Someone/something performs a foreground deletion on a DaemonSet
  2. DaemonSet finalizer and DeletionTimestamp are both set
  3. DaemonSet history objects (ControllerRevisions) and pods are being deleted by garbage collector; meanwhile, DaemonSet controller tries to update DaemonSet status.
    • Updating DaemonSet status requires constructing DaemonSet history objects, to figure out current revision and which pods do/don't belong to current revision
  4. When updating DaemonSet status, DaemonSet controller tries to create a DaemonSet history object that matches current DaemonSet spec
  5. Garbage collector then tries to delete that DaemonSet history object. And repeat.

Because we can't make DaemonSet pods be deleted before DaemonSet history objects (DaemonSet history objects don't own DaemonSet pods!), we cannot reliably calculate DaemonSet status without history objects anyway. Therefore, we don't update the status of a DaemonSet that is being deleted.

Note that the kubectl delete hack works because it forces DaemonSet pods to be removed before history objects.

@k8s-github-robot

[MILESTONENOTIFIER] Milestone Issue: Up-to-date for process

@janetkuo @kow3ns @soltysh

Note: This issue is marked as priority/critical-urgent, and must be updated every 1 day during code freeze.

Example update:

ACK.  In progress
ETA: DD/MM/YYYY
Risks: Complicated fix required
Issue Labels
  • sig/apps: Issue will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move issue out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/bug: Fixes a bug discovered during the current release.
Help

Workloads automation moved this from In Progress to Done Jun 6, 2018
k8s-github-robot pushed a commit that referenced this issue Jun 6, 2018
Automatic merge from submit-queue (batch tested with PRs 64749, 64797). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Handle deleted DaemonSet properly

**What this PR does / why we need it**:
After kubectl reapers are removed (#63979) and foreground deletion is used, the DaemonSet controller may race with the garbage collector when it tries to update the status of the DaemonSet being deleted.

Here's what happened:
1. Someone/something performs a foreground deletion on a DaemonSet
2. DaemonSet finalizer and DeletionTimestamp are both set
3. DaemonSet history objects (ControllerRevisions) and pods are being deleted by garbage collector; meanwhile, DaemonSet controller tries to update DaemonSet status. 
    * Updating DaemonSet status requires constructing DaemonSet history objects, to figure out current revision and which pods do/don't belong to current revision
4. When updating DaemonSet status, DaemonSet controller tries to create a DaemonSet history object that matches current DaemonSet spec
5. Garbage collector then tries to delete that DaemonSet history object. And repeat. 

Because we can't make DaemonSet pods be deleted before DaemonSet history objects (DaemonSet history objects don't own DaemonSet pods!), we cannot reliably calculate DaemonSet status without history objects anyway. Therefore, we don't update the status of a DaemonSet that is being deleted. 

Note that the kubectl delete hack works because it forces DaemonSet pods to be removed before history objects. 

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #64313

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
k8s-github-robot pushed a commit that referenced this issue Jun 7, 2018
…97-upstream-release-1.10

Automatic merge from submit-queue.

Skip updating status for DaemonSet being deleted

**What this PR does / why we need it**: #64313, #64797 (comment)

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
Skip updating status for DaemonSet being deleted.
```