Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kube 1.8 deployment garbage collection doesn't work with --leader-elect=false #57044

Closed
jmcmeek opened this issue Dec 11, 2017 · 10 comments · Fixed by #57340
Closed

kube 1.8 deployment garbage collection doesn't work with --leader-elect=false #57044

jmcmeek opened this issue Dec 11, 2017 · 10 comments · Fixed by #57340
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.

Comments

@jmcmeek
Copy link
Contributor

jmcmeek commented Dec 11, 2017

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened:

If a 1.8.x kubernetes master with a single controller-manager is configured with --leader-elect=false, deleting a deployment through the API with v1meta.DeletePropagationForeground doesn't work.

The deployment has its deletionTimestamp set and a foregroundDeletion finalizer is added, but the pod, rc and deployment are not deleted.

What you expected to happen:

The pod, rc and deployment should be deleted when the controller-manager uses --leader-elect=false.

How to reproduce it (as minimally and precisely as possible):

I used kubeadm with calico networking, and modified the controller-manager configuration via

sudo sed -i /etc/kubernetes/manifests/kube-controller-manager.yaml -e '/leader-elect/ s/true/false/'

and then ran https://github.com/kubernetes/client-go/blob/master/examples/create-update-delete-deployment/ to create and delete deployments.

Anything else we need to know?:

I suspect the problem is related to this change:
253b047

It appears to require leader election before the GC controller will finish initialization.
Looking at controller-manager logs with leader-elect=false, I saw this log entry from garbagecollector/graph_builder.go:

"GraphBuilder running"

I did not see the "started %d new monitors, %d currently running" log entry at the end of the startMonitors() function. That seems consistent with the GC controller waiting for "<-gb.informersStarted"

Environment:

  • Kubernetes version (use kubectl version): tested with 1.8.2 and 1.8.4
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): vagrant virtualbox with the ubuntu/xenial64 box
  • Kernel (e.g. uname -a):
  • Install tools: kubeadm
  • Others:
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 11, 2017
@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Dec 11, 2017
@jmcmeek
Copy link
Contributor Author

jmcmeek commented Dec 11, 2017

@kubernetes/sig-api-machinery-bugs

@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Dec 11, 2017
@cmluciano
Copy link

@kubernetes/sig-api-machinery-bugs

cc @deads2k Have you seen this in Openshift?

@k8s-ci-robot
Copy link
Contributor

@jmcmeek: Reiterating the mentions to trigger a notification:
@kubernetes/sig-api-machinery-bugs

In response to this:

@kubernetes/sig-api-machinery-bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Dec 11, 2017
@roycaihw
Copy link
Member

cc @ironcladlou @caesarxuchao

@jmcmeek
Copy link
Contributor Author

jmcmeek commented Dec 15, 2017

I think this problem occurs because controllermanager.go initialiuzes the stop channel top nil when leader-elect=false. This causes garbage collector graph_builder to exit early.

The following release-1.8 patch seems to fix the problem. controllermanager.go creates a dummy channel which will never be closed and allows the garbage collector graph builder to work like it does when leader-elect is true.

diff --git a/cmd/kube-controller-manager/app/controllermanager.go b/cmd/kube-controller-manager/app/controllermanager.go
index 33a2e8c7a7..031b815cf7 100644
--- a/cmd/kube-controller-manager/app/controllermanager.go
+++ b/cmd/kube-controller-manager/app/controllermanager.go
@@ -163,7 +163,9 @@ func Run(s *options.CMServer) error {
        }
 
        if !s.LeaderElection.LeaderElect {
-               run(nil)
+               stopCh := make(chan struct{})
+               defer close(stopCh)
+               run(stopCh)
                panic("unreachable")
        }
 

[Edit - just noticed that my initial patch had some unexpected differences]

@rtheis
Copy link

rtheis commented Dec 15, 2017

@jmcmeek: do you really need to remove the following code?

-       if informerUserCloud, ok := cloud.(cloudprovider.InformerUser); ok {
-               informerUserCloud.SetInformers(sharedInformers)
-       }

@jmcmeek
Copy link
Contributor Author

jmcmeek commented Dec 15, 2017

@rtheis That change was not intended. I also noticed that after I saved my change. I redid the patch and edited my previous comment to reflect that.

@ironcladlou
Copy link
Contributor

Agree with the diagnosis and fix, @jmcmeek are you going to open a PR?

@jmcmeek
Copy link
Contributor Author

jmcmeek commented Dec 15, 2017

@ironcladlou Yes, I can do that. Reading your PR guidelines and ours.

jmcmeek added a commit to jmcmeek/kubernetes that referenced this issue Dec 18, 2017
**What this PR does / why we need it**:

In a 1.8.x master with --leader-elect=false, the garbage collector controller
does not work.

When deleting a deployment with v1meta.DeletePropagationForeground, the deployment
had its deletionTimestamp set and a foreground Deletion finalizer was added,
but the deployment, rs and pod were not deleted.

This is an issue with how the garbage collector graph_builder behaves when the
stopCh=nil.  This PR creates a dummy stop channel for the garbage collector controller (and other
controllers started by the controller-manager) so that they can work more like they do when
when the controller-manager is configured with --leader-elect=true.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes#57044

**Special notes for your reviewer**:

**Release note**:
<!--  Write your release note:
1. Enter your extended release note in the below block. If the PR requires additional action from users switching to the new release, include the string "action required".
2. If no release note is required, just write "NONE".
-->
```release-note
Garbage collection doesn't work when the controller-manager uses --leader-elect=false

```
k8s-github-robot pushed a commit that referenced this issue Jan 2, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix garbage collector when leader-elect=false

**What this PR does / why we need it**:

In a 1.8.x master with --leader-elect=false, the garbage collector controller
does not work.

When deleting a deployment with v1meta.DeletePropagationForeground, the deployment
had its deletionTimestamp set and a foreground Deletion finalizer was added,
but the deployment, rs and pod were not deleted.

This is an issue with how the garbage collector graph_builder behaves when the
stopCh=nil.  This PR creates a dummy stop channel for the garbage collector controller (and other
controllers started by the controller-manager) so that they can work more like they do when
when the controller-manager is configured with --leader-elect=true.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #57044

**Special notes for your reviewer**:

**Release note**:

```release-note
Fix garbage collection when the controller-manager uses --leader-elect=false
```
@jmcmeek
Copy link
Contributor Author

jmcmeek commented Jan 3, 2018

@ironcladlou Will I need to go through cherry-pick process to get this into the release-1.8 and release-1.9 branches?

jmcmeek added a commit to jmcmeek/kubernetes that referenced this issue Jan 13, 2018
**What this PR does / why we need it**:

In a 1.8.x master with --leader-elect=false, the garbage collector controller
does not work.

When deleting a deployment with v1meta.DeletePropagationForeground, the deployment
had its deletionTimestamp set and a foreground Deletion finalizer was added,
but the deployment, rs and pod were not deleted.

This is an issue with how the garbage collector graph_builder behaves when the
stopCh=nil.  This PR creates a dummy stop channel for the garbage collector controller (and other
controllers started by the controller-manager) so that they can work more like they do when
when the controller-manager is configured with --leader-elect=true.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes#57044

**Special notes for your reviewer**:

**Release note**:
<!--  Write your release note:
1. Enter your extended release note in the below block. If the PR requires additional action from users switching to the new release, include the string "action required".
2. If no release note is required, just write "NONE".
-->
```release-note
Garbage collection doesn't work when the controller-manager uses --leader-elect=false

```
jmcmeek added a commit to jmcmeek/kubernetes that referenced this issue Jan 13, 2018
**What this PR does / why we need it**:

In a 1.8.x master with --leader-elect=false, the garbage collector controller
does not work.

When deleting a deployment with v1meta.DeletePropagationForeground, the deployment
had its deletionTimestamp set and a foreground Deletion finalizer was added,
but the deployment, rs and pod were not deleted.

This is an issue with how the garbage collector graph_builder behaves when the
stopCh=nil.  This PR creates a dummy stop channel for the garbage collector controller (and other
controllers started by the controller-manager) so that they can work more like they do when
when the controller-manager is configured with --leader-elect=true.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes#57044

**Special notes for your reviewer**:

**Release note**:
<!--  Write your release note:
1. Enter your extended release note in the below block. If the PR requires additional action from users switching to the new release, include the string "action required".
2. If no release note is required, just write "NONE".
-->
```release-note
Garbage collection doesn't work when the controller-manager uses --leader-elect=false

```
liggitt pushed a commit to liggitt/kubernetes that referenced this issue Jan 17, 2018
**What this PR does / why we need it**:

In a 1.8.x master with --leader-elect=false, the garbage collector controller
does not work.

When deleting a deployment with v1meta.DeletePropagationForeground, the deployment
had its deletionTimestamp set and a foreground Deletion finalizer was added,
but the deployment, rs and pod were not deleted.

This is an issue with how the garbage collector graph_builder behaves when the
stopCh=nil.  This PR creates a dummy stop channel for the garbage collector controller (and other
controllers started by the controller-manager) so that they can work more like they do when
when the controller-manager is configured with --leader-elect=true.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes#57044

**Special notes for your reviewer**:

**Release note**:
<!--  Write your release note:
1. Enter your extended release note in the below block. If the PR requires additional action from users switching to the new release, include the string "action required".
2. If no release note is required, just write "NONE".
-->
```release-note
Garbage collection doesn't work when the controller-manager uses --leader-elect=false

```
liggitt pushed a commit to liggitt/kubernetes that referenced this issue Jan 17, 2018
**What this PR does / why we need it**:

In a 1.8.x master with --leader-elect=false, the garbage collector controller
does not work.

When deleting a deployment with v1meta.DeletePropagationForeground, the deployment
had its deletionTimestamp set and a foreground Deletion finalizer was added,
but the deployment, rs and pod were not deleted.

This is an issue with how the garbage collector graph_builder behaves when the
stopCh=nil.  This PR creates a dummy stop channel for the garbage collector controller (and other
controllers started by the controller-manager) so that they can work more like they do when
when the controller-manager is configured with --leader-elect=true.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes#57044

**Special notes for your reviewer**:

**Release note**:
<!--  Write your release note:
1. Enter your extended release note in the below block. If the PR requires additional action from users switching to the new release, include the string "action required".
2. If no release note is required, just write "NONE".
-->
```release-note
Garbage collection doesn't work when the controller-manager uses --leader-elect=false

```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants