Nodes are not removed after deleting VMs. #72499
/assign andrewsykim |
Could be we need to update GKE's configuration for 1.14. I'll take a look. |
GKE still sets --cloud-provider=gce on kube-controller-manager, as far as I can tell, which means LoopMode should still be set to IncludeCloudLoops, and the cloud-specific controller @andrewsykim added in #70344 should be running, so my first guess (that it was just turned off) doesn't appear to be correct. @krzysztof-jastrzebski can you check whether your controller-manager logs contain any messages like "failed to start cloud node lifecycle controller"? |
I haven't had a chance to test the changes I merged end-to-end yet, since I was taking some time off for the holidays, but I will check whether this is reproducible on other cloud providers to gather more data. Thanks for reporting @krzysztof-jastrzebski! |
I checked the logs and I don't see any error containing the string "lifecycle". The flag is set to --cloud-provider=gce. |
I was able to reproduce this with an out-of-tree provider as well. Will dig further and report back. |
@mtaufen seems like when nodes are deleted, the status of the Ready condition is actually
In #70344 we added a check to skip deleting nodes with |
@krzysztof-jastrzebski can you confirm if the Ready condition on your node is also |
Opened #72559 (validated with CCM on master), which would restore the node deletion logic to what we had prior to #70344. I'm not sure if expecting the Ready condition to be |
@andrewsykim I can confirm the Ready condition is Unknown. |
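To illustrate the behavior discussed above, here is a minimal Go sketch of how a deletion check that only accepts one specific Ready status can skip a node whose Ready condition is Unknown. It is not the code from #70344 or #72559: the types and the functions `strictShouldDelete` and `relaxedShouldDelete` are hypothetical stand-ins, and the relaxed variant is only an assumption about what a less restrictive check could look like.

```go
package main

import "fmt"

// ConditionStatus mirrors the three values a node condition can report.
// It is a local stand-in, not the type from the Kubernetes API packages.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

// strictShouldDelete (hypothetical) deletes a node only when its Ready
// condition is exactly False and the backing VM no longer exists.
// A node whose Ready condition is Unknown is never deleted.
func strictShouldDelete(ready ConditionStatus, vmExists bool) bool {
	return ready == ConditionFalse && !vmExists
}

// relaxedShouldDelete (hypothetical) deletes a node whenever its Ready
// condition is anything other than True and the backing VM no longer exists.
func relaxedShouldDelete(ready ConditionStatus, vmExists bool) bool {
	return ready != ConditionTrue && !vmExists
}

func main() {
	// The VM has been deleted in every case below.
	for _, status := range []ConditionStatus{ConditionTrue, ConditionFalse, ConditionUnknown} {
		fmt.Printf("Ready=%s strict=%t relaxed=%t\n",
			status,
			strictShouldDelete(status, false),
			relaxedShouldDelete(status, false))
	}
}
```

For a node whose VM is gone and whose Ready condition is Unknown, the strict check returns false, so the node stays, while the relaxed check returns true.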
@krzysztof-jastrzebski my PR merged to master, can you please test on latest master when you have a chance? |
@andrewsykim It still doesn't work. |
@krzysztof-jastrzebski what's the server version? |
I'm using version 1.14.0-alpha.0.1475+fdf381098bd3e8-kjastrzebski-07-01-19-2, built from HEAD today (fdf3810). |
Are you able to confirm if the kube-controller-manager version is the same? (Sorry, I'm not super familiar with how GKE is set up.) |
Yes, version is the same. You can download my version from: |
Thanks @krzysztof-jastrzebski. I tested this version on an out-of-tree cloud provider and it works as expected (they run the same controller). I'll try to get a GKE environment set up to debug this further (might take a few days). If you have any new logs (specifically from |
I checked the logs, and now the controller tries to delete the pod, but there is an error: |
This makes sense because we didn't update bootstrap RBAC rules for |
Should be fixed in #72764. |
@krzysztof-jastrzebski #72764 merged, are you able to test it one more time please? :) |
Now it works. |
Thank you for testing @krzysztof-jastrzebski! /close |
@andrewsykim: Closing this issue. |
What happened:
Nodes are not removed after deleting VMs.
What you expected to happen:
Nodes should be deleted.
How to reproduce it (as minimally and precisely as possible):
Create a cluster with 5 nodes using a HEAD build. Remove 4 nodes. List the nodes. There will be 4 NotReady nodes and 1 Ready node. The NotReady nodes had still not been removed after 10 minutes.
Anything else we need to know?:
The bug might be caused by #70344.
Environment:
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T21:04:45Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.0-alpha.0.1352+a7cb03f4cfbf3b", GitCommit:"a7cb03f4cfbf3b519dc1a0090331a475abbe0321", GitTreeState:"clean", BuildDate:"2019-01-02T19:29:04Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"linux/amd64"}
/kind bug