-
Notifications
You must be signed in to change notification settings - Fork 38.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement InstanceExistsByProviderID() for cloud providers #51409
Implement InstanceExistsByProviderID() for cloud providers #51409
Conversation
/test pull-kubernetes-federation-e2e-gce |
@wlan0 could you please take a look at this PR? I'm gonna be out for a few days |
@FengyunPan Have you tested all the cloudproviders? The code change looks good. |
@wlan0 No, not all, I have tested openstack cloud provider, I have not other cloud to test. |
return false, err | ||
} | ||
|
||
_, err = gce.getInstanceFromProjectInZoneByName(project, zone, name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we create an instanceByProviderID method? Then both InstanceExistsByProviderID and InstanceTypeByProviderID can both call that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, done.
The change at vsphere cloud provider looks good to me. |
@FengyunPan : I was just trying to figure out what is the need for InstanceExistsByProviderID() when kubernetes can call InstanceID() to know the status of a node? |
@BaluDontu That's no need. InstanceExistsByProviderID() just check instance is exist, no need to know the status of a node. I update it soon. Thank you. |
2858447
to
8dc384b
Compare
/test pull-kubernetes-e2e-gce-bazel |
1 similar comment
/test pull-kubernetes-e2e-gce-bazel |
return false, err | ||
} | ||
|
||
return true, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to consider the power state of the Node VM here? Node VM can be obtained from the Inventory but it can be in the powered off state. Powered off Node VM should be immediately deleted by the cloud controller manager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch! If it should be deleted, then that "Power" state should be checked for, and "false" should be returned if the NodeVM is "powered off". @FengyunPan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jingxu97 : Can you please comment on this. Should the node be deleted by controller manager if a node is powered off? Shouldn't it go into "NotReady" state rather than being deleted from node controller ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@divyenpatel @wlan0 @BaluDontu I do not think we should delete "Power off" state node immediately. In this function, we just check whether node is exist.
Please look this comment: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/cloud/node_controller.go#L230
@divyenpatel I know your issuse, we should consider "Power off" state node, but not here, we can not delete "Power off" state node immediately, it is unsafe.
And we have a discuss ( #46442), the discuss is still underway, please take a look and give some comments at there.
@FengyunPan : From vsphere side, the change looks good to us. Its an lgtm from our side. |
return false, nil | ||
} | ||
|
||
return true, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to check the running state.
Also, this duplicates logic in ExternalID
, so I propose one calls into the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thoughts sharing code looks complicated, but the logic shouldn't change IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thank you.
@justinsb I have checked the status of AWS instance. PTAL, thank you. |
@jdumars @brendandburns Please review from the Azure side. Thanks! |
/retest |
ping @jdumars |
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Fix kubernetes#51406 If cloud providers(like aws, gce etc...) implement ExternalID() and support getting instance by ProviderID , they also implement InstanceExistsByProviderID().
db3721b
to
462087f
Compare
Azure implementation looks okay to me. |
@FengyunPan yes /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cheftako, FengyunPan, wlan0 Associated issue: 51406 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
/retest Review the full test history for this PR. |
/test pull-kubernetes-unit |
/retest Review the full test history for this PR. |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue (batch tested with PRs 51409, 54616). If you want to cherry-pick this change to another branch, please follow the instructions here. |
@FengyunPan: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
It looks like this isn't working in my case and instances get deleted. The providerID looks correct and everything, yet every new node I add gets deleted right away without any error. The only thing the controller-manager logs is:
While in the events log I see (different node but same result):
While this is the node object before it got removed: https://gist.github.com/discordianfish/f6ba957db7cb8e2875a8bea0505a5097 I'm going to downgrade kubernetes to see if this fixes it. |
@discordianfish sounds serious for 1.9, can you please open a bug so we can track it better? |
@dims Turned out I just missed the KubernetesCluster tags but it took quite long to figure that out due to lack of documentation of the cloud provider itself. The UX is pretty bad, with the cloud intergration just deleting your nodes without giving any clue what might be going on. Would be great if the error returned the filters it used to search for matching instances. |
@discordianfish thanks for tracking it down. yes good news and bad news :) cc @justinsb |
…eExistsByProviderID Automatic merge from submit-queue (batch tested with PRs 51409, 54616). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Implement InstanceExistsByProviderID() for cloud providers Fix kubernetes#51406 If cloud providers(like aws, gce etc...) implement ExternalID() and support getting instance by ProviderID , they also implement InstanceExistsByProviderID(). /assign wlan0 /assign @luxas **Release note**: ```release-note NONE ```
Fix #51406
If cloud providers(like aws, gce etc...) implement ExternalID()
and support getting instance by ProviderID , they also implement
InstanceExistsByProviderID().
/assign wlan0
/assign @luxas
Release note: