Implement InstanceExistsByProviderID() for cloud providers #51409

FengyunPan · 2017-08-26T15:32:57Z

Fix #51406
For cloud providers (such as AWS and GCE) that implement ExternalID()
and support getting an instance by ProviderID, this PR also implements
InstanceExistsByProviderID().

/assign wlan0
/assign @luxas

Release note:

NONE

FengyunPan · 2017-08-27T05:54:30Z

/test pull-kubernetes-federation-e2e-gce

luxas · 2017-08-27T15:17:04Z

@wlan0 could you please take a look at this PR? I'm gonna be out for a few days

wlan0 · 2017-08-28T21:55:05Z

@FengyunPan Have you tested all the cloudproviders? The code change looks good.

FengyunPan · 2017-08-29T01:09:48Z

@wlan0 No, not all. I have tested the OpenStack cloud provider, but I don't have any other cloud to test on.
Sorry about that. The code is very simple and easy to read, so I submitted it.

wlan0 · 2017-08-29T01:38:14Z

I'm ok with the code changes. @luxas are you ok with merging without testing all the cloudproviders?

In the meantime, @cheftako @jdumars @luomiao @justinsb can you review the implementation of InstanceExistsByProviderID for your clouds and let us know if the change LGTY?

Thanks!

jdumars · 2017-08-29T13:57:45Z

@brendandburns @lachie83 @seanknox PTAL

cheftako · 2017-08-29T22:15:06Z

pkg/cloudprovider/providers/gce/gce_instances.go

+		return false, err
+	}
+
+	_, err = gce.getInstanceFromProjectInZoneByName(project, zone, name)


Can we create an instanceByProviderID method? Then InstanceExistsByProviderID and InstanceTypeByProviderID can both call it.
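
A minimal sketch of the refactor suggested here, not the code that was merged. It assumes a provider ID of the form `gce://<project>/<zone>/<name>` and uses an in-memory `fakeCloud` in place of the real GCE client; `fakeCloud`, `instance`, `errInstanceNotFound`, and `providerIDRE` are all hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// errInstanceNotFound is a hypothetical sentinel meaning "no such instance".
var errInstanceNotFound = errors.New("instance not found")

// providerIDRE matches the assumed form gce://<project>/<zone>/<name>.
var providerIDRE = regexp.MustCompile(`^gce://([^/]+)/([^/]+)/([^/]+)$`)

// instance is a stand-in for the provider's instance record.
type instance struct {
	Name        string
	MachineType string
}

// fakeCloud keeps instances in memory, keyed by "project/zone/name".
type fakeCloud struct {
	instances map[string]instance
}

// instanceByProviderID is the single shared lookup that both public
// methods below call, so the parsing and lookup live in one place.
func (c *fakeCloud) instanceByProviderID(providerID string) (*instance, error) {
	m := providerIDRE.FindStringSubmatch(providerID)
	if m == nil {
		return nil, fmt.Errorf("malformed provider ID %q", providerID)
	}
	inst, ok := c.instances[m[1]+"/"+m[2]+"/"+m[3]]
	if !ok {
		return nil, errInstanceNotFound
	}
	return &inst, nil
}

// InstanceExistsByProviderID maps "not found" to (false, nil) and
// passes every other error through to the caller.
func (c *fakeCloud) InstanceExistsByProviderID(providerID string) (bool, error) {
	_, err := c.instanceByProviderID(providerID)
	if errors.Is(err, errInstanceNotFound) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}

// InstanceTypeByProviderID reuses the same lookup and returns the type.
func (c *fakeCloud) InstanceTypeByProviderID(providerID string) (string, error) {
	inst, err := c.instanceByProviderID(providerID)
	if err != nil {
		return "", err
	}
	return inst.MachineType, nil
}

func main() {
	c := &fakeCloud{instances: map[string]instance{
		"proj/zone-a/node-1": {Name: "node-1", MachineType: "n1-standard-1"},
	}}
	exists, err := c.InstanceExistsByProviderID("gce://proj/zone-a/node-1")
	fmt.Println(exists, err)
	machineType, err := c.InstanceTypeByProviderID("gce://proj/zone-a/node-1")
	fmt.Println(machineType, err)
}
```

With the lookup shared, the two public methods differ only in how they translate the result: the existence check turns "not found" into `false, nil`, while the type lookup reports it as an error.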

luomiao · 2017-08-29T22:29:14Z

The change at vsphere cloud provider looks good to me.
@BaluDontu @divyenpatel can you also please verify, since these functions were recently updated?

BaluDontu · 2017-08-29T23:12:14Z

@FengyunPan : I was just trying to figure out what is the need for InstanceExistsByProviderID() when kubernetes can call InstanceID() to know the status of a node?

FengyunPan · 2017-08-30T02:15:56Z

@BaluDontu There is no need for that. InstanceExistsByProviderID() only checks whether the instance exists; it does not need to know the status of a node. I will update it soon. Thank you.

FengyunPan · 2017-08-30T06:12:41Z

/test pull-kubernetes-e2e-gce-bazel

wlan0 · 2017-08-30T14:16:48Z

/test pull-kubernetes-e2e-gce-bazel

divyenpatel · 2017-08-30T16:25:37Z

pkg/cloudprovider/providers/vsphere/vsphere.go

+		return false, err
+	}
+
+	return true, nil


Do we want to consider the power state of the Node VM here? The Node VM can be obtained from the Inventory but still be in the powered-off state. A powered-off Node VM should be immediately deleted by the cloud controller manager.

wlan0 · 2017-08-30

Great catch! If it should be deleted, then the "Power" state should be checked, and "false" should be returned if the Node VM is "powered off". @FengyunPan

BaluDontu · 2017-08-30

@jingxu97: Can you please comment on this? Should the node be deleted by the controller manager if it is powered off? Shouldn't it go into the "NotReady" state rather than being deleted by the node controller?

FengyunPan · 2017-08-31

@divyenpatel @wlan0 @BaluDontu I do not think we should delete a "powered off" node immediately. In this function, we just check whether the node exists.
Please look at this comment: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/cloud/node_controller.go#L230
@divyenpatel I understand your concern. We should consider nodes in the "powered off" state, but not here. We cannot delete a "powered off" node immediately; it is unsafe.
There is also an ongoing discussion (#46442); please take a look and leave your comments there.
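
One way to picture the position argued in this thread is to keep "does the VM exist?" and "is the VM powered on?" as separate questions, so that a powered-off node is marked NotReady instead of deleted. The sketch below is only an illustration of that separation under stated assumptions: `inventory`, `powerState`, `nodeAction`, and `decide` are hypothetical names, and this is not the vSphere provider's or the node controller's actual code.

```go
package main

import "fmt"

// powerState and its values are illustrative, not a real SDK type.
type powerState string

const (
	poweredOn  powerState = "poweredOn"
	poweredOff powerState = "poweredOff"
)

// nodeAction is what a hypothetical node controller would do next.
type nodeAction string

const (
	keepNode     nodeAction = "keep"
	markNotReady nodeAction = "mark NotReady"
	deleteNode   nodeAction = "delete"
)

// inventory maps VM name to power state; a missing key means the VM is gone.
type inventory map[string]powerState

// instanceExists answers only "is the VM in the inventory?".
// It deliberately ignores the power state.
func (inv inventory) instanceExists(name string) bool {
	_, ok := inv[name]
	return ok
}

// decide treats existence and power state as separate questions:
// only a VM that is absent from the inventory leads to node deletion.
func decide(inv inventory, name string) nodeAction {
	if !inv.instanceExists(name) {
		return deleteNode
	}
	if inv[name] != poweredOn {
		return markNotReady
	}
	return keepNode
}

func main() {
	inv := inventory{"node-a": poweredOn, "node-b": poweredOff}
	for _, name := range []string{"node-a", "node-b", "node-c"} {
		fmt.Printf("%s: %s\n", name, decide(inv, name))
	}
}
```

Folding the power-state check into `instanceExists` would instead give the behavior proposed at the start of the thread, where a powered-off VM is reported as nonexistent and its node is deleted.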

BaluDontu · 2017-09-01T21:49:47Z

@FengyunPan: From the vSphere side, the change looks good to us. It's an LGTM from our side.

FengyunPan · 2017-09-04T11:25:31Z

@cheftako @jdumars @justinsb @brendandburns @lachie83 @seanknox PTAL
Any thoughts on this PR?

justinsb · 2017-09-04T16:56:40Z

pkg/cloudprovider/providers/aws/aws.go

+		return false, nil
+	}
+
+	return true, nil


I think you need to check the running state.

Also, this duplicates logic in ExternalID, so I propose one calls into the other.

On second thought, sharing code looks complicated, but the logic shouldn't change IMO.

FengyunPan · 2017-10-09

Done, thank you.

FengyunPan · 2017-10-09T12:46:29Z

@justinsb I have checked the status of AWS instance. PTAL, thank you.
@BaluDontu @divyenpatel I have checked the status of VSphere instance, PTAL, thank you.

luxas · 2017-10-09T18:21:29Z

@jdumars @brendandburns Please review from the Azure side. Thanks!

FengyunPan · 2017-10-10T01:20:33Z

/retest

FengyunPan · 2017-10-20T06:26:12Z

ping @jdumars
ping @brendandburns
ping @itowlson
PTAL, thank you.

k8s-ci-robot · 2017-10-20T06:26:14Z

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
If you have done the above and are still having issues with the CLA being reported as unsigned, please email the CNCF helpdesk: helpdesk@rt.linuxfoundation.org

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

itowlson · 2017-10-20T16:07:11Z

Azure implementation looks okay to me.

FengyunPan · 2017-10-26T13:15:26Z

ping @luxas @wlan0
Is it ok to approve?

wlan0 · 2017-10-26T19:18:04Z

@FengyunPan yes

/lgtm
/approve

k8s-github-robot · 2017-10-26T19:18:37Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheftako, FengyunPan, wlan0

Associated issue: 51406

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

~~pkg/cloudprovider/providers/aws/OWNERS~~ [wlan0]
~~pkg/cloudprovider/providers/azure/OWNERS~~ [wlan0]
~~pkg/cloudprovider/providers/gce/OWNERS~~ [wlan0]
~~pkg/cloudprovider/providers/openstack/OWNERS~~ [FengyunPan,wlan0]
~~pkg/cloudprovider/providers/vsphere/OWNERS~~ [wlan0]

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

fejta-bot · 2017-10-26T22:20:03Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

FengyunPan · 2017-10-27T09:02:34Z

/test pull-kubernetes-unit

fejta-bot · 2017-10-27T11:45:06Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

k8s-github-robot · 2017-10-27T13:14:21Z

/test all [submit-queue is verifying that this PR is safe to merge]

k8s-github-robot · 2017-10-27T13:16:20Z

Automatic merge from submit-queue (batch tested with PRs 51409, 54616). If you want to cherry-pick this change to another branch, please follow the instructions here.

k8s-ci-robot · 2017-10-27T13:38:28Z

@FengyunPan: The following test failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-unit | `462087f` | link | `/test pull-kubernetes-unit` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

discordianfish · 2017-12-04T13:10:19Z

It looks like this isn't working in my case and instances get deleted. The providerID looks correct and everything, yet every new node I add gets deleted right away without any error. The only thing the controller-manager logs is:

E1204 10:55:40.596096       1 actual_state_of_world.go:483] Failed to set statusUpdateNeeded to needed true because nodeName="ip-172-20-58-74.ec2.internal"  does not exist
E1204 10:55:40.596332       1 actual_state_of_world.go:497] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true because nodeName="ip-172-20-58-74.ec2.internal"  does not exist

While in the events log I see (different node but same result):

24s         24s          1         ip-172-20-79-61.ec2.internal.14fd183319cda415     Node                  Normal    DeletingNode              controllermanager                         Node ip-172-20-79-61.ec2.internal event: Deleting Node ip-172-20-79-61.ec2.internal because it's not present according to cloud provider

While this is the node object before it got removed: https://gist.github.com/discordianfish/f6ba957db7cb8e2875a8bea0505a5097

I'm going to downgrade kubernetes to see if this fixes it.

dims · 2017-12-04T13:18:48Z

@discordianfish sounds serious for 1.9, can you please open a bug so we can track it better?

discordianfish · 2017-12-04T14:51:09Z

@dims It turned out I had just missed the KubernetesCluster tags, but it took quite a while to figure that out because the cloud provider itself is poorly documented. The UX is pretty bad: the cloud integration just deletes your nodes without giving any clue about what might be going on. It would be great if the error returned the filters it used to search for matching instances.
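
A rough sketch of what such an error could look like, assuming the lookup is driven by a map of filter names to values. `instanceNotFoundError` and its fields are hypothetical, and the filter names and values in `main` are illustrative only; this is not the AWS provider's actual error type.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// instanceNotFoundError is a hypothetical error type that records
// which filters were used when no matching instance was found.
type instanceNotFoundError struct {
	InstanceID string
	Filters    map[string]string
}

// Error lists the filters in sorted order so the message is stable.
func (e *instanceNotFoundError) Error() string {
	keys := make([]string, 0, len(e.Filters))
	for k := range e.Filters {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+e.Filters[k])
	}
	return fmt.Sprintf("instance %q not found using filters: %s",
		e.InstanceID, strings.Join(parts, ", "))
}

func main() {
	err := &instanceNotFoundError{
		InstanceID: "i-0abc",
		Filters: map[string]string{
			"tag:KubernetesCluster": "my-cluster",
			"instance-state-name":   "running",
		},
	}
	fmt.Println(err)
}
```

An operator who sees the cluster tag filter in the message has a direct hint that a missing tag, and not a missing instance, may be the reason the node was considered absent.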

dims · 2017-12-04T16:35:16Z

@discordianfish thanks for tracking it down. yes good news and bad news :)

cc @justinsb

@luxas

k8s-ci-robot assigned luxas and wlan0 Aug 26, 2017

k8s-ci-robot added the size/M and cncf-cla: yes labels Aug 26, 2017

k8s-github-robot added the release-note-none label Aug 26, 2017

luxas added this to the v1.8 milestone Aug 26, 2017

cheftako reviewed Aug 29, 2017

FengyunPan force-pushed the implement-InstanceExistsByProviderID branch from 2858447 to 8dc384b Compare August 30, 2017 02:28

k8s-ci-robot added the size/L label and removed the size/M label Aug 30, 2017

divyenpatel reviewed Aug 30, 2017

luxas added the area/cloudprovider and sig/cluster-lifecycle labels Sep 1, 2017

calebamiles modified the milestone: v1.8 Sep 2, 2017

wlan0 mentioned this pull request Sep 3, 2017

Support out-of-process and out-of-tree cloud providers kubernetes/enhancements#88

Closed

justinsb reviewed Sep 4, 2017

k8s-ci-robot added the cncf-cla: no label and removed the cncf-cla: yes label Oct 20, 2017

Implement InstanceExistsByProviderID() for cloud providers

462087f

FengyunPan force-pushed the implement-InstanceExistsByProviderID branch from db3721b to 462087f Compare October 20, 2017 07:03

k8s-ci-robot added the cncf-cla: yes label and removed the cncf-cla: no label Oct 20, 2017

k8s-ci-robot added the lgtm label Oct 26, 2017

k8s-github-robot added the approved label Oct 26, 2017

k8s-github-robot merged commit 55e49ed into kubernetes:master Oct 27, 2017

sjenning mentioned this pull request Jul 31, 2018

cloudprovider: aws: return true on existence check for stopped instances #66835

Merged
