Implement InstanceExistsByProviderID() for cloud providers #51409


FengyunPan

Fix #51406
For cloud providers (such as AWS and GCE) that implement ExternalID()
and support looking up an instance by ProviderID, also implement
InstanceExistsByProviderID().
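
For readers who have not seen the method, here is a minimal sketch of the contract the diff fragments in this thread suggest: `(true, nil)` when the instance is found, `(false, nil)` when the cloud reports it missing, and `(false, err)` for any other failure. The `fakeCloud` type, its `lookup` helper, the `errInstanceNotFound` sentinel and the `fake://` provider IDs are all hypothetical stand-ins, not code from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errInstanceNotFound is a hypothetical sentinel standing in for a
// provider-specific "not found" error.
var errInstanceNotFound = errors.New("instance not found")

// fakeCloud is a hypothetical in-memory provider keyed by provider ID.
type fakeCloud struct {
	instances map[string]string // provider ID -> instance name
}

// lookup is a hypothetical helper that mimics fetching an instance by provider ID.
func (c *fakeCloud) lookup(providerID string) (string, error) {
	name, ok := c.instances[providerID]
	if !ok {
		return "", errInstanceNotFound
	}
	return name, nil
}

// InstanceExistsByProviderID reports whether the instance exists.
// A "not found" lookup maps to (false, nil); other errors are propagated.
func (c *fakeCloud) InstanceExistsByProviderID(providerID string) (bool, error) {
	_, err := c.lookup(providerID)
	if errors.Is(err, errInstanceNotFound) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	cloud := &fakeCloud{instances: map[string]string{"fake://zone-a/node-1": "node-1"}}
	for _, id := range []string{"fake://zone-a/node-1", "fake://zone-a/node-2"} {
		exists, err := cloud.InstanceExistsByProviderID(id)
		fmt.Println(id, exists, err)
	}
}
```

Separating "missing" from "lookup failed" matters because a caller can only act safely on a definite answer.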

/assign wlan0
/assign @luxas

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 26, 2017
@k8s-github-robot k8s-github-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Aug 26, 2017
@luxas luxas added this to the v1.8 milestone Aug 26, 2017
@FengyunPan
Author

/test pull-kubernetes-federation-e2e-gce

@luxas
Member

luxas commented Aug 27, 2017

@wlan0 could you please take a look at this PR? I'm gonna be out for a few days

@wlan0
Member

wlan0 commented Aug 28, 2017

@FengyunPan Have you tested all the cloudproviders? The code change looks good.

@FengyunPan
Author

@wlan0 No, not all of them. I have tested the OpenStack cloud provider; I do not have access to other clouds to test on.
I am sorry about that. The code is very simple and easy to read, so I submitted it.

@wlan0
Member

wlan0 commented Aug 29, 2017

I'm ok with the code changes. @luxas are you ok with merging without testing all the cloudproviders?

In the meantime, @cheftako @jdumars @luomiao @justinsb can you review the implementation of InstanceExistsByProviderID for your clouds and let us know if the change LGTY?

Thanks!

@jdumars
Member

jdumars commented Aug 29, 2017

@brendandburns @lachie83 @seanknox PTAL

return false, err
}

_, err = gce.getInstanceFromProjectInZoneByName(project, zone, name)
Member

Can we create an instanceByProviderID method? Then InstanceExistsByProviderID and InstanceTypeByProviderID can both call that.

Author

Ok, done.
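
The refactoring requested above can be sketched roughly as follows. This is an illustration only: `fakeGCE`, `instance`, `splitProviderID` and `errNotFound` are hypothetical names, and the `gce://<project>/<zone>/<name>` provider ID layout is an assumption made for the example rather than something stated in this thread.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errNotFound is a hypothetical sentinel for a missing instance.
var errNotFound = errors.New("instance not found")

// instance is a hypothetical, trimmed-down instance record.
type instance struct {
	Name string
	Type string
}

// fakeGCE is a hypothetical provider; instances are keyed by "project/zone/name".
type fakeGCE struct {
	instances map[string]instance
}

// splitProviderID parses the assumed "gce://<project>/<zone>/<name>" form.
func splitProviderID(providerID string) (project, zone, name string, err error) {
	const prefix = "gce://"
	if !strings.HasPrefix(providerID, prefix) {
		return "", "", "", fmt.Errorf("unexpected provider ID %q", providerID)
	}
	parts := strings.Split(strings.TrimPrefix(providerID, prefix), "/")
	if len(parts) != 3 {
		return "", "", "", fmt.Errorf("unexpected provider ID %q", providerID)
	}
	return parts[0], parts[1], parts[2], nil
}

// instanceByProviderID is the single shared lookup used by both public methods.
func (g *fakeGCE) instanceByProviderID(providerID string) (*instance, error) {
	project, zone, name, err := splitProviderID(providerID)
	if err != nil {
		return nil, err
	}
	inst, ok := g.instances[project+"/"+zone+"/"+name]
	if !ok {
		return nil, errNotFound
	}
	return &inst, nil
}

// InstanceExistsByProviderID maps "not found" to (false, nil).
func (g *fakeGCE) InstanceExistsByProviderID(providerID string) (bool, error) {
	_, err := g.instanceByProviderID(providerID)
	if errors.Is(err, errNotFound) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}

// InstanceTypeByProviderID reuses the same lookup and returns the machine type.
func (g *fakeGCE) InstanceTypeByProviderID(providerID string) (string, error) {
	inst, err := g.instanceByProviderID(providerID)
	if err != nil {
		return "", err
	}
	return inst.Type, nil
}

func main() {
	g := &fakeGCE{instances: map[string]instance{
		"proj/zone-a/node-1": {Name: "node-1", Type: "n1-standard-1"},
	}}
	fmt.Println(g.InstanceExistsByProviderID("gce://proj/zone-a/node-1"))
	fmt.Println(g.InstanceTypeByProviderID("gce://proj/zone-a/node-1"))
}
```

Keeping parsing and lookup in one helper means a change to the provider ID format only has to be made in one place.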

@luomiao

luomiao commented Aug 29, 2017

The change at vsphere cloud provider looks good to me.
@BaluDontu @divyenpatel can you also please verify, since these functions were recently updated?

@BaluDontu
Contributor

@FengyunPan : I was just trying to figure out why InstanceExistsByProviderID() is needed when Kubernetes can call InstanceID() to know the status of a node.

@FengyunPan
Author

FengyunPan commented Aug 30, 2017

@BaluDontu There is no need for that. InstanceExistsByProviderID() only checks whether the instance exists; it does not need to know the status of a node. I will update it soon. Thank you.

@FengyunPan FengyunPan force-pushed the implement-InstanceExistsByProviderID branch from 2858447 to 8dc384b Compare August 30, 2017 02:28
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 30, 2017
@FengyunPan
Author

/test pull-kubernetes-e2e-gce-bazel

@wlan0
Member

wlan0 commented Aug 30, 2017

/test pull-kubernetes-e2e-gce-bazel

return false, err
}

return true, nil
Member

Do we want to consider the power state of the Node VM here? The Node VM can be obtained from the inventory even when it is in the powered-off state. A powered-off Node VM should be deleted immediately by the cloud controller manager.

Member

Great catch! If it should be deleted, then that "Power" state should be checked for, and "false" should be returned if the NodeVM is "powered off". @FengyunPan

Contributor

@BaluDontu BaluDontu Aug 30, 2017

@jingxu97 : Can you please comment on this? Should the node be deleted by the controller manager if it is powered off? Shouldn't it go into the "NotReady" state rather than being deleted by the node controller?

Author

@divyenpatel @wlan0 @BaluDontu I do not think we should delete a node in the "Power off" state immediately. In this function, we only check whether the node exists.
Please look at this comment: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/cloud/node_controller.go#L230
@divyenpatel I understand your issue. We should consider nodes in the "Power off" state, but not here; we cannot delete a "Power off" node immediately, because that is unsafe.
There is also a discussion (#46442) that is still underway; please take a look and leave some comments there.
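
To make the stakes of this thread concrete, here is a hedged sketch of how a caller might act on the result. The event quoted later in this conversation ("Deleting Node ... because it's not present according to cloud provider") indicates that a `false` answer leads to node deletion, which is why returning `false` for a powered-off VM is contentious. The `decide` function, the `action` values and `stubCloud` are hypothetical and do not reproduce the actual node controller code.

```go
package main

import (
	"errors"
	"fmt"
)

// existenceChecker is the one method of the provider this sketch depends on.
type existenceChecker interface {
	InstanceExistsByProviderID(providerID string) (bool, error)
}

// action is a hypothetical label for what the caller does next.
type action string

const (
	actionKeep   action = "keep"
	actionDelete action = "delete"
	actionRetry  action = "retry"
)

// decide maps the provider's answer to an action: errors are never treated
// as proof of absence, and only a definite "false" leads to deletion.
func decide(cloud existenceChecker, providerID string) action {
	exists, err := cloud.InstanceExistsByProviderID(providerID)
	if err != nil {
		return actionRetry
	}
	if !exists {
		return actionDelete
	}
	return actionKeep
}

// stubCloud is a hypothetical provider used only to exercise decide.
type stubCloud struct {
	known map[string]bool
	fail  bool
}

func (s stubCloud) InstanceExistsByProviderID(providerID string) (bool, error) {
	if s.fail {
		return false, errors.New("cloud API unavailable")
	}
	return s.known[providerID], nil
}

func main() {
	cloud := stubCloud{known: map[string]bool{"fake://node-1": true}}
	fmt.Println(decide(cloud, "fake://node-1"))
	fmt.Println(decide(cloud, "fake://node-2"))
	fmt.Println(decide(stubCloud{fail: true}, "fake://node-1"))
}
```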

@luxas luxas added area/cloudprovider sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Sep 1, 2017
@BaluDontu
Contributor

@FengyunPan : From the vSphere side, the change looks good to us. It's an LGTM from our side.

@FengyunPan
Author

@cheftako @jdumars @justinsb @brendandburns @lachie83 @seanknox PTAL
Any thoughts on this PR?

return false, nil
}

return true, nil
Member

I think you need to check the running state.

Also, this duplicates logic in ExternalID, so I propose one calls into the other.

Member

On second thought, sharing code looks complicated, but the logic shouldn't change, IMO.

Author

Done, thank you.
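
One way to read the request above is that an instance which is known to the cloud but not running should be reported as not existing. The sketch below illustrates that reading with hypothetical types (`fakeAWS`, `instanceState`) and illustrative state names; it is not the code that was merged.

```go
package main

import "fmt"

// instanceState is a hypothetical stand-in for a provider's lifecycle state.
type instanceState string

const (
	stateRunning instanceState = "running"
	stateStopped instanceState = "stopped"
)

// fakeAWS is a hypothetical provider mapping instance IDs to their state.
type fakeAWS struct {
	states map[string]instanceState
}

// InstanceExistsByProviderID returns true only for instances that are both
// known to the provider and in the running state.
func (a *fakeAWS) InstanceExistsByProviderID(instanceID string) (bool, error) {
	state, found := a.states[instanceID]
	if !found {
		// Unknown to the cloud: definitely absent.
		return false, nil
	}
	if state != stateRunning {
		// Known but not running: treated as absent under this reading.
		fmt.Printf("instance %s is %s, not running\n", instanceID, state)
		return false, nil
	}
	return true, nil
}

func main() {
	aws := &fakeAWS{states: map[string]instanceState{
		"i-running": stateRunning,
		"i-stopped": stateStopped,
	}}
	for _, id := range []string{"i-running", "i-stopped", "i-unknown"} {
		exists, err := aws.InstanceExistsByProviderID(id)
		fmt.Println(id, exists, err)
	}
}
```

Whether a stopped instance should count as absent is exactly the question raised in the vSphere thread, so the state check is a policy choice rather than a given.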

@FengyunPan
Author

@justinsb I have added a check of the AWS instance's status. PTAL, thank you.
@BaluDontu @divyenpatel I have added a check of the vSphere instance's status. PTAL, thank you.

@luxas
Member

luxas commented Oct 9, 2017

@jdumars @brendandburns Please review from the Azure side. Thanks!

@FengyunPan
Author

/retest

@FengyunPan
Author

ping @jdumars
ping @brendandburns
ping @itowlson
PTAL, thank you.

@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 20, 2017
@FengyunPan FengyunPan force-pushed the implement-InstanceExistsByProviderID branch from db3721b to 462087f Compare October 20, 2017 07:03
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Oct 20, 2017
@itowlson
Contributor

Azure implementation looks okay to me.

@FengyunPan
Author

ping @luxas @wlan0
Is it ok to approve?

@wlan0
Member

wlan0 commented Oct 26, 2017

@FengyunPan yes

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 26, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheftako, FengyunPan, wlan0

Associated issue: 51406

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 26, 2017
@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@FengyunPan
Author

/test pull-kubernetes-unit

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 51409, 54616). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 55e49ed into kubernetes:master Oct 27, 2017
@k8s-ci-robot
Contributor

k8s-ci-robot commented Oct 27, 2017

@FengyunPan: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-unit 462087f link /test pull-kubernetes-unit

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


@discordianfish
Contributor

It looks like this isn't working in my case and instances get deleted. The providerID looks correct and everything, yet every new node I add gets deleted right away without any error. The only thing the controller-manager logs is:

E1204 10:55:40.596096       1 actual_state_of_world.go:483] Failed to set statusUpdateNeeded to needed true because nodeName="ip-172-20-58-74.ec2.internal"  does not exist
E1204 10:55:40.596332       1 actual_state_of_world.go:497] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true because nodeName="ip-172-20-58-74.ec2.internal"  does not exist

While in the events log I see (different node but same result):

24s         24s          1         ip-172-20-79-61.ec2.internal.14fd183319cda415     Node                  Normal    DeletingNode              controllermanager                         Node ip-172-20-79-61.ec2.internal event: Deleting Node ip-172-20-79-61.ec2.internal because it's not present according to cloud provider

While this is the node object before it got removed: https://gist.github.com/discordianfish/f6ba957db7cb8e2875a8bea0505a5097

I'm going to downgrade kubernetes to see if this fixes it.

@dims
Member

dims commented Dec 4, 2017

@discordianfish sounds serious for 1.9, can you please open a bug so we can track it better?

@discordianfish
Contributor

@dims It turned out I had just missed the KubernetesCluster tags, but it took quite a long time to figure that out because the cloud provider itself is poorly documented. The UX is pretty bad: the cloud integration just deletes your nodes without giving any clue about what might be going on. It would be great if the error returned the filters it used to search for matching instances.

@dims
Member

dims commented Dec 4, 2017

@discordianfish thanks for tracking it down. yes good news and bad news :)

cc @justinsb

dims pushed a commit to dims/kubernetes that referenced this pull request Feb 8, 2018