
Cloud Controller Manager doesn't query cloud provider for node name, causing the node to be removed #70897

Open
yifan-gu opened this issue Nov 10, 2018 · 40 comments
Labels
area/cloudprovider area/provider/aws Issues or PRs related to aws provider area/provider/openstack Issues or PRs related to openstack provider kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider.

Comments

@yifan-gu
Contributor

yifan-gu commented Nov 10, 2018

What happened:

  • Launch a node with Container Linux CoreOS-stable-1911.3.0 on AWS, with customized ignition configs.
  • The hostname turns out to be ip-10-3-18-1 instead of the full private DNS name ip-10-3-18-1.us-west-1.compute.internal, because /etc/hostname is not set.
  • kubelet starts with --cloud-provider=external and skips this code path, so it is not able to set the private DNS name as the node name.
  • When the CCM starts, it tries to read the node name from the node spec.
  • The CCM calls GetInstanceProviderID() and fails.
  • The CCM calls getNodeAddressesByProviderIDOrName(), which also fails.
  • The CCM then removes the node.

What you expected to happen:

Now that the kubelet runs with --cloud-provider=external, nothing executes that code path to query the cloud provider for the node name anymore.
However, something still needs to execute that code path so the node gets its correct node name from the cloud provider.
I think the CCM might need to query the cloud provider for the full node hostname in case the hostname reported by the kubelet is not the full hostname (as in the AWS case).

How to reproduce it (as minimally and precisely as possible):
Launch a Container Linux instance with a non-empty ignition config in the user data; the hostname will then not be the full private DNS name.
Then launch kubelet with --cloud-provider=external, and the CCM will reproduce the issue described above.
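
A quick way to confirm the mismatch on the affected node (a sketch, assuming the EC2 instance metadata service is reachable at the usual address):

```
# Short hostname, because /etc/hostname was not populated
hostname
# => ip-10-3-18-1

# Full private DNS name, which is what the in-tree AWS provider would have used as the node name
curl -s http://169.254.169.254/latest/meta-data/hostname
# => ip-10-3-18-1.us-west-1.compute.internal
```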

Anything else we need to know?:

Environment:

  • Kubernetes version:
    Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:08:19Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    AWS
  • OS:
    NAME="Container Linux by CoreOS"
    ID=coreos
    VERSION=1911.3.0
    VERSION_ID=1911.3.0
    BUILD_ID=2018-11-05-1815
    PRETTY_NAME="Container Linux by CoreOS 1911.3.0 (Rhyolite)"
    ANSI_COLOR="38;5;75"
    HOME_URL="https://coreos.com/"
    BUG_REPORT_URL="https://issues.coreos.com"
    COREOS_BOARD="amd64-usr"
  • Kernel:
    Linux ip-10-3-20-13 4.14.78-coreos #1 SMP Mon Nov 5 17:42:07 UTC 2018 x86_64 Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz GenuineIntel GNU/Linux
  • Install tools:
    Internal k8s installer tool based on terraform
  • Others:

This issue can be mitigated by having the ignition config set /etc/hostname to the private DNS name (obtained via curl http://169.254.169.254/latest/meta-data/hostname), or by using the coreos-metadata service.
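
For illustration, a minimal sketch of that workaround (run before kubelet starts; the exact ignition/systemd wiring is up to the installer):

```
# Sketch only: set the hostname to the EC2 private DNS name so the node name matches
# what the in-tree AWS provider would have reported.
hostnamectl set-hostname "$(curl -s http://169.254.169.254/latest/meta-data/hostname)"
```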

/cc @Quentin-M

@kubernetes/sig-aws-misc
@andrewsykim

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Nov 10, 2018
@k8s-ci-robot
Contributor

@yifan-gu: There are no sig labels on this issue. Please add a sig label by either:

  1. mentioning a sig: @kubernetes/sig-<group-name>-<group-suffix>
    e.g., @kubernetes/sig-contributor-experience-<group-suffix> to notify the contributor experience sig, OR

  2. specifying the label manually: /sig <group-name>
    e.g., /sig scalability to apply the sig/scalability label

Note: Method 1 will trigger an email to the group. See the group list.
The <group-suffix> in method 1 has to be replaced with one of these: bugs, feature-requests, pr-reviews, test-failures, proposals.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Nov 10, 2018
@yifan-gu yifan-gu added sig/aws and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 10, 2018
@liggitt liggitt added sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. area/cloudprovider labels Nov 11, 2018
@soggiest

+1 I've also observed this happening if the hostname isn't set properly.

@yifan-gu
Contributor Author

yifan-gu commented Dec 4, 2018

Anyone looking into this?

@andrewsykim
Member

Hi @yifan-gu. It's expected that the CCM is able to find a node either by its name or by its provider ID. Unfortunately, it'll be hard to break this assumption. If the hostname does not match the node name, then we have to expect the user to override the hostname with --hostname-override or use --provider-id.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 4, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 3, 2019
@frittentheke

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 4, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 3, 2019
@frittentheke

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 5, 2019
@andrewsykim
Member

Looks like you will need to set --provider-id on the kubelet for cases like this?

cc @cheftako @mcrute

@k8s-ci-robot k8s-ci-robot added area/provider/aws Issues or PRs related to aws provider and removed sig/aws labels Aug 6, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 4, 2019
@cheftako
Member

cheftako commented Nov 4, 2019

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 4, 2019
@cheftako
Member

cheftako commented Nov 4, 2019

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Nov 4, 2019
@ehashman
Member

This remains an issue on the latest k8s. The updated code path in kubelet:

```go
if cloud == nil {
	return types.NodeName(hostname), nil
}
```

I think this also poses a problem for the out-of-tree migration. It seems to me that e.g. the OpenStack legacy cloud provider doesn't guarantee that the node name and hostname match:

```go
// CurrentNodeName implements Instances.CurrentNodeName
// Note this is *not* necessarily the same as hostname.
func (i *Instances) CurrentNodeName(ctx context.Context, hostname string) (types.NodeName, error) {
	md, err := getMetadata(i.opts.SearchOrder)
	if err != nil {
		return "", err
	}
	return types.NodeName(md.Name), nil
}
```

So, if one tries to upgrade a node from --cloud-provider=openstack to --cloud-provider=external, the old node name (not necessarily the hostname) may not match the new node name (the hostname), which will cause the kubelet to be unable to find its node.

From openshift/machine-config-operator#2401 (comment) it sounds like this also affects VMware and AWS depending on configuration. (perhaps @nckturner can confirm impact on AWS)

Wanted to check if this is on the external cloud provider migration radar and if there is a blessed migration path?

@mdbooth
Contributor

mdbooth commented Jul 16, 2021

This issue affects both the AWS and OpenStack in-tree cloud providers, both of which return instance.Name in CurrentNodeName(), which is not necessarily the same as kubelet's default of the FQDN hostname.

We can work around this by setting the hostname to whatever the in-tree cloud provider previously returned. However, this feels like a kludge because:

  • We're changing the hostname of an existing host, which may have other consequences, some of which may be beyond our ability to predict if they involve, e.g., third-party drivers.
  • It makes AWS and OpenStack configuration snowflakes.
  • It perpetuates the problem beyond the use of the in-tree provider which originally caused it.
  • We're not solving the problem, just deferring it indefinitely.

An idea I've seen is to request it from the CCM. However, that may have a bootstrapping problem: kubelet communicates with the CCM via the API, so it would have to identify itself somehow in order to retrieve the correct information.

I wonder if a simpler idea might be to persist kubelet's node name as local state, e.g. in /etc/kubernetes/node. We could initialise that value appropriately if it doesn't exist, then pass its contents to kubelet via --hostname-override subsequently. We could ensure for upgrade that it is initialised with the current node name for existing nodes, but for all future nodes it is initialised with hostname. This would allow us to eventually remove the upgrade logic, and would also insulate us from cloud configuration changes which affect hostname (although nobody should be doing that anyway).

Or, even simpler, launch kubelet with --hostname-override iff /etc/kubernetes/node exists. Then we can create this file only on AWS/OpenStack nodes which previously ran the in-tree cloud provider.
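
A rough sketch of that wrapper logic, assuming /etc/kubernetes/node is the hypothetical local state file holding the node name the kubelet previously registered with:

```
# Sketch only: pass --hostname-override iff the (hypothetical) state file exists.
HOSTNAME_OVERRIDE=""
if [ -f /etc/kubernetes/node ]; then
  HOSTNAME_OVERRIDE="--hostname-override=$(cat /etc/kubernetes/node)"
fi
exec kubelet --cloud-provider=external ${HOSTNAME_OVERRIDE} "$@"
```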

@ehashman
Member

/area provider/openstack

Please add additional ones as affected :)

@k8s-ci-robot k8s-ci-robot added the area/provider/openstack Issues or PRs related to openstack provider label Jul 16, 2021
@mdbooth
Contributor

mdbooth commented Jul 16, 2021

/area provider/aws

@andrewsykim
Member

IMO the right approach is for all kubelets to set the --provider-id flag with the well-known provider ID format of the cloud provider. This way we never have to defer to the name and can rather check the unique ID of instances for existence. However, we can't always assume that users will set that flag and I would agree it's ideally something we can build into the system to do right.

@mdbooth
Contributor

mdbooth commented Jul 16, 2021

IMO the right approach is for all kubelets to set the --provider-id flag with the well-known provider ID format of the cloud provider. This way we never have to defer to the name and can rather check the unique ID of instances for existence. However, we can't always assume that users will set that flag and I would agree it's ideally something we can build into the system to do right.

Would you use the provider id as the node name? If not, how would you use it to resolve the node name?

@andrewsykim
Member

Would you use the provider id as the node name? If not, how would you use it to resolve the node name?

The provider ID maps to the spec.providerID field on the node object. You can construct it using the well-known format based on the provider, typically something like aws://<instance-id> or similar.
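
For illustration only, a sketch of how a node bootstrap script might derive and pass this flag on AWS (assuming the legacy aws:///<availability-zone>/<instance-id> format and the usual instance metadata endpoints):

```
# Sketch only: derive the provider ID from instance metadata and pass it to kubelet.
AZ="$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)"
INSTANCE_ID="$(curl -s http://169.254.169.254/latest/meta-data/instance-id)"
exec kubelet --cloud-provider=external --provider-id="aws:///${AZ}/${INSTANCE_ID}" "$@"
```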

@nckturner
Contributor

@andrewsykim does this mean that setting --provider-id is required before upgrade? And if a node's hostname doesn't match its node name, will kubelets that don't set --provider-id fail to start after their control plane has been upgraded to the external CCM? To me that seems like breaking backwards-compatibility guarantees, so if we can avoid it we should try to.

@nckturner
Contributor

However, we can't always assume that users will set that flag and I would agree it's ideally something we can build into the system to do right.

Nvm, as you mentioned above, users may or may not have this set. I am interested in this suggestion by @mdbooth which sounds like it could get us through the upgrade without breaking existing kubelets:

I wonder if a simpler idea might be to persist kubelet's node name as local state, e.g. in /etc/kubernetes/node.

@andrewsykim
Member

andrewsykim commented Jul 16, 2021

Just to be clear, this was just for addressing the delete case; existing kubelets should not need --provider-id, and even new kubelets do not need it. However, having --provider-id set allows us to handle the case where the node name is different from the instance name. Technically speaking, if we're just talking about existing kubelets that were using the in-tree provider, they should have providerID already set, so you wouldn't need to set it for them.
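
One quick way to verify that on an existing node (example only; the node name and output value are placeholders):

```
# Check whether the provider ID is already populated on an existing node object
kubectl get node <node-name> -o jsonpath='{.spec.providerID}'
# e.g. aws:///us-west-1a/i-0123456789abcdef0
```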

@nckturner
Contributor

nckturner commented Jul 16, 2021

So the above issue makes it sound like existing nodes will be deleted upon upgrade:

CCM then removes the node.

Am I misunderstanding when this might occur? I read it as: on upgrade, if the hostname does not match the node name.

Edit: ok, the existing kubelet case makes sense.

@ehashman
Member

Nvm, as you mentioned above, users may or may not have this set. I am interested in this suggestion by @mdbooth which sounds like it could get us through the upgrade without breaking existing kubelets:

I wonder if a simpler idea might be to persist kubelet's node name as local state, e.g. in /etc/kubernetes/node.

This seems error-prone to me and I'd prefer we not add hacks in the kubelet to work around a cloud-provider issue.

@andrewsykim
Member

Since it's common in AWS for instances to be named after the private DNS name, I feel like the AWS CCM should just be updated to query nodes both by instance name and by private DNS name. Would that solve this issue?

@ehashman
Member

Since it's common in AWS for instances to be named after the private DNS name, I feel like the AWS CCM should just be updated to query nodes both by instance name and by private DNS name. Would that solve this issue?

That might solve it for AWS but this doesn't just affect AWS; see the examples above involving OpenStack. I am not sure if other cloud providers are also affected.

@mdbooth
Contributor

mdbooth commented Jul 19, 2021

Since it's common in AWS for instances to be named after the private DNS name, I feel like the AWS CCM should just be updated to query nodes both by instance name and by private DNS name. Would that solve this issue?

I think we're mixing up the delete case you mentioned above. It sounds like this new issue has hijacked a similar but different issue. Sorry!

In the in-tree -> external CCM case, the external CCM is not involved in the bug at all, so unfortunately it's not useful for fixing anything. The issue is in the difference in behaviour between:

  • The in-tree cloud provider
  • The default behaviour of kubelet when there is no in-tree cloud provider

When kubelet does not have an in-tree cloud provider, its default behaviour is to assume that its Node object's name is the FQDN hostname of the host. This is not true if kubelet was previously using the AWS or OpenStack in-tree provider, which provided unqualified names. The result is that kubelet cannot find its Node object and does not start cleanly. For example, because kubelet stops pinging the Node object it becomes stale and the Node is marked NotReady. Static pods are not started (specifically the local coredns; that was a deep and fruitless rabbit hole). Probably other things I didn't get to. It's generally quite sad in ways that the CCM can't fix.

@cheftako
Member

/cc @leilajal

@andrewsykim
Member

When kubelet does not have an in-tree cloud provider, its default behaviour is to assume that its Node object's name is the FQDN hostname of the host. This is not true if kubelet was previously using the AWS or OpenStack in-tree provider, which provided unqualified names. The result is that kubelet cannot find its Node object and does not start cleanly. For example, because kubelet stops pinging the Node object it becomes stale and the Node is marked NotReady. Static pods are not started (specifically the local coredns; that was a deep and fruitless rabbit hole). Probably other things I didn't get to. It's generally quite sad in ways that the CCM can't fix.

Yes, this is an unfortunate result of the default naming convention when a kubelet is providerless and when it's using the AWS or OpenStack cloud provider. And unfortunately, you can't override hostname on AWS, see #54482.

I think this use-case is not common enough and the solution would be sufficiently complex that we wouldn't try to fix this, but open to hearing other suggestions / alternatives.

@mdbooth
Contributor

mdbooth commented Jul 20, 2021

When kubelet does not have an in-tree cloud provider, its default behaviour is to assume that its Node object's name is the FQDN hostname of the host. This is not true if kubelet was previously using the AWS or OpenStack in-tree provider, which provided unqualified names. The result is that kubelet cannot find its Node object and does not start cleanly. For example, because kubelet stops pinging the Node object it becomes stale and the Node is marked NotReady. Static pods are not started (specifically the local coredns; that was a deep and fruitless rabbit hole). Probably other things I didn't get to. It's generally quite sad in ways that the CCM can't fix.

Yes, this is an unfortunate result of the default naming convention when a kubelet is providerless and when it's using the AWS or OpenStack cloud provider. And unfortunately, you can't override hostname on AWS, see #54482.

I think this use-case is not common enough and the solution would be sufficiently complex that we wouldn't try to fix this, but open to hearing other suggestions / alternatives.

Herein lies the problem. When upgrading from in-tree to CCM, which we plan to do some time soon, it will affect 100% of AWS/OpenStack users.

@andrewsykim
Member

andrewsykim commented Jul 20, 2021

Herein lies the problem. When upgrading from in-tree to CCM, which we plan to do some time soon, it will affect 100% of AWS/OpenStack users.

Worth clarifying that I was specifically referring to when kubelet goes from --cloud-provider=aws|openstack to no cloud provider at all. This issue does not apply when you go from --cloud-provider=aws|openstack to --cloud-provider=external because the CCM should know how to discover nodes using varying names (like hostname or private DNS)

@mdbooth
Contributor

mdbooth commented Jul 20, 2021

Herein lies the problem. When upgrading from in-tree to CCM, which we plan to do some time soon, it will affect 100% of AWS/OpenStack users.

Worth clarifying that I was specifically referring to when kubelet goes from --cloud-provider=aws|openstack to no cloud provider at all. This issue does not apply when you go from --cloud-provider=aws|openstack to --cloud-provider=external because the CCM should know how to discover nodes using varying names (like hostname or private DNS)

To the best of my understanding, --cloud-provider=external and no cloud provider at all are the same to kubelet. It's the same code path. The CCM is not involved at all here. This is purely a kubelet problem. Kubelet doesn't call out to a CCM in the same way that it does, for example, to CNI or CSI. The CCM isn't running on the local node, and all communication is indirect via API objects. The CCM can't help kubelet find its node object.

@andrewsykim
Member

The CCM can't help kubelet find its node object.

The CCM can't help kubelet find the node object, but it can give kubelet the flexibility to use different naming formats as supported by the cloud provider. So in the AWS case, if you switch from --cloud-provider=aws to --cloud-provider=external and you want to preserve the existing node object, you can set --hostname-override=<private-dns> with --cloud-provider=external.
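
A hedged sketch of what that could look like on an AWS node, assuming the private DNS name is read from the usual instance metadata endpoint:

```
# Sketch only: preserve the old (private-DNS-based) node name when moving to the external CCM.
PRIVATE_DNS="$(curl -s http://169.254.169.254/latest/meta-data/local-hostname)"
exec kubelet --cloud-provider=external --hostname-override="${PRIVATE_DNS}" "$@"
```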

@mdbooth
Contributor

mdbooth commented Jul 20, 2021

The CCM can't help kubelet find its node object.

The CCM can't help kubelet find the node object, but it can give kubelet the flexibility to use different naming formats as supported by the cloud provider. So in the AWS case, if you switch from --cloud-provider=aws to --cloud-provider=external and you want to preserve the existing node object, you can set --hostname-override=<private-dns> with --cloud-provider=external.

Excellent! I was thinking we could update the script which launches kubelet to look for a local file, e.g. /etc/kubernetes/node, and pass the contents of that file in --hostname-override if it exists. My hope is that we can have, e.g. the CCM operator add a hook... somewhere tbd... to create this file if required with cloud-specific contents. Do you think that would fly?

@andrewsykim
Member

andrewsykim commented Jul 20, 2021

Excellent! I was thinking we could update the script which launches kubelet to look for a local file, e.g. /etc/kubernetes/node, and pass the contents of that file in --hostname-override if it exists.

I think this is up to the cluster admin / operator to figure out, since this pertains to how kubelet eventually ends up running on a given node. The kubelet currently has the --hostname-override flag to account for various scenarios per cloud provider; it's just a matter of setting this flag to the correct value.

@mdbooth
Contributor

mdbooth commented Jul 27, 2021

@andrewsykim Sorry for confusing these 2 issues earlier, btw. Looks like we're going to go with a solution based on --hostname-override.

@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot removed the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Feb 8, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Feb 8, 2023