Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use AWS metadata to retrieve local-hostname in nodeup #12844

Merged
merged 3 commits into from
Nov 27, 2021

Conversation

bwagner5
Copy link
Contributor

Cloud Provider: AWS

This PR changes nodeups method of looking up an instance private DNS name from a DescribeInstances call to an instance metadata lookup. DescribeInstances is prone to throttling with large clusters and can cause significant delays when a node is trying to join a cluster.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 27, 2021
@k8s-ci-robot
Copy link
Contributor

Hi @bwagner5. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 27, 2021
@rifelpet
Copy link
Member

This would break clusters with IMDS disabled. I think it would be reasonable to try IMDS first and fallback to DescribeInstances.

@hakman
Copy link
Member

hakman commented Nov 27, 2021

This would break clusters with IMDS disabled. I think it would be reasonable to try IMDS first and fallback to DescribeInstances.

Can / should a kOps cluster work without IMDS?
Previous implementation also used IMDS to get metadata://aws/meta-data/instance-id.

@hakman
Copy link
Member

hakman commented Nov 27, 2021

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 27, 2021
@hakman
Copy link
Member

hakman commented Nov 27, 2021

Thanks @bwagner5. Seems like a good improvement from my point of view.
@rifelpet Leaving this to you for the final review.

@hakman
Copy link
Member

hakman commented Nov 27, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 27, 2021
@bwagner5
Copy link
Contributor Author

This would break clusters with IMDS disabled. I think it would be reasonable to try IMDS first and fallback to DescribeInstances.

Can / should a kOps cluster work without IMDS?
Previous implementation also used IMDS to get metadata://aws/meta-data/instance-id.

There's some other places that use IMDS in nodeup as well so it might be worth going through and figuring out if it's reasonable to perform a fallback if IMDS is disabled everywhere. But it's almost impossible to do really large scale-ups if you're not using IMDS on node bootstrap so I'd lean towards getting rid of the API calls and requiring IMDS.

I have a version of nodeup that does zero API calls and relies only on IMDS and OS calls for node info rather than DescribeInstanceTypes. I may try to PR some of those after they're cleaned up.

@johngmyers
Copy link
Member

Don't you have to use IMDS to get the instance role credentials to authenticate to kops-controller? I don't think it's going to be possible to bootstrap without IMDS.

@johngmyers
Copy link
Member

/assign @rifelpet

@hakman
Copy link
Member

hakman commented Nov 27, 2021

There's some other places that use IMDS in nodeup as well so it might be worth going through and figuring out if it's reasonable to perform a fallback if IMDS is disabled everywhere. But it's almost impossible to do really large scale-ups if you're not using IMDS on node bootstrap so I'd lean towards getting rid of the API calls and requiring IMDS.

In any case one would need instance ID, which is pretty hard to get without IMDS, right?

I have a version of nodeup that does zero API calls and relies only on IMDS and OS calls for node info rather than DescribeInstanceTypes. I may try to PR some of those after they're cleaned up.

That would be very nice.
I think there are also some places where metadata api is called directly instead of using vfs to read it. It may be worth a PR to change all to use vfs.

@olemarkus
Copy link
Member

In any case one would need instance ID, which is pretty hard to get without IMDS, right?

Instances with RBN enabled can just get this from os.hostname :)

Copy link
Member

@rifelpet rifelpet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like IMDS is required at this point then. This lgtm and I agree with proposed follow up PRs.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rifelpet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 27, 2021
@rifelpet
Copy link
Member

Should we remove eco:DescribeInstances from the node IAM policy? Users may be depending on it for other workloads though, and having them add it back makes the policy size larger since it would become its own statement.

@k8s-ci-robot k8s-ci-robot merged commit 2b059a0 into kubernetes:master Nov 27, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Nov 27, 2021
@johngmyers
Copy link
Member

Doesn't kubelet require DescribeInstances?

@olemarkus
Copy link
Member

We can remove it from the nodeup function (if it is there). Kubelet still needs it if the in-tree CCM is used, but it should be a part of the CCM function too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/nodeup cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants