Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1.3] changes around hostname-override for aws cloud provider #3136

Merged
merged 1 commit into from Jan 12, 2023

Conversation

kinarashah
Copy link
Member

@kinarashah kinarashah commented Jan 11, 2023

This PR addresses 2 issues.

Issue 1 - kubelet:
RKE removes hostname-override from kubelet args with RKE >=v1.3.5 for AWS cloud provider enabled clusters https://github.com/rancher/rke/blob/release/v1.4/cluster/plan.go#L497.
kubelet populates kubernetes.io/hostname label using hostname-override and RKE finds the node by relying on this behavior https://github.com/rancher/rke/blob/release/v1.4/k8s/node.go#L55. So this becomes an issue when --node-name/hostnameOverride is not the same as the kubernetes.io/hostname on node. rancher/rancher#37634

For example,

  1. Rancher >=v2.6.4 ec2 node provider clusters:
  • cluster provisions successfully, this is because rancher defaults to the private ip address of the node which is what shows up on the kubernetes.io/hostname label.
  1. Rancher >=v2.6.4 custom cluster or RKE standalone:
  • set --node-name in registration command to a value that doesn't match with the default hostname assigned for eg "hostname -f" or "kinara-ec2"
  • cluster provisioning fails with node xxxx not found error

Fix 1 - kubelet:
Revert the removal of hostname-override in kubelet args. RKE will continue finding the node by matching hostnameOverride (--node-name) to kubernetes.io/hostname label on the node. Node name will still be set by the cloud provider, if it's not consistent with the hostnameOverride then kubelet will show logs like below but this is expected behavior.

"Replacing cloudprovider-reported hostname with overridden hostname" cloudProviderHostname="[ip-xxx-xx-xx-xx.us](http://ip-172-31-6-245.us/)-west-2.compute.internal" overriddenHostname="kinara-ec2"

Issue 2 - kube-proxy:
RKE >= v1.3.5 sets hostname-override for kube-proxy to private DNS of the node by querying the ec2 metadata service. https://github.com/rancher/rke-tools/blob/master/entrypoint.sh#L17
This was introduced because hostname-override set for kube-proxy might not be the same as the name of node object in Kubernetes when cloud provider is enabled. For example, Rancher sets requestedHostname to nodepool-* for node driver provisioned clusters, but name of the node object is the private DNS of the node. This fixed issues like rancher/rancher#30363 and rancher/rancher#22416.

But the node name is not always guaranteed to be the private DNS of node returned by ec2 metadata (maybe when using custom dns or when hostname is set differently), causing issues for other setups. EKS recently fixed this to set it to spec.nodeName kubernetes/kubernetes#61486 (comment), but this is not feasible for RKE.

Fix 2 - kube-proxy:
Make the behavior optional. RKE will now set the hostname-override for kube-proxy to the instance metadata hostname only if useInstanceMetadataHostname is true. It will default to true for node driver clusters and false for custom clusters/RKE standalone. Users will have to set hostname-override correctly when false.

@kinarashah kinarashah marked this pull request as ready for review January 11, 2023 01:56
@kinarashah kinarashah requested review from a team and galal-hussein January 11, 2023 01:57
galal-hussein
galal-hussein previously approved these changes Jan 11, 2023
cluster/plan.go Outdated
@@ -690,7 +688,8 @@ func (c *Cluster) BuildKubeProxyProcess(host *hosts.Host, serviceOptions v3.Kube
} else {
CommandArgs["bind-address"] = host.Address
}
if c.CloudProvider.Name == k8s.AWSCloudProvider {
if c.CloudProvider.Name == k8s.AWSCloudProvider && c.CloudProvider.UseInstanceMetadataHostname != nil && *c.CloudProvider.UseInstanceMetadataHostname {
// rke-tools will inject hostname-override from ec2 instance metadata to match with the spec.nodeName set by cloud provider https://github.com/rancher/rke-tools/blob/master/entrypoint.sh#L17
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but can we change this to permalink (commit id) instead of master?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I’ll update

@snasovich snasovich requested a review from a team January 11, 2023 18:25
Oats87
Oats87 previously approved these changes Jan 11, 2023
jakefhyde
jakefhyde previously approved these changes Jan 11, 2023
Copy link
Collaborator

@snasovich snasovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kinarashah , thank you for addressing my nit. Since that's the only change and others approved before it, feel free to merge with only 1 formal approval.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants