Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document the workaround for kube-proxy's mishandled "hostname-override" flag #9878

Closed

Conversation

seh
Copy link
Contributor

@seh seh commented Aug 16, 2018

Following up from @dlipovetsky's earlier #9663, with all credit due to him for the core idea, offer a shell script workaround for kube-proxy when run on machines whose hostname must differ from the node name, such as when running in AWS with the corresponding cloud provider enabled. kubernetes/kubeadm#857 captures the problem in more detail.

See here for preceding discussion of this approach, refined here slightly.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 16, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: bradamant3

If they are not already assigned, you can assign the PR to them by writing /assign @bradamant3 in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8sio-netlify-preview-bot
Copy link
Collaborator

Deploy preview for kubernetes-io-master-staging ready!

Built with commit ea74837

https://deploy-preview-9878--kubernetes-io-master-staging.netlify.com

@detiber
Copy link
Member

detiber commented Aug 17, 2018

/lgtm
@dlipovetsky ptal
@Bradamant3 could you provide a review?

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 17, 2018
@dlipovetsky
Copy link
Contributor

This looks good. Thanks for adding the explanatory notes @seh. I do find the JSON patch to be easier to understand, but I think that's no big deal 😄.

@@ -201,41 +200,66 @@ Error from server: Get https://10.19.0.41:10250/containerLogs/default/mysql-ddc6

## Services with externalTrafficPolicy=Local are not reachable

On nodes where the hostname for the kubelet is overridden using the `--hostname-override` option, kube-proxy will default to treating 127.0.0.1 as the node IP, which results in rejecting connections for Services configured for `externalTrafficPolicy=Local`. This situation can be verified by checking the output of `kubectl -n kube-system logs <kube-proxy pod name>`:
On machines where the node name for the kubelet is overridden using the `--hostname-override` option, such that the node name no longer matches the machine's hostname, kube-proxy will default to treating 127.0.0.1 as the node's IP address, which results in rejecting connections for Services configured for `externalTrafficPolicy=Local`. This situation can be verified by checking the output of `kubectl -n kube-system logs <kube-proxy pod name>`:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an extra space between "in" and "rejecting." I can push an update to fix that.

Steven E. Harris added 2 commits August 17, 2018 13:50
At present kube-proxy does not honor its --hostname-override flag when
used in concert with its --config flag; though it reads the former
flag value, it overrides its with values read from the specified
configuration file. In this case that file's "hostnameOverride"
field's value takes precedence, even when absent, yielding an empty
value.

Replace the previous documented workaround with one that customizes a
copy of the configuration file acquired from the ConfigMap.
Capitalize each item as a sentence, and restore intended hanging
indentation of the cgroup driver-related item.
@seh seh force-pushed the doc-workaround-for-kube-proxy-hostname branch from e601116 to ea74837 Compare August 17, 2018 17:51
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 17, 2018
@detiber
Copy link
Member

detiber commented Aug 17, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 17, 2018
Copy link
Contributor

@zacharysarah zacharysarah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seh Great start! ✨ Some feedback for docs quality, otherwise LGTM.

1. Install docker again following instructions

1. Install Docker again following instructions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this paragraph to:

1. Re-install [Docker](/docs/setup/independent/install-kubeadm/#installing-docker).

So:

  1. Re-install Docker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fine suggestion, here and in what follows, but note that I was just trying to adjust the formatting and capitalization here. This is the unfortunate phenomenon of drawing focus only to what one touches, at the expense of all the other parts that were ignored.

Still, I appreciate that you did think through your suggestions, so I'll take them into account when I can make another editing pass here.

[Configure cgroup driver used by kubelet on Master Node](/docs/setup/independent/install-kubeadm/#configure-cgroup-driver-used-by-kubelet-on-master-node)
for detailed instructions.
1. Change the kubelet config to match the Docker cgroup driver manually.
You can refer to [Configure cgroup driver used by kubelet on Master Node](/docs/setup/independent/install-kubeadm/#configure-cgroup-driver-used-by-kubelet-on-master-node) for detailed instructions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to:

See "[Configure cgroup driver used by kubelet on Master Node](/docs/setup/independent/install-kubeadm/#configure-cgroup-driver-used-by-kubelet-on-master-node)" for detailed instructions.

So:

See "Configure cgroup driver used by kubelet on Master Node" for detailed instructions.


- control plane Docker containers are crashlooping or hanging. You can check this by running `docker ps` and investigating each container by running `docker logs`.
- Control plane Docker containers are crashlooping or hanging.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to:

- Control plane Docker containers crash or hang.

@@ -201,41 +200,65 @@ Error from server: Get https://10.19.0.41:10250/containerLogs/default/mysql-ddc6

## Services with externalTrafficPolicy=Local are not reachable

On nodes where the hostname for the kubelet is overridden using the `--hostname-override` option, kube-proxy will default to treating 127.0.0.1 as the node IP, which results in rejecting connections for Services configured for `externalTrafficPolicy=Local`. This situation can be verified by checking the output of `kubectl -n kube-system logs <kube-proxy pod name>`:
On machines where the node name for the kubelet is overridden using the `--hostname-override` option, such that the node name no longer matches the machine's hostname, kube-proxy will default to treating 127.0.0.1 as the node's IP address, which results in rejecting connections for Services configured for `externalTrafficPolicy=Local`. This situation can be verified by checking the output of `kubectl -n kube-system logs <kube-proxy pod name>`:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace with:

On machines where the `--hostname-override` option overrides the node name so that the kubelet no longer matches the machine's hostname, kube-proxy defaults to treating 127.0.0.1 as the node's IP address, which results in rejecting connections for Services configured for `externalTrafficPolicy=Local`. You can verify this situation by checking the output of `kubectl -n kube-system logs <kube-proxy pod name>`:

So:

On machines where the --hostname-override option overrides the node name so that the kubelet no longer matches the machine's hostname, kube-proxy defaults to treating 127.0.0.1 as the node's IP address, which results in rejecting connections for Services configured for externalTrafficPolicy=Local. You can verify this situation by checking the output of kubectl -n kube-system logs <kube-proxy pod name>:


```sh
W0507 22:33:10.372369 1 server.go:586] Failed to retrieve node info: nodes "ip-10-0-23-78" not found
W0507 22:33:10.372474 1 proxier.go:463] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
```

A workaround for this is to modify the kube-proxy DaemonSet in the following way:
Just as the kubelet accepts a `--hostname-override` flag to use a different name for the node, so too does kube-proxy. Unfortunately, though, due to an unresolved defect, kube-proxy does not honor that flag's value when used in combination with the `--config` flag specifying a configuration file. Hence we can't use kube-proxy's `--hostname-override` flag to tell it the proper name of our node.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace with:

{{< caution >}}
**Caution:** Do not use kube-proxy's `--hostname-override` flag to indicate the name of a node.

While kube-proxy also accepts a `--hostname-override` flag, an unresolved defect causes kube-proxy not to honor that flag's value when used in combination with the `--config` flag specifying a configuration file.
{{< /caution >}}

So:

Caution: Do not use kube-proxy's --hostname-override flag to indicate the name of a node.

While kube-proxy also accepts a --hostname-override flag, an unresolved defect causes kube-proxy not to honor that flag's value when used in combination with the --config flag specifying a configuration file.

@stevesloka
Copy link
Contributor

@seh I got a PR merged which fixes this problem in kube-proxy (kubernetes/kubernetes#69340). I went to update docs and found your PR was already in progress. Just wanted to update you on this which affects the content of this PR.

@seh
Copy link
Contributor Author

seh commented Nov 18, 2018

@seh I got a PR merged which fixes this problem in kube-proxy (kubernetes/kubernetes#69340).

Almost a month later, @stevesloka, I finally had the opportunity to put your patch to use this morning, and it worked! Thank you for the fix.

I still had to patch the DaemonSet for kube-proxy that kubeadm creates, but at least it's less messy alteration. @anguslees is on to something with his suggestion here.

@mdlinville
Copy link
Contributor

We'd love to accept this, but it has some pending feedback and needs to be rebased to resolve conflicts. Thanks!

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 11, 2018
@seh
Copy link
Contributor Author

seh commented Dec 14, 2018

@mistyhacks, as of version 1.13.0, this workaround is no longer necessary. It was pertinent for a few months before then, but not now. I suggest we abandon the change.

@mdlinville mdlinville closed this Dec 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants