
Restarting iptables kube-proxier causes connections to fail #75360

Open
PaulFurtado opened this issue Mar 14, 2019 · 11 comments · 5 participants

PaulFurtado commented Mar 14, 2019

What happened:
When restarting kube-proxy in iptables mode, there will be several seconds of timeouts because it flushes the KUBE-SERVICES nat chain and several others.

What you expected to happen:
Restarting kube-proxy should not impact traffic in any way.

How to reproduce it (as minimally and precisely as possible):

  1. Run an HTTP server behind a ClusterIP service
  2. Run an http client in a loop making requests to the HTTP server
  3. kill -15 the kube-proxy process on the client node
  4. Connections timeout for a few seconds until kube-proxy re-syncs the iptables rules
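The client loop in step 2 can be sketched as below (the service URL is an assumption; substitute your own ClusterIP address). It prints one ok/fail line per request, so the timeout window becomes visible when kube-proxy is killed on the node:

```shell
#!/usr/bin/env bash
# Probe a service URL a fixed number of times, printing ok/fail per request.
# A "fail" burst appears while kube-proxy re-syncs the flushed iptables rules.
probe_loop() {
  local url="$1" count="$2" i=0
  while [ "$i" -lt "$count" ]; do
    if curl -sS --max-time 2 -o /dev/null "$url" 2>/dev/null; then
      echo "$(date +%T) ok"
    else
      echo "$(date +%T) fail"
    fi
    i=$((i + 1))
  done
}

# Example (hypothetical ClusterIP): probe_loop "http://10.96.0.10:80/" 1000
```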

Anything else we need to know?:
Running with -v=6 makes it pretty clear what's happening:

I0314 05:35:28.808656       7 server_others.go:174] Tearing down inactive rules.
I0314 05:35:28.808673       7 iptables.go:419] running iptables -C [OUTPUT -t nat -m comment --comment handle ClusterIPs; NOTE: this must be before the NodePort rules -j KUBE-PORTALS-HOST]
I0314 05:35:28.812934       7 iptables.go:419] running iptables -C [PREROUTING -t nat -m comment --comment handle ClusterIPs; NOTE: this must be before the NodePort rules -j KUBE-PORTALS-CONTAINER]
I0314 05:35:28.816683       7 iptables.go:419] running iptables -C [OUTPUT -t nat -m addrtype --dst-type LOCAL -m comment --comment handle service NodePorts; NOTE: this must be the last rule in the chain -j KUBE-NODEPORT-HOST]
I0314 05:35:28.821715       7 iptables.go:419] running iptables -C [PREROUTING -t nat -m addrtype --dst-type LOCAL -m comment --comment handle service NodePorts; NOTE: this must be the last rule in the chain -j KUBE-NODEPORT-CONTAINER]
I0314 05:35:28.825080       7 iptables.go:419] running iptables -C [INPUT -t filter -m comment --comment Ensure that non-local NodePort traffic can flow -j KUBE-NODEPORT-NON-LOCAL]
I0314 05:35:28.826834       7 iptables.go:419] running iptables -F [KUBE-PORTALS-CONTAINER -t nat]
I0314 05:35:28.830456       7 iptables.go:419] running iptables -F [KUBE-PORTALS-HOST -t nat]
I0314 05:35:28.833272       7 iptables.go:419] running iptables -F [KUBE-NODEPORT-HOST -t nat]
I0314 05:35:28.836361       7 iptables.go:419] running iptables -F [KUBE-NODEPORT-CONTAINER -t nat]
I0314 05:35:28.839093       7 iptables.go:419] running iptables -F [KUBE-NODEPORT-NON-LOCAL -t filter]
I0314 05:35:28.840088       7 iptables.go:419] running iptables -C [OUTPUT -t nat -m comment --comment kubernetes service portals -j KUBE-SERVICES]
I0314 05:35:28.843330       7 iptables.go:419] running iptables -D [OUTPUT -t nat -m comment --comment kubernetes service portals -j KUBE-SERVICES]
I0314 05:35:28.847852       7 iptables.go:419] running iptables -C [PREROUTING -t nat -m comment --comment kubernetes service portals -j KUBE-SERVICES]
I0314 05:35:28.850976       7 iptables.go:419] running iptables -D [PREROUTING -t nat -m comment --comment kubernetes service portals -j KUBE-SERVICES]
I0314 05:35:28.855081       7 iptables.go:419] running iptables -C [POSTROUTING -t nat -m comment --comment kubernetes postrouting rules -j KUBE-POSTROUTING]
I0314 05:35:28.857992       7 iptables.go:419] running iptables -D [POSTROUTING -t nat -m comment --comment kubernetes postrouting rules -j KUBE-POSTROUTING]
I0314 05:35:28.862013       7 iptables.go:419] running iptables -F [KUBE-SERVICES -t nat]
I0314 05:35:28.865648       7 iptables.go:419] running iptables -X [KUBE-SERVICES -t nat]
I0314 05:35:28.869143       7 iptables.go:419] running iptables -F [KUBE-POSTROUTING -t nat]
I0314 05:35:28.872606       7 iptables.go:419] running iptables -X [KUBE-POSTROUTING -t nat]
I0314 05:35:29.159612       7 server.go:444] Version: v1.10.11

The log says it's tearing down inactive rules, but these chains are crucial to the iptables proxier. You can trivially reproduce the outage by just running:

iptables -F KUBE-SERVICES -t nat

Environment:

  • Kubernetes version (use kubectl version): 1.10.11
  • Cloud provider or hardware configuration: amazon
  • OS (e.g: cat /etc/os-release): custom
  • Kernel (e.g. uname -a): 4.14.77-hs623.el6.x86_64
  • Install tools: custom
  • Others: iptables 1.6.2

We're running kube-proxy 1.10.11, but I've confirmed this issue with kube-proxy 1.13.4 too.

PaulFurtado (Author) commented Mar 14, 2019

@kubernetes/sig-network-bugs

@k8s-ci-robot k8s-ci-robot added sig/network and removed needs-sig labels Mar 14, 2019

k8s-ci-robot (Contributor) commented Mar 14, 2019

@PaulFurtado: Reiterating the mentions to trigger a notification:
@kubernetes/sig-network-bugs

In response to this:

@kubernetes/sig-network-bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

andrewsykim (Member) commented Mar 14, 2019

Possibly a bug in the proxy: it flushes iptables when the proxier is able to use IPVS, which seems like the wrong behavior to me given that the IPVS and iptables proxiers share some tables/chains. One workaround I can think of right now is to unload the ip_vs kernel module so that check is skipped.

andrewsykim (Member) commented Mar 14, 2019

/assign

PaulFurtado (Author) commented Mar 14, 2019

@andrewsykim oh, that's an interesting workaround, I'll give that a shot, thanks!

andrewsykim (Member) commented Mar 14, 2019

np, I'll try to work on the bug fix in the meantime!

andrewsykim (Member) commented Mar 14, 2019

/triage unresolved

PaulFurtado (Author) commented Mar 15, 2019

@andrewsykim note that unloading the ip_vs module is not actually enough of a workaround because kube-proxy will load the kernel module if it sees that it is available. If I unload the ip_vs module and then rename ip_vs.ko to something else in /lib/modules then it does the right thing.

Dug slightly further: it probes for modules by actually running modprobe. So the simplest hack that lets us keep ip_vs loaded for other things on the system is to put a modprobe script on kube-proxy's PATH that always exits 1 for the ip_vs modules. (We can stop mounting the modules dir into the kube-proxy container in our Kubernetes clusters, but we also run kube-proxy on non-Kubernetes nodes via the init system, so the PATH hack works well enough there.) This should hold us over fine until we get to a version with your fix in it.
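A minimal sketch of that PATH shim (an assumption, not the author's exact script; the install path and the real modprobe location /sbin/modprobe are placeholders): a fake modprobe that refuses the ip_vs modules and delegates everything else to the real binary, placed ahead of /sbin on kube-proxy's PATH so its IPVS availability probe fails and the iptables proxier is kept.

```shell
#!/usr/bin/env bash
# Write the shim to a temp file for illustration; deploy it as e.g.
# /usr/local/sbin/modprobe (hypothetical path) for real use.
shim="$(mktemp)"
cat > "$shim" <<'EOF'
#!/bin/sh
# Refuse any ip_vs* module so kube-proxy believes IPVS is unavailable;
# pass every other module request through to the real modprobe.
for arg in "$@"; do
  case "$arg" in
    ip_vs*) exit 1 ;;
  esac
done
exec /sbin/modprobe "$@"
EOF
chmod +x "$shim"
```

With the shim first on PATH, kube-proxy's `modprobe ip_vs` probe exits 1 and it falls back to the iptables proxier, while other users of modprobe are unaffected.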

andrewsykim (Member) commented Mar 15, 2019

Good to know, thanks for sharing!

andrewsykim (Member) commented Mar 21, 2019

/assign @vllry

andrewsykim (Member) commented Mar 21, 2019

Quick update on this issue from today's SIG Network call: we're going to try to get rid of the automatic proxy clean up altogether for v1.14.1 since this is considered a bug. @vllry is working on the KEP & implementation.
