
In kubeproxy ipvs mode UDP traffic to Loadbalancer IP fails after node reboot #105192

Closed
VivekThrivikraman-est opened this issue Sep 22, 2021 · 7 comments · Fixed by #105249
Labels: area/ipvs, area/kube-proxy, kind/bug, sig/network, triage/accepted

Comments

@VivekThrivikraman-est

VivekThrivikraman-est commented Sep 22, 2021

What happened:

In IPVS mode, client UDP traffic to a pod behind a LoadBalancer IP is blackholed after the node hosting the pod is rebooted.
The issue seems similar to the iptables issue below: the client keeps trying to reach the LoadBalancer IP (even before kube-proxy has applied all its rules), and we see that stale conntrack entries remain even after the rules are applied. Once the stale conntrack entries are cleaned up manually, the traffic starts flowing:
https://github.com/kubernetes/kubernetes/pull/104151/files

From the code below, it looks like the IPVS proxier currently clears stale conntrack entries for ExternalIPs but not for the LoadBalancer IP(?):
https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/ipvs/proxier.go#:~:text=svcInfo.ClusterIP().String())-,for%20_%2C%20extIP%20%3A%3D%20range%20svcInfo.ExternalIPStrings()%20%7B,-staleServices.Insert(extIP
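
For illustration, here is a minimal sketch of the kind of change that seems to be missing, modeled on the linked proxier.go snippet. It assumes a LoadBalancerIPStrings() accessor on svcInfo mirroring ExternalIPStrings(); this is a sketch of the idea, not necessarily the actual patch:

    // pkg/proxy/ipvs/proxier.go (sketch): collect the IPs of stale UDP
    // services so their conntrack entries get flushed after a sync.
    if svcInfo, ok := proxier.serviceMap[svcPortName]; ok && svcInfo != nil &&
        conntrack.IsClearConntrackNeeded(svcInfo.Protocol()) {
        staleServices.Insert(svcInfo.ClusterIP().String())
        for _, extIP := range svcInfo.ExternalIPStrings() {
            staleServices.Insert(extIP)
        }
        // Missing piece: also treat the LoadBalancer IPs as stale, assuming
        // a LoadBalancerIPStrings() helper alongside ExternalIPStrings():
        for _, lbIP := range svcInfo.LoadBalancerIPStrings() {
            staleServices.Insert(lbIP)
        }
    }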

What you expected to happen:

UDP traffic between the client and the pod (behind the LoadBalancer IP) should resume after the node reboot.

How to reproduce it (as minimally and precisely as possible):

  1. Start a pod acting as a UDP server and expose it through a LoadBalancer service.
  2. Have a UDP client continuously send traffic to the LoadBalancer IP.
  3. Restart the node hosting the pod.
  4. Even after the new UDP server pod is up, the client's UDP traffic is blackholed (until the stale conntrack entries are cleared). A minimal setup is sketched below.
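
For reference, a hypothetical minimal setup along these lines (names, port, and image are illustrative; assumes kube-proxy in IPVS mode and a working LoadBalancer implementation):

    # UDP echo server pod and a LoadBalancer service in front of it
    kubectl run udp-server --image=alpine/socat --port=5003 -- UDP-RECVFROM:5003,fork EXEC:cat
    kubectl expose pod udp-server --type=LoadBalancer --protocol=UDP --port=5003

    # From a client outside the cluster, send roughly one datagram per
    # second to the assigned LoadBalancer IP (LB_IP is a placeholder):
    while true; do echo ping | socat - UDP-SENDTO:"$LB_IP":5003; sleep 1; done

    # Reboot the node hosting udp-server; after it returns, the client's
    # packets are blackholed until the stale conntrack entries are removed.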

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.21.1
  • Cloud provider or hardware configuration:
    Manufacturer: Dell Inc.
    Product Name: PowerEdge R640
    Version: Not Specified
  • OS (e.g: cat /etc/os-release): SUSE Linux Enterprise Server 15 SP2
  • Kernel (e.g. uname -a): Linux pool16-n108-wk16-n080 5.3.18-24.75.3.22886.0.PTF.1187468-default #1 SMP Thu Sep 9 23:24:48 UTC 2021 (37ce29d) x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Network plugin and version (if this is a network-related bug): Calico v3.19.1-12-baa55cf9
  • Others:
@VivekThrivikraman-est added the kind/bug label Sep 22, 2021
@k8s-ci-robot added the needs-sig and needs-triage labels Sep 22, 2021
@VivekThrivikraman-est
Author

/sig network

@k8s-ci-robot added the sig/network label and removed the needs-sig label Sep 22, 2021
@VivekThrivikraman-est
Author

@uablrek @aojea

@uablrek
Contributor

uablrek commented Sep 22, 2021

/assign

I'll try to reproduce this using ctraffic, but it may take some days.

@aojea
Member

aojea commented Sep 22, 2021

Yeah, the IPVS kube-proxy doesn't have some of the latest "stale conntrack" fixes like #104151, mainly because I'm not very familiar with IPVS and I don't know whether all of them are needed or only some of them.

@khenidak
Contributor

/triage accepted

@aojea is it a matter of inserting externalIPs in conntrack?

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label Sep 24, 2021
@uablrek
Contributor

uablrek commented Sep 25, 2021

@khenidak No. When a node is rebooted and UDP messages to a loadBalancerIP arrive before kube-proxy has started, invalid (UNREPLIED) conntrack entries are created. They black-hole traffic, and since traffic keeps coming, they never time out.

This is not easy to reproduce. I found it best to direct incoming traffic to one node (2) and have one endpoint pod on another node (4), then reboot the node where traffic enters (leaving the pod alive) while traffic is sent continuously. After the reboot, check the conntrack table on the rebooted node.

# conntrack -p udp -L
udp      17 29 src=1000::1:c0a8:1c9 dst=1000:: sport=52245 dport=5003 [UNREPLIED] src=1000:: dst=1000::1:c0a8:1c9 sport=5003 dport=52245 mark=0 use=1
udp      17 29 src=1000::1:c0a8:1c9 dst=1000:: sport=60276 dport=5003 [UNREPLIED] src=1000:: dst=1000::1:c0a8:1c9 sport=5003 dport=60276 mark=0 use=1
udp      17 119 src=1000::1:c0a8:1c9 dst=1000:: sport=60719 dport=5003 src=1100::402 dst=1000::1:c0a8:102 sport=5003 dport=62141 [ASSURED] mark=0 use=1
udp      17 29 src=1000::1:c0a8:1c9 dst=1000:: sport=43534 dport=5003 [UNREPLIED] src=1000:: dst=1000::1:c0a8:1c9 sport=5003 dport=43534 mark=0 use=1
udp      17 29 src=1000::1:c0a8:1c9 dst=1000:: sport=50938 dport=5003 [UNREPLIED] src=1000:: dst=1000::1:c0a8:1c9 sport=5003 dport=50938 mark=0 use=1
udp      17 119 src=1000::1:c0a8:1c9 dst=1000:: sport=49618 dport=5003 src=1100::402 dst=1000::1:c0a8:102 sport=5003 dport=19547 [ASSURED] mark=0 use=1
udp      17 29 src=1000::1:c0a8:1c9 dst=1000:: sport=56135 dport=5003 [UNREPLIED] src=1000:: dst=1000::1:c0a8:1c9 sport=5003 dport=56135 mark=0 use=1
udp      17 29 src=1000::1:c0a8:1c9 dst=1000:: sport=52045 dport=5003 [UNREPLIED] src=1000:: dst=1000::1:c0a8:1c9 sport=5003 dport=52045 mark=0 use=1
udp      17 119 src=1000::1:c0a8:1c9 dst=1000:: sport=39766 dport=5003 src=1100::402 dst=1000::1:c0a8:102 sport=5003 dport=22024 [ASSURED] mark=0 use=1
udp      17 29 src=1000::1:c0a8:1c9 dst=1000:: sport=44823 dport=5003 [UNREPLIED] src=1000:: dst=1000::1:c0a8:1c9 sport=5003 dport=44823 mark=0 use=1
conntrack v1.4.5 (conntrack-tools): 10 flow entries have been shown.

In this example, 3 out of 10 connections are OK. Those are the fortunate ones that didn't happen to send a packet in the small time span between node start and kube-proxy start (about 1 pkt/sec is sent on each of the 10 connections).
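
Until a fix lands, a manual workaround is to flush the stale entries on the rebooted node; a sketch for the service port used above (note this deletes healthy flows too, but they are re-created correctly on the next packet):

    # delete all UDP conntrack entries whose destination port is 5003
    conntrack -D -p udp --dport 5003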

@uablrek
Contributor

uablrek commented Sep 25, 2021

/area kube-proxy
/area ipvs
