
[WIP][TEST] debug kubeproxy iptables lock #92811

Closed
wants to merge 4 commits
Conversation

aojea
Member

@aojea aojea commented Jul 5, 2020

No description provided.

@k8s-ci-robot
Contributor

@aojea: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 5, 2020
@k8s-ci-robot k8s-ci-robot requested review from bowei and MrHohn July 5, 2020 20:43
@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 5, 2020
@aojea
Member Author

aojea commented Jul 5, 2020

/test pull-kubernetes-e2e-kind-ipv6

@aojea
Member Author

aojea commented Jul 6, 2020

ok, one interesting failure here
https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/92811/pull-kubernetes-conformance-kind-ga-only-parallel/1279878461086765057

1/5208 Tests Failed.

[sig-network] Services should have session affinity timeout work for NodePort service [LinuxOnly] [Conformance] (4m59s)

test/e2e/framework/framework.go:597
Jul  5 21:40:03.634: Unexpected error:
    <*errors.errorString | 0xc0005e6f50>: {
        s: "service is not reachable within 2m0s timeout on endpoint 172.18.0.2:31750 over TCP protocol",
    }
    service is not reachable within 2m0s timeout on endpoint 172.18.0.2:31750 over TCP protocol
occurred
test/e2e/network/service.go:3313

the problem is that kube-proxy does not install the NodePort rules; instead it rejects the traffic to the NodePort:

2020-07-05T21:40:11.472894763Z stderr F I0705 21:40:11.469510       1 proxier.go:1680] Opened local port "nodePort for services-1776/affinity-nodeport-timeout" (:31750/tcp)
2020-07-05T21:40:11.4729044Z stderr F I0705 21:40:11.469527       1 traffic.go:91] [DetectLocalByCIDR (10.244.0.0/16)] Jump Not Local: [-A KUBE-SERVICES -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp -p tcp -d 10.96.0.10/32 --dport 53 ! -s 10.244.0.0/16 -j KUBE-MARK-MASQ]
2020-07-05T21:40:11.472937524Z stderr F I0705 21:40:11.469553       1 traffic.go:91] [DetectLocalByCIDR (10.244.0.0/16)] Jump Not Local: [-A KUBE-SERVICES -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp -p tcp -d 10.96.0.10/32 --dport 9153 ! -s 10.244.0.0/16 -j KUBE-MARK-MASQ]
2020-07-05T21:40:11.472944435Z stderr F I0705 21:40:11.469579       1 traffic.go:91] [DetectLocalByCIDR (10.244.0.0/16)] Jump Not Local: [-A KUBE-SERVICES -m comment --comment "default/kubernetes:https cluster IP" -m tcp -p tcp -d 10.96.0.1/32 --dport 443 ! -s 10.244.0.0/16 -j KUBE-MARK-MASQ]
2020-07-05T21:40:11.472953475Z stderr F I0705 21:40:11.469754       1 proxier.go:1567] Restoring iptables rules: *filter
2020-07-05T21:40:11.472978675Z stderr F :KUBE-SERVICES - [0:0]
2020-07-05T21:40:11.472984563Z stderr F :KUBE-EXTERNAL-SERVICES - [0:0]
2020-07-05T21:40:11.472990554Z stderr F :KUBE-FORWARD - [0:0]
2020-07-05T21:40:11.472996638Z stderr F -A KUBE-SERVICES -m comment --comment "services-1776/affinity-nodeport-timeout has no endpoints" -m tcp -p tcp -d 10.110.216.198/32 --dport 80 -j REJECT
2020-07-05T21:40:11.473019145Z stderr F -A KUBE-EXTERNAL-SERVICES -m comment --comment "services-1776/affinity-nodeport-timeout has no endpoints" -m addrtype --dst-type LOCAL -m tcp -p tcp --dport 31750 -j REJECT
2020-07-05T21:40:11.473034603Z stderr F -A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
2020-07-05T21:40:11.47306164Z stderr F -A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x00004000/0x00004000 -j ACCEPT
2020-07-05T21:40:11.473067979Z stderr F -A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
2020-07-05T21:40:11.473074652Z stderr F -A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
2020-07-05T21:40:11.473080719Z stderr F COMMIT
2020-07-05T21:40:11.473088162Z stderr F *nat
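
For reference, a quick way to confirm this from a node is to list kube-proxy's filter rules and grep for the "has no endpoints" REJECT entries (a sketch; the node name kind-worker is an assumption based on the logs above):

docker exec kind-worker iptables -t filter -S KUBE-EXTERNAL-SERVICES | grep "has no endpoints"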

@aojea
Member Author

aojea commented Jul 6, 2020

@BenTheElder FYI, I think that in kind we are being too aggressive with iptables, and this causes contention when there is a large number of changes; that may explain the iptables lock issue

diff between the kube-proxy GCE config

I0705 20:58:01.848313       1 proxier.go:328] iptables(IPv4) sync params: minSyncPeriod=10s, syncPeriod=1m0s, burstSyncs=2

and KIND config

2020-07-05T21:25:18.696487533Z stderr F I0705 21:25:18.693778       1 proxier.go:328] iptables(IPv4) sync params: minSyncPeriod=0s, syncPeriod=30s, burstSyncs=2
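
For anyone checking their own cluster, a sketch of how to see what kube-proxy is actually configured with (assuming the usual kubeadm ConfigMap name and namespace):

kubectl -n kube-system get configmap kube-proxy -o yaml | grep -E "minSyncPeriod|syncPeriod"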

@aojea
Member Author

aojea commented Jul 6, 2020

/retest

@aojea
Member Author

aojea commented Jul 7, 2020

/test pull-kubernetes-e2e-kind-ipv6
/test pull-kubernetes-e2e-kind

@aojea
Member Author

aojea commented Jul 7, 2020

/test pull-kubernetes-e2e-kind-ipv6
/test pull-kubernetes-e2e-kind

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 8, 2020
Antonio Ojea and others added 3 commits July 8, 2020 10:59
Currently kube-proxy defaults the min-sync-period for
iptables to 0. However, as explained by Dan Winship,

"With minSyncPeriod: 0, you run iptables-restore 100 times.
 With minSyncPeriod: 1s , you run iptables-restore once.
 With minSyncPeriod: 10s , you also run iptables-restore once,
 but you might have to wait 10 seconds first"
The iptables monitor was using iptables -L to list the chains,
without the -n option, so it was trying to do reverse DNS lookups.
A side effect is that it was holding the lock, so other components
could not use it.
We can use -S instead of -L -n to avoid this, since we only want
to check that the chain exists. Also, avoid holding the lock in this
operation, because we can fail fast and retry later.

root@kind-control-plane:/# time iptables -t nat -L > /dev/null

real	0m5.154s
user	0m0.004s
sys	0m0.005s
root@kind-control-plane:/# time iptables -t nat -L -n > /dev/null

real	0m0.002s
user	0m0.000s
sys	0m0.002s
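
Following up on that commit, a minimal sketch of the equivalent existence check using -S instead of -L -n (no reverse DNS lookups and no full rule listing); KUBE-SERVICES is just an example chain name:

iptables -t nat -S KUBE-SERVICES >/dev/null 2>&1 && echo "chain exists" || echo "chain missing"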
@aojea
Member Author

aojea commented Jul 8, 2020

/test pull-kubernetes-e2e-kind-ipv6

@aojea
Member Author

aojea commented Jul 8, 2020

/test pull-kubernetes-e2e-kind

@aojea
Member Author

aojea commented Jul 8, 2020

/test pull-kubernetes-e2e-kind

@aojea
Member Author

aojea commented Jul 8, 2020

/test pull-kubernetes-e2e-kind-ipv6
/test pull-kubernetes-e2e-kind

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/kubelet and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 9, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aojea
To complete the pull request process, please assign thockin
You can assign the PR to them by writing /assign @thockin in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Jul 9, 2020
@k8s-ci-robot
Contributor

k8s-ci-robot commented Jul 9, 2020

@aojea: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-bazel-test a3e1877 link /test pull-kubernetes-bazel-test
pull-kubernetes-e2e-gce-ubuntu-containerd a3e1877 link /test pull-kubernetes-e2e-gce-ubuntu-containerd
pull-kubernetes-e2e-kind a3e1877 link /test pull-kubernetes-e2e-kind
pull-kubernetes-e2e-gce-device-plugin-gpu a3e1877 link /test pull-kubernetes-e2e-gce-device-plugin-gpu
pull-kubernetes-verify a3e1877 link /test pull-kubernetes-verify

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@aojea
Member Author

aojea commented Jul 9, 2020

@BenTheElder and the winner holding the lock is .... the portmap plugin

wget https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/92811/pull-kubernetes-e2e-kind/1281213384548683778/artifacts/logs/kind-worker/kubelet.log

grep IPTABLES kubelet.log | grep -v ms |  awk '{ print $NF,$0 }' | sort -k1,1 -n | cut -f2- -d' '
(snipped)
Jul 09 14:02:47 kind-worker kubelet[767]: I0709 14:02:47.983562     767 server.go:1336] IPTABLES lock acquired by Pid: 148823 Comm: (ip6tables) PPid: 147327 PComm: (portmap) ElapsedTime: 12.807867801s
Jul 09 14:01:32 kind-worker kubelet[767]: I0709 14:01:32.837867     767 server.go:1336] IPTABLES lock acquired by Pid: 140811 Comm: (ip6tables) PPid: 140644 PComm: (portmap) ElapsedTime: 12.839015454s
Jul 09 14:01:32 kind-worker kubelet[767]: I0709 14:01:32.790118     767 server.go:1336] IPTABLES lock acquired by Pid: 140660 Comm: (iptables) PPid: 139580 PComm: (portmap) ElapsedTime: 12.843264998s
Jul 09 14:02:49 kind-worker kubelet[767]: I0709 14:02:49.116133     767 server.go:1336] IPTABLES lock acquired by Pid: 148823 Comm: (ip6tables) PPid: 147327 PComm: (portmap) ElapsedTime: 13.940434659s
Jul 09 14:01:34 kind-worker kubelet[767]: I0709 14:01:34.144748     767 server.go:1336] IPTABLES lock acquired by Pid: 140811 Comm: (ip6tables) PPid: 140644 PComm: (portmap) ElapsedTime: 14.145899462s
Jul 09 14:01:35 kind-worker kubelet[767]: I0709 14:01:35.259976     767 server.go:1336] IPTABLES lock acquired by Pid: 140811 Comm: (ip6tables) PPid: 140644 PComm: (portmap) ElapsedTime: 15.261130135s
Jul 09 14:02:50 kind-worker kubelet[767]: I0709 14:02:50.517041     767 server.go:1336] IPTABLES lock acquired by Pid: 148823 Comm: (ip6tables) PPid: 147327 PComm: (portmap) ElapsedTime: 15.341346641s
Jul 09 14:01:36 kind-worker kubelet[767]: I0709 14:01:36.372998     767 server.go:1336] IPTABLES lock acquired by Pid: 140811 Comm: (ip6tables) PPid: 140644 PComm: (portmap) ElapsedTime: 16.374148888s
Jul 09 14:01:37 kind-worker kubelet[767]: I0709 14:01:37.471170     767 server.go:1336] IPTABLES lock acquired by Pid: 140811 Comm: (ip6tables) PPid: 140644 PComm: (portmap) ElapsedTime: 17.472321233s
Jul 09 14:01:38 kind-worker kubelet[767]: I0709 14:01:38.573475     767 server.go:1336] IPTABLES lock acquired by Pid: 140811 Comm: (ip6tables) PPid: 140644 PComm: (portmap) ElapsedTime: 18.574628968s
Jul 09 14:01:39 kind-worker kubelet[767]: I0709 14:01:39.770235     767 server.go:1336] IPTABLES lock acquired by Pid: 140811 Comm: (ip6tables) PPid: 140644 PComm: (portmap) ElapsedTime: 19.771386831s
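
As a side note, a rough way to see which process currently has the xtables lock file open on a node (a sketch; lsof or fuser may not be installed in the node image):

docker exec kind-worker sh -c 'lsof /run/xtables.lock 2>/dev/null || fuser -v /run/xtables.lock'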

@k8s-ci-robot
Contributor

@aojea: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 12, 2020
@aojea
Member Author

aojea commented Jul 13, 2020

more findings about this: it seems that when a pod is deleted, the portmap plugin is executed 3 times, even though the pod has no portmaps configured. If we add to that the fact that it executes both iptables and ip6tables, it means we hold the lock 6 times.

how to repro

  1. move the portmap plugin to portmap.bin
  2. create a wrapper script to log the calls:

docker exec -it kind-control-plane cat /opt/cni/bin/portmap
#!/bin/sh
echo $(date +"%T") >> /tmp/debug_portmap
echo "CNI: $CNI_COMMAND $CNI_IFNAME $CNI_NETNS $CNI_CONTAINERID"  >> /tmp/debug_portmap
tee -a /tmp/debug_portmap | /opt/cni/bin/portmap.bin
echo "\n---------------" >> /tmp/debug_portmap

  3. create a pod: kubectl run nginx --image=nginx --port=80

CNI: ADD eth0 /var/run/netns/cni-5f0b2fce-72a5-6108-092f-241e4030e484 47aa93243add946867e7acb33291a483ea19f417bef324665c2f9669ede0b703
{"capabilities":{"portMappings":true},"cniVersion":"0.3.1","name":"kindnet","prevResult":{"cniVersion":"0.3.1","dns":{},"interfaces":[{"mac":"86:36:f5:fa:5b:9e","name":"vethf01dee95"},{"mac":"02:4c:5d:31:c8:a1","name":"eth0","sandbox":"/var/run/netns/cni-5f0b2fce-72a5-6108-092f-241e4030e484"}],"ips":[{"address":"10.244.0.21/24","gateway":"10.244.0.1","interface":1,"version":"4"}],"routes":[{"dst":"0.0.0.0/0"}]},"runtimeConfig":{"portMappings":null},"type":"portmap"}
---------------
  4. delete the pod: kubectl delete deployment nginx

09:39:44
CNI: DEL eth0 /var/run/netns/cni-5f0b2fce-72a5-6108-092f-241e4030e484 47aa93243add946867e7acb33291a483ea19f417bef324665c2f9669ede0b703
{"capabilities":{"portMappings":true},"cniVersion":"0.3.1","name":"kindnet","runtimeConfig":{"portMappings":null},"type":"portmap"}
---------------
09:39:45
CNI: DEL eth0  47aa93243add946867e7acb33291a483ea19f417bef324665c2f9669ede0b703
{"capabilities":{"portMappings":true},"cniVersion":"0.3.1","name":"kindnet","runtimeConfig":{"portMappings":null},"type":"portmap"}
---------------
09:39:46
CNI: DEL eth0  47aa93243add946867e7acb33291a483ea19f417bef324665c2f9669ede0b703
{"capabilities":{"portMappings":true},"cniVersion":"0.3.1","name":"kindnet","runtimeConfig":{"portMappings":null},"type":"portmap"}
---------------

if we configure a hostPort
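
(For completeness, one hypothetical way to create such a test pod with a hostPort; a sketch, not the exact manifest used here:)

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: nginx-hostport
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
      hostPort: 8080
EOF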

09:39:43
CNI: ADD eth0 /var/run/netns/cni-0b43b484-5f8c-f5ee-45fc-799d285e8c52 587c678bc086c87e4e95748ac2e1a186d426d8ada6758450f7b8847a53bfc482
{"capabilities":{"portMappings":true},"cniVersion":"0.3.1","name":"kindnet","prevResult":{"cniVersion":"0.3.1","dns":{},"interfaces":[{"mac":"fe:ea:f7:1e:f5:7b","name":"vethecffc0b3"},{"mac":"72:69:36:b7:bb:1d","name":"eth0","sandbox":"/var/run/netns/cni-0b43b484-5f8c-f5ee-45fc-799d285e8c52"}],"ips":[{"address":"10.244.0.22/24","gateway":"10.244.0.1","interface":1,"version":"4"}],"routes":[{"dst":"0.0.0.0/0"}]},"runtimeConfig":{"portMappings":[{"HostPort":8080,"ContainerPort":80,"Protocol":"tcp","HostIP":""}]},"type":"portmap"}

we can observe that it is executed one more time, with a delay

09:42:41
CNI: DEL eth0 /var/run/netns/cni-0b43b484-5f8c-f5ee-45fc-799d285e8c52 587c678bc086c87e4e95748ac2e1a186d426d8ada6758450f7b8847a53bfc482
{"capabilities":{"portMappings":true},"cniVersion":"0.3.1","name":"kindnet","runtimeConfig":{"portMappings":[{"HostPort":8080,"ContainerPort":80,"Protocol":"tcp","HostIP":""}]},"type":"portmap"}
---------------
09:42:42
CNI: DEL eth0  587c678bc086c87e4e95748ac2e1a186d426d8ada6758450f7b8847a53bfc482
{"capabilities":{"portMappings":true},"cniVersion":"0.3.1","name":"kindnet","runtimeConfig":{"portMappings":[{"HostPort":8080,"ContainerPort":80,"Protocol":"tcp","HostIP":""}]},"type":"portmap"}
---------------
09:42:43
CNI: DEL eth0  587c678bc086c87e4e95748ac2e1a186d426d8ada6758450f7b8847a53bfc482
{"capabilities":{"portMappings":true},"cniVersion":"0.3.1","name":"kindnet","runtimeConfig":{"portMappings":[{"HostPort":8080,"ContainerPort":80,"Protocol":"tcp","HostIP":""}]},"type":"portmap"}
---------------
09:43:31
CNI: DEL eth0  587c678bc086c87e4e95748ac2e1a186d426d8ada6758450f7b8847a53bfc482
{"capabilities":{"portMappings":true},"cniVersion":"0.3.1","name":"kindnet","runtimeConfig":{"portMappings":[{"HostPort":8080,"ContainerPort":80,"Protocol":"tcp","HostIP":""}]},"type":"portmap"}
---------------

so, it seems containerd tears down the network multiple times

Jul 13 09:13:15 kind-control-plane containerd[25550]: time="2020-07-13T09:13:15.937380950Z" level=info msg="TearDown network for sandbox \"b78efe047b461be029f0b2e8a08a1f478bfaee97a57c253945e5ca4289e38014\" successfully"
Jul 13 09:13:16 kind-control-plane containerd[25550]: time="2020-07-13T09:13:16.859755495Z" level=info msg="TearDown network for sandbox \"b78efe047b461be029f0b2e8a08a1f478bfaee97a57c253945e5ca4289e38014\" successfully"
Jul 13 09:13:17 kind-control-plane containerd[25550]: time="2020-07-13T09:13:17.696627579Z" level=info msg="TearDown network for sandbox \"b78efe047b461be029f0b2e8a08a1f478bfaee97a57c253945e5ca4289e38014\" successfully"
Jul 13 09:13:17 kind-control-plane containerd[25550]: time="2020-07-13T09:13:17.861819714Z" level=info msg="TearDown network for sandbox \"b78efe047b461be029f0b2e8a08a1f478bfaee97a57c253945e5ca4289e38014\" successfully"
Jul 13 09:13:30 kind-control-plane containerd[25550]: time="2020-07-13T09:13:30.861008355Z" level=info msg="TearDown network for sandbox \"b78efe047b461be029f0b2e8a08a1f478bfaee97a57c253945e5ca4289e38014\" successfully"

full log here
https://pastebin.com/9d83jMvj

if we log the iptables calls issued, using a tool like execsnoop, we can see that the portmap plugin executes multiple iptables and ip6tables calls using the --wait flag without any timeout, which means they wait forever

sudo ./execsnoop.bt | grep -E ip6?tables

84112      78102 /usr/sbin/iptables --version
84113      78103 /usr/sbin/iptables -t nat -S OUTPUT --wait
84115      78104 /usr/sbin/ip6tables --version
84116      78105 /usr/sbin/ip6tables -t nat -S OUTPUT --wait
84117      78106 /usr/sbin/iptables -t nat -N CNI-DN-9d924da12ab9b7dd58f9a --wait
84118      78107 /usr/sbin/iptables -t nat -F CNI-DN-9d924da12ab9b7dd58f9a --wait
84119      78108 /usr/sbin/iptables -t nat -S CNI-HOSTPORT-DNAT --wait
84121      78109 /usr/sbin/iptables -t nat -D CNI-HOSTPORT-DNAT -p tcp -m comment --comment dnat name: "kindnet" id: "034ae066e50dbfd10de36326a95ce2ac23276958e2408c02eb5a3c56732b6574" -m multiport --dports 8080 -j
84125      78110 /usr/sbin/iptables -t nat -X CNI-DN-9d924da12ab9b7dd58f9a --wait
84128      78111 /usr/sbin/iptables -t nat -N CNI-SN-9d924da12ab9b7dd58f9a --wait
84130      78112 /usr/sbin/iptables -t nat -S CNI-HOSTPORT-SNAT --wait
84131      78113 /usr/sbin/iptables -t nat -X CNI-SN-9d924da12ab9b7dd58f9a --wait
84132      78114 /usr/sbin/ip6tables -t nat -N CNI-DN-9d924da12ab9b7dd58f9a --wait
84133      78115 /usr/sbin/ip6tables -t nat -S CNI-HOSTPORT-DNAT --wait
84134      78116 /usr/sbin/ip6tables -t nat -X CNI-DN-9d924da12ab9b7dd58f9a --wait
84136      78117 /usr/sbin/ip6tables -t nat -N CNI-SN-9d924da12ab9b7dd58f9a --wait
84137      78118 /usr/sbin/ip6tables -t nat -S CNI-HOSTPORT-SNAT --wait
84138      78119 /usr/sbin/ip6tables -t nat -X CNI-SN-9d924da12ab9b7dd58f9a --wait
84250      78155 /usr/sbin/iptables --version
84252      78156 /usr/sbin/iptables -t nat -S OUTPUT --wait
84253      78157 /usr/sbin/ip6tables --version
84254      78158 /usr/sbin/ip6tables -t nat -S OUTPUT --wait
84255      78159 /usr/sbin/iptables -t nat -N CNI-DN-9d924da12ab9b7dd58f9a --wait
84256      78160 /usr/sbin/iptables -t nat -S CNI-HOSTPORT-DNAT --wait
84257      78161 /usr/sbin/iptables -t nat -X CNI-DN-9d924da12ab9b7dd58f9a --wait
84258      78162 /usr/sbin/iptables -t nat -N CNI-SN-9d924da12ab9b7dd58f9a --wait
84259      78163 /usr/sbin/iptables -t nat -S CNI-HOSTPORT-SNAT --wait
84260      78164 /usr/sbin/iptables -t nat -X CNI-SN-9d924da12ab9b7dd58f9a --wait
84261      78165 /usr/sbin/ip6tables -t nat -N CNI-DN-9d924da12ab9b7dd58f9a --wait
84263      78166 /usr/sbin/ip6tables -t nat -S CNI-HOSTPORT-DNAT --wait
84264      78167 /usr/sbin/ip6tables -t nat -X CNI-DN-9d924da12ab9b7dd58f9a --wait
84265      78168 /usr/sbin/ip6tables -t nat -N CNI-SN-9d924da12ab9b7dd58f9a --wait
84266      78169 /usr/sbin/ip6tables -t nat -S CNI-HOSTPORT-SNAT --wait
84267      78170 /usr/sbin/ip6tables -t nat -X CNI-SN-9d924da12ab9b7dd58f9a --wait
85250      78243 /usr/sbin/iptables --version
85251      78244 /usr/sbin/iptables -t nat -S OUTPUT --wait
85253      78245 /usr/sbin/ip6tables --version
85253      78246 /usr/sbin/ip6tables -t nat -S OUTPUT --wait
85255      78247 /usr/sbin/iptables -t nat -N CNI-DN-9d924da12ab9b7dd58f9a --wait
85257      78248 /usr/sbin/iptables -t nat -S CNI-HOSTPORT-DNAT --wait
85260      78250 /usr/sbin/iptables -t nat -X CNI-DN-9d924da12ab9b7dd58f9a --wait
85262      78251 /usr/sbin/iptables -t nat -N CNI-SN-9d924da12ab9b7dd58f9a --wait
85264      78252 /usr/sbin/iptables -t nat -S CNI-HOSTPORT-SNAT --wait
85265      78253 /usr/sbin/iptables -t nat -X CNI-SN-9d924da12ab9b7dd58f9a --wait
85266      78254 /usr/sbin/ip6tables -t nat -N CNI-DN-9d924da12ab9b7dd58f9a --wait
85267      78255 /usr/sbin/ip6tables -t nat -S CNI-HOSTPORT-DNAT --wait
85268      78256 /usr/sbin/ip6tables -t nat -X CNI-DN-9d924da12ab9b7dd58f9a --wait
85269      78257 /usr/sbin/ip6tables -t nat -N CNI-SN-9d924da12ab9b7dd58f9a --wait
85270      78258 /usr/sbin/ip6tables -t nat -S CNI-HOSTPORT-SNAT --wait
85271      78259 /usr/sbin/ip6tables -t nat -X CNI-SN-9d924da12ab9b7dd58f9a --wait
88303      78290 /usr/sbin/iptables -t nat -S --wait
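
For comparison, a sketch of the same kind of call with a bounded wait (the 5-second value is arbitrary), which fails fast instead of blocking indefinitely when another process holds the lock:

iptables -t nat -S CNI-HOSTPORT-DNAT --wait 5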

@bboreham
Contributor

Great piece of detective work, Antonio! We've now merged your PR into the portmap plugin.

Can I ask: are there any issues filed to reduce the number of times Kubelet (or containerd) calls DEL on the same pod?

@aojea
Member Author

aojea commented Jul 15, 2020

Great piece of detective work, Antonio! We've now merged your PR into the portmap plugin.

Thanks, we rolled out the change in the CI with KIND yesterday and the random network failures stopped, so it seems it is working ok

Can I ask: are there any issues filed to reduce the number of times Kubelet (or containerd) calls DEL on the same pod?

I opened one some time ago: containerd/cri#1352
I think that since the DEL should be idempotent, the runtimes prefer to be safe and run it multiple times to avoid sandbox leakage, but that's my guess

@bboreham
Contributor

That CRI issue seems focused on whether the calls should be concurrent, whereas my point is that they shouldn't need more than one, for a given container.

@aojea
Member Author

aojea commented Jul 15, 2020

That CRI issue seems focused on whether the calls should be concurrent, whereas my point is that they shouldn't need more than one, for a given container.

oops, sorry, none that I'm aware of. I attached some kubelet logs here https://pastebin.com/9d83jMvj that show that TearDown network is called 3 times from the kubelet. I didn't analyze the events that are triggering the TearDowns and causing the configured CRI to send the DEL to the CNI plugin.

@aojea
Member Author

aojea commented Jul 22, 2020

it was very useful 😄

@aojea aojea closed this Jul 22, 2020