[WIP][TEST] debug kube-proxy iptables lock #92811
Conversation
@aojea: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/test pull-kubernetes-e2e-kind-ipv6
ok, one interesting failure here
The problem is that kube-proxy does not install the nodeport rules; instead it is rejecting the traffic to the nodeport.
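As a hedged sketch of how one can check this on a node: kube-proxy programs nodeports in the KUBE-NODEPORTS chain of the nat table, and adds REJECT rules for services without endpoints in the filter table (KUBE-SERVICES / KUBE-EXTERNAL-SERVICES, depending on the version); the node port 30080 below is illustrative:
# Look for the DNAT rule for the nodeport and for any REJECT rule hitting it.
iptables -t nat -S KUBE-NODEPORTS | grep 30080
iptables -t filter -S KUBE-EXTERNAL-SERVICES | grep -i reject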
@BenTheElder FYI, I think that in kind we are being too aggressive with iptables and this is causing contention when there is a large number of changes; that may explain the iptables lock issue. Diff between the kube-proxy GCE config
and the KIND config
/retest
/test pull-kubernetes-e2e-kind-ipv6
/test pull-kubernetes-e2e-kind-ipv6
Currently kube-proxy defaults the min-sync-period for iptables to 0. However, as explained by Dan Winship, "With minSyncPeriod: 0, you run iptables-restore 100 times. With minSyncPeriod: 1s, you run iptables-restore once. With minSyncPeriod: 10s, you also run iptables-restore once, but you might have to wait 10 seconds first".
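A hedged sketch of what changing that looks like on a kubeadm-based cluster such as kind; the ConfigMap name, the k8s-app=kube-proxy label, and the 1s value are assumptions, not something prescribed in this thread:
# Edit the KubeProxyConfiguration stored in the kube-proxy ConfigMap and set,
# under the iptables section:
#   iptables:
#     minSyncPeriod: 1s
kubectl -n kube-system edit configmap kube-proxy
# Restart the kube-proxy pods so they pick up the new value.
kubectl -n kube-system delete pod -l k8s-app=kube-proxy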
The iptables monitor was using iptables -L to list the chains, without the -n option, so it was trying to do reverse DNS lookups. A side effect is that it was holding the lock, so other components could not use it. We can use -S instead of -L -n to avoid this, since we only want to check that the chain exists. Also, avoid holding the lock in this operation, because we can fail fast and retry later.
root@kind-control-plane:/# time iptables -t nat -L > /dev/null
real	0m5.154s
user	0m0.004s
sys	0m0.005s
root@kind-control-plane:/# time iptables -t nat -L -n > /dev/null
real	0m0.002s
user	0m0.000s
sys	0m0.002s
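A minimal sketch of the lock-friendlier check described above; the canary chain name is an assumption used only for illustration:
# -S prints rules in iptables-save syntax and never resolves addresses, so it is
# enough to verify that a chain still exists. Without -w the command fails fast
# when another process holds the xtables lock, and the monitor can retry on its
# next tick instead of blocking everyone else.
if iptables -t nat -S KUBE-PROXY-CANARY >/dev/null 2>&1; then
    echo "canary chain present"
else
    echo "chain missing or lock busy, retry later"
fi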
/test pull-kubernetes-e2e-kind-ipv6
/test pull-kubernetes-e2e-kind
/test pull-kubernetes-e2e-kind
/test pull-kubernetes-e2e-kind-ipv6
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: aojea The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@aojea: The following tests failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@BenTheElder and the winner holding the lock is .... the portmap plugin
@aojea: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
More findings about this: it seems that when a pod is deleted, the portmap plugin is executed 3 times, despite the pod not having any port mappings configured. If we add to that the fact that it executes both iptables and ip6tables, it means we hold the lock 6 times. How to reproduce:
docker exec -it kind-control-plane cat /opt/cni/bin/portmap
#!/bin/sh
# Wrapper around the real portmap binary (renamed to portmap.bin) that logs
# every invocation: the time, the CNI command and identifiers passed in the
# environment, and the JSON configuration received on stdin.
echo $(date +"%T") >> /tmp/debug_portmap
echo "CNI: $CNI_COMMAND $CNI_IFNAME $CNI_NETNS $CNI_CONTAINERID" >> /tmp/debug_portmap
# Copy stdin to the debug file while still feeding it to the real plugin.
tee -a /tmp/debug_portmap | /opt/cni/bin/portmap.bin
echo "\n---------------" >> /tmp/debug_portmap
If we configure a hostPort, we can observe that it is executed one more time, with a delay (a minimal example pod is sketched below).
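A hedged sketch of such a pod; the image and port values are assumptions, not taken from this thread:
# A pod with a hostPort, so the portmap plugin has a real mapping to program.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: hostport-test
spec:
  containers:
  - name: agnhost
    image: k8s.gcr.io/e2e-test-images/agnhost:2.21
    ports:
    - containerPort: 8080
      hostPort: 8080
EOF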
so, it seems containerd tears down the network multiple times
full log here if we log the number of iptables calls issued using a tool like
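For illustration only (this is not the tool referenced above), one crude way to count iptables invocations on a node is to wrap the binary, much like the portmap wrapper; the paths are assumptions:
#!/bin/sh
# Hypothetical wrapper placed at /usr/sbin/iptables after moving the real
# binary to /usr/sbin/iptables.real: log one line per call, then pass through.
echo "$(date +%T) iptables $*" >> /tmp/iptables_calls
exec /usr/sbin/iptables.real "$@"
Counting the calls is then just wc -l /tmp/iptables_calls.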
Great piece of detective work, Antonio! We've now merged your PR into the portmap plugin. Can I ask: are there any issues filed to reduce the number of times Kubelet (or containerd) calls DEL on the same pod?
Thanks, we rolled out the change in the CI with KIND yesterday and the random network failures stopped, so it seems it is working ok
I opened one some time ago: containerd/cri#1352
That CRI issue seems focused on whether the calls should be concurrent, whereas my point is that they shouldn't need more than one, for a given container.
yeah, sorry,
oops, sorry, none that I'm aware of. I attached some kubelet logs here https://pastebin.com/9d83jMvj that show that
it was very useful 😄