kubelet flushes KUBE-HOSTPORTS and KUBE-MARK-MASQ when pods start #32415
Labels: area/kubelet, priority/critical-urgent
Observed: hitting a service's nodePort fails intermittently, but the service's cluster IP works 100% of the time.
Debug: tcpdump shows the SYN being sent, but no SYN-ACK coming back. In the failing case, the source IP was 127.0.0.1, which is clearly wrong. We proved that the KUBE-MARK-MASQ chain was being flushed, so the traffic was not getting SNAT'ed. We also proved that kubelet was doing the flushing, and that kube-proxy eventually restored the chain (yay for rectification loops!).
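To make the failure mode concrete, here is a toy model of the masquerade path, written under loud assumptions: chains are plain Python lists of rule functions, the mark value 0x4000 mirrors kube-proxy's default masquerade bit (14), and the node IP, pod endpoint and helper names such as `simulate_nodeport` are hypothetical. It is not kube-proxy's code; it only shows that once KUBE-MARK-MASQ is empty, a nodePort packet from 127.0.0.1 is never marked, so it keeps 127.0.0.1 as its source and the pod's SYN-ACK has no valid place to go.

```python
# Toy model only: chains are lists of callables that mutate a packet dict.
# MASQ_MARK = 0x4000 assumes kube-proxy's default masquerade bit (14).
MASQ_MARK = 0x4000
NODE_IP = "10.0.0.5"  # hypothetical node address used as the SNAT source


def mark_for_masq(pkt):
    """Stand-in for the single rule in KUBE-MARK-MASQ: set the masquerade mark."""
    pkt["mark"] |= MASQ_MARK


def make_chains():
    """Fresh chain table with a populated KUBE-MARK-MASQ chain."""
    return {"KUBE-MARK-MASQ": [mark_for_masq]}


def run_chain(chains, name, pkt):
    """Apply every rule in the named chain; an empty (flushed) chain does nothing."""
    for rule in chains[name]:
        rule(pkt)


def postrouting(pkt):
    """Stand-in for the POSTROUTING step: masquerade only packets carrying the mark."""
    if pkt["mark"] & MASQ_MARK:
        pkt["src"] = NODE_IP


def simulate_nodeport(chains, src):
    """Hypothetical nodePort path: jump to KUBE-MARK-MASQ, DNAT, then POSTROUTING."""
    pkt = {"src": src, "dst": "127.0.0.1:30080", "mark": 0}
    run_chain(chains, "KUBE-MARK-MASQ", pkt)  # nodePort traffic is marked for SNAT
    pkt["dst"] = "10.244.1.7:8080"            # DNAT to a hypothetical pod endpoint
    postrouting(pkt)
    return pkt


healthy = make_chains()
print(simulate_nodeport(healthy, "127.0.0.1")["src"])  # 10.0.0.5

flushed = make_chains()
flushed["KUBE-MARK-MASQ"].clear()  # models the chain being flushed on pod start
print(simulate_nodeport(flushed, "127.0.0.1")["src"])  # 127.0.0.1
```

The point of the model is that nothing errors out: the packet is still DNAT'ed to the pod, it just leaves with the wrong source address, which matches the "SYN sent, no SYN-ACK" symptom seen in tcpdump.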
We found that the hostport code in kubenet erroneously flushes those chains when it starts a pod. After that, it can take up to several minutes for kube-proxy to run its own sync loop and fix the problem.
The fix is easy - don't flush those chains. @freehan is working on the fix right now.
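One way to express "don't flush those chains" is an ownership rule: a writer may reset only the chains it fully regenerates, and may jump to shared chains without resetting them. The sketch below assumes the rules are applied as iptables-restore input with `--noflush`, where declaring a chain with a `:NAME - [0:0]` line resets that chain's rules. That mechanism, the `build_restore_input` helper and the `KUBE-HP-EXAMPLE` chain name are all assumptions for illustration, not the actual kubenet hostport code or its fix.

```python
# Sketch only. Assumption: rules are applied as iptables-restore input with
# --noflush, where a ":CHAIN - [0:0]" declaration resets that chain's rules.
SHARED_CHAINS = {"KUBE-MARK-MASQ"}  # owned by kube-proxy; must never be reset here


def build_restore_input(owned_chains, rules):
    """Build nat-table restore text that declares only chains this writer owns.

    owned_chains: chains this writer regenerates from scratch on every sync.
    rules: list of (chain, rule_args) tuples; rules may jump to shared chains.
    """
    reset = SHARED_CHAINS & set(owned_chains)
    if reset:
        raise ValueError(f"refusing to declare shared chains: {sorted(reset)}")
    lines = ["*nat"]
    lines += [f":{chain} - [0:0]" for chain in owned_chains]
    lines += [f"-A {chain} {args}" for chain, args in rules]
    lines.append("COMMIT")
    return "\n".join(lines) + "\n"


# Hypothetical per-pod hostport chain that jumps to the shared masquerade chain.
text = build_restore_input(
    ["KUBE-HP-EXAMPLE"],
    [("KUBE-HP-EXAMPLE", "-s 10.244.1.7/32 -j KUBE-MARK-MASQ")],
)
print(text)
```

Referencing KUBE-MARK-MASQ only as a jump target leaves kube-proxy's rule in place, assuming the chain already exists on the node; the guard turns an accidental declaration of a shared chain into a loud error instead of a silent flush that waits for kube-proxy's next sync to be repaired.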
@fabioy @timstclair for 1.3.x
@pwittrock for 1.4.x
@spxtr for reporting it concretely enough to repro