kube-proxy fails to restore iptables rules #82587

Closed
jsafrane opened this issue Sep 11, 2019 · 6 comments · Fixed by #82602

Comments

@jsafrane
Member

commented Sep 11, 2019

What happened:
Running hack/local-up-cluster.sh, kube-proxy never succeeds in calling iptables-restore --noflush --counters:

I0911 14:30:08.104508    1022 proxier.go:1420] Restoring iptables rules: *filter
:KUBE-SERVICES - [0:0]
:KUBE-EXTERNAL-SERVICES - [0:0]
:KUBE-FORWARD - [0:0]
-A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x00004000/0x00004000 -j ACCEPT
COMMIT
*nat
:KUBE-SERVICES - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
:KUBE-SEP-SGYQAO2MFIR4HIEV - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
:KUBE-SEP-SNPTLXDNVSPZ5ND2 - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SEP-7PPXA5JT5ALVQPIV - [0:0]
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x00004000/0x00004000 -j MASQUERADE
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x00004000/0x00004000
-A KUBE-SERVICES -m comment --comment "default/kubernetes:https cluster IP" -m tcp -p tcp -d 10.0.0.1/32 --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SVC-NPX46M4PTMTKRN6Y -j KUBE-SEP-SGYQAO2MFIR4HIEV
-A KUBE-SEP-SGYQAO2MFIR4HIEV -s 192.168.122.150/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-SGYQAO2MFIR4HIEV -m tcp -p tcp -j DNAT --to-destination 192.168.122.150:6443
-A KUBE-SERVICES -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp -p udp -d 10.0.0.10/32 --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SVC-TCOU7JCQXEZGVUNU -j KUBE-SEP-SNPTLXDNVSPZ5ND2
-A KUBE-SEP-SNPTLXDNVSPZ5ND2 -s 172.17.0.2/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-SNPTLXDNVSPZ5ND2 -m udp -p udp -j DNAT --to-destination 172.17.0.2:53
-A KUBE-SERVICES -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp -p tcp -d 10.0.0.10/32 --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SVC-ERIFXISQEP7F7OF4 -j KUBE-SEP-7PPXA5JT5ALVQPIV
-A KUBE-SEP-7PPXA5JT5ALVQPIV -s 172.17.0.2/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-7PPXA5JT5ALVQPIV -m tcp -p tcp -j DNAT --to-destination 172.17.0.2:53
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
COMMIT
I0911 14:30:08.104638    1022 iptables.go:397] running iptables-restore [--noflush --counters]
E0911 14:30:08.106159    1022 proxier.go:1423] Failed to execute iptables-restore: exit status 4 (Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
)

No matter how long I wait, kube-proxy always gets "Another app is currently holding the xtables lock."

Adding -w to the iptables-restore command line (via a hacked version check) fixed the issue for me, but I am not able to judge what else it might break (there must have been a reason for #80368). Until now I was a happy user of local-up-cluster.sh and I'd like to have it working again.
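
To make the workaround concrete, here is a minimal sketch (assumed invocation layout, not kube-proxy's actual code; the restore helper name is made up) of the difference the -w flag makes on this iptables build: without it, iptables-restore exits with status 4 when something else holds the xtables lock; with it, the RHEL 7 build waits for the lock instead.

    package main

    import (
        "bytes"
        "fmt"
        "os/exec"
    )

    // restore feeds a rule dump to iptables-restore with the given extra flags.
    func restore(rules string, extraArgs ...string) error {
        args := append([]string{"--noflush", "--counters"}, extraArgs...)
        cmd := exec.Command("iptables-restore", args...)
        cmd.Stdin = bytes.NewBufferString(rules)
        return cmd.Run()
    }

    func main() {
        rules := "*filter\nCOMMIT\n"
        // Without -w: exits with status 4 if another process holds the xtables lock.
        fmt.Println("plain:  ", restore(rules))
        // With -w: the RHEL 7 iptables-restore waits for the lock instead of failing.
        fmt.Println("with -w:", restore(rules, "-w"))
    }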

Anything else we need to know?:

kernel-3.10.0-957.12.2.el7.x86_64
iptables-1.4.21-28.el7.x86_64
kubernetes v1.16.0-rc.1

/sig network

@danwinship, PTAL
cc @bertinatto

@danwinship

Contributor

commented Sep 11, 2019

Can you test if the iptables-rhel-fix branch of https://github.com/danwinship/kubernetes fixes this for you?

@danwinship

Contributor

commented Sep 11, 2019

So what's going on here: I had thought that if we removed the RHEL 7 special-case code, the existing non-RHEL 7 fallback code would do the right thing (grabbing the xtables lock manually before calling iptables-restore). But the fallback code actually ends up breaking us: the RHEL 7 iptables-restore will also try to grab the xtables lock, which our fallback code already took, so iptables-restore fails.
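
A rough sketch of that collision (assuming the flock-based /run/xtables.lock mechanism; this is an illustration, not the actual kube-proxy fallback implementation): once we hold the lock ourselves, a RHEL 7 iptables-restore invoked without -w sees the lock as taken and bails out with exit status 4 instead of waiting.

    package main

    import (
        "fmt"
        "os"

        "golang.org/x/sys/unix"
    )

    func main() {
        // Take the same lock file the iptables tools use (path assumed).
        f, err := os.OpenFile("/run/xtables.lock", os.O_CREATE, 0600)
        if err != nil {
            panic(err)
        }
        defer f.Close()
        if err := unix.Flock(int(f.Fd()), unix.LOCK_EX|unix.LOCK_NB); err != nil {
            panic(err)
        }
        defer unix.Flock(int(f.Fd()), unix.LOCK_UN)

        // While we hold the lock, a RHEL 7 iptables-restore run without -w from
        // another process fails with "Another app is currently holding the
        // xtables lock" (exit status 4) instead of waiting for us to release it.
        fmt.Println("holding /run/xtables.lock; iptables-restore (no -w) would now fail")
    }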

There is no workaround; 1.16.0 is completely broken on RHEL 7 at the moment. (Note that this has nothing at all to do with the iptables legacy vs nft thing.)

The possible fixes, if we want to fix this for 1.16.0:

  1. Revert #80368. If we want to do this trivially it will require also reverting #78547, which builds on the cleanups from #80368. (And then we fix things correctly in master/1.16.1.)

  2. Add a trivial patch to assume that if you have iptables 1.4.21 then you're on RHEL 7 and we should behave the way we used to:

     func getIPTablesRestoreWaitFlag(version *utilversion.Version) []string {
             if version.AtLeast(WaitRestoreMinVersion) {
                     return []string{WaitString, WaitSecondsValue}
    +        } else if version.String() == "1.4.21" { // HACK for 1.16.0
    +                return []string{WaitString}
             } else {
                     return nil
             }
     }

    Note that 1.4.21 is very old, and also not the last 1.4.x release, so there's little reason anyone would happen to have that exact release unless they were on RHEL 7. But if they did, and we took this approach, it would mean that kube-proxy would often fail to apply an update (though it would succeed eventually); see the sketch below for how the check would resolve.
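
To make the behavior of the proposed check concrete, a self-contained toy (version handling reduced to a string comparison that only holds for the sample inputs below; the real code parses versions with the utilversion package, and both the 1.6.2 threshold and the 5-second wait value are assumptions here):

    package main

    import "fmt"

    // restoreWaitFlag mimics the diff above: ">= 1.6.2" gets "-w 5", the RHEL 7
    // 1.4.21 hack gets a bare "-w", and anything else gets no wait flag at all.
    func restoreWaitFlag(version string) []string {
        switch {
        case version >= "1.6.2": // toy lexical comparison; valid only for the sample versions below
            return []string{"-w", "5"}
        case version == "1.4.21": // HACK for 1.16.0: assume this means RHEL 7's patched iptables
            return []string{"-w"}
        default:
            return nil
        }
    }

    func main() {
        for _, v := range []string{"1.6.2", "1.4.21", "1.4.19"} {
            fmt.Printf("%-7s -> %v\n", v, restoreWaitFlag(v))
        }
    }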

@danwinship

Contributor

commented Sep 11, 2019

"Note that 1.4.21 is very old, and also not the last 1.4.x release, so there's little reason anyone would happen to have that exact release unless they were on RHEL 7"

OK, that's not true. Apparently no one ever shipped 1.4.22. Debian Jessie (aka oldoldstable) has iptables 1.4.21 and is apparently under LTS until 2020. Fedora 21-24 also shipped 1.4.21, but Fedora 24 went end-of-life in 2017. No currently-supported version of Ubuntu ships 1.4.21. (The oldest LTS is xenial/16.04, which has iptables 1.6.0.)

So if we do the minimal hack above, it would make kube-proxy somewhat flaky for people running kubernetes 1.16.0 on Debian Jessie. I don't know if that is a non-empty set of people.

Of course, if we do the small hack for 1.16.0 we can do a larger patch for master and then 1.16.1 (reverting just the objectionable part of #80368).

@thockin

Member

commented Sep 11, 2019

Is there no other signal we can use to disambiguate?

@jsafrane

Member Author

commented Sep 12, 2019

@danwinship, I tested both #82596 and #82602; both work well with my local-up-cluster on CentOS. Not tested with other distros.

@haircommander


commented Sep 12, 2019

We in CRI-O have seen this problem in 1.15 as well.
