Avoid iptables lock error for endpoint port mapping #44330
On Ubuntu Focal the iptables implementation returns immediately with an error if the xtables lock is held. This is a change in behavior that has exposed a latent issue in Docker's implementation.

After moving to Focal hosts in our CI environment, contention on the iptables xtables lock is leading to the following error message (these failures are easy to trigger and frequent):

```
Error response from daemon: driver failed programming external connectivity on endpoint foo (4ee313e3bf3f375e70c3ee5d00b9000523cb80e5fb29b9ccff565913c74ab6ec): (iptables failed: iptables -t nat -A POSTROUTING -p tcp -s 172.16.128.7 -d 172.16.128.7 --dport 8081 -j MASQUERADE: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
```

The source of that error message is `endpoint.sbJoin` in libnetwork/endpoint.go. It calls:

```
endpoint.sbJoin
driver.ProgramExternalConnectivity (libnetwork/drivers/bridge/bridge.go)
bridgeNetwork.allocatePorts
bridgeNetwork.allocatePortsInternal
bridgeNetwork.allocatePort
PortMapper.MapRange
PortMapper.AppendForwardingTableEntry
PortMapper.Forward
ChainInfo.Forward
iptable.ProgramRule
iptable.Exist
iptable.exists
iptable.existsRaw
```

`iptable.raw` calls iptables with `--wait` if supported, but `iptable.existsRaw` does not. The invocation of iptables in `existsRaw` has not changed since 2017, but the OS behavior has. In Ubuntu 20.04 (Focal) the call to `iptables -S` fails immediately if the lock is held by another process; this is not the case in earlier (Bionic) or later (Jammy) releases.

This patch adds the `--wait` option to this invocation of iptables (or, if that is not supported, uses the `bestEffortLock`) when listing existing rules.

Signed-off-by: Robert C Jennings <rcj4747@gmail.com>

Fixes: #44331
Local testing shows that this fix for #44331 is incomplete. With the added Looking again at the error message I see that this is coming from
The code to initialize I will test this on Monday in our production environment to confirm my suspicion.
Thanks for your PR! I had a quick glance, and changes in the PR looked good; I only wanted to double-check if the For the iptables init; ISTR there was another PR touching that part. Let me see if I can find that one (and why it wasn't merged yet, or perhaps it was, and it made another change).
Ah! I think this was the one; #43060 It's been a while since I looked at that one, so I'm not sure if there were still remaining changes to be addressed (reviews are always welcome; more eyes never hurt!)
@rcj4747 looks like your last two commits are missing a DCO sign-off (which makes CI fail); could you amend those commits? (let me know if you need help with "rebase" / "amend" instructions 👍 )
The check for the optional ip6tables executable short-circuits other initialization using the iptables command. This patch allows the other initialization to occur even when ip6tables is not found. Signed-off-by: Robert C Jennings <rcj4747@gmail.com>
ip6tables is treated as an optional executable, unlike the iptables command, but there are no guards against calling exec.Command with an empty string for the executable name if an IPv6 rule is provided. This patch adds checks where ip6tablesPath is referenced to allow for more graceful and helpful failure. Signed-off-by: Robert C Jennings <rcj4747@gmail.com>
@thaJeztah I fixed the commits up with sign-offs but I had pushed them initially before I saw your comment about #43060. That PR would also seemingly address the issue handled in these last 2 commits. How would you like to proceed?
I did my best to search for this as well and I did not find any chances for deadlock. It's taken just before the
No worries! Let's keep them in for now (we can always drop them if we decide to go for the other PR). Let me try to get more eyes on both to see what direction we'll go. Might not be today, but I'll try to get back to this (in case I don't; don't hesitate to give me a
@thaJeztah, I like the direction #43060 is taking and I'd like to drop the 2 patches I added that overlap with that PR. We're seeing iptables lock contention when running
(close/reopen to re-trigger (and re-load) github actions)
Hey @rcj4747, as you discovered the real issue lies in Docker early exiting As mentioned in my PR (#43060), all distributions supported by Docker are now shipping iptables with Same applies to your PR: since you're fixing ip6tables detection in your PR, Docker doesn't enter
Fixes: #44331 Run failure due to iptables xtables lock contention
- What I did
Addressed the iptables lock contention
- How I did it
Added the same handling for `iptable.existsRaw` as is found in `iptable.raw`; specifically, this patch adds the `--wait` option to this invocation of iptables (or, if that is not supported, the `bestEffortLock` is used) when listing existing rules.
- How to verify it
On Ubuntu 20.04 (Focal) you can explicitly take the lock with `flock /run/xtables.lock sleep 60&` and run a container with port mappings. (edit: running flock will block the other iptables calls which wait for the lock; you need contention and timing to recreate the failure)
- Description for the changelog
Wait for iptables lock when listing existing rules with bridge networks
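Since reproducing the failure depends on contention and timing, it can help to scan CI or daemon output for the lock error when verifying. The sketch below assumes the failure is recognizable by the substring quoted in the error message above; `isXtablesLockError` and `xtablesLockMsg` are hypothetical names, not part of Docker.

```go
package main

import (
	"fmt"
	"strings"
)

// xtablesLockMsg is the fragment of the iptables error quoted in this PR.
// Matching on it is an assumption made for this sketch.
const xtablesLockMsg = "holding the xtables lock"

// isXtablesLockError (hypothetical helper) reports whether the given
// command output looks like an xtables lock contention failure.
func isXtablesLockError(output string) bool {
	return strings.Contains(output, xtablesLockMsg)
}

func main() {
	out := "Another app is currently holding the xtables lock. Perhaps you want to use the -w option?"
	fmt.Println(isXtablesLockError(out))
}
```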