
update images to Debian buster, detect iptables mode #82966

Merged
merged 6 commits on Nov 16, 2019

Conversation

danwinship
Contributor

@danwinship danwinship commented Sep 21, 2019

What this PR does / why we need it:
On systems running iptables 1.8, containers that run iptables in the root network namespace (such as our kube-proxy image) need to run iptables in the same mode (legacy / nft) as the base system. As of 1.16 our answer was "you have to run the base system in legacy mode"; this PR lets us support either mode (and provides a template for other image creators to follow, although we'll want to provide some slightly more generic scripts for them eventually).

Which issue(s) this PR fixes:
Fixes #71305
Fixes #81729

Does this PR introduce a user-facing change?:

The official kube-proxy image (used by kubeadm, among other things) is now
compatible with systems running iptables 1.8 in "nft" mode, and will autodetect
which mode it should use.

/kind feature
/sig network
/priority important-soon

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. sig/network Categorizes an issue or PR as relevant to SIG Network. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 21, 2019
@danwinship
Contributor Author

/assign @thockin

I started with your branch, and reverted the changes to the debian-iptables build; we don't need the nftables package, because iptables includes both iptables-legacy and iptables-nft.

My original plan had been to install a script to detect the mode, which we could run before starting kube-proxy, but the process for building the kube-proxy (etc.) container images isn't really set up to deal with having a wrapper/init script...

So then I realized that if I made the iptables binaries themselves be the detect-and-run-update-alternatives script, then we wouldn't need to modify the kube-proxy image at all (and in fact, any image based on the debian-iptables image would automatically work as expected). So that's the current version. I tested the debian-iptables image in isolation but not as part of kube-proxy yet because I was having trouble building the release images.

danw@p50:debian-iptables (iptables-nft)> sudo docker run --network=host --privileged -it k8s.gcr.io/debian-iptables-amd64:v12.0.0 /bin/sh
# ls -l /usr/sbin/iptables /etc/alternatives/iptables
lrwxrwxrwx 1 root root 26 Sep 21 13:33 /usr/sbin/iptables -> /etc/alternatives/iptables
lrwxrwxrwx 1 root root 26 Sep 21 14:54 /etc/alternatives/iptables -> /usr/sbin/iptables-wrapper
# iptables -C INPUT -m comment --comment "test" -j ACCEPT
iptables: Bad rule (does a matching rule exist in that chain?).
# iptables -A INPUT -m comment --comment "test" -j ACCEPT
# ls -l /etc/alternatives/iptables
lrwxrwxrwx 1 root root 25 Sep 21 15:07 /etc/alternatives/iptables -> /usr/sbin/iptables-legacy
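
For anyone adapting this to their own images, a minimal sketch of the detect-and-hand-off idea looks roughly like the following (an illustration rather than the exact script merged here; the comparison heuristic and binary paths are assumptions based on what Debian buster ships):

#!/bin/sh
# Count the entries each backend reports; whichever backend the host has
# already populated (e.g. by kubelet) is the one this container should use.
num_legacy=$( (iptables-legacy-save || true; ip6tables-legacy-save || true) 2>/dev/null | wc -l)
num_nft=$( (iptables-nft-save || true; ip6tables-nft-save || true) 2>/dev/null | wc -l)

mode=legacy
if [ "${num_nft}" -gt "${num_legacy}" ]; then
    mode=nft
fi

# Flip the alternatives link so later iptables calls bypass this wrapper,
# then hand the current invocation off to the real binary.
update-alternatives --set iptables "/usr/sbin/iptables-${mode}" > /dev/null
exec "/usr/sbin/iptables-${mode}" "$@"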

@praseodym
Contributor

Will this be affected by the iptables 1.8.2 bug described in #82361?

@danwinship
Contributor Author

kube-proxy doesn't use iptables -C, so it's not affected by the bug from #82361. So if you have a base system with iptables 1.8.3, you could safely use this iptables 1.8.2-based kube-proxy image on it in nft mode. But if other people are using our debian-iptables base image for other things, they might run into the problem. But hopefully that should all be fixed in debian by the time kube 1.17 goes out.

Member

@BenTheElder BenTheElder left a comment


clever approach :-)

ncurses-base \
ncurses-bin \
systemd \
systemd-sysv \
sysv-rc \
Member


changes in this file seem unrelated?

Contributor Author


That's from Tim's commit changing the base from stretch to buster; the exact set of packages that gets installed by default is different and some of the ones that were getting purged from the image before either don't exist or can't be removed now. (Though there are probably additional packages that could be added to the purge list now.)

Member


Exactly. My POC commit was hack-and-slack to get something that builds. I am not an apt expert and I don't really know where this list came from, so SOMEONE would have to do it "right".

Member


Have we identified anyone who ACTUALLY knows how to PROPERLY reproduce this process?

@tallclair ?

Member


The original process for coming up with this list was going through all the installed dependencies (apt list --installed) and trying to remove anything that didn't seem important (for a container, e.g. no need for an init system).
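
A rough way to redo that enumeration on a buster image (just one heuristic for spotting purge candidates, not the documented process) would be something like:

# List installed packages, largest first, to spot candidates for the purge list
dpkg-query -W -f='${Installed-Size}\t${Package}\n' | sort -rn | head -n 30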

The original motivation for this was twofold:

  1. reduce the size of the image, though I now think this is somewhat irrelevant as the base layer is shared among containers that use it, and the application layer tends to dwarf the base layer (at least in the fluentd case).
  2. reduce the vulnerability scanner noise. This is the more significant motivation (though I may be biased). Fewer deps == fewer times we need to bump the image to pick up a vulnerability fix in an irrelevant dependency.

The tradeoff is that by removing base dependencies, we're more likely to hit untested corner cases, and it's harder to update the image (as we see here).

/cc @rphillips - did the last major version bump

Member


for reference, this is the PR with the last major version bump
#52744

Member

@justinsb justinsb Nov 18, 2019


We could also build the image up (i.e. adding packages to a scratch image), using bazel. The downside is that we can't run the postinstall scripts, so e.g. if we need special users we have to create them separately. But ... it's very doable if we have a list of the packages we rely on.

@aojea
Member

aojea commented Sep 29, 2019

I think this fixes #81729 too

@BenTheElder
Member

this looks good to me, happy to help push the image(s) when this is ready - feel free to shoot me a ping

@@ -33,22 +33,22 @@ SUDO=$(if $(filter 0,$(shell id -u)),,sudo)
export DOCKER_CLI_EXPERIMENTAL := enabled

ifeq ($(ARCH),amd64)
-BASEIMAGE?=debian:stretch
+BASEIMAGE?=debian:buster
Member


seems there is a slim flavour of buster in the docker registry at about half the size (buster-slim: 25.84 MB vs buster: 48.05 MB) https://hub.docker.com/_/debian/?tab=tags&page=1&name=buster
is it worth trying it?
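
A hypothetical way to try it without editing anything, assuming BASEIMAGE can be overridden on the make command line (and that the usual build target applies):

# Build the amd64 debian-iptables image against the slim variant for comparison
make ARCH=amd64 BASEIMAGE=debian:buster-slim build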

Contributor Author


Possibly... This is sort of the same issue as above with the apt-get purge. People can submit follow-up commits to slim things down further if we don't get it perfect here.

Member


yes. I don't know who did the debian-base work before, but we probably need community ownership and docs on how to repeat the process for new debian releases

Member


@tallclair was it you?


Member


Looks like the slim version is mostly redundant with the directories we remove, but I think moving to that base can't hurt.

I agree we need a better process for this base image. I'm tempted to say that we should get out of the image maintenance game and just scrap this for buster-slim, but I worry about how much noise we'll end up with from the vulnerability scanners.

That would be somewhat mitigated if we do a full rebuild from the latest upstream base on every release. E.g. rebuild the intermediate images (debian-iptables) as part of the release process.

Member

@aojea aojea Oct 27, 2019


Sorry if this is orthogonal or was previously discussed, but given the image size and security concerns, why use debian instead of alpine? See for example #84420.

@danwinship
Contributor Author

this looks good to me, happy to help push the image(s) when this is ready - feel free to shoot me a ping

OK. I'm not totally sure what needs to be done here; it seems like pull-kubernetes-cross won't even pass completely until the images already exist? If you know what needs to be done there though, and you're happy with the general build changes, and @thockin is happy with the iptables-specific changes (which he hasn't commented on yet), then I think this is ready to go?

@danwinship danwinship changed the title WIP update images to Debian buster, detect iptables mode update images to Debian buster, detect iptables mode Sep 30, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 30, 2019
@danwinship
Contributor Author

But if other people are using our debian-iptables base image for other things, they might run into the problem. But hopefully that should all be fixed in debian by the time kube 1.17 goes out.

Comments in #82361 suggest that iptables 1.8.3 might not actually be backported to buster, but that it's considered legitimate to pull single packages from buster-backports, which is just a small change:

+# Install latest iptables package from buster-backports
+RUN echo deb http://deb.debian.org/debian buster-backports main >> /etc/apt/sources.list; \
+    apt-get update; \
+    apt-get -t buster-backports -y --no-install-recommends install iptables

@@ -110,7 +110,7 @@ def debian_image_dependencies():
digest = _digest(_DEBIAN_BASE_DIGEST, arch),
registry = "k8s.gcr.io",
repository = "debian-base",
tag = "0.4.1", # ignored, but kept here for documentation
tag = "1.0.0", # ignored, but kept here for documentation
Member


i'm pretty sure the shas above need to be updated, but that is pointless until we push the images anyhow

Member


(lines 91-96, not sure how we are populating these exactly)

Member


the debian base and debian-iptables digests maps above need to be updated with the values in
#82966 (comment)
#82966 (comment)
these are in the same order except the overall manifest is at the bottom.
amd64, arm, arm64, ppc64le, s390x. the digest is the hash at the end of the line after "digest" 🙃

also, this is v2.0.0 now, right? (even if it is ignored)

@BenTheElder
Member

when the base image changes are approved, @thockin or I (or another googler) can push & promote these and then we need to update the references to match them.

in the future image promotion & pushing will run under CNCF infra and build will generally be automated, but that switch hasn't been flipped yet..

# Detect whether the base system is using iptables-legacy or
# iptables-nft. This assumes that some non-containerized process (eg
# kubelet) has already created some iptables rules.
num_legacy_lines=$( (iptables-legacy-save || true; ip6tables-legacy-save || true) 2>/dev/null | wc -l)
Member


What about systems where ip6 isn't enabled? Will this error?

Contributor Author


yes; that's what the "|| true" is for.
I'm continuing to work on https://github.com/danwinship/iptables-wrappers/ and one of the iptables hackers suggested we should add " | grep -v "^\([*:#]\|COMMIT\)"" here too so we're only comparing actual rules and don't get tricked if there are just lots of tables. (Although I think that's probably not really a problem since we know kubelet will create like 8 rules or so before this script ever runs, so even if all the default tables exist in the "wrong" version, we should still be OK.)
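
Applied to the line above, the suggested filter would look something like this (a sketch of the proposal, not a change that's in this PR):

# Ignore table headers (*filter), chain declarations (:INPUT ...), comments,
# and COMMIT lines, so that only actual rules are counted.
num_legacy_lines=$( (iptables-legacy-save || true; ip6tables-legacy-save || true) 2>/dev/null | grep -v "^\([*:#]\|COMMIT\)" | wc -l)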

Member


any reason not to check for actual rules?

Member

@thockin thockin left a comment


My original PR edited the debian-iptables/Dockerfile to include nftables - is that not needed?

mode=nft
fi

update-alternatives --set iptables "/usr/sbin/iptables-${mode}" > /dev/null
Member


Is there a reason to update-alternatives vs just execing the correct mode binary directly?

Contributor Author


It moves the wrapper script out of the way so that all future iptables calls will go directly to the correct binary. (We don't want kube-proxy to have to run "iptables-save" an extra time to figure out the correct mode every time it pushes an update.)
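
For comparison, a wrapper that skipped update-alternatives and just exec'd the detected binary (a hypothetical variant, not what this PR does) would repeat the detection on every invocation:

#!/bin/sh
# Hypothetical exec-only variant: every iptables call re-runs the *-save
# detection before the real command executes, which is exactly the extra
# cost the update-alternatives switch avoids.
num_legacy=$( (iptables-legacy-save || true) 2>/dev/null | wc -l)
num_nft=$( (iptables-nft-save || true) 2>/dev/null | wc -l)
mode=legacy
if [ "${num_nft}" -gt "${num_legacy}" ]; then
    mode=nft
fi
exec "/usr/sbin/iptables-${mode}" "$@"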

@danwinship
Contributor Author

My original PR edited the debian-iptables/Dockerfile to include nftables - is that not needed?

It's not needed. The nftables package contains the non-iptables-API-emulating nft binary, but iptables-nft doesn't use that.

@danwinship danwinship force-pushed the iptables-nft branch 2 times, most recently from d9986cc to b1d283b Compare October 17, 2019 11:56
mskrocki added a commit to mskrocki/cilium that referenced this pull request Apr 29, 2020
This feature is required by any OS that enables nft-based iptables,
e.g. Debian >= 10 and CentOS 8.1, which ship iptables >= 1.8.

This solution is based on kube-proxy fix to the same issue:
kubernetes/kubernetes#82966

Bumped cilium-runtime base image to Ubuntu 20.04 that has iptables 1.8.4
Removed libgcc-5-dev which is not available in 20.04.
Added iptables-wrapper that handles iptables mode detection.

Signed-off-by: Maciej Skrocki <maciejskrocki@google.com>
mskrocki added a commit to mskrocki/cilium that referenced this pull request Apr 30, 2020
aanm pushed a commit to cilium/cilium that referenced this pull request May 5, 2020
champtar added a commit to champtar/dns that referenced this pull request Sep 15, 2020
The debian-iptables container transparently selects iptables-legacy or iptables-nft since v12.0.0:
kubernetes/kubernetes#82966

Signed-off-by: Etienne Champetier <etienne.champetier@anevia.com>
sayboras added a commit to sayboras/cilium that referenced this pull request May 31, 2022
The original iptables-wrapper script came from [1]; however, it was
spun off to [2] in the k8s upstream repo. This commit picks up the
latest iptables-wrapper script.

[1]: kubernetes/kubernetes#82966
[2]: https://github.com/kubernetes-sigs/iptables-wrappers/blob/master/iptables-wrapper-installer.sh
Signed-off-by: Tam Mach <tam.mach@cilium.io>
jrfastab added a commit to cilium/cilium that referenced this pull request Jun 9, 2022
Cilium currently chooses between iptables-legacy and iptables-nft using an
iptables-wrapper script. The script does a simple check: if there are more
than 10 rules in iptables-legacy it picks legacy mode; otherwise it picks
whichever of nft or legacy has more rules. See [1] for the original wrapper
this is taken from.

This however can be problematic in some cases. We've hit an environment
where arguably broken pods insert rules directly into iptables without
checking legacy vs nft. This can happen, for example, with older pods
that use an iptables package before 1.8.4 that was buggy or missing nft
altogether. When this happens it becomes a race to see which pods come
online first and insert rules into the table, and if the count exceeds 10
Cilium will flip into legacy mode. This becomes painfully obvious if the
agent is restarted after the system has been running and these buggy pods
have already created their rules. At this point Cilium may be using legacy
while kube-proxy and kubelet are running in nft space (more on why this is
bad below).

We can quickly check this from a sysdump with a few one liners,

$ find . -name iptables-nft-save* | xargs wc -l
  1495 ./cilium-bugtool-cilium-1234/cmd/iptables-nft-save--c.md
$ find . -name iptables-save* | xargs wc -l
  109  ./cilium-bugtool-cilium-1234/cmd/iptables-save--c.md

here we see that a single node has a significant amount of rules in both
nft and legacy tables. In the above example we dove into the legacy
table and found the normal CILIUM-* chains and rules. Then in the nft
tables we see the standard KUBE-PROXY-* chains and rules.

Another scenario where we can create a similar problem is with an old
kube-proxy. In this hypothetical scenario the user upgrades to a new
distribution/kernel with a base iptables image that points to iptables-nft.
This will cause kubelet to use nft tables, but because of the older
version of kube-proxy it may use iptables. Now kubelet and kube-proxy
are out of sync. Now how should Cilium pick nft or legacy?

Let's analyze the two scenarios. Assume Cilium and kube-proxy pick
differently. First we might ask which runs first, nft or iptables. From
the kernel side it's unclear to me. The hooks are run by walking an array,
but it appears those hooks are registered at runtime, so it's up to which
hooks register first. Hooks register at module init, so we're left
wondering which of nft or legacy registers first. This may well depend on
whether iptables-legacy or iptables-nft runs first, because the module
init happens on demand via a request_module helper. Bottom line: the
ordering is fragile at best. For this discussion let's assume we can't
make any claims about whether nft or iptables runs first.

Next, let's assume kube-proxy is in nft, Cilium is in legacy, and nft
runs first. This breaks Cilium's expectation that its rules run before
kube-proxy's and any other iptables rules. The result can be drops in the
datapath. The example that led us on this adventure was IPSEC traffic
hitting a kube-proxy -j DROP rule because it never reached the Cilium
-j ACCEPT rule we expected to be inserted at the front of the chain. So
clearly this is no good.

Just to cover our cases, consider Cilium running first and then
kube-proxy. We are still stuck: on the kernel side the hooks are
executed in a for loop, and an ACCEPT runs the next hook instead of
the normal "accept the skb and run no further rules". The next hook
in this case will have the kube-proxy rules and we hit the same
-j DROP rule again.

Finally, because we can't depend on the order of nft vs legacy
running, it doesn't matter if Cilium and kube-proxy flip to put
Cilium on nft and kube-proxy on legacy; we get the same problem.

Because Cilium and kube-proxy are coupled in that they both
manage iptables for datapath flows they need to be on the same
hook. We could try to do this by doing [2] and following kubelet AND
assuming kube-proxy does the same everything should be OK. The
problem is if kube-proxy is not updated and doesn't follow
kubelet we again get stuck with Cilium and kube-proxy using different
hooks. To fix this case modify [2] so that Cilium follows kube-proxy
instead of following kubelet. This will force cilium and kube-proxy
to at least choose the same hook and avoid the faults outlined
above. There is a corner case if kube-proxy is not up before
cilium, but experimentally it seems kube-proxy is started close
to kubelet and init paths so is in fact up before cilium making
this ok. If we ever need to verify this in sysdump we can check
startAt times in the k8s-pod.yaml to confirm the start ordering
of pods.

For reference, the original iptables-wrapper script that Cilium used
prior to this patch came from [1]. This patch is based on the new
wrapper [2] in the k8s upstream repo.

[1]: kubernetes/kubernetes#82966
[2]: https://github.com/kubernetes-sigs/iptables-wrappers/blob/master/iptables-wrapper-installer.sh

Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
jrfastab added a commit to cilium/cilium that referenced this pull request Jun 9, 2022
joestringer pushed a commit to cilium/cilium that referenced this pull request Jun 9, 2022
qmonnet pushed a commit to cilium/cilium that referenced this pull request Jun 30, 2022
qmonnet pushed a commit to cilium/cilium that referenced this pull request Jul 5, 2022
joestringer pushed a commit to cilium/cilium that referenced this pull request Jul 6, 2022
aanm pushed a commit to cilium/cilium that referenced this pull request Jul 8, 2022
joestringer pushed a commit to cilium/cilium that referenced this pull request Jul 12, 2022
tengattack pushed a commit to tengattack/kubernetes that referenced this pull request Nov 22, 2023
update images to Debian buster, detect iptables mode
Conflicts:
	build/common.sh
	build/workspace.bzl
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/network Categorizes an issue or PR as relevant to SIG Network. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

kube-proxy fails to delete nat entries with IPv6
kube-proxy currently incompatible with iptables >= 1.8
10 participants