
Multiple runs of istio-init container leads to crashloopbackoff status #42792

Closed
RaiAnandKr opened this issue Jan 12, 2023 · 9 comments
Assignees: dhawton
Labels: area/networking, area/test and release, area/user experience, lifecycle/automatically-closed, lifecycle/stale

Comments


RaiAnandKr commented Jan 12, 2023

Bug Description

We are testing Istio in a small test setup before expanding it into our production cluster. With sidecar injection enabled, the istio-init container in a pod completes just fine the first time, sets up all the iptables rules, and the sidecar proxy thereafter works fine too.
However, for reasons that we are "aware" of, the istio-init containers re-run without the pod restarting or anything: we run docker system prune -f as a cron job, which removes the exited istio-init container, and that tips off the kubelet to start the init container again. This situation has been discussed in the past, e.g. in kubernetes/kubernetes#67261.

Note: even though the container and the pod show Init:CrashLoopBackOff status after that, the app container and proxy container keep working fine, but the status is throwing us off.

However, I was under the impression that istio-init runs are idempotent, and a comment from around 3 years ago indicates the same: #18159 (comment)
Is that not the case anymore? We are still running iptables-restore with the --noflush flag, which makes this whole thing idempotent according to that comment.

The first run of the istio-init container in every pod completes just fine, but every run from the second onward fails. Here is the log of the istio-init container from a failed run:

2023-01-12T07:42:54.205490Z	info	Istio iptables environment:
ENVOY_PORT=
INBOUND_CAPTURE_PORT=
ISTIO_INBOUND_INTERCEPTION_MODE=
ISTIO_INBOUND_TPROXY_ROUTE_TABLE=
ISTIO_INBOUND_PORTS=
ISTIO_OUTBOUND_PORTS=
ISTIO_LOCAL_EXCLUDE_PORTS=
ISTIO_EXCLUDE_INTERFACES=
ISTIO_SERVICE_CIDR=
ISTIO_SERVICE_EXCLUDE_CIDR=
ISTIO_META_DNS_CAPTURE=
INVALID_DROP=

2023-01-12T07:42:54.205569Z	info	Istio iptables variables:
PROXY_PORT=15001
PROXY_INBOUND_CAPTURE_PORT=15006
PROXY_TUNNEL_PORT=15008
PROXY_UID=1337
PROXY_GID=1337
INBOUND_INTERCEPTION_MODE=REDIRECT
INBOUND_TPROXY_MARK=1337
INBOUND_TPROXY_ROUTE_TABLE=133
INBOUND_PORTS_INCLUDE=*
INBOUND_PORTS_EXCLUDE=15090,15021,15020
OUTBOUND_OWNER_GROUPS_INCLUDE=*
OUTBOUND_OWNER_GROUPS_EXCLUDE=
OUTBOUND_IP_RANGES_INCLUDE=*
OUTBOUND_IP_RANGES_EXCLUDE=
OUTBOUND_PORTS_INCLUDE=
OUTBOUND_PORTS_EXCLUDE=
KUBE_VIRT_INTERFACES=
ENABLE_INBOUND_IPV6=false
DNS_CAPTURE=false
DROP_INVALID=false
CAPTURE_ALL_DNS=false
DNS_SERVERS=[],[]
OUTPUT_PATH=
NETWORK_NAMESPACE=
CNI_MODE=false
HOST_NSENTER_EXEC=false
EXCLUDE_INTERFACES=

2023-01-12T07:42:54.205993Z	info	Writing following contents to rules file: /tmp/iptables-rules-1673509374205647672.txt397886627
* nat
-N ISTIO_INBOUND
-N ISTIO_REDIRECT
-N ISTIO_IN_REDIRECT
-N ISTIO_OUTPUT
-A ISTIO_INBOUND -p tcp --dport 15008 -j RETURN
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A ISTIO_INBOUND -p tcp --dport 15090 -j RETURN
-A ISTIO_INBOUND -p tcp --dport 15021 -j RETURN
-A ISTIO_INBOUND -p tcp --dport 15020 -j RETURN
-A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_OUTPUT -o lo -s 127.0.0.6/32 -j RETURN
-A ISTIO_OUTPUT -o lo ! -d 127.0.0.1/32 -m owner --uid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -o lo ! -d 127.0.0.1/32 -m owner --gid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -d 127.0.0.1/32 -j RETURN
-A ISTIO_OUTPUT -j ISTIO_REDIRECT
COMMIT
2023-01-12T07:42:54.206062Z	info	Running command: iptables-restore --noflush /tmp/iptables-rules-1673509374205647672.txt397886627
2023-01-12T07:42:54.208964Z	error	Command error output: xtables other problem: line 2 failed
2023-01-12T07:42:54.209009Z	error	Failed to execute: iptables-restore --noflush /tmp/iptables-rules-1673509374205647672.txt397886627, exit status 1

On our side, we can change the docker system prune -f job so that it does not prune istio-init containers, or even get rid of that task entirely, but I believe Kubernetes itself suggests making init containers idempotent:

Because init containers can be restarted, retried, or re-executed, init container code should be idempotent.

from https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#detailed-behavior
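
For completeness, one way we could scope the prune job (assuming Docker is the container runtime, the kubelet labels containers with io.kubernetes.container.name, and the installed Docker version supports label filters on prune) would be:

# Prune everything except objects labeled as istio-init containers.
docker system prune -f --filter "label!=io.kubernetes.container.name=istio-init"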

Can I get some help/direction here?

Version

$ istioctl version
client version: 1.16.1
control plane version: 1.16.1
data plane version: 1.16.1 (9 proxies)

$ kubectl version --short
Client Version: v1.21.1
Server Version: v1.20.13

Additional Information

No response

dhawton (Member) commented Jan 12, 2023

It appears to be erroring when attempting to recreate a chain that already exists. Ideally, you shouldn't be deleting containers externally, which is why the kubelet is rescheduling the init container. There's a problem here because iptables is not declarative: there's no way to say "the rules should look like X" and only change what needs changing... but if we flush and rebuild, we risk breaking existing or new connections while the rules are being rebuilt.
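
For context, the closest iptables itself comes to "only change what needs changing" is a per-rule existence check with -C; roughly (an illustration of that workaround only, not what istio-init does today):

# Create the chain if missing, then append a rule only if it is not already present.
iptables -t nat -N ISTIO_INBOUND 2>/dev/null || true
iptables -t nat -C ISTIO_INBOUND -p tcp --dport 15008 -j RETURN 2>/dev/null || \
  iptables -t nat -A ISTIO_INBOUND -p tcp --dport 15008 -j RETURN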

howardjohn (Member) commented:

Definitely cannot flush them. If someone has a simple proposal to make it work without flushing, I would be open to it. Maybe split "create chain" and "apply rules".

logic:

_ = CreateChains() // ignore error; second run would fail
return ApplyRules()

?

What happens when it fails? Do we apply no rules, all of the valid rules, or only the valid rules before the error?
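
To make the proposed split concrete in iptables terms, a rough sketch (not Istio's actual code; the append-only rules file name here is made up):

# Step 1: create the chains, ignoring "chain already exists" errors.
for chain in ISTIO_INBOUND ISTIO_REDIRECT ISTIO_IN_REDIRECT ISTIO_OUTPUT; do
  iptables -t nat -N "$chain" 2>/dev/null || true
done
# Step 2: apply only the -A rules (no -N lines) with --noflush so existing chains are kept.
# Caveat: a re-run still appends duplicate copies of every rule, so this alone is not fully idempotent.
iptables-restore --noflush /tmp/istio-append-only-rules.txt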

dhawton (Member) commented Jan 12, 2023

I believe at present it hits the failure to create the chains and stops. There's a config option to not use the restore method, but the other path relies on RunOrFail, which would likewise stop applying rules as soon as a command fails. I can take a look at this; I don't think there's an easy way around it for the restore method, but RunOrFail should be doable.

dhawton self-assigned this on Jan 12, 2023
howardjohn (Member) commented Jan 12, 2023 via email

RaiAnandKr (Author) commented:

you shouldn't be deleting containers externally, which is why kubelet is rescheduling the init-container.

That's true. We already have a task in the pipeline to stop doing that. Irrespective of that, I wanted to kick off a discussion about whether the init container is idempotent and, if not, whether it's straightforward to make it so.

but if we flush and rebuild, we risk breaking existing or new connections while the rules are being rebuilt.

Oh, we definitely shouldn't be doing that. In fact, #16768 moved us away from that deletion + re-creation combination as part of the init run, so we wouldn't want to go back there.

There was a proposal at #18159 to add a step that checks whether the iptables rules are already present and, if they are, just exits. I like option #3 proposed there, and @howardjohn, you seem to have been part of that conversation too. We backed out of the change, but if the reason for backing out no longer applies (I am not sure what it was), can we revive the same solution? (I am not sure what has changed with istio-init and the CNI plugin since then, and whether the same solution would still help.)
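
Something like this check-and-exit guard is what I have in mind (a sketch only, assuming REDIRECT mode, where the ISTIO_OUTPUT chain in the nat table is a reliable marker of a previous successful run):

# If the Istio chains are already in place, treat the run as a no-op.
if iptables -t nat -S ISTIO_OUTPUT >/dev/null 2>&1; then
  echo "Istio iptables rules already present; nothing to do"
  exit 0
fi
# ...otherwise fall through to the normal chain creation and rule application.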

howardjohn (Member) commented Jan 13, 2023 via email

RaiAnandKr (Author) commented Jan 30, 2023

@dhawton have you already decided how we want to tackle this (assuming we want to tackle it)? Let me know if you need my bandwidth for any implementation or testing work.

istio-policy-bot added the lifecycle/stale label on Apr 14, 2023
istio-policy-bot commented:

🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2023-01-13. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.

Created by the issue and PR lifecycle manager.

istio-policy-bot added the lifecycle/automatically-closed label on Apr 29, 2023
davidxia commented Jun 21, 2024

I also ran into this issue.

$ istioctl version
client version: 1.17.1
control plane version: 1.17.1
data plane version: 1.17.1 (352 proxies)

$ kubectl version
Client Version: v1.29.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.4-gke.1447000
