Ambient: Race condition external network unreachable #51193

Closed
2 tasks done
jbmolle opened this issue May 22, 2024 · 4 comments · Fixed by istio/ztunnel#1092
Labels: area/ambient (Issues related to ambient mesh), area/networking

jbmolle commented May 22, 2024

Is this the right place to submit this?

  • This is not a security vulnerability or a crashing bug
  • This is not a question about how to use Istio

Bug Description

Hi,

When I start a pod that makes a request to a network outside the cluster at startup, the request fails and the pod ends up in an error state. The same pod works when ambient mode is not used.
I've installed Ambient with the Cilium CNI.
I first noticed the bug with Argo Workflows. I have a workflow with a Git input artifact, so when the pod starts, the first init container does a git clone. With Ambient mode I get: artifact failed to load: failed to clone "https://gitlab.com/xxxx.git": Get "https://gitlab.com/xxx.git/info/refs?service=git-upload-pack": EOF
I don't get any error if I run the workflow in a non-Ambient namespace.

I've reproduced the bug with a simple pod. I create a namespace named test with the label istio.io/dataplane-mode=ambient.
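
For reference, a namespace like the one described could be declared with a manifest along these lines. This is a minimal sketch: only the name test and the label istio.io/dataplane-mode=ambient come from the report, and the reporter may have created the namespace differently (for example with kubectl commands).

```yaml
# Sketch of the ambient-enabled namespace used for the reproduction.
# Only the name and the label below are taken from the report.
apiVersion: v1
kind: Namespace
metadata:
  name: test
  labels:
    istio.io/dataplane-mode: ambient
```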

Then I create a pod which fails:

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: test
spec:
  containers:
  - name: busybox
    image: busybox
    command:
    - /bin/sh
    - -c
    - wget https://google.com
  restartPolicy: Never

But this pod succeeds:

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: test
spec:
  containers:
  - name: busybox
    image: busybox
    command:
    - /bin/sh
    - -c
    - sleep 1 && wget https://google.com
  restartPolicy: Never

Sleeping for just 1 second gives the pod enough time to get the correct network settings.
Only the first container to run is affected, so this pod also works:

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: test
spec:
  initContainers:
  - name: init
    image: busybox
    command:
    - /bin/sh
    - -c
    - sleep 1 && wget https://google.com
  containers:
  - name: busybox
    image: busybox
    command:
    - /bin/sh
    - -c
    - wget https://google.com
  restartPolicy: Never

And if I remove the sleep from the init container, the pod fails.
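
As an alternative to a fixed sleep, the same workaround could be written as a small retry loop, so the first request is simply attempted again if the network is not ready yet. This is only a sketch: the retry helper below is hypothetical, it is not part of the report or of Istio, and it has not been tested against the affected versions. It works around the race; it does not fix it.

```shell
#!/bin/sh
# Hypothetical helper: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, waiting DELAY_SECONDS between attempts,
# and gives up with exit status 1 after MAX_ATTEMPTS failed attempts.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Example call, replacing "sleep 1 && wget https://google.com":
#   retry 5 1 wget https://google.com
```

In a pod manifest, the function definition and the call would both go into the script string passed to /bin/sh -c. The attempt count and delay shown in the example call are arbitrary values chosen for illustration.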

Version

Istio
client version: 1.22.0
control plane version: 1.22.0
data plane version: 1.22.0 (12 proxies)

K8s
Client Version: v1.30.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.1

Cilium
cilium-cli: v0.16.7 compiled with go1.22.2 on linux/amd64
cilium image (default): v1.15.4
cilium image (stable): v1.15.5
cilium image (running): 1.15.4

Additional Information

No response

istio-policy-bot added the area/ambient label May 22, 2024

howardjohn (Member) commented

cc @bleggett

I think we need to unify the inpod manager and WDS to only ACK when we get the local workload. Otherwise we get 'unknown source'

bleggett self-assigned this May 22, 2024

bleggett (Contributor) commented, in reply to howardjohn:

Ah ok. Yeah. I'll take a look.

bleggett (Contributor) commented Jun 4, 2024

@jbmolle this should be fixed in the latest 1.23 dev builds, and once istio/ztunnel#1111 merges, the fix will go out in the next 1.22 point release. Apologies for the delay!

If you have other issues, let us know.

jbmolle (Author) commented Jun 4, 2024

@bleggett Thank you very much for the quick fix!
I'm looking forward to using the next 1.22 release.
And thanks to all the istio contributors, Ambient is such a delight to use!
