App container unable to connect to network before sidecar is fully running #11130

Closed
linsun opened this issue Jan 21, 2019 · 43 comments
Labels
area/networking kind/enhancement lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed

@linsun
Member

linsun commented Jan 21, 2019

Describe the feature request
We have had users spend a very long time debugging why their app container fails at startup when the Istio sidecar is used. They found that the app container could not reach the network for simple things, like cloning a file from GitHub, before the Envoy proxy was ready and running. This is hard to debug because when they exec into the container after the deployment is running, everything works fine.

Describe alternatives you've considered
The current workaround is to put a long sleep, such as 20 or 30 seconds, in the app container to give Envoy enough time to start up.

This is fine once they discover the issue and understand how istio works better, but it can take days for them to discover the issue.

How can we make the experience better?
Can we provide a startup hook so that the app container won't start until the Envoy sidecar is ready, for cases where the app container starts very fast and requires network connectivity?

@linsun
Member Author

linsun commented Jan 21, 2019

@esnible pls feel free to add things I missed. cc @GregHanson

@esnible
Contributor

esnible commented Jan 23, 2019

Currently I tell people to put the following into their .yaml:

command: ["/bin/bash", "-c"]
args: ["until curl --head localhost:15000 ; do echo Waiting for Sidecar; sleep 3 ; done ; echo Sidecar available; ./startup.sh"] # replace startup.sh with actual startup command.

It would be better if networking was ready when the app container started.
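
The loop above waits forever if the sidecar never comes up. A minimal sketch of a bounded variant follows, assuming the image ships bash and curl. The container name, the image, the attempt count, and ./startup.sh are placeholders, not values from this thread.

```yaml
# Sketch only: container name, image, and ./startup.sh are placeholders.
containers:
- name: app
  image: example/app:latest
  command: ["/bin/bash", "-c"]
  args:
  - |
    # Poll the Envoy admin port, giving up after 20 attempts, 3s apart.
    for attempt in {1..20}; do
      if curl --silent --head --max-time 2 localhost:15000 > /dev/null; then
        echo "Sidecar available"
        exec ./startup.sh
      fi
      echo "Waiting for sidecar (attempt ${attempt})"
      sleep 3
    done
    echo "Sidecar not available after 20 attempts" >&2
    exit 1
```

Exiting non-zero lets the kubelet restart the container and surface the failure in the pod status, instead of leaving the pod hanging silently.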

A novel approach would be to slow down the app container until networking was available. A hook could set the CPU for containers other than the sidecar to use spec.containers[].resources.requests.cpu: 1m (a milli-CPU). A tool like the Network CNI would raise the CPU to an original/default value after networking started. This should starve anything compute-bound giving Envoy more time to start.

Another idea is to have the init container include pilot-agent and fetch /etc/istio/proxy/envoy-rev0.json before any non-init containers start, allowing Envoy to be configured with real values immediately instead of waiting for Pilot while the app container is starting.

@esnible
Contributor

esnible commented Feb 2, 2019

This may be a duplicate of #9454

@esnible
Contributor

esnible commented Feb 3, 2019

This may be a duplicate of #4341

@jackkleeman
Member

Hey, we here at Monzo have open sourced our solution to this sequencing problem:
https://github.com/monzo/envoy-preflight
The idea is that it's a wrapper around your main application, which ensures the application starts after Envoy is live and shuts down Envoy when it's done. You'll still need to prevent SIGTERMs from reaching Envoy.

@esnible esnible removed their assignment Apr 30, 2019
@esnible
Contributor

esnible commented Apr 30, 2019

Removing myself because I am not a sidecar networking guru. That is what we need for this item.

@howardjohn
Member

Long term fix is #11366 or maybe kubernetes/kubernetes#65502

@hzxuzhonghu
Member

As we have ALLOW_ANY, is this still a big problem?

@idouba
Member

idouba commented Jun 28, 2019

Consider configuring a postStart hook on the app container to check Envoy's status, such as:
httpGet: path: /healthz/ready

@hzxuzhonghu
Member

I think this makes sense.

apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: xxx
    command: 
    lifecycle:
      postStart:           # same check as the istio-proxy readiness probe; if this hook fails, the app container is restarted
        httpGet:
          path: /healthz/ready
          port: 15020

@xiaozhongliu

xiaozhongliu commented Jul 10, 2019

esnible's solution worked for us for a long time. Unfortunately, the issue has started to occur again, and it is even worse.
Our external database can be unavailable for more than 8 seconds after Envoy is ready plus a further 5 seconds of sleep ...

until curl -s localhost:15000 > /dev/null; do echo '>>> Waiting for sidecar'; sleep 2 ; done ; echo '>>> Sidecar available'; sleep 5 ; ...

Could anyone shed light on this?
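
The loop above only checks that the Envoy admin port answers. A stricter variant can be sketched by polling the readiness endpoint mentioned earlier in this thread (/healthz/ready on port 15020). This assumes that endpoint is reachable from the app container; the port can differ between Istio versions, and nothing in the thread confirms that this resolves the database delay. ./startup.sh is a placeholder.

```yaml
# Sketch only: waits for the sidecar readiness endpoint instead of the admin port.
command: ["/bin/bash", "-c"]
args:
- |
  # --fail makes curl return non-zero on HTTP errors, so the loop keeps
  # waiting until the endpoint answers with a success status.
  until curl --silent --fail --max-time 2 http://localhost:15020/healthz/ready > /dev/null; do
    echo ">>> Waiting for sidecar readiness"
    sleep 2
  done
  echo ">>> Sidecar ready"
  exec ./startup.sh   # placeholder for the real startup command
```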

@stale

stale bot commented Oct 18, 2019

This issue has been automatically marked as stale because it has not had activity in the last 90 days. It will be closed in the next 30 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 18, 2019
@howardjohn howardjohn added the lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed label Oct 29, 2019
@geeknoid geeknoid removed the stale label Nov 2, 2019
@Jonathan34

It has been a year since this issue (and related issues) was opened.

It would be nice if deployments did not need to know that they have to wait for the mesh's sidecar to be ready. That coupling should not exist. Waiting for the sidecar is becoming a best practice, and it's a known, common problem that leads to poor UX and onboarding for new users.

@jdomag

jdomag commented Feb 4, 2022

Containers do start sequentially - usually starting containers is nearly instant though so it looks parallel. That issue you linked is about waiting until previous ones are ready, not started. The trick, which is really a hack relying on implementation details, is a container is not considered "started" until the preStart hook (or maybe it was postStart, I forget :-)) is finished. So the hook waits for envoy to be ready.

@howardjohn that's interesting - you probably mean postStart as per https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#container-hooks
However after adding the holdApplicationUntilProxyStarts: true to istio config I can't see postStart hooks being added to istio-proxy container (there's only preStop that I used for another purpose)

Looking at the repository I found those lines:

It doesn't make sense to me: I used preStop and holdApplicationUntilProxyStarts and ended up with only the preStop hook. I understand, though, that holdApplicationUntilProxyStarts and .Values.global.proxy.lifecycle.postStart can't be used at the same time.
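
The mechanism quoted at the top of this comment can be sketched as a fragment of an injected pod spec. This is an illustration under assumptions, not output copied from a cluster: the `pilot-agent wait` command is what the injected postStart hook is believed to run, and the app container is hypothetical. The observation above suggests that a custom proxy lifecycle value replaces this hook rather than merging with it.

```yaml
# Illustrative fragment of an injected pod spec, not copied from a real cluster.
spec:
  containers:
  - name: istio-proxy            # placed first so it starts before the app container
    lifecycle:
      postStart:
        exec:
          # Assumed hook: blocks until the proxy reports ready, which delays
          # the start of the containers listed after this one.
          command: ["pilot-agent", "wait"]
  - name: app                    # hypothetical application container
    image: example/app:latest
```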

@mohammadsuha

I am trying to do some setup in an init container by connecting to an external service, but I am unable to connect to it with Istio in strict mode.
I found Istio CNI, and I want to know whether it will help resolve the issue.

https://istio.io/latest/docs/setup/additional-setup/cni/#compatibility-with-application-init-containers

The document above says there are 3 ways to do it.

I want to know whether those 3 ways actually resolve the issue.

@sandeep-sharda-discovery

The Istio sidecar frequently kills our Postgres DB event listener connection. We have used the workaround mentioned above,
proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
traffic.sidecar.istio.io/excludeOutboundPorts: "5432"
however, the issue still occurs. Please let us know if a fix will be available soon.
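
For readers wondering where the two annotations in this comment go, here is a minimal sketch of a Deployment pod template. Only the annotation keys and values come from the comment; the surrounding structure is the standard Deployment layout, with everything else omitted.

```yaml
# Sketch of a Deployment pod template; only the annotations come from the comment above.
spec:
  template:
    metadata:
      annotations:
        # Hold the app container until the sidecar is ready.
        proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
        # Send outbound traffic to port 5432 (Postgres) around the sidecar.
        traffic.sidecar.istio.io/excludeOutboundPorts: "5432"
```

The annotations belong on the pod template metadata, not on the Deployment's own metadata, because the sidecar injector reads them from the pod.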

@ChimbuChinnadurai

Another workaround is to run the init container as UID 1337.
This allows the init container to bypass the istio-proxy container for any external network access.

securityContext:
  runAsUser: 1337
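
In context, the snippet above might sit in a pod spec roughly as follows. The init container name, image, and command are hypothetical, and the image has to tolerate running as that UID.

```yaml
# Sketch only: name, image, and command are placeholders.
spec:
  initContainers:
  - name: setup
    image: example/setup:latest
    command: ["/bin/sh", "-c", "curl --fail https://example.com/bootstrap"]
    securityContext:
      runAsUser: 1337   # same UID as istio-proxy, so its traffic is not redirected to the sidecar
```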

@hammadzz

There really is no interim fix for this, right? I tried setting values.global.proxy.holdApplicationUntilProxyStarts=true on istiod via its Helm release, but it does not help. I see Stackdriver export errors on almost every deploy of a new pod, which is quite annoying.

@luksa
Contributor

luksa commented Jun 28, 2023

Kubernetes is getting proper sidecar support soon. Istio will be able to leverage this new feature to run the sidecar before any init container runs, which will enable network connectivity for init containers (through istio-proxy).
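
As a rough illustration of the Kubernetes feature described above: a native sidecar is declared as an init container with `restartPolicy: Always`. It keeps running for the life of the pod, and later init containers start only after it has started. The sketch assumes a cluster with the SidecarContainers feature enabled (alpha in Kubernetes 1.28). All names and images are placeholders, and the `proxy` container merely stands in for istio-proxy.

```yaml
# Sketch only: all names and images are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: native-sidecar-demo
spec:
  initContainers:
  - name: proxy                   # stand-in for istio-proxy
    image: example/proxy:latest
    restartPolicy: Always         # marks this init container as a sidecar
  - name: setup
    image: example/setup:latest   # starts only after the sidecar above has started
  containers:
  - name: app
    image: example/app:latest
```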

@omerfsen

@luksa

Kubernetes is getting proper sidecar support soon. Istio will be able to leverage this new feature to run the sidecar before any init container runs, which will enable network connectivity for init containers (through istio-proxy).

Can you give me a link about this "Kubernetes is getting proper sidecar support soon"? I want to read more about it.

@GregHanson
Member

There is some work on this in upstream istio already around this new Kubernetes feature:

Related K8S links:

@LukaszRacon

If you want to eliminate sidecars - check ambient mesh:
https://istio.io/latest/blog/2022/introducing-ambient-mesh/

@GregHanson
Member

GregHanson commented Jun 28, 2023

@LukaszRacon is right. However, although ambient was included in Istio 1.18, it is still considered alpha and isn't recommended for production use yet. If you can afford to wait, ambient is definitely the way to go.

Granted, the Sidecar KEP isn't implemented in Istio yet either, and I don't know when it will land.

@keithmattix
Contributor

This is available via feature flag in #45959
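
The comment does not name the flag. As far as I know, the Istio native sidecars blog post enables it through the istiod environment variable ENABLE_NATIVE_SIDECARS, so treat the name below as an assumption and check it against your Istio version. A minimal sketch as an IstioOperator overlay:

```yaml
# Sketch only: the flag name is an assumption, not taken from this thread.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    pilot:
      env:
        ENABLE_NATIVE_SIDECARS: "true"   # also needs a cluster with native sidecar support
```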

@linsun
Member Author

linsun commented Jan 29, 2024

closing this due to #45959

@linsun linsun closed this as completed Jan 29, 2024
@thesuperzapper

I just want to highlight something that I got confused by:

THERE IS NO WAY TO USE THE ISTIO MESH IN initContainers

This is because the istio-proxy sidecar must be running for the istio mesh to be available, and this will obviously not be the case during the init-container phase.

Note, setting holdApplicationUntilProxyStarts to true will not fix this, because that setting only ensures your containers start after the istio-proxy.

@howardjohn
Member

THERE IS NO WAY TO USE THE ISTIO MESH IN initContainers

there is -
https://istio.io/latest/blog/2023/native-sidecars/
