App container unable to connect to network before sidecar is fully running #11130

linsun · 2019-01-21T20:57:26Z

Describe the feature request
We have had users who spent a very long time debugging why their app container fails at startup when the sidecar is used in Istio. They found that the app container could not reach the network for simple things, like cloning a file from GitHub, before the Envoy proxy was ready and running. This is hard to debug because when they exec into the container after the deployment is running, everything works fine.

Describe alternatives you've considered
The current workaround used by folks is to put a long sleep, like 20 or 30 seconds, in their app container to give Envoy enough time to start up.

This is fine once they discover the issue and understand how istio works better, but it can take days for them to discover the issue.

How can we make the experience better?
Can we provide a startup hook so that the app container won't start until the Envoy sidecar is ready, for cases where the app container starts very fast and requires network connectivity?

linsun · 2019-01-21T20:57:56Z

@esnible pls feel free to add things I missed. cc @GregHanson

esnible · 2019-01-23T17:22:26Z

Currently I tell people to put the following into their .yaml:

command: ["/bin/bash", "-c"]
args: ["until curl --head localhost:15000 ; do echo Waiting for Sidecar; sleep 3 ; done ; echo Sidecar available; ./startup.sh"] # replace startup.sh with actual startup command.

It would be better if networking was ready when the app container started.
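
The wait loop above can be given an upper bound, so that a pod whose sidecar never comes up fails visibly instead of hanging forever. A minimal sketch, assuming bash and curl are present in the app image; the function name wait_for_sidecar, the attempt count, and the delay are illustrative and not part of Istio:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a probe command until it succeeds or attempts run out.
# Usage: wait_for_sidecar MAX_ATTEMPTS DELAY_SECONDS PROBE_COMMAND [ARGS...]
wait_for_sidecar() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # The probe's own output is discarded; only its exit status matters.
    if "$@" >/dev/null 2>&1; then
      echo "Sidecar available after ${attempt} attempt(s)"
      return 0
    fi
    echo "Waiting for sidecar (attempt ${attempt}/${max_attempts})"
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "Sidecar not ready after ${max_attempts} attempts" >&2
  return 1
}

# Example (same probe as the snippet above; replace startup.sh with the real command):
#   wait_for_sidecar 20 3 curl --head localhost:15000 && exec ./startup.sh
```

Passing the probe as arguments keeps the loop independent of which port or endpoint is checked, so the same function works with other readiness endpoints.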

A novel approach would be to slow down the app container until networking was available. A hook could set the CPU for containers other than the sidecar to use spec.containers[].resources.requests.cpu: 1m (a milli-CPU). A tool like the Network CNI would raise the CPU to an original/default value after networking started. This should starve anything compute-bound giving Envoy more time to start.

Another idea is to have the init container include pilot-agent and fetch /etc/istio/proxy/envoy-rev0.json before any non-init containers start, allowing Envoy to be configured with real values immediately instead of waiting for Pilot while the app container is starting.

esnible · 2019-02-02T15:17:46Z

This may be a duplicate of #9454

esnible · 2019-02-03T01:34:59Z

This may be a duplicate of #4341

jackkleeman · 2019-04-18T11:46:39Z

Hey, we here at Monzo have open sourced our solution to this sequencing problem:
https://github.com/monzo/envoy-preflight
The idea is that it's a wrapper around your main application, which ensures the application starts after Envoy is live and shuts down Envoy when it's done. You'll still need to prevent SIGTERMs from reaching Envoy.
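
The wrapper pattern can be sketched in shell. This is not envoy-preflight's actual implementation. It assumes the Envoy admin API is reachable at localhost:15000 (the port used earlier in this thread) and relies on the admin endpoints /ready and /quitquitquit. The function names and the ENVOY_ADMIN and POLL_SECONDS variables are made up for illustration:

```shell
#!/usr/bin/env bash
# Assumed Envoy admin address; adjust to your deployment.
ENVOY_ADMIN="${ENVOY_ADMIN:-http://localhost:15000}"

# Succeeds once the Envoy admin API reports the server as ready.
sidecar_ready() {
  curl --silent --fail --output /dev/null "${ENVOY_ADMIN}/ready"
}

# Asks Envoy to exit, so a finished job does not leave the pod running.
sidecar_quit() {
  curl --silent --fail --output /dev/null -X POST "${ENVOY_ADMIN}/quitquitquit"
}

# Wait for the sidecar, run the given command, then stop the sidecar.
# Returns the exit status of the wrapped command.
run_with_sidecar() {
  until sidecar_ready; do
    sleep "${POLL_SECONDS:-1}"
  done
  "$@"
  local status=$?
  sidecar_quit || echo "could not stop sidecar" >&2
  return "$status"
}

# Example: run_with_sidecar ./startup.sh
```

Returning the wrapped command's exit status keeps job success or failure visible to Kubernetes even though the wrapper performs cleanup afterwards.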

esnible · 2019-04-30T14:26:18Z

Removing myself because I am not a sidecar networking guru. That is what we need for this item.

howardjohn · 2019-05-31T17:56:19Z

Long term fix is #11366 or maybe kubernetes/kubernetes#65502

hzxuzhonghu · 2019-06-25T02:45:23Z

As we have ALLOW_ANY, is this still a big problem?

idouba · 2019-06-28T08:21:28Z

Consider configuring a postStart hook on the app container to check Envoy's status, such as:
httpGet: path: /healthz/ready

hzxuzhonghu · 2019-06-28T08:56:50Z

I think this makes sense.

apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: xxx
    command: 
    lifecycle:
      postStart:           # same check as the istio-proxy readiness probe; if this hook fails, the app container is restarted
        httpGet:
          path: /healthz/ready
          port: 15020

xiaozhongliu · 2019-07-10T08:20:19Z

esnible's solution worked for us for a long period. Unfortunately the issue has started to occur again, and it is even worse.
Our external database can be unavailable for more than 8 seconds after Envoy is ready plus an extra 5-second sleep ...

until curl -s localhost:15000 > /dev/null; do echo '>>> Waiting for sidecar'; sleep 2 ; done ; echo '>>> Sidecar available'; sleep 5 ; ...

Could anyone shed light on this?

stale · 2019-10-18T05:53:16Z

This issue has been automatically marked as stale because it has not had activity in the last 90 days. It will be closed in the next 30 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

Jonathan34 · 2019-11-19T00:36:33Z

It has been a year since this issue (or related issues) was opened.

It would be nice if deployments did not have to know that they need to wait for the mesh's sidecar to be ready. That link should not exist, and waiting for the sidecar is becoming a best practice. It's a known, common problem that leads to weak UX and onboarding of new users.

jdomag · 2022-02-04T10:28:37Z

> Containers do start sequentially - usually starting containers is nearly instant though so it looks parallel. That issue you linked is about waiting until previous ones are ready, not started. The trick, which is really a hack relying on implementation details, is a container is not considered "started" until the preStart hook (or maybe it was postStart, I forget :-)) is finished. So the hook waits for envoy to be ready.

@howardjohn that's interesting - you probably mean postStart as per https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#container-hooks
However after adding the holdApplicationUntilProxyStarts: true to istio config I can't see postStart hooks being added to istio-proxy container (there's only preStop that I used for another purpose)

Looking at the repository I found those lines:

istio/operator/cmd/mesh/testdata/manifest-generate/output/sidecar_template.golden.yaml

Line 369 in 0faf513

It doesn't make any sense to me - I used preStop and holdApplicationUntilProxyStarts and ended up with the preStop hook only. I understand, though, that holdApplicationUntilProxyStarts and .Values.global.proxy.lifecycle.postStart can't be used at the same time.

mohammadsuha · 2022-07-05T17:31:57Z

I am trying to do some setup by connecting to an external service from an init container, but I am unable to connect to the external service in Istio strict mode.
I found Istio CNI, so I want to know whether Istio CNI will help resolve the issue.

https://istio.io/latest/docs/setup/additional-setup/cni/#compatibility-with-application-init-containers

The above document says there are 3 ways we can do it.

I want to know whether those 3 ways actually work to resolve the issue.

sandeep-sharda-discovery · 2022-08-04T11:31:55Z

The Istio sidecar is frequently killing the Postgres DB event listener connection. We have used the workaround mentioned above,
proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
traffic.sidecar.istio.io/excludeOutboundPorts: "5432"
however, the issue still occurs. Kindly update if any fix is available soon.

ChimbuChinnadurai · 2022-10-31T13:59:48Z

Another workaround is to run the init container as UID 1337.
This allows the init container to bypass the istio-proxy container for any external network access.

securityContext:
  runAsUser: 1337

hammadzz · 2023-06-28T13:45:17Z

There really is no interim fix for this, right? I tried setting values.global.proxy.holdApplicationUntilProxyStarts=true on istiod via its Helm release, but it does not help. I see Stackdriver export errors on almost every deploy of a new pod, which is quite annoying.

luksa · 2023-06-28T14:52:21Z

Kubernetes is getting proper sidecar support soon. Istio will be able to leverage this new feature to run the sidecar before any init container runs, which will enable network connectivity for init containers (through istio-proxy).

omerfsen · 2023-06-28T18:00:31Z

@luksa

> Kubernetes is getting proper sidecar support soon. Istio will be able to leverage this new feature to run the sidecar before any init container runs, which will enable network connectivity for init containers (through istio-proxy).

Can you give me a link about this "Kubernetes is getting proper sidecar support soon"? I want to read more about it.

GregHanson · 2023-06-28T18:05:20Z

There is some work on this in upstream istio already around this new Kubernetes feature:

Related K8S links:

LukaszRacon · 2023-06-28T18:42:58Z

If you want to eliminate sidecars - check ambient mesh:
https://istio.io/latest/blog/2022/introducing-ambient-mesh/

GregHanson · 2023-06-28T18:47:50Z

@LukaszRacon is right. However, ambient was included in Istio 1.18, but it still has alpha feature status and isn't recommended for production use yet. If you can afford to wait, ambient is definitely the way to go.

Granted, the Sidecar KEP isn't implemented in Istio yet either, and I don't know the timeline for when it will land.

keithmattix · 2023-08-22T14:44:22Z

This is available via feature flag in #45959

linsun · 2024-01-29T00:40:27Z

closing this due to #45959

thesuperzapper · 2024-05-23T05:20:17Z

I just want to highlight something that I got confused by:

THERE IS NO WAY TO USE THE ISTIO MESH IN initContainers

This is because the istio-proxy sidecar must be running for the istio mesh to be available, and this will obviously not be the case during the init-container phase.

Note, setting holdApplicationUntilProxyStarts to true will not fix this, because that setting only ensures your containers start after the istio-proxy.

howardjohn · 2024-05-23T12:48:31Z

> THERE IS NO WAY TO USE THE ISTIO MESH IN initContainers

there is -
https://istio.io/latest/blog/2023/native-sidecars/

linsun assigned esnible Jan 21, 2019

luksa mentioned this issue Jan 29, 2019

Consider running Envoy through CNI/CRI instead of as sidecar #11366

Closed

esnible mentioned this issue Feb 3, 2019

Istio creates a race condition as the app crashes with apiserver unavailable #5442

Closed

esnible mentioned this issue Mar 23, 2019

Better support for sidecar containers in batch jobs #6324

Closed

esnible removed their assignment Apr 30, 2019

This was referenced May 8, 2019

the network of container cant connect when spring clound start with istio sidecar #13911

Closed

External service not accessible shortly, most of time when a pod is just created. #14070

Closed

howardjohn added area/networking kind/enhancement labels May 31, 2019

trevorlinton mentioned this issue Jul 1, 2019

Http Filters & Service Mesh: Important Information akkeris/akkeris#9

Open

This was referenced Aug 26, 2019

Network not available first milliseconds of the pod #9454

Closed

Unable to communicate with RabbitMQ outside the mesh #15896

Closed

howardjohn mentioned this issue Sep 7, 2019

pods with istio-sidecar container , cannot reach the external services (sometimes). #15806

Closed

schuylr mentioned this issue Oct 16, 2019

Istio: Container boots before Istio sidecar is ready rancher/rancher#23417

Closed

stale bot added the stale label Oct 18, 2019

howardjohn added the lifecycle/staleproof label Oct 29, 2019

geeknoid removed the stale label Nov 2, 2019

This was referenced Jun 25, 2021

initContainer tfserving-model-initializer is not able to pull model from s3 SeldonIO/seldon-operator#71

Open

initContainer tfserving-model-initializer is not able to pull model from s3 SeldonIO/seldon-core#3330

Closed

skaravad mentioned this issue Aug 18, 2021

Add ability to wait for sidecar container kumahq/kuma#2483

Closed

xhejtman mentioned this issue Aug 20, 2021

v2 initContainer can't run as non-root kubeflow/mpi-operator#407

Closed

callum-tait-pbx mentioned this issue Aug 25, 2021

Connection Refused on example-runner actions/actions-runner-controller#767

Closed

enocom mentioned this issue Sep 30, 2021

Wait for network before starting up GoogleCloudPlatform/cloud-sql-proxy#949

Closed

adriangb mentioned this issue Feb 26, 2022

Add command for use in "PostStart" hook that delays until proxy has started GoogleCloudPlatform/cloud-sql-proxy#1117

Closed

bstadlbauer mentioned this issue May 30, 2022

[BUG] flytescheduler not starting with istio authorization policy flyteorg/flyte#2562

Closed

narendrapatel mentioned this issue Sep 9, 2022

Adding support for lifecycle hooks and health probe for sidecars hashicorp/consul-k8s#1482

Closed

tafaust mentioned this issue Oct 12, 2022

enterprise-gateway does not connect to k8s kernel when istio is configured jupyter-server/enterprise_gateway#1168

Open

ypoplavs mentioned this issue Dec 22, 2022

Istio support kubeshop/testkube#2936

Closed

curtiscook mentioned this issue Feb 21, 2023

Make Kuma's init container first by default kumahq/kuma#3121

Closed

This was referenced Sep 4, 2023

Race condition with Istio sidecar prevents KIC to startup correctly Kong/kubernetes-ingress-controller#4603

Closed

feat: Add lifecycle hook support to ingress container Kong/charts#880

Closed

wleese mentioned this issue Sep 14, 2023

Adopt Scuttle #46991

Closed

linsun closed this as completed Jan 29, 2024
