Pod fails to start: Application container unable to access network before sidecar ready #4341

mandarjog · 2018-03-16T22:05:55Z

When a pod starts, the sidecar and the application containers all start together.

If an application container attempts to access a network service before the sidecar is ready, the connection fails.

Access can fail completely if no listener is present on the sidecar.
Access fails with a 404 or 503 if a listener is present but no routes are available.

If the application is resilient to its dependency availability, then this is not an issue. The application will continue to retry until the connection can be established.
However, if the application uses a network endpoint during startup and treats an unreachable endpoint as a fatal error, the application container will die.
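As an illustration of what "resilient to its dependency availability" can look like, here is a minimal sketch of a TCP connect with exponential backoff. The function name `connect_with_retry` and its parameters are hypothetical and not part of Istio or of any application in this thread. The sketch only covers connection-level failures such as ECONNREFUSED; the 404 / 503 case would need a similar retry at the HTTP layer.

```python
import socket
import time


def connect_with_retry(host, port, attempts=10, initial_delay=0.5, max_delay=5.0,
                       connect=socket.create_connection, sleep=time.sleep):
    """Try to open a TCP connection, retrying with exponential backoff.

    Returns the connected socket, or re-raises the last OSError once
    `attempts` tries have failed.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect((host, port), timeout=2.0)
        except OSError:
            # OSError covers ECONNREFUSED and socket timeouts, which is what
            # the application sees while the sidecar is not ready yet.
            if attempt == attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

The `connect` and `sleep` parameters exist only so the loop can be exercised without a real network or real waiting.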

As long as restartPolicy is OnFailure (or Always), k8s will restart the container while the sidecar gets ready.

Test that this really works
Document mitigation


ZackButcher · 2018-06-18T21:33:38Z

I have users that have confirmed both that this problem exists and is painful, and that setting a restartPolicy "fixes" it. Really this is just the user-visible side of #4363.

stale · 2018-07-22T06:56:01Z

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 2 weeks unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

costinm · 2018-08-07T18:18:40Z

Readiness issue.

bluk · 2018-08-24T18:35:28Z

I've also encountered this issue when running a Kubernetes job which immediately tries to connect to a PostgreSQL instance. The job container failed with ERROR: connect ECONNREFUSED 10.99.214.72:5432. I thought it was because I had enabled mutual TLS with Istio, but I eventually found that just having the sidecar injected causes this issue. If I ran the job without the sidecar being injected, the job would succeed.

Setting restartPolicy: OnFailure helps, as noted. Is there a recommended way to identify whether the sidecar is ready?
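One possible shape for such a readiness check, sketched under assumptions rather than taken from this thread: poll an HTTP readiness endpoint on the sidecar until it answers 200 or a deadline passes. The URL below is an assumption; the port and path differ between Istio versions and must be verified for your deployment. The names `probe` and `wait_for_sidecar` are hypothetical.

```python
import time
import urllib.error
import urllib.request

# Assumption: the sidecar serves an HTTP readiness endpoint at this address.
# The port and path vary by Istio version; verify them before relying on this.
SIDECAR_READY_URL = "http://127.0.0.1:15021/healthz/ready"


def probe(url, timeout=1.0):
    """Return True if `url` answers with HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and non-2xx responses all land here.
        return False


def wait_for_sidecar(url=SIDECAR_READY_URL, deadline_seconds=30.0, interval=0.5,
                     probe=probe, sleep=time.sleep, clock=time.monotonic):
    """Poll until the probe succeeds or the deadline passes.

    Returns True if the sidecar reported ready in time, False otherwise.
    """
    deadline = clock() + deadline_seconds
    while True:
        if probe(url):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A job could call `wait_for_sidecar()` before opening its database connection and exit non-zero if it returns False, so that restartPolicy: OnFailure still applies.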

violetgo · 2018-09-29T08:28:45Z

I have also encountered this problem. My current workaround is to sleep for a few seconds before the service starts and only then connect to the network.

ZackButcher · 2018-10-01T20:44:30Z

#8983, which will be in 1.1, should help address this (by letting applications call out to other services while Envoy is starting up).

stale · 2018-12-30T21:27:00Z

This issue has been automatically marked as stale because it has not had activity in the last 90 days. It will be closed in the next 30 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

stale · 2019-02-12T10:00:05Z

This issue has been automatically closed because it has not had activity in the last month and a half. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.

hzxuzhonghu · 2019-02-13T01:34:38Z

/remove stale
since this is not addressed

stale · 2019-05-14T07:30:41Z

This issue has been automatically marked as stale because it has not had activity in the last 90 days. It will be closed in the next 30 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

stale · 2019-06-13T08:05:04Z

This issue has been automatically closed because it has not had activity in the last month and a half. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.

chris922 · 2020-04-30T06:22:39Z

I am using Istio v1.5.2 and still have this issue. Unfortunately, the app container doesn't fail, so the restart policy workaround doesn't work for me.

Any other ideas for a workaround or plans to solve this? We are evaluating Istio right now for our project and this seems to be a blocking issue.

The only solution I see is to add something like a sleep in the app container before the real app starts, but I would expect Istio not to need changes to the app itself in order to work properly.

ZackButcher · 2020-05-01T21:03:12Z

The full solution to this in Kubernetes is for k8s to support sidecar containers as a first-class concept, starting them up entirely before starting the application container. We'd been hopeful this would land in the latest k8s release, but it has since been put on indefinite hold by the k8s community and will not ship with k8s 1.19 (at this point we can hope for 1.20, but I haven't been following k8s closely enough to know whether that's realistic).

Other organizations I've worked with have solved this problem by adding a sleep to the app container. The base framework for services we use at Tetrate incorporates a sleep at startup to paper over this pain too, for example. It's not clean, and violates the design goal of the mesh being transparent, but until there's better support for container lifecycles in underlying platforms that Istio runs on there's not too much we can do here.

chris922 · 2020-05-02T12:54:10Z

Thanks for the detailed information @ZackButcher!

What do you think is the best place to put the "sleep"?

ZackButcher · 2020-05-04T18:39:55Z

I wrote our sleep to be literally the first thing that the application does at startup. It's effectively the first line of code that executes in a shared main method that all of our services use - that lets us make sure there are standard flags for configuring the startup delay, etc. Making it absolutely the first thing that happens prevents developers from accidentally attempting to do stuff that could fail without a sidecar (like opening connections to the database or reading some online config store, etc.).

Anecdotally, with a 5 second delay at startup we've not seen any startup failures due to waiting on the sidecar in our continuous testing environments. How long the typical startup delay is in your system is mainly a function of Pilot load (number of services in the system, rate of change of pods and services, number of sidecars connected, etc.).
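A minimal sketch of the "sleep first in a shared main" approach described above, not the actual Tetrate framework code. The environment variable name `STARTUP_DELAY_SECONDS` and the functions `startup_delay_seconds` and `run_service` are hypothetical; the 5 second default mirrors the delay mentioned in this comment.

```python
import math
import os
import time

# Hypothetical variable name; the thread does not say what the real flag is called.
DELAY_ENV_VAR = "STARTUP_DELAY_SECONDS"
DEFAULT_DELAY_SECONDS = 5.0  # the delay reported as sufficient in the comment above


def startup_delay_seconds(environ=os.environ):
    """Read the configured delay, falling back to the default on bad input."""
    raw = environ.get(DELAY_ENV_VAR, "")
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_DELAY_SECONDS
    if not math.isfinite(value) or value < 0:
        return DEFAULT_DELAY_SECONDS
    return value


def run_service(app_main, sleep=time.sleep, environ=os.environ):
    """Shared entry point: wait first, then hand control to the service."""
    # The delay is the very first action, before any network access,
    # so nothing in app_main can race the sidecar.
    sleep(startup_delay_seconds(environ))
    return app_main()
```

Each service would pass its own entry function to `run_service`, which keeps the delay in one place and makes it configurable per deployment.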

mandarjog added the area/networking label Mar 16, 2018

rshriram assigned ZackButcher Jun 12, 2018

ZackButcher mentioned this issue Jun 18, 2018

istio-proxy takes so long time to work-status, so if the application is started before this time and the application need call external network, it will be failed! #4620

Closed

sakshigoel12 added this to the 1.0 milestone Jun 22, 2018

stale bot added the stale label Jul 22, 2018

stale bot removed the stale label Aug 7, 2018

stale bot added the stale label Dec 30, 2018

esnible mentioned this issue Feb 3, 2019

App container unable to connect to network before sidecar is fully running #11130

Closed

stale bot closed this as completed Feb 12, 2019

hzxuzhonghu reopened this Feb 13, 2019

stale bot removed the stale label Feb 13, 2019

mandarjog added the community/help wanted label Feb 13, 2019

stale bot added the stale label May 14, 2019

stale bot closed this as completed Jun 13, 2019

rlenglet modified the milestones: 1.4, 1.3 Jul 9, 2019

ghost mentioned this issue Dec 24, 2020

Proxy pass jwt to application #29762

Closed
