Pod fails to start: Application container unable to access network before sidecar ready #4341
Comments
I have users that have confirmed both that this problem exists and is painful, and that setting a |
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 2 weeks unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions. |
Readiness issue. |
I've also encountered this issue when running a Kubernetes job which immediately tries to connect to a PostgreSQL instance. The job container failed with an Setting a |
I have also encountered this problem. My current workaround is to sleep for a few seconds before the service starts, and only then connect to the network. |
#8983, which will be 1.1, should help address this (by letting applications call out to other services while Envoy is starting up). |
This issue has been automatically marked as stale because it has not had activity in the last 90 days. It will be closed in the next 30 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions. |
This issue has been automatically closed because it has not had activity in the last month and a half. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions. |
/remove stale |
I am using Istio v1.5.2 and still have this issue. Unfortunately, the app container doesn't fail, so the workaround with the restart policy doesn't work for me. Any other ideas for a workaround, or plans to solve this? We are evaluating Istio for our project right now, and this seems to be a blocking issue. The only solution I see is to add something like a sleep in the app container before the real app starts, but I would expect Istio not to need changes to the app itself to work properly. |
The full solution to this in Kubernetes is for k8s to support sidecar containers as a first-class concept, starting them up entirely before starting the application container. We'd been hopeful this would land in the latest k8s release, but it has since been put on indefinite hold by the k8s community and will not ship with k8s 1.19 (at this point we can hope for 1.20, but I haven't been following k8s closely enough to see if that's realistic). Other organizations I've worked with have solved this problem by adding a sleep to the app container. The base framework for services we use at Tetrate incorporates a sleep at startup to paper over this pain too, for example. It's not clean, and it violates the design goal of the mesh being transparent, but until there's better support for container lifecycles in the underlying platforms that Istio runs on, there's not too much we can do here. |
Thanks for the detailed information @ZackButcher! What do you think is the best place to put the "sleep"? |
I wrote our sleep to be literally the first thing that the application does at startup. It's effectively the first line of code that executes in a shared main method that all of our services use - that lets us make sure there's standard flags for configuring the startup delay, etc. Making it absolutely the first thing that happens prevents developers from accidentally attempting to do stuff that could fail without a sidecar (like opening up connections to the database or reading some online config store, etc). Anecdotally, with a 5 second delay at startup we've not seen any startup failures due to waiting on the sidecar in our continuous testing environments. How long the typical startup delay is in your system is mainly a function of Pilot load (number of services in the system, rate of change of pods, services in the system, number of sidecars connected, etc). |
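A rough illustration of the startup delay described in the comment above, written as a minimal Python sketch. It is not the framework mentioned there: the `STARTUP_DELAY_SECONDS` variable name and the `startup_delay_seconds` and `main` helpers are hypothetical, and the 5 second default only mirrors the anecdotal figure quoted above.

```python
import math
import os
import time


def startup_delay_seconds(env, default=5.0):
    """Return the configured startup delay in seconds.

    STARTUP_DELAY_SECONDS is a hypothetical variable name used for this sketch.
    Missing, non-numeric, or non-finite values fall back to `default`;
    negative values are clamped to zero.
    """
    raw = env.get("STARTUP_DELAY_SECONDS")
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        return default
    if not math.isfinite(value):
        return default
    return max(value, 0.0)


def main(env=os.environ, sleep=time.sleep):
    # Sleep before anything else, so that no code path can open a
    # connection (database, config store, ...) before the sidecar is up.
    sleep(startup_delay_seconds(env))
    # ... the rest of the service startup would go here ...
```

A service would call `main()` as its entry point. Keeping the delay in one shared function is what lets every service expose the same setting; the right value depends on how long the sidecar takes to become ready in a given cluster.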
When a pod starts, the sidecar and the application containers all start together.
If an application container attempts to access a network service before the sidecar is ready, the connection fails.
If the application is resilient to its dependency availability, then this is not an issue. The application will continue to retry until the connection can be established.
However if the application uses a network endpoint during the startup process and considers it a fatal error if the endpoint cannot be accessed, the application container will die.
As long as `restartPolicy` is `OnFailure` (or `Always`), k8s will restart the container while the sidecar gets ready.
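For the resilient case described above, where the application keeps retrying until the connection can be established, a minimal Python sketch might look like the following. The `wait_for_endpoint` helper, its parameter names, and the default attempt count and delay are assumptions made for illustration; they do not come from Istio or Kubernetes.

```python
import socket
import time


def wait_for_endpoint(host, port, attempts=30, delay=1.0,
                      connect=socket.create_connection, sleep=time.sleep):
    """Open a TCP connection, retrying while the network is not yet usable.

    Returns the connected socket on success. If every attempt fails,
    the last OSError is re-raised so the caller can decide what to do.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect((host, port), timeout=2.0)
        except OSError as exc:
            # Connection refused, reset, or timed out: remember the error
            # and try again unless this was the final attempt.
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay)
    raise last_error
```

An application that calls something like `wait_for_endpoint("db.example.internal", 5432)` (a hypothetical address) before its first real connection tolerates the window in which the sidecar is still starting, instead of treating the first failure as fatal.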