
Application experiences connection refused for all outbound requests during startup #2704

Closed
dwj300 opened this issue Apr 16, 2019 · 11 comments

Comments

dwj300 (Contributor) commented Apr 16, 2019

Bug Report

What is the issue?

(More of a question, or a request to improve the docs.) During initialization, our application makes a handful of HTTP calls to external services. While the proxy is still initializing (acquiring its certificate, etc.), these calls all fail with a connection refused error. We do have retries in place, but we exhaust them and then fail fatally. We could simply add more retries, but we were wondering what the recommended pattern is for waiting for the proxy to be ready before making these calls. Ideally, we could instruct Kubernetes not to start our application container until the proxy gives the OK, but I don't believe that is supported today. Should we probe the pod's health endpoint until it is ready? Is there a better way to do this?
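For instance, would something along these lines be the intended approach? Just a sketch, not something we run today: it assumes curl is available in the application image, that the proxy's admin port is the default 4191, and /path/to/app stands in for our real entrypoint.

command: ["/bin/sh", "-c"]
args:
  - |
    # Poll the proxy's readiness endpoint before exec'ing the real entrypoint.
    until curl -fsS http://localhost:4191/ready > /dev/null; do
      echo "waiting for linkerd-proxy..."
      sleep 1
    done
    exec /path/to/app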

How can it be reproduced?

Deploy a pod that makes some HTTP outbound calls (in our case, to the open internet).

Logs, error output, etc

Get https://<URL>: dial tcp <IP>:<PORT>: connect: connection refused

linkerd check output

linkerd check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ control plane namespace exists
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ no invalid service profiles

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match

Status check results are √

Environment

  • Kubernetes Version: 1.12
  • Cluster Environment: AKS
  • Host OS: Ubuntu 16.04
  • Linkerd version: edge-19.4.4


olix0r (Member) commented Apr 16, 2019

Thanks @dwj300. We're hopeful that Kubernetes will provide a better primitive so that there's a principled way to achieve this, but I have a workaround you can try in the meantime:

A few weeks ago, I wrote a little tool that you can use to prevent your application from running until the proxy is ready: https://github.com/olix0r/linkerd-await

I've tested it for myself and it seems to work well, but since this needs to be added to your application container images, it might not be an ideal solution... Ideally, we'd improve this to only block on readiness when linkerd is injected, but we don't yet add environment variables to non-proxy containers.

Let us know if this workaround is feasible for you and if you think it would be useful to pull into the project more officially.

campbel (Contributor) commented Apr 19, 2019

I'm also seeing this issue. It's not urgent, but fixing it would make deployments cleaner. Currently my container boots, fails to connect to a RabbitMQ server, and then restarts; the second attempt usually succeeds.

dwj300 (Contributor, Author) commented May 2, 2019

@olix0r sorry for the slow reply; I finally got around to trying this and it works great! Maybe just add a snippet to the README about recommended usage in Kubernetes. For us, we now build our images with COPY --from=olix0r/linkerd-await:v0.1.0 /linkerd-await / and then in our Kubernetes YAML we add:

command: ["/linkerd-await"]
args: ["--", "someBinary", "--some", "--args"]
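Spelled out a bit more, it lands in the pod spec roughly like this (just a sketch; the image name is a placeholder, and it assumes the COPY step above put /linkerd-await at the image root):

containers:
  - name: app
    image: registry.example.com/some-app:latest     # placeholder
    command: ["/linkerd-await"]                      # wait for the proxy, then exec the wrapped binary
    args: ["--", "someBinary", "--some", "--args"]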

stale bot commented Jul 31, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label on Jul 31, 2019
stale bot closed this as completed on Aug 15, 2019
poochwashere (Contributor) commented

Hi @olix0r, I was wondering if there is any update on this. We experience the same connection reset errors in our apps. I was reviewing the workaround, but I really do not want to put the binary in all of our container images.

grampelberg (Contributor) commented

@poochwashere unfortunately, there's not much Linkerd can do here. The options are:

eeeeeta commented Jul 10, 2020

Would it not be possible for the linkerd-init container to block until the sidecar was initialized? That seems like a simple solution that would fix this issue, but maybe I'm missing something here...

alpeb (Member) commented Jul 10, 2020

> Would it not be possible for the linkerd-init container to block until the sidecar was initialized? That seems like a simple solution that would fix this issue, but maybe I'm missing something here...

When a pod is launched, init containers like linkerd-init run first, and only when they're done are the normal containers started (the linkerd-proxy and your app containers). The problem is ensuring a start sequence (and also a shutdown sequence) among the normal containers; that's what the options outlined above try to address.
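To illustrate the ordering, an abridged sketch of an injected pod (image names are placeholders, not an actual manifest):

spec:
  initContainers:
    - name: linkerd-init        # runs to completion (iptables setup) before any container below starts
      image: init-image         # placeholder
  containers:
    - name: linkerd-proxy       # started alongside the app; no guarantee it is ready first
      image: proxy-image        # placeholder
    - name: app
      image: app-image          # placeholder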

Tolsto commented Jan 24, 2021

Would it be possible to add a post-start lifecycle hook to the linkerd-proxy container that blocks until the proxy is ready to serve requests?

olix0r (Member) commented Jan 24, 2021

@Tolsto The issue is that we need to block application containers from starting until the proxy has started. I don't think post-start hooks can do this. Per the docs:

> This hook is executed immediately after a container is created. However, there is no guarantee that the hook will execute before the container ENTRYPOINT.

So, even if we modified application containers with a hook that depends on the proxy, I don't think this could reliably block the container from starting.

That said, if you find something that works, please share an example!

Tolsto commented Jan 25, 2021

@olix0r I didn't mean using post-start hooks for the application containers, only for the linkerd-proxy container. Kubernetes starts containers sequentially, and a post-start hook blocks that process until the hook has completed (see the sketch below). This post describes the idea: https://medium.com/@marko.luksa/delaying-application-start-until-sidecar-is-ready-2ec2d21a7b74

The relevant code is: https://github.com/kubernetes/kubernetes/blob/537a602195efdc04cdf2cb0368792afad082d9fd/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L827-L830

and

https://github.com/kubernetes/kubernetes/blob/4c9e96c2388c51a48ea530b111ba7381872e7d7a/pkg/kubelet/kuberuntime/kuberuntime_container.go#L212-L227
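Applied to Linkerd, the idea would look roughly like this. Only a sketch: it assumes the proxy image ships a shell and curl (which may not be true) and that the admin port is the default 4191.

containers:
  - name: linkerd-proxy
    image: proxy-image          # placeholder
    lifecycle:
      postStart:
        exec:
          command:
            - /bin/sh
            - -c
            # the kubelet runs this hook to completion before starting the next container in the list
            - until curl -fsS http://localhost:4191/ready; do sleep 1; done
  - name: app
    image: app-image            # placeholder; only started once the hook above returns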
