Actuator document is misleading about k8s startup probe #28432

Closed
ichenhe opened this issue Oct 22, 2021 · 7 comments
Labels
type: documentation A documentation update

@ichenhe

ichenhe commented Oct 22, 2021

Actuator reference says:

If an application takes longer to start than the configured liveness period, Kubernetes mentions the "startupProbe" as a possible solution. The "startupProbe" is not necessarily needed here as the "readinessProbe" fails until all startup tasks are done.

But the fact is that "Liveness probes do not wait for readiness probes to succeed". That means that if your application takes a long time to start, k8s may kill it before its readinessProbe succeeds, and it will never be able to start successfully.

So startupProbe is really necessary.
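
A rough way to see the problem is to model when the liveness probe would give up. The sketch below is a simplification, not the kubelet's actual scheduling: it assumes probes fire exactly at `initialDelaySeconds` and then every `periodSeconds`, ignores `timeoutSeconds` and jitter, and uses hypothetical function names. The default arguments mirror the Kubernetes defaults for these fields (0, 10, and 3).

```python
def liveness_kill_time(initial_delay: int = 0,
                       period: int = 10,
                       failure_threshold: int = 3) -> int:
    """Time in seconds of the probe that reaches the failure threshold,
    assuming every liveness probe so far has failed (simplified model)."""
    # Probes fire at initial_delay, initial_delay + period, ...
    # The Nth consecutive failure happens on the Nth probe.
    return initial_delay + (failure_threshold - 1) * period


def survives_startup(startup_seconds: int,
                     initial_delay: int = 0,
                     period: int = 10,
                     failure_threshold: int = 3) -> bool:
    """True if the liveness endpoint responds before the threshold is hit."""
    return startup_seconds <= liveness_kill_time(
        initial_delay, period, failure_threshold
    )


if __name__ == "__main__":
    print(liveness_kill_time())    # threshold reached at this second
    print(survives_startup(15))    # fast enough to survive
    print(survives_startup(45))    # restarted before it ever becomes live
```

Under this model, an application whose liveness endpoint needs 45 seconds to respond would be restarted at about the 20 second mark on every attempt, regardless of what the readiness probe reports.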

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Oct 22, 2021
@bclozel
Member

bclozel commented Oct 22, 2021

If I understand correctly, the startupProbe is a way to have a "special case livenessProbe only at startup".

What we're trying to say here is that Spring Boot generally handles what's strictly necessary to get the application live and delays many initialization tasks (like ApplicationRunner instances) after that. The readinessProbe is marked as successful only when all those startup tasks are done. The application is technically live, just handling startup tasks and not receiving traffic until it's fully ready.

In many cases, long running startup tasks are executed after the application is marked as live, so a startupProbe is not strictly necessary.

"Liveness probes do not wait for readiness probes to succeed".

I think this bit means that if your application has a successful readinessProbe and a failed livenessProbe, your application is considered as broken and will be wiped. Spring Boot is perfectly in line with that and I don't think that this section of the documentation states otherwise.

We don't completely rule out startupProbes, we're merely saying that you might not need it. Of course, some applications are handling heavy startup tasks as part of bean lifecycle (not a best practice from my point of view). Doing so ties those tasks to the context refresh phase and thus the time to get the livenessProbe UP. In this case, a startupProbe is probably required if you don't want to extend the period check too much.
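
For applications in that situation, the time a startupProbe allows is roughly `failureThreshold * periodSeconds`, and Kubernetes holds back the liveness and readiness probes until the startup probe succeeds. A minimal sketch of that arithmetic, with hypothetical helper names and an assumed worst-case startup time:

```python
import math


def startup_budget(period: int, failure_threshold: int) -> int:
    """Approximate seconds a startupProbe gives the application to start."""
    return period * failure_threshold


def min_failure_threshold(worst_case_startup_seconds: int,
                          period: int = 10) -> int:
    """Smallest failureThreshold whose budget covers the worst-case startup."""
    return math.ceil(worst_case_startup_seconds / period)


if __name__ == "__main__":
    print(startup_budget(period=10, failure_threshold=30))
    print(min_failure_threshold(95))  # assumed worst case of 95 seconds
```

For example, `failureThreshold: 30` with `periodSeconds: 10` gives the application up to 300 seconds to start, while the liveness probe can keep a short period for detecting failures after startup.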

I'd be happy to improve the documentation - I'd rather not explain k8s internals in our reference documentation, but give general guidance to developers.

@bclozel bclozel added the status: waiting-for-feedback We need additional information before we can continue label Oct 22, 2021
@ichenhe
Author

ichenhe commented Oct 22, 2021

What we're trying to say here is that Spring Boot generally handles what's strictly necessary to get the application live and delays many initialization tasks (like ApplicationRunner instances) after that.

Understood. What you're saying is that Spring usually starts very quickly and the slow tasks are delayed. But there's a use case:

I have several microservices (maybe 5 or more) and I deploy them all at once, so the server may be temporarily overloaded. In this case, startup will be slower than expected, and k8s will then kill them.


"Liveness probes do not wait for readiness probes to succeed. If you want to wait before executing a liveness probe you should use initialDelaySeconds or a startupProbe."

I think you got it wrong. I have now copied this warning in its entirety. It clarifies one thing:

Many people may have misunderstood readiness probes because of the word "readiness". They (including me) believe that liveness will not be checked until the application is ready. For programs that start slowly, this misunderstanding is fatal.

That is why k8s wrote this warning.

But the Actuator documentation deepened my misunderstanding. Now I understand what it really means, but we'd better improve the description. @bclozel

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Oct 22, 2021
@bclozel
Member

bclozel commented Oct 22, 2021

Looking at this table describing the application startup sequence and the probe states during the different phases, or at the ApplicationAvailability section, I'm not sure how we could improve our documentation. Any ideas?

@bclozel bclozel added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Oct 22, 2021
@ichenhe
Author

ichenhe commented Oct 22, 2021

Personally, I think your explanation just now is very good. For example, we can write like this:

Generally speaking, the "startupProbe" is not necessarily needed here as the "readinessProbe" fails until all startup tasks are done, which means Spring will not receive requests until it is ready. But if your application needs a long time to start (not a best practice), please add a "startupProbe" to make sure k8s won't kill it while it is starting.

In this way, we express two views:

  1. There is no need to worry about receiving requests before startup has succeeded. (If I understand it correctly, that's what you want.)
  2. If the startup is slow, a startupProbe is required.

It also prevents misunderstandings like mine.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Oct 22, 2021
@bclozel bclozel modified the milestones: 2.5.7, 2.5.x Oct 22, 2021
@bclozel bclozel added type: documentation A documentation update and removed status: waiting-for-triage An issue we've not yet triaged status: feedback-provided Feedback has been provided labels Oct 22, 2021
@wilkinsona wilkinsona modified the milestones: 2.5.x, 2.6.x May 19, 2022
@stefanocke

stefanocke commented Jun 16, 2022

I would like to mention that a common reason for slow startup (besides heavy load) might be database migration (with a tool like Flyway), since it happens before the actuator endpoints are available at all. Please correct me if I am wrong.

@rohanKanojia

This comment was marked as outdated.

@philwebb

This comment was marked as outdated.
