Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

daemon: reinit health monitor on live-restore #47051

Merged
merged 2 commits into from Jan 16, 2024

Conversation

corhere
Copy link
Contributor

@corhere corhere commented Jan 9, 2024

The container may have been running without health probes for an indeterminate amount of time. The container may have become unhealthy in the interim. We should probe it sooner than in steady-state, while also giving it some leeway to recover from e.g. timed-out connections. This is easy to achieve by probing the container like a freshly-started one. The original author of health-checks came to the same conclusion; the health monitor was reinitialized on live-restored containers before v17.11.0, when health monitoring of live-restored containers was accidentally broken. When the regression was noticed (for v23.0.0, five years after it was broken!) health monitoring of live-restored containers was resumed, not reinitialized. With the new addition of Healthcheck.StartInterval, the original behaviour is even more compelling.

- What I did
Revert to the original behavior of reinitializing the health monitor on live restore so that restored containers' health monitors are reset to the start period.

- How I did it

- How to verify it

- Description for the changelog

  • Containers that are live-restored will now be given another health-check start period when the daemon restarts, giving the container more time to recover from live-restore before the health-retries countdown begins while being health-checked at the start interval.

- A picture of a cute animal (not mandatory but encouraged)

The container may have been running without health probes for an
indeterminate amount of time. The container may have become unhealthy in
the interim. We should probe it sooner than in steady-state, while also
giving it some leeway to recover from e.g. timed-out connections. This
is easy to achieve by probing the container like a freshly-started one.
The original author of health-checks came to the same conclusion; the
health monitor was reinitialized on live-restored containers before
v17.11.0, when health monitoring of live-restored containers was
accidentally broken. Revert to the original behavior.

Signed-off-by: Cory Snider <csnider@mirantis.com>
@corhere corhere added status/1-design-review kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. area/daemon labels Jan 9, 2024
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

but perhaps @cpuguy83 could have a quick peek to double-check if there's any concerns

@cpuguy83
Copy link
Member

Do we have a test for this in live-restore?
A quick search doesn't yield anything:

$ git grep -i 'Live' | grep -i Health
$ git grep -i 'Restore' | grep -i Health
daemon/health.go:361:// Reset the health state for a newly-started, restarted or restored container.

Update the TestDaemonRestartKilContainers integration test to assert
that a container's healthcheck status is always reset to the Starting
state after a daemon restart, even when the container is live-restored.

Signed-off-by: Cory Snider <csnider@mirantis.com>
@corhere
Copy link
Contributor Author

corhere commented Jan 15, 2024

@cpuguy83 an integration test was added as part of #41935. I have now updated that test to also assert that the health monitor is reset to the Starting state when the container is (live-)restored after a daemon restart.

@cpuguy83 cpuguy83 merged commit 353bccd into moby:master Jan 16, 2024
105 checks passed
@corhere corhere deleted the reinit-health-on-live-restore branch January 16, 2024 19:09
@vvoland vvoland added this to the 25.0.0 milestone Jan 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/daemon impact/changelog kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. status/1-design-review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants