Make OOM not be a SIGKILL #40157
At the moment, apps that go over the memory limit are hard-killed ('OOMKilled'), which is bad (losing state / not running cleanup code, etc.).
Is there a way to get SIGTERM instead (with a grace period, or 100m before reaching the limit)?
Comments
@kubernetes/sig-node-feature-requests
It is not possible to change the OOM behavior currently. Kubernetes (or the runtime) could send your container a signal whenever it is close to its memory limit. This would be best-effort, though, because memory spikes might not be handled in time.
FYI, using this crutch atm: https://github.com/grosser/preoomkiller. Any idea what would need to change to make the OOM behavior configurable?
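For anyone curious, the crutch boils down to wrapping the entrypoint in a watcher that polls cgroup memory usage and sends SIGTERM near the limit. A minimal Go sketch of the same idea (preoomkiller itself is a small script; the cgroup v2 paths and the 90% threshold here are assumptions):

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
	"syscall"
	"time"
)

// readUint reads a single integer from a cgroup file such as memory.current.
func readUint(path string) (uint64, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(b)), 10, 64)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: preoomkiller <command> [args...]")
		os.Exit(2)
	}
	// Wrap the container's real entrypoint as a child process.
	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Start(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	go func() {
		// memory.max holds "max" when unlimited; ParseUint fails and we bail.
		limit, err := readUint("/sys/fs/cgroup/memory.max")
		if err != nil {
			return
		}
		for range time.Tick(time.Second) {
			usage, err := readUint("/sys/fs/cgroup/memory.current")
			if err == nil && float64(usage) > 0.9*float64(limit) {
				// Give the app a chance to shut down before the kernel SIGKILLs it.
				cmd.Process.Signal(syscall.SIGTERM)
				return
			}
		}
	}()

	cmd.Wait()
}
```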
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with an /lifecycle frozen comment. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close
Was this meant to be closed? It seems like @yujuhong meant to say /remove-lifecycle rotten?
/remove-lifecycle rotten
When the node is reaching OOM levels, I can understand some SIGKILLs happening, but when a pod reaches its manually set resource limit it also gets a SIGKILL. As the initial post mentions, this can cause a lot of harm. As a workaround, we're going to try to make the pod unhealthy before it reaches the memory limit, to get a graceful shutdown. If I want this feature created, how should I go about it? Should I provide a PR with code changes, or ping someone to make a proposal?
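For reference, the workaround we have in mind is roughly a liveness endpoint that starts failing once memory usage nears the limit, so the kubelet restarts the container via the normal SIGTERM + grace period path instead of the kernel SIGKILLing it. A sketch (the cgroup v2 paths and the 90% threshold are assumptions):

```go
package main

import (
	"net/http"
	"os"
	"strconv"
	"strings"
)

// readUint reads a single integer from a cgroup file.
func readUint(path string) (uint64, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(b)), 10, 64)
}

func main() {
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		usage, uerr := readUint("/sys/fs/cgroup/memory.current")
		limit, lerr := readUint("/sys/fs/cgroup/memory.max")
		// Report unhealthy above 90% of the limit so the kubelet restarts
		// the container gracefully (SIGTERM + terminationGracePeriodSeconds).
		if uerr == nil && lerr == nil && float64(usage) > 0.9*float64(limit) {
			http.Error(w, "memory nearly exhausted", http.StatusServiceUnavailable)
			return
		}
		w.Write([]byte("ok"))
	})
	http.ListenAndServe(":8080", nil)
}
```

The pod's livenessProbe would then point at /healthz on port 8080.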
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close
This remains an active issue. There appears to be no way to gracefully handle OOMKilled at the moment. @dashpole Can this be re-opened?
/reopen
What makes this really interesting is that sometimes you want the kernel to do the OOM kill when the node is under memory pressure, while other times it would be nice if Kubernetes would preemptively send SIGTERM before the OOM kill because the container is approaching a set limit.
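Worth noting that cgroup v2 already has a knob in roughly this direction: memory.high makes the kernel throttle and reclaim a workload that crosses it, while memory.max stays the hard OOM-kill boundary. A sketch of what setting it could look like (the cgroup path and the 90% factor are assumptions):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	cg := "/sys/fs/cgroup/mypod" // hypothetical cgroup of the container
	raw, err := os.ReadFile(cg + "/memory.max")
	if err != nil {
		panic(err)
	}
	// memory.max holds "max" when unlimited; skip in that case.
	max, err := strconv.ParseUint(strings.TrimSpace(string(raw)), 10, 64)
	if err != nil {
		return
	}
	// Ask the kernel to throttle/reclaim at 90% of the hard limit instead
	// of letting the workload run straight into the OOM killer.
	high := max * 9 / 10
	if err := os.WriteFile(cg+"/memory.high", []byte(fmt.Sprint(high)), 0644); err != nil {
		panic(err)
	}
}
```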
Even better would be for OOM to cause the call to brk to fail (i.e., malloc returning NULL), allowing the application to gracefully handle running out of memory the normal way, instead of the process being killed when it goes over.
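Few runtimes can actually recover from a failed allocation, but some expose a cooperative version of this idea. As an illustration (not a Kubernetes mechanism), Go 1.19+ lets the application set a soft memory limit on itself; reading the cgroup limit and leaving headroom below it is an assumption of this sketch:

```go
package main

import (
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

func main() {
	// If the container's hard limit is readable, set Go's soft memory
	// limit 10% below it so the runtime GCs aggressively near the limit
	// instead of the kernel OOM-killing the process at it.
	if b, err := os.ReadFile("/sys/fs/cgroup/memory.max"); err == nil {
		if max, err := strconv.ParseInt(strings.TrimSpace(string(b)), 10, 64); err == nil {
			debug.SetMemoryLimit(max * 9 / 10)
		}
	}
	// ... rest of the application ...
}
```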
It should provide an opportunity to shut down gracefully.
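The application side of that is just the standard SIGTERM handler containers already use for normal pod termination; the missing piece is getting the signal at all before the OOM SIGKILL. For completeness, a sketch:

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	term := make(chan os.Signal, 1)
	signal.Notify(term, syscall.SIGTERM)

	go doWork()

	// Block until SIGTERM, then clean up instead of dying mid-write.
	<-term
	fmt.Println("SIGTERM received: flushing state and running cleanup")
	// ... persist state, drain connections, etc. ...
	os.Exit(0)
}

// doWork stands in for the real workload.
func doWork() { select {} }
```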
For reasons outlined in #40157 (comment), we can't just change the signal delivered. OTOH, we can integrate with some OOM daemons, but this would require a separate discussion and a KEP.
Kubernetes does not use issues on this repo for support requests. If you have a question on how to use Kubernetes or to debug a specific issue, please visit our forums. /remove-kind feature
@fromanirh: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@fromanirh this is not a support request, it's a legit feature request.
@fromanirh Can you reopen this, please? It's clearly a feature request, not a support case.
It is a feature request for the kernel, not for Kubernetes; the kernel generates the SIGKILL.
@aojea there are non-kernel solutions here, such as triggering graceful shutdowns at a threshold memory usage (e.g. 95%) before the hard limit is reached.
Oh, that is not clear from the title or from the comments, sorry.
So it should be retitled, or a new issue opened with the clear request ... and for sure that will need a KEP.
Sure, how about titling it: "Add graceful memory usage-based SIGTERM before hard OOM kill happens" |
Was another issue created for this? I can't find one in the issues list. If not, I can create a new feature request issue.