node-problem-detector does not discover OOMs in newer Linux kernels #96746

Closed
tosi3k opened this issue Nov 20, 2020 · 2 comments · Fixed by #96716
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability.

Comments

@tosi3k
Member

tosi3k commented Nov 20, 2020

What happened:
In a cluster with the NPD addon installed and nodes running Linux kernel 5.1 or newer, OOMs are not captured by NPD.

What you expected to happen:
NPD should report OOMs by emitting the appropriate K8s events.

How to reproduce it (as minimally and precisely as possible):
Create a K8s cluster with the NPD addon installed and Linux kernel 5.1+ on the nodes. Schedule a pod that leaks memory. After an OOM occurs, list events with the `reason: OOMKilling` field; there are none.
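A minimal sketch of a memory-leaking workload for the reproduction step above, assuming you build it into a container image of your own and schedule it as a pod. The 10 MiB chunk size and the 4 KiB page size are arbitrary illustrative assumptions, not values from this issue. Each chunk gets one byte written per page so the memory is actually resident rather than merely reserved, and every chunk stays referenced so the garbage collector cannot reclaim it. The process keeps growing until the kernel OOM-kills it.

```go
package main

import (
	"fmt"
	"time"
)

// chunkSize is a hypothetical allocation step (10 MiB), chosen only for illustration.
const chunkSize = 10 << 20

// pageSize is assumed to be 4 KiB, the common page size on x86-64 Linux.
const pageSize = 4096

// allocateChunk returns a chunkSize-byte slice with one byte written per page,
// forcing the kernel to back those pages with physical memory.
func allocateChunk() []byte {
	buf := make([]byte, chunkSize)
	for i := 0; i < len(buf); i += pageSize {
		buf[i] = 1
	}
	return buf
}

func main() {
	// held keeps every chunk reachable so the garbage collector cannot free it.
	var held [][]byte
	for {
		held = append(held, allocateChunk())
		fmt.Printf("allocated %d MiB\n", len(held)*chunkSize>>20)
		time.Sleep(100 * time.Millisecond)
	}
}
```

Giving the pod a memory limit makes the OOM kill happen sooner and confines it to the leaking container instead of putting pressure on the whole node.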

Anything else we need to know?:
The fix is already merged into the newest NPD release (v0.8.5): kubernetes/node-problem-detector#481.

We're currently waiting for the NPD v0.8.5 Docker image to be released; we can proceed with fixing this in k/k afterwards. Once the fix is merged into the master branch, it should also be cherry-picked to older releases.

Environment:
Any Kubernetes version with the NPD addon installed and Linux kernel 5.1 or newer as the node OS.

@tosi3k tosi3k added the kind/bug Categorizes issue or PR as related to a bug. label Nov 20, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Nov 20, 2020
@k8s-ci-robot
Contributor

@tosi3k: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Nov 20, 2020
@tosi3k
Member Author

tosi3k commented Nov 20, 2020

/sig scalability
/sig node

I opened a PR with the fix in #96716, but it's on hold because the NPD v0.8.5 Docker image doesn't exist yet.

@k8s-ci-robot k8s-ci-robot added sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 20, 2020