Hey team,
I'm running into an issue with node-problem-detector where the kmsg file descriptor gets closed after the watcher is overloaded with messages:
E1103 12:19:38.657492 1 log_watcher_linux.go:102] Kmsg channel closed
...
E1103 12:19:45.959686 1 log_monitor.go:136] Log channel closed: /config/kernel-monitor.json
sudo journalctl --since "2025-11-03 12:19:37" --until "2025-11-03 12:19:40" | grep -E "error|fail|killed|oom|memory" | wc -l
56477
When the kmsg FD is closed, the pod keeps running without any indication that something is wrong. Is there a way to detect this and restart the container, or some other way to mitigate it?
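One crude stopgap I've been considering (just a sketch, not battle-tested; the namespace and label selector below are assumptions based on a typical install) is an external cron job that greps recent NPD pod logs for the error and deletes the affected pod so the DaemonSet recreates it:

#!/bin/bash
# restart_on_kmsg_close.sh — workaround sketch; namespace/label are assumptions
# Delete any NPD pod whose recent logs show the kmsg watcher has died,
# so the DaemonSet controller recreates it with a fresh /dev/kmsg reader.
NAMESPACE=${NAMESPACE:-kube-system}
SELECTOR=${SELECTOR:-app=node-problem-detector}

for pod in $(kubectl -n "$NAMESPACE" get pods -l "$SELECTOR" -o name); do
  if kubectl -n "$NAMESPACE" logs "$pod" --since=10m | grep -q "Kmsg channel closed"; then
    echo "$(date -Is) restarting $pod: kmsg watcher closed"
    kubectl -n "$NAMESPACE" delete "$pod"
  fi
done

This only treats the symptom, though, so I'd still like a supported way for the container itself to fail or recover when the channel closes.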
Related issues: #1003 and #1004
Kubernetes version: 1.33
Node Problem Detector version: v1.34.0
Steps to reproduce:
Prereqs:
- Kubernetes cluster with node-problem-detector daemonset installed
Steps:
- SSH to one of the nodes in the cluster
- Run the script below
#!/bin/bash
# minimal_fast_flood.sh — flood /dev/kmsg with kernel-log-style messages
COUNT=${1:-100000}     # number of messages to write (default 100000)
UPDATE_EVERY=5000      # progress update interval
# Re-exec under sudo if not already root (writing to /dev/kmsg requires it)
[ "$EUID" -ne 0 ] && exec sudo bash "$0" "$@"
echo "Flooding: $COUNT messages | $(date '+%H:%M:%S')"
START=$(date +%s)
exec 3>/dev/kmsg
for ((i=1; i<=COUNT; i++)); do
  # Mimics a kernel "task blocked" line of the kind the kernel monitor matches
  echo "task docker:abc123 blocked for more than 60 seconds." >&3
  ((i % UPDATE_EVERY == 0)) && echo "  $i/$COUNT"
done
exec 3>&-
ELAPSED=$(($(date +%s) - START))
(( ELAPSED == 0 )) && ELAPSED=1   # avoid division by zero on very fast runs
echo "Done: ${ELAPSED}s ($((COUNT / ELAPSED)) msg/s)"
- Check the node-problem-detector pod logs for the errors shown above (example command below)
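For example (the namespace and label selector here are assumptions; adjust to your deployment):

# Assumes NPD runs as a DaemonSet in kube-system with this label
kubectl -n kube-system logs -l app=node-problem-detector --tail=500 \
  | grep -E "Kmsg channel closed|Log channel closed"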