kmsg FD closing after being flooded with 1000s of messages per second #1168

@arjraman

Description

Hey team,

I'm running into an issue with the node problem detector where the kmsg file descriptor gets closed after being flooded with messages:

E1103 12:19:38.657492 1 log_watcher_linux.go:102] Kmsg channel closed
...
E1103 12:19:45.959686       1 log_monitor.go:136] Log channel closed: /config/kernel-monitor.json
journald shows tens of thousands of matching messages in the surrounding three-second window:

sudo journalctl --since "2025-11-03 12:19:37" --until "2025-11-03 12:19:40" | grep -E "error|fail|killed|oom|memory" | wc -l

56477

When the kmsg FD is closed, the pod continues to run without indication of an issue. Is there some way we can detect this and potentially restart the container? Or is there a way to mitigate this?
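
As a stopgap, something along these lines might detect the condition externally and restart the pod (a rough sketch, not an NPD feature; the kube-system namespace and the app=node-problem-detector label are assumptions about how the daemonset is deployed and should be adjusted):

#!/bin/bash
# restart_npd_on_kmsg_close.sh - rough sketch; run periodically (e.g. from cron).
NAMESPACE=kube-system
SELECTOR=app=node-problem-detector

for pod in $(kubectl -n "$NAMESPACE" get pods -l "$SELECTOR" -o name); do
    # The "Kmsg channel closed" error above is the signal that the kmsg FD is gone.
    if kubectl -n "$NAMESPACE" logs "$pod" --since=10m | grep -q "Kmsg channel closed"; then
        echo "Restarting $pod (kmsg channel closed)"
        kubectl -n "$NAMESPACE" delete "$pod"
    fi
done

Deleting the pod relies on the daemonset controller recreating it on the same node; a built-in health check or self-restart inside NPD would obviously be cleaner.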

Related issues: #1003 and #1004

Kubernetes version: 1.33
Node Problem Detector version: v1.34.0

Steps to reproduce:
Prereqs:

  • Kubernetes cluster with node-problem-detector daemonset installed

Steps:

  • ssh to one of the instances in the cluster
  • Run the script below
#!/bin/bash
# minimal_fast_flood.sh
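# Usage: sudo ./minimal_fast_flood.sh [message_count]   (default: 100000)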

COUNT=${1:-100000}
UPDATE_EVERY=5000

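# Re-exec via sudo if not running as root; writing to /dev/kmsg requires root.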
[ "$EUID" -ne 0 ] && exec sudo bash "$0" "$@"

echo "Flooding: $COUNT messages | $(date '+%H:%M:%S')"
START=$(date +%s)

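# Open /dev/kmsg once on FD 3 so the loop reuses a single writer.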
exec 3>/dev/kmsg
for ((i=1; i<=COUNT; i++)); do
    echo "task docker:abc123 blocked for more than 60 seconds." >&3
    ((i % UPDATE_EVERY == 0)) && echo "  $i/$COUNT"
done
exec 3>&-

ELAPSED=$(($(date +%s) - START))
[ "$ELAPSED" -eq 0 ] && ELAPSED=1   # avoid division by zero for sub-second runs
echo "Done: ${ELAPSED}s ($((COUNT / ELAPSED)) msg/s)"
  • Check the node problem detector pod logs for the "Kmsg channel closed" errors shown above
