I'm testing node-problem-detector and, despite deliberately restarting the kubelet service on one of the nodes, it never reports a FrequentKubeletRestart problem.
I'm deploying node-problem-detector with the Helm chart available at https://github.com/deliveryhero/helm-charts/tree/master/stable/node-problem-detector.
Below is my values.yaml file:
---
settings:
  log_monitors:
    - /config/kernel-monitor.json
    - /config/docker-monitor.json
    - /config/abrt-adaptor.json
    - /config/kernel-monitor-filelog.json
  custom_plugin_monitors:
    - /config/kernel-monitor-counter.json
    - /config/docker-monitor-counter.json
    - /config/systemd-monitor-counter.json
    - /config/health-checker-kubelet.json
    - /custom-config/FrequentKubeletRestartCustom.json
  custom_monitor_definitions:
    FrequentKubeletRestartCustom.json: |
      {
        "plugin": "custom",
        "pluginConfig": {
          "invoke_interval": "5m",
          "timeout": "1m",
          "max_output_length": 80,
          "concurrency": 1
        },
        "source": "systemd-monitor",
        "metricsReporting": true,
        "conditions": [
          {
            "type": "FrequentKubeletRestartCustom",
            "reason": "FrequentKubeletRestartCustom",
            "message": "kubelet is functioning properly"
          }
        ],
        "rules": [
          {
            "type": "permanent",
            "condition": "FrequentKubeletRestartCustom",
            "reason": "FrequentKubeletRestartCustom",
            "path": "/home/kubernetes/bin/log-counter",
            "args": [
              "--journald-source=systemd",
              "--log-path=/var/log/journal",
              "--lookback=20m",
              "--delay=5m",
              "--count=5",
              "--pattern=Started"
            ],
            "timeout": "1m"
          }
        ]
      }
metrics:
  # metrics.enabled -- Expose metrics in Prometheus format with default configuration.
  enabled: true
  serviceMonitor:
    enabled: true
    additionalLabels:
      release: monitoring
resources:
  requests:
    cpu: 20m
    memory: 20Mi
hostNetwork: true
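One detail worth sanity-checking when a log-counter rule stays silent: if I understand log-counter correctly, the "--journald-source" flag is compared against the journal entry's SYSLOG_IDENTIFIER field, and unit-lifecycle messages such as "Started Kubernetes Kubelet Server." are normally emitted by systemd itself. The sketch below uses made-up journal lines in JSON export format (on a real node, feed it from "journalctl -o json -u kubelet" instead) to show how to confirm which identifier the restart message actually carries:

```shell
# Made-up journal entries in `journalctl -o json` export format
# (hypothetical sample data, not taken from the cluster above).
cat <<'EOF' > /tmp/journal-sample.json
{"SYSLOG_IDENTIFIER":"systemd","MESSAGE":"Started Kubernetes Kubelet Server."}
{"SYSLOG_IDENTIFIER":"kubelet","MESSAGE":"Flag --v has been set"}
EOF
# Pull the identifier of the "Started ..." line; this value is what the
# rule's --journald-source flag has to match.
grep '"MESSAGE":"Started' /tmp/journal-sample.json \
  | grep -o '"SYSLOG_IDENTIFIER":"[^"]*"'
```

In this sample the identifier is "systemd", which agrees with the "--journald-source=systemd" flag in the rule above; running the same pipeline against the real journal confirms whether that holds on the node.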
It's important to mention that the default systemd-monitor-counter.json configuration didn't work, so I created a custom configuration, FrequentKubeletRestartCustom.json, as shown above, but I didn't get any results either. I also tried modifying the "--pattern" flag in several ways without success.
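As a quick local check on candidate "--pattern" values (assuming the flag is treated as a regular expression matched against each journal message, as in node-problem-detector's other monitors), grep -E can approximate which lines would be counted before redeploying the DaemonSet:

```shell
# Hypothetical message as it appears in the journal when kubelet restarts.
msg='Started Kubernetes Kubelet Server.'
# Count how many of the candidate patterns would match it; the patterns
# here are illustrative guesses, not values taken from any official config.
hits=0
for pattern in 'Started' 'Started Kubernetes Kubelet Server' 'kubelet exited'; do
  if echo "$msg" | grep -qE "$pattern"; then
    hits=$((hits + 1))
    echo "matched: $pattern"
  fi
done
echo "total matches: $hits"   # -> total matches: 2
```

This only tests the matching step, of course; it says nothing about whether the monitor can read the journal in the first place.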
Below is some information about the test performed on one of the nodes:
- I accessed the node via SSH and restarted the kubelet repeatedly.
- I watched the log with "journalctl -u kubelet -f | grep "Started"" and noticed that the message "Started Kubernetes Kubelet Server." always appeared when I restarted the kubelet.
- I ran "kubectl describe nodes" to check whether the FrequentKubeletRestartCustom or FrequentKubeletRestart condition changed from False to True, but the status always remained False, as if the kubelet had not been restarted.
- The logs also didn't show any information at the time I restarted the kubelet.
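One environmental detail the rule depends on (an assumption about the node, not something the chart controls): "--log-path=/var/log/journal" only works when journald storage is persistent. On distros with volatile journald storage, the journal lives under /run/log/journal instead, and a counter pointed at an empty /var/log/journal would never see the restarts. A sketch of the check, simulated here with temporary directories (on the node itself, test /var/log/journal and /run/log/journal directly):

```shell
# Simulated node layout (made-up paths): only the "volatile" location exists.
mkdir -p /tmp/demo-node/run/log/journal
# The same loop, pointed at the real paths, shows where journald writes.
for d in /tmp/demo-node/var/log/journal /tmp/demo-node/run/log/journal; do
  if [ -d "$d" ]; then
    echo "journal directory present: $d"
  fi
done
```

In the simulation only /tmp/demo-node/run/log/journal is reported; if the real node behaves the same way, the rule's "--log-path" would need to point at /run/log/journal (and that path would need to be mounted into the node-problem-detector pod).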
