Monitoring performed by systemd-monitor-counter.json file does not work for type FrequentKubeletRestart. #786

@caique-franca

I'm testing node-problem-detector. Although I deliberately restarted the kubelet service several times on one of the nodes, node-problem-detector does not report a FrequentKubeletRestart problem.

I'm using the helm chart available at https://github.com/deliveryhero/helm-charts/tree/master/stable/node-problem-detector to deploy the node-problem-detector.

Below is my values.yaml file:

---
settings:
  log_monitors:
    - /config/kernel-monitor.json
    - /config/docker-monitor.json
    - /config/abrt-adaptor.json
    - /config/kernel-monitor-filelog.json
  custom_plugin_monitors:
    - /config/kernel-monitor-counter.json
    - /config/docker-monitor-counter.json
    - /config/systemd-monitor-counter.json
    - /config/health-checker-kubelet.json
    - /custom-config/FrequentKubeletRestartCustom.json
  custom_monitor_definitions:
    FrequentKubeletRestartCustom.json: |
      {
        "plugin": "custom",
        "pluginConfig": {
          "invoke_interval": "5m",
          "timeout": "1m",
          "max_output_length": 80,
          "concurrency": 1
        },
        "source": "systemd-monitor",
        "metricsReporting": true,
        "conditions": [
          {
            "type": "FrequentKubeletRestartCustom",
            "reason": "FrequentKubeletRestartCustom",
            "message": "kubelet is functioning properly"
          }
        ],
        "rules": [
          {
            "type": "permanent",
            "condition": "FrequentKubeletRestartCustom",
            "reason": "FrequentKubeletRestartCustom",
            "path": "/home/kubernetes/bin/log-counter",
            "args": [
              "--journald-source=systemd",
              "--log-path=/var/log/journal",
              "--lookback=20m",
              "--delay=5m",
              "--count=5",
              "--pattern=Started"
            ],
            "timeout": "1m"
          }
        ]
      }
metrics:
  # metrics.enabled -- Expose metrics in Prometheus format with default configuration.
  enabled: true
  serviceMonitor:
    enabled: true
    additionalLabels: {
      release: monitoring
    }
resources:
  requests:
    cpu: 20m
    memory: 20Mi
hostNetwork: true

Note that the default configuration in systemd-monitor-counter.json did not work, so I created a custom configuration, the FrequentKubeletRestartCustom.json file shown above, but it produced no results either. I also tried several variations of the "--pattern" flag without success.
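To make the intent of the "--lookback", "--count" and "--pattern" flags concrete, here is a minimal Python sketch of the check they appear to describe: count the log messages inside the lookback window that match the pattern, and report a problem once the count reaches the threshold. This is an illustration, not the log-counter implementation. The function names, the sample timestamps, the "match anywhere in the message" behavior and the ">=" comparison are all assumptions, and the "--delay" flag is not modeled.

```python
import re
from datetime import datetime, timedelta

def count_matches(entries, pattern, lookback, now):
    """Count (timestamp, message) entries inside the lookback window that match pattern."""
    regex = re.compile(pattern)
    cutoff = now - lookback
    # Assumption: the pattern may match anywhere in the message.
    return sum(1 for ts, msg in entries if ts >= cutoff and regex.search(msg))

def threshold_met(entries, pattern, lookback, count, now):
    """Return True when the number of matches reaches the threshold (assumed >=)."""
    return count_matches(entries, pattern, lookback, now) >= count

# Hypothetical data: six kubelet starts, five of them within the last 20 minutes.
now = datetime(2023, 1, 1, 12, 0, 0)
entries = [
    (now - timedelta(minutes=m), "Started Kubernetes Kubelet Server.")
    for m in (1, 3, 6, 9, 12, 30)
]

# Mirrors --lookback=20m, --count=5, --pattern=Started from the config above.
print(threshold_met(entries, "Started", timedelta(minutes=20), 5, now))
```

Under these assumptions, five restarts within 20 minutes would be expected to flip the condition, which is the behavior I was trying to trigger.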

Below is some information about the test performed on one of the nodes:

  1. I accessed the node via ssh and repeatedly ran the kubelet restart command.
     [image]

  2. I watched the log with the command "journalctl -u kubelet -f | grep 'Started'" and saw that the message "Started Kubernetes Kubelet Server." appeared every time I restarted the kubelet.
     [image]

  3. I ran the command "kubectl describe nodes" to check whether the condition types FrequentKubeletRestartCustom or FrequentKubeletRestart changed status from false to true. The status always remained false, as if the kubelet had not been restarted.
     [image]

  4. The logs also showed nothing at the time I restarted the kubelet.
     [image]
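One quick sanity check on the steps above is to test candidate patterns against the exact message seen in the journal. The sketch below uses Python's re module as a stand-in for whatever regular expression engine the log counter uses, which is an assumption. The second pattern is a hypothetical example of a case-mismatched pattern and is not taken from any shipped configuration.

```python
import re

# Message observed in the journal when the kubelet restarts.
observed = "Started Kubernetes Kubelet Server."

# First pattern is the one from the custom config; second is hypothetical.
candidates = ["Started", "Started Kubernetes kubelet"]

# Record whether each pattern matches anywhere in the observed message.
results = {p: bool(re.search(p, observed)) for p in candidates}

for pattern, matched in results.items():
    print(f"{pattern!r}: {matched}")
```

The pattern "Started" matches the observed message, so the pattern alone does not explain why the condition never changes. A pattern that spells "kubelet" in lowercase would not match, because the comparison is case-sensitive.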

Labels: lifecycle/rotten (denotes an issue or PR that has aged beyond stale and will be auto-closed)
