node-problem-detector aims to make various node problems visible to the upstream layers in the cluster management stack. It is a daemon that runs on each node, detects node problems, and reports them to the apiserver. node-problem-detector can run either as a DaemonSet or standalone. It currently runs as a Kubernetes add-on, enabled by default in GCE clusters.
Many node problems can affect the pods running on a node, for example:
- Infrastructure daemon issues: ntp service down;
- Hardware issues: bad CPU, memory or disk;
- Kernel issues: kernel deadlock, corrupted file system;
- Container runtime issues: unresponsive runtime daemon.
Currently these problems are invisible to the upstream layers in the cluster management stack, so Kubernetes will continue scheduling pods to the bad nodes.
To solve this problem, we introduced the node-problem-detector daemon to collect node problems from various daemons and make them visible to the upstream layers. Once the upstream layers have visibility into these problems, we can discuss a remedy system.
node-problem-detector uses Event and NodeCondition to report problems to the apiserver.
- NodeCondition: a permanent problem that makes the node unavailable for pods should be reported as a NodeCondition.
- Event: a temporary problem that has limited impact on pods but is informative should be reported as an Event.
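The reporting rule above can be sketched in a few lines of Go. This is a minimal sketch under stated assumptions: Problem, Severity and reportKind are hypothetical names, not node-problem-detector's real types, and the sketch only encodes the temporary/permanent split described above.

```go
// Hypothetical sketch: Problem, Severity and reportKind are illustrative
// names, not node-problem-detector's real types.
package main

import "fmt"

// Severity says how long-lived a detected problem is.
type Severity int

const (
	// Temporary problems have limited impact on pods but are informative.
	Temporary Severity = iota
	// Permanent problems make the node unavailable for pods.
	Permanent
)

// Problem is one detected node problem.
type Problem struct {
	Reason   string
	Severity Severity
}

// reportKind returns the API object a problem is reported as:
// temporary problems become Events, permanent ones NodeConditions.
func reportKind(p Problem) string {
	if p.Severity == Permanent {
		return "NodeCondition"
	}
	return "Event"
}

func main() {
	problems := []Problem{
		{Reason: "KernelOops", Severity: Temporary},
		{Reason: "DockerHung", Severity: Permanent},
	}
	for _, p := range problems {
		fmt.Printf("%s -> %s\n", p.Reason, reportKind(p))
	}
}
```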
A problem daemon is a sub-daemon of node-problem-detector. It monitors a specific kind of node problem and reports it to node-problem-detector.
A problem daemon could be:
- A tiny daemon designed for a dedicated Kubernetes use case.
- An existing node health monitoring daemon integrated with node-problem-detector.
Currently, each problem daemon runs as a goroutine in the node-problem-detector binary. In the future, we'll separate node-problem-detector and the problem daemons into different containers and compose them with a pod specification.
Each category of problem daemon can be disabled at compilation time by setting the corresponding build tag. If a category is disabled at compilation time, all of its build dependencies, global variables and background goroutines are trimmed out of the compiled executable.
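The build-tag mechanism is standard Go: a file whose build constraint excludes it is never compiled, so anything it registers in init() disappears too. The sketch below assumes a hypothetical registry pattern (register, enabledDaemons) to illustrate that; it is not node-problem-detector's actual source layout. It uses the //go:build syntax, which needs Go 1.17 or later.

```go
//go:build !disable_custom_plugin_monitor

// Hypothetical sketch of build-tag-gated problem daemon registration.
package main

import (
	"fmt"
	"sort"
)

// registry maps a problem daemon name to a hypothetical constructor.
var registry = map[string]func() string{}

// register records a daemon; plugin files call it from init().
func register(name string, ctor func() string) {
	registry[name] = ctor
}

// In a real multi-file package this init() would sit in its own file whose
// build constraint is the one at the top of this block, so building with
// -tags disable_custom_plugin_monitor drops both the file and the registration.
func init() {
	register("custom-plugin-monitor", func() string { return "custom plugin monitor started" })
}

// enabledDaemons lists the registered daemon names in sorted order.
func enabledDaemons() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	for _, name := range enabledDaemons() {
		fmt.Println(registry[name]())
	}
}
```

Building with go build -tags disable_custom_plugin_monitor excludes such a file entirely, so in a multi-file package the daemon's registration and globals would vanish from the binary.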
List of supported problem daemons:
| Problem Daemon | NodeCondition | Description | Disabling Build Tag |
|---|---|---|---|
| KernelMonitor | KernelDeadlock | A system log monitor that monitors the kernel log and reports problems and metrics according to predefined rules. | disable_system_log_monitor |
| AbrtAdaptor | None | Monitors ABRT log messages and reports them further. ABRT (Automatic Bug Report Tool) is a health monitoring daemon able to catch kernel problems as well as various kinds of application crashes occurring on the host. For more information, visit the link. | disable_system_log_monitor |
| CustomPluginMonitor | On-demand (according to the user's configuration) | A custom plugin monitor for node-problem-detector to invoke and check various node problems with user-defined check scripts. See proposal here. | disable_custom_plugin_monitor |
| SystemStatsMonitor | None (could be added in the future) | A system stats monitor for node-problem-detector to collect various health-related system stats as metrics. See proposal here. | disable_system_stats_monitor |
An exporter is a component of node-problem-detector that reports node problems and/or metrics to a certain back end. Some exporters can be disabled at compile time using a build tag. List of supported exporters:
| Exporter | Description | Disabling Build Tag |
|---|---|---|
| Kubernetes exporter | Reports node problems to the Kubernetes API server: temporary problems are reported as Events, and permanent problems as Node Conditions. | |
| Prometheus exporter | Reports node problems and metrics locally as Prometheus metrics. | |
| Stackdriver exporter | Reports node problems and metrics to the Stackdriver Monitoring API. | disable_stackdriver_exporter |
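To make the exporter idea concrete, here is a minimal sketch of a fan-out in which every problem report is handed to each configured exporter. Status, Exporter and the two toy exporters are hypothetical, simplified stand-ins; the real exporters talk to the Kubernetes API server, Prometheus or Stackdriver.

```go
// Hypothetical sketch: Status, Exporter and the toy exporters are
// simplified stand-ins, not node-problem-detector's real types.
package main

import "fmt"

// Status is a simplified problem report produced by a problem daemon.
type Status struct {
	Source string
	Reason string
}

// Exporter forwards statuses to one back end.
type Exporter interface {
	ExportProblems(s *Status)
}

// logExporter prints each status, standing in for e.g. a Kubernetes exporter.
type logExporter struct{ prefix string }

func (e logExporter) ExportProblems(s *Status) {
	fmt.Printf("%s: %s/%s\n", e.prefix, s.Source, s.Reason)
}

// countingExporter counts reports per reason, standing in for a metrics back end.
type countingExporter struct{ counts map[string]int }

func (e *countingExporter) ExportProblems(s *Status) { e.counts[s.Reason]++ }

// fanOut hands every status from the channel to every exporter.
func fanOut(statuses <-chan *Status, exporters []Exporter) {
	for s := range statuses {
		for _, e := range exporters {
			e.ExportProblems(s)
		}
	}
}

func main() {
	counter := &countingExporter{counts: map[string]int{}}
	ch := make(chan *Status, 2)
	ch <- &Status{Source: "kernel-monitor", Reason: "KernelOops"}
	ch <- &Status{Source: "kernel-monitor", Reason: "KernelOops"}
	close(ch)
	fanOut(ch, []Exporter{logExporter{prefix: "k8s"}, counter})
	fmt.Println("KernelOops reports:", counter.counts["KernelOops"])
}
```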
--version: Print current version of node-problem-detector.
--hostname-override: A customized node name used by node-problem-detector to update conditions and emit events. node-problem-detector gets the node name first from --hostname-override, then from the NODE_NAME environment variable, and finally falls back to os.Hostname.
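The node-name lookup order can be expressed as a small Go helper. This sketch assumes the order flag value, then NODE_NAME, then the OS host name; resolveNodeName is a hypothetical helper, and passing getenv and hostname in as functions keeps it testable.

```go
// Hypothetical sketch of the node-name lookup order; resolveNodeName is
// an illustrative helper, not node-problem-detector's code.
package main

import (
	"fmt"
	"os"
)

// resolveNodeName returns the first non-empty value from: the
// --hostname-override flag, the NODE_NAME environment variable, and the
// host name reported by the OS.
func resolveNodeName(override string, getenv func(string) string, hostname func() (string, error)) (string, error) {
	if override != "" {
		return override, nil
	}
	if name := getenv("NODE_NAME"); name != "" {
		return name, nil
	}
	return hostname()
}

func main() {
	name, err := resolveNodeName("", os.Getenv, os.Hostname)
	if err != nil {
		fmt.Println("cannot determine node name:", err)
		return
	}
	fmt.Println("node name:", name)
}
```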
For System Log Monitor
--config.system-log-monitor: List of paths to system log monitor configuration files, comma separated, e.g. config/kernel-monitor.json. Node problem detector will start a separate log monitor for each configuration. You can use different log monitors to monitor different system logs.
For System Stats Monitor
--config.system-stats-monitor: List of paths to system stats monitor config files, comma separated, e.g. config/system-stats-monitor.json. Node problem detector will start a separate system stats monitor for each configuration. You can use different system stats monitors to monitor different problem-related system stats.
For Custom Plugin Monitor
--config.custom-plugin-monitor: List of paths to custom plugin monitor config files, comma separated, e.g. config/custom-plugin-monitor.json. Node problem detector will start a separate custom plugin monitor for each configuration. You can use different custom plugin monitors to monitor different node problems.
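All three --config.* flags above share one pattern: split a comma-separated list and start one monitor per file. A minimal sketch follows; splitConfigPaths and startMonitors are hypothetical helpers, the whitespace trimming is an assumption of this sketch, and config/my-log-monitor.json is a made-up path.

```go
// Hypothetical sketch: one monitor per configuration file. The helpers
// neither read the files nor start real monitors.
package main

import (
	"fmt"
	"strings"
)

// splitConfigPaths turns a comma-separated flag value into a list of paths,
// dropping empty entries and surrounding whitespace.
func splitConfigPaths(flagValue string) []string {
	var paths []string
	for _, p := range strings.Split(flagValue, ",") {
		if p = strings.TrimSpace(p); p != "" {
			paths = append(paths, p)
		}
	}
	return paths
}

// startMonitors creates one (pretend) monitor per configuration file.
func startMonitors(kind, flagValue string) []string {
	var monitors []string
	for _, path := range splitConfigPaths(flagValue) {
		monitors = append(monitors, fmt.Sprintf("%s(%s)", kind, path))
	}
	return monitors
}

func main() {
	for _, m := range startMonitors("system-log-monitor", "config/kernel-monitor.json,config/my-log-monitor.json") {
		fmt.Println("started", m)
	}
}
```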
For Kubernetes exporter
--enable-k8s-exporter: Enables reporting to the Kubernetes API server, defaults to true.
--apiserver-override: A URI parameter used to customize how node-problem-detector connects to the apiserver. This is ignored if --enable-k8s-exporter is false. The format is the same as the source flag of Heapster. For example, to run without auth, use the following config:
http://APISERVER_IP:APISERVER_PORT?inClusterConfig=false
Refer to the Heapster docs for a complete list of available options.
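Since an --apiserver-override value is an ordinary URI, Go's net/url package is enough to see how its parts fit together. This is a sketch under assumptions: parseOverride is a hypothetical helper, it only looks at inClusterConfig, and treating a missing inClusterConfig as true is an assumption rather than documented behavior.

```go
// Hypothetical sketch: parseOverride only illustrates the shape of an
// --apiserver-override value; it is not how node-problem-detector or
// Heapster parse it.
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// override holds the parts of an --apiserver-override value used here.
type override struct {
	Host            string // host:port of the API server
	InClusterConfig bool   // whether to use the in-cluster service account config
}

func parseOverride(raw string) (override, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return override{}, err
	}
	// Assumption of this sketch: inClusterConfig defaults to true when absent.
	o := override{Host: u.Host, InClusterConfig: true}
	if v := u.Query().Get("inClusterConfig"); v != "" {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return override{}, fmt.Errorf("bad inClusterConfig %q: %w", v, err)
		}
		o.InClusterConfig = b
	}
	return o, nil
}

func main() {
	o, err := parseOverride("http://127.0.0.1:8080?inClusterConfig=false")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", o)
}
```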
--address: The address to bind the node-problem-detector server to.
--port: The port to bind the node-problem-detector server to. Use 0 to disable.
For Prometheus exporter
--prometheus-address: The address to bind the Prometheus scrape endpoint to, defaults to 127.0.0.1.
--prometheus-port: The port to bind the Prometheus scrape endpoint to, defaults to 20257. Use 0 to disable.
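Both --port and --prometheus-port use 0 to mean "disabled". The hypothetical listenAddr helper below sketches that rule with net.JoinHostPort, which also brackets IPv6 addresses correctly; it is not node-problem-detector's code.

```go
// Hypothetical sketch: listenAddr is an illustrative helper for the
// "Use 0 to disable" rule of --port and --prometheus-port.
package main

import (
	"fmt"
	"net"
	"strconv"
)

// listenAddr returns the host:port to listen on and whether the server is
// enabled at all; a port of 0 disables it.
func listenAddr(address string, port int) (string, bool) {
	if port == 0 {
		return "", false
	}
	return net.JoinHostPort(address, strconv.Itoa(port)), true
}

func main() {
	if addr, ok := listenAddr("127.0.0.1", 20257); ok {
		fmt.Printf("Prometheus metrics at http://%s/metrics\n", addr)
	}
	if _, ok := listenAddr("127.0.0.1", 0); !ok {
		fmt.Println("server disabled")
	}
}
```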
For Stackdriver exporter
--exporter.stackdriver: Path to a Stackdriver exporter config file, e.g. config/exporter/stackdriver-exporter.json, defaults to an empty string. Set to an empty string to disable.
--system-log-monitors: List of paths to system log monitor config files, comma separated. This option is deprecated, replaced by --config.system-log-monitor, and will be removed. NPD will panic if both --system-log-monitors and --config.system-log-monitor are set.
--custom-plugin-monitors: List of paths to custom plugin monitor config files, comma separated. This option is deprecated, replaced by --config.custom-plugin-monitor, and will be removed. NPD will panic if both --custom-plugin-monitors and --config.custom-plugin-monitor are set.
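Go's standard flag package can tell which flags were set explicitly via FlagSet.Visit, which is enough to detect a deprecated flag combined with its replacement. In this hypothetical sketch, checkDeprecated returns an error so that it can be tested, whereas NPD itself panics; only the system log monitor pair is wired up.

```go
// Hypothetical sketch using the standard flag package: setting a deprecated
// flag together with its replacement is rejected. NPD panics in that case;
// this sketch returns an error instead.
package main

import (
	"flag"
	"fmt"
	"os"
)

// checkDeprecated reports an error if both the old and the new flag were
// set explicitly on the command line.
func checkDeprecated(fs *flag.FlagSet, oldName, newName string) error {
	set := map[string]bool{}
	fs.Visit(func(f *flag.Flag) { set[f.Name] = true }) // visits only flags that were set
	if set[oldName] && set[newName] {
		return fmt.Errorf("--%s is deprecated; do not combine it with --%s", oldName, newName)
	}
	return nil
}

// parse registers the two flags on a fresh FlagSet and validates args.
func parse(args []string) error {
	fs := flag.NewFlagSet("npd-sketch", flag.ContinueOnError)
	fs.String("system-log-monitors", "", "deprecated: use --config.system-log-monitor")
	fs.String("config.system-log-monitor", "", "comma-separated system log monitor configs")
	if err := fs.Parse(args); err != nil {
		return err
	}
	return checkDeprecated(fs, "system-log-monitors", "config.system-log-monitor")
}

func main() {
	if err := parse(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```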
git clone the node-problem-detector repo into $GOPATH/src/k8s.io using one of the following commands:
cd $GOPATH/src/k8s.io && git clone git@github.com:kubernetes/node-problem-detector.git
cd $GOPATH/src/k8s.io && go get k8s.io/node-problem-detector
Run make in the top directory. It will:
- Build the binary.
- Build the docker image. The binary and config/ are copied into the docker image.
If you do not need certain categories of problem daemons, you can disable them at compilation time. This is the best way to keep your node-problem-detector runtime compact, without unnecessary code (e.g. global variables, goroutines, etc.). You can do so by setting the BUILD_TAGS environment variable before running make. For example:
BUILD_TAGS="disable_custom_plugin_monitor disable_system_stats_monitor" make
The above command compiles node-problem-detector without Custom Plugin Monitor and System Stats Monitor. Check out the Problem Daemon section to see how to disable each problem daemon at compilation time.
By default, node-problem-detector is built with systemd support by the make command. This requires the systemd development files.
You should download the systemd development files first. For Ubuntu, the libsystemd-journal-dev package should be installed. For Debian, the libsystemd-dev package should be installed.
make push uploads the docker image to a registry. By default, the image is uploaded to staging-k8s.gcr.io. It's easy to modify the Makefile to push the image to another registry.
To install node-problem-detector with Helm:
helm install stable/node-problem-detector
Alternatively, to install node-problem-detector manually:
Edit node-problem-detector.yaml to fit your environment. Set the log volume to your system log directory (used by SystemLogMonitor). You can use a ConfigMap to overwrite the config directory inside the pod.
Edit node-problem-detector-config.yaml to configure node-problem-detector.
Create the ConfigMap with kubectl create -f node-problem-detector-config.yaml.
Create the DaemonSet with kubectl create -f node-problem-detector.yaml.
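To run node-problem-detector standalone, you should set inClusterConfig to false and teach node-problem-detector how to access the apiserver with --apiserver-override.
To run node-problem-detector standalone with an insecure apiserver connection:
node-problem-detector --apiserver-override=http://APISERVER_IP:APISERVER_INSECURE_PORT?inClusterConfig=false
For more scenarios, see here.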
Try It Out
You can try node-problem-detector in a running cluster by injecting messages into the logs that node-problem-detector is watching. For example, let's assume node-problem-detector is using KernelMonitor. On your workstation, run kubectl get events -w. On the node, run sudo sh -c "echo 'kernel: BUG: unable to handle kernel NULL pointer dereference at TESTING' >> /dev/kmsg". Then you should see the KernelOops event.
When adding new rules or developing node-problem-detector, it is probably easier to test it on your local workstation in standalone mode. For the API server, an easy way is to use kubectl proxy to make a running cluster's API server available locally. You will get some errors because your local workstation is not recognized by the API server, but you should still be able to test your new rules.
For example, to test KernelMonitor rules:
- make (build node-problem-detector locally)
- kubectl proxy --port=8080 (make a running cluster's API server available locally)
- Update KernelMonitor's logPath to your local kernel log directory. For example, on some Linux systems, it is
- ./bin/node-problem-detector --logtostderr --apiserver-override=http://127.0.0.1:8080?inClusterConfig=false --config.system-log-monitor=config/kernel-monitor.json --config.system-stats-monitor=config/system-stats-monitor.json --port=20256 --prometheus-port=20257 (or point to any API server address:port and Prometheus port)
- sudo sh -c "echo 'kernel: BUG: unable to handle kernel NULL pointer dereference at TESTING' >> /dev/kmsg"
- You can see the KernelOops event in the node-problem-detector log.
- sudo sh -c "echo 'kernel: INFO: task docker:20744 blocked for more than 120 seconds.' >> /dev/kmsg"
- You can see the DockerHung event and condition in the node-problem-detector log.
- You can see the DockerHung condition at http://127.0.0.1:20256/conditions.
- You can see disk related system metrics in Prometheus format at http://127.0.0.1:20257/metrics.
- You can see more rule examples under test/kernel_log_generator/problems.
- For KernelMonitor message injection, all messages should have the kernel: prefix (note that there is a space after the :), or use generator.sh.
- To inject other logs into journald, such as systemd logs, use echo 'Some systemd message' | systemd-cat -t systemd.
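The injection steps work because each monitor matches log lines against rules. The sketch below shows that idea with Go's regexp package. The Rule type, the prefix handling and both patterns are assumptions modeled on the KernelOops and DockerHung examples above and on the KernelDeadlock condition in the problem daemon table; they are not copied from config/kernel-monitor.json, so check that file for the real rule format.

```go
// Hypothetical sketch of pattern-based log rules; the Rule type and the
// patterns are assumptions, not node-problem-detector's real config.
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Rule maps a log pattern to a problem reason. Permanent rules also name the
// NodeCondition they set; temporary rules only produce an event.
type Rule struct {
	Permanent bool
	Condition string
	Reason    string
	Pattern   *regexp.Regexp
}

var rules = []Rule{
	{Reason: "KernelOops",
		Pattern: regexp.MustCompile(`BUG: unable to handle kernel NULL pointer dereference at .*`)},
	{Permanent: true, Condition: "KernelDeadlock", Reason: "DockerHung",
		Pattern: regexp.MustCompile(`task docker:\w+ blocked for more than \w+ seconds\.`)},
}

// match strips the "kernel: " prefix and returns the first rule whose
// pattern matches somewhere in the rest of the line.
func match(line string) (Rule, bool) {
	const prefix = "kernel: "
	if !strings.HasPrefix(line, prefix) {
		return Rule{}, false // this sketch only handles kernel messages
	}
	msg := strings.TrimPrefix(line, prefix)
	for _, r := range rules {
		if r.Pattern.MatchString(msg) {
			return r, true
		}
	}
	return Rule{}, false
}

func main() {
	for _, line := range []string{
		"kernel: BUG: unable to handle kernel NULL pointer dereference at TESTING",
		"kernel: INFO: task docker:20744 blocked for more than 120 seconds.",
	} {
		if r, ok := match(line); ok {
			fmt.Printf("reason=%s permanent=%t condition=%q\n", r.Reason, r.Permanent, r.Condition)
		}
	}
}
```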
node-problem-detector uses go modules to manage dependencies. Therefore, building node-problem-detector requires golang 1.11+. It still uses vendoring; see the Kubernetes go modules KEP for the design decisions. To add a new dependency, update go.mod and run GO111MODULE=on go mod vendor.
A remedy system is one or more processes that attempt to remedy problems detected by node-problem-detector. Remedy systems observe events and/or node conditions emitted by node-problem-detector and take action to return the Kubernetes cluster to a healthy state. The following remedy systems exist:
- Draino automatically drains Kubernetes nodes based on labels and node conditions. Nodes that match all of the supplied labels and any of the supplied node conditions will be prevented from accepting new pods (aka 'cordoned') immediately, and drained after a configurable time. Draino can be used in conjunction with the Cluster Autoscaler to automatically terminate drained nodes. Refer to this issue for an example production use case for Draino.
- Descheduler strategy RemovePodsViolatingNodeTaints evicts pods violating NoSchedule taints on nodes. The k8s scheduler's TaintNodesByCondition feature must be enabled. The Cluster Autoscaler can be used to automatically terminate drained nodes.
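Draino's selection rule (all supplied labels and any supplied condition) can be sketched as a predicate. Node and shouldCordon are hypothetical, the pool label is made up, and a real remedy system would read nodes from the Kubernetes API rather than from a struct literal.

```go
// Hypothetical sketch of a Draino-style selection rule; Node and
// shouldCordon are illustrative, not Draino's actual code.
package main

import "fmt"

// Node is a simplified view of a Kubernetes node.
type Node struct {
	Name       string
	Labels     map[string]string
	Conditions map[string]string // condition type -> status ("True", "False", "Unknown")
}

// shouldCordon returns true when the node carries all required labels and
// at least one of the listed conditions has status "True".
func shouldCordon(n Node, requiredLabels map[string]string, conditions []string) bool {
	for k, v := range requiredLabels {
		if n.Labels[k] != v {
			return false
		}
	}
	for _, c := range conditions {
		if n.Conditions[c] == "True" {
			return true
		}
	}
	return false
}

func main() {
	n := Node{
		Name:       "node-1",
		Labels:     map[string]string{"pool": "default"},
		Conditions: map[string]string{"KernelDeadlock": "True"},
	}
	fmt.Println(n.Name, "cordon:", shouldCordon(n, map[string]string{"pool": "default"}, []string{"KernelDeadlock"}))
}
```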
CI test results can be found below:
Unit tests are run via
See the NPD e2e test documentation for how to set up and run NPD e2e tests.
Problem maker is a program used in NPD e2e tests to generate/simulate node problems. It is ONLY intended to be used by NPD e2e tests. Please do NOT run it on your workstation, as it could cause real node problems.