Now I am running the daemonset on all nodes, but how do I verify it is useful? #106
Comments
First thing you may want to check is ... Currently, kernel problem detection is based purely on the kernel log, so you could inject kernel log entries and see whether NPD takes action accordingly, e.g. the problems in https://github.com/kubernetes/node-problem-detector/tree/master/test/kernel_log_generator/problems. |
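One way to see whether NPD "takes action" after a log injection is to look at the node's conditions. The following is a minimal sketch, assuming `kubectl` is on the PATH and that NPD reports a condition type named `KernelDeadlock` (an assumption here; check your own configuration for the actual condition names). `SAMPLE_NODE` is hand-written for illustration and is not real cluster output.

```python
import json
import subprocess


def fetch_node(node_name):
    """Return the node object as a dict via `kubectl get node <name> -o json`."""
    result = subprocess.run(
        ["kubectl", "get", "node", node_name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def find_condition(node, condition_type):
    """Return the condition dict with the given type, or None if it is absent."""
    for condition in node.get("status", {}).get("conditions", []):
        if condition.get("type") == condition_type:
            return condition
    return None


# Hand-written sample for illustration only; "KernelDeadlock" is an assumed
# condition type, not something confirmed in this thread.
SAMPLE_NODE = {
    "status": {
        "conditions": [
            {"type": "Ready", "status": "True"},
            {"type": "KernelDeadlock", "status": "False"},
        ]
    }
}

condition = find_condition(SAMPLE_NODE, "KernelDeadlock")
print(condition)
```

Against a live cluster you would call `find_condition(fetch_node("NodeName"), "KernelDeadlock")` instead. If the condition type is missing entirely, that may suggest NPD is not reporting to the API server at all, which is a different problem from the log line not matching.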
I tested with Kubernetes 1.6 and injected kernel log entries into /var/log/dmesg, but nothing happened. Am I doing this right? |
@strugglingyouth Did you change the configuration accordingly? https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor-filelog.json You may want to inject the log into /var/log/kern.log, or change the configuration to point to /var/log/dmesg. |
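To confirm which file the monitor is watching before injecting anything, it can help to print the log path and rule patterns from the configuration. This is a rough sketch: the field names `logPath`, `rules`, and `pattern`, and the sample rule itself, are assumptions modelled on the linked configuration file and should be checked against the file you actually deploy.

```python
import json

# Hypothetical excerpt for illustration; field names and the rule are
# assumptions, not copied from a verified configuration file.
SAMPLE_CONFIG = r"""
{
  "plugin": "filelog",
  "logPath": "/var/log/kern.log",
  "rules": [
    {
      "type": "temporary",
      "reason": "UnregisterNetDevice",
      "pattern": "unregister_netdevice: waiting for \\w+ to become free. Usage count = \\d+"
    }
  ]
}
"""


def summarize_monitor_config(text):
    """Return the watched log path and the list of rule patterns."""
    config = json.loads(text)
    return {
        "logPath": config.get("logPath"),
        "patterns": [rule["pattern"] for rule in config.get("rules", [])],
    }


summary = summarize_monitor_config(SAMPLE_CONFIG)
print(summary["logPath"])
for pattern in summary["patterns"]:
    print(pattern)
```

If the printed path is not the file you are appending to, the injected lines will never be read, regardless of their format.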
Or you can also make use of the script https://github.com/kubernetes/node-problem-detector/blob/master/test/kernel_log_generator/generator.sh. |
@Random-Liu I simulated the problem by adding the following line to /var/log/kern.log:
kernel:[534024.040037] unregister_netdevice: waiting for lo to become free. Usage count = 2
I am not sure whether this log line is OK for validating NPD. Is the format right? Does it need a specific style, or a timestamp? After manually modifying kern.log:
A: I ran "kubectl get events | grep Node", but did not see the expected OOM events.
B: I ran "oc describe node NodeName" to view the Conditions and Events sections and check whether the OOM and unregister_netdevice errors were caught, but there were no related Conditions or Events.
Are these the correct steps to check that NPD works? Could you please show the detailed steps to verify it is useful? Thanks in advance! |
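On the format question, a quick local check can show whether a candidate line looks like a syslog-style kernel line before it is appended to the log. The sketch below is not NPD's own parser. The three regular expressions are assumptions: a syslog timestamp at the start of the line, a `kernel: [uptime] message` part, and a rule pattern for the unregister_netdevice message. Compare them with the `pluginConfig` and `rules` in your configuration file.

```python
import re

# All three patterns are assumptions for illustration, not NPD's actual parser.
SYSLOG_TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}")
KERNEL_MESSAGE = re.compile(r"kernel: \[.*\] (.*)")
RULE_PATTERN = re.compile(
    r"unregister_netdevice: waiting for \w+ to become free. Usage count = \d+"
)


def diagnose(line):
    """Return a list of reasons why the line may not be picked up (empty if none)."""
    problems = []
    if not SYSLOG_TIMESTAMP.match(line):
        problems.append("no syslog-style timestamp at start of line")
    message = KERNEL_MESSAGE.search(line)
    if message is None:
        problems.append("no 'kernel: [uptime] message' part")
    elif not RULE_PATTERN.search(message.group(1)):
        problems.append("message does not match the rule pattern")
    return problems


# The line as posted in the comment above.
posted = ("kernel:[534024.040037] unregister_netdevice: "
          "waiting for lo to become free. Usage count = 2")
# A hypothetical variant with a timestamp, a host name, and a space after "kernel:".
variant = ("Jan  1 00:00:00 node1 kernel: [534024.040037] unregister_netdevice: "
           "waiting for lo to become free. Usage count = 2")

print(diagnose(posted))
print(diagnose(variant))
```

Under these assumed patterns, the posted line is flagged for a missing timestamp and for lacking the space after `kernel:`, while the variant passes. Whether NPD enforces the same rules depends on the deployed configuration.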
/reopen
Then I generated a fake journald log:
But I cannot see any output for NPD from ... So is kubectl logs ...
Thanks a lot. |
@weinliu: you can't re-open an issue/PR unless you authored it or you are assigned to it. |
What happens to a node when something goes wrong after the daemonset is created? Is the node set to unschedulable?
Now I am running the daemonset on all nodes, but how do I verify it is useful?
It is a bit difficult to simulate these problems:
Hardware issues: Bad CPU, memory or disk;
Kernel issues: Kernel deadlock, corrupted file system;
Container runtime issues: Unresponsive runtime daemon;
...