status doesn't change when injecting log messages #114
@jdfless Please try prefixing the injected message; kernel log lines generated by the kernel start with `kernel: `.
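A minimal injection sketch (the `BUG:` string matches one of the default rules in `kernel-monitor.json`; the exact command from this comment is an assumption):

```sh
# Inject a fake kernel log line; the "kernel: " prefix makes journald attribute
# it to the kernel source that kernel-monitor.json watches.
sudo sh -c "echo 'kernel: BUG: unable to handle kernel NULL pointer dereference at TESTING' >> /dev/kmsg"
```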
@Random-Liu: I had tried this previously, and again this morning; unfortunately, it has no effect. My exact commands and output:
I can see the heartbeat time continuing to increase, but the status never changes. When I exec into the pod, I see nothing from `journalctl -k`; is that expected? All I see is:
I have verified that `/var/log/journal` is mounted properly at `/log/journal` inside the pod.
So I made some progress when I realized I was mounting `/var/log` into `/log` instead of `/var/log`. The front page of the GitHub docs and https://kubernetes.io/docs/tasks/debug-application-cluster/monitor-node-health/ both list `/log` as the mount path, so it tripped me up. I can now see the tripped condition in the node status, but I didn't expect that the node itself would still report Ready. That seems like an issue to me; should I open a different ticket, or is this expected behavior?
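For illustration, the NPD-owned condition in the node status looks roughly like this, assuming the default `KernelDeadlock` condition from `kernel-monitor.json` (the values shown are invented for the example):

```sh
kubectl get node <node-name> -o json | jq '.status.conditions[] | select(.type == "KernelDeadlock")'
# {
#   "type": "KernelDeadlock",
#   "status": "True",
#   "reason": "DockerHung",
#   "message": "task docker:20744 blocked for more than 120 seconds."
# }
```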
I also still cannot see any output from `journalctl -k` inside the pod; the behavior is as above, even with the mount location fixed. I want to use NPD to change the node status to something other than "Ready" in case of a Docker error that we see. That error is spamming the Docker logs on that node right now, but my custom monitor for Docker is not tripping its condition. Are custom patterns supported for journald logs other than the kernel's in 0.4?
No, it's not.
Sorry, actually you should mount it at `/var/log`. I'll try to send a PR to update the document this week.
Can you run `journalctl` inside the container? Actually, I recommend running NPD as a systemd service, which is what we do in Kubernetes for the Google Container-Optimized Image (GCI). As for running inside a DaemonSet, I think we need a better solution than relying on the systemd library inside the container.
Update: by mounting `/etc/machine-id` into the pod, I can now view the `journalctl` output. Still no luck with my custom trigger, however. Here is my custom docker monitor:
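A minimal sketch of what such a journald-based monitor can look like, modeled on the `kernel-monitor.json` schema; the condition names and pattern are illustrative, not the poster's actual values:

```sh
# Illustrative docker-monitor.json; field layout follows NPD's kernel-monitor.json.
cat <<'EOF' > /config/docker-monitor.json
{
  "plugin": "journald",
  "pluginConfig": {
    "source": "docker"
  },
  "logPath": "/var/log/journal",
  "lookback": "5m",
  "bufferSize": 10,
  "source": "docker-monitor",
  "conditions": [
    {
      "type": "DockerProblem",
      "reason": "DockerIsHealthy",
      "message": "docker daemon is healthy"
    }
  ],
  "rules": [
    {
      "type": "permanent",
      "condition": "DockerProblem",
      "reason": "DockerError",
      "pattern": "some recurring docker error.*"
    }
  ]
}
EOF
```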
And here is the `journalctl` output from inside the pod (this loops, intermingled with actual docker logs):
My systemd version outside the container is 229; inside, it is 231. I can definitely try running NPD as a systemd service.
Does the kernel log monitor work now?
It works as described above, in that I can see the status change in the JSON, but the node is still "Ready", which I didn't expect. I assume that when one of the conditions is tripped, the node itself should no longer report as "Ready"; is that correct?
That is expected behavior. Currently, NPD only updates node conditions it owns, e.g. `KernelDeadlock`. The scheduler won't directly react to the conditions generated by NPD. Ideally, there should be another cluster-level controller that analyzes these conditions and makes decisions, e.g. setting a node taint or restarting the VM. We don't have an open-source version of this, because the repair logic is very environment-dependent, but in GKE we do have auto-repair, which consumes information from NPD.
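To make this concrete, a quick way to see both conditions at once (a sketch; the node name is a placeholder):

```sh
# List every condition type and status on the node. The NPD-owned condition can
# be True while the kubelet-owned Ready condition independently stays True.
kubectl get node <node-name> \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
# KernelDeadlock=True
# Ready=True
```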
Cool. Actually, I'm not quite familiar with the underlying mechanism of journald.
@Random-Liu: I ask for an example because I was surprised that I needed to echo `kernel: ` before the pattern in the kernel logs; I assumed NPD would trigger on anything that matched the pattern and was output by `journalctl -k`. A summary of other things noted/discovered in the course of this ticket:

- The docs say to mount `/var/log` at `/log` in the pod, but mounting it at `/var/log` is what actually works.
- `/etc/machine-id` must be mounted into the pod before `journalctl` produces output inside it.
- A tripped NPD condition does not change the node's overall Ready status.
I've just tried it. The injected message will generate an NPD log:
I started NPD inside a container:
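A plausible shape for this invocation, assuming the v0.4.0 image and the default `kernel-monitor.json` bundled in it (the image path, mounts, and flags are assumptions, not the author's exact command):

```sh
# Run NPD in a container with the journal and machine-id visible to it.
docker run --rm \
  -v /var/log/journal:/var/log/journal \
  -v /etc/machine-id:/etc/machine-id \
  gcr.io/google_containers/node-problem-detector:v0.4.0 \
  /node-problem-detector --system-log-monitors=/config/kernel-monitor.json
```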
Note that the pattern you provide must be able to match to the end of the last line (multi-line matching is supported).
That is interesting. I was finally able to get this to work (I was actually writing something up), but only once I changed `pluginConfig.source` to `"dockerd"`.
After simply changing `pluginConfig.source` from `"docker"` to `"dockerd"`, NPD triggers:
I am running NPD standalone with:
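A minimal sketch of a standalone invocation, assuming flag names from the NPD README (they may differ in 0.4):

```sh
# Point NPD at the apiserver directly instead of using in-cluster config.
/node-problem-detector \
  --system-log-monitors=/config/docker-monitor.json \
  --apiserver-override="http://127.0.0.1:8080?inClusterConfig=false"
```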
I am using Docker version 1.12.6, which logs to journald as `dockerd`; I assume that `pluginConfig.source` must exactly match whatever is logging to journald, not the name of the unit file doing the logging. Could you post a snippet of your config?
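One way to check which identifier the Docker daemon logs under (a verification sketch, not from the original thread):

```sh
# journald stores the identifier in SYSLOG_IDENTIFIER; NPD's journald plugin
# matches pluginConfig.source against it.
journalctl -u docker.service -n 1 -o verbose | grep SYSLOG_IDENTIFIER
# SYSLOG_IDENTIFIER=dockerd
```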
Yeah, that's possible. /cc @ajitak
We use `docker` as the source in our monitor config. We need a separate monitor for each journald source.
Great, thanks for all the help on getting this sorted out. I am going to close this issue now.
I cannot get the node problem detector to change a node status by injecting messages.
I am using Kubernetes 1.5.2, Ubuntu 16.04, kernel 4.4.0-51-generic.
I run NPD as a DaemonSet. I have attempted to get this to work with NPD versions 0.3.0 and 0.4.0. I start NPD with the default command, using /config/kernel-monitor.json, because my nodes use journald.
I have /dev/kmsg mounted into the pod, and I echo expressions matching the regexes in kernel-monitor.json to /dev/kmsg on the node. I can view the fake logs I've echoed to /dev/kmsg from inside the pod.
Steps to reproduce:
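A sketch of the reproduction under the assumptions above (the injected line matches the default `DockerHung` rule in `kernel-monitor.json`; the node name is a placeholder):

```sh
# 1. Deploy NPD as a DaemonSet with /dev/kmsg and /var/log mounted (per the README).
# 2. Inject a message matching a default kernel-monitor.json rule:
sudo sh -c "echo 'kernel: INFO: task docker:20744 blocked for more than 120 seconds.' >> /dev/kmsg"
# 3. Watch the node's conditions for a change:
kubectl get node <node-name> -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
```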
If I am not testing this properly, could you please give a detailed breakdown of how to test that the node problem detector is working properly for both kernel logs AND docker logs?
I have also reproduced this behavior using a custom docker_monitor.json and having the systemd docker service write to the journald docker logs. I have still been unsuccessful in getting the node status to change.