status doesn't change when injecting log messages #114

Closed · jdfless opened this issue Jun 8, 2017 · 15 comments

jdfless commented Jun 8, 2017

I cannot get the node problem detector to change a node status by injecting messages.

I am using kubernetes 1.5.2, Ubuntu 16.04, kernel 4.4.0-51-generic.

I run the npd as a daemonset. I have attempted to get this to work with npd versions 0.3.0 and 0.4.0. I start the npd with the default command, using /config/kernel-monitor.json because my nodes use journald.

I have /dev/kmsg mounted into the pod, and I echo expressions matching the regexes in kernel-monitor.json to /dev/kmsg on the node. I can view the fake logs I've echoed to /dev/kmsg from inside the pod.

Steps to reproduce:

# as root on the node where your pod is running
echo "task umount.aufs:123 blocked for more than 120 seconds." >> /dev/kmsg
# I have verified that these logs show up in journalctl -k

# this should match the following permanent condition in /config/kernel-monitor.json
#	{
#		"type": "permanent",
#		"condition": "KernelDeadlock",
#		"reason": "AUFSUmountHung",
#		"pattern": "task umount\\.aufs:\\w+ blocked for more than \\w+ seconds\\."
#	},

# check the node status of the node where you ran this on
kubectl get node <node>
# status will still be Ready

# for further detail examine the json
kubectl get node <node> -o json | jq .status.conditions
# you will see that the KernelDeadlock condition is still "False"

# I would expect the general status to change to "KernelDeadlock"

If I am not testing this properly, could you please give a detailed breakdown of how to verify that the node problem detector is working properly for kernel logs AND docker logs?

I have also reproduced this behavior using a custom docker_monitor.json and having the systemd docker service write to the journald docker logs. I have still been unsuccessful in getting the node status to change.

Random-Liu commented Jun 10, 2017

@jdfless Please try:

echo "kernel: task umount.aufs:123 blocked for more than 120 seconds." >> /dev/kmsg

Kernel log lines generated by the kernel start with the "kernel: " prefix.

jdfless commented Jun 12, 2017

@Random-Liu: I had tried this previously, and again this morning; unfortunately it has no effect.

My exact commands and output:

17:20 root@kube-minion1:~# echo "kernel: task umount.aufs:123 blocked for more than 120 seconds." >> /dev/kmsg

17:24 $ journalctl -k
-- Logs begin at Fri 2017-06-09 05:04:31 UTC, end at Mon 2017-06-12 17:24:10 UTC. --
Jun 12 17:23:59 kube-minion1 kernel: task umount.aufs:123 blocked for more than 120 seconds.

17:28 $ kubectl.sh get node kube-minion1 -o json | jq -r .status.conditions
...
{
    "lastHeartbeatTime": "2017-06-12T17:28:31Z",
    "lastTransitionTime": "2017-06-08T22:15:45Z",
    "message": "kernel has no deadlock",
    "reason": "KernelHasNoDeadlock",
    "status": "False",
    "type": "KernelDeadlock"
  },

I can see the heartbeat time continuing to increase but the status never changes.

When I exec into the pod, I am not seeing anything from journalctl -k. Is that expected? All I see is:

[root@node-problem-detector-20170608-221537-rx7vd /]# journalctl -k
No journal files were found.
-- No entries --
# there appears to be nothing at all in journalctl, despite there being a plethora of files in /log/journal
[root@node-problem-detector-20170608-221537-rx7vd /]# journalctl
No journal files were found.
-- No entries --

I have verified that /var/log/journal is mounted properly at /log/journal in the pod.

jdfless commented Jun 12, 2017

So I made some progress when I realized I was mounting /var/log to /log instead of to /var/log. That is the mount path given on the front page of the GitHub docs and also at https://kubernetes.io/docs/tasks/debug-application-cluster/monitor-node-health/, so it tripped me up. I can now see:

{
    "lastHeartbeatTime": "2017-06-12T18:00:39Z",
    "lastTransitionTime": "2017-06-12T17:48:21Z",
    "message": "task umount.aufs:123 blocked for more than 120 seconds.",
    "reason": "AUFSUmountHung",
    "status": "True",
    "type": "KernelDeadlock"
  },

in the node status, but I didn't expect that the node itself would still be reporting Ready. This seems like an issue to me; should I open a different ticket, or is this expected behavior?

$ kubectl.sh get node kube-minion1
NAME           STATUS    AGE
kube-minion1   Ready     4d
  {
    "lastHeartbeatTime": "2017-06-12T18:01:32Z",
    "lastTransitionTime": "2017-06-08T07:23:47Z",
    "message": "kubelet is posting ready status",
    "reason": "KubeletReady",
    "status": "True",
    "type": "Ready"
  },

I also still cannot see any output from journalctl -k inside the pod; the behavior is still as above, even with the mount location fixed. I want to use the npd to change the status of the node to something other than "Ready" in case of a docker error that we see. I have that error spamming my docker logs right now on that node, but my custom monitor condition for Docker is not changing. Are custom patterns supported for journald logs other than the kernel's in 0.4?

Random-Liu commented Jun 12, 2017

When I exec into the pod, I am not seeing anything from journalctl -k. Is that expected?

No, it's not.

I have verified that /var/log/journal is mounted properly at /log/journal in the pod.

Sorry, you should actually mount /var/log to /var/log. The document is out of date... Please use the setup in Kubernetes as a reference.

I'll try to send a PR to update the document this week.

however once I fixed this and made sure the rest of the daemonset yaml had no other differences (although I don't use the --logtostderr flag, I assume that is harmless), none of the behavior I am seeing has changed.

Can you run systemctl --version and show me the result? I think this is caused by some mismatch between the systemd versions inside and outside of the container. Actually, I've just tried, and it doesn't even work for our new Google Container-Optimized Image now, although it did work before.

Actually, I recommend running NPD as a systemd service, as we do in Kubernetes for Google Container-Optimized Image (GCI). If you are using kube-up.sh, you could just mimic what we do for GCI; if not, you need to deal with some token stuff, which I could help you figure out if needed.
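
A minimal sketch of such a unit (the unit layout, restart policy, and the binary/config/kubeconfig paths are assumptions here; the flags mirror the standalone NPD invocation used elsewhere in this thread):

# Write a bare-bones systemd unit for NPD and start it.
cat <<'EOF' | sudo tee /etc/systemd/system/node-problem-detector.service
[Unit]
Description=Kubernetes node problem detector
After=network-online.target

[Service]
ExecStart=/opt/kubernetes/bin/node-problem-detector \
    --logtostderr \
    --apiserver-override="https://<APISERVER_IP>:6443?inClusterConfig=false&useServiceAccount=true&auth=/path/to/kubeconfig" \
    --system-log-monitors=/npd/kernel-monitor.json
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now node-problem-detector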

As for running inside a daemonset, I think we need a better solution than relying on the systemd library inside the container. Maybe chroot is a more generic solution. I will file an issue for this: #115

jdfless commented Jun 12, 2017

Update: by mounting /etc/machine-id into the pod I can now view the journalctl output. Still no luck with my custom trigger however.

I start the pod with /node-problem-detector --system-log-monitors=/config/kernel-monitor.json,/config/custom_docker_monitor.json

Here is my custom docker monitor:

{
    "plugin": "journald",
    "pluginConfig": {
        "source": "docker"
    },
    "logPath": "/var/log/journal",
    "lookback": "5m",
    "bufferSize": 10,
    "source": "docker-monitor",
    "conditions": [
        {
            "type": "DockerHung",
            "reason": "DockerNotHung",
            "message": "docker not detected to be hung"
        }

    ],
    "rules": [
        {
            "type": "permanent",
            "condition": "DockerHung",
            "reason": "ShimError",
            "pattern": "shim error: context deadline exceeded"
        }
    ]
}

And here is the journalctl output from inside the pod (this is looping, intermingled with actual docker logs):

$ journalctl -u docker -f
Jun 12 18:43:28 kube-minion1 error_gen.sh[28216]: ++++++++++++++++++++++++++++++
Jun 12 18:43:28 kube-minion1 error_gen.sh[28216]: TESTING NODE PROBLEM DETECTOR
Jun 12 18:43:28 kube-minion1 error_gen.sh[28216]: shim error: context deadline exceeded
Jun 12 18:43:28 kube-minion1 error_gen.sh[28216]: ++++++++++++++++++++++++++++++
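
As a lighter-weight alternative to the looping error_gen.sh unit, a single matching line can be written straight into the journal with systemd-cat (a sketch; the -t identifier is recorded as the entry's syslog identifier and has to match whatever the monitor's pluginConfig.source is keyed to):

# Inject one fake error line under the "docker" identifier from the monitor
# config above, then confirm it landed in the journal:
echo "shim error: context deadline exceeded" | systemd-cat -t docker -p err
journalctl -t docker -n 5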

jdfless commented Jun 12, 2017

My systemd version outside the container is 229, inside is 231. I can definitely try and run as a systemd service.

# in pod
[root@node-problem-detector-20170612-184039-5lsc6 /]# systemctl --version
systemd 231
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN

# on node
$ systemctl --version
systemd 229
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN

Random-Liu commented Jun 12, 2017

Still no luck with my custom trigger however.

Does the kernel log work now?

jdfless commented Jun 12, 2017

It works as described above in that I can see the status change in the json, but the node is still "Ready", which I didn't expect.

I assume when one of the conditions is tripped, it is expected that the node itself will no longer report as "Ready", is that correct?

@Random-Liu

It works as described above in that I can see the status change in the json, but the node is still "Ready", which I didn't expect.

That is expected behavior. Currently, NPD only updates node conditions it owns, e.g. KernelDeadlock.
NodeReady is only managed by Kubelet.

Currently, the scheduler won't directly react to the conditions generated by NPD. Ideally, there should be another cluster-level controller that analyzes these conditions and makes decisions, e.g. set a node taint, restart the VM, etc.

We don't have an open-source version of this, because the repair logic is very environment-dependent, but in GKE we do have auto-repair, which consumes information from NPD.
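
For anyone outside GKE needing a stopgap, a very rough manual stand-in for such a controller could just read the NPD-owned condition and taint the node (a sketch; the taint key is purely illustrative):

# Check the condition NPD maintains:
kubectl get node kube-minion1 -o json \
  | jq -r '.status.conditions[] | select(.type=="KernelDeadlock") | .status'

# If it prints "True", keep new pods off the node until it is repaired:
kubectl taint node kube-minion1 npd/kernel-deadlock=true:NoSchedule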

@Random-Liu

Update: by mounting /etc/machine-id into the pod I can now view the journalctl output. Still no luck with my custom trigger however.

Cool. Actually I'm not quite familiar with the underlying mechanism of journald. If /etc/machine-id is required, maybe we should document this and update the default yaml.

jdfless commented Jun 13, 2017

@Random-Liu:
I wanted to circle back to the main issue of this ticket, since it is still unresolved for me. I am outside of GCE and need the npd to alert on a custom string observed in a journald docker log. Can you confirm that this feature works in GCE, and please provide an example of how you verify it?

I ask for an example because I was surprised that I needed to echo "kernel: " before the pattern in the kernel logs; I assumed the npd would trigger on anything that matched the pattern and was output by "journalctl -k".

Below here is a summary of other things noted/discovered in the course of this ticket:

  • journalctl does not work inside the pod unless you mount /etc/machine-id. This is because the pod doesn't have the same machine-id as the host, and journald writes entries to files under /var/log/journal/<machine-id>/ (a quick check is sketched after this list). Mounting it is helpful, but does not appear to be strictly necessary, at least for kernel logs; I don't know about other journald unit logs yet.
  • The main page docs need updating for running this as a daemonset, so that /var/log is not mounted to /log (you noted this above, so I assume no ticket is needed for that).
  • In Kubernetes you run the npd as a systemd service; however, this is not documented, and it may be helpful for others to know that is how you are testing.
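
A quick way to see the machine-id mismatch described in the first bullet (a sketch; <npd-pod> is a placeholder for the daemonset pod name):

# On the host: journald writes under a directory named after the machine id.
cat /etc/machine-id
ls /var/log/journal/

# Inside the pod, without /etc/machine-id mounted, the container's machine id
# (if any) won't match the host's, which is why journalctl reports
# "No journal files were found".
kubectl exec <npd-pod> -- cat /etc/machine-id
kubectl exec <npd-pod> -- ls /var/log/journal/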

Random-Liu commented Jun 14, 2017

I've just tried.

{
    "plugin": "journald",
    "pluginConfig": {
        "source": "docker"
    },
    "logPath": "/var/log/journal",
    "lookback": "5m",
    "bufferSize": 10,
    "source": "docker-monitor",
    "conditions": [
        {
            "type": "TestCondition",
            "reason": "TestReason",
            "message": "test reason"
        }
    ],
    "rules": [
        {
            "type": "permanent",
            "condition": "TestCondition",
            "reason": "TestDockerIssue",
            "pattern": "time=.*level=debug msg=\"Calling GET /v1.23/images/json\""
        }
    ]
}

This will generate a TestCondition for time="2017-06-14T21:43:46.395377174Z" level=debug msg="Calling GET /v1.23/images/json"

NPD log:

I0614 21:51:59.378357       1 log_monitor.go:123] New status generated: &{Source:docker-monitor Events:[] Conditions:[{Type:KernelDeadlock Status:true Transition:2017-06-14 21:46:54.146649 +0000 UTC Reason:RandomDockerIssue Message:time="2017-06-14T21:46:54.145774420Z" level=debug msg="Calling GET /v1.23/images/json"}]}

I started NPD inside a container:

docker run -v /var/log:/var/log -v ~/docker-monitor.json:/config/docker-monitor.json -v /etc/machine-id:/etc/machine-id -it --entrypoint=/node-problem-detector gcr.io/google_containers/node-problem-detector:v0.4.0 --alsologtostderr --apiserver-override=https://10.0.0.1?inClusterConfig=false --system-log-monitors=/config/docker-monitor.json

Note that the pattern you provide must be able to match to the end of the last line (multi-line matching is supported).
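
A rough way to sanity-check a rule pattern against a captured log line is to emulate that end-of-line requirement with a trailing "$" anchor (this mirrors the explanation above, not NPD's internal matcher):

# Sample journal line and two candidate patterns:
msg='time="2017-06-14T21:43:46.395377174Z" level=debug msg="Calling GET /v1.23/images/json"'
echo "$msg" | grep -Eq 'Calling GET /v1.23/images/json"$' && echo "would trigger"
echo "$msg" | grep -Eq 'Calling GET$' || echo "would not trigger: pattern stops mid-line"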

jdfless commented Jun 14, 2017

That is interesting. I was finally able to get this to work (I was actually writing something up), but only once I changed pluginConfig.source to "dockerd".
For instance, the following generates nothing:

{
    "plugin": "journald",
    "pluginConfig": {
        "source": "docker"
    },  
    "logPath": "/var/log/journal",
    "lookback": "5m",
    "bufferSize": 10, 
    "source": "docker-monitor",
    "conditions": [
        {   
            "type": "DockerHung",
            "reason": "DockerHung",
            "message": "docker cannot recover, needs restart"
        }   

    ],  
    "rules": [  
        {   
            "type": "permanent",
            "condition": "DockerHung",
            "reason": "TestTrigger",
            "pattern": "Calling.*"
        }   
    ]   
}

With .pluginConfig.source simply changed from "docker" to "dockerd", the npd triggers:

I0614 21:58:37.904309   14535 log_monitor.go:123] New status generated: &{Source:docker-monitor Events:[] Conditions:[{Type:DockerHung Status:true Transition:2017-06-14 21:53:37.136648 +0000 UTC Reason:TestTrigger Message:time="2017-06-14T21:53:37.136519703Z" level=debug msg="Calling GET /v1.24/containers/json?all=1&limit=0"}]}

I am running the npd standalone with:

KUBERNETES_SERVICE_HOST=127.0.0.1 KUBERNETES_SERVICE_PORT=443 /opt/kubernetes/bin/node-problem-detector --logtostderr --apiserver-override=https://\<APISERVER_IP\>:6443?inClusterConfig=false\&useServiceAccount=true\&auth=/path/to/kubeconfig --system-log-monitors=/npd/kernel-monitor.json,/npd/custom_docker_monitor.json

I am using docker version 1.12.6, which logs to journald as dockerd; I assume that pluginConfig.source must match exactly what is logging to journald and not the name of the unit file that is doing the logging. Could you post a snippet of your journalctl -u docker? This seems to make it very difficult to detect errors from any forked systemd units that log from multiple processes into one file, as you would need a monitor per process, not per systemd service.

Random-Liu commented Jun 14, 2017

I am using docker version 1.12.6, which logs to journald as dockerd

Yeah, that's possible. /cc @ajitak

I assume that pluginConfig.source must match exactly what is logging to journald and not the name of the unit file that is doing the logging.

We use SD_JOURNAL_FIELD_SYSLOG_IDENTIFIER; see https://github.com/kubernetes/node-problem-detector/blob/master/pkg/systemlogmonitor/logwatchers/journald/log_watcher.go#L174.
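
One way to confirm which syslog identifiers a unit's entries actually carry (a sketch; assumes jq is available on the node):

# List the identifiers journald recorded for recent docker-unit entries.
# On the docker 1.12.6 setup above this should include "dockerd" rather than
# "docker", matching the pluginConfig.source that worked.
journalctl -u docker -o json -n 20 | jq -r '.SYSLOG_IDENTIFIER' | sort -u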

Could you post a snippet of your journalctl -u docker?

Jun 14 21:43:48 e2e-test-lantaol-master docker[1307]: time="2017-06-14T21:43:48.497761111Z" level=debug msg="Calling GET /v1.23/containers/json?filters=%7B%22label%22%3A%7B%22io.kubernetes.docker.type%3Dpodsandbox%22%3Atrue%7D%7D&limit=0"
Jun 14 21:43:48 e2e-test-lantaol-master docker[1307]: time="2017-06-14T21:43:48.500387091Z" level=debug msg="Calling GET /v1.23/containers/json?all=1&filters=%7B%22label%22%3A%7B%22io.kubernetes.docker.type%3Dcontainer%22%3Atrue%7D%2C%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"
Jun 14 21:43:48 e2e-test-lantaol-master docker[1307]: time="2017-06-14T21:43:48.520630256Z" level=debug msg="Calling GET /v1.23/containers/json?filters=%7B%22label%22%3A%7B%22io.kubernetes.docker.type%3Dpodsandbox%22%3Atrue%7D%7D&limit=0"

This seems to make it very difficult to detect errors from any forked systemd units that log from multiple processes into one file, as you would need a monitor per process, not per systemd service.

We need a separate monitor for each SD_JOURNAL_FIELD_SYSLOG_IDENTIFIER. Since each monitor is just a goroutine, it should be fine. :)
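
Concretely, that just means one config file per identifier, all passed to the same flag (a sketch; the containerd file name is made up for illustration):

# One monitor config per syslog identifier, comma-separated:
/node-problem-detector \
    --system-log-monitors=/config/kernel-monitor.json,/config/docker-monitor.json,/config/containerd-monitor.json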

jdfless commented Jun 14, 2017

Great, thanks for all the help on getting this sorted out. I am going to close this issue now.

jdfless closed this as completed Jun 14, 2017