
does this tool work outside Google? #21

Closed
sols1 opened this issue Jun 23, 2016 · 22 comments

@sols1

sols1 commented Jun 23, 2016

Panic crash:

kubectl get pods -o wide --all-namespaces
NAMESPACE     NAME                                READY     STATUS             RESTARTS   AGE       NODE
default       node-problem-detector-0kgkw         0/1       CrashLoopBackOff   3          1m        192.168.78.15
default       node-problem-detector-ar3tk         0/1       CrashLoopBackOff   3          1m        192.168.78.16

kubectl logs node-problem-detector-0kgkw
I0623 01:02:18.560287       1 kernel_monitor.go:86] Finish parsing log file: {WatcherConfig:{KernelLogPath:/log/kern.log} BufferSize:10 Source:kernel-monitor DefaultConditions:[{Type:KernelDeadlock Status:false Transition:0001-01-01 00:00:00 +0000 UTC Reason:KernelHasNoDeadlock Message:kernel has no deadlock}] Rules:[{Type:temporary Condition: Reason:OOMKilling Pattern:Kill process \d+ (.+) score \d+ or sacrifice child\nKilled process \d+ (.+) total-vm:\d+kB, anon-rss:\d+kB, file-rss:\d+kB} {Type:temporary Condition: Reason:TaskHung Pattern:task \S+:\w+ blocked for more than \w+ seconds\.} {Type:permanent Condition:KernelDeadlock Reason:AUFSUmountHung Pattern:task umount\.aufs:\w+ blocked for more than \w+ seconds\.} {Type:permanent Condition:KernelDeadlock Reason:DockerHung Pattern:task docker:\w+ blocked for more than \w+ seconds\.} {Type:permanent Condition:KernelDeadlock Reason:UnregisterNetDeviceIssue Pattern:unregister_netdevice: waiting for \w+ to become free. Usage count = \d+}]}
I0623 01:02:18.560413       1 kernel_monitor.go:93] Got system boot time: 2016-06-17 17:51:02.560408109 +0000 UTC
panic: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory

goroutine 1 [running]:
panic(0x15fc280, 0xc8204fc000)
    /usr/local/go/src/runtime/panic.go:464 +0x3e6
k8s.io/node-problem-detector/pkg/problemclient.NewClientOrDie(0x0, 0x0)
    /usr/local/google/home/lantaol/workspace/src/k8s.io/node-problem-detector/pkg/problemclient/problem_client.go:56 +0x132
k8s.io/node-problem-detector/pkg/problemdetector.NewProblemDetector(0x7faa4f155140, 0xc8202a6900, 0x0, 0x0)
    /usr/local/google/home/lantaol/workspace/src/k8s.io/node-problem-detector/pkg/problemdetector/problem_detector.go:45 +0x36
main.main()
    /usr/local/google/home/lantaol/workspace/src/k8s.io/node-problem-detector/node_problem_detector.go:33 +0x56
@Random-Liu
Member

Random-Liu commented Jun 23, 2016

@sols1 It should work.
/var/run/secrets/kubernetes.io/serviceaccount/token is the token file used by the API client. It should be mounted by default in each pod.

I started a random pod and then exec'd into it:

/ # cat /var/run/secrets/kubernetes.io/serviceaccount/token 
eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3Nlcn...

Which K8s version are you using?
Can you do the same to check whether the token file exists in your pod? :)

@sols1
Author

sols1 commented Jun 23, 2016

When I try to do the same thing on my k8s cluster, I get a different result:

kubectl get pods -o wide --all-namespaces
NAMESPACE     NAME                                READY     STATUS    RESTARTS   AGE       NODE
default       collectd-0e6qk                      1/1       Running   0          3h        192.168.78.15
default       collectd-bal2i                      1/1       Running   0          3h        192.168.78.16
default       graphite-5rr7m                      1/1       Running   0          4h        192.168.78.16
default       ha-service-loadbalancer-egvmk       1/1       Running   0          4h        192.168.78.15
default       ha-service-loadbalancer-p6ikc       1/1       Running   0          4h        192.168.78.16
kube-system   kube-dns-v11-b8grr                  4/4       Running   0          4h        192.168.78.16
kube-system   kube-registry-v0-5fly7              1/1       Running   0          4h        192.168.78.16
kube-system   kubernetes-dashboard-v1.0.0-t458m   1/1       Running   0          4h        192.168.78.16

kubectl exec collectd-0e6qk cat /var/run/secrets/kubernetes.io/serviceaccount/token
cat: /var/run/secrets/kubernetes.io/serviceaccount/token: No such file or directory
error: error executing remote command: Error executing command in container: Error executing in Docker Container: 1

As you can see, many pods are running, but /var/run/secrets/kubernetes.io/serviceaccount/token does not exist.

@Random-Liu
Member

Random-Liu commented Jun 23, 2016

  1. Can you try to start the pod in the kube-system namespace to see whether that makes any difference?
  2. Which version of Kubernetes are you using?

I'll check whether the token file is set by default. At least according to the documentation, for a pod in the kube-system namespace, the file should be there: http://kubernetes.io/docs/user-guide/accessing-the-cluster/#accessing-the-api-from-a-pod
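
For example, a quick way to test this (just a sketch; the pod name is arbitrary, and on this Kubernetes version kubectl run generates a suffixed pod name that you'll need to look up first):

$ kubectl run token-test --image=busybox --namespace=kube-system -- sleep 3600
$ kubectl get pods --namespace=kube-system
$ kubectl exec <token-test-pod-name> --namespace=kube-system cat /var/run/secrets/kubernetes.io/serviceaccount/token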

@sols1
Author

sols1 commented Jun 23, 2016

As you can see from the output above, I am running several pods in the kube-system namespace, and those don't have the token file either:

kubectl exec kube-dns-v11-b8grr --namespace=kube-system cat /var/run/secrets/kubernetes.io/serviceaccount/token
cat: can't open '/var/run/secrets/kubernetes.io/serviceaccount/token': No such file or directory
error: error executing remote command: Error executing command in container: Error executing in Docker Container: 1

@sols1
Author

sols1 commented Jun 23, 2016

Kubernetes version ~1.3

@sols1
Author

sols1 commented Jun 23, 2016

The thing that might be unusual about this setup is that Kubernetes runs on top of RancherOS, which is containerized.

But when I try to access this file inside the host namespace, I get the same result:

docker run -v /:/rootfs --net=host --pid=host --privileged phusion/baseimage cat /rootfs/var/run/secrets/kubernetes.io/serviceaccount/token
cat: /rootfs/var/run/secrets/kubernetes.io/serviceaccount/token: No such file or directory

Or inside the kubelet container:

docker exec kubelet cat /var/run/secrets/kubernetes.io/serviceaccount/token
cat: /var/run/secrets/kubernetes.io/serviceaccount/token: No such file or directory

Or kube-proxy:

docker exec kube-proxy cat /var/run/secrets/kubernetes.io/serviceaccount/token
cat: /var/run/secrets/kubernetes.io/serviceaccount/token: No such file or directory

@Random-Liu
Member

Random-Liu commented Jun 23, 2016

  1. Can you check whether you have the default service account? It should be created by default.
$ kubectl get serviceaccount
NAME      SECRETS   AGE
default   1         2d
  2. If you don't see it, I guess you didn't start your apiserver with admission control:
--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,ResourceQuota

I believe you need to enable the ServiceAccount plugin in the admission control.
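
After restarting the apiserver with that plugin enabled, the default service account should get a token secret. A quick way to verify (a sketch; names and ages will differ per cluster):

$ kubectl get serviceaccount default -o yaml   # should list an entry under "secrets:"
$ kubectl get secrets                          # should show a default-token-XXXXX secret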

The thing that might be unusual about this setup is that kubernetes runs on top of RancherOS, which is containerized.

That may be the reason; it looks like RancherOS didn't enable the ServiceAccount admission control.

@sols1
Author

sols1 commented Jun 23, 2016

Yes, the service account does not show any secrets:

kubectl get serviceaccount
NAME      SECRETS   AGE
default   0         5d

And yes, there is no --admission-control option on the apiserver.

But nothing says it is a required option, particularly for a bare-metal cluster.

This brings back the original question: is this tool supposed to work for any k8s cluster?

@Random-Liu
Member

@sols1 It is supposed to.

If there is no admission control and the pod can talk to the apiserver without auth, then it should still work. I'll see whether we can fix it.

@Random-Liu
Member

@sols1 Please see kubernetes/kubernetes#27973 (comment). :)

@sols1
Author

sols1 commented Jun 25, 2016

Even after applying kubernetes/kubernetes#27973 (comment) I got:

kubectl logs node-problem-detector-bcrpv
I0625 00:35:56.475486       1 kernel_monitor.go:86] Finish parsing log file: {WatcherConfig:{KernelLogPath:/log/kern.log} BufferSize:10 Source:kernel-monitor DefaultConditions:[{Type:KernelDeadlock Status:false Transition:0001-01-01 00:00:00 +0000 UTC Reason:KernelHasNoDeadlock Message:kernel has no deadlock}] Rules:[{Type:temporary Condition: Reason:OOMKilling Pattern:Kill process \d+ (.+) score \d+ or sacrifice child\nKilled process \d+ (.+) total-vm:\d+kB, anon-rss:\d+kB, file-rss:\d+kB} {Type:temporary Condition: Reason:TaskHung Pattern:task \S+:\w+ blocked for more than \w+ seconds\.} {Type:permanent Condition:KernelDeadlock Reason:AUFSUmountHung Pattern:task umount\.aufs:\w+ blocked for more than \w+ seconds\.} {Type:permanent Condition:KernelDeadlock Reason:DockerHung Pattern:task docker:\w+ blocked for more than \w+ seconds\.} {Type:permanent Condition:KernelDeadlock Reason:UnregisterNetDeviceIssue Pattern:unregister_netdevice: waiting for \w+ to become free. Usage count = \d+}]}
I0625 00:35:56.475661       1 kernel_monitor.go:93] Got system boot time: 2016-06-17 17:51:02.475654619 +0000 UTC
I0625 00:35:56.476730       1 kernel_monitor.go:102] Start kernel monitor
I0625 00:35:56.476768       1 kernel_log_watcher.go:87] kernel log "/log/kern.log" is not found, kernel monitor doesn't support the os distro
I0625 00:35:56.476782       1 problem_detector.go:60] Problem detector started
E0625 00:35:57.536246       1 manager.go:130] failed to update node conditions: the server does not allow this method on the requested resource
E0625 00:35:58.478941       1 manager.go:130] failed to update node conditions: the server does not allow this method on the requested resource
E0625 00:35:59.478941       1 manager.go:130] failed to update node conditions: the server does not allow this method on the requested resource

So, which OS versions are supported?

Also, is another option necessary to fix the "the server does not allow this method on the requested resource" error?

@Random-Liu
Member

@sols1 This is because node-problem-detector uses the Patch method to update node status. (See #9)

The apiserver before kubernetes/kubernetes#26381 didn't allow the patch operation on node status. I believe your Kubernetes version doesn't contain kubernetes/kubernetes#26381 yet.
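
If you want to check your apiserver directly, something like this should work (just a sketch, assuming an insecure local apiserver port and one of the node names from your output above):

$ curl -X PATCH -H "Content-Type: application/strategic-merge-patch+json" \
    -d '{}' http://127.0.0.1:8080/api/v1/nodes/192.168.78.15/status

An older apiserver will answer with "the server does not allow this method on the requested resource"; a patched one returns the node object.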

@Random-Liu
Member

Random-Liu commented Jun 25, 2016

@sols1 And from the log, it seems that the kernel log is not at /var/log/kern.log on your host. What is your OS distro?
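
You can check where your distro actually writes kernel logs with something like this (paths vary by distro; just a sketch):

$ ls -l /var/log/kern.log /var/log/messages /var/log/dmesg 2>/dev/null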

@ApsOps
Contributor

ApsOps commented Jul 11, 2016

@Random-Liu Hi. I'm facing the same issue. The thing is, I have been running my cluster without the ServiceAccount admission-control plugin, since I'm running the cluster without auth.

I was facing a similar situation when trying to run Heapster, but it has an option, inClusterConfig=false, that allows running without service account tokens.

Refer: https://github.com/kubernetes/heapster/blob/cd2301d0bca468dff796a9d26bc093efdcc1be2d/docs/source-configuration.md#kubernetes
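
For reference, the Heapster source option from that doc looks roughly like this (the apiserver address is a placeholder):

--source=kubernetes:http://<apiserver-address>?inClusterConfig=false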

@Random-Liu
Member

@ApsOps That's quite useful. I'll also add an insecure option to node-problem-detector.

@andyxning
Member

andyxning commented Dec 13, 2016

@Random-Liu Maybe we can follow Heapster's approach, for the convenience of deploying both in a Kubernetes cluster and standalone.

@andyxning
Member

We may close this once #49 is merged. @sols1 @ApsOps @Random-Liu

@Random-Liu
Member

Closing this one since #49 is merged. Please take a look at the documentation on how to use the newly introduced --apiserver-override flag: https://github.com/kubernetes/node-problem-detector#flags.

@ApsOps
Contributor

ApsOps commented Jan 10, 2017

@Random-Liu I'm unable to build (I've seen the open issue regarding journald usage). I also tried using the image that @shyamjvs pushed (tagged v0.3), but it fails with

panic: strconv.ParseBool: parsing "": invalid syntax

goroutine 1 [running]:
panic(0x18330c0, 0xc42000cd80)
        /usr/local/google/home/shyamjvs/.gvm/gos/go1.7/src/runtime/panic.go:500 +0x1a1
k8s.io/node-problem-detector/pkg/problemclient.NewClientOrDie(0x7ffed64509e3, 0x55, 0xd, 0xc4204d4300)
        /usr/local/google/home/shyamjvs/go/src/k8s.io/node-problem-detector/pkg/problemclient/problem_client.go:62 +0x3f1
k8s.io/node-problem-detector/pkg/problemdetector.NewProblemDetector(0x2684c80, 0xc4204ea0d0, 0x7ffed64509e3, 0x55, 0x32, 0xc4200001a0)
        /usr/local/google/home/shyamjvs/go/src/k8s.io/node-problem-detector/pkg/problemdetector/problem_detector.go:45 +0x3f
main.main()
        /usr/local/google/home/shyamjvs/go/src/k8s.io/node-problem-detector/node_problem_detector.go:45 +0x77

It seems to be missing a few commits, or I might be doing something wrong in the config. Could you please release a Docker image, or suggest something I can try? :)

@Random-Liu
Member

Random-Liu commented Jan 10, 2017

I'm unable to build (I've seen the open issue regarding journald usage).

@ApsOps To build with journald support, you need to install libsystemd-journal-dev or libsystemd-dev.
After #39 is merged, you can avoid building the journald support.
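
On Debian/Ubuntu that would be something like (the package name depends on the release; just a sketch):

$ sudo apt-get install libsystemd-journal-dev   # or libsystemd-dev on newer releases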

I also tried using the image that @shyamjvs pushed (tagged v0.3), but it fails.

The error suggests to me that you set inClusterConfig or some other boolean option in --apiserver-override but didn't give it a value.
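
Every boolean option in that URI needs an explicit value, e.g. (the apiserver address is a placeholder):

--apiserver-override=http://<apiserver-address>?inClusterConfig=false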

I've tried v0.3 myself, and it seems to work for me except for a recently introduced issue, #61.

Could you please release a Docker image?

The issue is tracked here: #60.

@ApsOps
Contributor

ApsOps commented Jan 10, 2017

Ah, makes sense. I had &auth=&insecure= from the README. Removing those params worked!
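
For anyone hitting the same thing: an empty value for a boolean param (e.g. insecure=) is what trips strconv.ParseBool. Roughly (addresses are placeholders):

--apiserver-override=http://<apiserver>?inClusterConfig=false&auth=&insecure=   (fails to parse)
--apiserver-override=http://<apiserver>?inClusterConfig=false                   (works)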

@Random-Liu
Member

Random-Liu commented Jan 10, 2017

OK, then the README is a bit misleading.

Could you file an issue or PR for this? Thanks a lot~ :) @ApsOps
