Add --insecure-connection and --hostname-override flag in node problem detector #24

Closed
Random-Liu wants to merge 1 commit

Random-Liu
Member

Random-Liu commented Jul 11, 2016

Fix #23.
Fix #21.

This PR adds two flags to node problem detector:

  • --insecure-connection: Makes node problem detector skip TLS verification when talking to the apiserver.
  • --hostname-override: Lets the user override the host name. Note that to override the hostname you may have to run node problem detector as a static pod on each node and render the flag with your deployment tool (or manually), so that every node problem detector instance is configured with the right name.
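
As a rough illustration of how two flags like these could be wired up, here is a minimal Go sketch using only the standard library. It is not the code from this PR: the `options` struct and the `parseOptions`, `tlsConfig`, and `nodeName` helpers are hypothetical names chosen for the example, and the real detector builds its apiserver client differently.

```go
package main

import (
	"crypto/tls"
	"flag"
	"fmt"
	"os"
)

// options holds the two settings described above.
// The struct and its field names are hypothetical, not taken from the PR.
type options struct {
	insecureConnection bool
	hostnameOverride   string
}

// parseOptions parses the two flags from args with a private FlagSet,
// so it can be called from tests without touching global flag state.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("node-problem-detector", flag.ContinueOnError)
	fs.BoolVar(&o.insecureConnection, "insecure-connection", false,
		"skip TLS verification when talking to the apiserver")
	fs.StringVar(&o.hostnameOverride, "hostname-override", "",
		"node name to use instead of the detected hostname")
	err := fs.Parse(args)
	return o, err
}

// tlsConfig returns a TLS config that skips certificate verification
// only when the insecure flag is set. Skipping verification does not
// supply credentials; authentication is a separate concern.
func tlsConfig(o options) *tls.Config {
	return &tls.Config{InsecureSkipVerify: o.insecureConnection}
}

// nodeName prefers the override and falls back to detect (for example
// os.Hostname) when no override was given.
func nodeName(o options, detect func() (string, error)) (string, error) {
	if o.hostnameOverride != "" {
		return o.hostnameOverride, nil
	}
	return detect()
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	name, err := nodeName(o, os.Hostname)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("node=%s insecureSkipVerify=%t\n",
		name, tlsConfig(o).InsecureSkipVerify)
}
```

In a static pod setup, the deployment tool would render a different --hostname-override value into each node's manifest, which is why the flag has to be templated per node.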

@sols1 @ApsOps could either of you help me verify --insecure-connection? I don't have a cluster without admission control at hand. The image version is v0.2.

@sols1 @ApsOps @wangyumi
/cc @dchen1107

@ApsOps
Contributor

ApsOps commented Jul 12, 2016

@Random-Liu Not sure if this is related, but I'm getting these logs:

I0712 06:17:53.528150       1 kernel_monitor.go:86] Finish parsing log file: {WatcherConfig:{KernelLogPath:/log/kern.log} BufferSize:10 Source:kernel-monitor DefaultConditions:[{Type:KernelDeadlock Status:false Transition:0001-01-01 00:00:00 +0000 UTC Reason:KernelHasNoDeadlock Message:kernel has no deadlock}] Rules:[{Type:temporary Condition: Reason:OOMKilling Pattern:Kill process \d+ (.+) score \d+ or sacrifice child\nKilled process \d+ (.+) total-vm:\d+kB, anon-rss:\d+kB, file-rss:\d+kB} {Type:temporary Condition: Reason:TaskHung Pattern:task \S+:\w+ blocked for more than \w+ seconds\.} {Type:permanent Condition:KernelDeadlock Reason:AUFSUmountHung Pattern:task umount\.aufs:\w+ blocked for more than \w+ seconds\.} {Type:permanent Condition:KernelDeadlock Reason:DockerHung Pattern:task docker:\w+ blocked for more than \w+ seconds\.} {Type:permanent Condition:KernelDeadlock Reason:UnregisterNetDeviceIssue Pattern:unregister_netdevice: waiting for \w+ to become free. Usage count = \d+}]}
I0712 06:17:53.528332       1 kernel_monitor.go:93] Got system boot time: 2016-05-20 09:05:18.528327583 +0000 UTC
I0712 06:17:53.529605       1 kernel_monitor.go:102] Start kernel monitor
I0712 06:17:53.675293       1 kernel_log_watcher.go:173] unable to parse line: "", can't find timestamp prefix "kernel: [" in line ""
I0712 06:17:53.675427       1 kernel_log_watcher.go:110] Start watching kernel log
I0712 06:17:53.675494       1 problem_detector.go:60] Problem detector started
2016/07/12 06:17:53 Seeked /log/kern.log - &{Offset:0 Whence:0}
E0712 06:17:54.830063       1 manager.go:130] failed to update node conditions: the server has asked for the client to provide credentials (patch nodes ip-10-0-61-117)
E0712 06:17:55.665537       1 manager.go:130] failed to update node conditions: the server has asked for the client to provide credentials (patch nodes ip-10-0-61-117)
E0712 06:17:56.561070       1 manager.go:130] failed to update node conditions: the server has asked for the client to provide credentials (patch nodes ip-10-0-61-117)

And kern.log looks like this:

Jul 12 06:23:03 ip-10-0-61-117 kernel: [4569464.801861] type=1400 audit(1468304583.897:408220): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=30713 comm="ps" requested_mask="trace" denied_mask="trace" peer="docker-default"
Jul 12 06:23:03 ip-10-0-61-117 kernel: [4569464.801921] type=1400 audit(1468304583.897:408221): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=30713 comm="ps" requested_mask="trace" denied_mask="trace" peer="docker-default"
Jul 12 06:23:07 ip-10-0-61-117 kernel: [4569468.210181] type=1400 audit(1468304587.305:408222): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=30725 comm="gohai" requested_mask="trace" denied_mask="trace" peer="docker-default"
Jul 12 06:23:07 ip-10-0-61-117 kernel: [4569468.211038] type=1400 audit(1468304587.305:408223): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=30725 comm="gohai" requested_mask="trace" denied_mask="trace" peer="docker-default"
Jul 12 06:23:07 ip-10-0-61-117 kernel: [4569468.211303] type=1400 audit(1468304587.305:408224): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=30725 comm="gohai" requested_mask="trace" denied_mask="trace" peer="docker-default"
Jul 12 06:23:07 ip-10-0-61-117 kernel: [4569468.211542] type=1400 audit(1468304587.305:408225): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=30725 comm="gohai" requested_mask="trace" denied_mask="trace" peer="docker-default"
Jul 12 06:23:23 ip-10-0-61-117 kernel: [4569484.656008] type=1400 audit(1468304603.749:408226): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=30743 comm="ps" requested_mask="trace" denied_mask="trace" peer="docker-default"
Jul 12 06:23:23 ip-10-0-61-117 kernel: [4569484.656837] type=1400 audit(1468304603.753:408227): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=30743 comm="ps" requested_mask="trace" denied_mask="trace" peer="docker-default"

I don't know which apiserver endpoint to check for reported problems.

@Random-Liu
Member Author

Random-Liu commented Jul 12, 2016

@ApsOps Hm, so you set --insecure-connection=true and got this result?
You said that you are running the cluster without auth, but it looks like the apiserver is asking for auth:

failed to update node conditions: the server has asked for the client to provide credentials

Are you able to run heapster with InClusterConfig=false?

@ApsOps
Contributor

ApsOps commented Jul 12, 2016

@Random-Liu Yes, I'm using the --insecure-connection=true flag. I find it weird too; I am indeed running the apiserver without auth.

$> ps fax | grep apiserver
24517 ?        Ssl  1020:59  \_ /hyperkube apiserver --admission-control=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ResourceQuota --allow-privileged=true --cloud-provider=aws --etcd-servers=http://127.0.0.1:4001 --service-cluster-ip-range=10.0.100.224/27 --token-auth-file=/dev/null --insecure-bind-address=0.0.0.0 --advertise-address=10.0.100.14

$> kubectl get serviceaccounts
NAME      SECRETS   AGE
default   0         111d

$> kubectl describe serviceaccounts
Name:           default
Namespace:      default
Labels:         <none>

Image pull secrets:     <none>
Mountable secrets:      <none>
Tokens:                 <none>

I can chat in the Kubernetes Slack channel if you want me to try anything else :)

@dchen1107
Member

I am closing this PR. There will be a separate PR to solve the hostname-override issue.

dchen1107 closed this Aug 5, 2016