
node-problem-detector does not work if I change the kubelet hostname to the node IP #23

Closed
wangyumi opened this issue Jul 11, 2016 · 4 comments

wangyumi commented Jul 11, 2016

I am running k8s 1.3.0 with the Docker image node-problem-detector:v0.1.

I set kubelet --hostname-override to the node IP on each minion:

root@SZX1000116607:~# cat /etc/default/kubelet
KUBELET_OPTS='--hostname-override=10.22.109.119 --api-servers=http://10.22.109.119:8080,http://10.22.69.237:8080,http://10.22.117.82:8080 --pod-infra-container-image=xxxxx/kubernetes/pause:latest --cluster-dns=192.168.1.1  --cluster-domain=test1 --low-diskspace-threshold-mb=2048 --cert-dir=/var/run/kubelet --allow-privileged=true'

When I start node-problem-detector as a DaemonSet, I get a lot of error messages:

2016-07-11T07:49:17.310422203Z I0711 07:49:17.309761       1 kernel_monitor.go:86] Finish parsing log file: {WatcherConfig:{KernelLogPath:/log/kern.log} BufferSize:10 Source:kernel-monitor DefaultConditions:[{Type:KernelDeadlock Status:false Transition:0001-01-01 00:00:00 +0000 UTC Reason:KernelHasNoDeadlock Message:kernel has no deadlock}] Rules:[{Type:temporary Condition: Reason:OOMKilling Pattern:Kill process \d+ (.+) score \d+ or sacrifice child\nKilled process \d+ (.+) total-vm:\d+kB, anon-rss:\d+kB, file-rss:\d+kB} {Type:temporary Condition: Reason:TaskHung Pattern:task \S+:\w+ blocked for more than \w+ seconds\.} {Type:permanent Condition:KernelDeadlock Reason:AUFSUmountHung Pattern:task umount\.aufs:\w+ blocked for more than \w+ seconds\.} {Type:permanent Condition:KernelDeadlock Reason:DockerHung Pattern:task docker:\w+ blocked for more than \w+ seconds\.} {Type:permanent Condition:KernelDeadlock Reason:UnregisterNetDeviceIssue Pattern:unregister_netdevice: waiting for \w+ to become free. Usage count = \d+}]}
2016-07-11T07:49:17.310510490Z I0711 07:49:17.310026       1 kernel_monitor.go:93] Got system boot time: 2016-07-08 01:40:54.310019188 +0000 UTC
2016-07-11T07:49:17.311878174Z I0711 07:49:17.311436       1 kernel_monitor.go:102] Start kernel monitor
2016-07-11T07:49:17.311907663Z I0711 07:49:17.311654       1 kernel_log_watcher.go:173] unable to parse line: "", can't find timestamp prefix "kernel: [" in line ""
2016-07-11T07:49:17.311926515Z I0711 07:49:17.311696       1 kernel_log_watcher.go:110] Start watching kernel log
2016-07-11T07:49:17.311942118Z I0711 07:49:17.311720       1 problem_detector.go:60] Problem detector started
2016-07-11T07:49:17.314020808Z 2016/07/11 07:49:17 Seeked /log/kern.log - &{Offset:0 Whence:0}
2016-07-11T07:49:18.355974460Z E0711 07:49:18.355712       1 manager.go:130] failed to update node conditions: nodes "SZX1000116591" not found
2016-07-11T07:49:19.314395255Z E0711 07:49:19.314110       1 manager.go:130] failed to update node conditions: nodes "SZX1000116591" not found
2016-07-11T07:49:20.314302138Z E0711 07:49:20.313982       1 manager.go:130] failed to update node conditions: nodes "SZX1000116591" not found
2016-07-11T07:49:21.315151897Z E0711 07:49:21.314849       1 manager.go:130] failed to update node conditions: nodes "SZX1000116591" not found
2016-07-11T07:49:22.313977681Z E0711 07:49:22.313834       1 manager.go:130] failed to update node conditions: nodes "SZX1000116591" not found
2016-07-11T07:49:23.313906286Z E0711 07:49:23.313619       1 manager.go:130] failed to update node conditions: nodes "SZX1000116591" not found
2016-07-11T07:49:24.314428608Z E0711 07:49:24.314141       1 manager.go:130] failed to update node conditions: nodes "SZX1000116591" not found
2016-07-11T07:49:25.316626549Z E0711 07:49:25.316326       1 manager.go:130] failed to update node conditions: nodes "SZX1000116591" not found
2016-07-11T07:49:26.314346471Z E0711 07:49:26.314142       1 manager.go:130] failed to update node conditions: nodes "SZX1000116591" not found
2016-07-11T07:49:27.313901894Z E0711 07:49:27.313759       1 manager.go:130] failed to update node conditions: nodes "SZX1000116591" not found
2016-07-11T07:49:28.314356686Z E0711 07:49:28.314198       1 manager.go:130] failed to update node conditions: nodes "SZX1000116591" not found
2016-07-11T07:49:29.314747334Z E0711 07:49:29.314450       1 manager.go:130] failed to update node conditions: nodes "SZX1000116591" not found
2016-07-11T07:49:30.314562756Z E0711 07:49:30.314235       1 manager.go:130] failed to update node conditions: nodes "SZX1000116591" not found

It seems node-problem-detector still uses the original OS hostname to look up the node, instead of the overridden hostname that is registered in etcd.

/wangyumi
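
For illustration, here is a minimal Go sketch of the suspected behavior described above (an assumption about what v0.1 does, not the actual node-problem-detector source): without any override, the agent falls back to os.Hostname(), which returns the OS hostname (e.g. SZX1000116591) rather than the node name registered with the API server (10.22.109.119):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// Suspected behavior (illustration only): with no override configured,
	// fall back to the OS hostname. On this machine that is "SZX1000116591",
	// but the node object was registered as "10.22.109.119", so every
	// condition update fails with `nodes "SZX1000116591" not found`.
	name, err := os.Hostname()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine hostname:", err)
		os.Exit(1)
	}
	fmt.Println("updating conditions for node:", name)
}
```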


Random-Liu commented Jul 11, 2016

@wangyumi Node problem detector also needs a hostname-override flag. I'll add it soon.
Thanks for reporting the issue! :)

It seems node-problem-detector still uses the original OS hostname to look up the node, instead of the overridden hostname that is registered in etcd.

node-problem-detector needs the node name to access the node resource in etcd, but it would have to read that resource to learn the node name, so there is a chicken-and-egg problem here.
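
A minimal sketch of what such a flag could look like (the flag name and wiring here are illustrative assumptions, not the actual change): an explicit override avoids any API lookup, which is what breaks the chicken-and-egg dependency:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// hostnameOverride is an illustrative flag; the real flag added to
// node-problem-detector may be named or plumbed differently.
var hostnameOverride = flag.String("hostname-override", "",
	"If non-empty, use this as the node name instead of the OS hostname.")

func main() {
	flag.Parse()

	name := *hostnameOverride
	if name == "" {
		// Only fall back to the OS hostname when no override is given,
		// mirroring how kubelet's --hostname-override behaves.
		h, err := os.Hostname()
		if err != nil {
			fmt.Fprintln(os.Stderr, "cannot determine node name:", err)
			os.Exit(1)
		}
		name = h
	}
	fmt.Println("reporting conditions for node:", name)
}
```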

@Random-Liu

@wangyumi It seems that kube-proxy also has this problem if you don't manually change the salt configuration to override the hostname. Does kube-proxy work in your cluster after you override the hostname? :)


KeithTt commented Nov 2, 2017

I am hitting the same issue:

Nov  2 05:48:49 uy05-13 kubelet[29322]: E1102 05:48:49.083813   29322 reflector.go:190] k8s.io/kubernetes/pkg/kubelet/kubelet.go:390: Failed to list *v1.Node: Get https://192.168.5.42:6443/api/v1/nodes?fieldSelector=metadata.name%3D192.168.5.42&resourceVersion=0: dial tcp 192.168.5.42:6443: getsockopt: connection refused
Nov  2 05:48:49 uy05-13 kubelet[29322]: E1102 05:48:49.281283   29322 reflector.go:190] k8s.io/kubernetes/pkg/kubelet/kubelet.go:382: Failed to list *v1.Service: Get https://192.168.5.42:6443/api/v1/services?resourceVersion=0: dial tcp 192.168.5.42:6443: getsockopt: connection refused

I modified the kubelet unit file installed by kubeadm, but it does not work; the same errors keep appearing.


KeithTt commented Nov 2, 2017

Furthermore, it always dials TCP port 6443, but that port is not listening.
