
Kube-Proxy endpoint connection refused #16476

Closed
ctyjrsy opened this issue Aug 21, 2019 · 12 comments

Comments

@ctyjrsy

ctyjrsy commented Aug 21, 2019

Describe the bug
Kube-proxy endpoints are not being scraped by Prometheus; as a result, many default metrics are unavailable. This is a Kubernetes cluster on DigitalOcean.

Version of Helm and Kubernetes:
helm version
Client: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}

kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Which chart:
stable/prometheus-operator

What happened:

http://10.136.37.130:10249/metrics down endpoint="http-metrics" instance="10.136.37.130:10249" job="kube-proxy" namespace="kube-system" pod="kube-proxy-cv6rf" service="monitoring-prometheu-kube-proxy" 1m7.029s ago 2.565ms Get http://10.136.37.130:10249/metrics: dial tcp 10.136.37.130:10249: connect: connection refused
http://10.136.61.125:10249/metrics down endpoint="http-metrics" instance="10.136.61.125:10249" job="kube-proxy" namespace="kube-system" pod="kube-proxy-g68zj" service="monitoring-prometheu-kube-proxy" 40.156s ago 834.1us Get http://10.136.61.125:10249/metrics: dial tcp 10.136.61.125:10249: connect: connection refused

What you expected to happen:
The same chart should scrape the kube-proxy endpoints so that the default metrics are available.

How to reproduce it (as minimally and precisely as possible):
helm install --name monitoring -f prom-operator.yaml stable/prometheus-operator --set kubelet.serviceMonitor.https=true --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false --set kube-proxy.servicemonitor.https=true

Anything else we need to know:

@ctyjrsy

ctyjrsy commented Aug 21, 2019

kubectl logs kube-proxy-cv6rf -n kube-system
W0407 15:06:52.310483 1 proxier.go:480] Failed to read file /lib/modules/4.19.0-0.bpo.2-amd64/modules.builtin with error open /lib/modules/4.19.0-0.bpo.2-amd64/modules.builtin: no such file or directory. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0407 15:06:52.330868 1 proxier.go:493] Failed to load kernel module ip_vs with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0407 15:06:52.349989 1 proxier.go:493] Failed to load kernel module ip_vs_rr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0407 15:06:52.350973 1 proxier.go:493] Failed to load kernel module ip_vs_wrr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0407 15:06:52.352054 1 proxier.go:493] Failed to load kernel module ip_vs_sh with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0407 15:06:52.352898 1 proxier.go:493] Failed to load kernel module nf_conntrack with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
I0407 15:06:52.382215 1 server_others.go:148] Using iptables Proxier.
I0407 15:06:52.382338 1 server_others.go:178] Tearing down inactive rules.
I0407 15:06:52.407964 1 server.go:483] Version: v1.13.5
I0407 15:06:52.421782 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0407 15:06:52.421820 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0407 15:06:52.421887 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0407 15:06:52.421915 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0407 15:06:52.422710 1 config.go:202] Starting service config controller
I0407 15:06:52.422728 1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I0407 15:06:52.422747 1 config.go:102] Starting endpoints config controller
I0407 15:06:52.422753 1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
I0407 15:06:52.522902 1 controller_utils.go:1034] Caches are synced for service config controller
I0407 15:06:52.522902 1 controller_utils.go:1034] Caches are synced for endpoints config controller
E0409 19:39:19.307109 1 reflector.go:237] k8s.io/client-go/informers/factory.go:132: Failed to watch *v1.Service: the server has asked for the client to provide credentials (get services)




E0626 12:56:07.323006 1 reflector.go:125] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Endpoints: the server was unable to return a response in the time allotted, but may still be processing the request (get endpoints)
E0626 15:16:15.776676 1 reflector.go:237] k8s.io/client-go/informers/factory.go:132: Failed to watch *v1.Endpoints: the server has asked for the client to provide credentials (get endpoints)
E0626 15:16:53.444860 1 reflector.go:237] k8s.io/client-go/informers/factory.go:132: Failed to watch *v1.Service: the server has asked for the client to provide credentials (get services)
E0710 14:06:54.957356 1 reflector.go:237] k8s.io/client-go/informers/factory.go:132: Failed to watch *v1.Endpoints: the server has asked for the client to provide credentials (get endpoints)
E0727 18:55:05.266861 1 reflector.go:237] k8s.io/client-go/informers/factory.go:132: Failed to watch *v1.Endpoints: the server has asked for the client to provide credentials (get endpoints)
E0808 03:05:11.502834 1 proxier.go:1335] Failed to execute iptables-restore: exit status 1 (iptables-restore: line 11 failed

@truncj

truncj commented Aug 27, 2019

@ctyjrsy have you set the kube-proxy --metrics-bind-address argument?

kubernetes/kubernetes#74300
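For reference, kube-proxy binds its metrics endpoint to 127.0.0.1:10249 by default, which is why Prometheus gets connection refused when it scrapes the node IP. A minimal sketch of the two ways to change this, assuming you control how kube-proxy is launched:

# As a command-line flag on the kube-proxy container:
kube-proxy --metrics-bind-address=0.0.0.0 ...

# Or as a field in the KubeProxyConfiguration passed via --config:
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
metricsBindAddress: 0.0.0.0:10249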

@teochenglim

Fixed it with the hint from @truncj.
For some reason the ConfigMap was set to the secure (localhost-only) version:

$ kubectl edit cm/kube-proxy -n kube-system
## Change from
    metricsBindAddress: 127.0.0.1:10249 ### <--- Too secure
## Change to
    metricsBindAddress: 0.0.0.0:10249
$ kubectl delete pod -l k8s-app=kube-proxy -n kube-system
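To verify the change after the pods restart, a quick check (a sketch; <node-ip> is a placeholder for one of your node addresses, run from somewhere that can reach the node network):

$ kubectl get pod -n kube-system -l k8s-app=kube-proxy -o wide   # note the node IPs the pods run on
$ curl http://<node-ip>:10249/metrics | head                     # should now return kube-proxy metrics instead of connection refused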

@ctyjrsy

ctyjrsy commented Sep 6, 2019

I do not see any existing metricsBindAddress in my kube-proxy config

kubectl edit cm/kube-proxy -n kube-system

apiVersion: v1
data:
  kube-proxy-config: |-
    ---
    kind: KubeProxyConfiguration
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    clientConnection:
      kubeconfig: "/etc/kubernetes/kubeconfig/kube-proxy.kubeconfig"
    clusterCIDR: "10.35.0.0/16"
    mode: "iptables"
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"kube-proxy-config":"---\nkind: KubeProxyConfiguration\napiVersion: kubeproxy.config.k8s.io/v1alpha1\nclientConnection:\n  kubeconfig: \"/etc/kubernetes/kubeconfig/kube-proxy.kubeconfig\"\nclusterCIDR: \"10.35.0.0/16\"\nmode: \"iptables\""},"kind":"ConfigMap","metadata":{"annotations":{},"name":"kube-proxy","namespace":"kube-system"}}
  creationTimestamp: "2019-04-07T11:05:31Z"
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "137"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: 96323990-5946-11e9-bfc2-26253d84

If I introduce the bind address manually in the ConfigMap, then after the kube-proxy pods in kube-system are restarted, the metrics start getting collected from kube-proxy.
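For anyone with the same style of ConfigMap, the manual edit amounts to adding one line to the embedded KubeProxyConfiguration (a sketch based on the ConfigMap shown above; only the metricsBindAddress line is new):

  kube-proxy-config: |-
    ---
    kind: KubeProxyConfiguration
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    clientConnection:
      kubeconfig: "/etc/kubernetes/kubeconfig/kube-proxy.kubeconfig"
    clusterCIDR: "10.35.0.0/16"
    mode: "iptables"
    metricsBindAddress: 0.0.0.0:10249   # added so Prometheus can scrape from outside localhost

After saving, restart the kube-proxy pods (as in the earlier comment) so they pick up the new config.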

Thank you @truncj and @teochenglim

@ctyjrsy ctyjrsy closed this as completed Sep 6, 2019
@txynidakis

For the instance I was working on, the ConfigMap was named kube-proxy-config.

@demisx

demisx commented Nov 6, 2019

I don't even have a kube-proxy[-config] ConfigMap in my k8s v1.14.6 cluster created by kops. I'm not sure where to set this metrics-bind-address setting.

@teochenglim

@demisx if you can give me the output of these 2 commands, I should be able to help you:

kubectl get pod -n kube-system # this will list your kube-proxy pod names
kubectl get pod/kube-proxy-jpgsr -n kube-system -o yaml # use one of the pod names (here kube-proxy-jpgsr) from the previous command

Here is an example from my cluster; this is how I know the config is in a ConfigMap:

$ kubectl get pod -n kube-system
NAME                                      READY   STATUS    RESTARTS   AGE
.
.
kube-proxy-jpgsr                          1/1     Running   0          16d # this is the one I chose
kube-proxy-rprpg                          1/1     Running   0          16d # you can use this one too
.
.

$ kubectl get pod/kube-proxy-jpgsr -n kube-system -o yaml
  containers:
  - command:
    - kube-proxy
    - --v=2
    - --config=/var/lib/kube-proxy-config/config # so the startup config comes from here
.
.
    - mountPath: /var/lib/kube-proxy-config/ # the config is mounted here, but from where?
      name: config 
.
.
  volumes:
.
.
  - configMap:
      defaultMode: 420
      name: kube-proxy-config # so the config lives in this ConfigMap
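Once the ConfigMap name is known, the fix is the same as before (a sketch, assuming the kube-proxy-config ConfigMap and the k8s-app=kube-proxy label found above):

$ kubectl get cm kube-proxy-config -n kube-system -o yaml    # inspect the embedded KubeProxyConfiguration
$ kubectl edit cm/kube-proxy-config -n kube-system           # set metricsBindAddress: 0.0.0.0:10249
$ kubectl delete pod -l k8s-app=kube-proxy -n kube-system    # restart kube-proxy so it reloads the config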

@demisx

demisx commented Nov 9, 2019

@teochenglim Thank you so much for your response. Here is the output that you've requested:

$ kubectl get pod -n kube-system
-------
kube-proxy-ip-172-20-121-121.us-west-2.compute.internal               1/1     Running   0          6d13h
kube-proxy-ip-172-20-41-132.us-west-2.compute.internal                1/1     Running   0          6d13h
kube-proxy-ip-172-20-44-4.us-west-2.compute.internal                  1/1     Running   0          6d13h
kube-proxy-ip-172-20-66-190.us-west-2.compute.internal                1/1     Running   0          6d13h
kube-proxy-ip-172-20-78-191.us-west-2.compute.internal                1/1     Running   0          6d13h
kube-proxy-ip-172-20-98-153.us-west-2.compute.internal                1/1     Running   0          6d13h
$ kubectl -n kube-system get pod/kube-proxy-ip-172-20-121-121.us-west-2.compute.internal -o yaml
------
spec:
  containers:
  - command:
    - /bin/sh
    - -c
    - mkfifo /tmp/pipe; (tee -a /var/log/kube-proxy.log < /tmp/pipe & ) ; exec /usr/local/bin/kube-proxy
      --cluster-cidr=100.96.0.0/11 --conntrack-max-per-core=131072 --hostname-override=ip-172-20-121-121.us-west-2.compute.internal
      --kubeconfig=/var/lib/kube-proxy/kubeconfig --master=https://api.internal.prod.dimaslist.org
      --oom-score-adj=-998 --resource-container="" --v=2 > /tmp/pipe 2>&1
    image: k8s.gcr.io/kube-proxy:v1.14.6
    imagePullPolicy: IfNotPresent
    name: kube-proxy
    resources:
      requests:
        cpu: 100m
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/kube-proxy/kubeconfig
      name: kubeconfig
      readOnly: true
    - mountPath: /var/log/kube-proxy.log
      name: logfile
    - mountPath: /lib/modules
      name: modules
      readOnly: true
    - mountPath: /etc/ssl/certs
      name: ssl-certs-hosts
      readOnly: true
    - mountPath: /run/xtables.lock
      name: iptableslock
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  nodeName: ip-172-20-121-121.us-west-2.compute.internal
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  terminationGracePeriodSeconds: 30
  tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoExecute
    operator: Exists
  volumes:
  - hostPath:
      path: /var/lib/kube-proxy/kubeconfig
      type: ""
    name: kubeconfig
  - hostPath:
      path: /var/log/kube-proxy.log
      type: ""
    name: logfile
  - hostPath:
      path: /lib/modules
      type: ""
    name: modules
  - hostPath:
      path: /usr/share/ca-certificates
      type: ""
    name: ssl-certs-hosts
  - hostPath:
      path: /run/xtables.lock
      type: FileOrCreate
    name: iptableslock
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-11-02T13:41:26Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-11-02T13:41:32Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2019-11-02T13:41:32Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-11-02T13:41:26Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://076e5703ed1e273cb27251f1b217ebed329184551360af47e940fbeb1b4daa4e
    image: k8s.gcr.io/kube-proxy:v1.14.6
    imageID: docker-pullable://k8s.gcr.io/kube-proxy@sha256:e5c364dc75d816132bebf2d84b35518f0661fdeae39c686d92f9e5f9a07e96b9
    lastState: {}
    name: kube-proxy
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2019-11-02T13:41:32Z"
  hostIP: 172.20.121.121
  phase: Running
  podIP: 172.20.121.121
  qosClass: Burstable
  startTime: "2019-11-02T13:41:26Z"

@teochenglim

teochenglim commented Nov 11, 2019

Hi @demisx ,

Sure,

I did a quick investigation: the kube-proxy metrics endpoint listens on 127.0.0.1:10249 by default.
You can ssh to your kops cluster (any node will do), sudo to the root user, run "netstat -ntlp", and you will be able to see it.
You can also run "curl http://127.0.0.1:10249/metrics" to see the metrics kube-proxy exposes.
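Putting those checks together (a sketch; the ssh user and node hostname are placeholders for your own environment):

$ ssh <user>@<node-hostname>              # any kops node will do
$ sudo netstat -ntlp | grep 10249         # shows the address kube-proxy's metrics endpoint is bound to
$ curl http://127.0.0.1:10249/metrics | head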

!!!!! Updating this may cause downtime; I bear no responsibility if this is your production system. !!!!!

Yes, from the output you gave, the kubeconfig resides on a hostPath; however, if you ssh to the kops cluster nodes, the layout is not the same as on an EKS cluster.

  volumes:
  - hostPath:
      path: /var/lib/kube-proxy/kubeconfig
      type: ""

To patch it, I use:

$ export NAME=[Your cluster name]
$ kops edit cluster $NAME

Insert this (how do I know? kubernetes/kops#6472) and then run "kops update cluster $NAME --yes --state=s3://[your s3 store]" once:

spec:
.
.
  kubeProxy:
    metricsBindAddress: 0.0.0.0

For this next part I don't know a better way. Basically I tried both "kops update cluster" and "kops rolling-update cluster", and neither triggered a node update. Since I am on AWS, I went to the EC2 Auto Scaling Groups and scaled the nodes down by setting minimum and desired capacity to 0, waited for the EC2 nodes to shut down, then scaled back up to the desired node count.

!!!!! If you want zero downtime, it also depends on whether your application can survive a Kubernetes node taint (https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/). You can do the reverse: grow the autoscaling group first, taint the old nodes, and then scale back down. It is best to schedule a maintenance window for this.
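A related, more standard way to rotate nodes with less disruption is to cordon and drain each node before the autoscaling group replaces it (a sketch using plain kubectl, not part of the kops workflow above; <node-name> is a placeholder):

$ kubectl cordon <node-name>                                          # stop new pods from being scheduled on the node
$ kubectl drain <node-name> --ignore-daemonsets --delete-local-data   # evict workloads gracefully
# let the Auto Scaling Group replace the node, then repeat for the next one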

Neither of these worked for me:

kops update cluster $NAME --yes --state=s3://[your s3 store]
kops rolling-update cluster

After scaling down and back up, I ssh'ed to the kops nodes again.
This time I ran "netstat -ntlp" as the root user, and indeed the kube-proxy metrics endpoint is listening on 0.0.0.0:10249.

Cheers.

@demisx

demisx commented Nov 18, 2019

@teochenglim Thank you for the detailed explanation. This is pretty much what I did as well. I've also edited the cluster YAML file to add the lines below, so next time I create a cluster from scratch I won't have to edit it manually:

spec:
  kubeProxy:
    metricsBindAddress: 0.0.0.0

@teochenglim

Good to know that it works for you. Cheers and have a nice day.
