
Kube-Proxy endpoint connection refused #16476

Closed
ctyjrsy opened this issue Aug 21, 2019 · 12 comments

Comments

@ctyjrsy

ctyjrsy commented Aug 21, 2019

Describe the bug
Kube-proxy endpoints are not being scraped by Prometheus; as a result, many default metrics are unavailable. This is a Kubernetes cluster on DigitalOcean.

Version of Helm and Kubernetes:
helm version
Client: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}

kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Which chart:
stable/prometheus-operator

What happened:

http://10.136.37.130:10249/metrics down endpoint="http-metrics" instance="10.136.37.130:10249" job="kube-proxy" namespace="kube-system" pod="kube-proxy-cv6rf" service="monitoring-prometheu-kube-proxy" 1m7.029s ago 2.565ms Get http://10.136.37.130:10249/metrics: dial tcp 10.136.37.130:10249: connect: connection refused
http://10.136.61.125:10249/metrics down endpoint="http-metrics" instance="10.136.61.125:10249" job="kube-proxy" namespace="kube-system" pod="kube-proxy-g68zj" service="monitoring-prometheu-kube-proxy" 40.156s ago 834.1us Get http://10.136.61.125:10249/metrics: dial tcp 10.136.61.125:10249: connect: connection refused

What you expected to happen:
The same chart should scrape the kube-proxy endpoints so that the default metrics are available.

How to reproduce it (as minimally and precisely as possible):
helm install --name monitoring -f prom-operator.yaml stable/prometheus-operator --set kubelet.serviceMonitor.https=true --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false --set kube-proxy.servicemonitor.https=true

Anything else we need to know:

@ctyjrsy

ctyjrsy commented Aug 21, 2019

kubectl logs kube-proxy-cv6rf -n kube-system
W0407 15:06:52.310483 1 proxier.go:480] Failed to read file /lib/modules/4.19.0-0.bpo.2-amd64/modules.builtin with error open /lib/modules/4.19.0-0.bpo.2-amd64/modules.builtin: no such file or directory. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0407 15:06:52.330868 1 proxier.go:493] Failed to load kernel module ip_vs with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0407 15:06:52.349989 1 proxier.go:493] Failed to load kernel module ip_vs_rr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0407 15:06:52.350973 1 proxier.go:493] Failed to load kernel module ip_vs_wrr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0407 15:06:52.352054 1 proxier.go:493] Failed to load kernel module ip_vs_sh with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0407 15:06:52.352898 1 proxier.go:493] Failed to load kernel module nf_conntrack with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
I0407 15:06:52.382215 1 server_others.go:148] Using iptables Proxier.
I0407 15:06:52.382338 1 server_others.go:178] Tearing down inactive rules.
I0407 15:06:52.407964 1 server.go:483] Version: v1.13.5
I0407 15:06:52.421782 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0407 15:06:52.421820 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0407 15:06:52.421887 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0407 15:06:52.421915 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0407 15:06:52.422710 1 config.go:202] Starting service config controller
I0407 15:06:52.422728 1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I0407 15:06:52.422747 1 config.go:102] Starting endpoints config controller
I0407 15:06:52.422753 1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
I0407 15:06:52.522902 1 controller_utils.go:1034] Caches are synced for service config controller
I0407 15:06:52.522902 1 controller_utils.go:1034] Caches are synced for endpoints config controller
E0409 19:39:19.307109 1 reflector.go:237] k8s.io/client-go/informers/factory.go:132: Failed to watch *v1.Service: the server has asked for the client to provide credentials (get services)




E0626 12:56:07.323006 1 reflector.go:125] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Endpoints: the server was unable to return a response in the time allotted, but may still be processing the request (get endpoints)
E0626 15:16:15.776676 1 reflector.go:237] k8s.io/client-go/informers/factory.go:132: Failed to watch *v1.Endpoints: the server has asked for the client to provide credentials (get endpoints)
E0626 15:16:53.444860 1 reflector.go:237] k8s.io/client-go/informers/factory.go:132: Failed to watch *v1.Service: the server has asked for the client to provide credentials (get services)
E0710 14:06:54.957356 1 reflector.go:237] k8s.io/client-go/informers/factory.go:132: Failed to watch *v1.Endpoints: the server has asked for the client to provide credentials (get endpoints)
E0727 18:55:05.266861 1 reflector.go:237] k8s.io/client-go/informers/factory.go:132: Failed to watch *v1.Endpoints: the server has asked for the client to provide credentials (get endpoints)
E0808 03:05:11.502834 1 proxier.go:1335] Failed to execute iptables-restore: exit status 1 (iptables-restore: line 11 failed

@truncj

truncj commented Aug 27, 2019

@ctyjrsy have you set the kube-proxy --metrics-bind-address argument?

kubernetes/kubernetes#74300
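For reference, kube-proxy binds its metrics endpoint to 127.0.0.1:10249 by default, which is why Prometheus gets connection refused when it scrapes the node IP. A minimal sketch of the two ways to change this, assuming you control how kube-proxy is launched:

# As a command-line flag on the kube-proxy container:
kube-proxy --metrics-bind-address=0.0.0.0 ...

# Or as a field in the KubeProxyConfiguration passed via --config:
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
metricsBindAddress: 0.0.0.0:10249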

@teochenglim

Fixed it with the hint from @truncj.
For some reason the ConfigMap was set to the secure (localhost-only) version:

$ kubectl edit cm/kube-proxy -n kube-system
## Change from
    metricsBindAddress: 127.0.0.1:10249 ### <--- Too secure
## Change to
    metricsBindAddress: 0.0.0.0:10249
$ kubectl delete pod -l k8s-app=kube-proxy -n kube-system
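To verify the change after the pods restart, a quick check (a sketch; <node-ip> is a placeholder for one of your node addresses, run from somewhere that can reach the node network):

$ kubectl get pod -n kube-system -l k8s-app=kube-proxy -o wide   # note the node IPs the pods run on
$ curl http://<node-ip>:10249/metrics | head                     # should now return kube-proxy metrics instead of connection refused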

@ctyjrsy

ctyjrsy commented Sep 6, 2019

I do not see any existing metricsBindAddress in my kube-proxy config

kubectl edit cm/kube-proxy -n kube-system

apiVersion: v1
data:
  kube-proxy-config: |-
    ---
    kind: KubeProxyConfiguration
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    clientConnection:
      kubeconfig: "/etc/kubernetes/kubeconfig/kube-proxy.kubeconfig"
    clusterCIDR: "10.35.0.0/16"
    mode: "iptables"
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"kube-proxy-config":"---\nkind: KubeProxyConfiguration\napiVersion: kubeproxy.config.k8s.io/v1alpha1\nclientConnection:\n  kubeconfig: \"/etc/kubernetes/kubeconfig/kube-proxy.kubeconfig\"\nclusterCIDR: \"10.35.0.0/16\"\nmode: \"iptables\""},"kind":"ConfigMap","metadata":{"annotations":{},"name":"kube-proxy","namespace":"kube-system"}}
  creationTimestamp: "2019-04-07T11:05:31Z"
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "137"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: 96323990-5946-11e9-bfc2-26253d84

If I introduce the bind address manually in the ConfigMap, then after the kube-proxy pods in kube-system are restarted, the metrics start getting collected from kube-proxy.
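For anyone with the same style of ConfigMap, the manual edit amounts to adding one line to the embedded KubeProxyConfiguration (a sketch based on the ConfigMap shown above; only the metricsBindAddress line is new):

  kube-proxy-config: |-
    ---
    kind: KubeProxyConfiguration
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    clientConnection:
      kubeconfig: "/etc/kubernetes/kubeconfig/kube-proxy.kubeconfig"
    clusterCIDR: "10.35.0.0/16"
    mode: "iptables"
    metricsBindAddress: 0.0.0.0:10249   # added so Prometheus can scrape from outside localhost

After saving, restart the kube-proxy pods (as in the earlier comment) so they pick up the new config.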

Thank you @truncj and @teochenglim

@ctyjrsy ctyjrsy closed this as completed Sep 6, 2019
@txynidakis

For the instance I was working on, the ConfigMap was named kube-proxy-config.

@demisx

demisx commented Nov 6, 2019

I don't even have a kube-proxy[-config] ConfigMap in my k8s v1.14.6 cluster created by kops. I'm not sure where to set this metrics-bind-address setting.

@teochenglim

@demisx if you can give me the output of these 2 commands, I should be able to help you:

kubectl get pod -n kube-system # this will list your kube-proxy pod names
kubectl get pod/kube-proxy-jpgsr -n kube-system -o yaml # use one of the pod names (here kube-proxy-jpgsr) from the previous command

Here is an example from my cluster; this is how I know the config is in a ConfigMap:

$ kubectl get pod -n kube-system
NAME                                      READY   STATUS    RESTARTS   AGE
.
.
kube-proxy-jpgsr                          1/1     Running   0          16d # this is the one I chose
kube-proxy-rprpg                          1/1     Running   0          16d # you can use this one too
.
.

$ kubectl get pod/kube-proxy-jpgsr -n kube-system -o yaml
  containers:
  - command:
    - kube-proxy
    - --v=2
    - --config=/var/lib/kube-proxy-config/config # so the startup config comes from here
.
.
    - mountPath: /var/lib/kube-proxy-config/ # the config is mounted here, but from where?
      name: config 
.
.
  volumes:
.
.
  - configMap:
      defaultMode: 420
      name: kube-proxy-config # so the config lives in this ConfigMap
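Once the ConfigMap name is known, the fix is the same as before (a sketch, assuming the kube-proxy-config ConfigMap and the k8s-app=kube-proxy label found above):

$ kubectl get cm kube-proxy-config -n kube-system -o yaml    # inspect the embedded KubeProxyConfiguration
$ kubectl edit cm/kube-proxy-config -n kube-system           # set metricsBindAddress: 0.0.0.0:10249
$ kubectl delete pod -l k8s-app=kube-proxy -n kube-system    # restart kube-proxy so it reloads the config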

@demisx

demisx commented Nov 9, 2019

@teochenglim Thank you so much for your response. Here is the output that you've requested:

$ kubectl get pod -n kube-system
-------
kube-proxy-ip-172-20-121-121.us-west-2.compute.internal               1/1     Running   0          6d13h
kube-proxy-ip-172-20-41-132.us-west-2.compute.internal                1/1     Running   0          6d13h
kube-proxy-ip-172-20-44-4.us-west-2.compute.internal                  1/1     Running   0          6d13h
kube-proxy-ip-172-20-66-190.us-west-2.compute.internal                1/1     Running   0          6d13h
kube-proxy-ip-172-20-78-191.us-west-2.compute.internal                1/1     Running   0          6d13h
kube-proxy-ip-172-20-98-153.us-west-2.compute.internal                1/1     Running   0          6d13h
$ kubectl -n kube-system get pod/kube-proxy-ip-172-20-121-121.us-west-2.compute.internal -o yaml
------
spec:
  containers:
  - command:
    - /bin/sh
    - -c
    - mkfifo /tmp/pipe; (tee -a /var/log/kube-proxy.log < /tmp/pipe & ) ; exec /usr/local/bin/kube-proxy
      --cluster-cidr=100.96.0.0/11 --conntrack-max-per-core=131072 --hostname-override=ip-172-20-121-121.us-west-2.compute.internal
      --kubeconfig=/var/lib/kube-proxy/kubeconfig --master=https://api.internal.prod.dimaslist.org
      --oom-score-adj=-998 --resource-container="" --v=2 > /tmp/pipe 2>&1
    image: k8s.gcr.io/kube-proxy:v1.14.6
    imagePullPolicy: IfNotPresent
    name: kube-proxy
    resources:
      requests:
        cpu: 100m
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/kube-proxy/kubeconfig
      name: kubeconfig
      readOnly: true
    - mountPath: /var/log/kube-proxy.log
      name: logfile
    - mountPath: /lib/modules
      name: modules
      readOnly: true
    - mountPath: /etc/ssl/certs
      name: ssl-certs-hosts
      readOnly: true
    - mountPath: /run/xtables.lock
      name: iptableslock
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  nodeName: ip-172-20-121-121.us-west-2.compute.internal
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  terminationGracePeriodSeconds: 30
  tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoExecute
    operator: Exists
  volumes:
  - hostPath:
      path: /var/lib/kube-proxy/kubeconfig
      type: ""
    name: kubeconfig
  - hostPath:
      path: /var/log/kube-proxy.log
      type: ""
    name: logfile
  - hostPath:
      path: /lib/modules
      type: ""
    name: modules
  - hostPath:
      path: /usr/share/ca-certificates
      type: ""
    name: ssl-certs-hosts
  - hostPath:
      path: /run/xtables.lock
      type: FileOrCreate
    name: iptableslock
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-11-02T13:41:26Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-11-02T13:41:32Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2019-11-02T13:41:32Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-11-02T13:41:26Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://076e5703ed1e273cb27251f1b217ebed329184551360af47e940fbeb1b4daa4e
    image: k8s.gcr.io/kube-proxy:v1.14.6
    imageID: docker-pullable://k8s.gcr.io/kube-proxy@sha256:e5c364dc75d816132bebf2d84b35518f0661fdeae39c686d92f9e5f9a07e96b9
    lastState: {}
    name: kube-proxy
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2019-11-02T13:41:32Z"
  hostIP: 172.20.121.121
  phase: Running
  podIP: 172.20.121.121
  qosClass: Burstable
  startTime: "2019-11-02T13:41:26Z"

@teochenglim

teochenglim commented Nov 11, 2019

Hi @demisx ,

Sure,

I did a quick investigation: the kube-proxy metrics endpoint listens on 127.0.0.1:10249 by default.
You can ssh to your kops cluster (any node will do), sudo to the root user, run "netstat -ntlp", and you will be able to see it.
You can also run "curl http://127.0.0.1:10249/metrics" to see the metrics kube-proxy exposes.
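Putting those checks together (a sketch; the ssh user and node hostname are placeholders for your own environment):

$ ssh <user>@<node-hostname>              # any kops node will do
$ sudo netstat -ntlp | grep 10249         # shows the address kube-proxy's metrics endpoint is bound to
$ curl http://127.0.0.1:10249/metrics | head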

!!!!! Updating this may cause downtime; I bear no responsibility if this is your production system. !!!!!

Yes, from the output you gave, the kubeconfig resides on a hostPath; however, if you ssh to the kops cluster nodes, the layout is not the same as on an EKS cluster.

  volumes:
  - hostPath:
      path: /var/lib/kube-proxy/kubeconfig
      type: ""

To patch it, I use:

$ export NAME=[Your cluster name]
$ kops edit cluster $NAME

Insert this (how do I know? kubernetes/kops#6472) and then run "kops update cluster $NAME --yes --state=s3://[your s3 store]" once:

spec:
.
.
  kubeProxy:
    metricsBindAddress: 0.0.0.0

For this next part I don't know a better way. Basically I tried both "kops update cluster" and "kops rolling-update cluster", and neither triggered a node update. Since I am on AWS, I went to the EC2 Auto Scaling Groups and scaled the nodes down by setting minimum and desired capacity to 0, waited for the EC2 nodes to shut down, then scaled back up to the desired node count.

!!!!! If you want zero downtime, it also depends on whether your application can survive a Kubernetes node taint (https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/). You can do the reverse: grow the autoscaling group first, taint the old nodes, and then scale back down. It is best to schedule a maintenance window for this.
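A related, more standard way to rotate nodes with less disruption is to cordon and drain each node before the autoscaling group replaces it (a sketch using plain kubectl, not part of the kops workflow above; <node-name> is a placeholder):

$ kubectl cordon <node-name>                                          # stop new pods from being scheduled on the node
$ kubectl drain <node-name> --ignore-daemonsets --delete-local-data   # evict workloads gracefully
# let the Auto Scaling Group replace the node, then repeat for the next one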

Neither of these worked for me:

kops update cluster $NAME --yes --state=s3://[your s3 store]
kops rolling-update cluster

After scaling down and back up, I ssh'ed to the kops nodes again.
This time I ran "netstat -ntlp" as the root user, and indeed the kube-proxy metrics endpoint is listening on 0.0.0.0:10249.

Cheers.

@demisx

demisx commented Nov 18, 2019

@teochenglim Thank you for the detailed explanation. This is pretty much what I did as well. I've also edited the cluster YAML file to add the lines below, so next time I create a cluster from scratch I won't have to edit it manually:

spec:
  kubeProxy:
    metricsBindAddress: 0.0.0.0

@teochenglim

Good to know that it works for you. Cheers and have a nice day.
