Kubernetes endpoint investigation #5636

Closed
cakrit opened this issue Mar 14, 2019 · 10 comments
Assignee: cakrit
Labels: area/collectors (Everything related to data collection), discussion, K8s (Kubernetes related)
Milestone: v1.14-rc0

cakrit (Contributor) commented Mar 14, 2019

  • Investigate discrepancies between kubelet pods info and API server pods info
  • Investigate access to endpoints for kube-dns
cakrit added this to the v1.14-rc0 milestone Mar 14, 2019
cakrit self-assigned this Mar 14, 2019
cakrit (Contributor, Author) commented Mar 18, 2019

Investigate discrepancies between kubelet pods info and API server pods info

There is no discrepancy; we were probably comparing different responses when we saw it.
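
For the record, a rough way to redo that comparison from a node (assumptions: the kubelet read-only port 10255 is enabled, jq is available, the file names are placeholders, and hostname matches the node's registered name):

# Pod names as reported by the kubelet read-only API
curl -sS http://127.0.0.1:10255/pods | jq -r '.items[].metadata.name' | sort > kubelet-pods.txt

# Pod names the API server reports for the same node
kubectl get pods --all-namespaces --field-selector spec.nodeName="$(hostname)" \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | sort > apiserver-pods.txt

diff kubelet-pods.txt apiserver-pods.txt && echo "no discrepancy"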

cakrit (Contributor, Author) commented Mar 18, 2019

I still haven't figured out how to access the other endpoints on the GKE cluster. I added some more permissions to make it identical to Prometheus, but all I can see are the kubelet and kube-proxy endpoints. The service account seems to make no difference at all. I'll see what we get in minikube with the script below. @varyumin and @paulfantom, if you have any ideas, let me know. The list below is what @ilyam8 collected in #5392

#!/bin/bash
# Note: bash is required for $(<file) and the <<< here-string below.

KUBE_TOKEN="$(</var/run/secrets/kubernetes.io/serviceaccount/token)"
while read -r n u ; do
  # Unauthenticated request; the response body is saved to a file named after the endpoint
  if curl -sS "$u" 2>/dev/null 1>"$n" ; then
    echo "PASS: $n $u"
  else
    echo "FAIL: $n $u"
  fi
  # Same request with the service account bearer token, if one is mounted
  if [ -n "${KUBE_TOKEN}" ]; then
    if curl -sS -H "Authorization: Bearer ${KUBE_TOKEN}" "$u" 2>/dev/null 1>"$n" ; then
      echo "PASS: SECURE $n $u"
    else
      echo "FAIL: SECURE $n $u"
    fi
  fi
done <<< "etcd http://localhost:2379/metrics
api-server https://localhost:8443/metrics
kube-scheduler http://127.0.0.1:10251/metrics
kube-controller-manager http://127.0.0.1:10252/metrics
kube-state-metrics http://127.0.0.1:80/metrics
kubelet-metrics http://127.0.0.1:10255/metrics
kubelet-pods http://127.0.0.1:10255/pods
kubelet-specs http://127.0.0.1:10255/specs
kubelet-healthz http://127.0.0.1:10255/healthz
kubelet-stats http://127.0.0.1:10255/stats
kubelet-stats-summary http://127.0.0.1:10255/stats/summary
kubelet-stats-container http://127.0.0.1:10255/stats/container
kube-dns-10055 http://127.0.0.1:10055/metrics
kube-dns-9153 http://127.0.0.1:9153/metrics
kube-proxy http://127.0.0.1:10249/metrics"

ilyam8 (Member) commented Mar 18, 2019

I still haven't figured out how to access the other endpoints on the GKE cluster

About GKE, I found:

prometheus/prometheus#2641 (comment)
prometheus/prometheus#2641

So it seems there is no way to access the node directly, only through the API server proxy.

varyumin commented:

@cakrit
About GKE.
In GKE you don't get full access to the API server/proxy; your rights are restricted.

In minikube, yes, you have full access to the API and proxy.

cakrit (Contributor, Author) commented Mar 19, 2019

OK, the comment on the thread @ilyam8 posted makes sense. They mention:

Instead of client side certificates that might not work for all Kubernetes 'distributions' scraping the node metrics through the API node proxy will work for any Kubernetes setup. The kubernetes example config is updated to use this method.

I'd like to test 'scraping the node metrics through the API node proxy' on GKE, though @varyumin's comment suggests that perhaps even that might fail. Do you know how I could try that?

If it doesn't work, I guess we'll tell users to set up client certificates in GKE and we will have to use them in our collector. So we'll need to test that. I'd like to get to that after eliminating the first option though...
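
For reference, this is roughly what 'scraping the node metrics through the API node proxy' looks like from inside a pod; the node name is a placeholder, and the service account paths are the in-cluster defaults:

# Fetch a node's kubelet metrics via the API server node proxy instead of hitting the kubelet directly.
APISERVER="https://kubernetes.default.svc"
KUBE_TOKEN="$(</var/run/secrets/kubernetes.io/serviceaccount/token)"
NODE="<node-name>"   # one of the names from `kubectl get nodes`
curl -sS \
  --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -H "Authorization: Bearer ${KUBE_TOKEN}" \
  "${APISERVER}/api/v1/nodes/${NODE}/proxy/metrics"

This needs get access to the nodes/proxy resource in the service account's (cluster) role; if GKE blocks it, we should see a 403 here.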

cakrit (Contributor, Author) commented Mar 21, 2019

I did some more digging for kube-dns on GKE. The issue is actually that kube-dns does not expose metrics there at all. Also, port 10055 is no longer correct; CoreDNS (which is also called kube-dns now) uses 9153. Here's the difference between minikube and GKE:

minikube

[christopher@chris-msi helmchart]$ kubectl -n kube-system describe service kube-dns
Name:              kube-dns
Namespace:         kube-system
Labels:            k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=KubeDNS
Annotations:       prometheus.io/port: 9153
                   prometheus.io/scrape: true
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP:                10.96.0.10
Port:              dns  53/UDP
TargetPort:        53/UDP
Endpoints:         172.17.0.7:53,172.17.0.8:53
...

Here, I can get metrics via curl -S http://172.17.0.7:9153/metrics. I can NOT get metrics from localhost:9153.

GKE

[christopher@chris-msi helmchart]$ kubectl -n kube-system describe service kube-dns
Name:              kube-dns
Namespace:         kube-system
Labels:            addonmanager.kubernetes.io/mode=Reconcile
                   k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=KubeDNS
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"Reconcile","k8s-app":"kube-d...
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP:                10.70.0.10
Port:              dns  53/UDP
TargetPort:        53/UDP
Endpoints:         10.8.0.2:53,10.8.2.6:53
...

So autodiscovery is a must, and we should let users define the labels we will use to find the IPs and ports of the available services.
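
For illustration, a rough command-line sketch of that kind of label-driven discovery, using the k8s-app=kube-dns label and the prometheus.io/port annotation from the minikube output above (the annotation lookup and the 9153 fallback are assumptions; the annotation is absent on the GKE service shown here):

# Discover endpoint IPs by label, then look up the metrics port from the service annotation.
NS=kube-system
LABEL=k8s-app=kube-dns
IPS="$(kubectl -n "$NS" get endpoints -l "$LABEL" -o jsonpath='{.items[*].subsets[*].addresses[*].ip}')"
PORT="$(kubectl -n "$NS" get services -l "$LABEL" -o jsonpath='{.items[*].metadata.annotations.prometheus\.io/port}')"
for ip in $IPS; do
  curl -sS "http://${ip}:${PORT:-9153}/metrics" >/dev/null && echo "PASS: ${ip}:${PORT:-9153}"
done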

cakrit (Contributor, Author) commented Mar 21, 2019

I'm closing this; we'll go into the design of auto-discovery in the next sprint.

cakrit closed this as completed Mar 21, 2019
ilyam8 (Member) commented Mar 21, 2019

Also, the port 10055 is no longer correct, coreDNS (which is also called kube-dns now) uses 9153.

10055 is kube-dns (optional), 9153 is coreDNS (current default)

ilyam8 (Member) commented Mar 21, 2019

cakrit (Contributor, Author) commented Mar 21, 2019

From https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/#introduction

The CoreDNS Deployment is exposed as a Kubernetes Service with a static IP. Both the CoreDNS and kube-dns Service are named kube-dns in the metadata.name field. This is done so that there is greater interoperability with workloads that relied on the legacy kube-dns Service name to resolve addresses internal to the cluster. It abstracts away the implementation detail of which DNS provider is running behind that common endpoint.
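
A quick way to check which provider actually sits behind that common kube-dns Service (the k8s-app=kube-dns label is the one from the outputs above; deployment and image names vary by distribution):

# Show the deployments and container images selected by the kube-dns label.
kubectl -n kube-system get deployments -l k8s-app=kube-dns
kubectl -n kube-system get pods -l k8s-app=kube-dns \
  -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}'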

ilyam8 added the area/collectors (Everything related to data collection) and K8s (Kubernetes related) labels and removed the area/external label Apr 14, 2022