
kubectl cluster-info lists services that are not reachable #14232

Closed
justinsb opened this issue Sep 19, 2015 · 37 comments
Labels
area/kubectl lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/cli Categorizes an issue or PR as relevant to SIG CLI.

Comments

@justinsb
Member

kubectl cluster-info shows e.g.

> kubectl cluster-info
Kubernetes master is running at https://52.19.39.28
KubeDNS is running at https://52.19.39.28/api/v1/proxy/namespaces/kube-system/services/kube-dns
KubeUI is running at https://52.19.39.28/api/v1/proxy/namespaces/kube-system/services/kube-ui

But kube-dns is not (meaningfully) reachable over HTTP; it gives this error:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "no endpoints available for service \"kube-dns\"",
  "reason": "ServiceUnavailable",
  "code": 503
}

A user reported this, along with a similar error for monitoring-influxdb which I suspect might be the same problem.

I think the problem here is that we show services as running and suggest an HTTP endpoint, but that endpoint is invalid for some services.

We should avoid showing endpoints that will give users errors. It's particularly problematic because (at least on AWS) we print the output of kubectl cluster-info once kube-up succeeds, so the natural thing for users to do is to explore these endpoints. I'm not sure if GCE has the same problem.
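One way to see in advance which of these proxy URLs will actually respond is to check whether each addon service has ready endpoints. This is a diagnostic sketch, not part of the original report:

```shell
# Services whose ENDPOINTS column shows "<none>" will return the
# 503 "no endpoints available" error through the API proxy.
kubectl get endpoints --namespace=kube-system

# Endpoints only exist while the backing pods are Running and Ready.
kubectl get pods --namespace=kube-system
```

In principle, cluster-info could run the same endpoint check before printing a URL.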

@davidopp davidopp added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Sep 22, 2015
@bgrant0607 bgrant0607 added area/kubectl priority/backlog Higher priority than priority/awaiting-more-evidence. and removed sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Oct 8, 2015
@okigan

okigan commented Nov 11, 2015

This issue is reproducible with the try-out instructions for k8s v1.1.1; the following services are listed:

./cluster/kubectl.sh cluster-info
Kubernetes master is running at https://10.245.1.2
Heapster is running at https://10.245.1.2/api/v1/proxy/namespaces/kube-system/services/heapster
KubeDNS is running at https://10.245.1.2/api/v1/proxy/namespaces/kube-system/services/kube-dns
KubeUI is running at https://10.245.1.2/api/v1/proxy/namespaces/kube-system/services/kube-ui
Grafana is running at https://10.245.1.2/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana
InfluxDB is running at https://10.245.1.2/api/v1/proxy/namespaces/kube-system/services/monitoring-influxdb

but only KubeUI is working (after a while); all the others still return "no endpoints available for service ..."

@smugcloud

I'm also having this issue on a fresh cluster (lots of details here). I can access Elasticsearch/Kibana/KubeUI/Grafana, but cannot access Heapster/KubeDNS/InfluxDB.

@cmoad

cmoad commented Nov 18, 2015

I was having the same issue with KubeUI. I don't recall exactly how I found the message but the error reported that there was no "default" service account in the "kube-system" namespace. I created this manually by running kubectl create -f serviceaccount.yml --namespace="kube-system" and that resolved the problem.

where serviceaccount.yml was this:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: default

@smugcloud

Hmm, interesting @cmoad. I actually already had the default serviceAccount in my cluster so I don't think that's it for my issue.

 k get serviceAccounts --all-namespaces
NAMESPACE     NAME      SECRETS   AGE
default       default   1         5d
kube-system   default   1         5d

@rroopreddy

Same issue after moving to Kubernetes 1.1.1 and creating a fresh cluster:

  1. Grafana dashboards emit an error

At https://xx.yy.zzz.abc/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana

Dashboard init failed
Template variables could not be initialized: InfluxDB Error:
Couldn't find series: memory/limit_bytes_gauge

  2. The InfluxDB access URL gives the following error

https://xx.yy.zzz.abc/api/v1/proxy/namespaces/kube-system/services/monitoring-influxdb

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "no endpoints available for service \"monitoring-influxdb\"",
  "reason": "ServiceUnavailable",
  "code": 503
}

@jtblin
Contributor

jtblin commented Nov 25, 2015

There are different ports for InfluxDB, so you need to specify which one in the URL, e.g. https://xx.yy.zzz.abc/api/v1/proxy/namespaces/kube-system/services/monitoring-influxdb:http/.

The InfluxDB UI is not really usable through the proxy anyway, due to other issues, and I can't get Grafana to talk to InfluxDB properly yet; it fails with the same error as above.
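To find out which named ports a service exposes (and therefore what can be appended after the colon), something like the following should work; the jsonpath expression is an illustration, not taken from this thread:

```shell
# Print the name and number of every port the service exposes; any
# non-empty name can be appended to the proxy URL as ":<name>".
kubectl get svc monitoring-influxdb --namespace=kube-system \
  -o jsonpath='{range .spec.ports[*]}{.name}{" -> "}{.port}{"\n"}{end}'
```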

@shenshouer

The same issue with my fresh cluster (v1.1.2).

@jpbruinsslot

I'm having the same problem as @rroopreddy; when accessing Grafana I'm getting the same error.

Dashboard init failed
Template variables could not be initialized: InfluxDB Error: 
Couldn't find series: memory/limit_bytes_gauge

@pidah

pidah commented Dec 7, 2015

I just tested a new cluster (v1.1.2) with kube-up on AWS and I can access influxdb with http appended at the end of the url as follows: https://x.x.x.x/api/v1/proxy/namespaces/kube-system/services/monitoring-influxdb:http
I can access elasticsearch/kibana/grafana/influxdb/kube-ui
I can't access heapster which returns a Temporary Redirect link to https://x.x.x.x/validate/
With kube-dns I still receive the 503 response, however the dns service seems to be working when I followed the steps here: https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns#how-do-i-test-if-it-is-working

$ cat busybox.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
  restartPolicy: Always


$ cluster/kubectl.sh create  -f busybox.yaml 
pod "busybox" created

$ cluster/kubectl.sh get pods busybox
NAME      READY     STATUS    RESTARTS   AGE
busybox   1/1       Running   0          18s
$ cluster/kubectl.sh exec busybox -- nslookup kubernetes.default
Server:    10.0.0.10
Address 1: 10.0.0.10 ip-10-0-0-10.eu-west-1.compute.internal

Name:      kubernetes.default
Address 1: 10.0.0.1 ip-10-0-0-1.eu-west-1.compute.internal

@vishh
Contributor

vishh commented Dec 9, 2015

cc @brendandburns

@okigan

okigan commented Dec 9, 2015

It is a bit odd that this issue slipped through -- these are the default/demo/showcase k8s services, and having them broken "out of the box" leaves a bad impression.

@brendandburns
Contributor

@okigan 100% agree, we'll get someone on checking this out. (and then improve e2e testing...)

@vishh
Contributor

vishh commented Dec 10, 2015

cc @zmerlynn @a-robinson

@bscott

bscott commented Mar 5, 2016

Is this still an issue? I'm getting the same error when accessing InfluxDB using the latest version of k8s.

@tsitaru

tsitaru commented Mar 24, 2016

Using version 1.2.0, I am still having issues accessing Grafana, getting the same error as above
Dashboard init failed Template variables could not be initialized: InfluxDB Error: Couldn't find series: memory/limit_bytes_gauge

I can access InfluxDB at https://x.x.x.x/api/v1/proxy/namespaces/kube-system/services/monitoring-influxdb:http; the interface loads, but it is unusable because it cannot connect to InfluxDB, giving the following error:
0 error - Could not connect to https://x.x.x.x:8086

@vishh
Contributor

vishh commented Mar 24, 2016

@tsitaru: The Grafana error suggests an issue with Heapster or with kube-proxy. I'd recommend exposing Grafana via an external service type: NodePort or LoadBalancer.

InfluxDB is not proxy friendly. So it is not expected to work via the API proxy.
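Following that suggestion, switching the service type can be done with a patch. This is a sketch assuming the addon service names used earlier in the thread; per the follow-up comment, changing the type alone may not be enough for Grafana:

```shell
# Change the Grafana service from ClusterIP to NodePort so it is
# reachable without going through the API proxy.
kubectl patch svc monitoring-grafana --namespace=kube-system \
  -p '{"spec": {"type": "NodePort"}}'

# Look up the node port that was allocated.
kubectl get svc monitoring-grafana --namespace=kube-system
```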

@yidinghan

Hi @vishh,
I used the example kubernetes/cluster/addons/cluster-monitoring/influxdb/ at commit 2e89f55.

Even though I changed the type of both svc:monitoring-influxdb and svc:monitoring-grafana from ClusterIP to LoadBalancer, I still got the same issue that @erroneousboat and @rroopreddy reported.

$ kubectl describe svc monitoring-grafana
Name:           monitoring-grafana
Namespace:      kube-system
Labels:         kubernetes.io/cluster-service=true,kubernetes.io/name=Grafana
Selector:       k8s-app=influxGrafana
Type:           LoadBalancer
IP:         10.0.0.11
Port:           <unset> 80/TCP
NodePort:       <unset> 31050/TCP
Endpoints:      10.1.51.3:3000
Session Affinity:   None
No events.

Is there anything I have done that I can correct?

@vishh
Contributor

vishh commented Apr 11, 2016

Take a look at the comments here in the grafana config.
You will have to change some env variables, in addition to changing the service type.

@luispabon

This is still broken. I can access Grafana, Kibana, kube status, and InfluxDB (the latter by appending :http to the URL, as above), but I cannot access the DNS control panel or Heapster (I get a 404 on that one).

@chowyu08

@yidinghan @luispabon I have the same issue. Is it still an issue now?

@luispabon

Still is, on 1.2.4

@mitsuh

mitsuh commented Jun 29, 2016

Still an issue.

@chowyu08

Me too, but still thanks!


@rsrchboy

...aaaand still an issue.

@erie149

erie149 commented Sep 29, 2016

You may have noticed this already, but if you have a pod with multiple ports you will need to include the port in the URL to connect to the pod (since the master actually connects to the pod, not the service).

Something like the following:
http:///api/v1/proxy/namespaces/default/services/xref-service:8080
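The URL shape described above can be sketched as plain string assembly (all values here are placeholders, not from a real cluster):

```shell
# Assemble the API-proxy URL for a multi-port service; the port (a number
# or a named port) goes after the colon. All values are placeholders.
MASTER="https://x.x.x.x"
NAMESPACE="default"
SERVICE="xref-service"
PORT="8080"
URL="${MASTER}/api/v1/proxy/namespaces/${NAMESPACE}/services/${SERVICE}:${PORT}"
echo "$URL"
# → https://x.x.x.x/api/v1/proxy/namespaces/default/services/xref-service:8080
```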

@tnsasse

tnsasse commented Oct 6, 2016

Still an issue. I ran the AWS deployment script and received the following:

Kubernetes master is running at https://xxx
Elasticsearch is running at https://xxx/api/v1/proxy/namespaces/kube-system/services/elasticsearch-logging
Heapster is running at https://xxx/api/v1/proxy/namespaces/kube-system/services/heapster
Kibana is running at https://xxx/api/v1/proxy/namespaces/kube-system/services/kibana-logging
KubeDNS is running at https://xxx/api/v1/proxy/namespaces/kube-system/services/kube-dns
kubernetes-dashboard is running at https://xxx/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard
Grafana is running at https://xxx/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana
InfluxDB is running at https://xxx/api/v1/proxy/namespaces/kube-system/services/monitoring-influxdb

Now, checking the URLs:

https://xxx/api/v1/proxy/namespaces/kube-system/services/kube-dns yields:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "no endpoints available for service \"kube-dns\"",
  "reason": "ServiceUnavailable",
  "code": 503
}

https://xxx/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana shows a dashboard, but no data is displayed; the only output is the following error message in the little red corners of the graphs: InfluxDB Error Response: error parsing query: found SELECT, expected ; at line 2, char 1

Which leads me to InfluxDB at https://xxx/api/v1/proxy/namespaces/kube-system/services/monitoring-influxdb, which shows

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "no endpoints available for service \"monitoring-influxdb\"",
  "reason": "ServiceUnavailable",
  "code": 503
}

The Kubernetes Dashboard, on the other hand, works, so that's some success. But I think the other tools/links should also work; this is kind of frustrating. I am very new to Kubernetes and its ecosystem, so I am not really aware of which endpoints should serve a UI and which might be pure service URLs. Please forgive me if the above is expected behaviour.

@timbunce

@tnsasse that InfluxDB problem is probably #33775.

@sputnick

Since I ended up in this thread for a related problem, I'll share a nugget that others might find useful if they are making a custom cluster service using the label kubernetes.io/cluster-service: "true".

If you use a named port for the container port that the service selects, then you need to add ":portname" at the end of the URL that "kubectl cluster-info" reports. Otherwise you will get a "no endpoints..." error.

@vnalla

vnalla commented Apr 10, 2017

I am having the same issue. I tried the port names dns-tcp-local and dns-tcp; both requests time out in the browser.
Version
Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.0", GitCommit:"87d9d8d7bc5aa35041a8ddfe3d4b367381112f89", GitTreeState:"clean", BuildDate:"2017-01-23T14:15:25Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.0", GitCommit:"87d9d8d7bc5aa35041a8ddfe3d4b367381112f89", GitTreeState:"clean", BuildDate:"2017-01-23T14:15:25Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

Cluster Info
Kubernetes master is running at http://centos-master:8080
KubeDNS is running at http://centos-master:8080/api/v1/proxy/namespaces/kube-system/services/kube-dns
kubernetes-dashboard is running at http://centos-master:8080/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard

Kube DNS URL Error
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "no endpoints available for service \"kube-dns\"",
  "reason": "ServiceUnavailable",
  "code": 503
}

DNS RC YAML File contents
apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-dns-v20
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    version: v20
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kube-dns
    version: v20
  template:
    metadata:
      labels:
        k8s-app: kube-dns
        version: v20
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
        scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'
    spec:
      containers:
      - name: kubedns
        image: gcr.io/google_containers/kubedns-amd64:1.8
        resources:
          # TODO: Set memory limits when we've profiled the container for large
          # clusters, then set request = limit to keep this container in
          # guaranteed class. Currently, this container falls into the
          # "burstable" category so the kubelet doesn't backoff from restarting it.
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        livenessProbe:
          httpGet:
            path: /healthz-kubedns
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /readiness
            port: 8081
            scheme: HTTP
          # we poll on pod startup for the Kubernetes master service and
          # only setup the /readiness HTTP server once that's available.
          initialDelaySeconds: 3
          timeoutSeconds: 5
        args:
        # command = "/kube-dns"
        - --domain=cluster.local.
        - --dns-port=10053
        - --kube-master-url=http://192.168.56.110:8080
        ports:
        - containerPort: 10053
          name: dns-local
          protocol: UDP
        - containerPort: 10053
          name: dns-tcp-local
          protocol: TCP
      - name: dnsmasq
        image: gcr.io/google_containers/kube-dnsmasq-amd64:1.4
        livenessProbe:
          httpGet:
            path: /healthz-dnsmasq
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        args:
        - --cache-size=1000
        - --no-resolv
        - --server=127.0.0.1#10053
        - --log-facility=-
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
      - name: healthz
        image: gcr.io/google_containers/exechealthz-amd64:1.2
        resources:
          limits:
            memory: 50Mi
          requests:
            cpu: 10m
            # Note that this container shouldn't really need 50Mi of memory. The
            # limits are set higher than expected pending investigation on #29688.
            # The extra memory was stolen from the kubedns container to keep the
            # net memory requested by the pod constant.
            memory: 50Mi
        args:
        - --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
        - --url=/healthz-dnsmasq
        - --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
        - --url=/healthz-kubedns
        - --port=8080
        - --quiet
        ports:
        - containerPort: 8080
          protocol: TCP
      dnsPolicy: Default  # Don't use cluster DNS.

If I use the URL http://192.168.56.110:8080/api/v1/proxy/namespaces/kube-system/services/kube-dns:dns-tcp/ (note: centos-master's IP is 192.168.56.110), it gives the following error:
Error: 'EOF'
Trying to reach: 'http://172.30.17.3:53/'
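The 'EOF' here suggests the proxy did reach an endpoint, but on port 53, which speaks the DNS protocol rather than HTTP, so an HTTP request through the API proxy can never succeed. A way to confirm which ports the endpoints actually expose (a diagnostic sketch, not from the thread):

```shell
# Show the ports behind the kube-dns service; the dns/dns-tcp ports (53)
# serve DNS, not HTTP, so the API proxy cannot render a page from them.
kubectl describe svc kube-dns --namespace=kube-system
kubectl get endpoints kube-dns --namespace=kube-system -o wide
```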

@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 31, 2017
@0xmichalis
Contributor

/sig cli

@k8s-ci-robot k8s-ci-robot added the sig/cli Categorizes an issue or PR as relevant to SIG CLI. label Jun 10, 2017
@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 10, 2017
@olethanh

I also have the issue; installed from https://github.com/kubernetes/contrib/tree/master/ansible

@apurvann

What's the fix for this? I am still facing this issue in 1.8.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 26, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 28, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@ZhiyuanChen

ZhiyuanChen commented Jan 18, 2019

Well, I met the same issue for both Grafana and InfluxDB (I haven't tested whether other services have the same issue).
My k8s was installed from microk8s.

For more information, I ran kubectl -n kube-system -l=k8s-app=kube-dns get pods and received:
NAME READY STATUS RESTARTS AGE
kube-dns-6ccd496668-2wgw8 0/3 ContainerCreating 0 17h

@jimmyjimmy94

Well, I met the same issue for both Grafana and InfluxDB (I haven't tested whether other services have the same issue).
My k8s was installed from microk8s.

For more information, I ran kubectl -n kube-system -l=k8s-app=kube-dns get pods and received:
NAME READY STATUS RESTARTS AGE
kube-dns-6ccd496668-2wgw8 0/3 ContainerCreating 0 17h

This might be a very late reply, but I tried the entire day on Chrome with no success; Firefox works perfectly fine.
