Tiller pods can't connect to k8s apiserver #2464

Closed
willise opened this Issue May 18, 2017 · 32 comments

@willise
Contributor

willise commented May 18, 2017

logs:

(kube-master) tiller # kcm logs tiller-deploy-3354596499-4q16f
Cannot initialize Kubernetes connection: Get http://localhost:8080/api: dial tcp [::1]:8080: getsockopt: connection refused

This issue has been raised in #1591 #1791.

In #1791 there is a workaround (thanks @iamzhout) of adding KUBERNETES_MASTER to the pod's env, but nobody explains the root cause of the issue.

I installed the k8s cluster manually, and it seems the tiller pod can't read KUBERNETES_MASTER from its environment because that variable doesn't exist there at all.

I can't find any docs about the relation between the k8s cluster and this configuration. I don't think adding the env variable manually is a good idea (and how would helm init pick up that configuration???)

Is there anybody who can offer some help?

Thanks anyway (^.^).
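
For reference, the #1791-style workaround can be expressed as a single command against the deployment rather than by editing a running pod; the address below is a placeholder for your own apiserver and, as noted above, this treats the symptom rather than the root cause:

# Inject KUBERNETES_MASTER into the tiller deployment so newly created pods pick it up
# (replace the address/port with your own apiserver endpoint).
kubectl --namespace=kube-system set env deployment/tiller-deploy KUBERNETES_MASTER=http://192.168.56.101:8080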

@technosophos

Member

technosophos commented May 18, 2017

@willise You might want to ask about this in the Slack channel if possible. It is very likely a configuration issue with Kubernetes, and people there may be able to help quickly.

@bacongobbler

Member

bacongobbler commented May 18, 2017

Agreed. This looks to be more of a misconfigured kubernetes cluster causing the KUBERNETES_SERVICE_HOST environment variable either to point to the wrong location or to not exist at all. Tiller needs that envvar in the pod to connect to the kubernetes apiserver.

I don't think there's anything actionable here in helm itself, save perhaps some docs clarification on how tiller communicates with the apiserver.
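
A quick, hedged way to check what the running tiller pod actually sees (the pod name is a placeholder; the second command just checks whether a service account token is mounted at the path client-go expects):

# Dump the Kubernetes env vars inside the tiller pod and check for a mounted token.
kubectl --namespace=kube-system exec tiller-deploy-xxxxxxxxxx-xxxxx -- env | grep KUBERNETES
kubectl --namespace=kube-system exec tiller-deploy-xxxxxxxxxx-xxxxx -- ls /var/run/secrets/kubernetes.io/serviceaccount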

@willise

Contributor

willise commented May 19, 2017

@technosophos Oops... kind of you to remind me; I'll keep that in mind.

@willise

Contributor

willise commented May 19, 2017

@bacongobbler KUBERNETES_SERVICE_HOST does exist in the env, but its value is a cluster IP like 10.254.0.1. If I don't set KUBERNETES_MASTER, tiller is unable to connect to the apiserver on port 8080. Only when it is set to the master's physical address, like 192.168.56.101, does it work.

So the problem may be that tiller can't get the correct apiserver address from k8s when it initializes the apiserver connection.

@allamand

allamand commented May 30, 2017

I've got a similar error where tiller retrieves the correct IP and port but can't connect due to TLS verification (I use skip-tls-verify on my kubernetes cluster).

2017-05-30T10:52:56.35985785Z Cannot initialize Kubernetes connection: Get https://10.100.200.1:443/api: x509: cannot validate certificate for 10.100.200.1 because it doesn't contain any IP SANs

Is there a way to ask tiller to skip this verification?
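
For anyone chasing the same x509 message, a hedged way to see which SANs the apiserver certificate actually carries (run from somewhere that can reach the service IP, e.g. inside the cluster; the IP is the one from the error above):

# Show the Subject Alternative Names on the apiserver's serving certificate.
echo | openssl s_client -connect 10.100.200.1:443 2>/dev/null | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'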

@penfree

penfree commented Aug 4, 2017

I ran into the same problem; it was because my KUBECONFIG is not in the default location, so you should point the KUBECONFIG env variable at the right file.
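
In other words, something along these lines on the machine running the helm client (the path is a placeholder):

export KUBECONFIG=/path/to/your/kubeconfig   # point at the non-default config
helm init                                    # subsequent helm commands now target the right cluster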

@mvernimmen

mvernimmen commented Dec 28, 2017

@bacongobbler I'm having the same issue here, even though KUBERNETES_SERVICE_HOST is set correctly:

$ kubectl exec -it tiller-deploy-1490305151-6jq17 -n integrations-test -- /bin/sh
/ # env | grep -i kube
KUBERNETES_PORT=tcp://10.96.0.1:443
KUBERNETES_SERVICE_PORT=443
KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
KUBERNETES_SERVICE_HOST=10.96.0.1

To fix the problem I set KUBERNETES_MASTER in the running tiller pod and tried again, but when doing an install the problem remains:

$ kubectl logs tiller-deploy-1490305151-6jq17 -n integrations-test
...
[tiller] 2017/12/28 13:34:09 preparing install for rmlater
[storage] 2017/12/28 13:34:09 getting release history for "rmlater"
[storage/driver] 2017/12/28 13:34:09 query: failed to query with labels: Get http://localhost:8080/api/v1/namespaces/integrations-test/configmaps?labelSelector=NAME%3Drmlater%2COWNER%3DTILLER: dial tcp 127.0.0.1:8080: getsockopt: connection refused
[tiller] 2017/12/28 13:34:09 failed install prepare step: Get http://localhost:8080/version: dial tcp 127.0.0.1:8080: getsockopt: connection refused

And trying the install with --kube-context integrations-test or with KUBECONFIG set to the path of my .kube/config file did not resolve the problem either.

According to the Kubernetes manual, the best practice is to resolve the kubernetes (FQDN) hostname (https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#accessing-the-api-from-a-pod); is that not what is being done here?
Why is KUBERNETES_SERVICE_HOST not being used to access the API?
This is all with version 2.7.2.

I'm probably missing something obvious, as are some of the other people having this problem. Could you, or someone else please have a look into this?

@mvernimmen

mvernimmen commented Dec 28, 2017

I have found one possible cause for this problem.
On a cluster set up with kubespray, running version 1.8.3, helm init and helm install work when executed by a user deploying to a specific namespace, but only when the serviceaccount name is 'tiller'. When using the serviceaccount name 'tiller-integrations-acc', the 127.0.0.1 error appears.
I've pasted my steps here: https://pastebin.com/TwCypx4Q
Did I do it wrong, or did I run into a bug?

However, this did not solve the problem on a cluster set up using kubeadm and running k8s version 1.7.8; there the 127.0.0.1 API problem remains despite using the serviceaccount name 'tiller'. So perhaps there are multiple possible causes for the error message.
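
For reference, a sketch of the usual cluster-wide setup with a dedicated 'tiller' service account (not the exact steps from the pastebin above):

# Create the service account, grant it cluster-admin, and tell helm init to use it.
kubectl --namespace=kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller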

@mattus

mattus commented Jan 11, 2018

Had something very similar to this; our eventual fix was to ensure automountServiceAccountToken on the ServiceAccount running tiller was set to true. We were creating the service account with terraform, which was defaulting to false for reasons discussed in terraform-providers/terraform-provider-kubernetes#38.

Manually creating the service account with kubectl does not set this value (it defaults to true when unset).
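
A hedged way to check both places the flag can live (the deployment's pod spec and the ServiceAccount itself); empty output means the field is unset (the pod then defers to the ServiceAccount, which defaults to true):

# Read automountServiceAccountToken from the tiller deployment and its service account.
kubectl --namespace=kube-system get deployment tiller-deploy -o jsonpath='{.spec.template.spec.automountServiceAccountToken}'
kubectl --namespace=kube-system get serviceaccount tiller -o jsonpath='{.automountServiceAccountToken}'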

@mvernimmen

mvernimmen commented Jan 23, 2018

Hi Mattus, thank you for suggesting that. I tried it, but unfortunately for me it doesn't fix the problem. Installing charts with a service account name other than 'tiller' results in helm not being able to find 'tiller'.

@bacongobbler

Member

bacongobbler commented Apr 11, 2018

According to #3816 and a comment in #1791, it appears that the environment variable client-go assumes is present (and that cloud providers inject) is KUBERNETES_MASTER, not KUBERNETES_SERVICE_HOST. I'm not sure what the resolution is, but this assumption is embedded in client-go and holds across all the major cloud providers, so it's probably something missing in the bare metal cluster setup guide which breaks this assumption. :/

tl;dr this is a bare metal cluster doc issue, not necessarily something wrong with helm's assumptions about the running environment.

@fossxplorer

fossxplorer commented Apr 12, 2018

I'm posting the comment here as well from issue #3870:
It seems to be an issue with the running Tiller pod:
[root]# docker logs --tail 1 k8s_tiller_tiller-deploy-686c785c58-cl22s_kube-system_34734fe7-3e4c-11e8-a4dd-fa163e87b224_0
[storage/driver] 2018/04/12 13:02:59 list: failed to list: Get http://localhost:8080/api/v1/namespaces/kube-system/configmaps?labelSelector=OWNER%3DTILLER: dial tcp 127.0.0.1:8080: connect: connection refused

The above was triggered by helm on my workstation, so the command is run remotely against the cluster:
$ helm list
Error: Get http://localhost:8080/api/v1/namespaces/kube-system/configmaps?labelSelector=OWNER%!D(MISSING)TILLER: dial tcp 127.0.0.1:8080: connect: connection refused.

Is Tiller supposed to be running something locally on port 8080 (a local proxy of some kind that uses KUBE* env variables to forward traffic)?
As far as I can see, there is nothing listening on port 8080 within the Tiller pod (which is consistent with the error message from Helm):
docker exec -i k8s_tiller_tiller-deploy-686c785c58-cl22s_kube-system_34734fe7-3e4c-11e8-a4dd-fa163e87b224_0 netstat -tlnup
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 :::44134 :::* LISTEN 1/tiller
tcp 0 0 :::44135 :::* LISTEN 1/tiller

Is there anything else I might have missed and should check?

@fossxplorer

fossxplorer commented Apr 13, 2018

@mattus Thanks a lot, I was stuck on this for ~3 days at work trying to deploy a k8s cluster. This should really be documented somewhere.
What I did to solve the issue was:

  • kubectl --namespace=kube-system edit deployment/tiller-deploy and changed automountServiceAccountToken to true.
    Then 'helm list' was giving me:
    Error: configmaps is forbidden: User "system:serviceaccount:kube-system:default" cannot list configmaps in the namespace "kube-system"
    That was fixed with the solution from #2687:
  • kubectl --namespace=kube-system create clusterrolebinding add-on-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default

@bacongobbler

Member

bacongobbler commented Apr 13, 2018

@fossxplorer if you want to document your findings we'd really appreciate it!

@johnhamelink

johnhamelink commented Apr 13, 2018

FYI this is happening when I use terraform to build an AKS k8s cluster in Azure. This issue gave a clue: terraform-providers/terraform-provider-kubernetes#38

Editing the deployment like @fossxplorer mentions seems to help me. Here's a handy one-liner to do just that:

kubectl -n kube-system patch deployment tiller-deploy -p '{"spec": {"template": {"spec": {"automountServiceAccountToken": true}}}}'

@kfox1111

kfox1111 commented Apr 27, 2018

I just hit this with the newest version on the newest minikube. :/

@kfox1111

kfox1111 commented Apr 27, 2018

Editing the tiller deployment and setting spec.template.spec.automountServiceAccountToken to true (it was false) seemed to help.

@milescrabill

milescrabill commented Apr 27, 2018

I ran into this on a fresh GKE cluster that I manually enabled RBAC on per Google's docs here.

I had already installed Helm, so things broke down when I enabled RBAC. After creating a Serviceaccount and ClusterRoleBinding for tiller, upgrading Helm with helm init --service-account tiller --upgrade was still producing the error above: Error: Get http://localhost:8080/api/v1/namespaces/kube-system/configmaps?labelSelector=OWNER%!D(MISSING)TILLER: dial tcp [::1]:8080: connect: connection refused.

As others have said, updating the tiller deployment to set automountServiceAccountToken: true fixed my issue.

@andrewgdavis

andrewgdavis commented Apr 27, 2018

Alternatively, one can install with the default service account specified:
helm init --service-account default
which will install tiller into the kube-system namespace.

@kfox1111

kfox1111 commented Apr 27, 2018

Looks like minikube in the newest version deploys with RBAC enabled. That is another difference.

@bacongobbler

Member

bacongobbler commented Apr 27, 2018

If you would be so kind as to test #3990, that would be appreciated. Seems like there was a regression in 2.9.

@bacongobbler

Member

bacongobbler commented Apr 27, 2018

alternative PR: #3991

@miroadamy

miroadamy commented Apr 29, 2018

The solution presented above by fossxplorer works for minikube v0.25.2, but the same sequence seems to fail for v0.26.1.

@AmazingTurtle

AmazingTurtle commented May 4, 2018

@miroadamy minikube v0.26.* seems to be broken altogether; the certificates are messed up. v0.25.2 is safe to use right now.

@bacongobbler

Member

bacongobbler commented May 14, 2018

This has been fixed in helm v2.9.1, which is available here: https://github.com/kubernetes/helm/releases/v2.9.1

Thanks everyone!

@AmazingTurtle

AmazingTurtle commented May 23, 2018

Upgraded from v2.7.2 to v2.9.1 using helm init --upgrade (also tried helm init --service-account default --upgrade). Still had to apply the fix from #2464 (comment) (automountServiceAccountToken helped).

Worth mentioning: I'm running minikube with RBAC enabled (https://gist.github.com/F21/08bfc2e3592bed1e931ec40b8d2ab6f5).

@ramyala

ramyala commented Jun 15, 2018

@bacongobbler Can we reopen this issue? If your serviceaccount has automountServiceAccountToken=false, v2.9.1 will continue to fail. The pod configuration for tiller-deploy should allow serviceaccount mounts by explicitly specifying automountServiceAccountToken: true.

@bacongobbler bacongobbler reopened this Jun 15, 2018

@ramyala

ramyala commented Jun 16, 2018

I sent out a PR that fixes this issue and am able to get terraform service_accounts to work with helm.

PTAL @ #4229

@ChristopherHanson

ChristopherHanson commented Jun 20, 2018

None of this stuff worked for me, and I had the exact same error as #3870. The only thing I got to work was helm init --net-host, which actually allowed Tiller to reach the apiserver via localhost in the node's network namespace.

For the record, this was done on a k8s server built by hand by cloning the k8s repo and building the binaries, not deployed by kubeadm or some other deployer; arguably, @bacongobbler's comment about a "misconfigured kubernetes cluster" would likely apply in my case, but it's a lab box and that is okay w/ me.

@ramyala

ramyala commented Jul 16, 2018

Did you try #4229? It resolved the issues in my case.

@paultiplady

paultiplady commented Aug 9, 2018

I hit this issue on a fresh install of Kubernetes under Docker for Mac.

The above suggestion to set automountServiceAccountToken to true resolved the issue for me.

@johnmshields

johnmshields commented Sep 19, 2018

This might help others that come upon this issue when working with terraform. The following terraform configuration was able to both create the service account and properly initialize tiller. The key is the override parameter to the helm provider.

data "google_client_config" "current" {}

provider "kubernetes" {
  load_config_file = false

  host                   = "https://${var.master_ip}"
  token                  = "${data.google_client_config.current.access_token}"
  cluster_ca_certificate = "${base64decode(var.cluster_ca_certificate)}"
}

provider "helm" {
  namespace       = "kube-system"
  service_account = "${kubernetes_service_account.tiller.metadata.0.name}"
  override        = ["spec.template.spec.automountserviceaccounttoken=true"]

  kubernetes {
    host                   = "https://${var.master_ip}"
    token                  = "${data.google_client_config.current.access_token}"
    cluster_ca_certificate = "${base64decode(var.cluster_ca_certificate)}"
  }
}

resource "kubernetes_service_account" "tiller" {
  metadata {
    name      = "tiller"
    namespace = "kube-system"
  }
}

resource "kubernetes_cluster_role_binding" "tiller" {
  metadata {
    name = "tiller"
  }

  # TODO: give specific permissions
  role_ref {
    name = "cluster-admin"
    kind = "ClusterRole"
  }

  subject {
    kind      = "ServiceAccount"
    name      = "tiller"
    namespace = "kube-system"
  }
}