
Plan stalls due to failed tiller during helm_resource state refresh #315

Closed

joatmon08 opened this issue Aug 7, 2019 · 13 comments

@joatmon08

Terraform Version

Terraform v0.12.6

  • provider.helm v0.10.2

Affected Resource(s)

  • helm_release
  • tiller

Terraform Configuration Files

main.tf

terraform {
  required_version = "~> 0.12"
}

provider "helm" {
  version = "~> 0.10"
  install_tiller = true
}

module "helm-consul" {
  source    = "./helm-consul"
  name      = "consul"
  namespace = var.namespace
  enable    = true
}

module file (located in ./helm-consul)

resource "helm_release" "consul" {
  name      = var.name
  chart     = "${path.module}/consul-helm"  ## official Helm Consul chart, local
  namespace = var.namespace

  set {
    name  = "server.replicas"
    value = var.replicas
  }

  set {
    name  = "server.bootstrapExpect"
    value = var.replicas
  }

  set {
    name  = "server.connect"
    value = true
  }

  provisioner "local-exec" {
    command = "helm test ${var.name}"
  }
}
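
The module's own variable declarations are not shown in the issue. A minimal variables.tf consistent with the references above might look like this (names and types are inferred; the defaults and the count hint are assumptions):

variable "name" {
  description = "Helm release name"
  type        = string
}

variable "namespace" {
  description = "Kubernetes namespace for the release"
  type        = string
}

variable "replicas" {
  description = "Number of Consul server replicas"
  type        = number
  default     = 3
}

variable "enable" {
  description = "Whether to create the release; the [0] index in the plan output suggests it drives a count on the resource"
  type        = bool
  default     = true
}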

Debug Output

https://gist.github.com/joatmon08/c77de83d65709c06e5313331f3aa8c4a

Expected Behavior

The Tiller pod should be re-initialized, or the error message should read "could not find a ready tiller pod".

Actual Behavior

$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

module.helm-consul.helm_release.consul[0]: Refreshing state... [id=consul]

Error: timeout while waiting for state to become 'Running' (last state: 'Pending', timeout: 5m0s)

Steps to Reproduce

  1. Create a Kubernetes cluster.

  2. Run terraform init with install_tiller = true. Tiller initializes correctly in the cluster.

  3. Successfully deploy a helm_release using terraform apply. The release is recorded in Terraform state.

  4. Scale the Tiller deployment down using kubectl scale deployment/tiller-deploy -n kube-system --replicas=0 (to mimic a failed Tiller).

  5. Run terraform plan. It waits for an available Tiller pod and then times out. (The steps are condensed into a shell sketch below.)
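
For convenience, steps 2–5 as a shell sketch (assumes the configuration above, a reachable cluster, and the default kube-system Tiller install):

# Install Tiller and deploy the release
terraform init
terraform apply -auto-approve

# Mimic a failed Tiller
kubectl scale deployment/tiller-deploy -n kube-system --replicas=0

# The state refresh now hangs and times out after 5 minutes
terraform plan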

Important Factoids

Initially, we discovered this when we created a managed Kubernetes cluster and updated some configuration. This caused the Kubernetes cluster to destroy and re-create itself. When the cluster re-initialized, Tiller was stuck in a failed state. Running helm init again re-deployed the Tiller pod and allowed the plan to complete.
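
For reference, the manual recovery was just re-running helm init against the broken cluster. A sketch (the tiller service account name is an assumption based on a typical Helm v2 RBAC setup):

# Re-deploy the Tiller pod (Helm v2)
helm init --upgrade --service-account tiller

# Wait for Tiller to become ready, then re-run the plan
kubectl -n kube-system rollout status deployment/tiller-deploy
terraform plan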

While this does not apply to Helm v3, any cluster running Helm v2 that is re-created could end up with a failed Tiller pod and cause the plan to stall. I initially discussed this with @alexsomesan and am posting here to collect input.

References

N/A

@alexsomesan
Member

Able to reproduce this. I'll be working on a fix in the coming days.

@mmclane

mmclane commented Oct 10, 2019

I am seeing this issue. Any updates?

@fl-max

fl-max commented Oct 31, 2019

I'm seeing this same issue; however, Tiller never failed and is still healthy.

Stuck on:
helm_release.airflow: Refreshing state... [id=heroic-seahorse]

In Tiller, I see:

[storage] 2019/10/31 17:14:19 getting last revision of "heroic-seahorse"
[storage] 2019/10/31 17:14:19 getting release history for "heroic-seahorse"
[storage] 2019/10/31 17:14:20 getting last revision of "heroic-seahorse"
[storage] 2019/10/31 17:14:20 getting release history for "heroic-seahorse"

Cancelling and running the plan again seems to fix it.

@zzzuzik

zzzuzik commented Oct 31, 2019

+1, constantly stuck with:
getting last revision... getting release ...

@ryudice

ryudice commented Nov 4, 2019

Hi, same issue here

@zzzuzik

zzzuzik commented Nov 4, 2019

By the way, since my resource is recyclable, I worked around the problem by issuing terraform destroy for the resource and re-creating it under a different Terraform name, like resource.mysql -> resource.mysql2.
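
Roughly, the rename looks like this (the resource and chart names here are just illustrative):

# Before: Terraform can no longer refresh this release
# resource "helm_release" "mysql" { ... }

# After terraform destroy -target=helm_release.mysql,
# re-create the release under a new Terraform name
resource "helm_release" "mysql2" {
  name  = "mysql"
  chart = "stable/mysql"
}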

@suheb

suheb commented Nov 7, 2019

I found a workaround for this: delete the failed tiller-deploy pod in your cluster.
Run kubectl -n kube-system get pods | grep tiller to get the pod name.
Then, run kubectl delete pods <pod>.

After this, terraform plan should run normally.
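
If you would rather not look up the pod name, the Tiller pod can also be targeted by label (app=helm,name=tiller are the labels helm init applies by default, but verify them on your cluster first):

kubectl -n kube-system delete pod -l app=helm,name=tiller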

@cliedeman

In my case I had made a mistake and the deployment never started the Tiller pod because the service account could not be found:

kubectl -n kube-system get deploy | grep tiller-deploy

Deleting the deployment fixed my issue. Can't wait for Helm 3.
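
For anyone hitting the same misconfiguration, a sketch of that cleanup (assumes the default kube-system install and that the service account referenced by the deployment now exists):

# Remove the broken Tiller deployment, then reinstall it
kubectl -n kube-system delete deployment tiller-deploy
helm init --service-account tiller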

@ArthurSens

ArthurSens commented Dec 4, 2019

I'm experiencing a similar issue.

I created an EKS cluster with Terraform and deployed Tiller and some helm_release resources with the Helm provider.

After that, I deleted the EKS cluster with terraform destroy -target=module.eks.

Of course, all my pods were deleted with the cluster, and now none of terraform plan, terraform apply, or terraform destroy can run; they all fail with the following logs:

module.helm-releases.helm_release.metrics-server: Refreshing state... [id=metrics-server]


Error: timeout while waiting for state to become 'Running' (last state: 'Pending', timeout: 5m0s)

Just to let you guys know, the issue was solved with:

terraform state rm module.helm-releases.helm_release.metrics-server
terraform destroy -auto-approve

@pio2pio

pio2pio commented Dec 30, 2019

After removing the Tiller pod manually, Terraform was unable to refresh the state and got stuck at:
module.kubernetes.module.config.module.gitlab-ci.helm_release.gitlab[0]: Refreshing state... [id=gitlab-runner]

Removing the missing resource from the state file resolved the issue:

$ terraform state rm module.kubernetes.module.config.module.gitlab-ci.helm_release.gitlab[0]
Removed module.kubernetes.module.config.module.gitlab-ci.helm_release.gitlab[0]
Successfully removed 1 resource instance(s).

@nidhi5885

The issue is intermittent; I am also facing the same problem.

@mcuadros
Collaborator

Closing this issue since it references a version based on Helm 2. If this is still valid against the master branch, please reopen it. Thanks.

@ghost

ghost commented May 11, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost ghost locked and limited conversation to collaborators May 11, 2020