
connection refused error caused by missing 'Authorization' header in request to Kubernetes API #1152

Closed
taylorturner opened this issue Feb 5, 2021 · 8 comments


@taylorturner

Terraform Version, Provider Version and Kubernetes Version

Terraform version: 0.14.2
Kubernetes provider version: 1.13.3
Kubernetes version: 1.18.14

Affected Resource(s)

All Kubernetes resources deployed by Terraform.

Terraform Configuration Files

AKS module output (used by Kubernetes provider)

output "cluster_hostname" {
  value = data.azurerm_kubernetes_cluster.this.kube_admin_config.0.host
}

output "cluster_ca_certificate" {
  value = data.azurerm_kubernetes_cluster.this.kube_admin_config.0.cluster_ca_certificate
}

output "cluster_password" {
  value = data.azurerm_kubernetes_cluster.this.kube_admin_config.0.password
}

Deployment file main.tf

provider "kubernetes" {
  load_config_file       = false
  host                   = module.aks.cluster_hostname
  cluster_ca_certificate = base64decode(module.aks.cluster_ca_certificate)
  token                  = module.aks.cluster_password
}

module "aks" {
  source  = "git@github.com:****/terraform-azure-aks.git"

  providers = {
    azurerm = azurerm
    azuread = azuread
  }

  create_aks    = true # this should tear down all resources if set to false
  # deploy_cluster = false

  cluster_name        = var.cluster_name
  resource_group_name = var.cluster_name
  cluster_dns_name    = var.cluster_dns_name
  location            = var.location
  kubernetes_version  = var.kubernetes_version

  node_count     = 6
  node_size      = "Standard_E4_v3"
  node_disk_size = 60 # size in GB

  custom_tags = var.tags
}

module "k8s" {
  source  = "git@github.com:****/terraform-azure-k8s.git"

  providers  = {
    kubernetes = kubernetes
    kubectl    = kubectl
  }
  depends_on = [module.aks]

  create_k8s = true
  create_pvs = true # set to false, apply, then set back to true to refresh stuck PVs

  cluster_name = var.cluster_name
  region       = var.location

  storage_account_name = var.storage_account_name
  storage_account_key  = module.storage.storage_account_key
  namespace_depends_on = module.aks.wait_for_kubeconfig

  azure_disk_mariadb_id           = module.storage.azure_disk_mariadb_id
  azure_disk_mariadb_name         = module.storage.azure_disk_mariadb_name
  azure_disk_postgres_id          = module.storage.azure_disk_postgres_id
  azure_disk_postgres_name        = module.storage.azure_disk_postgres_name
  azure_disk_messagebus_id        = module.storage.azure_disk_messagebus_id
  azure_disk_messagebus_name      = module.storage.azure_disk_messagebus_name
  azure_disk_templatejournal_id   = module.storage.azure_disk_templatejournal_id
  azure_disk_templatejournal_name = module.storage.azure_disk_templatejournal_name
  azure_disk_redisgraph_id        = module.storage.azure_disk_redisgraph_id
  azure_disk_redisgraph_name      = module.storage.azure_disk_redisgraph_name

  registry_auth = var.registry_auth
}

Debug Output

I have one deployment that's working and four that are failing.
Failing deployment output: https://gist.github.com/taylorturner/ab4ddf84e89bd646b984019b9f35eb13
Succeeding deployment output: https://gist.github.com/taylorturner/54f608a01351a44cc2c200adeb7a0843
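
(Debug logs like these can be captured with Terraform's standard logging environment variables; the exact invocation and log path below are assumptions, not taken from the original report:)

TF_LOG=DEBUG TF_LOG_PATH=./terraform-debug.log terraform plan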

Steps to Reproduce

  1. terraform plan

Expected Behavior

The Kubernetes provider should send the Authorization header with its requests to the Kubernetes API.

Actual Behavior

The provider is sending requests without the Authorization header, which results in the 'connection refused' errors.

Important Factoids

Factoid 1
We're using a service principal to connect to the Azure providers. However, the AKS cluster has Azure AD RBAC integration enabled for authenticating kubectl. We have a 'Cluster Super Admin' AD group that's mapped to the 'cluster-admin' ClusterRole and acts as the admin for the cluster. That's why we're using the admin context outputs.
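
For reference, the group-to-role mapping looks roughly like the following (illustrative sketch only; the binding name and the group object ID are placeholders, not our actual configuration):

resource "kubernetes_cluster_role_binding" "cluster_super_admin" {
  metadata {
    name = "cluster-super-admin" # placeholder name
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }

  subject {
    api_group = "rbac.authorization.k8s.io"
    kind      = "Group"
    name      = "00000000-0000-0000-0000-000000000000" # Azure AD group object ID (placeholder)
  }
}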

Factoid 2
We've been doing a spike on adopting Terraform Cloud. I had all my deployments working in Terraform Cloud using remote runners, then ran into a bug with the Terraform Cloud Private Module Registry. Since then I've been slowly unwinding the Terraform Cloud changes, trying to get everything back into a working state.

As of right now, all my workspaces are configured for local execution and the modules are sourced from GitHub. So far the Terraform Cloud product seems much less "mature" than I was hoping for.

References

Not necessarily related to this, but I opened a bug report for the problem I found in Terraform Cloud: hashicorp/terraform#27695

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@taylorturner
Author

TL;DR

Here's the debug output for the failing deployment, specific to the provider REQUEST:

2021-02-05T11:12:48.025-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: ---[ REQUEST ]---------------------------------------
2021-02-05T11:12:48.025-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: GET /apis/rbac.authorization.k8s.io/v1/clusterroles/default-reader HTTP/1.1
2021-02-05T11:12:48.025-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: Host: localhost
2021-02-05T11:12:48.025-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: User-Agent: HashiCorp/1.0 Terraform/0.14.2
2021-02-05T11:12:48.025-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: Accept: application/json, */*
2021-02-05T11:12:48.025-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: Accept-Encoding: gzip

Compare this to the working deployment, where the Authorization header is present:

2021-02-05T11:25:41.148-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: ---[ REQUEST ]---------------------------------------
2021-02-05T11:25:41.148-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: GET /apis/rbac.authorization.k8s.io/v1/namespaces/element-platform/rolebindings/element-reader-binding HTTP/1.1
2021-02-05T11:25:41.148-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: Host: staging-platform-eec6a4e5.hcp.eastus.azmk8s.io:443
2021-02-05T11:25:41.148-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: User-Agent: HashiCorp/1.0 Terraform/0.14.2
2021-02-05T11:25:41.148-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: Accept: application/json, */*
2021-02-05T11:25:41.148-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: Authorization: Bearer 5f5493f4807fa16be36cb1ad4171b512a56ed9def2b1dcb98259e985138695e0f55ecd72d93cd7ce46ac514e553b8eda555d1e7bec1ec72eb20533e8bd7f96e5
2021-02-05T11:25:41.148-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: Accept-Encoding: gzip

@dak1n1
Contributor

dak1n1 commented Feb 9, 2021

I took a quick look at the output from the failing debug log. Offhand, it looks a bit like what I ran into when using the Kubernetes provider with AKS. It can be tricky to get the dependencies right when creating a cluster and creating Kubernetes resources in a single apply. It doesn't work for every scenario, and may require a work-around.

Can you try using a depends_on with your azurerm data source? Like this:

data "azurerm_kubernetes_cluster" "this" {
  depends_on          = [module.aks]
...
}

I'm asking because I see that data.azurerm_kubernetes_cluster.this.kube_admin_config.0.password (output from the AKS module) is being passed into the Kubernetes provider. There's a good chance that value is outdated, or doesn't even exist, at the time the Kubernetes provider is initialized.

We have a working example for AKS with a short guide, in case it helps to view our current limitations and work-arounds regarding stacking resources like this in a single apply.

Alternatively, if you completely separate the Kubernetes resources from the cluster infrastructure resources, and use two applies, that will always work. The guide I linked above will give you the exact commands needed, but I'll post it here too:

terraform apply -target=module.aks
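
Followed by a full apply for the remaining Kubernetes resources once the cluster exists (assuming the standard two-phase workflow; the exact second command may differ from the guide):

terraform apply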

@taylorturner
Author

@dak1n1 thanks for the response!

I am aware that the recommendation is to isolate the cluster creation from the cluster configuration, but it has been working up until this point.

Following your recommendation, I pulled the data object out of the module and put it in my main.tf deployment file. I also added the depends_on and things are working much better.

I'm happy to close this issue now.

@ghost ghost removed waiting-response labels Feb 10, 2021
@aareet
Member

aareet commented Feb 10, 2021

Thanks for getting back to us - in that case, I'll close this issue :)

@aareet aareet closed this as completed Feb 10, 2021
@taylorturner
Author

I'm still having some weird issues with this provider and I can't figure out why.

Here's the error from Terraform Cloud:
[screenshot of the Terraform Cloud error]

Here's the config that works (and also randomly fails):

data "azurerm_kubernetes_cluster" "this" {
  name                = var.cluster_name
  resource_group_name = var.resource_group_name
  depends_on          = [module.aks]
}

provider "kubernetes" {
  load_config_file       = false
  host                   = data.azurerm_kubernetes_cluster.this.kube_admin_config.0.host
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.this.kube_admin_config.0.cluster_ca_certificate)
  token                  = data.azurerm_kubernetes_cluster.this.kube_admin_config.0.password
}

Here's the process I've been having to follow every day to fix this:

  1. Switch the Terraform Cloud execution mode to 'local'.
  2. Run terraform plan locally; it fails with the same 'connection refused' error.
  3. Change the kubernetes provider arg to load_config_file = true.
  4. Run terraform plan; it succeeds with no changes.
  5. Change the kubernetes provider arg back to load_config_file = false.
  6. Run terraform plan; it succeeds with no changes.
  7. Switch the execution mode back to 'remote'.
  8. Push the changes to GitHub and manually trigger a terraform plan in Terraform Cloud. It succeeds.

This is how I got all my workspaces to a working state, which prompted me to close this issue. However, the next day I pushed a change for Terraform to modify a date within a tag on the AKS cluster and they all went back to the same 'connection refused' deadlocked state.

What I'm thinking might be a more permanent fix is to have Terraform write out a kubeconfig after deploying the cluster, and then point the provider at that file. Any other ideas or thoughts? Happy to provide debug output for the above chain of commands.
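
Something along these lines is what I have in mind (untested sketch; it assumes the kube_admin_config_raw attribute on the azurerm data source and the hashicorp/local provider, and the file name is arbitrary):

resource "local_file" "kubeconfig" {
  # kube_admin_config_raw is the full admin kubeconfig as a string (sensitive).
  content  = data.azurerm_kubernetes_cluster.this.kube_admin_config_raw
  filename = "${path.module}/generated-kubeconfig"
}

provider "kubernetes" {
  # Point the provider at the generated file instead of passing host/token directly.
  load_config_file = true
  config_path      = local_file.kubeconfig.filename
}

(This still has the same ordering caveat: the provider can't be configured until the file exists, so a targeted apply of the cluster module may still be needed on the first run.)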

Any time there is a 'connection refused', I'll usually find a line in the debug output that says something like "Provider configuration has failed to load.", and that failure is what causes the Authorization header to be missing.

@taylorturner
Author

@aareet Can we possibly reopen this issue?

@apeschel

I think this is a duplicate of #1307

@github-actions

Marking this issue as stale due to inactivity. If this issue receives no comments in the next 30 days it will automatically be closed. If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. This helps our maintainers find and focus on the active issues. Maintainers may also remove the stale label at their discretion. Thank you!

@github-actions github-actions bot added the stale label Feb 20, 2023
@github-actions github-actions bot closed this as not planned Mar 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024