
connection refused error caused by missing 'Authorization' header in request to Kubernetes API #1152

Closed
taylorturner opened this issue Feb 5, 2021 · 8 comments


@taylorturner

Terraform Version, Provider Version and Kubernetes Version

Terraform version: 0.14.2
Kubernetes provider version: 1.13.3
Kubernetes version: 1.18.14

Affected Resource(s)

All Kubernetes resources deployed by Terraform.

Terraform Configuration Files

AKS module output (used by Kubernetes provider)

output "cluster_hostname" {
  value = data.azurerm_kubernetes_cluster.this.kube_admin_config.0.host
}

output "cluster_ca_certificate" {
  value = data.azurerm_kubernetes_cluster.this.kube_admin_config.0.cluster_ca_certificate
}

output "cluster_password" {
  value = data.azurerm_kubernetes_cluster.this.kube_admin_config.0.password
}

Deployment file main.tf

provider "kubernetes" {
  load_config_file       = false
  host                   = module.aks.cluster_hostname
  cluster_ca_certificate = base64decode(module.aks.cluster_ca_certificate)
  token                  = module.aks.cluster_password
}

module "aks" {
  source  = "git@github.com:****/terraform-azure-aks.git"

  providers = {
    azurerm = azurerm
    azuread = azuread
  }

  create_aks    = true # this should tear down all resources if set to false
  # deploy_cluster = false

  cluster_name        = var.cluster_name
  resource_group_name = var.cluster_name
  cluster_dns_name    = var.cluster_dns_name
  location            = var.location
  kubernetes_version  = var.kubernetes_version

  node_count     = 6
  node_size      = "Standard_E4_v3"
  node_disk_size = 60 # size in GB

  custom_tags = var.tags
}

module "k8s" {
  source  = "git@github.com:****/terraform-azure-k8s.git"

  providers  = {
    kubernetes = kubernetes
    kubectl    = kubectl
  }
  depends_on = [module.aks]

  create_k8s = true
  create_pvs = true # set to false, apply, then set back to true to refresh stuck PVs

  cluster_name = var.cluster_name
  region       = var.location

  storage_account_name = var.storage_account_name
  storage_account_key  = module.storage.storage_account_key
  namespace_depends_on = module.aks.wait_for_kubeconfig

  azure_disk_mariadb_id           = module.storage.azure_disk_mariadb_id
  azure_disk_mariadb_name         = module.storage.azure_disk_mariadb_name
  azure_disk_postgres_id          = module.storage.azure_disk_postgres_id
  azure_disk_postgres_name        = module.storage.azure_disk_postgres_name
  azure_disk_messagebus_id        = module.storage.azure_disk_messagebus_id
  azure_disk_messagebus_name      = module.storage.azure_disk_messagebus_name
  azure_disk_templatejournal_id   = module.storage.azure_disk_templatejournal_id
  azure_disk_templatejournal_name = module.storage.azure_disk_templatejournal_name
  azure_disk_redisgraph_id        = module.storage.azure_disk_redisgraph_id
  azure_disk_redisgraph_name      = module.storage.azure_disk_redisgraph_name

  registry_auth = var.registry_auth
}

Debug Output

I have one deployment that's working and four that are failing.
Failing deployment output: https://gist.github.com/taylorturner/ab4ddf84e89bd646b984019b9f35eb13
Succeeding deployment output: https://gist.github.com/taylorturner/54f608a01351a44cc2c200adeb7a0843
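
(Debug logs like these can be captured with Terraform's standard logging environment variables; the exact invocation and log path below are assumptions, not taken from the original report:)

TF_LOG=DEBUG TF_LOG_PATH=./terraform-debug.log terraform plan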

Steps to Reproduce

  1. terraform plan

Expected Behavior

The Kubernetes provider should send the Authorization header with its requests to the Kubernetes API.

Actual Behavior

The provider is sending requests without the Authorization header, which results in the 'connection refused' errors.

Important Factoids

Factoid 1
We're using a service principal to connect to the Azure providers. However, the AKS cluster has Azure AD RBAC integration enabled for authenticating kubectl. We have a 'Cluster Super Admin' AD group that's mapped to the 'cluster-admin' ClusterRole and acts as the admin for the cluster. That's why we're using the admin context outputs.
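
For reference, the group-to-role mapping looks roughly like the following (illustrative sketch only; the binding name and the group object ID are placeholders, not our actual configuration):

resource "kubernetes_cluster_role_binding" "cluster_super_admin" {
  metadata {
    name = "cluster-super-admin" # placeholder name
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }

  subject {
    api_group = "rbac.authorization.k8s.io"
    kind      = "Group"
    name      = "00000000-0000-0000-0000-000000000000" # Azure AD group object ID (placeholder)
  }
}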

Factoid 2
We've been doing a spike on adopting Terraform Cloud. I had all my deployments working in Terraform Cloud using remote runners, then ran into a bug with the Terraform Cloud Private Module Registry. Since then I've been slowly unwinding the Terraform Cloud changes, trying to get everything back into a working state.

As of right now, all my workspaces are configured for local execution and the modules are sourced from GitHub. So far the Terraform Cloud product seems much less "mature" than I was hoping for.

References

Not necessarily related to this, but I opened a bug report for the problem I found in Terraform Cloud: hashicorp/terraform#27695

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@taylorturner
Author

TL;DR

Here's the debug output for the failing deployment, specific to the provider REQUEST:

2021-02-05T11:12:48.025-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: ---[ REQUEST ]---------------------------------------
2021-02-05T11:12:48.025-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: GET /apis/rbac.authorization.k8s.io/v1/clusterroles/default-reader HTTP/1.1
2021-02-05T11:12:48.025-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: Host: localhost
2021-02-05T11:12:48.025-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: User-Agent: HashiCorp/1.0 Terraform/0.14.2
2021-02-05T11:12:48.025-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: Accept: application/json, */*
2021-02-05T11:12:48.025-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: Accept-Encoding: gzip

Compare this to the working deployment, where the Authorization header is present:

2021-02-05T11:25:41.148-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: ---[ REQUEST ]---------------------------------------
2021-02-05T11:25:41.148-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: GET /apis/rbac.authorization.k8s.io/v1/namespaces/element-platform/rolebindings/element-reader-binding HTTP/1.1
2021-02-05T11:25:41.148-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: Host: staging-platform-eec6a4e5.hcp.eastus.azmk8s.io:443
2021-02-05T11:25:41.148-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: User-Agent: HashiCorp/1.0 Terraform/0.14.2
2021-02-05T11:25:41.148-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: Accept: application/json, */*
2021-02-05T11:25:41.148-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: Authorization: Bearer 5f5493f4807fa16be36cb1ad4171b512a56ed9def2b1dcb98259e985138695e0f55ecd72d93cd7ce46ac514e553b8eda555d1e7bec1ec72eb20533e8bd7f96e5
2021-02-05T11:25:41.148-0800 [DEBUG] plugin.terraform-provider-kubernetes_v1.13.3_x4: Accept-Encoding: gzip

@dak1n1
Contributor

dak1n1 commented Feb 9, 2021

I took a quick look at the output from the failing debug log. Offhand, it looks a bit like what I ran into when using the Kubernetes provider with AKS. It can be tricky to get the dependencies right when creating a cluster and creating Kubernetes resources in a single apply. It doesn't work for every scenario, and may require a work-around.

Can you try using a depends_on with your azurerm data source? Like this:

data "azurerm_kubernetes_cluster" "this" {
  depends_on          = [module.aks]
...
}

I'm asking because I see that data.azurerm_kubernetes_cluster.this.kube_admin_config.0.password (output from the AKS module) is being passed into the Kubernetes provider. There's a good chance that value is outdated, or doesn't even exist, at the time the Kubernetes provider is initialized.

We have a working example for AKS with a short guide, in case it helps to view our current limitations and work-arounds regarding stacking resources like this in a single apply.

Alternatively, if you completely separate the Kubernetes resources from the cluster infrastructure resources, and use two applies, that will always work. The guide I linked above will give you the exact commands needed, but I'll post it here too:

terraform apply -target=module.aks
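
Followed by a full apply for the remaining Kubernetes resources once the cluster exists (assuming the standard two-phase workflow; the exact second command may differ from the guide):

terraform apply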

@taylorturner
Author

@dak1n1 thanks for the response!

I am aware that the recommendation is to isolate the cluster creation from the cluster configuration, but it has been working up until this point.

Following your recommendation, I pulled the data object out of the module and put it in my main.tf deployment file. I also added the depends_on and things are working much better.

I'm happy to close this issue now.

@ghost ghost removed waiting-response labels Feb 10, 2021
@aareet
Member

aareet commented Feb 10, 2021

Thanks for getting back to us - in that case, I'll close this issue :)

@aareet aareet closed this as completed Feb 10, 2021
@taylorturner
Author

I'm still having some weird issues with this provider and I can't figure out why.

Here's the error from Terraform Cloud:
[screenshot of the Terraform Cloud error]

Here's the config that works (and also randomly fails):

data "azurerm_kubernetes_cluster" "this" {
  name                = var.cluster_name
  resource_group_name = var.resource_group_name
  depends_on          = [module.aks]
}

provider "kubernetes" {
  load_config_file       = false
  host                   = data.azurerm_kubernetes_cluster.this.kube_admin_config.0.host
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.this.kube_admin_config.0.cluster_ca_certificate)
  token                  = data.azurerm_kubernetes_cluster.this.kube_admin_config.0.password
}

Here's the process I've been having to follow every day to fix this:

  1. Switch the Terraform Cloud execution mode to 'local'.
  2. Run terraform plan locally; it fails with the same 'connection refused' error.
  3. Change the kubernetes provider arg to load_config_file = true.
  4. Run terraform plan; it succeeds with no changes.
  5. Change the kubernetes provider arg back to load_config_file = false.
  6. Run terraform plan; it succeeds with no changes.
  7. Switch the execution mode back to 'remote'.
  8. Push the changes to GitHub and manually trigger a terraform plan in Terraform Cloud. It succeeds.

This is how I got all my workspaces to a working state, which prompted me to close this issue. However, the next day I pushed a change for Terraform to modify a date within a tag on the AKS cluster and they all went back to the same 'connection refused' deadlocked state.

What I'm thinking might be a more permanent fix is to have Terraform write out a kubeconfig after deploying the cluster, and then point the provider at that file. Any other ideas or thoughts? Happy to provide debug output for the above chain of commands.
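
Something along these lines is what I have in mind (untested sketch; it assumes the kube_admin_config_raw attribute on the azurerm data source and the hashicorp/local provider, and the file name is arbitrary):

resource "local_file" "kubeconfig" {
  # kube_admin_config_raw is the full admin kubeconfig as a string (sensitive).
  content  = data.azurerm_kubernetes_cluster.this.kube_admin_config_raw
  filename = "${path.module}/generated-kubeconfig"
}

provider "kubernetes" {
  # Point the provider at the generated file instead of passing host/token directly.
  load_config_file = true
  config_path      = local_file.kubeconfig.filename
}

(This still has the same ordering caveat: the provider can't be configured until the file exists, so a targeted apply of the cluster module may still be needed on the first run.)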

Any time there is a 'connection refused', I'll usually find a line in the debug output that says something like "Provider configuration has failed to load.", and that failure is what causes the Authorization header to be missing.

@taylorturner
Author

@aareet Can we possibly reopen this issue?

@apeschel

I think this is a duplicate of #1307

@github-actions

Marking this issue as stale due to inactivity. If this issue receives no comments in the next 30 days it will automatically be closed. If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. This helps our maintainers find and focus on the active issues. Maintainers may also remove the stale label at their discretion. Thank you!

@github-actions github-actions bot added the stale label Feb 20, 2023
@github-actions github-actions bot closed this as not planned Mar 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024