
Cluster Recreate Causes Kustomize Module To Fail #156

Open
Spazzy757 opened this issue Jan 13, 2021 · 6 comments
Labels: bug (Something isn't working)

Spazzy757 (Contributor) commented Jan 13, 2021

Problem

We are recreating our cluster to enable private node pools. The issue seems to be that, because the cluster is being recreated, the Kustomize provider tries to communicate with the Kubernetes cluster at the default localhost address.

Logs

Error: ResourceDiff: Get "http://localhost/api?timeout=32s": dial tcp 127.0.0.1:80: connect: connection refused

  on .terraform/modules/gke_zero/common/cluster_services/main.tf line 16, in resource "kustomization_resource" "current":
  16: resource "kustomization_resource" "current" {


Error: Process completed with exit code 1.
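
For context, inside the module the Kustomize provider is configured from credentials the cluster module outputs, roughly like this (a minimal sketch, not the module's actual code; the kubeconfig_raw argument and the output name are assumptions about the wiring):

provider "kustomization" {
  # When the cluster is planned for replacement, this value is unknown during
  # plan/refresh. The provider then effectively has an empty configuration and
  # the Kubernetes client falls back to its default http://localhost endpoint,
  # which is exactly the connection refused error shown above.
  kubeconfig_raw = module.cluster.kubeconfig   # hypothetical output name
}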

Steps To Reproduce

Create a cluster with the setting:

enable_private_nodes = false

Then, once the cluster is created, change the value:

enable_private_nodes = true

and run, in that Terraform workspace:

terraform plan
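
For reference, the module input ultimately maps to the GKE cluster's private_cluster_config block; a rough sketch of the underlying provider resource (illustrative names and values, not the module's actual code):

resource "google_container_cluster" "current" {
  name               = "example-cluster"   # illustrative
  location           = "europe-west1"      # illustrative
  initial_node_count = 1

  private_cluster_config {
    # Flipping this from false to true cannot be done in place and forces
    # the cluster to be destroyed and recreated.
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }
}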

Workaround

Currently there is a workaround: run

terraform apply --target=<cluster module>

This will update the cluster first, which should then fix the problem.
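
In practice that means two applies, for example (the module address is inferred from the module path in the error above; adjust it to your configuration):

terraform apply --target=module.gke_zero   # replace the cluster first
terraform apply                            # then reconcile the remaining resources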

pst (Member) commented Jan 18, 2021

This needs more investigation, but I've seen this myself as well. The module receives the credentials that the cluster resources output as an input. My preliminary investigation suggests that when planning to create the cluster and the cluster services, Terraform gets the dependency graph right. On destroy, the order also seems correct: K8s resources first, then the cluster. But if the cluster gets destroyed and recreated, the graph does not first destroy the K8s resources, then destroy the cluster, then recreate the cluster and finally recreate the resources. That means the resources stay in the state, but there are no cluster credentials to refresh them during plan.
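
For illustration, the cluster services are defined with the provider's usual data source / for_each pattern, roughly like this (a sketch following the provider's documented usage, not the module's exact code; the path is illustrative):

data "kustomization" "current" {
  path = "manifests/"
}

# These entries stay in the state after the cluster is marked for replacement,
# but Terraform can no longer refresh them because the provider's credentials
# are unknown at that point.
resource "kustomization_resource" "current" {
  for_each = data.kustomization.current.ids

  manifest = data.kustomization.current.manifests[each.value]
}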

pst added the bug label on Jan 20, 2021
pst self-assigned this on Jan 20, 2021
pst (Member) commented Jan 26, 2021

To make it easier to understand, I created a simple config to reproduce the issue. https://github.com/pst/debugrecreateplan

The example repo shows the behaviour with both the official Kubernetes provider and my kustomize provider, on top of a KinD cluster. So it's not specific to the Google provider either.

And so far it seems to support my theory. Create and destroy plans correctly handle resources and clusters. But destroy and re-create plans do not handle the K8s resources on the cluster at all.
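
Reconstructed from the plan output below (a sketch, not necessarily the repo's exact code), the KinD side of the repro looks roughly like this; its kubeconfig output is what feeds the kustomize provider, as sketched further up:

resource "kind_cluster" "current" {
  name           = "debug-kind-kustomize"
  wait_for_ready = false

  kind_config {
    api_version = "kind.x-k8s.io/v1alpha4"
    kind        = "Cluster"

    node {
      role = "control-plane"
    }

    # Changing the number of worker nodes forces the cluster to be replaced,
    # which is what triggers the broken destroy & recreate plan further down.
    node {
      role = "worker"
    }
  }
}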

Create plan

[pst@pst-ryzen5 kind-kustomize]$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

data.kustomization.current: Refreshing state...

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # kind_cluster.current will be created
  + resource "kind_cluster" "current" {
      + client_certificate     = (known after apply)
      + client_key             = (known after apply)
      + cluster_ca_certificate = (known after apply)
      + endpoint               = (known after apply)
      + id                     = (known after apply)
      + kubeconfig             = (known after apply)
      + kubeconfig_path        = (known after apply)
      + name                   = "debug-kind-kustomize"
      + node_image             = (known after apply)
      + wait_for_ready         = false

      + kind_config {
          + api_version = "kind.x-k8s.io/v1alpha4"
          + kind        = "Cluster"

          + node {
              + role = "control-plane"
            }
          + node {
              + role = "worker"
            }
        }
    }

  # kustomization_resource.current["~G_v1_Namespace|~X|debug"] will be created
  + resource "kustomization_resource" "current" {
      + id       = (known after apply)
      + manifest = jsonencode(
            {
              + apiVersion = "v1"
              + kind       = "Namespace"
              + metadata   = {
                  + creationTimestamp = null
                  + name              = "debug"
                }
              + spec       = {}
              + status     = {}
            }
        )
    }

Plan: 2 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------

Destroy plan

[pst@pst-ryzen5 kind-kustomize]$ terraform plan --destroy
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

kind_cluster.current: Refreshing state... [id=debug-kind-kustomize-]
data.kustomization.current: Refreshing state... [id=5ffdb4bad7b4e2b4bd9a26a69a96e21e37a92301ca7108f731dc120dd806d5a2ec22feaaf104d9ad23dca0be7b50aaf0d0587f26a19df5dcd053d4eef745b704]
kustomization_resource.current["~G_v1_Namespace|~X|debug"]: Refreshing state... [id=094e469a-08f9-47e4-a9f3-a39ae8268a89]

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  - destroy

Terraform will perform the following actions:

  # kind_cluster.current will be destroyed
  - resource "kind_cluster" "current" {
      - client_certificate     = <<~EOT
            ...
        EOT -> null
      - client_key             = <<~EOT
            ...
        EOT -> null
      - cluster_ca_certificate = <<~EOT
            ...
        EOT -> null
      - endpoint               = "https://127.0.0.1:44033" -> null
      - id                     = "debug-kind-kustomize-" -> null
      - kubeconfig             = <<~EOT
            ...
        EOT -> null
      - kubeconfig_path        = "/home/pst/Code/pst/debugrecreateplan/kind-kustomize/debug-kind-kustomize-config" -> null
      - name                   = "debug-kind-kustomize" -> null
      - wait_for_ready         = false -> null

      - kind_config {
          - api_version               = "kind.x-k8s.io/v1alpha4" -> null
          - containerd_config_patches = [] -> null
          - kind                      = "Cluster" -> null

          - node {
              - kubeadm_config_patches = [] -> null
              - role                   = "control-plane" -> null
            }
          - node {
              - kubeadm_config_patches = [] -> null
              - role                   = "worker" -> null
            }
        }
    }

  # kustomization_resource.current["~G_v1_Namespace|~X|debug"] will be destroyed
  - resource "kustomization_resource" "current" {
      - id       = "094e469a-08f9-47e4-a9f3-a39ae8268a89" -> null
      - manifest = jsonencode(
            {
              - apiVersion = "v1"
              - kind       = "Namespace"
              - metadata   = {
                  - creationTimestamp = null
                  - name              = "debug"
                }
              - spec       = {}
              - status     = {}
            }
        ) -> null
    }

Plan: 0 to add, 0 to change, 2 to destroy.

------------------------------------------------------------------------

Destroy & recreate plan

Triggered by changing node_count in main.tf. Does not include the K8s namespace.

[pst@pst-ryzen5 kind-kustomize]$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

kind_cluster.current: Refreshing state... [id=debug-kind-kustomize-]
data.kustomization.current: Refreshing state... [id=5ffdb4bad7b4e2b4bd9a26a69a96e21e37a92301ca7108f731dc120dd806d5a2ec22feaaf104d9ad23dca0be7b50aaf0d0587f26a19df5dcd053d4eef745b704]
kustomization_resource.current["~G_v1_Namespace|~X|debug"]: Refreshing state... [id=094e469a-08f9-47e4-a9f3-a39ae8268a89]

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # kind_cluster.current must be replaced
-/+ resource "kind_cluster" "current" {
      ~ client_certificate     = <<~EOT
            ...
        EOT -> (known after apply)
      ~ client_key             = <<~EOT
            ...
        EOT -> (known after apply)
      ~ cluster_ca_certificate = <<~EOT
            ...
        EOT -> (known after apply)
      ~ endpoint               = "https://127.0.0.1:44033" -> (known after apply)
      ~ id                     = "debug-kind-kustomize-" -> (known after apply)
      ~ kubeconfig             = <<~EOT
            ...
        EOT -> (known after apply)
      ~ kubeconfig_path        = "/home/pst/Code/pst/debugrecreateplan/kind-kustomize/debug-kind-kustomize-config" -> (known after apply)
        name                   = "debug-kind-kustomize"
      + node_image             = (known after apply)
        wait_for_ready         = false

      ~ kind_config {
            api_version               = "kind.x-k8s.io/v1alpha4"
          - containerd_config_patches = [] -> null
            kind                      = "Cluster"

          ~ node { # forces replacement
              - kubeadm_config_patches = [] -> null
                role                   = "control-plane"
            }
          ~ node { # forces replacement
              - kubeadm_config_patches = [] -> null
                role                   = "worker"
            }
          + node { # forces replacement
              + role = "worker" # forces replacement
            }
        }
    }

Plan: 1 to add, 0 to change, 1 to destroy.

------------------------------------------------------------------------

pst (Member) commented Jan 26, 2021

Likely related upstream issue: hashicorp/terraform#22572

Spazzy757 (Contributor, Author) commented:

This seems to have been fixed; the last couple of times I've used it, it worked. I'll do a test to make sure.

pst (Member) commented May 17, 2021

This is definitely still an issue, and it's not Kubestack specific but a general issue with Terraform.

I hope that moving away from the in-module manifests and towards the new native modules will make the issue less frequent. But even then, e.g. the auth configmap for EKS in the module may still cause this.

The only real workaround is using --target to apply the changes to the cluster individually, which is a bummer because it breaks automation. However, recreating the cluster is a disruptive change and should be rare for most teams.

Spazzy757 (Contributor, Author) commented:

Ah right. I recreated a GKE cluster that had a couple of manifests and it didn't break; I was hoping that meant it was fixed.
