
Accessing Kubernetes resources after cluster creation fails with "no such host" #500

Closed
jgillich opened this issue Jun 14, 2020 · 22 comments · Fixed by #504
Labels
k8s Kubernetes Kapsule issues, bugs and feature requests

Comments

@jgillich

Creating a fresh cluster and then accessing it via the Kubernetes provider fails with a "no such host" error. It seems like there needs to be some check in place to make sure the cluster is fully created and its API is accessible.

Example:

resource "scaleway_k8s_cluster_beta" "example" {
  name             = var.kubernetes_cluster_name
  version          = "1.18.3"
  cni              = "cilium"
  enable_dashboard = true
}

resource "scaleway_k8s_pool_beta" "default" {
  cluster_id = scaleway_k8s_cluster_beta.example.id
  name       = "default"
  node_type  = "DEV1-M"
  size       = 1
}

provider "kubernetes" {
  load_config_file = "false"
  host             = scaleway_k8s_cluster_beta.example.kubeconfig[0].host
  token            = scaleway_k8s_cluster_beta.example.kubeconfig[0].token
  cluster_ca_certificate = base64decode(
    scaleway_k8s_cluster_beta.example.kubeconfig[0].cluster_ca_certificate
  )
  version = "~> 1.11"
}

resource "kubernetes_namespace" "cert_manager" {
  metadata {
    name = "cert-manager"
  }
}

Result:

Terraform v0.12.26
Initializing plugins and modules...
2020/06/14 03:38:12 [DEBUG] Using modified User-Agent: Terraform/0.12.26 TFC/ad37d0d407
scaleway_k8s_cluster_beta.example: Creating...
scaleway_k8s_cluster_beta.example: Creation complete after 7s [id=fr-par/a1648e5d-3b7c-4845-a7a5-42d40874ede7]
scaleway_k8s_pool_beta.default: Creating...
kubernetes_service.example: Creating...
kubernetes_namespace.cert_manager: Creating...
kubernetes_secret.gitlab: Creating...
kubernetes_ingress.example: Creating...
kubernetes_namespace.example: Creating...
kubernetes_namespace.kong: Creating...
kubernetes_namespace.metallb: Creating...
kubernetes_namespace.wave: Creating...
kubernetes_deployment.example: Creating...
helm_release.cert_manager: Creating...
helm_release.wave: Creating...
scaleway_k8s_pool_beta.default: Still creating... [10s elapsed]
scaleway_k8s_pool_beta.default: Still creating... [20s elapsed]
scaleway_k8s_pool_beta.default: Still creating... [30s elapsed]
scaleway_k8s_pool_beta.default: Still creating... [40s elapsed]
scaleway_k8s_pool_beta.default: Still creating... [50s elapsed]
scaleway_k8s_pool_beta.default: Creation complete after 56s [id=fr-par/f061fbbb-b29e-41cf-8d80-9bf45520d881]
kubernetes_config_map.metallb: Creating...

Error: Post "https://a1648e5d-3b7c-4845-a7a5-42d40874ede7.api.k8s.fr-par.scw.cloud:6443/api/v1/namespaces": dial tcp: lookup a1648e5d-3b7c-4845-a7a5-42d40874ede7.api.k8s.fr-par.scw.cloud on 127.0.0.53:53: no such host

  on cert-manager.tf line 1, in resource "kubernetes_namespace" "cert_manager":
   1: resource "kubernetes_namespace" "cert_manager" {

Terraform Version

Terraform v0.12.26

  • provider.kubernetes v1.11.3
  • provider.scaleway v1.15.0
@Sh4d1
Contributor

Sh4d1 commented Jun 14, 2020

Hey 👋

Looks like it's DNS propagation 😅 Just to confirm, could you try again with ns0.online.net as your DNS server?

@jgillich
Author

jgillich commented Jun 14, 2020

Yup, you're right. I'm using Terraform Cloud so I couldn't easily change the nameserver, but this works:

@jgillich
Author

Errrm, scratch that, it actually doesn't. I just didn't wait for it to time out, my bad.

@jgillich
Author

jgillich commented Jun 14, 2020

Ok so I tried a bunch of things and I don't think I can use that DNS server because it doesn't even resolve the Scaleway API:

$ nslookup api.scaleway.com 195.154.228.249
Server:		195.154.228.249
Address:	195.154.228.249#53

** server can't find api.scaleway.com: NXDOMAIN

Apparently fallback to the secondary DNS server doesn't work either. 🤷‍♂️

@Sh4d1
Contributor

Sh4d1 commented Jun 14, 2020

Huh, that's really weird 🤔 What about 1.1.1.1?

@jgillich
Author

Nope:

Error: Post "https://f43c8624-f2de-475b-a916-f02192155e9a.api.k8s.fr-par.scw.cloud:6443/api/v1/namespaces": dial tcp: lookup f43c8624-f2de-475b-a916-f02192155e9a.api.k8s.fr-par.scw.cloud on 1.1.1.1:53: no such host

Same with Google DNS. Does this work for you?

@Sh4d1
Contributor

Sh4d1 commented Jun 14, 2020

Now it's deleted so it doesn't work 😅

@jgillich
Author

Oh 😄 I didn't mean that URL literally, it does start working shortly after creation (DNS propagation I guess). I meant the example from above, with Cloudflare or Google DNS, when Terraform instantly tries to access the API.

@Sh4d1
Contributor

Sh4d1 commented Jun 14, 2020

Huh, I just created a cluster and I can resolve the URL on 1.1.1.1 even before the cluster is ready 🤔

@jgillich
Author

Interesting. When I create a cluster in the web UI and copy the domain without even waiting for the progress indicator to finish, it works:

$ nslookup ac7b202b-cc21-4810-8ee1-4c97462d35fc.api.k8s.fr-par.scw.cloud
Server:		1.1.1.1
Address:	1.1.1.1#53

Non-authoritative answer:
Name:	ac7b202b-cc21-4810-8ee1-4c97462d35fc.api.k8s.fr-par.scw.cloud
Address: 51.159.75.225

But when the cluster is created via terraform / the API and I copy the domain from the very same web UI:

$ nslookup c591f269-8f5b-4606-9e70-906a5b70b2f2.api.k8s.fr-par.scw.cloud
Server:		1.1.1.1
Address:	1.1.1.1#53

** server can't find c591f269-8f5b-4606-9e70-906a5b70b2f2.api.k8s.fr-par.scw.cloud: NXDOMAIN

😕

@Sh4d1
Contributor

Sh4d1 commented Jun 14, 2020

I think I see the problem. Quick solution, add:

  depends_on = [
    scaleway_k8s_pool_beta.default
  ]

to the kubernetes_namespace resource.
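
Applied to the cert_manager namespace from your example, that would look roughly like this (a sketch reusing the resources defined above):

resource "kubernetes_namespace" "cert_manager" {
  metadata {
    name = "cert-manager"
  }

  # wait for the first pool, so the cluster leaves pool_required and the
  # API endpoint's DNS record actually exists before this is created
  depends_on = [
    scaleway_k8s_pool_beta.default
  ]
}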

It's because the cluster is first created in the pool_required state, since there is no pool yet. At that point you can already download the kubeconfig, but the endpoint's IP doesn't exist while the cluster is in pool_required. To the kubernetes provider, though, the cluster already looks ready (it actually becomes ready once the pool is created).

I'll have to think about how to manage this case 🤔

@jgillich
Author

Ah, that makes sense. TF doesn't enforce any specific order of execution, but the web UI instantly creates the pool. Adding a depends_on to every single resource is a pretty poor workaround though (sadly it can't be set on the kubernetes provider).

The solution may be to turn scaleway_k8s_pool_beta into a no-op and make it an input of scaleway_k8s_cluster_beta?

resource "scaleway_k8s_pool_beta" "foo" {
  name       = "foo"
}

resource "scaleway_k8s_pool_beta" "bar" {
  name       = "bar"
}

resource "scaleway_k8s_cluster_beta" "example" {
  name = "example"
  pools = [
    scaleway_k8s_pool_beta.foo,
    scaleway_k8s_pool_beta.bar
  ]
}

Like that? Because with how it's currently designed, TF will simply not wait for the pool to be created.

@Sh4d1
Contributor

Sh4d1 commented Jun 14, 2020

We need a solution, that's for sure, the workaround is ugly 😅 though I'm not fond of passing the pools into the cluster 🤔 I'll discuss it with the team responsible for the provider next week!

@Sh4d1
Contributor

Sh4d1 commented Jun 14, 2020

A simpler workaround may be to split the cluster logic and the k8s logic

@jgillich
Author

jgillich commented Jun 14, 2020

Sounds good!

I found a much better workaround: I can just reference the pool on the Kubernetes provider and it behaves similarly to depends_on:

provider "kubernetes" {
  load_config_file = false
  host             = "${scaleway_k8s_pool_beta.default.cluster_id  == "" ? "" : ""}${scaleway_k8s_cluster_beta.example.kubeconfig[0].host}" # ugly hack
  token            = scaleway_k8s_cluster_beta.example.kubeconfig[0].token
  cluster_ca_certificate = base64decode(
    scaleway_k8s_cluster_beta.example.kubeconfig[0].cluster_ca_certificate
  )
  version = "~> 1.11"
}

@Sh4d1
Contributor

Sh4d1 commented Jun 14, 2020

It's a bit less messy but still hacky 😅
Having hashicorp/terraform#2430 could be nice though

@jerome-quere
Contributor

jerome-quere commented Jun 17, 2020

Another solution, not much cleaner, could be:

resource "scaleway_k8s_cluster_beta" "example" {}
resource "scaleway_k8s_pool_beta" "default" {}

resource "null_resource" "kubeconfig" {
    depends_on = [scaleway_k8s_pool_beta.default]
    triggers = {
         kubeconfig = scaleway_k8s_cluster_beta.example.kubeconfig[0]
    }
}

provider "kubernetes" {
  load_config_file = "false"
  host             = null_resource.kubeconfig.triggers.host
  token            = null_resource.kubeconfig.triggers.token
  cluster_ca_certificate = base64decode(
     null_resource.kubeconfig.triggers.cluster_ca_certificate
  )
  version = "~> 1.11"
}

The null_resource can be used as an intermediate dependency, allowing the Kubernetes resources to wait on the pool. Note that triggers has to be a map of strings, which is why each kubeconfig field is stored as its own key.

@jgillich
Author

jgillich commented Jun 17, 2020

There also seems to be a similar issue with pool nodes:

  on kubernetes.tf line 25, in data "scaleway_instance_server" "pool_default_node_0":
  25:   name = scaleway_k8s_pool_beta.default.nodes[0].name
    |----------------
    | scaleway_k8s_pool_beta.default.nodes is empty list of object

The given key does not identify an element in this collection value.

This is just a data source that I use to pull the private IP:

data "scaleway_instance_server" "pool_default_node_0" {
  name = scaleway_k8s_pool_beta.default.nodes[0].name
}

Could it be that the pool takes a moment to launch its instances, so right after creation the API doesn't return any nodes?

@Sh4d1
Contributor

Sh4d1 commented Jun 17, 2020

Yep, that's why. You can use the https://www.terraform.io/docs/providers/scaleway/r/k8s_pool_beta.html#wait_for_pool_ready flag though :)
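
On the default pool from the example above, that would look roughly like this (a sketch; the flag comes from the linked docs):

resource "scaleway_k8s_pool_beta" "default" {
  cluster_id          = scaleway_k8s_cluster_beta.example.id
  name                = "default"
  node_type           = "DEV1-M"
  size                = 1
  wait_for_pool_ready = true # wait until the pool's nodes are provisioned and ready
}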

@jgillich
Author

Oh nice, thanks!

@jgillich
Author

Is this meant to be the final solution or just a temporary one?

@Sh4d1
Contributor

Sh4d1 commented Jun 23, 2020

@jgillich I think it'll be the final one. At least until some changes are done on TF side for providers 😢
Or if you have a solution (which keeps the cluster ID in the pool object) I'm all ears 😄

remyleone added the k8s Kubernetes Kapsule issues, bugs and feature requests label on Jul 28, 2022