
assets/manifests/coredns\cluster-role-binding.yaml fails to generate on Windows #571

Closed
vroad opened this issue Oct 16, 2019 · 9 comments

Comments


vroad commented Oct 16, 2019

Bug

UPDATE: It seems that the problem only occurs on Windows.

assets/manifests/coredns\cluster-role-binding.yaml is missing.

Only the last path separator is a backslash, so it looks like a bug in the template provider.

Environment

  • Platform: aws
  • OS: container-linux
  • Release: Typhoon version v1.15.3, v1.16.1
  • Terraform: v0.12.9
  • Plugins:
    • provider.aws v2.31.0
    • provider.ct v0.4.0
    • provider.local v1.4.0
    • provider.null v2.1.2
    • provider.template v2.1.2
    • provider.tls v2.1.1

Terraform is running on a Windows machine.

Problem

Error: open assets/manifests/coredns\cluster-role-binding.yaml: The system cannot find the path specified.

  on .terraform\modules\tempest.bootkube\assets.tf line 17, in resource "template_dir" "manifests":
  17: resource "template_dir" "manifests" {

The manifests directory is created, but its contents are empty.

$ find ./assets/
./assets/
./assets/auth
./assets/auth/kubeconfig
./assets/auth/kubeconfig-kubelet
./assets/auth/tempest-config
./assets/bootstrap-manifests
./assets/bootstrap-manifests/bootstrap-apiserver.yaml
./assets/bootstrap-manifests/bootstrap-controller-manager.yaml
./assets/bootstrap-manifests/bootstrap-scheduler.yaml
./assets/manifests
./assets/manifests-networking
./assets/manifests-networking/bgpconfigurations-crd.yaml
./assets/manifests-networking/bgppeers-crd.yaml
./assets/manifests-networking/blockaffinities-crd.yaml
./assets/manifests-networking/cluster-role-binding.yaml
./assets/manifests-networking/cluster-role.yaml
./assets/manifests-networking/clusterinformations-crd.yaml
./assets/manifests-networking/config.yaml
./assets/manifests-networking/daemonset.yaml
./assets/manifests-networking/default-ipv4-ippool.yaml
./assets/manifests-networking/felixconfigurations-crd.yaml
./assets/manifests-networking/globalnetworkpolicies-crd.yaml
./assets/manifests-networking/globalnetworksets-crd.yaml
./assets/manifests-networking/hostendpoints-crd.yaml
./assets/manifests-networking/ipamblocks.crd.yaml
./assets/manifests-networking/ipamconfigs-crd.yaml
./assets/manifests-networking/ipamhandles-crd.yaml
./assets/manifests-networking/ippools-crd.yaml
./assets/manifests-networking/networkpolicies-crd.yaml
./assets/manifests-networking/networksets-crd.yaml
./assets/manifests-networking/service-account.yaml
./assets/tls
./assets/tls/admin.crt
./assets/tls/admin.key
./assets/tls/apiserver.crt
./assets/tls/apiserver.key
./assets/tls/ca.crt
./assets/tls/ca.key
./assets/tls/etcd
./assets/tls/etcd/peer-ca.crt
./assets/tls/etcd/peer.crt
./assets/tls/etcd/peer.key
./assets/tls/etcd/server-ca.crt
./assets/tls/etcd/server.crt
./assets/tls/etcd/server.key
./assets/tls/etcd-ca.crt
./assets/tls/etcd-ca.key
./assets/tls/etcd-client-ca.crt
./assets/tls/etcd-client.crt
./assets/tls/etcd-client.key
./assets/tls/kubelet.crt
./assets/tls/kubelet.key
./assets/tls/service-account.key
./assets/tls/service-account.pub

Desired Behavior

Cluster starts successfully

Steps to Reproduce

module "tempest" {
  source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes?ref=v1.15.3"

  # AWS
  cluster_name = "tempest"
  dns_zone     = "staging.example.com"
  dns_zone_id  = "<my_zone_id>"

  # configuration
  ssh_authorized_key = "ssh-rsa ..."
  asset_dir          = "${path.module}/assets/"

  # optional
  worker_count = 1
  worker_type  = "t2.micro"

  controller_type = "t3.small"
}
@vroad vroad changed the title Cluster does not start on v1.15.3, v1.16.1: assets/manifests/coredns\cluster-role-binding.yaml is missing assets/manifests/coredns\cluster-role-binding.yaml is missing on Windows Oct 16, 2019
@vroad vroad changed the title assets/manifests/coredns\cluster-role-binding.yaml is missing on Windows assets/manifests/coredns\cluster-role-binding.yaml fails to generate on Windows Oct 16, 2019
dghubble (Member) commented:

> looks like a bug of the template provider. It seems that the problem only occurs on Windows.

Looks like you've already identified the suspect, template_dir. Try to isolate a standalone template_dir example with a nested directory and report the path separator issue upstream to terraform-provider-template.


vroad commented Oct 16, 2019

The problem also occurs when I run terraform-render-bootstrap directly, so I attached it as an example.

dghubble (Member) commented:

Great, thanks for reporting. I'd say track that issue to await a fix in the template provider for Windows.

> The problem also occurs when I run terraform-render-bootstrap

Yes, as that's where template_dir is used. You can likely produce a much smaller reproducible example:

resource "template_dir" "manifests" {
  source_dir      = "input"
  destination_dir = "output"
  vars = {}
}
$ tree input
input
├── manifests
│   └── cluster-role-binding.yaml
└── somefile.yaml

I don't make use of Windows systems, so perhaps you can check if that demonstrates the problem.


vroad commented Nov 3, 2019

I've updated the example with a minimal one. Unfortunately the issue is not getting attention from the maintainers.

At the same time, I've found a workaround for the issue while trying to customize terraform-render-bootstrap to put the generated files on Amazon S3.
You can create a set of files from templates on S3 using for_each.

resource "aws_s3_bucket_object" "manifests" {
  for_each        = fileset("${path.module}/resources/manifests", "**/*.yaml")

  bucket = aws_s3_bucket.assets.id
  key    = "/manifests/${each.value}"
  content = templatefile("${path.module}/resources/manifests/${each.value}", {
    hyperkube_image        = var.container_images["hyperkube"]
    coredns_image          = var.container_images["coredns"]
    control_plane_replicas = max(2, length(var.etcd_servers))
    pod_cidr               = var.pod_cidr
    cluster_domain_suffix  = var.cluster_domain_suffix
    cluster_dns_service_ip = cidrhost(var.service_cidr, 10)
    trusted_certs_dir      = var.trusted_certs_dir
    server                 = format("https://%s:%s", var.api_servers[0], var.external_apiserver_port)
  })
}
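
The example above assumes an assets bucket defined elsewhere; a minimal sketch, with an illustrative name (not part of terraform-render-bootstrap):

resource "aws_s3_bucket" "assets" {
  # illustrative bucket name, chosen only for this example
  bucket = "tempest-assets"
  acl    = "private"
}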

I've tried again with the local_file resource, and it handles subdirectories without problems.

resource "local_file" "manifests" {
  for_each = fileset("${path.module}/resources/manifests", "**/*.yaml")

  filename = "${var.asset_dir}/manifests/${each.value}"
  content  = templatefile("${path.module}/resources/manifests/${each.value}", {
    hyperkube_image        = var.container_images["hyperkube"]
    coredns_image          = var.container_images["coredns"]
    control_plane_replicas = max(2, length(var.etcd_servers))
    pod_cidr               = var.pod_cidr
    cluster_domain_suffix  = var.cluster_domain_suffix
    cluster_dns_service_ip = cidrhost(var.service_cidr, 10)
    trusted_certs_dir      = var.trusted_certs_dir
    server                 = format("https://%s:%s", var.api_servers[0], var.external_apiserver_port)
  })
}


vroad commented Nov 3, 2019

But if we switch to this method, what should the output value and content hash be? Should we concatenate the generated content in memory and compute a SHA-1 hash?
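
One possibility, as a rough sketch (the output name is illustrative, not the module's actual output): concatenate the rendered manifest contents in a stable key order and hash the result.

output "manifests_content_hash" {
  # concatenate rendered manifests in sorted key order so the hash is
  # stable across runs, then take a SHA-1 of the combined string
  value = sha1(join("", [
    for name in sort(keys(local_file.manifests)) :
    local_file.manifests[name].content
  ]))
}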

I tried putting the generated YAML files on S3 because the current approach, which uses local_file and template_dir, has some issues:

  • When working with multiple developers or in a CI environment (using a remote state store like S3), files get generated on each machine, creating a lot of noise in the plan result. In a CI environment this happens every time the plan command is invoked.
  • I just don't want to put sensitive information like certificates on my local machine. They should be downloadable only to developers/machines with the right IAM permissions. I'd rather let master nodes download certificates when they launch.


vroad commented Nov 9, 2019

@dghubble
I created a PR for generating bootstrap files without using template_dir:
poseidon/terraform-render-bootstrap#157

This solved the templating problem on Windows. I've faced another problem though: I couldn't connect to the AWS EC2 instances with Pageant, since Terraform won't use SSH keys stored in Pageant.

I modified Typhoon and provided the content of the SSH private key to make the provisioner work without an agent.
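
A minimal sketch of that kind of change (the resource and variable names here are hypothetical, not Typhoon's actual ones): pass the key material to the provisioner connection instead of relying on an SSH agent.

resource "null_resource" "provision-controller" {
  connection {
    type        = "ssh"
    host        = var.controller_ip                 # hypothetical variable
    user        = "core"
    private_key = file(var.ssh_private_key_path)    # hypothetical variable
    timeout     = "15m"
  }

  provisioner "remote-exec" {
    inline = ["echo connected"]
  }
}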


dghubble commented Nov 9, 2019

For now, the recommendation is still to await a fix from the upstream template provider or to use a non-Windows system. Moving to templatefile may be on the horizon, but for other reasons, not because it's better or worse on Windows. It affects plugin dependencies in downstream modules, so if I go that route it would be part of broader changes and testing, and likely not in a point release.

As for the other questions, please keep focused on the issue. I'll briefly comment on two items to close them:

  • Rendering to aws_s3_bucket_object resources in terraform-render-bootstrap is not a goal (and is tangential to your rendering issue). The bootstrap module is pulled in as a dependency by multiple cluster modules (GCP, Azure, DO, bare-metal), and it's not appropriate to take on a cloud provider dependency. AWS occupies no special prominence. Please drop references to this being a goal/direction.
  • Terraform SSH agent support is used intentionally: it prevents your private key from being present in state files (indeed, in security-conscious setups with YubiKeys it's not even possible to provide a private key, as it doesn't reside on disk). Probably revisit any security shortcuts you've taken, privately.


vroad commented Nov 10, 2019

I created another issue to talk about how we should generate manifests/certs: #584

Regarding the SSH issue, I understand that an agent is better for security, but I just couldn't make it work on Windows. The only environment where I could make it work is WSL Ubuntu, which has ssh-agent. To make ssh-agent available to Terraform running on Docker for Windows, I guess some special configuration is required.

dghubble (Member) commented:

Closed by #587
