panic: Terraform resource names cannot start with a digit. This is a bug in Kops, please report this in a GitHub Issue. Name: 1.etcd-events.k8s-cs.domain.net #9982

Closed
dimitrez opened this issue Sep 24, 2020 · 14 comments · Fixed by #10424
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@dimitrez

dimitrez commented Sep 24, 2020

Version 1.18.1 (git-453d7d96be)

AWS

kops create cluster \
--cloud=aws \
--zones=us-west-2a \
--name=k8s-cs.domain.net \
--state=s3://domaininfrastructurestate/kops/cs/ \
--dns-zone=k8s-cs.domain.net \
--node-size=t3a.small \
--node-tenancy=default \
--node-volume-size=50 \
--node-count=3 \
--master-size=t3a.small \
--master-volume-size=50 \
--master-zones=us-west-2a \
--master-count=3 \
--out=. \
--target=terraform \
--yes \
--image=ami-028e52edfeb33adb2
W0924 16:33:48.893149   30386 create_cluster.go:771] Running with masters in the same AZs; redundancy will be reduced
I0924 16:33:50.377267   30386 subnets.go:184] Assigned CIDR 172.20.32.0/19 to subnet us-west-2a
I0924 16:33:54.964514   30386 create_cluster.go:1537] Using SSH public key: /home/demi4/.ssh/id_rsa.pub
I0924 16:34:08.385316   30386 executor.go:103] Tasks: 0 done / 95 total; 47 can run
I0924 16:34:08.397555   30386 dnszone.go:242] Check for existing route53 zone to re-use with name "k8s-cs.domain.net"
I0924 16:34:09.249425   30386 dnszone.go:249] Existing zone "k8s-cs.domain.net." found; will configure TF to reuse
I0924 16:34:10.792010   30386 vfs_castore.go:590] Issuing new certificate: "apiserver-aggregator-ca"
I0924 16:34:10.831807   30386 vfs_castore.go:590] Issuing new certificate: "etcd-clients-ca"
I0924 16:34:10.900833   30386 vfs_castore.go:590] Issuing new certificate: "ca"
I0924 16:34:10.919639   30386 vfs_castore.go:590] Issuing new certificate: "etcd-manager-ca-main"
I0924 16:34:11.035820   30386 vfs_castore.go:590] Issuing new certificate: "etcd-peers-ca-events"
I0924 16:34:11.106013   30386 vfs_castore.go:590] Issuing new certificate: "etcd-peers-ca-main"
I0924 16:34:11.226926   30386 vfs_castore.go:590] Issuing new certificate: "etcd-manager-ca-events"
I0924 16:34:16.248941   30386 executor.go:103] Tasks: 47 done / 95 total; 26 can run
I0924 16:34:18.305484   30386 vfs_castore.go:590] Issuing new certificate: "master"
I0924 16:34:18.365115   30386 vfs_castore.go:590] Issuing new certificate: "apiserver-aggregator"
I0924 16:34:18.463993   30386 vfs_castore.go:590] Issuing new certificate: "kops"
I0924 16:34:18.493709   30386 vfs_castore.go:590] Issuing new certificate: "kubelet"
I0924 16:34:18.506919   30386 vfs_castore.go:590] Issuing new certificate: "kube-controller-manager"
I0924 16:34:18.517012   30386 vfs_castore.go:590] Issuing new certificate: "kube-scheduler"
I0924 16:34:18.523661   30386 vfs_castore.go:590] Issuing new certificate: "kubecfg"
I0924 16:34:18.533129   30386 vfs_castore.go:590] Issuing new certificate: "kubelet-api"
I0924 16:34:18.602225   30386 vfs_castore.go:590] Issuing new certificate: "apiserver-proxy-client"
I0924 16:34:18.682028   30386 vfs_castore.go:590] Issuing new certificate: "kube-proxy"
I0924 16:34:22.575969   30386 executor.go:103] Tasks: 73 done / 95 total; 18 can run
I0924 16:34:24.249151   30386 executor.go:103] Tasks: 91 done / 95 total; 4 can run
I0924 16:34:24.249770   30386 executor.go:103] Tasks: 95 done / 95 total; 0 can run
panic: Terraform resource names cannot start with a digit. This is a bug in Kops, please report this in a GitHub Issue. Name: 1.etcd-events.k8s-cs.domain.net

goroutine 1 [running]:
k8s.io/kops/upup/pkg/fi/cloudup/terraform.tfSanitize(0xc000bf2570, 0x23, 0x3d027e5, 0xe)
        /go/src/k8s.io/kops/upup/pkg/fi/cloudup/terraform/target.go:104 +0x302
k8s.io/kops/upup/pkg/fi/cloudup/terraform.(*TerraformTarget).finish012(0xc00056e500, 0xc000a077d0, 0x0, 0x34d2ae0)
        /go/src/k8s.io/kops/upup/pkg/fi/cloudup/terraform/target_0_12.go:61 +0x6b5
k8s.io/kops/upup/pkg/fi/cloudup/terraform.(*TerraformTarget).Finish(0xc00056e500, 0xc000a077d0, 0xa, 0xc000625700)
        /go/src/k8s.io/kops/upup/pkg/fi/cloudup/terraform/target.go:200 +0x52f
k8s.io/kops/upup/pkg/fi/cloudup.(*ApplyClusterCmd).Run(0xc000868000, 0x43b59a0, 0xc000052108, 0x0, 0x0)
        /go/src/k8s.io/kops/upup/pkg/fi/cloudup/apply_cluster.go:938 +0x26c1
main.RunUpdateCluster(0x43b59a0, 0xc000052108, 0xc0003e1b60, 0x7ffca7e78ee0, 0x15, 0x4356d20, 0xc00000e018, 0xc000a41e60, 0x0, 0x0, ...)
        /go/src/k8s.io/kops/cmd/kops/update_cluster.go:274 +0x9ba
main.RunCreateCluster(0x43b59a0, 0xc000052108, 0xc0003e1b60, 0x4356d20, 0xc00000e018, 0xc00036fc00, 0xc0000e3800, 0xc000845d20)
        /go/src/k8s.io/kops/cmd/kops/create_cluster.go:1357 +0x3720
main.NewCmdCreateCluster.func1(0xc00088b400, 0xc0001fbc20, 0x0, 0x11)
        /go/src/k8s.io/kops/cmd/kops/create_cluster.go:274 +0x188
github.com/spf13/cobra.(*Command).execute(0xc00088b400, 0xc0001fbb00, 0x11, 0x12, 0xc00088b400, 0xc0001fbb00)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830 +0x2aa
github.com/spf13/cobra.(*Command).ExecuteC(0x61e3f80, 0x6220128, 0x0, 0x0)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914 +0x2fb
github.com/spf13/cobra.(*Command).Execute(...)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
main.Execute()
        /go/src/k8s.io/kops/cmd/kops/root.go:96 +0x8f
main.main()
        /go/src/k8s.io/kops/cmd/kops/main.go:25 +0x20

What did you expect to happen?

Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

kind: Cluster
metadata:
  creationTimestamp: "2020-09-24T13:33:51Z"
  name: k8s-cs.domain.net
spec:
  api:
    dns: {}
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://domaininfrastructurestate/kops/cs/k8s-cs.domain.net
  containerRuntime: docker
  dnsZone: k8s-cs.domain.net
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: master-us-west-2a-1
      name: "1"
    - instanceGroup: master-us-west-2a-2
      name: "2"
    - instanceGroup: master-us-west-2a-3
      name: "3"
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: master-us-west-2a-1
      name: "1"
    - instanceGroup: master-us-west-2a-2
      name: "2"
    - instanceGroup: master-us-west-2a-3
      name: "3"
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.18.8
  masterPublicName: api.k8s-cs.domain.net
  networkCIDR: 172.20.0.0/16
  networking:
    kubenet: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.20.32.0/19
    name: us-west-2a
    type: Public
    zone: us-west-2a
  topology:
    dns:
      type: Public
    masters: public
    nodes: public

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-09-24T13:33:52Z"
  labels:
    kops.k8s.io/cluster: k8s-cs.domain.net
  name: master-us-west-2a-1
spec:
  image: ami-028e52edfeb33adb2
  machineType: t3a.small
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-west-2a-1
  role: Master
  rootVolumeSize: 50
  subnets:
  - us-west-2a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-09-24T13:33:52Z"
  labels:
    kops.k8s.io/cluster: k8s-cs.domain.net
  name: master-us-west-2a-2
spec:
  image: ami-028e52edfeb33adb2
  machineType: t3a.small
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-west-2a-2
  role: Master
  rootVolumeSize: 50
  subnets:
  - us-west-2a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-09-24T13:33:53Z"
  labels:
    kops.k8s.io/cluster: k8s-cs.domain.net
  name: master-us-west-2a-3
spec:
  image: ami-028e52edfeb33adb2
  machineType: t3a.small
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-west-2a-3
  role: Master
  rootVolumeSize: 50
  subnets:
  - us-west-2a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-09-24T13:33:53Z"
  labels:
    kops.k8s.io/cluster: k8s-cs.domain.net
  name: nodes
spec:
  image: ami-028e52edfeb33adb2
  machineType: t3a.small
  maxSize: 3
  minSize: 3
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  rootVolumeSize: 50
  subnets:
  - us-west-2a
  tenancy: default
@rifelpet
Member

Thanks for reporting this! This is definitely a bug in Kops. The problem is in the generated Terraform code for the EBS volumes used by the etcd cluster: the Terraform resource names for the volumes begin with the etcdMember name from the ClusterSpec.

Terraform 0.12 no longer allows resource names to begin with digits, but in your case you have etcdMember names of 1, 2, and 3.

Kops could handle this in one of two ways:

  • if the etcdMember name starts with a digit, prefix the Terraform resource name with something like vol
  • disallow etcdMember names that start with a digit. This gets messy because it would be a breaking change, and enforcing it only for Terraform users isn't possible (since the ClusterSpec is defined with a different kops command than the Terraform generation)
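
The first option can be sketched as a small Go helper. This is a hypothetical illustration (the `sanitizeTFName` function and the `vol-` prefix are assumptions for this sketch), not the actual `tfSanitize` in `upup/pkg/fi/cloudup/terraform/target.go`:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// sanitizeTFName maps a kops object name to a Terraform-safe resource name.
// Terraform 0.12 identifiers may contain letters, digits, underscores and
// dashes, but must not begin with a digit, so names like
// "1.etcd-events.k8s-cs.domain.net" need a prefix.
func sanitizeTFName(name string) string {
	// Replace any character Terraform disallows with a dash.
	safe := strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_' || r == '-' {
			return r
		}
		return '-'
	}, name)
	// If the result still starts with a digit, prefix it (option 1 above).
	if len(safe) > 0 && unicode.IsDigit(rune(safe[0])) {
		safe = "vol-" + safe
	}
	return safe
}

func main() {
	fmt.Println(sanitizeTFName("1.etcd-events.k8s-cs.domain.net"))
	fmt.Println(sanitizeTFName("a.etcd-main.k8s-cs.domain.net"))
}
```

With a sanitizer like this, a digit-leading member name yields a valid resource address such as `aws_ebs_volume.vol-1-etcd-events-…`, while lettered names pass through unchanged.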

We could also print a warning during generation that the generated Terraform code will fail and that the user needs to change the etcdMember names.

As a side note, I don't know exactly what is involved in changing the member names or how seamless that is. If it's straightforward and without downtime, maybe that's what we suggest to Terraform users.

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Sep 25, 2020
@nfillot

nfillot commented Oct 7, 2020

As a side note, I dont know what exactly is involved with changing the member names and how seamless that is. If its straight forward and without downtime, maybe thats what we suggest to terraform users.

I had the same issue as @dimitrez, with etcd member names starting with digits, and I tried renaming them.
"Straightforward" is not the term I would use: you get etcd certificate name mismatches during the rollout, for example.

I ended up purging the master volumes and restoring from an etcd backup, but we're testing whether it could work by scaling up (adding new members with correct naming) and then scaling down the old masters.

@granular-ryanbonham
Contributor

We also ran into this. Renaming the Terraform resource would be less of a risk than trying to rename etcd members. I think the suggestion of "if the etcdMember name starts with a digit, prefix the Terraform resource name with something like vol" is a good solution, and it is in line with other 1.18 changes for Terraform 0.12.

@jtbonhomme

Is there a way to bypass this issue if I create a new cluster?

@nfillot

nfillot commented Dec 3, 2020

@jtbonhomme if you specify etcd member names that start with a letter (or a letter prefix) in a brand-new cluster, it will work.

Like this:

    etcdMembers:
    - instanceGroup: master-eu-west-3a-1
      name: "a1"
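
For context, a brand-new cluster spec could use lettered member names throughout. This is a hypothetical sketch (the a1/a2/a3 names and the eu-west-3a instance groups are just one possible choice):

```yaml
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-eu-west-3a-1
      name: "a1"
    - instanceGroup: master-eu-west-3a-2
      name: "a2"
    - instanceGroup: master-eu-west-3a-3
      name: "a3"
    name: main
  - etcdMembers:
    - instanceGroup: master-eu-west-3a-1
      name: "a1"
    - instanceGroup: master-eu-west-3a-2
      name: "a2"
    - instanceGroup: master-eu-west-3a-3
      name: "a3"
    name: events
```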

@hakman
Member

hakman commented Dec 3, 2020

@jtbonhomme do you modify the cluster config in any way after creating it?

@jtbonhomme

jtbonhomme commented Dec 3, 2020

@hakman no, I don't
@nfillot I am not sure I follow your point. I tried to export a cluster in YAML format (kops get --name sn-dev.k8s.local -o yaml > cluster.yaml), then I changed the etcdMembers names to prefix them with a letter (the letter of the AZ), but how do I generate a Terraform manifest from a cluster description file?

@jtbonhomme

jtbonhomme commented Dec 3, 2020

@nfillot I tried to create a brand-new cluster with a new name, then update it with the --target terraform flag.
It works, thank you!

@hakman
Member

hakman commented Dec 3, 2020

Thanks @jtbonhomme. I think I found the issue; it should be fixed in the next 1.18 release.
In case it helps: this will only work if you are creating the cluster in multiple AZs.

@jtbonhomme

OK @hakman, congrats on your investigation and on finding the root cause.
Thank you for your help!

@kuzaxak

kuzaxak commented Dec 28, 2020

What should we do with an existing cluster? Is it possible to upgrade?

@rifelpet
Member

A solution is proposed in #10424 (comment). To prevent any mistaken terraform apply without the necessary terraform state mv having been done first, the idea is to require an environment variable to be set for kops update cluster --target terraform whenever this issue would occur.

Any feedback on this solution would be appreciated

@kuzaxak

kuzaxak commented Dec 29, 2020

The problem is not with Terraform itself, but with etcd certificate naming.

The domain names for the etcd nodes would change from etcd-1 to etcd-a, and so on.
We already tried it and got a lot of certificate issues; adding members one by one didn't work either.

@rifelpet
Member

With the proposed solution, the etcd member names and their certificates won't change for existing clusters; only the Terraform resource names will. Upon upgrading to kops 1.19 you'll be required to terraform state mv the aws_ebs_volume resource(s) before terraform apply, and to set an env var for kops update cluster --target terraform to confirm you've done that.
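
As a sketch of that migration step, a state move might look like the following. The resource addresses here are hypothetical; take the real old and new names from your terraform plan output, and the exact environment variable name from the kops 1.19 release notes:

```shell
# Move each etcd EBS volume in the Terraform state to its new,
# digit-safe resource name before running terraform apply.
# The addresses below are illustrative only.
terraform state mv \
  'aws_ebs_volume.1-etcd-events-k8s-cs-domain-net' \
  'aws_ebs_volume.vol-1-etcd-events-k8s-cs-domain-net'
```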
