Segfault / SIGSEGV with kops 1.8.0-beta.1 when Concourse CI runs kops update cluster in dry-run mode (without --yes) #3943

Closed
stefancocora opened this issue Nov 28, 2017 · 5 comments

@stefancocora

  1. What kops version are you running? The command kops version will display
    this information.
kops version
Version 1.8.0-beta.1 (git-0a2f949fd)
  2. What Kubernetes version are you running? kubectl version will print the
    version if a cluster is running or provide the Kubernetes version specified as
    a kops flag.
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.4", GitCommit:"793658f2d7ca7f064d2bdf606519f9fe1229c381", GitTreeState:"clean", BuildDate:"2017-08-17T08:30:51Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  3. What cloud provider are you using?
AWS
  4. What commands did you run? What is the simplest way to reproduce this issue?
Concourse CI was running the equivalent of

kops --state ${KOPS_STATE} -v8 update cluster ${KOPS_CLUSTER_NAME}
  5. What happened after the commands executed?
    The command fails with a kops segfault for the dev environment, but the same code at the same git SHA completes successfully for the prod account.

  6. What did you expect to happen?
    Expected the dev update cluster command to complete successfully and show me what kops would change.

  7. Please provide your cluster manifest. Execute
    kops get --name my.example.com -oyaml to display your cluster manifest.
    You may want to remove your cluster name and other sensitive information.
    Cluster manifest:

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2017-08-21T22:10:26Z
  name: k8s.dev.abcdefg.example.com
spec:
  additionalPolicies:
    node: |
      [
        {
         "Effect": "Allow",
         "Action": [
           "route53:ChangeResourceRecordSets"
         ],
         "Resource": [
           "arn:aws:route53:::hostedzone/*"
         ]
        },
        {
         "Effect": "Allow",
         "Action": [
           "route53:ListHostedZones",
           "route53:ListResourceRecordSets"
         ],
         "Resource": [
           "*"
         ]
        },
        {
         "Effect": "Allow",
         "Action": [
           "sts:AssumeRole"
         ],
          "Resource": [ "arn:aws:iam::1234567890:role/cloudwatch" , "arn:aws:iam::1234567890:role/cloudwatch" , "arn:aws:iam::1234567890:role/cloudwatch"]
        }
      ]
  api:
    loadBalancer:
      type: Internal
  authorization:
    alwaysAllow: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://dev-abcdefg-k8s-state/k8s.dev.abcdefg.example.com
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-eu-west-1a-1
      name: a-1
    - instanceGroup: master-eu-west-1b-1
      name: b-1
    - instanceGroup: master-eu-west-1a-2
      name: a-2
    name: main
  - etcdMembers:
    - instanceGroup: master-eu-west-1a-1
      name: a-1
    - instanceGroup: master-eu-west-1b-1
      name: b-1
    - instanceGroup: master-eu-west-1a-2
      name: a-2
    name: events
  iam:
    legacy: true
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.7.4
  masterPublicName: api.k8s.dev.abcdefg.example.com
  networkCIDR: 10.201.0.0/16
  networkID: vpc-2eeabcdefg
  networking:
    weave:
      mtu: 8912
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  sshKeyName: transit-dev-k8s
  subnets:
  - egress: nat-0dc435c1cae25cdc6
    id: subnet-eecb5c98
    name: eu-west-1a
    type: Private
    zone: eu-west-1a
  - egress: nat-07ec3d8d11a1bed90
    id: subnet-6435913c
    name: eu-west-1b
    type: Private
    zone: eu-west-1b
  - id: subnet-eecb5c98
    name: utility-eu-west-1a
    type: Utility
    zone: eu-west-1a
  - id: subnet-6435913c
    name: utility-eu-west-1b
    type: Utility
    zone: eu-west-1b
  topology:
    bastion:
      bastionPublicName: bastion.k8s.dev.abcdefg.example.com
    dns:
      type: Public
    masters: private
    nodes: private

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-08-21T22:10:26Z
  labels:
    kops.k8s.io/cluster: k8s.dev.abcdefg.example.com
  name: master-eu-west-1a-1
spec:
  image: kope.io/k8s-1.6-debian-jessie-amd64-hvm-ebs-2017-05-02
  machineType: t2.small
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - eu-west-1a

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-08-21T22:10:26Z
  labels:
    kops.k8s.io/cluster: k8s.dev.abcdefg.example.com
  name: master-eu-west-1a-2
spec:
  image: kope.io/k8s-1.6-debian-jessie-amd64-hvm-ebs-2017-05-02
  machineType: t2.small
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - eu-west-1a

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-08-21T22:10:26Z
  labels:
    kops.k8s.io/cluster: k8s.dev.abcdefg.example.com
  name: master-eu-west-1b-1
spec:
  image: kope.io/k8s-1.6-debian-jessie-amd64-hvm-ebs-2017-05-02
  machineType: t2.small
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - eu-west-1b

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-08-21T22:10:26Z
  labels:
    kops.k8s.io/cluster: k8s.dev.abcdefg.example.com
  name: nodes
spec:
  image: kope.io/k8s-1.6-debian-jessie-amd64-hvm-ebs-2017-05-02
  machineType: t2.large
  maxSize: 3
  minSize: 3
  role: Node
  subnets:
  - eu-west-1a
  - eu-west-1b

  8. Please run the commands with most verbose logging by adding the -v 10 flag.
    https://gist.github.com/stefancocora/ab2cb02cdd045ec404a318f36816280e

Concourse CI logs:

...
I1128 13:18:20.029760     558 request_logger.go:45] AWS request: autoscaling/DescribeAutoScalingGroups
I1128 13:18:20.077294     558 executor.go:91] Tasks: 90 done / 90 total; 0 can run
I1128 13:18:20.077809     558 iam_builder.go:285] Ignoring location "s3://dev-abcdefg-k8s-state/k8s.dev.abcdefg.example.com/pki/" because found parent "s3://dev-abcdefg-k8s-state/k8s.dev.abcdefg.example.com/"
I1128 13:18:20.077828     558 iam_builder.go:285] Ignoring location "s3://dev-abcdefg-k8s-state/k8s.dev.abcdefg.example.com/secrets/" because found parent "s3://dev-abcdefg-k8s-state/k8s.dev.abcdefg.example.com/"
I1128 13:18:20.077840     558 iam_builder.go:290] Found root location "s3://dev-abcdefg-k8s-state/k8s.dev.abcdefg.example.com/"
I1128 13:18:20.103449     558 iam_builder.go:285] Ignoring location "s3://dev-abcdefg-k8s-state/k8s.dev.abcdefg.example.com/pki/" because found parent "s3://dev-abcdefg-k8s-state/k8s.dev.abcdefg.example.com/"
I1128 13:18:20.103477     558 iam_builder.go:285] Ignoring location "s3://dev-abcdefg-k8s-state/k8s.dev.abcdefg.example.com/secrets/" because found parent "s3://dev-abcdefg-k8s-state/k8s.dev.abcdefg.example.com/"
I1128 13:18:20.103493     558 iam_builder.go:290] Found root location "s3://dev-abcdefg-k8s-state/k8s.dev.abcdefg.example.com/"
I1128 13:18:20.121048     558 context.go:140] Performing HTTP request: GET https://kubeupv2.s3.amazonaws.com/kops/1.8.0-beta.1/images/protokube.tar.gz.sha1
I1128 13:18:20.210669     558 apply_cluster.go:884] Found hash "4f001a87cd410fa3d1c8cc1c4232a817fb30cde7" for "https://kubeupv2.s3.amazonaws.com/kops/1.8.0-beta.1/images/protokube.tar.gz"
I1128 13:18:20.300131     558 context.go:140] Performing HTTP request: GET https://kubeupv2.s3.amazonaws.com/kops/1.8.0-beta.1/images/protokube.tar.gz.sha1
I1128 13:18:20.388529     558 apply_cluster.go:884] Found hash "4f001a87cd410fa3d1c8cc1c4232a817fb30cde7" for "https://kubeupv2.s3.amazonaws.com/kops/1.8.0-beta.1/images/protokube.tar.gz"
I1128 13:18:20.450650     558 context.go:140] Performing HTTP request: GET https://kubeupv2.s3.amazonaws.com/kops/1.8.0-beta.1/images/protokube.tar.gz.sha1
I1128 13:18:20.565108     558 apply_cluster.go:884] Found hash "4f001a87cd410fa3d1c8cc1c4232a817fb30cde7" for "https://kubeupv2.s3.amazonaws.com/kops/1.8.0-beta.1/images/protokube.tar.gz"
I1128 13:18:20.668591     558 context.go:140] Performing HTTP request: GET https://kubeupv2.s3.amazonaws.com/kops/1.8.0-beta.1/images/protokube.tar.gz.sha1
I1128 13:18:20.757458     558 apply_cluster.go:884] Found hash "4f001a87cd410fa3d1c8cc1c4232a817fb30cde7" for "https://kubeupv2.s3.amazonaws.com/kops/1.8.0-beta.1/images/protokube.tar.gz"
I1128 13:18:20.772650     558 dryrun_target.go:456] Unhandled kind in asString for "": awstasks.LoadBalancerHealthCheck
I1128 13:18:20.772684     558 dryrun_target.go:456] Unhandled kind in asString for "": awstasks.LoadBalancerHealthCheck
I1128 13:18:20.778110     558 context.go:91] deleting temp dir: "/tmp/deploy851622747"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0xf042f6]
goroutine 1 [running]:
k8s.io/kops/upup/pkg/fi.(*ResourceHolder).Open(0x0, 0x0, 0xc420bd20f8, 0xc420e32380, 0x7f761bb48458)
	/tmp/build/9299c5ca/kops-src/rootfs/kops-src/go/src/k8s.io/kops/upup/pkg/fi/resources.go:207 +0x26
k8s.io/kops/upup/pkg/fi.CopyResource(0x4b16520, 0xc4202b1960, 0x4b178a0, 0x0, 0x0, 0x0, 0x0)
	/tmp/build/9299c5ca/kops-src/rootfs/kops-src/go/src/k8s.io/kops/upup/pkg/fi/resources.go:85 +0x6d
k8s.io/kops/upup/pkg/fi.ResourceAsString(0x4b178a0, 0x0, 0x0, 0x4b178a0, 0x0, 0x4bbc01)
	/tmp/build/9299c5ca/kops-src/rootfs/kops-src/go/src/k8s.io/kops/upup/pkg/fi/resources.go:103 +0x5f
k8s.io/kops/upup/pkg/fi.tryResourceAsString(0x2cda1c0, 0xc420c1f6f0, 0x196, 0x4b178a0, 0xc420a1e000, 0xc420c1f601)
	/tmp/build/9299c5ca/kops-src/rootfs/kops-src/go/src/k8s.io/kops/upup/pkg/fi/dryrun_target.go:349 +0x1ed
k8s.io/kops/upup/pkg/fi.buildChangeList(0x4b17f60, 0xc420c1f6e0, 0x4b17f60, 0xc420a1e020, 0x4b17f60, 0xc420c1f700, 0x0, 0x1, 0x0, 0x0, ...)
	/tmp/build/9299c5ca/kops-src/rootfs/kops-src/go/src/k8s.io/kops/upup/pkg/fi/dryrun_target.go:319 +0x89e
k8s.io/kops/upup/pkg/fi.(*DryRunTarget).PrintReport(0xc4207bfa40, 0xc42119f020, 0x4b217e0, 0xc42000c018, 0xc421107088, 0x1b)
	/tmp/build/9299c5ca/kops-src/rootfs/kops-src/go/src/k8s.io/kops/upup/pkg/fi/dryrun_target.go:215 +0x9cf
k8s.io/kops/upup/pkg/fi.(*DryRunTarget).Finish(0xc4207bfa40, 0xc42119f020, 0x312691e, 0xa)
	/tmp/build/9299c5ca/kops-src/rootfs/kops-src/go/src/k8s.io/kops/upup/pkg/fi/dryrun_target.go:476 +0x47
k8s.io/kops/upup/pkg/fi/cloudup.(*ApplyClusterCmd).Run(0xc4210640d0, 0x0, 0x0)
	/tmp/build/9299c5ca/kops-src/rootfs/kops-src/go/src/k8s.io/kops/upup/pkg/fi/cloudup/apply_cluster.go:867 +0x4aae
main.RunUpdateCluster(0xc4207afbe0, 0x7ffce23ab406, 0x1a, 0x4b217e0, 0xc42000c018, 0xc420a5e700, 0x4, 0xc420965d18)
	/tmp/build/9299c5ca/kops-src/rootfs/kops-src/go/src/k8s.io/kops/cmd/kops/update_cluster.go:216 +0x649
main.NewCmdUpdateCluster.func1(0xc420309680, 0xc420a2d440, 0x1, 0x4)
	/tmp/build/9299c5ca/kops-src/rootfs/kops-src/go/src/k8s.io/kops/cmd/kops/update_cluster.go:98 +0xc4
k8s.io/kops/vendor/github.com/spf13/cobra.(*Command).execute(0xc420309680, 0xc420a06b40, 0x4, 0x6, 0xc420309680, 0xc420a06b40)
	/tmp/build/9299c5ca/kops-src/rootfs/kops-src/go/src/k8s.io/kops/vendor/github.com/spf13/cobra/command.go:603 +0x234
k8s.io/kops/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x4d19240, 0x28fd100, 0x0, 0x0)
	/tmp/build/9299c5ca/kops-src/rootfs/kops-src/go/src/k8s.io/kops/vendor/github.com/spf13/cobra/command.go:689 +0x2fe
k8s.io/kops/vendor/github.com/spf13/cobra.(*Command).Execute(0x4d19240, 0x4d4d710, 0x0)
	/tmp/build/9299c5ca/kops-src/rootfs/kops-src/go/src/k8s.io/kops/vendor/github.com/spf13/cobra/command.go:648 +0x2b
main.Execute()
	/tmp/build/9299c5ca/kops-src/rootfs/kops-src/go/src/k8s.io/kops/cmd/kops/root.go:95 +0x91
main.main()
	/tmp/build/9299c5ca/kops-src/rootfs/kops-src/go/src/k8s.io/kops/cmd/kops/main.go:25 +0x20

  9. Anything else do we need to know?
    Possibly related to:
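
For context on the stack trace above: the receiver passed to fi.(*ResourceHolder).Open is 0x0, i.e. a nil pointer, and the fault address addr=0x10 is consistent with the method dereferencing a field of that nil receiver. Below is a minimal, self-contained Go sketch of this failure mode; the type and field names are hypothetical and illustrative, not the actual kops code.

package main

import (
	"fmt"
	"io"
	"strings"
)

// resourceHolder stands in for a resource wrapper; the field layout is illustrative.
type resourceHolder struct {
	Name     string
	Resource io.Reader
}

// Open dereferences the receiver, so calling it on a nil *resourceHolder panics
// here rather than at the call site.
func (r *resourceHolder) Open() (io.Reader, error) {
	return r.Resource, nil // SIGSEGV when r == nil
}

// asString mirrors the shape of the ResourceAsString -> CopyResource -> Open chain.
func asString(r *resourceHolder) (string, error) {
	reader, err := r.Open() // calling a method on a nil pointer is legal in Go...
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if _, err := io.Copy(&sb, reader); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	var missing *resourceHolder // e.g. a task field that was never populated
	s, err := asString(missing) // panic: invalid memory address or nil pointer dereference
	fmt.Println(s, err)
}
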
@chrislovecnm
Contributor

I am looking at this now

/assign

@chrislovecnm
Contributor

I just put in #3945 which addresses the panic, but does not address why the resource was nil. So I am not closing this issue.
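
One plausible shape of such a guard, sketched below in simplified, standalone form (hypothetical code, not the actual #3945 diff): the dry-run string conversion checks for a nil resource before opening it, so the report prints an empty value instead of panicking. Why the resource ends up nil in the first place is the open question noted above.

package main

import "fmt"

// resourceHolder is a simplified stand-in for the real resource wrapper.
type resourceHolder struct {
	value string
}

func (r *resourceHolder) Open() string { return r.value }

// tryResourceAsString degrades gracefully when the resource is nil.
func tryResourceAsString(r *resourceHolder) string {
	if r == nil {
		return "" // previously this path would dereference nil and crash
	}
	return r.Open()
}

func main() {
	var missing *resourceHolder
	fmt.Printf("%q\n", tryResourceAsString(missing)) // prints ""
}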

@justinsb justinsb added this to the 1.8.0 milestone Nov 28, 2017
@stefancocora
Author

... but does not address why the resource was nil ...

Which resource from the kops cluster manifest do you think kops considers nil for this dev cluster?

I've diffed the prod [1] and dev cluster manifests and, apart from differences in resource names, they contain the same resources.
[1] context: the prod kops cluster update without --yes works fine.

@justinsb
Member

justinsb commented Dec 2, 2017

Believed fixed by #3982 - please reopen once we have the next 1.8.0 release if it continues to occur.

@justinsb justinsb closed this as completed Dec 2, 2017
@stefancocora
Author

I've built and tested kops from the master branch before the release of kops 1.8.0 and I can confirm this issue is fixed.
Thanks for all the effort!
