kops update cluster stuck on nat gateway update #1917

@hollowimage

I'm hitting a weird issue when using kops to make any sort of cluster update after creation:

[15:26:36][admin@domain.com]$ kops update cluster
Using cluster from kubectl context: prod-domain.com

I0215 15:26:55.140339   24888 dns.go:89] Private DNS: skipping DNS validation
I0215 15:26:55.147283   24888 executor.go:91] Tasks: 0 done / 96 total; 31 can run
I0215 15:27:01.612480   24888 executor.go:91] Tasks: 31 done / 96 total; 21 can run
I0215 15:27:02.418030   24888 executor.go:91] Tasks: 52 done / 96 total; 30 can run
I0215 15:27:09.089526   24888 executor.go:91] Tasks: 82 done / 96 total; 8 can run
I0215 15:27:15.795805   24888 dnsname.go:107] AliasTarget for "api.prod-domain.com." is "prod-domain.us-east-1.elb.amazonaws.com."
W0215 15:27:15.903281   24888 executor.go:109] error running task "NatGateway/us-east-1d.prod-domain.com" (9m53s remaining to succeed): Field cannot be changed: ElasticIp
W0215 15:27:15.903313   24888 executor.go:109] error running task "NatGateway/us-east-1c.prod-domain.com" (9m53s remaining to succeed): Field cannot be changed: ElasticIp
W0215 15:27:15.903324   24888 executor.go:109] error running task "NatGateway/us-east-1b.prod-domain.com" (9m53s remaining to succeed): Field cannot be changed: ElasticIp
I0215 15:27:15.903342   24888 executor.go:91] Tasks: 87 done / 96 total; 6 can run
W0215 15:27:16.040721   24888 executor.go:109] error running task "NatGateway/us-east-1d.prod-domain.com" (9m53s remaining to succeed): Field cannot be changed: ElasticIp
W0215 15:27:16.040746   24888 executor.go:109] error running task "NatGateway/us-east-1c.prod-domain.com" (9m53s remaining to succeed): Field cannot be changed: ElasticIp
W0215 15:27:16.040757   24888 executor.go:109] error running task "NatGateway/us-east-1b.prod-domain.com" (9m53s remaining to succeed): Field cannot be changed: ElasticIp
I0215 15:27:16.040774   24888 executor.go:91] Tasks: 90 done / 96 total; 3 can run
W0215 15:27:16.141043   24888 executor.go:109] error running task "NatGateway/us-east-1d.prod-domain.com" (9m52s remaining to succeed): Field cannot be changed: ElasticIp
W0215 15:27:16.141069   24888 executor.go:109] error running task "NatGateway/us-east-1c.prod-domain.com" (9m52s remaining to succeed): Field cannot be changed: ElasticIp
W0215 15:27:16.141079   24888 executor.go:109] error running task "NatGateway/us-east-1b.prod-domain.com" (9m52s remaining to succeed): Field cannot be changed: ElasticIp
I0215 15:27:16.141089   24888 executor.go:124] No progress made, sleeping before retrying 3 failed task(s)

I am using kops Version 1.5.1 (git-01deca8), and kubectl reports:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:57:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:52:34Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Creation summary:

  1. using a shared VPC
  2. everything else was created by kops
  3. private cluster topology with weave

kops create cluster --cloud aws --name prod.domain.com --dns private --dns-zone [ZONE] --vpc=vpc-xxxxxx --topology private --master-size=m4.large --master-zones=us-east-1b,us-east-1c,us-east-1d --network-cidr=172.x.x.x/16 --node-count=3 --node-size=m4.large --zones=us-east-1b,us-east-1c,us-east-1d --networking weave --channel alpha --associate-public-ip=false

I made the following changes to the cluster post-initial creation:

  1. changed the egress routes in the route tables for the utility subnets to point to a custom firewall solution we have in place
  2. changed the security group rules for inbound traffic to the public subnets, replacing the 0.0.0.0/0 source with x.x.x.x, the proxy source IP that handles our VPC egress and ingress
  3. left the NAT gateways (NGWs) that kops created, with their EIPs, as is. Nothing was detached or deleted.

In summary, the only real changes were the subnet egress routing and the inbound security group rule that blocks public access.
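
For anyone trying to reproduce this, here is a minimal sketch for pulling the distinct failing tasks out of kops output like the log above. It is not part of kops: the function name `failing_tasks` and the regular expression are assumptions of mine, based only on the warning lines shown in this report.

```python
import re
from collections import Counter

# Assumed line shape, taken from the warnings in this report:
#   error running task "NAME" (... remaining to succeed): MESSAGE
TASK_ERROR = re.compile(r'error running task "([^"]+)" \([^)]*\): (.+)$')


def failing_tasks(log_text):
    """Count how often each (task, error message) pair appears in kops output."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TASK_ERROR.search(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts


# Three lines copied from the log in this report.
sample = '''\
W0215 15:27:16.141069   24888 executor.go:109] error running task "NatGateway/us-east-1c.prod-domain.com" (9m52s remaining to succeed): Field cannot be changed: ElasticIp
W0215 15:27:16.141079   24888 executor.go:109] error running task "NatGateway/us-east-1b.prod-domain.com" (9m52s remaining to succeed): Field cannot be changed: ElasticIp
I0215 15:27:16.141089   24888 executor.go:124] No progress made, sleeping before retrying 3 failed task(s)
'''

for (task, message), count in sorted(failing_tasks(sample).items()):
    print(f"{task}: {message} (x{count})")
```

Run against the full log, this should make it easier to see that only the NatGateway tasks fail, and always with the same ElasticIp error.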
