
Route53 issue #1386

Closed
chrislovecnm opened this issue Jan 8, 2017 · 35 comments

@chrislovecnm (Contributor) commented Jan 8, 2017

I am running on master:

KOPS_BASE_URL=https://s3-us-west-1.amazonaws.com/${OUR_BUCKET}/kops/1.5.0-alpha1 \
kops create cluster \
  --name $CLUSTER_NAME \
  --state $KOPS_STATE_STORE \
  --node-count $NODE_COUNT \
  --zones $NODE_ZONES \
  --master-zones $MASTER_ZONES \
  --cloud aws \
  --node-size $NODE_SIZE \
  --master-size $MASTER_SIZE \
  -v $VERBOSITY \
  --kubernetes-version "http://${K8S_BUCKET}.s3-website-us-east-1.amazonaws.com/kubernetes/dev/v1.6.0-dev" \
  --yes

I ran this a few times back to back, with a delete in between. I noticed that several Route53 records are set to an invalid IP address: the API, etcd, and internal API records all point to the same incorrect public IP address.

@chrislovecnm (Contributor, Author)

I found the source of the IP address in ./upup/pkg/fi/cloudup/dns.go: PlaceholderIP = "203.0.113.123"

@justinsb (Member) commented Jan 8, 2017

I don't understand what the error is here. That the placeholder DNS records are not being replaced?

@chrislovecnm (Contributor, Author)

Here are the records being created:

api.clove-k8s-1-6-1.alpha.trebuchet.ai. A 203.0.113.123
etcd-events-us-east-1a.internal.api.clove-k8s-1-6-1.alpha.trebuchet.ai. A 203.0.113.123
etcd-us-east-1a.internal.api.clove-k8s-1-6-1.alpha.trebuchet.ai. A 203.0.113.123
api.internal.clove-k8s-1-6-1.alpha.trebuchet.ai. A 203.0.113.123

@chrislovecnm (Contributor, Author)

$ kubectl get no
Unable to connect to the server: dial tcp 203.0.113.123:443: getsockopt: no route to host

Still not happy

@justinsb justinsb added this to the backlog milestone Jan 16, 2017
@weaseal commented Jan 19, 2017

This is happening to me on 1.5.0-alpha3. If I use an ELB for my master nodes, all the route53 DNS entries (except for api.$NAME) get set to 203.0.113.123.

The issue does not happen every time.

@obsequiouswoe commented Jan 27, 2017

I had this occur on Version 1.5.0-alpha4 and all entries were set to 203.0.113.123.

Just tried again using 1.5.0-beta1 and the api entry is set to an ELB with the other 3 entries pointing at the 203.0.113.123 ip.

The most recent issue with the beta kops was related to an inability of the master / nodes to get to their DNS servers as specified in DHCP scope options. Once I fixed this I terminated the instances and let the autoscaler recreate them. Everything then correctly registered in DNS and the ELB.

@chrislovecnm (Contributor, Author)

@obsequiouswoe your AWS kung fu is awesome. Can you ELI5 for me? DNS is my weak point in life.

@obsequiouswoe

@chrislovecnm the issue I saw was that the newly created instances were not able to connect to their assigned DNS servers. I run a custom DHCP scope which assigns DNS servers in a different VPC, so I needed to update the route table to get the relevant range routed via the peering connection. Once this was completed, I terminated the instances so that the autoscaler would re-create them; the new instances could then reach DNS and were able to update their entries in Route53.
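The route-table fix described above can be sketched with the AWS CLI. This is an illustration only: every ID and the CIDR below are placeholders, and the command is echoed rather than executed so the sketch is safe to review before running for real.

```shell
# All IDs and the CIDR are placeholders for illustration; substitute your own.
ROUTE_TABLE_ID="rtb-0123456789abcdef0"   # route table used by the cluster subnets
DNS_CIDR="10.100.0.0/16"                 # CIDR of the peered VPC hosting the DNS servers
PEERING_ID="pcx-0123456789abcdef0"       # existing VPC peering connection

# Route the resolvers' CIDR through the peering connection so new instances
# can reach the DNS servers assigned by the custom DHCP scope. Echoed, not
# executed, so the command can be checked first.
echo aws ec2 create-route \
  --route-table-id "$ROUTE_TABLE_ID" \
  --destination-cidr-block "$DNS_CIDR" \
  --vpc-peering-connection-id "$PEERING_ID"
```

After adding the route, terminating the instances (as described above) lets the autoscaler bring up replacements that can reach DNS from first boot.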

@cbluth commented Jan 31, 2017

I am experiencing the same issue with kops Version 1.5.0-beta1 (git-b419f20).
All of my records are set to "203.0.113.123" after using kops to create a cluster.

@chrislovecnm (Contributor, Author)

@obsequiouswoe what is the fix, and how do we document it? DNS is always hard :(

@weaseal commented Feb 1, 2017

I think I understand what's happening here:

  • 203.0.113.123 is the placeholder IP that kops initially assigns to every record, simply because you can't create a blank DNS record and kops doesn't yet know the actual values, which are dynamically assigned by AWS/DHCP.
  • Eager sysadmins (like myself above) who are first-time kops users are awaiting their k8s cluster and investigate Route53 as a reason they can't reach the k8s API yet.
  • The sysadmin sees the k8s DNS entries as 203.0.113.123, assumes this is a mistake by kops when really they have just checked too early, and posts here.
  • At some point the Kubernetes masters finish booting, update those DNS records to the real values, and everything starts working.

I think this is true because the issue always seems to resolve itself after about 15 minutes. I have not done a code dive to verify.
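A quick way to tell whether you are still inside that waiting window is to check whether the API record still resolves to the placeholder. A minimal sketch, where the cluster name is hypothetical and the placeholder value comes from upup/pkg/fi/cloudup/dns.go:

```shell
# Hypothetical cluster name; substitute your own api record.
API_RECORD="api.mycluster.example.com"
PLACEHOLDER_IP="203.0.113.123"   # PlaceholderIP constant in upup/pkg/fi/cloudup/dns.go

# Poll until the record stops resolving to the placeholder; in practice the
# records tend to flip within roughly 15 minutes of the masters booting.
while [ "$(dig +short "$API_RECORD" | head -n1)" = "$PLACEHOLDER_IP" ]; do
  echo "DNS still serves the kops placeholder; sleeping 30s"
  sleep 30
done
```

Once the loop exits, kubectl should be able to reach the API by name.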

@chrislovecnm (Contributor, Author)

That is awesome! You are correct! DNS takes a long time to propagate and create. We pre-create the records with the 203.0.113.123 address, and the dns-controller container then sets up the DNS correctly for us. The reason we have to use DNS is two-fold:

  1. We don't want an ELB address as the API server name; we want a friendly name.
  2. We need etcd discovery, and DNS is the mechanism we have chosen.
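One way to watch dns-controller doing this work, assuming you can already reach the API server (for example over SSH to a master while public DNS still holds the placeholder), is to tail its logs. The namespace and label below are assumptions based on typical kops manifests; adjust them if your deployment differs.

```shell
# Namespace and label selector are assumptions for a typical kops cluster.
NAMESPACE="kube-system"
SELECTOR="k8s-app=dns-controller"

# Show the recent Route53 updates dns-controller has made; guarded so the
# snippet degrades gracefully on a machine without kubectl or cluster access.
if command -v kubectl >/dev/null 2>&1; then
  kubectl --namespace "$NAMESPACE" logs --selector "$SELECTOR" --tail=50 || true
else
  echo "kubectl not found; run this from a machine with cluster access"
fi
```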

@chrislovecnm (Contributor, Author)

@weaseal can I ask you to drop that into documentation? We need a FAQ page. Or should it go into a DNS doc ... You are the customer. Where would you expect it?

@ghost commented Feb 22, 2017

@chrislovecnm @weaseal I faced similar issues. I have currently been waiting 10 minutes for this process. Should I wait longer? I am seeing this with v1.5.2. Any suggestions or opinions?

@Marklon commented Feb 22, 2017

@voyalab I echo your experience; I've waited 15 minutes for DNS to update / propagate correctly. I'm not aware of any workarounds for this.

@ericln commented Mar 28, 2017

Seeing the same issue where the DNS records are stuck at 203.0.113.123. It has been hours with no change; any suggestions on troubleshooting this? I do see that the ELB's instances have status OutOfService.

@Marklon commented Mar 28, 2017

Hi Eric,
I've experienced the same issue. If you haven't already, I would suggest opening inbound traffic to your master instances in the AWS console to accept HTTPS, or whatever service you are using.

@offlinehoster

Got the same issue now.

@ericln commented Mar 31, 2017

When I set up a separate subdomain zone and create the NS records in the parent domain, I hit this problem. However, if I leave everything in the parent domain, all the instances register fine in the ELB. I'm on the release branch now; I was doing everything from the master branch and it was giving us all sorts of problems.

@danopia commented Apr 11, 2017

I understand the purpose of the placeholder IP and such, just chiming in to point out that kops validate doesn't have a great failure state when the IP is still there.

$ kops validate cluster
Using cluster from kubectl context: test-cluster.example.com

Validating cluster test-cluster.example.com


cannot get nodes for "test-cluster.example.com": Get https://api.test-cluster.example.com/api/v1/nodes: dial tcp 203.0.113.123:443: i/o timeout

The output mentions that there was an error talking to the placeholder IP. Seems like it would be relatively straightforward to notice the placeholder IP and print a message saying that the masters haven't updated DNS yet.
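A check like the one suggested above could be scripted around kops validate today. A rough sketch: the placeholder value is kops' PlaceholderIP constant, and the explain_ip helper is a hypothetical name introduced here for illustration.

```shell
PLACEHOLDER_IP="203.0.113.123"   # kops' PlaceholderIP constant

# Map a resolved address to a friendlier diagnostic, as the validate output
# above could do instead of surfacing a bare i/o timeout.
explain_ip() {
  if [ "$1" = "$PLACEHOLDER_IP" ]; then
    echo "DNS still holds the kops placeholder; masters have not updated Route53 yet"
  else
    echo "DNS looks updated; the timeout is likely a different problem"
  fi
}

# Usage sketch (requires dig and a real cluster name):
#   explain_ip "$(dig +short api.test-cluster.example.com | head -n1)"
```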

@richburdon commented Apr 18, 2017

Hitting this error but have confirmed that my DNS NS records match my Route 53 NS records.

I.e., that dig ns MY_DOMAIN matches aws route53 get-hosted-zone --id MY_ZONE_ID | jq ".DelegationSet.NameServers" (and that the AWS console's Route53 Test Record Set returns the correct NS entries).

Any other diagnostics I can try?

(kops 1.5.3)

@chrislovecnm (Contributor, Author)

@richburdon I would open another issue or reach out on Slack.

@richburdon

Thanks: #2384

(I tried Slack...)

@chrislovecnm (Contributor, Author)

@danopia great point. Would you mind filing an issue for that product enhancement?

If nobody minds I am going to close this issue in a couple of days

@chrislovecnm (Contributor, Author)

I am closing this issue.

@natemurthy commented Aug 10, 2017

Hi @chrislovecnm, could you either re-open this ticket or direct me to a possible solution to this problem? I can view my master node just fine with kubectl get nodes, but I don't see any of my worker nodes even after waiting for 15 minutes:

$ kops validate cluster --state=s3://[redacted]
Using cluster from kubectl context: [redacted]

Validating cluster [redacted]

INSTANCE GROUPS
NAME			ROLE	MACHINETYPE	MIN	MAX	SUBNETS
master-us-west-2b	Master	m4.large	1	1	us-west-2b
nodes			Node	t2.medium	2	2	us-west-2b

NODE STATUS
NAME						ROLE	READY
ip-xxx-xx-xx-xxx.us-west-2.compute.internal	master	True

Validation Failed
Ready Master(s) 1 out of 1.
Ready Node(s) 0 out of 2.

your nodes are NOT ready [redacted]

I believe this is a DNS issue that folks, myself included, are still observing.

@yanpozka

I'm having the same issue as @natemurthy: only the master instance is recognized by kops and kubectl, but I can see all the nodes running in my Google Cloud console.

@chrislovecnm (Contributor, Author)

I am opening an issue to document how to diagnose Route53 / Google DNS problems. Not going to reopen this issue.

@chrislovecnm (Contributor, Author)

Hey all. I opened an issue to document how to diagnose problems such as these; please comment on #3888.

@anillingutla

203.0.113.123: this is an error I am remembering in my dreams. I am just creating a cluster with kops create, and the subdomain is good, but I am stuck at this IP forever. What did you do to solve this, and how? Am I dealing with an improper version of kops?

@anillingutla

All of #3888 is good, but it's not helping me.

@jitendra-saal

This still seems to be a problem with kops version 1.9.1. Even after waiting hours for the DNS update (using AWS Route53), the k8s API DNS record still points to the placeholder IP 203.0.113.123 and cluster validation fails.

@BruceDevOps

I once waited overnight after creation and the placeholder DNS record had still not been replaced by the true IP, so eventually I deleted the cluster... kops and AWS just don't work that well together!

@mansurali901

In my experience, if the versions of kops, kubectl, and the Kubernetes control plane differ, kops will never update the Route53 entries; in my case I needed the same version for all of them:

[root@ip-20-0-0-66 kuberneteswithkops]# kops version
Version 1.15.0 (git-9992b4055)
[root@ip-20-0-0-66 kuberneteswithkops]# kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

@Manikandan-Raj

I faced the same issue; after the following steps it was fixed:

  1. Changed the master node type to c4.large (it was t2.medium before).
  2. Updated both kops and kubectl to 1.15.0.
