
Route53 issue #1386

Closed
chrislovecnm opened this issue Jan 8, 2017 · 35 comments

@chrislovecnm (Contributor) commented Jan 8, 2017

I am running on master:

KOPS_BASE_URL=https://s3-us-west-1.amazonaws.com/${OUR_BUCKET}/kops/1.5.0-alpha1 \
kops create cluster \
  --name $CLUSTER_NAME \
  --state $KOPS_STATE_STORE \
  --node-count $NODE_COUNT \
  --zones $NODE_ZONES \
  --master-zones $MASTER_ZONES \
  --cloud aws \
  --node-size $NODE_SIZE \
  --master-size $MASTER_SIZE \
  -v $VERBOSITY \
  --kubernetes-version "http://${K8S_BUCKET}.s3-website-us-east-1.amazonaws.com/kubernetes/dev/v1.6.0-dev" \
  --yes

I ran this a few times back to back, with a delete in between. I noticed that several Route53 records are set to an invalid IP address: the API, etcd, and internal API records all point to the same incorrect public IP address.

@chrislovecnm (Contributor, Author)

I found the source of the IP address in ./upup/pkg/fi/cloudup/dns.go: PlaceholderIP = "203.0.113.123"

@justinsb (Member) commented Jan 8, 2017

I don't understand what the error is here. That the placeholder DNS records are not being replaced?

@chrislovecnm (Contributor, Author)

Here are the records being created:

api.clove-k8s-1-6-1.alpha.trebuchet.ai. A 203.0.113.123
etcd-events-us-east-1a.internal.api.clove-k8s-1-6-1.alpha.trebuchet.ai. A 203.0.113.123
etcd-us-east-1a.internal.api.clove-k8s-1-6-1.alpha.trebuchet.ai. A 203.0.113.123
api.internal.clove-k8s-1-6-1.alpha.trebuchet.ai. A 203.0.113.123

@chrislovecnm (Contributor, Author)

$ kubectl get no
Unable to connect to the server: dial tcp 203.0.113.123:443: getsockopt: no route to host

Still not happy

@justinsb justinsb added this to the backlog milestone Jan 16, 2017
@weaseal commented Jan 19, 2017

This is happening to me on 1.5.0-alpha3. If I use an ELB for my master nodes, all the route53 DNS entries (except for api.$NAME) get set to 203.0.113.123.

The issue does not happen every time.

@obsequiouswoe commented Jan 27, 2017

I had this occur on Version 1.5.0-alpha4 and all entries were set to 203.0.113.123.

Just tried again using 1.5.0-beta1 and the api entry is set to an ELB with the other 3 entries pointing at the 203.0.113.123 ip.

The most recent issue with the beta kops was related to an inability of the master / nodes to get to their DNS servers as specified in DHCP scope options. Once I fixed this I terminated the instances and let the autoscaler recreate them. Everything then correctly registered in DNS and the ELB.

@chrislovecnm (Contributor, Author)

@obsequiouswoe your AWS kung fu is awesome. Can you ELI5 for me? DNS is my weak point in life.

@obsequiouswoe

@chrislovecnm the issue I saw was that the newly created instances were not able to connect to their assigned DNS servers. I run a custom DHCP scope which assigns DNS servers in a different VPC, so I needed to update the route table to get the relevant range routed via the peering connection. Once this was completed, I terminated the instances so that the autoscaler would re-create them; the new instances could then reach DNS and were able to update their entries in Route53.
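The route-table fix described above can be sketched with the AWS CLI. This is an illustration only: every ID and the CIDR below are placeholders, and the command is echoed rather than executed so the sketch is safe to review before running for real.

```shell
# All IDs and the CIDR are placeholders for illustration; substitute your own.
ROUTE_TABLE_ID="rtb-0123456789abcdef0"   # route table used by the cluster subnets
DNS_CIDR="10.100.0.0/16"                 # CIDR of the peered VPC hosting the DNS servers
PEERING_ID="pcx-0123456789abcdef0"       # existing VPC peering connection

# Route the resolvers' CIDR through the peering connection so new instances
# can reach the DNS servers assigned by the custom DHCP scope. Echoed, not
# executed, so the command can be checked first.
echo aws ec2 create-route \
  --route-table-id "$ROUTE_TABLE_ID" \
  --destination-cidr-block "$DNS_CIDR" \
  --vpc-peering-connection-id "$PEERING_ID"
```

After adding the route, terminating the instances (as described above) lets the autoscaler bring up replacements that can reach DNS from first boot.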

@cbluth commented Jan 31, 2017

I am experiencing the same issue with kops Version 1.5.0-beta1 (git-b419f20).
All of my records are set to "203.0.113.123" after using kops to create a cluster.

@chrislovecnm (Contributor, Author)

@obsequiouswoe what is the fix, and how do we document it? DNS is always hard :(

@weaseal commented Feb 1, 2017

I think I understand what's happening here:

  • 203.0.113.123 is the placeholder IP that kops initially assigns to every record, simply because you can't create a blank DNS record and kops doesn't yet know the actual values, which are dynamically assigned by AWS/DHCP.
  • Eager sysadmins (like myself above) who are first-time kops users are awaiting their k8s cluster and investigate Route53 as a reason they can't reach the k8s API yet.
  • The sysadmin sees the k8s DNS entries as 203.0.113.123, assumes this is a mistake by kops when really they have just checked too early, and posts here.
  • At some point the Kubernetes masters finish booting, update those DNS records to the real values, and everything starts working.

I think this is true because the issue always seems to resolve itself after about 15 minutes. I have not done a code dive to verify.
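A quick way to tell whether you are still inside that waiting window is to check whether the API record still resolves to the placeholder. A minimal sketch, where the cluster name is hypothetical and the placeholder value comes from upup/pkg/fi/cloudup/dns.go:

```shell
# Hypothetical cluster name; substitute your own api record.
API_RECORD="api.mycluster.example.com"
PLACEHOLDER_IP="203.0.113.123"   # PlaceholderIP constant in upup/pkg/fi/cloudup/dns.go

# Poll until the record stops resolving to the placeholder; in practice the
# records tend to flip within roughly 15 minutes of the masters booting.
while [ "$(dig +short "$API_RECORD" | head -n1)" = "$PLACEHOLDER_IP" ]; do
  echo "DNS still serves the kops placeholder; sleeping 30s"
  sleep 30
done
```

Once the loop exits, kubectl should be able to reach the API by name.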

@chrislovecnm (Contributor, Author)

That is awesome! You are correct! DNS takes a long time to propagate and create. We pre-create the records with the 203.0.113.123 address, and the dns-controller container then sets up the DNS correctly for us. The reason we have to use DNS is two-fold:

  1. We don't want an ELB address as the API server name; we want a friendly name.
  2. We need etcd discovery, and DNS is the mechanism we have chosen.
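One way to watch dns-controller doing this work, assuming you can already reach the API server (for example over SSH to a master while public DNS still holds the placeholder), is to tail its logs. The namespace and label below are assumptions based on typical kops manifests; adjust them if your deployment differs.

```shell
# Namespace and label selector are assumptions for a typical kops cluster.
NAMESPACE="kube-system"
SELECTOR="k8s-app=dns-controller"

# Show the recent Route53 updates dns-controller has made; guarded so the
# snippet degrades gracefully on a machine without kubectl or cluster access.
if command -v kubectl >/dev/null 2>&1; then
  kubectl --namespace "$NAMESPACE" logs --selector "$SELECTOR" --tail=50 || true
else
  echo "kubectl not found; run this from a machine with cluster access"
fi
```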

@chrislovecnm (Contributor, Author)

@weaseal can I ask you to drop that into documentation? We need a FAQ page. Or should it go into a DNS doc ... You are the customer. Where would you expect it?

@ghost commented Feb 22, 2017

@chrislovecnm @weaseal I faced similar issues. I have currently been waiting 10 minutes for this process. Should I wait longer? I am seeing this with v1.5.2. Any suggestions or opinions?

@Marklon commented Feb 22, 2017

@voyalab I echo your experience; I've waited 15 minutes for DNS to update / propagate correctly. I'm not aware of any workarounds for this.

@ericln commented Mar 28, 2017

Seeing the same issue where the DNS records are stuck at 203.0.113.123. It has been hours with no change; any suggestions on troubleshooting this? I do see that the ELB's instances have status OutOfService.

@Marklon commented Mar 28, 2017

Hi Eric,
I've experienced the same issue. If you haven't already, I would suggest opening inbound traffic to your master instances in the AWS console to accept HTTPS, or whatever service you are using.

@offlinehoster

Got the same issue now.

@ericln commented Mar 31, 2017

When I set up a separate subdomain zone and create the NS records in the parent domain, I hit this problem. However, if I leave everything in the parent domain, all the instances register fine in the ELB. I'm on the release branch now; I was doing everything from the master branch and it was giving us all sorts of problems.

@danopia commented Apr 11, 2017

I understand the purpose of the placeholder IP and such, just chiming in to point out that kops validate doesn't have a great failure state when the IP is still there.

$ kops validate cluster
Using cluster from kubectl context: test-cluster.example.com

Validating cluster test-cluster.example.com


cannot get nodes for "test-cluster.example.com": Get https://api.test-cluster.example.com/api/v1/nodes: dial tcp 203.0.113.123:443: i/o timeout

The output mentions that there was an error talking to the placeholder IP. Seems like it would be relatively straightforward to notice the placeholder IP and print a message saying that the masters haven't updated DNS yet.
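A check like the one suggested above could be scripted around kops validate today. A rough sketch: the placeholder value is kops' PlaceholderIP constant, and the explain_ip helper is a hypothetical name introduced here for illustration.

```shell
PLACEHOLDER_IP="203.0.113.123"   # kops' PlaceholderIP constant

# Map a resolved address to a friendlier diagnostic, as the validate output
# above could do instead of surfacing a bare i/o timeout.
explain_ip() {
  if [ "$1" = "$PLACEHOLDER_IP" ]; then
    echo "DNS still holds the kops placeholder; masters have not updated Route53 yet"
  else
    echo "DNS looks updated; the timeout is likely a different problem"
  fi
}

# Usage sketch (requires dig and a real cluster name):
#   explain_ip "$(dig +short api.test-cluster.example.com | head -n1)"
```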

@richburdon commented Apr 18, 2017

Hitting this error but have confirmed that my DNS NS records match my Route 53 NS records.

I.e., that dig ns MY_DOMAIN matches aws route53 get-hosted-zone --id MY_ZONE_ID | jq ".DelegationSet.NameServers" (and that the AWS console's Route53 Test Record Set returns the correct NS entries).

Any other diagnostics I can try?

(kops 1.5.3)

@chrislovecnm (Contributor, Author)

@richburdon I would open another issue or reach out on Slack.

@richburdon

Thanks: #2384

(I tried Slack...)

@chrislovecnm (Contributor, Author)

@danopia great point. Would you mind filing an issue for that product enhancement?

If nobody minds I am going to close this issue in a couple of days

@chrislovecnm (Contributor, Author)

I am closing this issue.

@natemurthy commented Aug 10, 2017

Hi @chrislovecnm, could you either re-open this ticket or direct me to a possible solution to this problem? I can view my master node just fine with kubectl get nodes, but I don't see any of my worker nodes even after waiting for 15 minutes:

$ kops validate cluster --state=s3://[redacted]
Using cluster from kubectl context: [redacted]

Validating cluster [redacted]

INSTANCE GROUPS
NAME			ROLE	MACHINETYPE	MIN	MAX	SUBNETS
master-us-west-2b	Master	m4.large	1	1	us-west-2b
nodes			Node	t2.medium	2	2	us-west-2b

NODE STATUS
NAME						ROLE	READY
ip-xxx-xx-xx-xxx.us-west-2.compute.internal	master	True

Validation Failed
Ready Master(s) 1 out of 1.
Ready Node(s) 0 out of 2.

your nodes are NOT ready [redacted]

I believe this is a DNS issue that folks, myself included, are still observing.

@yanpozka

I'm having the same issue as @natemurthy: only the master instance is recognized by kops and kubectl, but I can see all the nodes running in my Google Cloud console.

@chrislovecnm (Contributor, Author)

I am opening an issue to document how to diagnose Route53 / Google DNS problems. Not going to reopen this issue.

@chrislovecnm (Contributor, Author)

Hey all. I opened an issue to document how to diagnose problems such as these; please comment on #3888.

@anillingutla

203.0.113.123: this is an error I am remembering in my dreams. I am just creating a cluster with kops create, and the subdomain is good, but I am stuck at this IP forever. What did you do to solve this, and how? Am I dealing with an improper version of kops?

@anillingutla

All of #3888 is good, but it's not helping me.

@jitendra-saal

This still seems to be a problem with kops version 1.9.1. Even after waiting hours for the DNS update (using AWS Route53), the k8s API DNS record still points to the placeholder IP 203.0.113.123 and cluster validation fails.

@BruceDevOps

I once waited overnight after creation and the placeholder DNS record had still not been replaced by the true IP, so eventually I deleted the cluster... kops and AWS just don't work that well together!

@mansurali901

In my experience, if the versions of kops, kubectl, and the Kubernetes control plane differ, kops will never update the Route53 entries; in my case I needed the same version for all of them:

[root@ip-20-0-0-66 kuberneteswithkops]# kops version
Version 1.15.0 (git-9992b4055)
[root@ip-20-0-0-66 kuberneteswithkops]# kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

@Manikandan-Raj

I faced the same issue; after the following steps it was fixed:

  1. Changed the master node type to c4.large (it was t2.medium before).
  2. Updated both kops and kubectl to 1.15.0.
