
gce: getInstancesByNames lookup by canonicalized names #26897

Closed
glerchundi opened this issue Jun 6, 2016 · 5 comments
Labels
area/os/coreos, lifecycle/rotten, sig/node

Comments

@glerchundi

Hi guys,

I'm having a lot of trouble getting my cluster up with provider load balancers, specifically with GCE. The controller-manager fails when trying to retrieve instances by hostname (using the getInstancesByNames method):

I0606 16:49:08.399340       1 gce.go:474] EnsureLoadBalancer(aa271f4332bfa11e69cfe42010a00000, us-central1, <nil>, [TCP/80 TCP/443], [k8s-worker-lx53.c.mr-potato.internal k8s-worker-xog8.c.mr-potato.internal k8s-worker-1pg4.c.mr-potato.internal k8s-worker-buvc.c.mr-potato.internal], default/lb, map[])
E0606 16:49:08.464763       1 gce.go:2368] Failed to retrieve instance: "k8s-worker-lx53.c.mr-potato.internal"
E0606 16:49:08.464894       1 servicecontroller.go:196] Failed to process service delta. Retrying in 5m0s: Failed to create load balancer for service default/lb: instance not found

I compared the GCE access implementation used in 1.1.4 (I ran into this while trying to upgrade a cluster from 1.1.4 to 1.2.3) with the master implementation, and something caught my attention.

Why doesn't this piece of code use canonicalizeInstanceName-ed names instead of the ones provided by the caller (by caller I mean the full machine hostname)?

https://github.com/kubernetes/kubernetes/blob/v1.2.3/pkg/cloudprovider/providers/gce/gce.go#L2364-L2372

Maybe this is the reason why I'm getting 'instance not found' all the time.
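To illustrate what I mean, here is a small standalone sketch. canonicalizeInstanceName mirrors the helper that already exists in gce.go; the main function and the sample names are just mine, not code from the repo:

package main

import (
	"fmt"
	"strings"
)

// canonicalizeInstanceName mirrors the helper of the same name in
// pkg/cloudprovider/providers/gce/gce.go: it reduces a full internal
// hostname like "k8s-worker-lx53.c.mr-potato.internal" to the bare GCE
// instance name "k8s-worker-lx53".
func canonicalizeInstanceName(name string) string {
	if ix := strings.Index(name, "."); ix != -1 {
		name = name[:ix]
	}
	return name
}

func main() {
	// These are the hostnames the service controller hands to
	// getInstancesByNames on my CoreOS nodes; the GCE API only knows the
	// short names, so the lookup could canonicalize them before comparing.
	names := []string{
		"k8s-worker-lx53.c.mr-potato.internal",
		"k8s-worker-xog8.c.mr-potato.internal",
	}
	for _, n := range names {
		fmt.Printf("%s -> %s\n", n, canonicalizeInstanceName(n))
	}
}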

Thanks in advance,

P.S.: Before filing this I searched for something related on Stack Overflow and asked in Slack, with no answers at all...

@glerchundi
Author

glerchundi commented Jun 7, 2016

I firmly believe the issue is exactly what I reported: I was finally able to work around it by changing the instance templates to override the hostname (via the kubelet --hostname-override=$(hostname -s) arg), which turns the current CoreOS machine FQDN (k8s-worker-lx53.c.mr-potato.internal) into its shortened form (k8s-worker-lx53), and after that everything worked properly.

Apparently the difference lies in how CoreOS handles uname -n, which outputs the FQDN. In contrast, Debian (GKE?) shortens the hostname: https://github.com/kubernetes/kubernetes/blob/v1.2.3/pkg/util/node/node.go#L29-L38.
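For reference, here is my rough reading of how the node name gets picked; this is only an approximation of the GetHostname helper in node.go, not a verbatim copy:

package main

import (
	"fmt"
	"os"
	"strings"
)

// getHostname approximates what pkg/util/node/node.go does: if
// --hostname-override is set, use it as-is; otherwise fall back to the OS
// hostname, which on CoreOS/GCE is the FQDN
// (k8s-worker-lx53.c.mr-potato.internal) rather than the short name.
func getHostname(hostnameOverride string) (string, error) {
	hostname := hostnameOverride
	if hostname == "" {
		h, err := os.Hostname()
		if err != nil {
			return "", err
		}
		hostname = h
	}
	return strings.ToLower(strings.TrimSpace(hostname)), nil
}

func main() {
	// With --hostname-override=$(hostname -s) the override wins, so the
	// node registers as "k8s-worker-lx53" and the GCE lookup succeeds.
	name, err := getHostname("k8s-worker-lx53")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(name)
}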

I can send a PR fixing this if you can confirm the behaviour on your side.

@j3ffml j3ffml added area/os/coreos sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jun 28, 2016
@shahidhk

I have also faced this issue: k8s v1.3.4 on CoreOS on GCE. The instance name is set to instance-1, but the hostname comes out as instance-1.c.project-1.internal.

$ hostname
instance-1.c.project-1.internal
$ hostname -s
instance-1
$ uname -n
instance-1.c.project-1.internal

Now when I try to create a load balancer for nginx, this happens (log from kube-controller-manager):

E0827 20:11:36.650762       1 gce.go:2609] Failed to retrieve instance: "instance-1.c.project-1.internal"
E0827 20:11:36.650867       1 servicecontroller.go:201] Failed to process service delta. Retrying in 5m0s: Failed to create load balancer for service default/nginx: instance not found

As @glerchundi mentioned, can somebody please confirm this behaviour?

@fejta-bot

Issues go stale after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 17, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 16, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/close
