Nginx ingress controller unable to obtain endpoints for service(s) #19827

Open
eengelking opened this issue Apr 25, 2019 · 15 comments

Comments

@eengelking commented Apr 25, 2019

When deploying a fresh installation of Rancher v2.2.2, I create a new project and namespace, create a new workload with a simple nginx:latest pod, and attach an L7 load balancer to the workload with a hostname/URL and SSL certificate.
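
For context, the load balancer created through the Rancher UI corresponds roughly to an Ingress object like the sketch below. The hostname and TLS secret name are placeholders, and pointing the backend at the Rancher-generated service (the name from the error log further down) is an assumption about how Rancher wires up a workload target, not a dump of the actual object.

```yaml
# Hypothetical approximation of the Ingress the Rancher UI creates for a
# workload target backend; hostname and TLS secret are placeholders.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-ingress
  namespace: onboarding
spec:
  tls:
  - hosts:
    - app.example.com
    secretName: app-example-com-tls
  rules:
  - host: app.example.com
    http:
      paths:
      - backend:
          # service generated by Rancher for the selected workload (assumed)
          serviceName: ingress-b9f5db46e8a39e9a7cfb5fcd09aaf56e
          servicePort: 80
```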

Expectation
The NGINX Ingress Controller will allow traffic to reach the workload/pods and display the NGINX welcome page.

Actual Result
Traffic to the URL times out. The logs from the NGINX ingress controller throw the following error:
W0425 22:08:03.044641 7 controller.go:753] Error obtaining Endpoints for Service "onboarding/ingress-b9f5db46e8a39e9a7cfb5fcd09aaf56e": no object matching key "onboarding/ingress-b9f5db46e8a39e9a7cfb5fcd09aaf56e" in local store

As a result, it is impossible to deploy any project/namespace with an NGINX ingress that can be reached from the outside world.

Note: This exact same configuration worked without issue in build 2.2.1.

Other Testing Performed

  • I've tried deleting and redeploying the LB, both with and without the SSL cert but with the same results.
  • I've tried downgrading the version/configs of the nginx ingress controllers and the nginx http backend to match that of version 2.2.1, but to no avail.
  • I've tried deploying other Ingress methods, but also to no avail.

Other Notes
Whenever I attempt to edit the YAML, it comes up blank. This is true for the project namespace Ingress YAML, the system ingress-nginx YAML, and the system ingress-nginx ConfigMap YAML.


Useful Info
Versions: Rancher v2.2.2, UI v2.2.41
Route: containers.index

@bingyupiaoyao commented Apr 26, 2019

Same issue for me; any workaround?

@eengelking commented Apr 26, 2019

I've reached out to friends who are using a third-party ingress (Traefik) on the same build, and they report they're not having any issues, so it looks like the problem is specific to the built-in NGINX ingress. I'm going to try out Traefik today and see if that makes any difference, as this is a pretty big blocker for me.

@eengelking commented Apr 26, 2019

Also, here's a Rancher forum post on the issue as well:
https://forums.rancher.com/t/ingress-not-generating-hostnames/14020

@bingyupiaoyao commented Apr 26, 2019

> I've reached out to friends who are using a third-party ingress (Traefik) on the same build, and they report they're not having any issues, so it looks like the problem is specific to the built-in NGINX ingress. I'm going to try out Traefik today and see if that makes any difference, as this is a pretty big blocker for me.

Then is there any tutorial for installing Traefik with Rancher?

@eengelking commented Apr 26, 2019

Not a lot, unfortunately. If I'm successful, I'll write up something.

@eengelking commented Apr 26, 2019

I was able to get Traefik up and running successfully; however, it experiences the exact same issue:

Apr 26 16:35:30 traefik-69c57d4d66-9h224 traefik error {"level":"error","msg":"Service not found for onboarding/ingress-b9f5db46e8a39e9a7cfb5fcd09aaf56e","time":"2019-04-26T23:35:30Z"}
Apr 26 16:35:30 traefik-69c57d4d66-9h224 traefik info {"level":"info","msg":"Server configuration reloaded on :8080","time":"2019-04-26T23:35:30Z"}
Apr 26 16:35:30 traefik-69c57d4d66-9h224 traefik info {"level":"info","msg":"Server configuration reloaded on :80","time":"2019-04-26T23:35:30Z"}
Apr 26 16:35:30 traefik-69c57d4d66-9h224 traefik info {"level":"info","msg":"Server configuration reloaded on :443","time":"2019-04-26T23:35:30Z"}
Apr 26 16:35:30 traefik-69c57d4d66-9h224 traefik warning {"level":"warning","msg":"Endpoints not found for onboarding/ingress-b9f5db46e8a39e9a7cfb5fcd09aaf56e","time":"2019-04-26T23:35:30Z"}
Apr 26 16:35:30 traefik-69c57d4d66-9h224 traefik warning {"level":"warning","msg":"Endpoints not found for onboarding/ingress-b9f5db46e8a39e9a7cfb5fcd09aaf56e","time":"2019-04-26T23:35:30Z"}
Apr 26 16:35:30 traefik-69c57d4d66-9h224 traefik warning {"level":"warning","msg":"Endpoints not available for onboarding/ingress-b9f5db46e8a39e9a7cfb5fcd09aaf56e","time":"2019-04-26T23:35:30Z"}

@mholasek commented Apr 27, 2019

I have the same problem with nginx ingress. I added a new cluster and got the same errors. The new cluster was installed from 2.2.2, and it is not possible to access any services via ingress. Another cluster, created two days ago, works well. Both were created as Rancher-launched clusters with Rancher 2.2.2.

@mholasek commented Apr 28, 2019

> I have the same problem with nginx ingress. I added a new cluster and got the same errors. The new cluster was installed from 2.2.2, and it is not possible to access any services via ingress. Another cluster, created two days ago, works well. Both were created as Rancher-launched clusters with Rancher 2.2.2.

Problem fixed. It was a misconfiguration of the layer 4 load balancer in front of the worker nodes. The error message itself appears to be harmless; it is just informational output that shows up when an ingress is created for a new service.

@eengelking commented Apr 29, 2019

I was able to get Traefik up and running this morning with proper ingress. I had missed a few things on Friday which were important:

  • I had mapped my URL directly to the worker IP for testing instead of the L7 LB that Traefik had spun up in AWS. This obviously wouldn't work as the traffic needs to go through Traefik.
  • Traefik spins up a L7 LB in AWS and adds nodes to it. This is the ingress point to use.
  • The L7 LB in AWS only had a worker node and none of the cluster nodes, where the Traefik service lives. I edited the instances to use only the cluster nodes instead of the workers.
  • The L7 LB in AWS was only configured to use a single zone. I reconfigured it to span the three zones I'm using.

@eengelking commented Apr 29, 2019

I also noticed that the ingress was stuck on initializing in Rancher, but it did show up in Traefik.
(screenshots omitted)

@azubizarretap commented May 19, 2019

I'm facing the same issue and unable to get the nginx ingress to work. The layer 4 load balancer is disabled.

How did you manage to fix this exactly, @mholasek?

Thanks!

@mholasek commented May 20, 2019

@azubizarretap as I mentioned, my issue was not Rancher/Kubernetes related. I had a misconfiguration in my HAProxy load balancer (configured as layer 4). It now works fine, including forwarding the client IP address by using the PROXY protocol.
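
For reference, a minimal HAProxy layer 4 (TCP mode) configuration that forwards ports 80 and 443 to the worker nodes and passes the client IP via the PROXY protocol might look like the sketch below. The node addresses are placeholders and this is an assumption about such a setup, not the actual configuration from this thread; send-proxy only works if the nginx ingress controller is configured to accept the PROXY protocol (use-proxy-protocol: "true" in its ConfigMap).

```
# Hypothetical haproxy.cfg excerpt -- worker node IPs are placeholders
defaults
    mode tcp
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend ingress_http
    bind *:80
    default_backend workers_http

frontend ingress_https
    bind *:443
    default_backend workers_https

backend workers_http
    balance roundrobin
    # send-proxy requires use-proxy-protocol: "true" in the nginx ingress ConfigMap
    server worker1 192.168.1.21:80 check send-proxy
    server worker2 192.168.1.22:80 check send-proxy

backend workers_https
    balance roundrobin
    server worker1 192.168.1.21:443 check send-proxy
    server worker2 192.168.1.22:443 check send-proxy
```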

@marcstreeter commented May 23, 2019

@mholasek would you mind sharing some of your HAProxy config? I too am using HAProxy as a layer 4 load balancer (proxying over the TCP connection) and I'm having this issue.

@qcihdtm commented May 26, 2019

Hi, I'm experiencing either the same issue or a very similar one. I thought I was doing something wrong and was going to shamelessly ask for help, but it seems there actually is an issue with the ingress?

I have installed Rancher 2.2.3 on my server with hostname rancher.mydomain.com and IP 192.168.1.10.

I have created a cluster with 3 worker nodes (worker1.mydomain.com, worker2.mydomain.com and worker3.mydomain.com).

I have deployed a workload (nginxdemos/hello:plain-text) to it (1 pod per node) publishing the container port 80 as a random nodeport.

I can access the nginx page at port 3xxxx when pointing to each node's IP.

The issue starts now:

I create an ingress specifying a hostname (test.mydomain.com) that I have defined in my DNS, pointing to the same IP as Rancher, 192.168.1.10.

  • If I add a target backend (workload) that points to the workload created above and port 80, then when accessing test.mydomain.com I get the Rancher login page.

  • If I add a target backend (workload) that points to the workload created above and port 81, then when accessing test.mydomain.com I get ERR_CONNECTION_REFUSED (I don't have the port open, so this seems correct).

  • If I add a target backend (service "wrkld1") that points to the service created above and port 80, then when accessing test.mydomain.com I get the Rancher login page.

  • If I add a target backend (service "wrkld1") that points to the service created above and port 80tcp01-wrkld1, then when accessing test.mydomain.com I get the Rancher login page.

  • If I add a target backend (service "wrkld1-nodeport") that points to the service created above and port 80, then when accessing test.mydomain.com I get the Rancher login page.

  • If I add a target backend (service "wrkld1-nodeport") that points to the service created above and port 80tcp01, then when accessing test.mydomain.com I get the Rancher login page.

Am I experiencing the same issue or am I doing something wrong?
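
For reference, the ingress rule being described corresponds roughly to a manifest like the following; the names are taken from the comment above, and the rest of the structure is an assumption about what Rancher generates, not a dump of the actual object.

```yaml
# Hypothetical manifest approximating the ingress described above;
# names come from the comment, everything else is a placeholder.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test-ingress
  namespace: default
spec:
  rules:
  - host: test.mydomain.com
    http:
      paths:
      - backend:
          serviceName: wrkld1-nodeport
          servicePort: 80
```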

@Napsty commented Sep 11, 2019

Same problem here with Rancher 2.2.8 and Kubernetes 1.15 (experimental). The ingress rules haven't been changed for a while, so it is likely that this happens on other Rancher 2.2.x and Kubernetes versions as well.

I noticed that changes to the ingress rules now create a different upstream naming in the nginx.conf on the nginx-ingress-controller pods:

$ grep proxy_upstream_name nginx.conf
	log_format upstreaminfo '$the_real_ip - [$the_real_ip] - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_length $request_time [$proxy_upstream_name] $upstream_addr $upstream_response_length $upstream_response_time $upstream_status $req_id';
		set $proxy_upstream_name "-";
			set $proxy_upstream_name "upstream-default-backend";
		set $proxy_upstream_name "-";
			set $proxy_upstream_name "gamma-admin-3000";
		set $proxy_upstream_name "-";
			set $proxy_upstream_name "gamma-public-api-8080";
			set $proxy_upstream_name "gamma-public-api-8080";
		set $proxy_upstream_name "-";
			set $proxy_upstream_name "gamma-ingress-90ee0bbf0004b6d230e5629db7fafcef-3000";
		set $proxy_upstream_name "-";
			set $proxy_upstream_name "gamma-internal-api-80";
		set $proxy_upstream_name "-";
			set $proxy_upstream_name "gamma-ingress-a7a76b7258e4b4cc9f5b686bd3fe506d-3000";
		set $proxy_upstream_name "-";
			set $proxy_upstream_name "gamma-web-3000";
		set $proxy_upstream_name "-";
		set $proxy_upstream_name "-";
			set $proxy_upstream_name "internal";
			set $proxy_upstream_name "upstream-default-backend";
	lua_add_variable $proxy_upstream_name;

Old naming: "project-workload-port", e.g. "gamma-web-3000"
New naming: "project-ingressServiceId-port", e.g. "gamma-ingress-a7a76b7258e4b4cc9f5b686bd3fe506d-3000"
Requests to the endpoints (each with a different hostname) that use the new upstream naming all result in HTTP error 503.

Update: It turns out this new naming is a kind of workaround to define a workload as an endpoint: it creates a service in service discovery with the long ingress name. However, with Kubernetes 1.15 the service entries in service discovery are not created. As 1.15 is still marked "experimental" as of today, that's perfectly OK.
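
As a rough illustration of the generated service mentioned above, it might look something like the sketch below; the name, namespace, port, and selector are placeholders taken from the upstream names earlier in this comment, and Rancher's actual generated object may differ.

```yaml
# Hypothetical sketch of the service Rancher creates in service discovery when a
# workload is chosen as the ingress backend; name, port, and selector are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: ingress-a7a76b7258e4b4cc9f5b686bd3fe506d
  namespace: gamma
spec:
  ports:
  - protocol: TCP
    port: 3000
    targetPort: 3000
  selector:
    workload.example/target: my-workload   # placeholder; the real selector is set by Rancher
```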

TLDR: Works
