New ExternalName services aren't detected consistently #7346

bsod90 · 2021-07-13T20:36:06Z

NGINX Ingress controller version: 0.46.0

Kubernetes version (use kubectl version): 1.18

Environment:

Cloud provider or hardware configuration: AWS EKS
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Others:

What happened:

We use https://github.com/metacontroller/metacontroller to listen for changes in our database and automatically create Services/Ingress rules in the k8s cluster. In this particular scenario, we're creating a bunch of ExternalName services that are all pointing at different internal load balancers further down our infrastructure.
I noticed that sometimes when we add a new Ingress/Service, it doesn't work right away and gives me 503s instead. The only way to fix it is to restart the nginx controller and force it to re-validate the entire config.
I suspect there's some sort of race condition happening here, but I'm not sure. More details at the bottom of this issue.

What you expected to happen:

New services to be properly detected and nginx routing traffic to our downstream backends instead of giving 503s.

More details:

These are the kind of services we create (all looking the same, just IDs are different):

And these are the ingress rules:

This is the ingress controller failing to read the service configuration, saying "no object matching key <...> in local store":

The service actually exists, but it might've been added slightly after the ingress rule, depending on how metacontroller orchestrated the update. The ingress controller can't read it on the first try, but I'm wondering why it's not retrying it later and why it's not detecting the moment when the service is actually added to K8S.

Sometimes the entire process crashes and forces the full config to reload. This makes the newly added service immediately available, as well as all the others that weren't detected before. Manual pod restart has the same effect.

I couldn't find a way to better isolate this issue and make it reproduce reliably, but I'll post below if I have any updates on that. Thank you for any help!

Anything else we need to know:

/kind bug


longwuyuan · 2021-07-14T15:21:29Z

Hi,
Am curious about one aspect here.

this is the ingress-nginx-controller project
ingress-controller processes ingress objects
an ingress object has a backend service as one of its fields in the spec
the backend service of an ingress object is explained in the docs like this:
- Service: A Kubernetes Service that identifies a set of Pods using label selectors. Unless mentioned otherwise, Services are assumed to have virtual IPs only routable within the cluster network https://kubernetes.io/docs/concepts/services-networking/ingress/#terminology
a kubernetes service of type "externalName" is explained like this in docs
- ExternalName: Maps the Service to the contents of the externalName field (e.g. foo.bar.example.com), by returning a CNAME record with its value. No proxying of any kind is set up https://kubernetes.io/docs/concepts/services-networking/service/#externalname

The curiosity is: are you using a service of type "externalName" as a backend service in an ingress resource definition?

/remove-kind bug
/triage needs-information

bsod90 · 2021-07-14T18:08:05Z

Yep, I'm using the service of type externalName as an ingress backend, isn't this supported?
https://kubernetes.github.io/ingress-nginx/e2e-tests/#service-type-externalname these are some tests I found that suggest ExternalName services are supported...

longwuyuan · 2021-07-15T03:24:36Z

kubernetes/kubernetes#103675, Additional Advisory:

A similar attack is possible using Ingress implementations that support forwarding to ExternalName Services. This can be used to forward to Services in other namespaces or, in some cases, sensitive endpoints within the Ingress implementation. If you are using the Ingress API, we recommend confirming that the implementation you’re using either does not support forwarding to ExternalName Services or supports disabling the functionality.

Thanks,
Long


bsod90 · 2021-07-20T00:40:17Z

Thanks for the notice, @longwuyuan, although I don't think we're exposed to this vulnerability, as we're the only users of our Ingress API.

On another note, I think I can reproduce this issue more or less reliably in my environment. To me, the scenario looks simple:

1. The Ingress rule gets added first (at that time the backend service is not yet present)
2. The Service is added a second later
3. Nginx controller for some reason fails to detect that addition and does not re-sync the Ingress
4. We get 503

Looking at this line in store.go:

ingress-nginx/internal/ingress/controller/store/store.go

Line 594 in 9e274dd

serviceHandler := cache.ResourceEventHandlerFuncs{

this does indeed seem to be the case, as it only handles Service modifications here.
I validated it by trying to modify a service for which I was getting 503 and it worked: I added a dummy annotation, which immediately triggered the Ingress re-sync and my service became available right away.

I'm wondering whether there's a specific reason for omitting the AddFunc on the Service cache handler, or whether it's simply a mistake.
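To illustrate the suspected behaviour, here is a minimal, self-contained sketch that replays the scenario against a handler with only an update callback. It uses only the standard library: `Service`, `HandlerFuncs`, `updateOnlyHandler`, and `replay` are hypothetical stand-ins written for this example, and `HandlerFuncs` merely imitates the optional-callback shape of client-go's `cache.ResourceEventHandlerFuncs`. It is not the controller's actual code.

```go
package main

import "fmt"

// Service is a hypothetical, stripped-down stand-in for a Kubernetes Service.
type Service struct {
	Name        string
	Annotations map[string]string
}

// HandlerFuncs imitates the optional-callback shape of client-go's
// cache.ResourceEventHandlerFuncs. It is not the real type.
type HandlerFuncs struct {
	AddFunc    func(obj interface{})
	UpdateFunc func(oldObj, newObj interface{})
}

// OnAdd calls AddFunc if it is set; a nil AddFunc silently drops the event.
func (h HandlerFuncs) OnAdd(obj interface{}) {
	if h.AddFunc != nil {
		h.AddFunc(obj)
	}
}

// OnUpdate calls UpdateFunc if it is set.
func (h HandlerFuncs) OnUpdate(oldObj, newObj interface{}) {
	if h.UpdateFunc != nil {
		h.UpdateFunc(oldObj, newObj)
	}
}

// updateOnlyHandler registers only UpdateFunc, matching the behaviour
// described above. Each handled event increments *syncs, which stands in
// for an Ingress re-sync.
func updateOnlyHandler(syncs *int) HandlerFuncs {
	return HandlerFuncs{
		UpdateFunc: func(_, _ interface{}) { *syncs++ },
	}
}

// replay feeds the handler a Service add followed by a dummy-annotation
// update and returns the sync count observed after each step.
func replay(h HandlerFuncs, syncs *int) (afterAdd, afterUpdate int) {
	svc := Service{Name: "backend-1"}
	h.OnAdd(svc) // the Service appears after its Ingress
	afterAdd = *syncs

	annotated := Service{
		Name:        "backend-1",
		Annotations: map[string]string{"dummy": "1"},
	}
	h.OnUpdate(svc, annotated) // the workaround: touch the Service
	afterUpdate = *syncs
	return afterAdd, afterUpdate
}

func main() {
	syncs := 0
	afterAdd, afterUpdate := replay(updateOnlyHandler(&syncs), &syncs)
	fmt.Println("syncs after Service add:", afterAdd)
	fmt.Println("syncs after dummy annotation:", afterUpdate)
}
```

In this sketch the add event is dropped because no add callback is registered, and only the later annotation change produces a sync, which matches the workaround described above.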

Normally Ingress synchronization for Services is triggered when corresponding Service's Endpoints are added, deleted or modified. Services of type ExternalName, however, do not have any endpoints and hence do not trigger Ingress synchronization as only Update events are being watched. This commit makes sure that Add and Delete Service events also enqueue a syncIngress task.
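The idea in that description can be sketched as follows. This is a standard-library-only illustration and not the merged change: `Service`, `ServiceHandler`, `newServiceHandler`, and the `enqueue` callback are hypothetical names. Limiting the add and delete paths to ExternalName Services is an assumption made here, on the reasoning that other Service types already trigger a sync through their Endpoints.

```go
package main

import "fmt"

// ServiceType and Service are hypothetical stand-ins for the Kubernetes types.
type ServiceType string

const (
	ClusterIP    ServiceType = "ClusterIP"
	ExternalName ServiceType = "ExternalName"
)

type Service struct {
	Name string
	Type ServiceType
}

// ServiceHandler groups the three callbacks a Service watcher would register.
type ServiceHandler struct {
	Add    func(svc Service)
	Update func(oldSvc, newSvc Service)
	Delete func(svc Service)
}

// newServiceHandler builds a handler that enqueues a sync task on update,
// and also on add and delete for ExternalName Services, which have no
// Endpoints to trigger the sync otherwise. The filter on ExternalName is an
// assumption of this sketch.
func newServiceHandler(enqueue func(reason string)) ServiceHandler {
	return ServiceHandler{
		Add: func(svc Service) {
			if svc.Type == ExternalName {
				enqueue("add " + svc.Name)
			}
		},
		Update: func(_, newSvc Service) {
			enqueue("update " + newSvc.Name)
		},
		Delete: func(svc Service) {
			if svc.Type == ExternalName {
				enqueue("delete " + svc.Name)
			}
		},
	}
}

func main() {
	var queue []string
	h := newServiceHandler(func(reason string) { queue = append(queue, reason) })

	h.Add(Service{Name: "ext-1", Type: ExternalName}) // enqueued
	h.Add(Service{Name: "web", Type: ClusterIP})      // skipped in this sketch
	h.Delete(Service{Name: "ext-1", Type: ExternalName})

	fmt.Println(queue)
}
```

With add and delete handled, a Service created after its Ingress would trigger a sync on its own, without needing a dummy annotation or a controller restart.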


bsod90 added the kind/bug Categorizes issue or PR as related to a bug. label Jul 13, 2021

k8s-ci-robot added triage/needs-information Indicates an issue needs more information in order to work on it. and removed kind/bug Categorizes issue or PR as related to a bug. labels Jul 14, 2021

bsod90 mentioned this issue Jul 20, 2021

Trigger syncIngress on Service addition/deletion #7346 #7374

Merged

8 tasks

k8s-ci-robot closed this as completed in #7374 Sep 7, 2021
