New ExternalName services aren't detected consistently #7346

Closed
bsod90 opened this issue Jul 13, 2021 · 4 comments · Fixed by #7374
Labels
triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@bsod90 (Contributor)

bsod90 commented Jul 13, 2021

NGINX Ingress controller version: 0.46.0

Kubernetes version (use kubectl version): 1.18

Environment:

  • Cloud provider or hardware configuration: AWS EKS
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

What happened:

We use https://github.com/metacontroller/metacontroller to listen for changes in our database and automatically create Services and Ingress rules in the k8s cluster. In this particular scenario, we're creating a bunch of ExternalName services, each pointing at a different internal load balancer further down our infrastructure.
I noticed that sometimes when we add a new Ingress/Service, it doesn't work right away and returns 503s instead. The only way to fix it is to restart the NGINX controller and force it to re-validate the entire config.
I suspect there's some sort of race condition happening here, but I'm not sure. More details at the bottom of this issue.

What you expected to happen:

New services to be detected properly, and NGINX to route traffic to our downstream backends instead of returning 503s.

More details:

These are the kind of Services we create (they all look the same; only the IDs differ):
[Screenshot: ExternalName Service definitions]

And these are the ingress rules:
[Screenshot: Ingress rules]

This shows the ingress controller failing to read the Service configuration, with the error "no object matching key <...> in local store":
[Screenshot: controller log with the "no object matching key" error]

The Service actually exists, but it might have been added slightly after the Ingress rule, depending on how metacontroller orchestrated the update. The ingress controller can't read it on the first try, but I'm wondering why it doesn't retry later, and why it doesn't detect the moment the Service is actually added to K8S.

Sometimes the entire process crashes and forces the full config to reload. This makes the newly added Service immediately available, along with all the others that weren't detected before. A manual pod restart has the same effect.
[Screenshot: controller crash and full config reload]

I couldn't find a way to better isolate this issue and make it reproduce reliably, but I'll post below if I have any updates on that. Thank you for any help!

Anything else we need to know:

/kind bug

bsod90 added the kind/bug Categorizes issue or PR as related to a bug. label Jul 13, 2021
@longwuyuan (Contributor)

Hi,
I'm curious about one aspect here.

  • this is the ingress-nginx-controller project
  • the ingress controller processes Ingress objects
  • an Ingress object has a backend service as one of the fields in its spec
  • the backend service of an Ingress object is explained in the docs like this:
  • a Kubernetes Service of type "externalName" is explained like this in the docs:

My question is: are you using a Service of type "externalName" as the backend service in an Ingress resource definition?

/remove-kind bug
/triage needs-information

k8s-ci-robot added triage/needs-information Indicates an issue needs more information in order to work on it. and removed kind/bug Categorizes issue or PR as related to a bug. labels Jul 14, 2021
@bsod90 (Contributor, Author)

bsod90 commented Jul 14, 2021

Yep, I'm using a Service of type ExternalName as an Ingress backend. Isn't this supported?
These e2e tests I found suggest that ExternalName services are supported: https://kubernetes.github.io/ingress-nginx/e2e-tests/#service-type-externalname

@longwuyuan (Contributor)

longwuyuan commented Jul 15, 2021 via email

@bsod90 (Contributor, Author)

bsod90 commented Jul 20, 2021

Thanks for the notice, @longwuyuan, although I don't think we're exposed to this vulnerability, since we're the only users of our Ingress API.

On another note, I think I can reproduce this issue more or less reliably in my environment. The scenario looks simple:

  • The Ingress rule gets added first (at that time the backend service is not yet present)
  • The Service is added a second later
  • For some reason, the NGINX controller fails to detect that addition and does not re-sync the Ingress
  • We get 503s

Looking at this line in store.go

serviceHandler := cache.ResourceEventHandlerFuncs{
it does seem to be the case, as only Service modifications are handled here.
I validated this by modifying a Service for which I was getting 503s: adding a dummy annotation immediately triggered the Ingress re-sync, and my Service became available right away.
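The reported sequence, including the dummy-annotation workaround, can be reproduced in a tiny stdlib-only Go model. Everything here is hypothetical: `controller`, `handlerFuncs`, `syncIngress`, `addService` and the `default/tenant-1` key are illustration names, not ingress-nginx code, and `handlerFuncs` only mimics the shape of client-go's `cache.ResourceEventHandlerFuncs`. The model assumes, as described in this comment, that only an update handler is registered for Services, so the Add event never triggers a sync.

```go
package main

import "fmt"

// handlerFuncs is a hypothetical stand-in that mimics the shape of client-go's
// cache.ResourceEventHandlerFuncs; it is not the real type.
type handlerFuncs struct {
	AddFunc    func(key string)
	UpdateFunc func(key string)
	DeleteFunc func(key string)
}

// controller is a simplified, hypothetical model of the ingress controller.
type controller struct {
	services map[string]string // local store: "namespace/name" -> externalName
	backend  string            // Service key referenced by the single Ingress
	status   int               // what a client would currently receive
	handlers handlerFuncs
}

// syncIngress resolves the Ingress backend against the local store once.
func (c *controller) syncIngress() {
	if _, ok := c.services[c.backend]; !ok {
		fmt.Printf("no object matching key %q in local store\n", c.backend)
		c.status = 503
		return
	}
	c.status = 200
}

// addService stores the Service and fires AddFunc only if one is registered.
func (c *controller) addService(key, externalName string) {
	c.services[key] = externalName
	if c.handlers.AddFunc != nil {
		c.handlers.AddFunc(key)
	}
}

// updateService models any modification, e.g. adding a dummy annotation.
func (c *controller) updateService(key string) {
	if c.handlers.UpdateFunc != nil {
		c.handlers.UpdateFunc(key)
	}
}

func newController(backend string) *controller {
	c := &controller{services: map[string]string{}, backend: backend}
	// Assumption from the issue: only updates trigger an Ingress re-sync.
	c.handlers = handlerFuncs{UpdateFunc: func(string) { c.syncIngress() }}
	return c
}

func main() {
	c := newController("default/tenant-1")
	c.syncIngress() // 1. Ingress created first; Service not present yet
	fmt.Println("after Ingress add:", c.status)
	c.addService("default/tenant-1", "internal-lb.example.com") // 2. Service added
	fmt.Println("after Service add:", c.status)
	c.updateService("default/tenant-1") // 3. dummy annotation fires UpdateFunc
	fmt.Println("after Service update:", c.status)
}
```

Running it prints the lookup error once, then 503 after the Ingress is added, 503 again after the Service is added (no AddFunc, so no re-sync), and 200 only after the update.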

I'm wondering whether there's a specific reason for omitting the AddFunc on the Service cache handler, or if it's simply a mistake.

bsod90 added a commit to bsod90/ingress-nginx that referenced this issue Jul 20, 2021
Normally Ingress synchronization for Services is triggered when
the corresponding Service's Endpoints are added, deleted or modified.
Services of type ExternalName, however, do not have any endpoints
and hence do not trigger Ingress synchronization, as only Update
events are being watched. This commit makes sure that Add and
Delete Service events also enqueue a syncIngress task.
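The change described in this commit message can be sketched roughly as follows. This is a hedged, stdlib-only illustration, not the code merged in #7374: `resourceEventHandlerFuncs`, `service`, `syncQueue` and `newServiceHandler` are hypothetical stand-ins, and the struct only copies the `AddFunc`/`UpdateFunc`/`DeleteFunc` field names of client-go's `cache.ResourceEventHandlerFuncs`. The idea is that every Service event, not just updates, enqueues an Ingress sync, which ExternalName Services need because they never produce Endpoints events.

```go
package main

import "fmt"

// resourceEventHandlerFuncs is a hypothetical stand-in that copies the field
// names of client-go's cache.ResourceEventHandlerFuncs; not the real type.
type resourceEventHandlerFuncs struct {
	AddFunc    func(obj interface{})
	UpdateFunc func(oldObj, newObj interface{})
	DeleteFunc func(obj interface{})
}

// service is a hypothetical, minimal Service object.
type service struct {
	Namespace, Name, Type string
}

// syncQueue records sync requests in order; a real controller would use a
// rate-limited work queue that de-duplicates them.
type syncQueue struct{ items []string }

func (q *syncQueue) enqueue(reason string) { q.items = append(q.items, reason) }

// newServiceHandler registers Add, Update and Delete callbacks so that
// Services without Endpoints (such as ExternalName) still trigger a sync.
func newServiceHandler(q *syncQueue) resourceEventHandlerFuncs {
	return resourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			s := obj.(*service)
			q.enqueue("add " + s.Namespace + "/" + s.Name)
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			s := newObj.(*service)
			q.enqueue("update " + s.Namespace + "/" + s.Name)
		},
		DeleteFunc: func(obj interface{}) {
			s, ok := obj.(*service)
			if !ok {
				// A real client-go handler would also need to unwrap
				// cache.DeletedFinalStateUnknown tombstones here.
				return
			}
			q.enqueue("delete " + s.Namespace + "/" + s.Name)
		},
	}
}

func main() {
	q := &syncQueue{}
	h := newServiceHandler(q)
	svc := &service{Namespace: "default", Name: "tenant-1", Type: "ExternalName"}
	h.AddFunc(svc)
	h.UpdateFunc(svc, svc)
	h.DeleteFunc(svc)
	fmt.Println(q.items)
}
```

Registering all three callbacks may enqueue more syncs than strictly necessary; a real controller would typically rely on its work queue to coalesce them, and could limit the extra handling to ExternalName Services.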
bsod90 added a commit to bsod90/ingress-nginx that referenced this issue Jul 26, 2021
bsod90 added a commit to bsod90/ingress-nginx that referenced this issue Jul 26, 2021
bsod90 added a commit to bsod90/ingress-nginx that referenced this issue Jul 27, 2021
bsod90 added a commit to bsod90/ingress-nginx that referenced this issue Aug 24, 2021
bsod90 added a commit to bsod90/ingress-nginx that referenced this issue Aug 24, 2021
k8s-ci-robot pushed a commit that referenced this issue Sep 7, 2021
rchshld pushed a commit to joomcode/ingress-nginx that referenced this issue May 19, 2023
…ernetes#7374)

3 participants