
Incorrect Upstream causes endless error logs in gloo container #3761

Closed
lgadban opened this issue Oct 19, 2020 · 5 comments · Fixed by #4515

Labels
Type: Bug Something isn't working

Comments

lgadban (Contributor) commented Oct 19, 2020

Describe the bug
When there is an Upstream that contains an invalid destination, specifically a missing k8s Service, the gloo container will endlessly log errors.

To Reproduce
Steps to reproduce the behavior:

  1. Create an Upstream that points to a nonexistent Service
  2. Check gloo pod and see errors like:
{"level":"error","ts":1603124079.3074384,"logger":"gloo-ee.v1.event_loop.setup.v1.event_loop.syncer.kubernetes_eds","caller":"kubernetes/eds.go:206","msg":"upstream gloo-system.bad-upstream: port 8080 not found for service bad-petstore","version":"1.5.1","stacktrace":"github.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.filterEndpoints\n\t/go/pkg/mod/github.com/solo-io/gloo@v1.5.1/projects/gloo/pkg/plugins/kubernetes/eds.go:206\ngithub.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.(*edsWatcher).List\n\t/go/pkg/mod/github.com/solo-io/gloo@v1.5.1/projects/gloo/pkg/plugins/kubernetes/eds.go:121\ngithub.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.(*edsWatcher).watch.func1\n\t/go/pkg/mod/github.com/solo-io/gloo@v1.5.1/projects/gloo/pkg/plugins/kubernetes/eds.go:130\ngithub.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.(*edsWatcher).watch.func2\n\t/go/pkg/mod/github.com/solo-io/gloo@v1.5.1/projects/gloo/pkg/plugins/kubernetes/eds.go:157"}
{"level":"error","ts":1603124079.8723693,"logger":"gloo-ee.v1.event_loop.setup.v1.event_loop.syncer.kubernetes_eds","caller":"kubernetes/eds.go:206","msg":"upstream gloo-system.bad-upstream: port 8080 not found for service bad-petstore","version":"1.5.1","stacktrace":"github.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.filterEndpoints\n\t/go/pkg/mod/github.com/solo-io/gloo@v1.5.1/projects/gloo/pkg/plugins/kubernetes/eds.go:206\ngithub.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.(*edsWatcher).List\n\t/go/pkg/mod/github.com/solo-io/gloo@v1.5.1/projects/gloo/pkg/plugins/kubernetes/eds.go:121\ngithub.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.(*edsWatcher).watch.func1\n\t/go/pkg/mod/github.com/solo-io/gloo@v1.5.1/projects/gloo/pkg/plugins/kubernetes/eds.go:130\ngithub.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.(*edsWatcher).watch.func2\n\t/go/pkg/mod/github.com/solo-io/gloo@v1.5.1/projects/gloo/pkg/plugins/kubernetes/eds.go:157"}

Expected behavior
The error should not be logged endlessly.

Additionally, is there a possibility that this scenario could fall under invalid route replacement?
For example, if a route references an Upstream in the above state, i.e. when a "rejected" Upstream is used in a route, could it be treated the same as a route to a nonexistent Upstream?
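For illustration, a route exercising that scenario might look like the following (a sketch only; the VirtualService name and the exact matcher fields are illustrative assumptions, not taken from this issue):

apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: route-to-bad-upstream   # hypothetical name
  namespace: gloo-system
spec:
  virtualHost:
    domains:
    - '*'
    routes:
    - matchers:
      - prefix: /
      routeAction:
        single:
          upstream:
            name: bad-upstream        # the rejected Upstream described above
            namespace: gloo-system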

Additional context

Client: {"version":"1.5.0"}
Server: {"type":"Gateway","enterprise":true,"kubernetes":{"containers":[{"Tag":"1.5.1","Name":"grpcserver-ui","Registry":"quay.io/solo-io"},{"Tag":"1.5.1","Name":"grpcserver-ee","Registry":"quay.io/solo-io"},{"Tag":"1.5.1","Name":"grpcserver-envoy","Registry":"quay.io/solo-io"},{"Tag":"1.5.1","Name":"discovery","Registry":"quay.io/solo-io"},{"Tag":"1.5.1","Name":"extauth-ee","Registry":"quay.io/solo-io"},{"Tag":"1.5.1","Name":"gateway","Registry":"quay.io/solo-io"},{"Tag":"1.5.1","Name":"gloo-ee-envoy-wrapper","Registry":"quay.io/solo-io"},{"Tag":"1.5.1","Name":"gloo-ee","Registry":"quay.io/solo-io"},{"Tag":"1.5.1","Name":"rate-limit-ee","Registry":"quay.io/solo-io"},{"Tag":"5","Name":"redis","Registry":"docker.io"}],"namespace":"gloo-system"}}
kdorosh (Contributor) commented Mar 26, 2021

With respect to:

Additionally, is there a possibility that this scenario could fall under invalid route replacement?
For example if a route references an Upstream in the above state, i.e. when a "rejected" Upstream is used in a route, could it be treated the same as a route to a nonexistent Upstream?

Just tested this on 1.7.0-rc2 open source as part of #4497 and confirmed that this case is covered.

@kdorosh kdorosh self-assigned this Mar 26, 2021
kdorosh (Contributor) commented Mar 29, 2021

Sample Upstream to apply in order to reproduce:

apiVersion: gloo.solo.io/v1
kind: Upstream
metadata:
  name: bad-upstream
  namespace: gloo-system
spec:
  kube:
    serviceName: doesnotexist
    serviceNamespace: default
    servicePort: 443

kdorosh (Contributor) commented Mar 29, 2021

One potential idea for a fix is to update the EDS cache logic to fire less often here (ignore endpoint updates that don't result in hash differences, just like we do for snapshot syncing).

We could do that here:

return filterEndpoints(ctx, writeNamespace, endpointList, serviceList, podList, c.upstreams), nil
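To make the hash-compare idea concrete, here is a minimal, self-contained Go sketch (the Endpoint and watcher types and the publish callback are hypothetical stand-ins, not Gloo's actual EDS types; it only shows skipping updates whose endpoint hash is unchanged):

package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Endpoint is a hypothetical stand-in for a discovered endpoint.
type Endpoint struct {
	Address string
	Port    int
}

// hashEndpoints produces an order-independent hash of the endpoint list,
// so two identical lists always hash to the same value.
func hashEndpoints(eps []Endpoint) uint64 {
	sorted := append([]Endpoint(nil), eps...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Address != sorted[j].Address {
			return sorted[i].Address < sorted[j].Address
		}
		return sorted[i].Port < sorted[j].Port
	})
	h := fnv.New64a()
	for _, ep := range sorted {
		fmt.Fprintf(h, "%s:%d|", ep.Address, ep.Port)
	}
	return h.Sum64()
}

// watcher remembers the hash of the last endpoint list it published.
type watcher struct {
	lastHash uint64
	seen     bool
}

// maybePublish forwards the update only when the filtered endpoint list
// actually changed, so an unchanged list no longer re-triggers the sync
// (and its error logging) on every poll.
func (w *watcher) maybePublish(eps []Endpoint, publish func([]Endpoint)) {
	h := hashEndpoints(eps)
	if w.seen && h == w.lastHash {
		return // no hash difference: skip this update
	}
	w.lastHash, w.seen = h, true
	publish(eps)
}

func main() {
	w := &watcher{}
	publish := func(eps []Endpoint) { fmt.Println("publishing", len(eps), "endpoints") }

	eps := []Endpoint{{Address: "10.0.0.1", Port: 8080}}
	w.maybePublish(eps, publish)                                                    // published
	w.maybePublish(eps, publish)                                                    // identical list: skipped
	w.maybePublish(append(eps, Endpoint{Address: "10.0.0.2", Port: 8080}), publish) // changed: published
}

Where exactly such a check would sit relative to filterEndpoints, and whether the log line itself should also be demoted or rate limited, is left to the actual fix.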

kdorosh (Contributor) commented Mar 30, 2021

I have a potential fix for k8s to push up; tested on Consul, and the refresh is a lot more sane. No need to bring the change to other EDS types.

solo-changelog-bot mentioned this issue Mar 30, 2021
EItanya (Member) commented Mar 30, 2021

One potential idea for a fix is to update the cache eds logic to fire less often here (ignore endpoint updates that don't result in hash differences, just like we do for the snapshot syncing)

we could do that here

return filterEndpoints(ctx, writeNamespace, endpointList, serviceList, podList, c.upstreams), nil

I really like that idea. Have you done any perf testing on it? I think we should test with a very large environment before going too far with this.
