Worker routines get stuck #99

timebertt · 2020-12-09T09:35:09Z

How to categorize this issue?

/area robustness
/kind bug
/priority normal

What happened:

We have observed some situations where grm gets stuck reconciling a specific managed resource and does not act upon it anymore.
In all cases I observed, it happened either in conjunction with a longer period of downtime of the source or target API server (before #95) or with a large amount of secret data in the target cluster (as described in #92).

What you expected to happen:

grm should not get stuck and should reconcile all managed resources at the configured sync interval.

How to reproduce it (as minimally and precisely as possible):

Not sure yet.
My guess would be that the worker goroutines get stuck in a WaitForCacheSync call when the API server is unavailable for a longer period of time or the amount of watched data is too big.

Anything else we need to know?:

Environment:

Gardener-Resource-Manager version: v0.20.0
Kubernetes version (use kubectl version):
Cloud provider or hardware configuration:
Others:


timebertt · 2020-12-09T09:37:30Z

I think a possible solution, or at least a good first step, would be to use a context with a timeout (e.g. 1m) for each reconciliation.
This way, the WaitForCacheSync funcs will return false and the key will be marked done in the queue, so it can be reconciled again.

rfranzke · 2021-01-08T07:36:04Z

/assign

timebertt added the kind/bug Bug label Dec 9, 2020

gardener-robot added area/robustness Robustness, reliability, resilience related priority/normal labels Dec 9, 2020

gardener-robot assigned rfranzke Jan 8, 2021

rfranzke mentioned this issue Jan 8, 2021 in: Time limit for cache sync on startup and controller reconciliations #102 (Merged)

timebertt closed this as completed in #102 Jan 8, 2021

gardener-robot added the priority/3 Priority (lower number equals higher priority) label Mar 8, 2021
