
Clean up watch manager #308

Merged
maxsmythe merged 3 commits into open-policy-agent:master from sync-mgr-cleanup on Dec 12, 2019

Conversation

maxsmythe (Contributor)

Fixes #295

Signed-off-by: Max Smythe <smythe@google.com>

    default:
        time.Sleep(5 * time.Second)
        return nil
    case <-ticker.C:
        if _, err := wm.updateOrPause(); err != nil {
            log.Error(err, "error in updateManagerLoop")
Member

return this err

maxsmythe (Contributor, Author), Dec 10, 2019

I don't think we want to do that: transient issues (e.g. a network problem connecting to the API server) can produce an error here, and returning it would crash the server and make the webhook unavailable.

I think it's better to have a graceful-degradation model where the webhook continues to serve and custom watches can potentially recover on the next restart loop. This failure state should likely be detected via Prometheus metrics instead.

WDYT?
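For illustration, here is a minimal, self-contained sketch of the log-and-continue behavior being argued for; the watchManager type, the 5-second ticker, and the simulated transient error are assumptions for the example, not the actual gatekeeper code.

    package main

    import (
        "errors"
        "log"
        "time"
    )

    // watchManager is a stand-in for the real WatchManager; updateOrPause here
    // just simulates an operation that can hit transient API-server errors.
    type watchManager struct{}

    func (wm *watchManager) updateOrPause() (bool, error) {
        return false, errors.New("transient: connection to API server refused")
    }

    func (wm *watchManager) updateManagerLoop(stop <-chan struct{}) {
        ticker := time.NewTicker(5 * time.Second)
        defer ticker.Stop()
        for {
            select {
            case <-stop:
                return
            case <-ticker.C:
                if _, err := wm.updateOrPause(); err != nil {
                    // Log and retry on the next tick instead of returning the
                    // error, which would tear down the manager and the webhook.
                    log.Printf("error in updateManagerLoop: %v", err)
                }
            }
        }
    }

    func main() {
        stop := make(chan struct{})
        go (&watchManager{}).updateManagerLoop(stop)
        time.Sleep(12 * time.Second) // let a couple of ticks fire, then shut down
        close(stop)
    }

The point is that an error only skips the current iteration: the loop, the process, and the webhook all stay up.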

Member

I see. We definitely don't want this to impact the webhook. So what if it continues to fail to restart the watch manager?

maxsmythe (Contributor, Author)

Prometheus alerts. We could export metrics for restart failures. Users should also monitor the status fields of constraints/templates to make sure they are operating properly.
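A rough sketch of what exporting such a metric could look like with the Prometheus Go client; the metric name, help text, and port are hypothetical, not existing gatekeeper metrics.

    package main

    import (
        "log"
        "net/http"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    // watchManagerRestartFailures is a hypothetical counter; the name and help
    // text are illustrative only.
    var watchManagerRestartFailures = prometheus.NewCounter(prometheus.CounterOpts{
        Name: "watch_manager_restart_failures_total",
        Help: "Number of times the watch manager failed to restart its watches.",
    })

    func init() {
        prometheus.MustRegister(watchManagerRestartFailures)
    }

    // recordRestartFailure would be called from the update loop whenever a watch
    // restart returns an error; operators can then alert on the counter.
    func recordRestartFailure(err error) {
        log.Printf("watch manager restart failed: %v", err)
        watchManagerRestartFailures.Inc()
    }

    func main() {
        // Expose the metric for Prometheus to scrape.
        http.Handle("/metrics", promhttp.Handler())
        log.Fatal(http.ListenAndServe(":8888", nil))
    }

An operator could then alert on any increase in the counter, alongside watching the status fields of constraints and templates.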

Member

We should make sure to test this when we add liveness probes. WDYT?

maxsmythe (Contributor, Author)

Likely not on a continual basis, as failing liveness probes force the server to reboot, which is effectively the same as exit-on-failure.

We can test the initial state on startup indirectly by validating cache warming (sketched below):

  1. List all constraints/templates/cached resources on startup
  2. Do not report as healthy until we validate those resources have been handled
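A hedged sketch of what that startup validation could look like as a readiness check; the readinessTracker type, the resource keys, and the /readyz endpoint are all hypothetical, not part of gatekeeper.

    package main

    import (
        "net/http"
        "sync"
    )

    // readinessTracker records every constraint/template/cached resource listed at
    // startup as "expected" and reports ready only once all of them are handled.
    type readinessTracker struct {
        mu       sync.Mutex
        expected map[string]bool
    }

    func newReadinessTracker(keys []string) *readinessTracker {
        t := &readinessTracker{expected: make(map[string]bool)}
        for _, k := range keys {
            t.expected[k] = true
        }
        return t
    }

    // Observe marks a resource as handled (e.g. its template compiled or its data synced).
    func (t *readinessTracker) Observe(key string) {
        t.mu.Lock()
        defer t.mu.Unlock()
        delete(t.expected, key)
    }

    // Ready reports whether everything listed at startup has been handled.
    func (t *readinessTracker) Ready() bool {
        t.mu.Lock()
        defer t.mu.Unlock()
        return len(t.expected) == 0
    }

    // ServeHTTP lets the tracker back a /readyz endpoint for a readiness probe.
    func (t *readinessTracker) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
        if t.Ready() {
            w.WriteHeader(http.StatusOK)
            return
        }
        w.WriteHeader(http.StatusServiceUnavailable)
    }

    func main() {
        // In real use the keys would come from listing constraints/templates at
        // startup; these two are placeholders.
        tracker := newReadinessTracker([]string{"templates/k8srequiredlabels", "constraints/ns-must-have-gk"})
        http.Handle("/readyz", tracker)
        _ = http.ListenAndServe(":9090", nil)
    }

With a readiness probe pointed at the /readyz endpoint, the pod would not be considered ready until everything listed at startup has been handled.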

Member

Sounds good. @sozercan ^

Fixes open-policy-agent#295

Signed-off-by: Max Smythe <smythe@google.com>
Signed-off-by: Max Smythe <smythe@google.com>
Signed-off-by: Max Smythe <smythe@google.com>
maxsmythe merged commit d07c4bb into open-policy-agent:master on Dec 12, 2019
maxsmythe deleted the sync-mgr-cleanup branch on December 12, 2019 at 22:27
Development

Successfully merging this pull request may close these issues.

WatchManager should implement manager.Runnable interface
2 participants