Readiness tracker collect invalid expects #750

brycecr · 2020-07-24T20:49:20Z

What this PR does / why we need it:
Adds a goroutine that polls for objects that the readiness tracker still expects but that have since been deleted. Without this there is a race: an object can remain expected by the readiness tracker after it has been deleted, because the controller that watches the resource missed the delete. The controller uses a cached client, whereas the readiness tracker uses an uncached one.

Which issue(s) this PR fixes (optional, using fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when the PR gets merged):
Fixes #686

Special notes for your reviewer:

brycecr · 2020-07-24T20:53:18Z

@shomron @maxsmythe don't believe I'm permitted to add you as reviewers, but PTAL at the PR when you have the time! Thank you!

shomron

Thanks for submitting this @brycecr! This is looking like a big improvement!
I've submitted early comments. There are a few additional cases I want to ponder (the relationship between a cancelled CT and expected constraints), but it might take until Monday.

shomron · 2020-07-24T23:06:42Z

pkg/readiness/ready_tracker.go

+	if es == nil {
+		return fmt.Errorf("nil Expectations provided to collectForObjectTracker")
+	} else if !es.Populated() || es.Satisfied() {
+		log.Info("Expectations unpopulated or already satisfied, skipping collection")


We might get this log every 500ms?

Also nit: readability can be improved somewhat by removing the else. ref: https://github.com/golang/go/wiki/CodeReviewComments#indent-error-flow

brycecr · 2020-07-28T01:47:19Z

If a tracker is satisfied but overall readiness is not, then yes, you'd get this message every tick. That seems a little spammy, so I moved it to debug. Does that seem reasonable?

BTW, the timing was just something I chose that seemed roughly right. Do you think it should be different, or configurable?

Removed else, thanks.

shomron · 2020-07-28

Nice, thanks! I don't think the delay or frequency needs to be configurable; I don't see users playing with such intricate knobs.
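The two changes agreed in this thread (dropping the `else` and demoting the log line to debug) might look roughly like the sketch below. The `Expectations` interface, `shouldCollect`, and the `debugf` callback are assumptions made for illustration; the real function uses the project's own logger and types.

```go
package main

import (
	"errors"
	"fmt"
)

// Expectations is a hypothetical interface with the two methods used in the quoted hunk.
type Expectations interface {
	Populated() bool
	Satisfied() bool
}

var errNilExpectations = errors.New("nil Expectations provided to collectForObjectTracker")

// shouldCollect reports whether a collection pass is worthwhile.
// Each guard returns early, so no else branch is needed.
func shouldCollect(es Expectations, debugf func(msg string)) (bool, error) {
	if es == nil {
		return false, errNilExpectations
	}
	if !es.Populated() || es.Satisfied() {
		// Logged at debug level because this can fire on every tick.
		debugf("Expectations unpopulated or already satisfied, skipping collection")
		return false, nil
	}
	return true, nil
}

// fakeExpectations is a test double for the sketch.
type fakeExpectations struct{ populated, satisfied bool }

func (f fakeExpectations) Populated() bool { return f.populated }
func (f fakeExpectations) Satisfied() bool { return f.satisfied }

func main() {
	debugf := func(msg string) { fmt.Println("DEBUG:", msg) }

	ok, err := shouldCollect(fakeExpectations{populated: true}, debugf)
	fmt.Println(ok, err)

	ok, err = shouldCollect(fakeExpectations{populated: true, satisfied: true}, debugf)
	fmt.Println(ok, err)
}
```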

maxsmythe

Thanks for writing this!

This mostly LGTM other than comments, but waiting for the exchange with shomron@ to finish before I hit the approval button.

shomron · 2020-07-27T18:45:45Z

Hey @brycecr, circling back: my concerns are around not cancelling subordinate trackers.
If you look at Tracker.CancelTemplate, when a parent template is cancelled it also cancels the corresponding trackConstraints goroutine (if any) and removes the entire corresponding constraint objectTracker.

I think the current implementation does not do this, so we could leave constraint expectations behind while the constraintTemplate controller might never see the parent and therefore never cancel it.
Perhaps we could pass a cleanup function to collectForObjectTracker, allowing custom cancellation logic in the final loop instead of only supporting ot.CancelExpect(u)?

What do you think?
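As an illustration of the cleanup-function suggestion above, here is a small sketch under stated assumptions. The `objectTracker` type, `collectDeleted`, and `cleanupFunc` are simplified, hypothetical stand-ins; only the idea of calling a caller-supplied hook after `CancelExpect` is taken from the comment.

```go
package main

import "fmt"

// objectTracker is a hypothetical, minimal tracker keyed by object name.
type objectTracker struct{ expected map[string]bool }

// CancelExpect stops expecting the given key.
func (ot *objectTracker) CancelExpect(key string) { delete(ot.expected, key) }

// cleanupFunc runs after an expectation is cancelled for a deleted object.
type cleanupFunc func(key string)

// collectDeleted cancels expectations for keys that exists reports as gone,
// then calls cleanup (if non-nil) so the caller can release dependent state.
func collectDeleted(ot *objectTracker, exists func(string) bool, cleanup cleanupFunc) {
	for key := range ot.expected { // deleting during range is safe in Go
		if exists(key) {
			continue
		}
		ot.CancelExpect(key)
		if cleanup != nil {
			cleanup(key)
		}
	}
}

func main() {
	templates := &objectTracker{expected: map[string]bool{
		"k8srequiredlabels": true,
		"k8sallowedrepos":   true,
	}}
	// Child state per template: a constraint tracker and a cancel func for its lister goroutine.
	constraints := map[string]*objectTracker{
		"k8srequiredlabels": {expected: map[string]bool{"must-have-owner": true}},
	}
	cancels := map[string]func(){
		"k8srequiredlabels": func() { fmt.Println("stopped constraint lister") },
	}

	exists := func(key string) bool { return key == "k8sallowedrepos" }

	collectDeleted(templates, exists, func(key string) {
		if cancel, ok := cancels[key]; ok {
			cancel()
			delete(cancels, key)
		}
		delete(constraints, key)
	})
	fmt.Println(len(templates.expected), len(constraints))
}
```

Passing `nil` as the cleanup keeps the simpler behaviour for trackers that have no subordinate state.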

brycecr

Thank you for the reviews!

With respect to the cleanup, it seems like a good idea and cleaner to clean these up where possible. On the constraintTrackers, though, I'm not sure how they would be affected by the cancellation here as currently written, because we don't "collect" from trackers that are not populated, and constraintTracker goroutines exit as soon as they mark themselves populated. Am I reading that right?

W.r.t. removing the tracker entry from the trackerMap, that seems like good cleanup. Just out of curiosity, does the tracker stay in memory after satisfaction because it's still reachable from the root manager? And that would be why trackers still in the tracker map aren't collected?

A final small point -- it looks like CancelTemplate (which is the only place I see constraintTrackers cancelled or t.constraints.Remove called) is only called if the constraint is later deleted or fails to compile. In other words, is it true that we should be calling t.constraints.Remove if we have Observed the template, but we aren't doing that currently?

shomron · 2020-07-28T15:50:39Z

> Thank you for the reviews!
>
> With respect to the cleanup, it seems like a good idea and cleaner to clean these up where possible. On the constraintTrackers, though, I'm not sure how they would be affected by the cancellation here as currently written, because we don't "collect" from trackers that are not populated, and constraintTracker goroutines exit as soon as they mark themselves populated. Am I reading that right?

We're using slightly different terminology, but I'll try my best to answer. I think you are correct - overall readiness will not be affected by leaving behind unpopulated constraint trackers. This is regardless of whether those child trackers are fully populated - the Tracker.Satisfied() check filters on parent keys before considering the child trackers.

That said, if the timing is just right, I believe a trackConstraints routine may never complete: if its CRD is unregistered, the retryLister will fail and continue its retry/backoff loop unless explicitly cancelled.

> W.r.t. removing the tracker entry from the trackerMap, that seems like good cleanup. Just out of curiosity, does the tracker stay in memory after satisfaction because it's still reachable from the root manager? And that would be why trackers still in the tracker map aren't collected?

Not sure I understood this part - the trackers generally remain in their trackerMap unless explicitly cancelled/removed, even after they are satisfied. They do internally free up some memory, but the actual objectTracker instance remains.

> A final small point -- it looks like CancelTemplate (which is the only place I see constraintTrackers cancelled or t.constraints.Remove called) is only called if the constraint is later deleted or fails to compile. In other words, is it true that we should be calling t.constraints.Remove if we have Observed the template, but we aren't doing that currently?

Hmm, I don't think so. Observing a template doesn't mean you have observed its corresponding constraint instances.

There's no real need to check the number of resources tracked, only that we are tracking and deleting something. This makes the test less sensitive to changes in the testdata and leaked state from different tests. Also fix a couple log.Error calls.
Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>
brycecr · 2020-07-28T22:07:25Z

Thanks, Oren. I added a cleanup parameter to collectForObjectTracker. LMK if that makes sense to you as written.

shomron · 2020-07-29T13:57:53Z

Awesome, LGTM. Thanks for working on this @brycecr !

maxsmythe

LGTM

brycecr force-pushed the blindness-patch branch from 45d8b36 to 7124cf6 Compare July 24, 2020 20:51

brycecr force-pushed the blindness-patch branch 2 times, most recently from e85e64e to 31785b7 Compare July 24, 2020 22:44

brycecr changed the title ~~Readiness tracker invalid~~ Readiness tracker collect invalid expects Jul 24, 2020

shomron reviewed Jul 24, 2020

maxsmythe reviewed Jul 25, 2020

brycecr commented Jul 28, 2020

brycecr force-pushed the blindness-patch branch from 93842d9 to ff8d084 Compare July 28, 2020 06:18

brycecr added 6 commits July 28, 2020 17:09

Collect Expectations for deleted obects in readiness tracker

d979421

Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>

Add test for collecting deleted objects in readiness tracker

b2daaa2

Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>

Fix linting errors

d3e0d95

(misspellings, unneceesary guard around delete) Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>

address nits, readability, and restructure initial go func

581929e

Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>

add retry backoff wrapper for lister

03d4cbf

Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>

brycecr force-pushed the blindness-patch branch 2 times, most recently from f97001f to f4c19ce Compare July 28, 2020 17:11

brycecr force-pushed the blindness-patch branch from 34a04c1 to b7da74c Compare July 28, 2020 22:07

brycecr added 3 commits July 28, 2020 22:08

go mod tidy

99c0368

Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>

change retry to retryLister

3036da3

Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>

add cleanup func param to collectForObjectTracker

4de5072

Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>

brycecr force-pushed the blindness-patch branch from b7da74c to 4de5072 Compare July 28, 2020 22:08

brycecr added 2 commits July 28, 2020 22:49

rerun flaky tests

34a4285

Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>

rerun flaky tests

2c3ef6c

Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>

maxsmythe approved these changes Jul 29, 2020

maxsmythe merged commit 0afb7e5 into open-policy-agent:master Jul 29, 2020

brycecr deleted the blindness-patch branch July 30, 2020 01:24
