Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Readiness tracker collect invalid expects #750

Merged
merged 11 commits into from Jul 29, 2020

Conversation

brycecr
Copy link
Contributor

@brycecr brycecr commented Jul 24, 2020

What this PR does / why we need it:
Adds a goroutine that polls for objects the readiness tracker expects, but have been deleted. Without this, there is a race where an object can be expected by the readiness tracker but has been deleted, and the controller that watches for the resource missed the delete because it is using a cached client instead of the readiness tracker's uncached client.

Which issue(s) this PR fixes (optional, using fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when the PR gets merged):
Fixes #686

Special notes for your reviewer:

@brycecr
Copy link
Contributor Author

brycecr commented Jul 24, 2020

@shomron @maxsmythe don't believe I'm permitted to add you as reviewers, but PTAL at the PR when you have the time! Thank you!

@brycecr brycecr force-pushed the blindness-patch branch 2 times, most recently from e85e64e to 31785b7 Compare July 24, 2020 22:44
@brycecr brycecr changed the title Readiness tracker invalid Readiness tracker collect invalid expects Jul 24, 2020
Copy link
Contributor

@shomron shomron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting this @brycecr! This is looking like a big improvement!
I've submitted early comments, there's a few additional cases I want to ponder (relationship between a cancelled CT and expected constraints) but it might take until Monday.

pkg/readiness/ready_tracker.go Show resolved Hide resolved
if es == nil {
return fmt.Errorf("nil Expectations provided to collectForObjectTracker")
} else if !es.Populated() || es.Satisfied() {
log.Info("Expectations unpopulated or already satisfied, skipping collection")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might get this log every 500ms?

Also nit: readability can be improved somewhat by removing the else. ref: https://github.com/golang/go/wiki/CodeReviewComments#indent-error-flow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a tracker is satisfied, but not the overall readiness, then yes you'd get this message every tick. It seems a little spammy. I moved to debug -- does that seem reasonable?

BTW the timing was just something I chose that seemed roughly right. Do you think it maybe should be different or configurable?

Removed else, thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks! I don't think the delay or frequency need to be configurable - I don't see users playing with such intricate knobs.

pkg/readiness/ready_tracker.go Outdated Show resolved Hide resolved
pkg/readiness/ready_tracker.go Show resolved Hide resolved
pkg/readiness/ready_tracker.go Outdated Show resolved Hide resolved
Copy link
Contributor

@maxsmythe maxsmythe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this!

This mostly LGTM other than comments, but waiting for the exchange with shomron@ to finish before I hit the approval button.

pkg/readiness/ready_tracker.go Show resolved Hide resolved
pkg/readiness/ready_tracker_test.go Outdated Show resolved Hide resolved
@shomron
Copy link
Contributor

shomron commented Jul 27, 2020

Hey @brycecr, so circling back - my concerns are around not cancelling subordinate trackers.
If you look at Tracker.CancelTemplate- when a parent template is cancelled it takes the additional actions of cancelling the corresponding trackConstraints goroutine (if any) as well as removing the entire corresponding constraint objectTracker.

I think in the current implementation this is not happening, and we could potentially leave behind constraint expectations while the constraintTemplate controller might never see the parent to cancel it.
Perhaps we could pass a cleanup function to collectForObjectTracker to allow custom cancellation logic in the final loop instead of only supporting ot.CancelExpect(u)?

What do you think?

Copy link
Contributor Author

@brycecr brycecr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the reviews!

With respect to the cleanup, it seems like a good idea and cleaner to clean these up where possible. On the constraintTrackers, though, I'm not sure how they would be affected by the cancellation here as currently written, because we don't "collect" from trackers that are not populated, and constraintTracker goroutines exit as soon as they mark themselves populated. Am I reading that right?

W.r.t. removing the tracker entry from the trackerMap, that seems like good cleanup. Just out of curiosity, does the tracker stay in memory after satisfaction because it's still reachable from the root manager? And that would be why trackers still in the tracker map aren't collected?

A final small point -- it looks like CancelTemplate (which is the only place I see constraintTrackers cancelled or t.constraints.Remove called) is only called if the constraint is later deleted or fails to compile. In other words, is it true that we should be calling t.contraints.Remove if we have Observed the template, but we aren't doing that currently?

pkg/readiness/ready_tracker.go Show resolved Hide resolved
if es == nil {
return fmt.Errorf("nil Expectations provided to collectForObjectTracker")
} else if !es.Populated() || es.Satisfied() {
log.Info("Expectations unpopulated or already satisfied, skipping collection")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a tracker is satisfied, but not the overall readiness, then yes you'd get this message every tick. It seems a little spammy. I moved to debug -- does that seem reasonable?

BTW the timing was just something I chose that seemed roughly right. Do you think it maybe should be different or configurable?

Removed else, thanks.

pkg/readiness/ready_tracker.go Outdated Show resolved Hide resolved
pkg/readiness/ready_tracker.go Outdated Show resolved Hide resolved
pkg/readiness/ready_tracker_test.go Outdated Show resolved Hide resolved
pkg/readiness/ready_tracker.go Show resolved Hide resolved
@shomron
Copy link
Contributor

shomron commented Jul 28, 2020

Thank you for the reviews!

With respect to the cleanup, it seems like a good idea and cleaner to clean these up where possible. On the constraintTrackers, though, I'm not sure how they would be affected by the cancellation here as currently written, because we don't "collect" from trackers that are not populated, and constraintTracker goroutines exit as soon as they mark themselves populated. Am I reading that right?

We're using slightly different terminology, but I'll try my best to answer. I think you are correct - overall readiness will not be affected by leaving behind unpopulated constraint trackers. This is regardless of whether those child trackers are fully populated - the Tracker.Satisfied() check filters on parent keys before considering the child trackers.

That said, if timing is just right, I believe it's possible for a trackConstraints routine to never complete - if their CRD is unregistered the retryLister will fail and continue its retry/backoff loop unless explicitly cancelled.

W.r.t. removing the tracker entry from the trackerMap, that seems like good cleanup. Just out of curiosity, does the tracker stay in memory after satisfaction because it's still reachable from the root manager? And that would be why trackers still in the tracker map aren't collected?

Not sure I understood this part - the trackers generally remain in their trackerMap unless explicitly cancelled/removed, even after they are satisfied. They do internally free up some memory, but the actual objectTracker instance remains.

A final small point -- it looks like CancelTemplate (which is the only place I see constraintTrackers cancelled or t.constraints.Remove called) is only called if the constraint is later deleted or fails to compile. In other words, is it true that we should be calling t.contraints.Remove if we have Observed the template, but we aren't doing that currently?

Hmm, I don't think so. Observing a template doesn't mean you have observed its corresponding constraint instances.

Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>
Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>
(misspellings, unneceesary guard around delete)

Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>
There's no real need to check the number of resources tracked,
only that we are tracking and deleting something. This makes the
test less sensitive to changes in the testdata and leaked state
from different tests.

Also fix a couple log.Error calls

Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>
Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>
Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>
@brycecr brycecr force-pushed the blindness-patch branch 2 times, most recently from f97001f to f4c19ce Compare July 28, 2020 17:11
@brycecr
Copy link
Contributor Author

brycecr commented Jul 28, 2020

Thanks, Oren. I added a cleanup parameter to collectForObjectTracker. LMK if that makes sense to you as written.

Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>
Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>
Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>
Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>
Signed-off-by: Bryce Cronkite-Ratcliff <brycecr@gmail.com>
@shomron
Copy link
Contributor

shomron commented Jul 29, 2020

Awesome, LGTM. Thanks for working on this @brycecr !

Copy link
Contributor

@maxsmythe maxsmythe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@maxsmythe maxsmythe merged commit 0afb7e5 into open-policy-agent:master Jul 29, 2020
@brycecr brycecr deleted the blindness-patch branch July 30, 2020 01:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Resources deleted between readiness tracker expectations and controllers starting may remain unresolved
3 participants