Implementation of the new v2 garbage collector #8621
Conversation
Commits:
- alpha note added
- update checksum
- fix comments
- checksum and validation
- Update config/core/configmaps/gc.yaml (Co-authored-by: Victor Agababov <vagababov@gmail.com>)
- fix comments
- fix unit test
- fix both check
- include unit test
- make new settings for v2
- v2 gc settings
- fix defaults
- default values
- fix unit test
- expand comment
- fix validation error
- Widen collector test timeouts and some cleansing. (knative#8607)
- Make sorting consistent in GC test. (knative#8606): One of the tests ("keep recent lastPinned") failed fairly often because it used two revisions with the same LastPinned timestamp. Sorting a slice is not stable, so the given order was frequently flipped and the GC attempted to delete revision 5555 even though it should never have been considered for deletion. Note: sort.SliceStable doesn't help because the incoming list's order is already inconsistent. (See the sketch after this list.)
- [master] Auto-update dependencies (knative#8611): Produced via `./hack/update-deps.sh --upgrade && ./hack/update-codegen.sh`.
- Add route consistency retry to new logging E2E test. (knative#8324)
- Handle InitialScale != 1 when creating multiScaler and some tidy-ups (knative#8612): Now that InitialScale has landed, take it into account when calculating the initial EBC in the multiscaler. Also: tidy up some comments; fix up some test error messages; move mutexes above the things they guard; use %d and %0.3f for printing numbers, consistent with the existing Debug statement at the start of the same method.
- Enable HA by default (knative#8602): Centralize the bucket/replica information for running the HA testing, and add a test to ensure the count of the controller's reconcilers stays in sync with the actual controller binary.
- Remove endpoints from KPA as well. (knative#8616): Today I noticed that I missed the endpoints code in the KPA itself when removing that informer, so now really get rid of it. The tests are of course nightmarish :) Also removes some old cruft.
- Assorted fixes to enable chaos duck. (knative#8619): This lands a handful of fixes uncovered while preparing to run control-plane chaos testing during our e2e tests. Also drops the sleep to 10s.
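On the sort-consistency fix (knative#8606) above: Go's `sort.Slice` makes no ordering guarantee between elements that compare equal, so the usual remedy is a deterministic tiebreaker. A minimal sketch with illustrative field names, not the actual test types:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// rev stands in for the revision records the GC test sorts; the real
// type lives in the serving codebase.
type rev struct {
	Name       string
	LastPinned time.Time
}

func main() {
	now := time.Now()
	revs := []rev{
		{Name: "5555", LastPinned: now},
		{Name: "4444", LastPinned: now}, // identical timestamps: order is undefined without a tiebreaker
	}
	sort.Slice(revs, func(i, j int) bool {
		if !revs[i].LastPinned.Equal(revs[j].LastPinned) {
			return revs[i].LastPinned.Before(revs[j].LastPinned)
		}
		// Break ties deterministically, e.g. by name, so revisions with
		// equal LastPinned timestamps always sort the same way.
		return revs[i].Name < revs[j].Name
	})
	for _, r := range revs {
		fmt.Println(r.Name)
	}
}
```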
@whaught: 0 warnings.
In response to this:
Fixes #
Proposed Changes
Depends on
Need to move: Preserve Annotation to Prevent GC #8614 into isRevisionActive
Then fix unit tests
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
pkg/reconciler/gc/v2/gc.go
Outdated
continue
}

if err := client.ServingV1().Revisions(rev.Namespace).Delete(
It would be nice if we could use deletecollection 😢
https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#set-based-requirement
That would be neat for batch delete (in practice this isn't likely to actually delete more than one revision, since GC runs on every deploy), but I think you'd need a set-based selector on metadata.name.
Are set selectors supported for field selectors (rather than labels)? There's no mention of it here: https://kubernetes.io/docs/concepts/overview/working-with-objects/field-selectors
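Per those docs, field selectors only support the equality-style operators =, ==, and !=, so a set-based requirement on metadata.name isn't possible; the closest a single call gets is excluding one name. A minimal sketch, with illustrative names (listAllExcept, keepName) and assuming a client-go v0.18+ style signature:

```go
package gc

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	v1 "knative.dev/serving/pkg/apis/serving/v1"
	clientset "knative.dev/serving/pkg/client/clientset/versioned"
)

// listAllExcept illustrates the field-selector limitation: only =, ==,
// and != are supported, so one call can exclude at most one name.
// This helper and keepName are illustrative, not part of the PR.
func listAllExcept(ctx context.Context, client clientset.Interface, ns, keepName string) (*v1.RevisionList, error) {
	return client.ServingV1().Revisions(ns).List(ctx, metav1.ListOptions{
		FieldSelector: "metadata.name!=" + keepName,
	})
}
```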
We could probably set a label on the things we want to collect and then delete by that label, but it's probably not worth it.
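For reference, such a label-based batch delete could look roughly like the sketch below. The serving.knative.dev/gc-candidate label is hypothetical (the reconciler would first have to stamp collectable revisions with it), and the exact signatures depend on the client-go version; this is not what the PR implements:

```go
package gc

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clientset "knative.dev/serving/pkg/client/clientset/versioned"
)

// deleteGCCandidates sketches deletecollection with a label selector.
// The gc-candidate label is hypothetical.
func deleteGCCandidates(ctx context.Context, client clientset.Interface, ns string) error {
	return client.ServingV1().Revisions(ns).DeleteCollection(ctx,
		metav1.DeleteOptions{},
		metav1.ListOptions{LabelSelector: "serving.knative.dev/gc-candidate=true"},
	)
}
```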
/lgtm
}

if stale > min {
	logger.Info("Deleting stale revision: ", rev.ObjectMeta.Name)
I wonder if we should tweak the names of the MinNonActiveRevisions and MaxNonActiveRevisions config options for more clarity.
Lines 58 to 63 in d86ab4a
// Minimum number of stale revisions to keep before considering for GC.
MinNonActiveRevisions int64
// Maximum number of non-active revisions to keep before considering for GC,
// regardless of creation or staleness time-bounds.
// Set Disabled (-1) to disable/ignore max.
MaxNonActiveRevisions int64
Since depending on how the revisions are processed you'd get different GC results.
For example (with min 20 & max 1000):
A list composed of 20 stale revisions followed by 100 non-active ones would cause all of the non-active revisions to be deleted,
compared to
a list composed of 100 non-active revisions followed by 20 stale ones, which would delete nothing.
Unsure what we want.
I think I've documented that better here:
serving/config/core/configmaps/gc.yaml
Line 59 in da8105f
# ---------------------------------------
and need to carry that across to the golang comment. Note that all stale revisions are also non-active. In your example I think we'd expect 20 to be deleted and the remaining to stick around until they also go stale. However if you disable the time bounds for staleness, we keep growing the revision list to the max. That allows you to either have a hard-count of how many we keep around or a less deterministic time-based approach that can still have a ceiling. Or disable everything!
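To make those semantics concrete, here is a minimal sketch (simplified names, assuming the non-active revisions are already sorted oldest-first by last active time; not the actual gc.go code):

```go
package gc

// revision is a stand-in for the real Revision type.
type revision struct{ Name string }

// collect sketches the min/max semantics described above: always keep the
// newest `min` non-active revisions, always delete beyond `max` (unless
// max is -1, i.e. disabled), and delete stale revisions in between.
func collect(revs []revision, min, max int64, isStale func(revision) bool) []revision {
	var toDelete []revision
	total := int64(len(revs))
	for i, r := range revs {
		remaining := total - int64(i) // r plus everything newer than it
		switch {
		case remaining <= min:
			return toDelete // the newest `min` are always kept
		case max != -1 && remaining > max:
			toDelete = append(toDelete, r) // over the hard ceiling, deleted regardless of staleness
		case isStale(r):
			toDelete = append(toDelete, r) // between min and max: deleted only if stale
		}
	}
	return toDelete
}
```

With min 20 and staleness disabled, the list simply grows to the max ceiling; with staleness enabled, anything older than the newest 20 is deleted once stale, which matches the two scenarios discussed above.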
I think I misunderstood what you were saying. The former scenario would be unexpected because the revisions are first sorted by their last active time, but we do want the latter scenario to delete the 20 stale revisions.
I've updated the code to consider the latter scenario regardless of the sort assumption. I think it's a little more readable too.
The following jobs failed:
Automatically retrying due to test flakiness...
/lgtm
I think there's an improvement to be made to reduce the memory allocations, but it doesn't need to block this PR.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dprotaso, whaught
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Issue #8208
Proposed Changes
Release Note