Implementation of the new v2 garbage collector #8621

whaught · 2020-07-14T04:00:47Z

Proposed Changes

New GC algorithm
- Takes into account a maximum limit of revisions
- Allows time-based GC to be disabled

Release Note

Implemented new garbage collector that allows for either time-based
or min/max count bounds for automatic deletion of old revisions.

alpha note added update checksum fix comments checksum and validation Update config/core/configmaps/gc.yaml Co-authored-by: Victor Agababov <vagababov@gmail.com> fix comments fix unit test fix both check include unit test make new settings for v2 v2 gc settings fix defaults default values fix unit test. expand comment fix validation error

Widen collector test timeouts and some cleansing. (knative#8607)

Make sorting consistent in GC test. (knative#8606)
One of the tests ("keep recent lastPinned") failed fairly often because it used two revisions with the same LastPinned timestamp. Sorting a slice is unstable so in a lot of occasions, the given order was flipped and revision 5555 was tried to be deleted, even though it shouldn't even have been considered to be deleted. Note: sort.SliceStable doesn't help because the incoming list's order is already non-consistent.

[master] Auto-update dependencies (knative#8611)
Produced via: `./hack/update-deps.sh --upgrade && ./hack/update-codegen.sh`
/assign dprotaso vagababov
/cc dprotaso vagababov

Add route consistency retry to new logging E2E test. (knative#8324)

Handle InitialScale != 1 when creating multiScaler and some tidy-ups (knative#8612)
Now InitialScale has landed, take it into account when calculating initial EBC in multiscaler. Also:
- Tidy up some comments.
- Fix up some test error messages.
- Move mutexes above the things they're guarding.
- Use %d and %0.3f for printing numbers, consistently with existing Debug statement at start of same method.

Enable HA by default (knative#8602)
* Enable HA by default
* Centralize the bucket/replica information for running the HA testing. Add a test to ensure the count of the controller's reconciler stays in sync with the actual controller binary.

Remove endpoints from KPA as well. (knative#8616)
* Remove endpoints from KPA as well. Today I noticed that I missed the endpoints code in the KPA itself, when removing that informer. So now really get rid of those. The tests are of course nightmarish :)
* remove old cruft

Assorted fixes to enable chaos duck. (knative#8619)
* Assorted fixes to enable chaos duck. This lands a handful of fixes that I uncovered preparing to run controlplane chaos testing during our e2e tests.
* Drop sleep to 10s

knative-prow-robot

@whaught: 0 warnings.

In response to this:

Fixes #

Proposed Changes

Depends on

Populate the routingState label and modified annotation in the Labeler #8604

Create v2 GC settings #8615

Need to move: Preserve Annotation to Prevent GC #8614 into isRevisionActive

Then fix unit tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

pkg/reconciler/gc/v2/gc.go

vagababov · 2020-07-19T22:27:50Z

pkg/reconciler/gc/v2/gc.go

-				continue
-			}
+
+		if err := client.ServingV1().Revisions(rev.Namespace).Delete(


It would be nice if we could use deletecollection 😢

https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#set-based-requirement

That would be neat for batch delete (in practice this isn't likely to actually delete more than one since it runs on every deploy), but I think you'd need to do a set selector on metadata.name.

Are set selectors supported for field selectors (rather than labels)? There's no mention of it here: https://kubernetes.io/docs/concepts/overview/working-with-objects/field-selectors

We could probably set a label on the revisions we want to collect and then delete by that label, but it's probably not worth it.
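
For reference, Kubernetes field selectors support only the `=`, `==` and `!=` operators, so a set-based match on `metadata.name` is not available; set-based requirements such as `in` exist only for label selectors. The sketch below shows what the label-based idea from this thread could look like at the selector level. The label key `example.dev/revision-name` and the helper `inSelector` are hypothetical and not part of this PR; the resulting string is in the set-based syntax that a label selector (for example the `LabelSelector` field of `metav1.ListOptions`) accepts.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// inSelector builds a set-based label selector of the form
// "key in (v1,v2,...)". Values are sorted so the output is deterministic.
// It performs no validation of the key or values.
func inSelector(key string, values []string) string {
	sorted := append([]string(nil), values...)
	sort.Strings(sorted)
	return fmt.Sprintf("%s in (%s)", key, strings.Join(sorted, ","))
}

func main() {
	// Hypothetical label key; revisions would need to carry this label.
	sel := inSelector("example.dev/revision-name", []string{"rev-2", "rev-1"})
	fmt.Println(sel) // example.dev/revision-name in (rev-1,rev-2)
}
```

As noted in the thread, this only pays off if more than one revision is deleted per run, which is unlikely here.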

pkg/reconciler/gc/v2/gc.go

vagababov

/lgtm

dprotaso · 2020-07-20T18:19:15Z

pkg/reconciler/gc/v2/gc.go

+		}
+
+		if stale > min {
+			logger.Info("Deleting stale revision: ", rev.ObjectMeta.Name)


I wonder if we should tweak the name of MinNonActiveRevisions and MaxNonActiveRevisions config options for more clarity

serving/pkg/gc/config.go

Lines 58 to 63 in d86ab4a

// Minimum number of stale revisions to keep before considering for GC.
MinNonActiveRevisions int64
// Maximum number of non-active revisions to keep before considering for GC.
// regardless of creation or staleness time-bounds.
// Set Disabled (-1) to disable/ignore max.
MaxNonActiveRevisions int64

Depending on how the revisions are processed, you'd get different GC results.

For example (with min 20 and max 1000):
- A list composed of 20 stale revisions and then 100 non-active revisions would cause all non-active revisions to be deleted.
- A list composed of 100 non-active revisions and then 20 stale revisions would delete nothing.

Unsure what we want

I think I've documented that better here:

serving/config/core/configmaps/gc.yaml

Line 59 in da8105f

# ---------------------------------------

and need to carry that across to the golang comment. Note that all stale revisions are also non-active. In your example I think we'd expect 20 to be deleted and the remaining to stick around until they also go stale. However if you disable the time bounds for staleness, we keep growing the revision list to the max. That allows you to either have a hard-count of how many we keep around or a less deterministic time-based approach that can still have a ceiling. Or disable everything!

I think I misunderstood what you were saying. The former scenario would be unexpected because the revisions are first sorted by their last active time, but we do want the latter scenario to delete the 20 stale revisions.

I've updated the code to consider the latter scenario regardless of the sort assumption. I think it's a little more readable too.

knative-metrics-robot · 2020-07-21T06:52:36Z

The following is the coverage report on the affected files.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/gc/v2/gc.go	97.6%	98.2%	0.7

knative-test-reporter-robot · 2020-07-21T07:43:05Z

The following jobs failed:

Test name	Triggers	Retries
pull-knative-serving-integration-tests	2020-07-21 07:43:00.364 +0000 UTC	1/3

Automatically retrying due to test flakiness...
/test pull-knative-serving-integration-tests

dprotaso · 2020-07-21T15:13:53Z

/lgtm
/approve

I think there's an improvement to be made to reduce the memory allocations, but it doesn't need to block this PR.

knative-prow-robot · 2020-07-21T15:14:06Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprotaso, whaught

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/gc/OWNERS~~ [dprotaso]
~~pkg/reconciler/OWNERS~~ [dprotaso]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

whaught added 5 commits July 13, 2020 11:18

Create a maximum revision GC setting

6cd4f07

Use the new GC settings

c6fc366

Merge remote-tracking branch 'upstream/master' into use-gc-settings

d09b112

actually use the new settings

0a65256

knative-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 14, 2020

googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Jul 14, 2020

knative-prow-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 14, 2020

knative-prow-robot reviewed Jul 14, 2020

View reviewed changes

knative-prow-robot requested review from markusthoemmes, tcnghia and vagababov July 14, 2020 04:01

knative-prow-robot added the area/API API objects and controllers label Jul 14, 2020

whaught added 2 commits July 13, 2020 21:25

simplify

aab20cf

now it compiles

2a7afbb

whaught mentioned this pull request Jul 14, 2020

Preserve Annotation to Prevent GC #8614

Merged

whaught added 4 commits July 14, 2020 13:52

Merge remote-tracking branch 'upstream/master' into use-gc-settings

3aebf24

fix config references

16d6668

test with min settings

562512a

max tests

639a4fc

knative-prow-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 15, 2020

Merge remote-tracking branch 'upstream/master' into use-gc-settings

b2969c6

knative-prow-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 15, 2020

whaught added 3 commits July 14, 2020 20:59

change name of max to max-non-active

668dd00

comments and validate positive

80d7b90

parse forever constant

ab2ea1c

knative-prow-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 15, 2020

whaught added 2 commits July 14, 2020 23:33

fixing the unit test, better examples

6066241

include disabled setinel

e3343a2

whaught added 6 commits July 17, 2020 10:26

nits

65aad63

fix unit test names

de048b5

more consistent name

d7297ec

fix max boundary

e994502

nit

b1d34df

remove ref to lastpinned

5f1fc4c

knative-prow-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 19, 2020

vagababov reviewed Jul 19, 2020

View reviewed changes

Merge remote-tracking branch 'upstream/master' into use-gc-settings

220ca35

knative-prow-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 20, 2020

whaught added 2 commits July 19, 2020 21:54

review suggestions

0085ef8

fix sign

ec96b4b

vagababov reviewed Jul 20, 2020

View reviewed changes

knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 20, 2020

dprotaso reviewed Jul 20, 2020

View reviewed changes

use nonactive as min

4331669

knative-prow-robot removed the lgtm Indicates that a PR is ready to be merged. label Jul 21, 2020

whaught added 5 commits July 20, 2020 20:03

first filter out active

f29b422

comment nit on config

f0264d7

swap if/else ordering

5a7c70b

simplify

345e7ec

whitespace

0204045

knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 21, 2020

knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 21, 2020

knative-prow-robot merged commit c1a8eab into knative:master Jul 21, 2020
