Logarithmic timestamp comparison for downscaling #99212

damemi · 2021-02-18T20:12:46Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

Implements logarithmically-compared timestamps for replica scale downs, from #96898:

Compares ready and creation timestamps on a logarithmic scale. This allows for some level of randomness when Pods are quick-sorted to get downscaling candidates.

The base is 2. Roughly, this means that if Pod A has been created/running for less than half as long as Pod B, Pod A is downscaled first. If Pod A has been created/running for more than half as long as Pod B, the two are treated as equivalent and either may be downscaled first.
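To make the base-2 rule concrete, here is a minimal sketch of how an age could be mapped to a logarithmic bucket. The helper name `logRank` is hypothetical, and the sketch assumes the age is a `time.Duration` (nanoseconds) truncated after `log2`; it is an illustration, not the code in this PR.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// logRank is a hypothetical helper: it maps an age to an integer bucket by
// truncating log2 of the age in nanoseconds. Non-positive ages get -1 so
// that math.Log2 is never called on zero or a negative number.
func logRank(age time.Duration) int64 {
	if age <= 0 {
		return -1
	}
	return int64(math.Log2(float64(age)))
}

func main() {
	ages := []time.Duration{10 * time.Minute, 20 * time.Minute, 25 * time.Minute}
	for _, age := range ages {
		// Pods whose ranks are equal are treated as equally old.
		fmt.Printf("age=%v rank=%d\n", age, logRank(age))
	}
}
```

Under these assumptions the 10-minute Pod lands one bucket below the 20- and 25-minute Pods, which share a bucket. Bucket boundaries sit at powers of two, so the "half the time" rule is only approximate: two Pods of similar age that straddle a boundary still get different ranks.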

Which issue(s) this PR fixes:

Ref kubernetes/enhancements#2185 and #96748

Special notes for your reviewer:

This is a proposal that has very low overhead compared to #96748. Since behavior is not backwards compatible, we could release with a FeatureGate first. Then, from feedback, we can adjust the logarithmic base or, if we find out that the behavior might not be desired by everyone, we could make it a configuration option.

Does this PR introduce a user-facing change?

When downscaling ReplicaSets, ready and creation timestamps are compared on a logarithmic scale.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

damemi

/cc @alculquicondor

damemi · 2021-02-18T20:53:58Z

/retest

alculquicondor · 2021-02-18T23:11:10Z

You forgot to add the KEP to the description :)

alculquicondor

We are missing the feature gate

alculquicondor · 2021-02-18T23:13:59Z

pkg/controller/controller_utils.go

@@ -808,6 +809,9 @@ func (s ActivePods) Less(i, j int) bool {
 // 7. If the pods' creation times differ, the pod that was created more recently
 //    comes before the older pod.
 //
+// In 5 and 7, times are compared in a logarithmic scale. This allows a level
+// of randomness among equivalent Pods when sorting.
+//
 // If none of these rules matches, the second pod comes before the first pod.


What about the UUID comparison that was suggested in the KEP?

alculquicondor · 2021-02-18T23:19:25Z

test/integration/replicationcontroller/replicationcontroller_test.go

@@ -494,6 +495,45 @@ func TestSpecReplicasChange(t *testing.T) {
 	}
 }

+func TestLogarithmicScaleDown(t *testing.T) {


This test ensures that the existing behavior still roughly holds. That is good. We should run it with the feature gate both enabled and disabled (I would remove the Logarithmic part from the test name).

Another test we could have is rather the opposite: have all the pods run at roughly the same time, downscale, and see if some level of randomness holds. Do you think something like this is possible? Maybe we can run the same test X times and ensure that in at least one of them the removed pod was not the absolute youngest.

In the test plan we also suggested emulating the story https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/2185-random-pod-select-on-replicaset-downscale#test-plan
We can do that in a follow up.

> have all the pods run at roughly the same time, downscale, and see if some level of randomness holds ... Maybe we can run the same test X times

There is an inherent problem with randomness in that we can never be certain it will be the randomness we expect :) Something like this will introduce flakes, which can be minimized by increasing the number of test runs but are still always possible. Are there examples of other tests which take a similar approach?

What we really wish to show from such a test is that pods created on a similar logarithmic time scale have an equal chance of being chosen for downscaling. In other words, that their ranks are calculated properly. I believe the unit tests cover this level of detail sufficiently.

The integration test is simply showing that the basic logic still works when removed from the vacuum of unit tests. I think that the user story you linked will make a good e2e, and I could actually add that to this PR (I don't see any reason to wait, and that will flesh out test cases here more comfortably). Wdyt?

> I believe the unit tests cover this level of detail sufficiently.

I agree, but thanks for giving it a thought.

Still, modify the test to run for feature gate enabled and disabled.

Fine with me to add the e2e test for the user story in this PR.

damemi · 2021-03-03T17:18:45Z

@alculquicondor I added the UID sorting right before the call that actually sorts the pods, in 5ffacd0a98bb855bcbe067b2560a6f56c8775254. Let me know if that looks good. It should at least give a pseudo-random starting order before pods of similar ranks are sorted.

I am still working on how to add the e2e. I'm thinking these steps:

1. Create X pods
2. Wait 30 seconds (this should be enough that those pods will have a rank of 4)
3. Create Y more pods
4. Wait another 30 seconds (now the first pods should be ~60s old w/rank 6, and the new pods have rank 4)
5. Downscale by Y pods
6. Confirm that the remaining pods are all from the original group X

My goal is to create 2 groups of pods, spaced far enough apart that they will have different ranks. I use the 30s waits to reduce flakes caused by latency (in other words, the gap between the start of base-2 rank 2 and rank 3 is only 4 seconds, but between rank 5 and rank 6 it is 32 seconds).
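The rank arithmetic in this plan can be checked with a small sketch. It assumes ages are measured in whole seconds and that the rank is `floor(log2(age))`; the controller may work in a different unit, and `rankSeconds` and `bucket` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rankSeconds is a hypothetical helper returning floor(log2(s)) for an age
// given in whole seconds, or -1 when s is 0.
func rankSeconds(s uint64) int {
	return bits.Len64(s) - 1
}

// bucket returns the half-open age range [lo, hi), in seconds, that maps to
// rank r. The width of the range doubles with every rank.
func bucket(r uint) (lo, hi uint64) {
	return uint64(1) << r, uint64(1) << (r + 1)
}

func main() {
	for _, age := range []uint64{30, 60, 64} {
		fmt.Printf("age=%ds rank=%d\n", age, rankSeconds(age))
	}
	for _, r := range []uint{2, 5} {
		lo, hi := bucket(r)
		fmt.Printf("rank %d covers [%d, %d) seconds, width %d\n", r, lo, hi, hi-lo)
	}
}
```

Under these assumptions a 30-second pod has rank 4 and a 60-second pod has rank 5 (rank 6 begins at 64 seconds), so the two groups still fall into different buckets. The widening buckets are why longer waits leave more room for latency.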

What do you think of this approach? I want to avoid adding too much "scheduling" dependency into the test (i.e., assuming that pods will be evenly distributed among nodes), because that introduces flakes unless the nodes are evenly balanced, and I think it's irrelevant to this anyway.

alculquicondor · 2021-03-03T17:37:10Z

pkg/controller/replicaset/replica_set.go

@@ -802,6 +802,10 @@ func getPodsToDelete(filteredPods, relatedPods []*v1.Pod, diff int) []*v1.Pod {
 	// diff will always be <= len(filteredPods), so not need to handle > case.
 	if diff < len(filteredPods) {
 		podsWithRanks := getPodsRankedByRelatedPodsOnSameNode(filteredPods, relatedPods)
+		// First obtain a pseudo-random ordering by sorting by pod UID
+		sort.Slice(podsWithRanks.Pods, func(i, j int) bool {


Why not just add the comparison in the existing Less? I feel like that should perform better.

Should that come before or after the timestamp comparison?

it's the comparison criterion with the lowest priority, so after.

in that case I should also update logarithmicOrAfterZero to check if the 2 ranks are equal (and continue), rather than just returning... or I can add the UID check right into logarithmicOrAfterZero. Wdyt?

probably better to be explicit about UIDs by having that comparison outside of timestamp comparison.

alculquicondor · 2021-03-03T17:41:06Z

Re: E2E

I think we should do what was suggested in the KEP: simulate the scenario from the story and make sure spreading is good, with some allowance for skew.

alculquicondor

good for squash

pkg/controller/controller_utils.go

alculquicondor · 2021-03-05T15:47:40Z

/assign @kow3ns

soltysh

/approve
Some minor nits.

soltysh · 2021-03-05T18:33:20Z

pkg/controller/controller_utils.go

@@ -807,6 +808,10 @@ func (s ActivePods) Less(i, j int) bool {
 // 8. If the pods' creation times differ, the pod that was created more recently
 //    comes before the older pod.
 //
+// In 5 and 8, times are compared in a logarithmic scale. This allows a level


Suggested change

-// In 5 and 8, times are compared in a logarithmic scale. This allows a level
+// In 6 and 8, times are compared in a logarithmic scale. This allows a level

soltysh · 2021-03-05T18:34:34Z

pkg/controller/controller_utils.go

+		if rankDiff > 0 {
+			return false
+		}
+		return s.Pods[i].UID < s.Pods[j].UID


The logic is identical for both pieces, how about making this a function rather than repeating the code?

The idea is to keep the UID comparison in this method, for visibility. But a shorter alternative would be

	diff := logarithmicRankDiff(s.Pods[i].CreationTimestamp, s.Pods[j].CreationTimestamp, s.Now)
	if diff == 0 {
		return s.Pods[i].UID < s.Pods[j].UID
	}
	return diff < 0

Yeah, I did it my way to keep the UID comparison physically last since it has the lowest priority. But if this is cleaner, we can do that too
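The shorter form discussed in this thread can be exercised in isolation. The sketch below is only an illustration: `pod`, `byLogAge`, `logRank`, and `demo` are hypothetical stand-ins, not the types used in `controller_utils.go`, and only the creation-time rule with its UID tie-break is modeled.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// pod is a hypothetical stand-in carrying only the fields compared here.
type pod struct {
	UID     string
	Created time.Time
}

// logRank truncates log2 of the age in nanoseconds; non-positive ages get -1.
func logRank(age time.Duration) int64 {
	if age <= 0 {
		return -1
	}
	return int64(math.Log2(float64(age)))
}

// byLogAge orders pods in lower (younger) buckets first and falls back to
// the UID only when both pods share a bucket.
type byLogAge struct {
	pods []pod
	now  time.Time
}

func (s byLogAge) Len() int      { return len(s.pods) }
func (s byLogAge) Swap(i, j int) { s.pods[i], s.pods[j] = s.pods[j], s.pods[i] }
func (s byLogAge) Less(i, j int) bool {
	diff := logRank(s.now.Sub(s.pods[i].Created)) - logRank(s.now.Sub(s.pods[j].Created))
	if diff == 0 {
		return s.pods[i].UID < s.pods[j].UID
	}
	return diff < 0
}

// demo sorts three pods aged 25, 10 and 20 minutes and returns their UIDs
// in the resulting order.
func demo() []string {
	now := time.Date(2021, 3, 5, 12, 0, 0, 0, time.UTC)
	s := byLogAge{now: now, pods: []pod{
		{UID: "b", Created: now.Add(-25 * time.Minute)},
		{UID: "c", Created: now.Add(-10 * time.Minute)},
		{UID: "a", Created: now.Add(-20 * time.Minute)},
	}}
	sort.Sort(s)
	uids := make([]string, 0, len(s.pods))
	for _, p := range s.pods {
		uids = append(uids, p.UID)
	}
	return uids
}

func main() {
	fmt.Println(demo())
}
```

In this sketch the 10-minute pod sorts first because it is in a lower bucket, while the 20- and 25-minute pods share a bucket and are ordered by UID rather than by exact age.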

soltysh · 2021-03-05T18:36:52Z

pkg/controller/controller_utils.go

+		r2 = -1
+	} else {
+		r2 = int64(math.Log2(float64(d2)))
+	}


	r1 := int64(-1)
	r2 := int64(-1)
	if d1 > 0 {
		r1 = int64(math.Log2(float64(d1)))
	}
	if d2 > 0 {
		r2 = int64(math.Log2(float64(d2)))
	}

Seems simpler, no?
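For context, here is a sketch of how the simplified initialization might sit inside a complete rank-difference function. It uses plain `time.Time` arguments and the hypothetical names `rankDiff` and `demo`; the `-1` default matters because `math.Log2(0)` is `-Inf` and the logarithm of a negative duration is `NaN`, neither of which converts to a meaningful `int64`.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// rankDiff is a sketch of a logarithmic rank difference between two
// timestamps relative to now. A timestamp at or after now gets rank -1,
// so it compares as younger than any timestamp with a positive age.
func rankDiff(t1, t2, now time.Time) int64 {
	d1 := now.Sub(t1)
	d2 := now.Sub(t2)
	r1 := int64(-1)
	r2 := int64(-1)
	if d1 > 0 {
		r1 = int64(math.Log2(float64(d1)))
	}
	if d2 > 0 {
		r2 = int64(math.Log2(float64(d2)))
	}
	return r1 - r2
}

// demo returns the rank difference for a future timestamp versus a
// one-second-old timestamp, and for two identical timestamps.
func demo() (int64, int64) {
	now := time.Date(2021, 3, 5, 12, 0, 0, 0, time.UTC)
	future := now.Add(time.Second)
	old := now.Add(-time.Second)
	return rankDiff(future, old, now), rankDiff(old, old, now)
}

func main() {
	skewed, same := demo()
	fmt.Println(skewed, same)
}
```

A negative result means the first timestamp falls in a younger bucket, and zero means both fall in the same bucket and a later tie-break decides.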

damemi · 2021-03-05T20:29:54Z

Thanks for the review @soltysh. I pushed and rebased with the feedback, and also added the feature gate @alculquicondor pointed out from #99212 (review). Please let me know if there is anything else I need to do

alculquicondor

You need to enable the feature gate for the tests.

I had long forgotten about the gate 😂


soltysh

/lgtm
/triage accepted
/priority backlog

k8s-ci-robot · 2021-03-05T21:25:36Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: damemi, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/controller/OWNERS~~ [soltysh]
~~pkg/features/OWNERS~~ [soltysh]
~~test/OWNERS~~ [soltysh]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

fejta-bot · 2021-03-06T00:34:17Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

fejta-bot · 2021-03-06T14:34:10Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

k8s-ci-robot requested review from cheftako and dims February 18, 2021 20:13

damemi mentioned this pull request Feb 18, 2021

Random Pod selection on ReplicaSet downscaling kubernetes/enhancements#2185

Open


damemi commented Feb 18, 2021


k8s-ci-robot requested a review from alculquicondor February 18, 2021 20:15

damemi force-pushed the alculquicondor-log-timestamp branch from d70ae93 to 088c236 Compare February 18, 2021 20:28

alculquicondor mentioned this pull request Feb 18, 2021

WIP Logarithmic timestamp comparison for dowscaling #96898

Closed

damemi force-pushed the alculquicondor-log-timestamp branch from 088c236 to d6ec6a6 Compare February 18, 2021 21:50

alculquicondor reviewed Feb 18, 2021


alculquicondor mentioned this pull request Feb 19, 2021

Indexed job implementation #98812

Merged

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 1, 2021

damemi force-pushed the alculquicondor-log-timestamp branch from d6ec6a6 to d99be17 Compare March 3, 2021 15:49

alculquicondor reviewed Mar 3, 2021


damemi force-pushed the alculquicondor-log-timestamp branch from c43188e to 469e24d Compare March 3, 2021 17:53

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 3, 2021

damemi force-pushed the alculquicondor-log-timestamp branch from 30808c2 to bb80f72 Compare March 4, 2021 20:53

alculquicondor reviewed Mar 4, 2021


damemi force-pushed the alculquicondor-log-timestamp branch from bb80f72 to d5054e2 Compare March 5, 2021 15:04

k8s-ci-robot assigned kow3ns Mar 5, 2021

soltysh approved these changes Mar 5, 2021


k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 5, 2021

damemi force-pushed the alculquicondor-log-timestamp branch 2 times, most recently from 71c0029 to 15fc96f Compare March 5, 2021 20:29

alculquicondor reviewed Mar 5, 2021


Logarithmic timestamp comparison for ReplicSet downscaling

a8d105a

Change-Id: I0657ea0ce41b98fdee1a5307b5826a10deaff98c

damemi force-pushed the alculquicondor-log-timestamp branch from 15fc96f to a8d105a Compare March 5, 2021 20:58

soltysh approved these changes Mar 5, 2021


k8s-ci-robot assigned soltysh Mar 5, 2021

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 5, 2021

k8s-ci-robot merged commit bf448a1 into kubernetes:master Mar 6, 2021

k8s-ci-robot added this to the v1.21 milestone Mar 6, 2021

damemi mentioned this pull request Mar 10, 2021

Add description of replica controller scaledown sort logic kubernetes/website#26993

Merged

Huang-Wei mentioned this pull request Mar 24, 2021

Pod Topology Spread constraints should be taken into account on scale down #96748

Closed

damemi mentioned this pull request Mar 25, 2021

Rename RandomReplicaSetDownScale feature gate to LogarithmicScaleDown kubernetes/enhancements#2584

Merged

damemi mentioned this pull request May 6, 2021

Promote LogarithmicScaleDown to Beta #101767

Merged

alculquicondor mentioned this pull request Aug 30, 2021

Add alculquicondor to sig-apps-reviewers #104663

Merged
