
P&F: Update and cleanup mutating work estimator #105930

Merged: 3 commits into kubernetes:master on Nov 1, 2021

Conversation

wojtek-t (Member)

Release note: NONE

/kind cleanup
/priority important-soon
/sig api-machinery

/assign @MikeSpreitzer @tkashem

@k8s-ci-robot k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Oct 27, 2021
@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 27, 2021
wojtek-t (Member Author)

/retest

fedebongio (Contributor)

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 28, 2021
tkashem (Contributor) left a comment

looks good to me, left a few minor comments

// TODO: As described in the KEP, we should take into account that not all
// events are equal and try to estimate the cost of a single event based on
// some historical data about size of events.
var finalSeats uint
var additionalLatency time.Duration
maxSeats := uint(math.Ceil(float64(watchCount) / watchesPerSeat))
tkashem (Contributor):

nit: maxSeats and maximumSeats sound very similar, can we rename maxSeats?

MikeSpreitzer (Member):

Yeah. This is just an earlier calculation of finalSeats.

wojtek-t (Member Author):

done


// TODO: Make this unconditional after we tune the algorithm better.
// Technically, there is an overhead connected to processing an event after
// the request finishes even if there is a small number of watches.
// However, until we tune the estimation we want to stay on the safe side
// and avoid introducing additional latency for almost every single request.
if watchCount >= watchesPerSeat {
finalSeats = uint(math.Ceil(float64(watchCount) / watchesPerSeat))
additionalLatency = eventAdditionalDuration
tkashem (Contributor) on Oct 29, 2021:

I actually preferred what you had before; this way we can avoid calling SeatsTimesDuration and DurationPerSeat when final seats do not exceed the maximum, or am I missing something here?

if watchCount >= watchesPerSeat {
	finalSeats = uint(math.Ceil(float64(watchCount) / watchesPerSeat))
	additionalLatency = eventAdditionalDuration

	if finalSeats > maximumSeats {
		originalFinalSeatSeconds := SeatsTimesDuration(float64(finalSeats), additionalLatency)

		finalSeats = maximumSeats
		additionalLatency = originalFinalSeatSeconds.DurationPerSeat(float64(finalSeats))
	}
}

MikeSpreitzer (Member):

I agree. And this avoids the poorly-named "maxSeats".

wojtek-t (Member Author):

done
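For illustration, here is a minimal, self-contained sketch of the capping flow discussed in this thread. It assumes the constants mentioned in this PR (watchesPerSeat = 10, eventAdditionalDuration = 5ms) plus an assumed maximumSeats = 10, and it approximates the SeatsTimesDuration/DurationPerSeat helpers with plain duration arithmetic rather than the real fixed-point SeatSeconds type:

package main

import (
	"fmt"
	"math"
	"time"
)

// Constants as discussed in this PR; maximumSeats = 10 is an assumption for this sketch.
const (
	watchesPerSeat          = 10.0
	eventAdditionalDuration = 5 * time.Millisecond
	maximumSeats            = 10
)

// estimateWatchWork mirrors the suggested flow above: derive finalSeats from the
// number of registered watchers and, when it exceeds the cap, keep the total
// seat-seconds constant by stretching additionalLatency instead.
func estimateWatchWork(watchCount int) (finalSeats uint, additionalLatency time.Duration) {
	if float64(watchCount) < watchesPerSeat {
		return 0, 0
	}
	finalSeats = uint(math.Ceil(float64(watchCount) / watchesPerSeat))
	additionalLatency = eventAdditionalDuration

	if finalSeats > maximumSeats {
		// Plain-float stand-in for SeatsTimesDuration followed by DurationPerSeat.
		totalSeatSeconds := float64(finalSeats) * additionalLatency.Seconds()
		finalSeats = maximumSeats
		additionalLatency = time.Duration(totalSeatSeconds / float64(finalSeats) * float64(time.Second))
	}
	return finalSeats, additionalLatency
}

func main() {
	// 1000 watchers: the uncapped estimate would be 100 seats for 5ms (0.5 seat-seconds);
	// after capping it becomes 10 seats for 50ms, preserving the same amount of work.
	seats, latency := estimateWatchWork(1000)
	fmt.Println(seats, latency) // 10 50ms
}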

additionalLatencyExpected: 5 * time.Millisecond,
},
*/
{
tkashem (Contributor):

this is just the uncommented version, right?

wojtek-t (Member Author):

yes - modulo one case that was adjusted to test the capping - this one:
"request verb is create, watches registered, maximum is capped"

MikeSpreitzer (Member) left a comment

This is mainly good. A few minor issues noted inline.


// SeatSeconds is a measure of work, in units of seat-seconds, using a fixed-point representation.
// `SeatSeconds(n)` represents `n/ssScale` seat-seconds.
// The constants `ssScale` and `ssScaleDigits` are private to the implementation here,
MikeSpreitzer (Member):

BTW, I forgot to remove the mention of "ssScaleDigits" from this comment when that constant was removed in the original drafting.

wojtek-t (Member Author):

done
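For context, a rough sketch of what such a fixed-point representation looks like. This is illustrative only, not the actual apiserver code, and the ssScale value chosen here is an assumption:

package main

import (
	"fmt"
	"time"
)

// SeatSeconds is a fixed-point measure of work: SeatSeconds(n) represents
// n/ssScale seat-seconds. The ssScale value below is assumed for this sketch;
// the real constant is private to the apiserver implementation.
type SeatSeconds uint64

const ssScale = 100000000

// SeatsTimesDuration returns the work done by `seats` seats occupied for `duration`.
func SeatsTimesDuration(seats float64, duration time.Duration) SeatSeconds {
	return SeatSeconds(seats * duration.Seconds() * ssScale)
}

// DurationPerSeat answers: for how long must `seats` seats be occupied to add up
// to this much work?
func (ss SeatSeconds) DurationPerSeat(seats float64) time.Duration {
	return time.Duration(float64(ss) / seats / ssScale * float64(time.Second))
}

func main() {
	w := SeatsTimesDuration(100, 5*time.Millisecond) // 0.5 seat-seconds
	fmt.Println(w.DurationPerSeat(10))               // 50ms: the same work spread over 10 seats
}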

@@ -24,27 +24,36 @@ import (
apirequest "k8s.io/apiserver/pkg/endpoints/request"
)

const (
watchesPerSeat = 10.0
eventAdditionalDuration = 5 * time.Millisecond
MikeSpreitzer (Member):

I like "notification" better than "event", since we also have a different thing called "event" and "notification" is a pretty standard and more specific term for what we are talking about here.

wojtek-t (Member Author):

The terminology that is used everywhere is "watch event", not "watch notification".
So we should be consistent with the terminology that is used across the project (even if we believe that notification would be a better one).

MikeSpreitzer (Member):

Thanks, I had forgotten about "watch event".

Comment on lines 81 to 84
// We simply estimate the total amount of final work for this request based
// on the above and rely on potential reshaping of the request if the
// concurrency limit for a given priority level will not allow to run
// request with that many seats.
MikeSpreitzer (Member):

This is no longer true.

wojtek-t (Member Author):

Which one - in my opinion it is true:

  • we estimate the total amount of final work based on this assumption
  • we still rely on potential reshaping if the concurrency limit is smaller than final seats.

MikeSpreitzer (Member):

First, the preceding unchanged sentence is a bit off base. It says "we assume ... is infinitely parallelizable". Was that intended to say that the actual work is all launched at once? Or that we can freely reshape that work as much as we like?

I think a more accurate statement, starting with a revised version of the preceding thus-far-unchanged sentence, would be something like the following.

// As a starting point we assume that the actual work associated with the watchers happens
// in many waiting goroutines that are all resumed at once,
// each such goroutine taking 1/Nth of a seat for M milliseconds.
// We allow the accounting in APF of that work to be reshaped into
// another rectangle of equal area, for practical reasons.

wojtek-t (Member Author):

Slightly changed the version (as the multiple goroutines part is not an assumption - it's a fact).
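Put numerically, the reshaping keeps the area of the work rectangle constant: the stretched additionalLatency is chosen so that maximumSeats times the new latency equals the uncapped finalSeats times eventAdditionalDuration. For example, assuming maximumSeats = 10, an uncapped estimate of 40 seats for 5ms (0.2 seat-seconds) becomes 10 seats for 20ms, still 0.2 seat-seconds.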


// While processing individual events is highly parallelizable,
// our design/implementation have a couple limitations that would make
// it highly inefficient:
MikeSpreitzer (Member):

I see two problems with the wording here.

  1. "it" is supposed to refer to the most recent noun phrase. In the current text that is "a couple limitations", which is not what you mean.
  2. The limitations in the current APF design/implementation do not make the processing ("sending") of notifications inefficient. APF does not change how notifications are sent. The issue here is with how APF manages requests to account for that notification sending.

wojtek-t (Member Author):

reworded

//
// TODO: Confirm that the current cap of maximumSeats allows us to
// achieve the above.
if finalSeats > maximumSeats {
MikeSpreitzer (Member):

The constant maximumSeats was introduced to cap the width coming from the LIST estimator. Perhaps we do not necessarily want the same cap for the mutation-WATCH estimator?

wojtek-t (Member Author):

We may want to tune it, but for now we don't have any better constant so I'm reusing the existing one.

// 2) we are not changing the finalWork estimate - just potentially
// making it much longer. This is fine as long as we will not
// be able to dispatch so many requests that will effectively
// completely overload kube-apiserver (and/or etcd).
MikeSpreitzer (Member):

IIRC there are other concerns as well. I recall network bandwidth being mentioned.

wojtek-t (Member Author):

Yes - the "overloaded" is both in terms of cpu and network bandwidth.

wojtek-t (Member Author):

I also reworded it.

// 1) we reduce the amount of seat-seconds that are "wasted" during
// dispatching and executing initial phase of the request
// 2) we are not changing the finalWork estimate - just potentially
// making it much longer. This is fine as long as we will not
MikeSpreitzer (Member):

It is not "fine". This is abandoning the correspondence between the way the actual work is scheduled --- all the notification work is launched at once --- and how APF is reserving server capacity for that work. The hope here is that the newly relaxed relationship is good enough.

wojtek-t (Member Author):

reworded

wojtek-t (Member Author) left a comment

@MikeSpreitzer @tkashem - thanks for the review; I applied the comments - PTAL

Comment on lines 119 to 120
// 2) we are not changing the finalWork estimate - just potentially
// making it much longer. As long as the maximum seats setting
MikeSpreitzer (Member):

Rather than saying "potentially making it much longer", where "it" grammatically refers to "the finalWork estimate" (which is work not time), I would say something like "reshaping it to be narrower and longer".

wojtek-t (Member Author):

done

wojtek-t (Member Author) left a comment

@MikeSpreitzer - thank you Mike for the review (I felt a bit like I was in an English lesson :) )

PTAL


wojtek-t (Member Author)

/retest

MikeSpreitzer (Member) left a comment

This looks good to me, just a couple of minor comments inline.

additionalLatency = eventAdditionalDuration
finalWork := SeatsTimesDuration(float64(finalSeats), eventAdditionalDuration)

// While processing individual events is highly parallelizable,
MikeSpreitzer (Member):

Same wording issue here. "Parallelizable" means can be parallel. But what we want to say here is that this processing is highly parallel.

wojtek-t (Member Author):

done

finalWork := SeatsTimesDuration(float64(finalSeats), eventAdditionalDuration)

// While processing individual events is highly parallelizable,
// the design/implementation of P&F have a couple limitations that
MikeSpreitzer (Member):

s/have/has/

wojtek-t (Member Author):

done

Comment on lines +78 to +80
// we will work on tuning the algorithm later. Given that the actual work
// associated with processing watch events is happening in multiple
// goroutines (proportional to the number of watchers) that are all
MikeSpreitzer (Member):

This is good.

wojtek-t (Member Author) left a comment

Thanks @MikeSpreitzer - PTAL


MikeSpreitzer (Member) left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 1, 2021
k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MikeSpreitzer, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 7669498 into kubernetes:master Nov 1, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Nov 1, 2021
Labels

approved - Indicates a PR has been approved by an approver from all required OWNERS files.
area/apiserver
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
kind/cleanup - Categorizes issue or PR as related to cleaning up code, process, or technical debt.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
priority/important-soon - Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
release-note-none - Denotes a PR that doesn't merit a release note.
sig/api-machinery - Categorizes an issue or PR as relevant to SIG API Machinery.
size/XL - Denotes a PR that changes 500-999 lines, ignoring generated files.
triage/accepted - Indicates an issue or PR is ready to be actively worked on.
5 participants