P&F: Update and cleanup mutating work estimator #105930
Conversation
Force-pushed from 1ed960b to 30c99ca.
/retest
/triage accepted
looks good to me, left a few minor comments
// TODO: As described in the KEP, we should take into account that not all
// events are equal and try to estimate the cost of a single event based on
// some historical data about size of events.
var finalSeats uint
var additionalLatency time.Duration
maxSeats := uint(math.Ceil(float64(watchCount) / watchesPerSeat))
nit: maxSeats and maximumSeats sound very similar, can we rename maxSeats?
Yeah. This is just an earlier calculation of finalSeats.
done
// TODO: Make this unconditional after we tune the algorithm better.
// Technically, there is an overhead connected to processing an event after
// the request finishes even if there is a small number of watches.
// However, until we tune the estimation we want to stay on the safe side
// and avoid introducing additional latency for almost every single request.
if watchCount >= watchesPerSeat {
	finalSeats = uint(math.Ceil(float64(watchCount) / watchesPerSeat))
	additionalLatency = eventAdditionalDuration
I actually preferred what you had before; this way we can avoid calling SeatsTimesDuration and DurationPerSeat when final seats do not exceed the maximum, or am I missing something here?
if watchCount >= watchesPerSeat {
	finalSeats = uint(math.Ceil(float64(watchCount) / watchesPerSeat))
	additionalLatency = eventAdditionalDuration
	if finalSeats > maximumSeats {
		originalFinalSeatSeconds := SeatsTimesDuration(float64(finalSeats), additionalLatency)
		finalSeats = maximumSeats
		additionalLatency = originalFinalSeatSeconds.DurationPerSeat(float64(finalSeats))
	}
}
I agree. And this avoids the poorly-named "maxSeats".
done
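To make the capping concrete, here is a hypothetical worked example. watchesPerSeat and eventAdditionalDuration come from this PR; the watch count and the maximumSeats value of 10 are invented for illustration, and plain float math stands in for the SeatsTimesDuration/DurationPerSeat helpers:

package main

import (
	"fmt"
	"math"
	"time"
)

const (
	watchesPerSeat          = 10.0                 // from this PR
	eventAdditionalDuration = 5 * time.Millisecond // from this PR
	maximumSeats            = 10                   // illustrative cap, not the real constant
)

func main() {
	watchCount := 640 // hypothetical number of registered watchers

	// 640 watchers / 10 watchers-per-seat = 64 seats, each held for 5ms.
	finalSeats := uint(math.Ceil(float64(watchCount) / watchesPerSeat))
	additionalLatency := eventAdditionalDuration

	if finalSeats > maximumSeats {
		// Keep the total work (seat-seconds) constant while narrowing the
		// request: 64 seats x 5ms = 0.32 seat-seconds = 10 seats x 32ms.
		totalSeatSeconds := float64(finalSeats) * additionalLatency.Seconds()
		finalSeats = maximumSeats
		additionalLatency = time.Duration(totalSeatSeconds / float64(finalSeats) * float64(time.Second))
	}

	fmt.Println(finalSeats, additionalLatency) // prints: 10 32ms
}

With fewer than watchesPerSeat watchers, the code in this PR skips all of this (the watchCount >= watchesPerSeat guard), so typical requests pay no extra seats or latency.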
additionalLatencyExpected: 5 * time.Millisecond,
},
*/
{
this is just the uncommented version, right?
yes - modulo one case that was adjusted to test the capping - this one:
"request verb is create, watches registered, maximum is capped"
This is mainly good. A few minor issues noted inline.
// SeatSeconds is a measure of work, in units of seat-seconds, using a fixed-point representation.
// `SeatSeconds(n)` represents `n/ssScale` seat-seconds.
// The constants `ssScale` and `ssScaleDigits` are private to the implementation here,
BTW, I forgot to remove the mention of "ssScaleDigits" from this comment when that constant was removed in the original drafting.
done
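For context, a minimal sketch of the fixed-point SeatSeconds arithmetic this thread refers to. The ssScale value of 1e8 and the method bodies are assumptions for illustration, not quotes from the PR:

package main

import (
	"fmt"
	"math"
	"time"
)

// SeatSeconds is a measure of work in units of seat-seconds, using a
// fixed-point representation: SeatSeconds(n) represents n/ssScale seat-seconds.
type SeatSeconds uint64

// ssScale is assumed to be 1e8 here, i.e. 8 decimal digits of precision.
const ssScale = 100000000

// SeatsTimesDuration converts a (seats, duration) rectangle of work into
// its area in seat-seconds.
func SeatsTimesDuration(seats float64, duration time.Duration) SeatSeconds {
	return SeatSeconds(math.Round(seats * duration.Seconds() * ssScale))
}

// DurationPerSeat spreads this amount of work evenly over the given number
// of seats and reports how long each seat stays occupied.
func (ss SeatSeconds) DurationPerSeat(seats float64) time.Duration {
	return time.Duration(float64(ss) / seats * (float64(time.Second) / ssScale))
}

func main() {
	// Reshaping preserves the area: 64 seats x 5ms == 10 seats x 32ms.
	work := SeatsTimesDuration(64, 5*time.Millisecond)
	fmt.Println(work.DurationPerSeat(10)) // prints: 32ms
}

Storing the value as an integer number of 1/ssScale units keeps additions exact, which matters when many small contributions are accumulated.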
@@ -24,27 +24,36 @@ import (
	apirequest "k8s.io/apiserver/pkg/endpoints/request"
)

const (
	watchesPerSeat          = 10.0
	eventAdditionalDuration = 5 * time.Millisecond
I like "notification" better than "event", since we also have a different thing called "event" and "notification" is a pretty standard and more specific term for what we are talking about here.
The terminology that is used everywhere is "watch event", not "watch notification".
So we should be consistent with the terminology that is used across the project (even if we believe that notification would be a better one).
Thanks, I had forgotten about "watch event".
// We simply estimate the total amount of final work for this request based
// on the above and rely on potential reshaping of the request if the
// concurrency limit for a given priority level will not allow to run
// request with that many seats.
This is no longer true.
Which one - in my opinion it is true:
- we estimate the total amount of final work based on this assumption
- we still rely on potential reshaping if the concurrency limit is smaller than final seats.
First, the preceding unchanged sentence is a bit off base. It says "we assume ... is infinitely parallelizable". Was that intended to say that the actual work is all launched at once? Or that we can freely reshape that work as much as we like?
I think a more accurate statement, starting with a revised version of the preceding thus-far-unchanged sentence, would be something like the following.
// As a starting point we assume that the actual work associated with the watchers happens
// in many waiting goroutines that are all resumed at once,
// each such goroutine taking 1/Nth of a seat for M milliseconds.
// We allow the accounting in APF of that work to be reshaped into
// another rectangle of equal area, for practical reasons.
Slightly changed the version (as the multiple goroutines part is not an assumption - it's a fact).
// While processing individual events is highly parallelizable,
// our design/implementation have a couple limitations that would make
// it highly inefficient:
I see two problems with the wording here.
- "it" is supposed to refer to the most recent noun phrase. In the current text that is "a couple limitations", which is not what you mean.
- The limitations in the current APF design/implementation do not make the processing ("sending") of notifications inefficient. APF does not change how notifications are sent. The issue here is with how APF manages requests to account for that notification sending.
reworded
//
// TODO: Confirm that the current cap of maximumSeats allows us to
// achieve the above.
if finalSeats > maximumSeats {
The constant maximumSeats was introduced to cap the width coming from the LIST estimator. Perhaps we do not necessarily want the same cap for the mutation-WATCH estimator?
We may want to tune it, but for now we don't have any better constant so I'm reusing the existing one.
// 2) we are not changing the finalWork estimate - just potentially
// making it much longer. This is fine as long as we will not
// be able to dispatch so many requests that will effectively
// completely overload kube-apiserver (and/or etcd).
IIRC there are other concerns as well. I recall network bandwidth being mentioned.
Yes - the "overloaded" is both in terms of CPU and network bandwidth.
I also reworded it.
// 1) we reduce the amount of seat-seconds that are "wasted" during
// dispatching and executing initial phase of the request
// 2) we are not changing the finalWork estimate - just potentially
// making it much longer. This is fine as long as we will not
It is not "fine". This is abandoning the correspondence between the way the actual work is scheduled --- all the notification work is launched at once --- and how APF is reserving server capacity for that work. The hope here is that the newly relaxed relationship is good enough.
reworded
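To picture the correspondence being discussed (all numbers invented for illustration): the actual event fan-out is a wide, short rectangle of work, while the capped APF accounting reserves a narrow, long rectangle of the same area. The hope, per the comment above, is that this relaxed correspondence is good enough:

package main

import "fmt"

func main() {
	const (
		watchers       = 300  // hypothetical watcher count
		watchesPerSeat = 10.0 // from this PR
		eventMs        = 5.0  // eventAdditionalDuration, in milliseconds
		maximumSeats   = 10.0 // illustrative cap
	)

	// Actual shape: all watchers are resumed at once, each holding
	// 1/watchesPerSeat of a seat for eventMs.
	actualSeats := watchers / watchesPerSeat // 30 seats
	areaSeatMs := actualSeats * eventMs      // 150 seat-milliseconds

	// Accounted shape: width capped, duration stretched to keep the area.
	accountedSeats := maximumSeats             // 10 seats
	accountedMs := areaSeatMs / accountedSeats // 15 ms

	fmt.Printf("actual: %.0f seats x %.0f ms; accounted: %.0f seats x %.0f ms\n",
		actualSeats, eventMs, accountedSeats, accountedMs)
}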
Force-pushed from 30c99ca to c5e2628.
@MikeSpreitzer @tkashem - thanks for the review; I applied the comments - PTAL
// 2) we are not changing the finalWork estimate - just potentially
// making it much longer. As long as the maximum seats setting
Rather than saying "potentially making it much longer", where "it" grammatically refers to "the finalWork estimate" (which is work not time), I would say something like "reshaping it to be narrower and longer".
done
@MikeSpreitzer - thank you Mike for the review (I felt a bit like I was in an English lesson :) )
PTAL
Force-pushed from c5e2628 to ba5d444.
/retest
This looks good to me, just a couple of minor comments inline.
additionalLatency = eventAdditionalDuration
finalWork := SeatsTimesDuration(float64(finalSeats), eventAdditionalDuration)

// While processing individual events is highly parallelizable,
Same wording issue here. "Parallelizable" means can be parallel. But what we want to say here is that this processing is highly parallel.
done
finalWork := SeatsTimesDuration(float64(finalSeats), eventAdditionalDuration)

// While processing individual events is highly parallelizable,
// the design/implementation of P&F have a couple limitations that
s/have/has/
done
// we will work on tuning the algorithm later. Given that the actual work
// associated with processing watch events is happening in multiple
// goroutines (proportional to the number of watchers) that are all
This is good.
Force-pushed from ba5d444 to 4700cf6.
Thanks @MikeSpreitzer - PTAL
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: MikeSpreitzer, wojtek-t. The full list of commands accepted by this bot can be found here; the pull request process is described here.
/kind cleanup
/priority important-soon
/sig api-machinery
/assign @MikeSpreitzer @tkashem