Adjust LIST work estimator to match current code #104599
Conversation
// As a result, the work estimate of the request should be computed based on <limit> and the actual
// cost of processing more elements will be hidden in the request processing latency.
var estimatedObjectsToBeProcessed int64 = count
if !isListFromCache && utilfeature.DefaultFeatureGate.Enabled(features.APIListChunking) &&
APIListChunking, Limit > 0, etc. are already part of shouldListFromStorage
function - they aren't needed here
The should-serve-from-etcd condition is a disjunction; it does not imply that pagination is enabled and the limit is positive.
OK - I see now what you're trying to achieve. That makes sense.
Can we maybe fold the feature gate and limit > 0 checks into something like:
useLimit := !isListFromCache && utilfeature.DefaultFeatureGate.Enabled(features.APIListChunking) && listOptions.Limit > 0
It would make it easier to follow.
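A self-contained sketch of how the suggested `useLimit` extraction could read. The Kubernetes-specific checks (feature gate, cache mode, selectors) are stubbed out as plain parameters here for illustration; the real code uses `utilfeature.DefaultFeatureGate` and `listOptions`:

```go
package main

import "fmt"

// estimateObjects models the branch under review with the suggested
// useLimit extraction. The k8s-specific checks are stubbed as plain
// parameters for illustration only.
func estimateObjects(count, limit int64, isListFromCache, chunkingEnabled bool, fieldSel, labelSel string) int64 {
	useLimit := !isListFromCache && chunkingEnabled && limit > 0
	if useLimit && limit < count && fieldSel == "" && labelSel == "" {
		return limit
	}
	return count
}

func main() {
	// Paginated etcd list with no selectors: the limit bounds the work.
	fmt.Println(estimateObjects(1000, 50, false, true, "", ""))
	// List served from the watch cache: the full count is the estimate.
	fmt.Println(estimateObjects(1000, 50, true, true, "", ""))
}
```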
staging/src/k8s.io/apiserver/pkg/util/flowcontrol/request/list_work_estimator.go
var estimatedObjectsToBeProcessed int64 = count
if !isListFromCache && utilfeature.DefaultFeatureGate.Enabled(features.APIListChunking) &&
	listOptions.Limit > 0 && listOptions.Limit < estimatedObjectsToBeProcessed &&
	listOptions.FieldSelector == "" && listOptions.LabelSelector == "" {
This isn't true - whether or not a selector is specified doesn't affect whether we serve from cache or not.
I am not sure I understand. The comment that you are reviewing is about when to use Limit instead of count as the worst-case number of objects touched. When fetching from the watch cache, all the cached objects are fetched from the cache and tested with the filter.
I guess I know what you had in mind, but I couldn't follow it from either the code comment or the comment above.
IIUC, what you're saying is that the limit determines how many objects we actually return to the user, but in the case of selectors we may need to process many more objects to actually find that many to return.
Given that the "filter" operation is generally much lighter than serialization, I was thinking about how to merge those two across the different modes (i.e. listing from cache and from etcd in this case).
But I think it plays well, because instead of serializing them, we're actually deserializing all of them when they are returned from etcd.
So I'm fine with this logic, but I think the comment needs to be clarified to incorporate what I described above.
No, what I was saying is something even simpler: just look at the number of objects touched in any way.
If we think that serialization and de-serialization are far more expensive than anything else, then indeed we want a different number of objects than the number touched in any way.
Now I have to go review the costs of fetching from the watch cache to see if I believe it.
Maybe we want a linear blend between the two numbers?
Maybe we want a linear blend between the two numbers?
I don't think we do. Filtering is an order of magnitude cheaper than serialization/deserialization in all practical cases.
And maybe we want (number de-serialized) + (number serialized)?
And maybe we want (number de-serialized) + (number serialized)?
This may be a valid option.
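The "(number de-serialized) + (number serialized)" idea could be sketched as below. This is a hypothetical helper, not the real estimator: for an etcd-backed list every stored object is de-serialized but only up to `limit` objects are serialized back to the client, while a watch-cache list de-serializes nothing and (per this thread) ignores the limit, so all stored objects are serialized.

```go
package main

import "fmt"

// workEstimate sketches the proposal above: count de-serializations
// plus serializations. Illustrative only; the real estimator may
// weigh these operations differently.
func workEstimate(numStored, limit int64, fromCache bool) int64 {
	if fromCache {
		// Watch cache: no de-serialization, limit ignored,
		// so all stored objects are serialized.
		return numStored
	}
	serialized := numStored
	if limit > 0 && limit < numStored {
		serialized = limit
	}
	return numStored + serialized // de-serialized + serialized
}

func main() {
	fmt.Println(workEstimate(1000, 50, false)) // etcd: 1000 + 50
	fmt.Println(workEstimate(1000, 50, true))  // cache: 1000
}
```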
FYI, I see that a watch cache is based on a *threadSafeStore, whose List operation does just one allocation (to create the slice to return).
/triage accepted
I updated https://docs.google.com/document/d/1r7KQhVAM3gi6mgdf69lqLAYG_3aR1TAeWunpYnde7i0/edit?usp=sharing with a formula for the number of serializations and de-serializations in the apiserver. I am not clear on the relative cost of the corresponding serializations in the etcd server.
@MikeSpreitzer - can you please open the doc for comment access? I don't fully agree with some bits there, but it would be easier to point those things out if I had comment access.
Comments authorized on that doc.
Thank you. |
Force-pushed from b4a5876 to 3c07c0b
The force-push to 3c07c0b8cb3 revises to follow the formula in the latest https://docs.google.com/document/d/1r7KQhVAM3gi6mgdf69lqLAYG_3aR1TAeWunpYnde7i0
Force-pushed from 3c07c0b to 7a0d5dc
The force-push to 7a0d5dc5ee2 recognizes the one case where the watch cache does pagination, and updates the unit tests.
/retest
limit := numStored
if utilfeature.DefaultFeatureGate.Enabled(features.APIListChunking) && listOptions.Limit > 0 &&
	listOptions.Limit < numStored {
	limit = listOptions.Limit
This isn't true for listing from cache - see my comment in the doc.
Oh wow, I hadn't noticed. That seems like pretty strange behavior to me. Is it a bug? Given that we are struggling with the costs of long list results, maybe the cacher should limit its response in this case?
Revised estimator to match current code.
Is it a bug? Given that we are struggling with the costs of long list results, maybe the cacher should limit its response in this case?
It was intentional (although I agree it isn't intuitive).
The reasoning behind it is that in large clusters, when node agents were listing stuff from etcd (say, e.g., to restore after a control-plane outage, when those lists were somewhat coordinated across many nodes), it was completely killing etcd.
And given that listing from the watch cache is an explicit opt-in, ignoring the limit there is slightly less risky, as people need to explicitly opt in to that behavior, so hopefully they understand the consequences.
With P&F giving us better protection (once we have good list support) [and a couple of other things that have happened over the last years] we may be able to revise it, but that requires much deeper testing and should be its own effort.
[FTR - once solved, we would be able to actually graduate https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/365-paginated-lists to stable, as somehow solving the pagination aspect for the watch cache is currently the only blocker IIRC.]
	estimatedObjectsToBeProcessed = count
}

if isListFromCache {
nit: I think using a switch instead of multiple if/else if branches is a bit clearer:
switch {
case isListFromCache:
...
case listOptions.FieldSelector != "" ...:
...
default:
...
}
done
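The switch-shaped estimator the thread converged on could look roughly like this. It is an illustrative sketch only: the real function lives in list_work_estimator.go and works from the actual `listOptions`; the parameter names here are made up.

```go
package main

import "fmt"

// estimatedObjectsToBeProcessed sketches the final shape discussed
// in the review: a switch over the serving mode. Illustrative only.
func estimatedObjectsToBeProcessed(numStored, limit int64, isListFromCache bool, fieldSel, labelSel string) int64 {
	switch {
	case isListFromCache:
		// Per the discussion above, the watch cache ignores
		// pagination, so assume every stored object is touched.
		return numStored
	case fieldSel != "" || labelSel != "":
		// Selectors are applied after objects are read from etcd,
		// so all stored objects are processed regardless of limit.
		return numStored
	default:
		if limit > 0 && limit < numStored {
			return limit
		}
		return numStored
	}
}

func main() {
	fmt.Println(estimatedObjectsToBeProcessed(1000, 50, true, "", ""))  // cache: full count
	fmt.Println(estimatedObjectsToBeProcessed(1000, 50, false, "", "")) // etcd, no selector: limit
}
```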
Force-pushed from 7a0d5dc to 6f160ca
The force-push to 6f160ca is for the last two review comments above and updating the unit test accordingly.
/lgtm thanks!
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: MikeSpreitzer, wojtek-t. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR adjusts the work estimator for LIST requests in API Priority and Fairness to reflect the worst case behavior of the LIST handlers.
Which issue(s) this PR fixes:
Fixes #104596
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: