
Adjust LIST work estimator to match current code #104599

Merged: 1 commit merged into kubernetes:master from proper-limit on Sep 3, 2021

Conversation

MikeSpreitzer (Member):

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR adjusts the work estimator for LIST requests in API Priority and Fairness to reflect the worst case behavior of the LIST handlers.
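
For orientation, here is a heavily simplified sketch of the shape of such an estimator. All names, the ListOptions stand-in type, and the condition are illustrative assumptions distilled from the review discussion below, not the merged code.

// ListOptions is a pared-down stand-in for the real list options (assumption).
type ListOptions struct {
	Limit         int64
	FieldSelector string
	LabelSelector string
}

// estimateListObjects is a hypothetical sketch: size the request by the
// worst-case number of objects the handler may touch, not the number returned.
func estimateListObjects(countInCache int64, opts ListOptions, fromCache bool) int64 {
	worstCase := countInCache // by default, assume every stored object is processed
	if !fromCache && opts.Limit > 0 && opts.Limit < worstCase &&
		opts.FieldSelector == "" && opts.LabelSelector == "" {
		// Only an unfiltered, paginated list served from etcd is bounded by the limit.
		worstCase = opts.Limit
	}
	return worstCase
}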

Which issue(s) this PR fixes:

Fixes #104596

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 26, 2021
@MikeSpreitzer (Member Author):

@kubernetes/sig-api-machinery-bugs
/cc @tkashem
/cc @wojtek-t
/cc @deads2k
@lavalamp

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. area/apiserver approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 26, 2021
// As a result, the work estimate of the request should be computed based on <limit> and the actual
// cost of processing more elements will be hidden in the request processing latency.
var estimatedObjectsToBeProcessed int64 = count
if !isListFromCache && utilfeature.DefaultFeatureGate.Enabled(features.APIListChunking) &&
	listOptions.Limit > 0 && listOptions.Limit < estimatedObjectsToBeProcessed &&
	listOptions.FieldSelector == "" && listOptions.LabelSelector == "" {
Member:

APIListChunking, Limit > 0, etc. are already part of the shouldListFromStorage function - they aren't needed here

Member Author:

The should-serve-from-etcd condition is a disjunction; it does not imply that pagination is enabled and the limit is positive.
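
To illustrate the point, a hypothetical sketch of such a disjunction follows; the clause set is an assumption for illustration, and the real function may differ. Any single clause can force serving from etcd, so a true result says nothing about chunking being enabled or the limit being positive.

// shouldListFromStorage (hypothetical sketch; clauses are assumptions):
func shouldListFromStorage(resourceVersion, continueToken string, chunkingEnabled bool, limit int64) bool {
	return resourceVersion == "" || // exact reads bypass the cache (assumed clause)
		continueToken != "" || // continuation tokens bypass the cache (assumed clause)
		(chunkingEnabled && limit > 0)
}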

Member:

OK - I see now what you're trying to achieve. That makes sense.

Can we maybe fold the feature gate and limit > 0 checks into something like:

useLimit := !isListFromCache && utilfeature.DefaultFeatureGate.Enabled(features.APIListChunking) && listOptions.Limit > 0

It would make it easier to follow.
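
Applied to the condition under review, the suggestion would read roughly like this; the assignment in the body is an assumption based on the surrounding discussion, not a quote of the merged code:

useLimit := !isListFromCache && utilfeature.DefaultFeatureGate.Enabled(features.APIListChunking) &&
	listOptions.Limit > 0
if useLimit && listOptions.Limit < estimatedObjectsToBeProcessed &&
	listOptions.FieldSelector == "" && listOptions.LabelSelector == "" {
	estimatedObjectsToBeProcessed = listOptions.Limit // assumed body: bound the estimate by the page size
}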

var estimatedObjectsToBeProcessed int64 = count
if !isListFromCache && utilfeature.DefaultFeatureGate.Enabled(features.APIListChunking) &&
listOptions.Limit > 0 && listOptions.Limit < estimatedObjectsToBeProcessed &&
listOptions.FieldSelector == "" && listOptions.LabelSelector == "" {
Member:

This isn't true - whether or not a selector is specified doesn't affect whether we serve from cache or not.

Member Author:

I am not sure I understand. The comment that you are reviewing is about when to use Limit instead of count as the worst-case number of objects touched. When fetching from the watch cache, all the cached objects are fetched from the cache and tested with the filter.

Member:

I think I see what you had in mind, but I couldn't follow it from either the code comment or the comment above.

IIUC, what you're saying is that the limit determines how many objects we actually return to the user. But in the case of selectors, we may need to process many more objects to actually get <limit> of them to return.

Given that the "filter" operation is generally much lighter than serialization, I was thinking about how to merge those two between the different modes (i.e., listing from the cache vs. from etcd in this case).
But I think it plays well, because instead of serializing them, we're actually deserializing all of them when they are returned from etcd.

So I'm fine with this logic, but I think the comment needs to be clarified to incorporate what I described above.

Member Author:

No, what I was saying is something even simpler: just look at the number of objects touched in any way.

If we think that serialization and de-serialization are far more expensive than anything else, then indeed we want a different number of objects than the number touched in any way.

Now I have to go review the costs of fetching from the watch cache to see if I believe it.

Maybe we want a linear blend between the two numbers?

Member:

> Maybe we want a linear blend between the two numbers?

I don't think we do. Serialization/deserialization is an order of magnitude more expensive than filtering in all practical cases.

Member Author:

And maybe we want (number de-serialized) + (number serialized)?
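
For concreteness, the two candidates discussed in this thread could be written as follows (names are assumptions: touched = objects read or filtered, returned = objects serialized):

// Linear blend of the two numbers, weighted by 0 <= alpha <= 1.
func blendEstimate(touched, returned int64, alpha float64) int64 {
	return int64(alpha*float64(returned) + (1-alpha)*float64(touched))
}

// (number de-serialized) + (number serialized), as proposed above.
func sumEstimate(touched, returned int64) int64 {
	return touched + returned
}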

Member:

> And maybe we want (number de-serialized) + (number serialized)?

This may be a valid option.

Member Author:

FYI, I see that a watch cache is based on a *threadSafeStore, whose List operation does just one allocation (to create the slice to return).
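
For reference, that List operation is roughly the following (a paraphrase of client-go's thread-safe store; exact type and field names may vary by version):

import "sync"

// Paraphrased stand-in for the client-go thread-safe store.
type threadSafeStore struct {
	lock  sync.RWMutex
	items map[string]interface{}
}

func (c *threadSafeStore) List() []interface{} {
	c.lock.RLock()
	defer c.lock.RUnlock()
	list := make([]interface{}, 0, len(c.items))
	for _, item := range c.items {
		list = append(list, item) // appended by reference; the one allocation is the slice itself
	}
	return list
}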

@wojtek-t wojtek-t self-assigned this Aug 26, 2021
@fedebongio (Contributor):

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 26, 2021
@MikeSpreitzer (Member Author):

I updated https://docs.google.com/document/d/1r7KQhVAM3gi6mgdf69lqLAYG_3aR1TAeWunpYnde7i0/edit?usp=sharing with a formula for the number of serializations and de-serializations in the apiserver. I am not clear on the relative cost of the corresponding serializations in the etcd server.

@wojtek-t (Member):

> I updated https://docs.google.com/document/d/1r7KQhVAM3gi6mgdf69lqLAYG_3aR1TAeWunpYnde7i0/edit?usp=sharing with a formula for the number of serializations and de-serializations in the apiserver. I am not clear on the relative cost of the corresponding serializations in the etcd server.

@MikeSpreitzer - can you please open the doc for comment access? I don't fully agree with some bits there but it would be easier to point those things if I had comment access.

@MikeSpreitzer (Member Author):

Comments authorized on that doc.

@wojtek-t (Member) commented Sep 1, 2021:

> Comments authorized on that doc.

Thank you.
I'm fairly sure that it looked different when I was looking at it yesterday. Anyway - I added some minor comments, but overall the current version looks reasonable.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 1, 2021
@MikeSpreitzer (Member Author):

The force-push to 3c07c0b8cb3 revises the estimator to follow the formula in the latest https://docs.google.com/document/d/1r7KQhVAM3gi6mgdf69lqLAYG_3aR1TAeWunpYnde7i0

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 2, 2021
@MikeSpreitzer (Member Author):

The force-push to 7a0d5dc5ee2 recognizes the one case where the watch cache does pagination, and updates the unit tests.

@MikeSpreitzer (Member Author):

/retest

1 similar comment
@MikeSpreitzer (Member Author):

/retest

limit := numStored
if utilfeature.DefaultFeatureGate.Enabled(features.APIListChunking) && listOptions.Limit > 0 &&
listOptions.Limit < numStored {
limit = listOptions.Limit
Member:

This isn't true for listing from cache - see my comment in the doc.

Member Author:

Oh wow, I hadn't noticed. That seems like pretty strange behavior to me. Is it a bug? Given that we are struggling with the costs of long list results, maybe the cacher should limit its response in this case?

Member Author:

Revised estimator to match current code.

Member:

> Is it a bug? Given that we are struggling with the costs of long list results, maybe the cacher should limit its response in this case?

It was intentional (although I agree it isn't intuitive).
The reasoning behind it is that in large clusters, when node agents were listing stuff from etcd (say, to restore after a control-plane outage, when those lists were somewhat coordinated across many nodes), it was completely killing etcd.
And given that listing from the watch cache is an explicit opt-in, ignoring the limit is slightly less risky: people need to explicitly opt in to that behavior, so hopefully they understand the consequences.

With P&F giving us better protection (once we have good list support) [and a couple of other things that have happened over the last few years], we may be able to revise it, but that requires much deeper testing and should be its own effort.

[FTR - once solved, we would be able to actually graduate https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/365-paginated-lists to stable, as the pagination aspect for the watch cache is currently the only blocker IIRC.]

estimatedObjectsToBeProcessed = count
}

if isListFromCache {
Member:

nit: I think that using a switch instead of multiple if / else if branches is a bit clearer:

switch {
case isListFromCache:
  ...
case listOptions.FieldSelector != "" ...:
  ...
default:
  ...
}

Member Author:

done
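
For illustration, the applied switch might look roughly like this. The branch bodies are assumptions inferred from the thread (cache lists and selector-filtered etcd lists touch every stored object, while an unfiltered paginated etcd list is bounded by the limit), not a quote of the merged code:

switch {
case isListFromCache:
	// Assumed: the watch cache fetches and filters every stored object.
	estimatedObjectsToBeProcessed = numStored
case listOptions.FieldSelector != "" || listOptions.LabelSelector != "":
	// Assumed: every object is de-serialized from etcd; only the matches are serialized.
	estimatedObjectsToBeProcessed = numStored + limit
default:
	// Assumed: limit objects de-serialized plus limit objects serialized.
	estimatedObjectsToBeProcessed = 2 * limit
}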

@MikeSpreitzer (Member Author):

The force-push to 6f160ca addresses the last two review comments above and updates the unit tests accordingly.

@wojtek-t (Member) commented Sep 3, 2021:

/lgtm
/approve

thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 3, 2021
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MikeSpreitzer, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pacoxu (Member) commented Sep 3, 2021:

/retest
flake #104057, which will be fixed in #104069

@k8s-triage-robot:

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does not have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@k8s-ci-robot k8s-ci-robot merged commit 7997805 into kubernetes:master Sep 3, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Sep 3, 2021
@MikeSpreitzer MikeSpreitzer deleted the proper-limit branch September 8, 2021 13:03
Successfully merging this pull request may close these issues.

Need to tweak LIST work estimator in API Priority and Fairness