
Adjust LIST work estimator to match current code #104599

Merged: 1 commit merged into kubernetes:master from proper-limit on Sep 3, 2021

Conversation

MikeSpreitzer (Member):

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR adjusts the work estimator for LIST requests in API Priority and Fairness to reflect the worst case behavior of the LIST handlers.
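
For orientation, here is a heavily simplified sketch of the shape of such an estimator. All names, the ListOptions stand-in type, and the condition are illustrative assumptions distilled from the review discussion below, not the merged code.

// ListOptions is a pared-down stand-in for the real list options (assumption).
type ListOptions struct {
	Limit         int64
	FieldSelector string
	LabelSelector string
}

// estimateListObjects is a hypothetical sketch: size the request by the
// worst-case number of objects the handler may touch, not the number returned.
func estimateListObjects(countInCache int64, opts ListOptions, fromCache bool) int64 {
	worstCase := countInCache // by default, assume every stored object is processed
	if !fromCache && opts.Limit > 0 && opts.Limit < worstCase &&
		opts.FieldSelector == "" && opts.LabelSelector == "" {
		// Only an unfiltered, paginated list served from etcd is bounded by the limit.
		worstCase = opts.Limit
	}
	return worstCase
}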

Which issue(s) this PR fixes:

Fixes #104596

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 26, 2021
@MikeSpreitzer (Member Author):

@kubernetes/sig-api-machinery-bugs
/cc @tkashem
/cc @wojtek-t
/cc @deads2k
@lavalamp

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. area/apiserver approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 26, 2021
// As a result, the work estimate of the request should be computed based on <limit> and the actual
// cost of processing more elements will be hidden in the request processing latency.
var estimatedObjectsToBeProcessed int64 = count
if !isListFromCache && utilfeature.DefaultFeatureGate.Enabled(features.APIListChunking) &&
	listOptions.Limit > 0 && listOptions.Limit < estimatedObjectsToBeProcessed &&
	listOptions.FieldSelector == "" && listOptions.LabelSelector == "" {
Member:

APIListChunking, Limit > 0, etc. are already part of the shouldListFromStorage function - they aren't needed here

Member Author:

The should-serve-from-etcd condition is a disjunction; it does not imply that pagination is enabled and the limit is positive.
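
To illustrate the point, a hypothetical sketch of such a disjunction follows; the clause set is an assumption for illustration, and the real function may differ. Any single clause can force serving from etcd, so a true result says nothing about chunking being enabled or the limit being positive.

// shouldListFromStorage (hypothetical sketch; clauses are assumptions):
func shouldListFromStorage(resourceVersion, continueToken string, chunkingEnabled bool, limit int64) bool {
	return resourceVersion == "" || // exact reads bypass the cache (assumed clause)
		continueToken != "" || // continuation tokens bypass the cache (assumed clause)
		(chunkingEnabled && limit > 0)
}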

Member:

OK - I see now what you're trying to achieve. That makes sense.

Can we maybe fold the feature gate and limit > 0 checks into something like:

useLimit := !isListFromCache && utilfeature.DefaultFeatureGate.Enabled(features.APIListChunking) && listOptions.Limit > 0

It would make it easier to follow.
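
Applied to the condition under review, the suggestion would read roughly like this; the assignment in the body is an assumption based on the surrounding discussion, not a quote of the merged code:

useLimit := !isListFromCache && utilfeature.DefaultFeatureGate.Enabled(features.APIListChunking) &&
	listOptions.Limit > 0
if useLimit && listOptions.Limit < estimatedObjectsToBeProcessed &&
	listOptions.FieldSelector == "" && listOptions.LabelSelector == "" {
	estimatedObjectsToBeProcessed = listOptions.Limit // assumed body: bound the estimate by the page size
}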

var estimatedObjectsToBeProcessed int64 = count
if !isListFromCache && utilfeature.DefaultFeatureGate.Enabled(features.APIListChunking) &&
listOptions.Limit > 0 && listOptions.Limit < estimatedObjectsToBeProcessed &&
listOptions.FieldSelector == "" && listOptions.LabelSelector == "" {
Member:

This isn't true - whether or not a selector is specified doesn't affect whether we serve from cache or not.

Member Author:

I am not sure I understand. The comment that you are reviewing is about when to use Limit instead of count as the worst-case number of objects touched. When fetching from the watch cache, all the cached objects are fetched from the cache and tested with the filter.

Member:

I think I see what you had in mind, but I couldn't follow it from either the code comment or the comment above.

IIUC, what you're saying is that the limit determines how many objects we actually return to the user. But in the case of selectors, we may need to process many more objects to actually get <limit> of them to return.

Given that the "filter" operation is generally much lighter than serialization, I was thinking about how to merge those two between the different modes (i.e., listing from the cache vs. from etcd in this case).
But I think it plays well, because instead of serializing them, we're actually deserializing all of them when they are returned from etcd.

So I'm fine with this logic, but I think the comment needs to be clarified to incorporate what I described above.

Member Author:

No, what I was saying is something even simpler: just look at the number of objects touched in any way.

If we think that serialization and de-serialization are far more expensive than anything else, then indeed we want a different number of objects than the number touched in any way.

Now I have to go review the costs of fetching from the watch cache to see if I believe it.

Maybe we want a linear blend between the two numbers?

Member:

> Maybe we want a linear blend between the two numbers?

I don't think we do. Serialization/deserialization is an order of magnitude more expensive than filtering in all practical cases.

Member Author:

And maybe we want (number de-serialized) + (number serialized)?
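
For concreteness, the two candidates discussed in this thread could be written as follows (names are assumptions: touched = objects read or filtered, returned = objects serialized):

// Linear blend of the two numbers, weighted by 0 <= alpha <= 1.
func blendEstimate(touched, returned int64, alpha float64) int64 {
	return int64(alpha*float64(returned) + (1-alpha)*float64(touched))
}

// (number de-serialized) + (number serialized), as proposed above.
func sumEstimate(touched, returned int64) int64 {
	return touched + returned
}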

Member:

> And maybe we want (number de-serialized) + (number serialized)?

This may be a valid option.

Member Author:

FYI, I see that a watch cache is based on a *threadSafeStore, whose List operation does just one allocation (to create the slice to return).
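
For reference, that List operation is roughly the following (a paraphrase of client-go's thread-safe store; exact type and field names may vary by version):

import "sync"

// Paraphrased stand-in for the client-go thread-safe store.
type threadSafeStore struct {
	lock  sync.RWMutex
	items map[string]interface{}
}

func (c *threadSafeStore) List() []interface{} {
	c.lock.RLock()
	defer c.lock.RUnlock()
	list := make([]interface{}, 0, len(c.items))
	for _, item := range c.items {
		list = append(list, item) // appended by reference; the one allocation is the slice itself
	}
	return list
}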

@wojtek-t wojtek-t self-assigned this Aug 26, 2021
@fedebongio (Contributor):

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 26, 2021
@MikeSpreitzer (Member Author):

I updated https://docs.google.com/document/d/1r7KQhVAM3gi6mgdf69lqLAYG_3aR1TAeWunpYnde7i0/edit?usp=sharing with a formula for the number of serializations and de-serializations in the apiserver. I am not clear on the relative cost of the corresponding serializations in the etcd server.

@wojtek-t (Member):

> I updated https://docs.google.com/document/d/1r7KQhVAM3gi6mgdf69lqLAYG_3aR1TAeWunpYnde7i0/edit?usp=sharing with a formula for the number of serializations and de-serializations in the apiserver. I am not clear on the relative cost of the corresponding serializations in the etcd server.

@MikeSpreitzer - can you please open the doc for comment access? I don't fully agree with some bits there but it would be easier to point those things if I had comment access.

@MikeSpreitzer (Member Author):

Comments authorized on that doc.

@wojtek-t (Member) commented Sep 1, 2021:

> Comments authorized on that doc.

Thank you.
I'm fairly sure that it looked different when I was looking at it yesterday. Anyway - I added some minor comments, but overall the current version looks reasonable.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 1, 2021
@MikeSpreitzer (Member Author):

The force-push to 3c07c0b8cb3 revises the estimator to follow the formula in the latest https://docs.google.com/document/d/1r7KQhVAM3gi6mgdf69lqLAYG_3aR1TAeWunpYnde7i0

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 2, 2021
@MikeSpreitzer (Member Author):

The force-push to 7a0d5dc5ee2 recognizes the one case where the watch cache does pagination, and updates the unit tests.

@MikeSpreitzer (Member Author):

/retest

1 similar comment
@MikeSpreitzer (Member Author):

/retest

limit := numStored
if utilfeature.DefaultFeatureGate.Enabled(features.APIListChunking) && listOptions.Limit > 0 &&
listOptions.Limit < numStored {
limit = listOptions.Limit
Member:

This isn't true for listing from cache - see my comment in the doc.

Member Author:

Oh wow, I hadn't noticed. That seems like pretty strange behavior to me. Is it a bug? Given that we are struggling with the costs of long list results, maybe the cacher should limit its response in this case?

Member Author:

Revised estimator to match current code.

Member:

> Is it a bug? Given that we are struggling with the costs of long list results, maybe the cacher should limit its response in this case?

It was intentional (although I agree it isn't intuitive).
The reasoning behind it is that in large clusters, when node agents were listing stuff from etcd (say, to restore after a control-plane outage, when those lists were somewhat coordinated across many nodes), it was completely killing etcd.
And given that listing from the watch cache is an explicit opt-in, ignoring the limit is slightly less risky: people need to explicitly opt in to that behavior, so hopefully they understand the consequences.

With P&F giving us better protection (once we have good list support) [and a couple of other things that have happened over the last few years], we may be able to revise it, but that requires much deeper testing and should be its own effort.

[FTR - once solved, we would be able to actually graduate https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/365-paginated-lists to stable, as the pagination aspect for the watch cache is currently the only blocker IIRC.]

estimatedObjectsToBeProcessed = count
}

if isListFromCache {
Member:

nit: I think that using a switch instead of multiple if / else if branches is a bit clearer:

switch {
case isListFromCache:
  ...
case listOptions.FieldSelector != "" ...:
  ...
default:
  ...
}

Member Author:

done
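
For illustration, the applied switch might look roughly like this. The branch bodies are assumptions inferred from the thread (cache lists and selector-filtered etcd lists touch every stored object, while an unfiltered paginated etcd list is bounded by the limit), not a quote of the merged code:

switch {
case isListFromCache:
	// Assumed: the watch cache fetches and filters every stored object.
	estimatedObjectsToBeProcessed = numStored
case listOptions.FieldSelector != "" || listOptions.LabelSelector != "":
	// Assumed: every object is de-serialized from etcd; only the matches are serialized.
	estimatedObjectsToBeProcessed = numStored + limit
default:
	// Assumed: limit objects de-serialized plus limit objects serialized.
	estimatedObjectsToBeProcessed = 2 * limit
}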

@MikeSpreitzer (Member Author):

The force-push to 6f160ca addresses the last two review comments above and updates the unit tests accordingly.

@wojtek-t (Member) commented Sep 3, 2021:

/lgtm
/approve

thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 3, 2021
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MikeSpreitzer, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pacoxu (Member) commented Sep 3, 2021:

/retest
flake #104057, which will be fixed in #104069

@k8s-triage-robot:

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does not have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@k8s-ci-robot k8s-ci-robot merged commit 7997805 into kubernetes:master Sep 3, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Sep 3, 2021
@MikeSpreitzer MikeSpreitzer deleted the proper-limit branch September 8, 2021 13:03
Successfully merging this pull request may close these issues.

Need to tweak LIST work estimator in API Priority and Fairness