Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set QP resource defaults #14039

Merged
merged 2 commits into from Jul 7, 2023

Conversation

skonto
Copy link
Contributor

@skonto skonto commented May 31, 2023

Part of the work for #13861.

Proposed Changes

  • Sets the deployment cm resource values by default.
  • Defaults are useful to avoid quota issues and guide users of what is the recommended values to use

Release Note

A new flag is introduced `queueproxy.resource-defaults` that sets resource requests, limits for Queue Proxy when enabled (applies only to cpu and memory).

@knative-prow knative-prow bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. area/API API objects and controllers size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 31, 2023
@knative-prow knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 31, 2023
@skonto skonto requested a review from dprotaso May 31, 2023 10:42
@codecov
Copy link

codecov bot commented May 31, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (32ec382) 86.27% compared to head (82f917f) 86.27%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #14039   +/-   ##
=======================================
  Coverage   86.27%   86.27%           
=======================================
  Files         199      199           
  Lines       14795    14808   +13     
=======================================
+ Hits        12764    12776   +12     
- Misses       1730     1732    +2     
+ Partials      301      300    -1     
Impacted Files Coverage Δ
pkg/deployment/config.go 100.00% <ø> (ø)
pkg/apis/config/features.go 95.65% <100.00%> (+0.12%) ⬆️
pkg/reconciler/revision/resources/queue.go 98.42% <100.00%> (+0.05%) ⬆️

... and 5 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@skonto skonto changed the title [wip] Set QP resource defaults Set QP resource defaults Jun 1, 2023
@knative-prow knative-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 1, 2023
@skonto
Copy link
Contributor Author

skonto commented Jun 6, 2023

/assign @dprotaso

@xtreme-vikram-yadav
Copy link

lgtm

queue-sidecar-ephemeral-storage-request: "512Mi"

# Sets the queue proxy's ephemeral storage limit.
# If omitted, no value is specified and the system default is used.
# If omitted, a default value (currently "1024Mi"), is used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a setting that lets you 'unset' this value? ie. empty string? If so then we should add that to the comments

Copy link
Contributor Author

@skonto skonto Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Config map validation does not allow an empty string eg. queue-sidecar-memory-limit: "", as the quantity is parsed and checked:

error: configmaps "config-deployment" could not be patched: admission webhook "config.webhook.serving.knative.dev" denied the request: validation failed: failed to parse "queue-sidecar-memory-limit": quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP][-+]?[0-9])$'

The only way is not to set a value. However with this PR if the user enables the feature for QP defaults (planning to introduce a feature flag) he could only change them via the configmap but not unset them, unless I have a feature flag per resource type.
The latter is redundant as if they want partially to define defaults they can always set them explicitly in the corresponding configmap and disable this feature.
The idea is to enable this feature by default at some point as a good practice.

Copy link
Member

@dprotaso dprotaso Jun 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an explicit unset constant could be used to allow the default to be unset.

This could also be the way operators can retain the prior behaviour allowing us to safely change the default over two releases.

queue-sidecar-memory-limit: "800Mi"

# Sets the queue proxy's ephemeral storage request.
# If omitted, no value is specified and the system default is used.
# If omitted, a default value (currently "512Mi"), is used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we need this much storage - in theory we use nothing no?

Copy link
Contributor Author

@skonto skonto Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not sure about that either, there is an interesting history behind the validation of the quota resources: kubernetes/kubernetes#110956. Also ephemeral storage is not validated as reported in the ticket, explanation here. So I think its safe to remove (I will double check).

Copy link
Contributor Author

@skonto skonto Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some experimentation it seems to me that ephemeral storage should be skipped as there is no guarantee that someone will use it if set and can cause issues in a namespace where there is a hard limit on it via a resourceQuota.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From: #14039 (comment)

Maybe the default can just remain unset

queue-sidecar-memory-request: "400Mi"

# Sets the queue proxy's memory limit.
# If omitted, no value is specified and the system default is used.
# If omitted, a default value (currently "800Mi"), is used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these defaults still make sense? Also could setting these break existing users - ie. there's some queue-proxy using more than 800Mi - and now upgrading is causing crashes

@dprotaso
Copy link
Member

dprotaso commented Jun 7, 2023

Should we deprecate the % knob?

@ReToCode
Copy link
Member

ReToCode commented Jun 8, 2023

I agree with Daves comments, the current requests seem very high, as these are blocking resources that we probably do not use. I still have a todo open to get the real world scenario numbers for various components. This might help to find better defaults in a follow-up PR.

@dprotaso I do not see a way how this is not breaking. Vanilla K8S does not define any defaults, so we don't get any requests & limits today. We cannot not break that if we define something. Can we add it as a breaking change to the release notes?

Other than that, LGTM.

@evankanderson
Copy link
Member

@dprotaso I do not see a way how this is not breaking. Vanilla K8S does not define any defaults, so we don't get any requests & limits today. We cannot not break that if we define something. Can we add it as a breaking change to the release notes?

Dave's point was specifically about defining limits where they don't exist today. The k8s default value for limit is "unlimited" unless a value is set, so we're effectively reducing from ♾️ -> some finite value. This is probably good, particularly for a noncompressible resource like memory, but it could break someone who has a lot of really big requests buffered in-flight.

@ReToCode
Copy link
Member

ReToCode commented Jun 9, 2023

Dave's point was specifically about defining limits where they don't exist today.

Vanilla K8S does not define any defaults, so we don't get any requests & limits today

Exactly. This is why I think it will always be a breaking change. Can we do that when announcing it as breaking change in the release notes?

@dprotaso
Copy link
Member

dprotaso commented Jun 9, 2023

FWIW - it looks like downstream we're setting our defaults to

  queue-sidecar-cpu-request: "25m"
  queue-sidecar-cpu-limit: "1000m"
  queue-sidecar-memory-request: "50Mi"
  queue-sidecar-memory-limit: "200Mi"

Can we do that when announcing it as breaking change in the release notes?

Yeah. I also think we need a way to 'unset' this so we can apply this default to our KinD installations and give operators a way to 'preserve' past behaviour.

@skonto
Copy link
Contributor Author

skonto commented Jun 20, 2023

FWIW - it looks like downstream we're setting our defaults to...

I was just using the numbers downstream suggested by @norman465 which work at a bigger scale, but lower values could be more attractive for average scenarios, so I will use that.

Yeah. I also think we need a way to 'unset' this so we can apply this default to our KinD installations and give operators a way to 'preserve' past behaviour.

I will add a flag to make migration smoother as with secure-pod-defaults. I will add qp-resource-defaults if there is no objection. At some point we can just set this to "enabled" by default.

@dprotaso
Copy link
Member

I will add a flag to make migration smoother as with secure-pod-defaults. I will add qp-resource-defaults if there is no objection. At some point we can just set this to "enabled" by default.

Sounds good

@skonto skonto force-pushed the enforce_defaults_for_qp_resources branch from 84898f9 to 82f917f Compare July 5, 2023 07:03
@knative-prow knative-prow bot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 5, 2023
@skonto
Copy link
Contributor Author

skonto commented Jul 5, 2023

@dprotaso gentle ping. Status:

  • Added the flag for smooth transition, users can choose to disable the flag for a new release that turns it on by default.
  • Added some test scenarios
  • Updated the release note

@dprotaso
Copy link
Member

dprotaso commented Jul 7, 2023

/lgtm
/approve

@skonto can you update the release note to say it only applies to cpu and memory?

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Jul 7, 2023
@knative-prow
Copy link

knative-prow bot commented Jul 7, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprotaso, skonto

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 7, 2023
@knative-prow knative-prow bot merged commit b935bea into knative:main Jul 7, 2023
62 checks passed
skonto added a commit to skonto/serving that referenced this pull request Oct 5, 2023
* enforce qp resource defaults

* add flag and several fixes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/API API objects and controllers lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants