Update CEL runtime cost limit #108595

Merged on Mar 15, 2022 (1 commit)

cici37
Contributor

@cici37 cici37 commented Mar 8, 2022

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR updates the CEL runtime cost limit.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot added the release-note, kind/feature, size/L, cncf-cla: yes, do-not-merge/needs-sig, needs-triage, needs-priority, and sig/api-machinery labels on Mar 8, 2022
@k8s-ci-robot removed the do-not-merge/needs-sig label on Mar 8, 2022
@fedebongio
Contributor

/cc @jpbetz
/triage accepted

@k8s-ci-robot added the triage/accepted and release-note-none labels and removed the needs-triage and release-note labels on Mar 8, 2022
@jpbetz
Contributor

jpbetz commented Mar 8, 2022

/lgtm

This is roughly what I was expecting for an initial limit given our current heuristic that 1 cost ~= 50ns. It seems high enough that it should be primarily a backstop for runaway execution.

cc @liggitt @DangerOnTheRanger I expect we will refine this number before the release, but does this seem like a good starting point to you both?

@k8s-ci-robot added the lgtm label on Mar 8, 2022
@DangerOnTheRanger
Contributor

DangerOnTheRanger commented Mar 9, 2022

Yeah, especially since we should have ample time to change that limit if need be, it seems like a good place to start to me.

Comment on lines 44 to 50
// perCallLimit specifies the actual cost limit per CEL validation call.
// The current perCallLimit gives roughly one second for each expression validation call.
perCallLimit = 20000000

// RuntimeCELCostBudget is the overall cost budget for runtime CEL validation per CustomResource.
// The current RuntimeCELCostBudget gives roughly 10 seconds for CR validation.
RuntimeCELCostBudget = 200000000
Member

since these represent dedicated CPU time, both of these are about an order of magnitude higher than I expected... are we really ok with 10 seconds of devoted CPU time per custom resource write validation?

Contributor Author

This number is fairly high right now. I was thinking of starting with a higher number and reducing it based on performance runs. Do you have any suggestions on how much CPU time a custom resource should be allowed to consume?

Member

@liggitt liggitt Mar 11, 2022

at most, I would expect fractions of a second of dedicated CPU per call and less than a second of dedicated CPU for validating the entire resource. I wouldn't start any higher than that, and would actually like to see that ramp even further down as we prove you can do significant amounts of complex validation with much lower limits. I'd probably drop an order of magnitude from each of these to start.

Contributor Author

I have updated the PR to start with 2000000 for PerCallLimit and 20000000 for RuntimeCELCostBudget. cc @jpbetz @DangerOnTheRanger

@@ -92,34 +93,55 @@ func validator(s *schema.Structural, isResourceRoot bool) *Validator {
}

// Validate validates all x-kubernetes-validations rules in Validator against obj and returns any errors.
func (s *Validator) Validate(fldPath *field.Path, sts *schema.Structural, obj interface{}) field.ErrorList {
Member

I'm ignoring changes to this file and assuming this will be rebased on top of #108482

Contributor Author

Yes, it is going to be rebased on #108482. This PR is only for updating the budget. Thanks

@k8s-ci-robot added the kind/api-change label and removed the lgtm label on Mar 14, 2022
@cici37
Contributor Author

cici37 commented Mar 14, 2022

/retest

@cici37
Contributor Author

cici37 commented Mar 14, 2022

/test pull-kubernetes-e2e-kind-ipv6

@k8s-triage-robot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@liggitt
Member

liggitt commented Mar 14, 2022

these numbers look like plausible starting points... is there a tracking issue or spreadsheet item for finalizing these for 1.24?

@k8s-ci-robot added the size/XS label and removed the size/L label on Mar 14, 2022
@cici37
Contributor Author

cici37 commented Mar 14, 2022

> these numbers look like plausible starting points... is there a tracking issue or spreadsheet item for finalizing these for 1.24?

Here is the umbrella issue for tracking: #107573
Here is the spreadsheet for tracking the progress

@liggitt
Member

liggitt commented Mar 15, 2022

/lgtm
/approve
/retest

@k8s-ci-robot added the lgtm label on Mar 15, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cici37, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label on Mar 15, 2022
@k8s-triage-robot

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does not have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

Contributor

@yangjunmyfm192085 left a comment

/retest

Labels: approved, cncf-cla: yes, kind/api-change, kind/feature, lgtm, needs-priority, release-note-none, sig/api-machinery, size/XS, triage/accepted
8 participants