Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculate CEL cost totals #108612

Merged
merged 1 commit into from Mar 28, 2022

Conversation

DangerOnTheRanger
Copy link
Contributor

@DangerOnTheRanger DangerOnTheRanger commented Mar 9, 2022

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR is a part of #107573, and adds support at the CRD level for CEL expression cost calculation as per the KEP, and emits an error message if the CRD CEL cost limit is exceeded. This PR builds off of #108419 by taking into account maxLength and associated fields when calculating the total maximum cost for a CRD's CEL expressions.

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 9, 2022
@k8s-ci-robot k8s-ci-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 9, 2022
@fedebongio
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 10, 2022
@jpbetz
Copy link
Contributor

jpbetz commented Mar 12, 2022

@DangerOnTheRanger Just to make sure we're on the same page, I'm expecting that:

per-expression estimated cost is: *

per-CRD estimated cost: sum(per-expression estimated costs)

I wanted to point this out because this PR is titled "per-CRD" but includes logic to calculate number of times an expression can be evaluated.

@DangerOnTheRanger DangerOnTheRanger changed the title [WIP] Calculate per-CRD CEL cost [WIP] Calculate CEL cost totals Mar 14, 2022
@DangerOnTheRanger
Copy link
Contributor Author

Kermit Alexander II Just to make sure we're on the same page, I'm expecting that:

per-expression estimated cost is: *

per-CRD estimated cost: sum(per-expression estimated costs)

I wanted to point this out because this PR is titled "per-CRD" but includes logic to calculate number of times an expression can be evaluated.

Yeah, I think there was unfortunately some ambiguity there. I've renamed the PR so the changes make a bit more sense/have a bit more context, hopefully.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 14, 2022
@cici37
Copy link
Contributor

cici37 commented Mar 17, 2022

Hi @DangerOnTheRanger, when you rebase, please remove TODO in compilation_test and update the cost limit with const PerCallLimit. Thanks

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 18, 2022
}
}

func getCRDCost(baseCost uint64, schemaNode *schemaTree) uint64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use a name different than "CRD cost"? This computes the cost of a single CEL expression, not the cost of all the CEL expressions in a CRD.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it could be more descriptive. I've renamed it to getExpressionCost - how does that sound?

@DangerOnTheRanger DangerOnTheRanger force-pushed the cel-crd-maxlength branch 2 times, most recently from 37762dc to dbc87e3 Compare March 25, 2022 18:32
@DangerOnTheRanger
Copy link
Contributor Author

/retest

@@ -981,6 +1026,61 @@ func ValidateCustomResourceDefinitionOpenAPISchema(schema *apiextensions.JSONSch
return allErrs
}

func extractMaxElements(schema *apiextensions.JSONSchemaProps) *uint64 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does a nil return mean?

Copy link
Contributor

@jpbetz jpbetz Mar 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

godoc:

extractMaxElements returns the factor by which the schema increases the number of
possible data elements for its children.  If schema is a map and has MaxProperties or an
array has MaxItems, the int pointer of the max value is returned.
If schema is a map or array and does not have MaxProperties or MaxItems, nil is returned to indicate
that there is no limit to the possible number of data elements imposed by the current
schema.  If the schema is an object, 1 is returned to indicate that there
is not increase to the number of possible data elements for its children.  Primitives
do not have children, but 1 is returned for simplicity in this case.

// Note that this only assumes a single comma between data elements, so if the schema is contained under only maps,
// this estimates a higher cardinality that would be possible.
func MaxCardinality(s *schema.Structural) uint64 {
sz := estimateMinSizeJSON(s) + 1 // assume at least one comma between elements
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this being called in ways that will make us repeatedly recursively evaluate the size of a schema?

if I have a schema 40 nesting levels deep, and have a cel rule at each level, does this call:

  • estimateMinSizeJSON(root) (traversing all child schemas to compute the min size of the root)
  • estimateMinSizeJSON(level 1) (re-traversing all child schemas to compute the min size of level 1)
  • ...
    ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beyond the scope of this PR, but a similar question exists for other callers of estimateMinSizeJSON via SchemaDeclType / estimateMaxArrayItemsPerRequest / estimateMaxAdditionalPropertiesPerRequest

we want to make sure a deep schema with cel rules at the root and other levels isn't going to be super expensive to compute cost on

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This general problem is fairly pervasive in the CRD validation side of things. While I was looking at how we could do more work as a post traversal step to make the min calculations cheap, I noticed that we construct a new structural schema whenever we compile CEL programs, which is another case of use doing a recursive traversal at every level (for the worst case). We also convert those schemas to the "decl" format that CEL accepts in compile (which is another recursive traversal).

How would you feel about a beta task where we construct a benchmark that reproduces this problem well and then optimize it away? A traversal that starts at the first branch in the tree where a CEL rule encountered that accumulates some base facts (like min sizes), and prepares the structural and "decls" schemas should allow for a lot of reuse. But it's a larger change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds ok for beta, as long as the calculations that do recursive traversals in this PR and #108990 are behind the cel validation feature gate

@liggitt liggitt added this to the v1.24 milestone Mar 26, 2022
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 27, 2022
@jpbetz
Copy link
Contributor

jpbetz commented Mar 27, 2022

Feedback applied on new commits.

@liggitt
Copy link
Member

liggitt commented Mar 27, 2022

needs squash, and has an unused import compilation error:

staging/src/k8s.io/apiextensions-apiserver/pkg/apiserver/schema/cel/compilation_test.go:21:2: "math" imported but not used (typecheck)
	"math"
	^

@liggitt
Copy link
Member

liggitt commented Mar 27, 2022

lgtm otherwise

@dims
Copy link
Member

dims commented Mar 28, 2022

@DangerOnTheRanger please visit the red CI jobs!

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 28, 2022
@liggitt
Copy link
Member

liggitt commented Mar 28, 2022

/approve
/retest

@jpbetz has final lgtm

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: DangerOnTheRanger, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 28, 2022
@jpbetz
Copy link
Contributor

jpbetz commented Mar 28, 2022

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 28, 2022
@jpbetz
Copy link
Contributor

jpbetz commented Mar 28, 2022

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 28, 2022
@k8s-triage-robot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants