Calculate CEL cost totals #108612
/triage accepted
@DangerOnTheRanger Just to make sure we're on the same page, I'm expecting that: per-expression estimated cost is: * per-CRD estimated cost: sum(per-expression estimated costs) I wanted to point this out because this PR is titled "per-CRD" but includes logic to calculate the number of times an expression can be evaluated.
Yeah, I think there was unfortunately some ambiguity there. I've renamed the PR so the changes make a bit more sense and have a bit more context, hopefully.
Hi @DangerOnTheRanger, when you rebase, please remove the TODO in compilation_test and update the cost limit to use the const
ac86a88 to f5b6d34
}
}

func getCRDCost(baseCost uint64, schemaNode *schemaTree) uint64 {
Maybe use a name different than "CRD cost"? This computes the cost of a single CEL expression, not the cost of all the CEL expressions in a CRD.
Yes, I think it could be more descriptive. I've renamed it to getExpressionCost - how does that sound?
37762dc to dbc87e3
/retest
@@ -981,6 +1026,61 @@ func ValidateCustomResourceDefinitionOpenAPISchema(schema *apiextensions.JSONSch
return allErrs
}

func extractMaxElements(schema *apiextensions.JSONSchemaProps) *uint64 {
what does a nil return mean?
godoc:
extractMaxElements returns the factor by which the schema increases the number of
possible data elements for its children. If the schema is a map with MaxProperties set, or an
array with MaxItems set, a pointer to that max value is returned.
If the schema is a map or array and does not have MaxProperties or MaxItems, nil is returned to indicate
that the current schema imposes no limit on the possible number of data elements.
If the schema is an object, 1 is returned to indicate that there
is no increase in the number of possible data elements for its children. Primitives
do not have children, but 1 is returned for simplicity in this case.
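The godoc above can be read as a small decision table. The sketch below follows it using a trimmed-down, hypothetical `schemaProps` struct rather than the real `apiextensions.JSONSchemaProps`. How maps are detected (a non-nil `AdditionalProperties`) and how negative limits are handled are assumptions, not taken from the PR.

```go
package main

import "fmt"

// schemaProps is a hypothetical, simplified stand-in for
// apiextensions.JSONSchemaProps with only the fields this sketch needs.
type schemaProps struct {
	Type                 string
	MaxItems             *int64
	MaxProperties        *int64
	AdditionalProperties *schemaProps // non-nil marks a map (assumption)
}

// toUint64Ptr converts an optional limit to *uint64. A missing or negative
// limit is reported as nil, meaning "no limit imposed".
func toUint64Ptr(v *int64) *uint64 {
	if v == nil || *v < 0 {
		return nil
	}
	u := uint64(*v)
	return &u
}

// extractMaxElements follows the godoc: arrays report MaxItems, maps report
// MaxProperties, and both return nil when the limit is unset. Plain objects
// and primitives return 1 because they do not multiply their children.
func extractMaxElements(s *schemaProps) *uint64 {
	switch {
	case s.Type == "array":
		return toUint64Ptr(s.MaxItems)
	case s.Type == "object" && s.AdditionalProperties != nil:
		return toUint64Ptr(s.MaxProperties)
	default:
		one := uint64(1)
		return &one
	}
}

func main() {
	limit := int64(20)
	bounded := &schemaProps{Type: "array", MaxItems: &limit}
	unbounded := &schemaProps{Type: "array"}
	fmt.Println(*extractMaxElements(bounded), extractMaxElements(unbounded) == nil)
}
```

Callers have to distinguish nil (unbounded) from a pointer to 1 (no multiplication), which is the distinction the review question asks about.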
// Note that this only assumes a single comma between data elements, so if the schema is contained under only maps,
// this estimates a higher cardinality than would be possible.
func MaxCardinality(s *schema.Structural) uint64 {
sz := estimateMinSizeJSON(s) + 1 // assume at least one comma between elements
is this being called in ways that will make us repeatedly recursively evaluate the size of a schema?
if I have a schema 40 nesting levels deep, and have a cel rule at each level, does this call:
- estimateMinSizeJSON(root) (traversing all child schemas to compute the min size of the root)
- estimateMinSizeJSON(level 1) (re-traversing all child schemas to compute the min size of level 1)
- ...
?
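To make the concern concrete, the sketch below counts node visits when a recursive size estimate is re-run at every level that carries a rule. The `node` type and `subtreeVisits` are hypothetical stand-ins for a structural schema and `estimateMinSizeJSON`; only the traversal pattern is modeled, not the real size arithmetic.

```go
package main

import "fmt"

// node is a hypothetical, minimal schema node. A single-child chain is
// enough to model a schema that is 40 nesting levels deep.
type node struct {
	hasRule bool
	child   *node
}

// subtreeVisits stands in for a recursive estimate such as
// estimateMinSizeJSON: it touches every node at or below n and reports how
// many nodes it visited.
func subtreeVisits(n *node) int {
	if n == nil {
		return 0
	}
	return 1 + subtreeVisits(n.child)
}

// naiveVisits re-runs the recursive estimate at every level that has a rule,
// which is the call pattern the question above asks about.
func naiveVisits(root *node) int {
	total := 0
	for n := root; n != nil; n = n.child {
		if n.hasRule {
			total += subtreeVisits(n)
		}
	}
	return total
}

// chain builds a depth-level chain with a rule at every level.
func chain(depth int) *node {
	var head *node
	for i := 0; i < depth; i++ {
		head = &node{hasRule: true, child: head}
	}
	return head
}

func main() {
	fmt.Println(naiveVisits(chain(40))) // 820 = 40 + 39 + ... + 1
}
```

For a chain of depth n with a rule at each level, the visit count grows as n(n+1)/2 rather than n.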
beyond the scope of this PR, but a similar question exists for other callers of estimateMinSizeJSON via SchemaDeclType / estimateMaxArrayItemsPerRequest / estimateMaxAdditionalPropertiesPerRequest
we want to make sure a deep schema with cel rules at the root and other levels isn't going to be super expensive to compute cost on
This general problem is fairly pervasive in the CRD validation side of things. While I was looking at how we could do more work as a post-traversal step to make the min calculations cheap, I noticed that we construct a new structural schema whenever we compile CEL programs, which is another case of us doing a recursive traversal at every level (in the worst case). We also convert those schemas to the "decl" format that CEL accepts in compile (which is another recursive traversal).
How would you feel about a beta task where we construct a benchmark that reproduces this problem well and then optimize it away? A traversal that starts at the first branch in the tree where a CEL rule is encountered, accumulates some base facts (like min sizes), and prepares the structural and "decls" schemas should allow for a lot of reuse. But it's a larger change.
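One possible shape for the "accumulate base facts in a single traversal" idea is a post-order walk that records each subtree's minimum size once, so later lookups are map reads instead of fresh recursions. Everything here (the `node` type, `ownSize`, the map keyed by node pointer) is a hypothetical illustration, not the approach the maintainers settled on.

```go
package main

import "fmt"

// node is a hypothetical schema node; ownSize is the minimum number of
// bytes this node contributes on its own (illustrative values only).
type node struct {
	ownSize  int
	children []*node
}

// minSizes performs one post-order traversal and caches the minimum size of
// every subtree. It also reports how many nodes were visited, to show that
// each node is touched exactly once.
func minSizes(root *node) (sizes map[*node]int, visits int) {
	sizes = make(map[*node]int)
	var walk func(n *node) int
	walk = func(n *node) int {
		visits++
		total := n.ownSize
		for _, c := range n.children {
			total += walk(c)
		}
		sizes[n] = total
		return total
	}
	if root != nil {
		walk(root)
	}
	return sizes, visits
}

func main() {
	leaf := &node{ownSize: 2}
	mid := &node{ownSize: 2, children: []*node{leaf}}
	root := &node{ownSize: 2, children: []*node{mid}}
	sizes, visits := minSizes(root)
	fmt.Println(sizes[root], sizes[mid], sizes[leaf], visits) // 6 4 2 3
}
```

With the cache in place, a rule at any level can look up its subtree's size without re-walking the children.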
sounds ok for beta, as long as the calculations that do recursive traversals in this PR and #108990 are behind the cel validation feature gate
Feedback applied on new commits.
d47025b to 78b4326
needs squash, and has an unused import compilation error
lgtm otherwise
@DangerOnTheRanger please visit the red CI jobs!
78b4326 to 7e66bd2
/approve
@jpbetz has final lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: DangerOnTheRanger, liggitt
/lgtm
/hold cancel
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project.
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR is a part of #107573, and adds support at the CRD level for CEL expression cost calculation as per the KEP, and emits an error message if the CRD CEL cost limit is exceeded. This PR builds off of #108419 by taking into account maxLength and associated fields when calculating the total maximum cost for a CRD's CEL expressions.
Which issue(s) this PR fixes:
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: