perf: Add schema caching to parameter validation#261
perf: Add schema caching to parameter validation#261vmarceau wants to merge 4 commits intopb33f:mainfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #261 +/- ##
==========================================
+ Coverage 97.94% 97.98% +0.03%
==========================================
Files 64 64
Lines 6525 6647 +122
==========================================
+ Hits 6391 6513 +122
Misses 133 133
Partials 1 1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
6697da8 to
526d925
Compare
|
Awesome! this is great! |
| // Store in cache for future requests | ||
| if o != nil && o.SchemaCache != nil && schema != nil && schema.GoLow() != nil { | ||
| hash := schema.GoLow().Hash() | ||
| o.SchemaCache.Store(hash, &cache.SchemaCacheEntry{ |
There was a problem hiding this comment.
This writes a partial SchemaCacheEntry from ValidateSingleParameterSchema with only RenderedJSON and CompiledSchema. The body validators treat any cached compiled schema as a fully-populated entry and reuse RenderedInline, ReferenceSchema, and RenderedNode . A successful parameter validation of a shared schema can poison later request/response body errors.
There was a problem hiding this comment.
My bad, thanks for catching this. Fixed it in 7271dd3.
| // Try cache lookup first | ||
| if opts != nil && opts.SchemaCache != nil && schema.GoLow() != nil { | ||
| hash := schema.GoLow().Hash() | ||
| if cached, ok := opts.SchemaCache.Load(hash); ok && cached != nil && len(cached.RenderedJSON) > 0 { |
There was a problem hiding this comment.
Makes GetRenderedSchema stateful. Cache hits return raw JSON via string(cached.RenderedJSON), cache misses call json.Marshal on the []byte from RenderInline()
While testing, The first invalid request returned
ReferenceSchema="\"dHlwZTogInN0cmluZyIKZW51bToKICAgIC0gImEiCiAgICAtICJiIgo=\"",
then after one successful request warmed the cache, the same invalid request returned
ReferenceSchema="{\"enum\":[\"a\",\"b\"],\"type\":\"string\"}".
Same input should not change error payloads based on cache state.
There was a problem hiding this comment.
Thanks for catching this!
This actually sent me down the rabbit hole regarding the format of the SchemaValidationFailure.ReferenceSchema attributes on validation errors....
It looks like for most validation errors, such as request body validation errors (REF), we are using the value of RenderInline(), which is plain YAML text. However, in the case of parameter validation, we apply json.Marshal() on this data, which produces some base64 encoded value like you have in your comment example. Correct me if I'm wrong since I might be missing some context here, but looking at the codebase I could not find any rationale for this json.Marshal() transform on YAML data for populating ReferenceSchema on validation errors for parameter validation only. It feels to me this is a pre-existing bug.
So coming back to your original comment, I did two things:
- I fixed the inconsistency in the return value of
GetRenderedSchemabased on the state of the cache (whether its a cache hit or miss will return the same value). - I removed the
json.Marshal()call to align the format of theSchemaValidationFailure.ReferenceSchemafor parameter validation errors with the rest of the codebase. NowReferenceSchemais a plain YAML string, like it is for e.g. request body validation errors.
This was implemented in 97ddd74.
daveshanley
left a comment
There was a problem hiding this comment.
Looks good overall, but I found two issues that should be fixed before merge:
-
ValidateSingleParameterSchemais writing partialSchemaCacheEntryvalues into the shared cache (RenderedJSON+CompiledSchemaonly). Later request/response body validation treats any entry withCompiledSchema != nilas a full cache hit, which can stripReferenceSchema/RenderedInlinefrom body-validation errors when the same schema is shared. Please either always write a fullSchemaCacheEntryhere, or have body validators
treat incomplete entries as cache misses. -
GetRenderedSchemais not deterministic right now: cache hits return raw JSON, but cache misses return a different marshaled form fromRenderInline(). That means the same validation error can produce different
ReferenceSchemapayloads depending on cache state. Please make both paths serialize the schema the same way.
70454de to
97ddd74
Compare
|
Many thanks for the careful review @daveshanley 🙇 I have addressed each of your comments (see reply below each) and pushed the fixes. |
| if err == nil && rendered != nil { | ||
| renderedBytes, _ := json.Marshal(rendered) | ||
| fail.ReferenceSchema = string(renderedBytes) | ||
| fail.ReferenceSchema = string(rendered) |
Summary
This PR fixes a performance issue where parameter validation (path, query, header, cookie) was not utilizing the schema cache, causing expensive schema re-compilation and re-rendering on every request. Request body validation already used caching, but parameter validation was inadvertently left out.
As shown below, this yields significant performance improvements for parameter validation.
Related issue: this problem was already mentioned in recent issue #227
Problem
When using
libopenapi-validatorfor HTTP request validation, we observed that parameter validation was significantly slower than request body validation.To isolate and demonstrate this issue, we designed benchmarks with two separate endpoints:
Test Specification
Benchmarks and profiling
Running the benchmark code below on an Apple M2 Max shows that parameter validation is significantly slower than body validation:
Running CPU profiling on parameter validation revealed the following breakdown:
The
syscalloverhead comes from schema re-compilation on every request. The jsonschema library'sAddResource()function callsfilepath.Abs()for non-URL resource names, which triggers anos.Getwd()syscall.Benchmark Code
Root Cause Analysis
Slow performance for parameter validation originate from 2 issues:
ValidateSingleParameterSchemaandValidateParameterSchemainparameters/validate_parameter.go) do not check theSchemaCachebefore compiling schemas, despite this cache being warmed up at validator initialization viawarmParameterSchema().RenderInline()is being called on every validation to prepare schema JSON for potential error messages. These calls happen regardless of whether validation errors occurred, adding unnecessary overhead.Testing Results
Running the benchmarks again on an Apple M2 Max after the fix introduced in this PR shows significant performance improvements:
Observations: