perf: Add schema caching to parameter validation #261

Open
vmarceau wants to merge 4 commits into pb33f:main from vmarceau:vmarceau/cache-lookup-for-param-validation

@vmarceau vmarceau commented Apr 13, 2026

Summary

This PR fixes a performance issue where parameter validation (path, query, header, cookie) was not utilizing the schema cache, causing expensive schema re-compilation and re-rendering on every request. Request body validation already used caching, but parameter validation was inadvertently left out.

As the benchmarks below show, the fix makes parameter validation roughly 7x to 22x faster.

Related issue: this problem was already reported in #227.

Problem

When using libopenapi-validator for HTTP request validation, we observed that parameter validation was significantly slower than request body validation.

To isolate and demonstrate this issue, we designed benchmarks with two separate endpoints:

  • POST /users - validates only a request body (no path parameters)
  • GET /users/{userId}/posts - validates only parameters (path, query, header)

Test Specification

openapi: "3.1.0"
info:
  title: Cache Performance Test API
  version: "1.0.0"
paths:
  /users:
    post:
      operationId: createUser
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [name, email]
              properties:
                name:
                  type: string
                  minLength: 1
                  maxLength: 100
                email:
                  type: string
                  format: email
                  maxLength: 256
                age:
                  type: integer
                  minimum: 0
                  maximum: 150
                bio:
                  type: string
                  maxLength: 1000
      responses:
        "201":
          description: Created
  /users/{userId}/posts:
    get:
      operationId: getUserPosts
      parameters:
        - name: userId
          in: path
          required: true
          schema:
            type: string
            minLength: 1
            maxLength: 64
        - name: search
          in: query
          schema:
            type: string
            maxLength: 256
        - name: page
          in: query
          schema:
            type: integer
            minimum: 0
            maximum: 1000
        - name: limit
          in: query
          schema:
            type: integer
            minimum: 1
            maximum: 100
        - name: published
          in: query
          schema:
            type: boolean
        - name: sort_by
          in: query
          schema:
            type: string
            enum: ["created_at", "updated_at", "title"]
        - name: order
          in: query
          schema:
            type: string
            enum: ["asc", "desc"]
        - name: X-Request-ID
          in: header
          schema:
            type: string
            maxLength: 64
      responses:
        "200":
          description: OK

Benchmarks and profiling

Running the benchmark code below on an Apple M2 Max shows that parameter validation is significantly slower than body validation:

| Benchmark             | Before (time / allocs) |
|-----------------------|------------------------|
| body_only (BASELINE)  | 19 µs / 133 allocs     |
| path_only             | 148 µs / 935 allocs    |
| path_and_query        | 746 µs / 5,086 allocs  |
| path_query_and_header | 790 µs / 5,816 allocs  |

Running CPU profiling on parameter validation revealed the following breakdown:

~35% - os.Getwd() / syscall.Stat / filepath.Abs
       Called from: jsonschema/v6.(*Compiler).AddResource
       Called from: helpers.NewCompiledSchema
       Called from: parameters.ValidateSingleParameterSchema

~32% - RenderInline() calls for error message preparation
       Called from: parameters/query_parameters.go
       Called from: parameters/path_parameters.go
       Called from: parameters/header_parameters.go
       Called from: parameters/cookie_parameters.go

~11% - GC pressure from high allocation rate

~22% - Actual schema compilation and validation work

The syscall overhead comes from schema re-compilation on every request. The jsonschema library's AddResource() function calls filepath.Abs() for non-URL resource names, which triggers an os.Getwd() syscall.

Benchmark Code
package validator

import (
	"bytes"
	"net/http"
	"net/http/httptest"
	"testing"

	"github.com/pb33f/libopenapi"
)

// testSpecForCacheBenchmark is an OpenAPI spec designed to isolate:
// - Request body validation (POST /users with body only, no path params)
// - Parameter validation (GET /users/{userId}/posts with path, query, header params)
var testSpecForCacheBenchmark = []byte(`
openapi: "3.1.0"
info:
  title: Cache Performance Test API
  version: "1.0.0"
paths:
  /users:
    post:
      operationId: createUser
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - name
                - email
              properties:
                name:
                  type: string
                  minLength: 1
                  maxLength: 100
                email:
                  type: string
                  format: email
                  maxLength: 256
                age:
                  type: integer
                  minimum: 0
                  maximum: 150
                bio:
                  type: string
                  maxLength: 1000
      responses:
        "201":
          description: Created
  /users/{userId}/posts:
    get:
      operationId: getUserPosts
      parameters:
        - name: userId
          in: path
          required: true
          schema:
            type: string
            minLength: 1
            maxLength: 64
        - name: search
          in: query
          schema:
            type: string
            maxLength: 256
        - name: page
          in: query
          schema:
            type: integer
            minimum: 0
            maximum: 1000
        - name: limit
          in: query
          schema:
            type: integer
            minimum: 1
            maximum: 100
        - name: published
          in: query
          schema:
            type: boolean
        - name: sort_by
          in: query
          schema:
            type: string
            enum: ["created_at", "updated_at", "title"]
        - name: order
          in: query
          schema:
            type: string
            enum: ["asc", "desc"]
        - name: X-Request-ID
          in: header
          schema:
            type: string
            maxLength: 64
      responses:
        "200":
          description: OK
`)

// BenchmarkRequestBodyValidation benchmarks request body validation in isolation.
// Uses POST /users which has a body but NO path parameters.
func BenchmarkRequestBodyValidation(b *testing.B) {
	doc, err := libopenapi.NewDocument(testSpecForCacheBenchmark)
	if err != nil {
		b.Fatalf("failed to create document: %v", err)
	}

	v, errs := NewValidator(doc)
	if len(errs) > 0 {
		b.Fatalf("failed to create validator: %v", errs)
	}

	requestBody := []byte(`{
		"name": "John Doe",
		"email": "john.doe@example.com",
		"age": 30,
		"bio": "Software engineer interested in distributed systems."
	}`)

	b.ReportAllocs()
	b.ResetTimer()

	for b.Loop() {
		req := httptest.NewRequest(http.MethodPost, "/users", bytes.NewReader(requestBody))
		req.Header.Set("Content-Type", "application/json")
		v.ValidateHttpRequest(req)
	}
}

// BenchmarkParameterValidation benchmarks parameter validation in isolation.
// Uses GET /users/{userId}/posts with varying combinations of path, query, and header params.
func BenchmarkParameterValidation(b *testing.B) {
	doc, err := libopenapi.NewDocument(testSpecForCacheBenchmark)
	if err != nil {
		b.Fatalf("failed to create document: %v", err)
	}

	v, errs := NewValidator(doc)
	if len(errs) > 0 {
		b.Fatalf("failed to create validator: %v", errs)
	}

	tests := []struct {
		name      string
		url       string
		addHeader bool
	}{
		{
			name:      "path_only",
			url:       "/users/123/posts",
			addHeader: false,
		},
		{
			name:      "path_and_query",
			url:       "/users/123/posts?search=test&page=2&limit=25&published=true&sort_by=created_at&order=desc",
			addHeader: false,
		},
		{
			name:      "path_query_and_header",
			url:       "/users/123/posts?search=test&page=2&limit=25&published=true&sort_by=created_at&order=desc",
			addHeader: true,
		},
	}

	for _, tc := range tests {
		b.Run(tc.name, func(b *testing.B) {
			b.ReportAllocs()
			b.ResetTimer()

			for b.Loop() {
				req := httptest.NewRequest(http.MethodGet, tc.url, nil)
				if tc.addHeader {
					req.Header.Set("X-Request-ID", "test-request-123")
				}
				v.ValidateHttpRequest(req)
			}
		})
	}
}

Root Cause Analysis

The slow parameter validation originates from two issues:

  1. Schema compilation cache not used: The parameter validation functions (ValidateSingleParameterSchema and ValidateParameterSchema in parameters/validate_parameter.go) do not check the SchemaCache before compiling schemas, despite this cache being warmed up at validator initialization via warmParameterSchema().
  2. Schema rendering for error messages: Even with compilation caching, RenderInline() is being called on every validation to prepare schema JSON for potential error messages. These calls happen regardless of whether validation errors occurred, adding unnecessary overhead.
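
The two issues above suggest a general pattern: compile once per schema hash, and render the schema only when an error actually needs it. The sketch below illustrates that pattern in isolation. Every name in it (schemaCache, compiledSchema, validate) is hypothetical and does not mirror the library's SchemaCache API or this PR's implementation.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// compiledSchema stands in for a compiled JSON Schema validator.
type compiledSchema struct {
	maxLength int
}

// schemaCache memoises compiled schemas by hash (hypothetical type).
type schemaCache struct {
	entries  sync.Map     // hash (string) -> *compiledSchema
	compiles atomic.Int64 // counts expensive compilations, for the demo
}

// load returns the cached schema for hash and compiles only on a miss.
// Two goroutines racing on the same miss may both compile; that is
// acceptable for a sketch because the results are equivalent.
func (c *schemaCache) load(hash string, compile func() *compiledSchema) *compiledSchema {
	if v, ok := c.entries.Load(hash); ok {
		return v.(*compiledSchema)
	}
	c.compiles.Add(1)
	s := compile()
	c.entries.Store(hash, s)
	return s
}

// validate checks value against the schema. render is invoked only when
// validation fails, so the success path never pays for rendering.
func validate(c *schemaCache, hash, value string, render func() string) error {
	s := c.load(hash, func() *compiledSchema { return &compiledSchema{maxLength: 8} })
	if len(value) > s.maxLength {
		return fmt.Errorf("value %q is too long; schema: %s", value, render())
	}
	return nil
}

func main() {
	cache := &schemaCache{}
	renders := 0
	render := func() string { renders++; return `{"maxLength":8}` }

	for i := 0; i < 3; i++ {
		_ = validate(cache, "abc123", "ok", render)
	}
	err := validate(cache, "abc123", "far-too-long", render)
	fmt.Println(cache.compiles.Load(), renders, err != nil) // prints: 1 1 true
}
```

Three successful validations trigger one compilation and no rendering; only the failing request pays for the render.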

Testing Results

Running the benchmarks again on an Apple M2 Max after the fix introduced in this PR shows significant performance improvements:

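| Benchmark             | Before                 | After               | Speedup |
|-----------------------|------------------------|---------------------|---------|
| body_only (BASELINE)  | 19 µs / 133 allocs     | 19 µs / 133 allocs  | -       |
| path_only             | 148 µs / 935 allocs    | 21 µs / 177 allocs  | 7x      |
| path_and_query        | 746 µs / 5,086 allocs  | 34 µs / 357 allocs  | 22x     |
| path_query_and_header | 790 µs / 5,816 allocs  | 38 µs / 382 allocs  | 21x     |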

Observations:

  • Request body validation was already fast (~19 µs) and remains unchanged
  • Before this fix, even validating a single path parameter was 8x slower than validating an entire JSON body
  • After this fix, parameter validation performance is comparable to request body validation
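
For anyone who wants to check the ratios, the speedup figures are simply the before/after timings rounded to the nearest integer. The speedup helper below is illustrative only; the numbers are copied from the table above.

```go
package main

import (
	"fmt"
	"math"
)

// speedup returns before/after rounded to the nearest whole multiple.
func speedup(beforeMicros, afterMicros float64) int {
	return int(math.Round(beforeMicros / afterMicros))
}

func main() {
	fmt.Println(speedup(148, 21)) // path_only: 7
	fmt.Println(speedup(746, 34)) // path_and_query: 22
	fmt.Println(speedup(790, 38)) // path_query_and_header: 21
	fmt.Println(speedup(148, 19)) // path_only before the fix vs. body_only baseline: 8
}
```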

codecov bot commented Apr 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.98%. Comparing base (8828f82) to head (97ddd74).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #261      +/-   ##
==========================================
+ Coverage   97.94%   97.98%   +0.03%     
==========================================
  Files          64       64              
  Lines        6525     6647     +122     
==========================================
+ Hits         6391     6513     +122     
  Misses        133      133              
  Partials        1        1              
Flag Coverage Δ
unittests 97.98% <100.00%> (+0.03%) ⬆️

@vmarceau vmarceau force-pushed the vmarceau/cache-lookup-for-param-validation branch from 6697da8 to 526d925 Compare April 13, 2026 15:55
@vmarceau vmarceau marked this pull request as ready for review April 13, 2026 17:13
@daveshanley
Member

Awesome! This is great!

// Store in cache for future requests
if o != nil && o.SchemaCache != nil && schema != nil && schema.GoLow() != nil {
	hash := schema.GoLow().Hash()
	o.SchemaCache.Store(hash, &cache.SchemaCacheEntry{
@daveshanley (Member)

This writes a partial SchemaCacheEntry from ValidateSingleParameterSchema with only RenderedJSON and CompiledSchema. The body validators treat any cached compiled schema as a fully populated entry and reuse RenderedInline, ReferenceSchema, and RenderedNode. A successful parameter validation of a shared schema can therefore poison later request/response body errors.

@vmarceau (Author)

My bad, thanks for catching this. Fixed it in 7271dd3.
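
One defensive way to avoid this kind of poisoning is to treat incomplete entries as cache misses. The sketch below illustrates that idea only; whether 7271dd3 takes this route is not stated in this thread. The cacheEntry struct is a simplified, hypothetical stand-in: the field names follow the ones mentioned in the review, but the types and the string hash key are guesses.

```go
package main

import "fmt"

// cacheEntry is a hypothetical, simplified stand-in for a schema cache entry.
type cacheEntry struct {
	RenderedInline  []byte
	ReferenceSchema string
	RenderedJSON    []byte
	CompiledSchema  any
	RenderedNode    any
}

// isComplete reports whether every field a body validator reads is populated.
func (e *cacheEntry) isComplete() bool {
	return e != nil &&
		e.CompiledSchema != nil &&
		e.RenderedNode != nil &&
		len(e.RenderedInline) > 0 &&
		len(e.RenderedJSON) > 0 &&
		e.ReferenceSchema != ""
}

// lookup treats incomplete entries as misses so that callers rebuild them
// instead of reusing half-filled data.
func lookup(cache map[string]*cacheEntry, hash string) (*cacheEntry, bool) {
	e, ok := cache[hash]
	if !ok || !e.isComplete() {
		return nil, false
	}
	return e, true
}

func main() {
	cache := map[string]*cacheEntry{
		// Partial entry, as a parameter validator might have written it.
		"partial": {RenderedJSON: []byte(`{"type":"string"}`), CompiledSchema: struct{}{}},
		// Fully populated entry.
		"full": {
			RenderedInline:  []byte("type: string\n"),
			ReferenceSchema: "type: string\n",
			RenderedJSON:    []byte(`{"type":"string"}`),
			CompiledSchema:  struct{}{},
			RenderedNode:    struct{}{},
		},
	}
	_, okPartial := lookup(cache, "partial")
	_, okFull := lookup(cache, "full")
	fmt.Println(okPartial, okFull) // prints: false true
}
```

The alternative is to make every writer populate all fields, which keeps lookups simple at the cost of more work on the first miss.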

Comment thread parameters/validate_parameter.go Outdated
// Try cache lookup first
if opts != nil && opts.SchemaCache != nil && schema.GoLow() != nil {
	hash := schema.GoLow().Hash()
	if cached, ok := opts.SchemaCache.Load(hash); ok && cached != nil && len(cached.RenderedJSON) > 0 {
@daveshanley (Member)

This makes GetRenderedSchema stateful: cache hits return raw JSON via string(cached.RenderedJSON), whereas cache misses call json.Marshal on the []byte from RenderInline().

While testing, the first invalid request returned

ReferenceSchema="\"dHlwZTogInN0cmluZyIKZW51bToKICAgIC0gImEiCiAgICAtICJiIgo=\"", 

then after one successful request warmed the cache, the same invalid request returned

ReferenceSchema="{\"enum\":[\"a\",\"b\"],\"type\":\"string\"}". 

The same input should not produce different error payloads depending on cache state.
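
The base64 payload in the first response is what encoding/json produces for any []byte value. The minimal sketch below reproduces it with the standard library alone. The marshalRendered helper is hypothetical, and the YAML sample is simply the decoded form of the payload quoted above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalRendered is a hypothetical helper that mirrors the cache-miss path:
// json.Marshal encodes a []byte as a base64 string wrapped in quotes.
func marshalRendered(rendered []byte) string {
	out, err := json.Marshal(rendered)
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	// YAML text of the kind RenderInline() returns.
	rendered := []byte("type: \"string\"\nenum:\n    - \"a\"\n    - \"b\"\n")

	// Marshaling the bytes yields a quoted base64 string, not readable YAML.
	fmt.Println(marshalRendered(rendered))

	// A plain conversion keeps the YAML text intact.
	fmt.Print(string(rendered))
}
```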

@vmarceau (Author)

Thanks for catching this!

This actually sent me down a rabbit hole regarding the format of the SchemaValidationFailure.ReferenceSchema attribute on validation errors.

It looks like for most validation errors, such as request body validation errors (REF), we use the value of RenderInline(), which is plain YAML text. For parameter validation, however, we apply json.Marshal() to this data, which produces a base64-encoded value like the one in your example. Correct me if I'm wrong, since I might be missing some context, but I could not find any rationale in the codebase for applying json.Marshal() to YAML data when populating ReferenceSchema for parameter validation errors only. This looks to me like a pre-existing bug.

So coming back to your original comment, I did two things:

  • I fixed the inconsistency in the return value of GetRenderedSchema: it now returns the same value whether the lookup is a cache hit or a miss.
  • I removed the json.Marshal() call to align the format of the SchemaValidationFailure.ReferenceSchema for parameter validation errors with the rest of the codebase. Now ReferenceSchema is a plain YAML string, like it is for e.g. request body validation errors.

This was implemented in 97ddd74.
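
As a rough illustration of the first point, a lookup is deterministic when the hit and miss paths convert the same bytes in the same way. The sketch below is not the code from 97ddd74; renderedCache and renderedSchema are hypothetical names, and the string hash key is an assumption.

```go
package main

import "fmt"

// renderedCache maps a schema hash to its rendered YAML (hypothetical type).
type renderedCache map[string][]byte

// renderedSchema returns the same string on a hit and on a miss: both paths
// convert the rendered bytes with string() and never go through json.Marshal.
func renderedSchema(c renderedCache, hash string, render func() ([]byte, error)) string {
	if b, ok := c[hash]; ok && len(b) > 0 {
		return string(b) // cache hit
	}
	b, err := render()
	if err != nil || len(b) == 0 {
		return ""
	}
	c[hash] = b // warm the cache with exactly the bytes that are returned
	return string(b)
}

func main() {
	cache := renderedCache{}
	render := func() ([]byte, error) { return []byte("type: string\n"), nil }

	miss := renderedSchema(cache, "abc123", render) // first call: cache miss
	hit := renderedSchema(cache, "abc123", render)  // second call: cache hit
	fmt.Println(miss == hit)                        // prints: true
}
```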

@daveshanley (Member) left a comment

Looks good overall, but I found two issues that should be fixed before merge:

  • ValidateSingleParameterSchema is writing partial SchemaCacheEntry values into the shared cache (RenderedJSON + CompiledSchema only). Later request/response body validation treats any entry with CompiledSchema != nil as a full cache hit, which can strip ReferenceSchema / RenderedInline from body-validation errors when the same schema is shared. Please either always write a full SchemaCacheEntry here, or have body validators treat incomplete entries as cache misses.

  • GetRenderedSchema is not deterministic right now: cache hits return raw JSON, but cache misses return a different marshaled form from RenderInline(). That means the same validation error can produce different ReferenceSchema payloads depending on cache state. Please make both paths serialize the schema the same way.

@vmarceau vmarceau force-pushed the vmarceau/cache-lookup-for-param-validation branch from 70454de to 97ddd74 Compare April 15, 2026 17:14
@vmarceau (Author)

Many thanks for the careful review @daveshanley 🙇 I have addressed each of your comments (see reply below each) and pushed the fixes.

@vmarceau vmarceau requested a review from daveshanley April 15, 2026 17:36
  if err == nil && rendered != nil {
- 	renderedBytes, _ := json.Marshal(rendered)
- 	fail.ReferenceSchema = string(renderedBytes)
+ 	fail.ReferenceSchema = string(rendered)