🐛 bug: hash QUERY cache body keys#4459
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughCache middleware helpers are split into dedicated files for cache-control parsing, key generation, freshness utilities, and Vary handling, with related tests added for key collisions and reserved-prefix hashing. ChangesCache middleware split
SSE newline fast path
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies" Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4459 +/- ##
==========================================
+ Coverage 91.70% 91.73% +0.02%
==========================================
Files 134 138 +4
Lines 13480 13486 +6
==========================================
+ Hits 12362 12371 +9
+ Misses 707 705 -2
+ Partials 411 410 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Hash any key-segment value that already begins with the reserved "sha256:" namespace prefix, in addition to oversized segments, so a short literal "sha256:..." value cannot collide with a genuinely-hashed long segment. This is defense-in-depth at the helper level: every call site already escapes ":" via escapeKeyDelimiters, but the bounding helpers are now self-protecting regardless. Factor the prefix into a shared hashPrefix constant reused across boundKeySegment, appendBoundKeySegment, and appendHashedKeySegment, and add a unit test covering re-hashing of reserved-prefix values, verbatim passthrough of normal values, and no over-hashing of "sha256" (no colon). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01QVm2BnJwqp81uFAmo7Ltog
defaultKeyGenerator hashed every QUERY request body unconditionally, paying a SHA-256 even for a 7-byte form body. Append small bodies verbatim (after escapeKeyDelimiters) and only hash bodies that exceed the per-dimension length bound, matching the policy used for the other key dimensions. Critically, the hash is always computed over the RAW body in both hash branches. Hashing the escaped form for small bodies while hashing raw for large ones would let "a|"xN (escaped to "a\p"xN) collide with a body containing the literal bytes "a\p"xN. A new end-to-end regression test (Test_Cache_Security_QueryBody_RawHashDomain) pins this invariant, and the stable-key corpus is updated for the now-verbatim small bodies. The large-body branch also skips escaping entirely, avoiding a 2x memory-amplification on delimiter-heavy bodies. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01QVm2BnJwqp81uFAmo7Ltog
cd2cfc8 to
fb642d3
Compare
escapeKeyDelimiters: replace the three sequential strings.ReplaceAll calls with a package-level strings.Replacer (single byte-keyed pass, no ordering dependency). The ContainsAny fast path is kept, so clean values are unchanged. The win scales with the number of distinct delimiters present (medians, 6 runs): input before (3x ReplaceAll) after (Replacer) pipe + colon 95 ns, 32 B, 2 allocs 39 ns, 16 B, 1 alloc (~2.4x) backslash+colon 107 ns, 48 B, 2 allocs 46 ns, 24 B, 1 alloc (~2.3x) heavy mixed 695 ns, 384 B, 3 allocs 238 ns, 288 B, 2 alloc (~2.9x) single colon ~85 ns, 1 alloc ~90 ns, 1-2 allocs (roughly even) normalizeNewlines: a strings.Replacer was measured and rejected here, because \r\n is a multi-byte pattern that uses genericReplacer, which always allocates (3 allocs) and is slower on the dirty path. Instead add a carriage-return fast path; the common no-CR case now short-circuits: clean (no CR): 19 ns, 0 allocs -> ~5 ns, 0 allocs Output is byte-identical to the previous implementations (verified by an equivalence test), so existing cache keys and SSE framing are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…iles cache.go had grown past 1300 lines. Extract cohesive groups into separate files within the same package (no behavior change, no exported-API change): - keygen.go: defaultKeyGenerator, the canonical query/header/cookie helpers, escapeKeyDelimiters, the segment bounding/hashing helpers, and the key-tuning constants plus the key buffer pool. - vary.go: Vary parsing, the vary-key hasher, manifest load/store, and the maxVaryHeaders cap (now next to its only user). - cachecontrol.go: Cache-Control directive parsing and interpretation. - utils.go: age/freshness, HTTP-date and seconds helpers, and the auth hasher. defaultKeyGenerator is also simplified: the QUERY request-body branch moves into appendQueryBodySegment, and the "return buffer to pool" boilerplate becomes the (inlined) releaseKeyBuffer helper. The pool "get" is kept inline because a helper for it does not inline and sits in the hot path. Output is byte-identical (Test_defaultKeyGenerator_stableKeys) and the key benchmark shows unchanged allocations, so cache keys are unaffected. cache.go: ~1372 -> 891 lines. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@middleware/cache/cachecontrol.go`:
- Around line 57-60: The directive scanner in cachecontrol.go is splitting on
every comma without honoring quoted strings, which can misparse values like
foo=", s-maxage=600, bar" as real directives. Update the comma શોધ logic in the
cache-control parsing loop to track quoted-string and escape state while
advancing partEnd, so commas inside quoted values are ignored. Keep the fix
localized to the directive scanning code that uses start, i, and partEnd in the
cache-control parser.
In `@middleware/cache/keygen.go`:
- Around line 251-264: appendQueryBodySegment currently allows small QUERY
bodies to be escaped and appended verbatim, which can still collide with the
auth suffix added later in cache.go. Change appendQueryBodySegment to always
route QUERY bodies through appendHashedKeySegment, and remove the
verbatim/escaped fast path so the body content can never be interpreted as part
of the key suffix.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 0cd45a75-3def-4301-803c-9118b04bc79c
📒 Files selected for processing (5)
middleware/cache/cache.gomiddleware/cache/cachecontrol.gomiddleware/cache/keygen.gomiddleware/cache/utils.gomiddleware/cache/vary.go
| start := i | ||
| for i < len(cc) && cc[i] != ',' { | ||
| i++ | ||
| } |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Make directive splitting quote-aware.
This scanner treats every comma as a directive separator, so foo=", s-maxage=600, bar" can be misread as a real s-maxage directive and incorrectly allow authenticated responses into shared cache. Track quoted-string and escape state while finding partEnd.
Suggested localized fix
- for i < len(cc) && cc[i] != ',' {
- i++
- }
+ inQuote := false
+ escaped := false
+ for i < len(cc) {
+ ch := cc[i]
+ if escaped {
+ escaped = false
+ i++
+ continue
+ }
+ if inQuote {
+ if ch == '\\' {
+ escaped = true
+ } else if ch == '"' {
+ inQuote = false
+ }
+ i++
+ continue
+ }
+ if ch == '"' {
+ inQuote = true
+ i++
+ continue
+ }
+ if ch == ',' {
+ break
+ }
+ i++
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| start := i | |
| for i < len(cc) && cc[i] != ',' { | |
| i++ | |
| } | |
| start := i | |
| inQuote := false | |
| escaped := false | |
| for i < len(cc) { | |
| ch := cc[i] | |
| if escaped { | |
| escaped = false | |
| i++ | |
| continue | |
| } | |
| if inQuote { | |
| if ch == '\\' { | |
| escaped = true | |
| } else if ch == '"' { | |
| inQuote = false | |
| } | |
| i++ | |
| continue | |
| } | |
| if ch == '"' { | |
| inQuote = true | |
| i++ | |
| continue | |
| } | |
| if ch == ',' { | |
| break | |
| } | |
| i++ | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@middleware/cache/cachecontrol.go` around lines 57 - 60, The directive scanner
in cachecontrol.go is splitting on every comma without honoring quoted strings,
which can misparse values like foo=", s-maxage=600, bar" as real directives.
Update the comma શોધ logic in the cache-control parsing loop to track
quoted-string and escape state while advancing partEnd, so commas inside quoted
values are ignored. Keep the fix localized to the directive scanning code that
uses start, i, and partEnd in the cache-control parser.
| // appendQueryBodySegment appends a QUERY request body as a key segment. A body | ||
| // that fits the per-dimension bound both raw and after escaping is escaped and | ||
| // appended verbatim; otherwise the raw body is hashed. The hash is always taken | ||
| // over the raw bytes, so the verbatim and hashed forms can never share a | ||
| // preimage and collide, and an oversized body is never escaped (avoids 2x | ||
| // memory amplification on delimiter-heavy input). Escaping the verbatim form | ||
| // still stops a body containing |/:/\ from injecting key-suffix structure. | ||
| func appendQueryBodySegment(dst, body []byte) []byte { | ||
| if len(body) <= maxKeyDimensionSegmentLength { | ||
| if escaped := escapeKeyDelimiters(utils.UnsafeString(body)); len(escaped) <= maxKeyDimensionSegmentLength { | ||
| return append(dst, escaped...) | ||
| } | ||
| } | ||
| return appendHashedKeySegment(dst, body) |
There was a problem hiding this comment.
🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win
Always hash QUERY bodies here.
Small bodies are still appended verbatim, so a body like foo_auth_<authHash> can collide with an authorized request whose body is foo after cache.go appends _auth_<authHash>. That keeps the suffix-injection class this PR is meant to remove.
Suggested fix
-// appendQueryBodySegment appends a QUERY request body as a key segment. A body
-// that fits the per-dimension bound both raw and after escaping is escaped and
-// appended verbatim; otherwise the raw body is hashed. The hash is always taken
-// over the raw bytes, so the verbatim and hashed forms can never share a
-// preimage and collide, and an oversized body is never escaped (avoids 2x
-// memory amplification on delimiter-heavy input). Escaping the verbatim form
-// still stops a body containing |/:/\ from injecting key-suffix structure.
+// appendQueryBodySegment appends a QUERY request body as a hashed key segment.
+// The hash is always taken over the raw bytes so body bytes cannot inject
+// cache-key suffix syntax.
func appendQueryBodySegment(dst, body []byte) []byte {
- if len(body) <= maxKeyDimensionSegmentLength {
- if escaped := escapeKeyDelimiters(utils.UnsafeString(body)); len(escaped) <= maxKeyDimensionSegmentLength {
- return append(dst, escaped...)
- }
- }
return appendHashedKeySegment(dst, body)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // appendQueryBodySegment appends a QUERY request body as a key segment. A body | |
| // that fits the per-dimension bound both raw and after escaping is escaped and | |
| // appended verbatim; otherwise the raw body is hashed. The hash is always taken | |
| // over the raw bytes, so the verbatim and hashed forms can never share a | |
| // preimage and collide, and an oversized body is never escaped (avoids 2x | |
| // memory amplification on delimiter-heavy input). Escaping the verbatim form | |
| // still stops a body containing |/:/\ from injecting key-suffix structure. | |
| func appendQueryBodySegment(dst, body []byte) []byte { | |
| if len(body) <= maxKeyDimensionSegmentLength { | |
| if escaped := escapeKeyDelimiters(utils.UnsafeString(body)); len(escaped) <= maxKeyDimensionSegmentLength { | |
| return append(dst, escaped...) | |
| } | |
| } | |
| return appendHashedKeySegment(dst, body) | |
| // appendQueryBodySegment appends a QUERY request body as a hashed key segment. | |
| // The hash is always taken over the raw bytes so body bytes cannot inject | |
| // cache-key suffix syntax. | |
| func appendQueryBodySegment(dst, body []byte) []byte { | |
| return appendHashedKeySegment(dst, body) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@middleware/cache/keygen.go` around lines 251 - 264, appendQueryBodySegment
currently allows small QUERY bodies to be escaped and appended verbatim, which
can still collide with the auth suffix added later in cache.go. Change
appendQueryBodySegment to always route QUERY bodies through
appendHashedKeySegment, and remove the verbatim/escaped fast path so the body
content can never be interpreted as part of the key suffix.
Motivation
sha256:<hex>namespace or inject key suffix syntax like|vary|...and_auth_....Description
appendHashedKeySegmenthelper and calling it from the key generator.appendHashedKeySegment(dst, segment []byte) []bytewhich appendssha256:<hex>for the provided body bytes and remove reliance on the bounded raw-segment namespace for QUERY bodies.middleware/cache/keygen_test.goand add a regression test inmiddleware/cache/cache_test.gothat proves a long body and a short body equal to its formersha256:<hex>marker no longer collide.Testing
make audit— executed;go mod verifyandgo vet ./...ran, butgovulncheckreported 25 known standard-library issues on Gogo1.25.1causingmake auditto fail (these are unrelated to the patch and due to the toolchain version).make generate— ran successfully and executedgo generate ./....make betteralign,make format, andmake lint— all ran successfully, withmake lintreporting no issues.make test— ran successfully with the full test-suite resultDONE 3728 tests, 1 skipped, andgo test ./middleware/cache -run 'Test_Cache_QueryMethod|Test_defaultKeyGenerator' -count=1passed for the modified package.Codex Task