fix: preserve compressed/binary request bodies in proxy handler#619
Conversation
The first-party proxy handler unconditionally read request bodies via h3's readBody(), which decodes binary bytes as UTF-8 text. This corrupted compressed payloads (e.g. PostHog gzip-js) and caused upstream timeouts. When no privacy transforms are needed or the body is binary/compressed, the raw request stream is now piped directly to upstream via getRequestWebStream() — zero buffering, zero re-encoding.
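The detection that gates this passthrough can be sketched as a small predicate. This is an illustrative sketch, not the handler's actual code: it assumes h3-style lowercased header keys, and the helper name `isBinaryRequestBody` is hypothetical. The three conditions (a `content-encoding` header, an `application/octet-stream` content type, a compression query param) follow the description above.

```typescript
// Sketch: decide whether a request body must be passed through as raw bytes
// instead of being read as text. Helper name is illustrative.
function isBinaryRequestBody(
  headers: Record<string, string | undefined>,
  path: string,
): boolean {
  const contentType = (headers['content-type'] || '').toLowerCase()
  // Relative request paths need a base URL to be parsed.
  const compression = new URL(path, 'http://localhost').searchParams.get('compression')
  return Boolean(
    headers['content-encoding'] // already-encoded payload (gzip, br, ...)
    || contentType.includes('octet-stream') // declared binary
    || (compression && /gzip|deflate|br|compress/i.test(compression)), // e.g. PostHog's ?compression=gzip-js
  )
}
```

When this returns true, the handler streams the raw request to upstream rather than calling `readBody()`, so compressed bytes are never decoded as UTF-8.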
Adds posthog-js dependency, fixture page, and provider paths so PostHog is exercised in the E2E error-check sweep with a real browser connection.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough

Detects binary/compressed request bodies (Content-Encoding present, `application/octet-stream`, or a compression query param) and introduces a passthrough mode that streams the raw request to upstream for write methods. When not in passthrough, the handler reads and may transform bodies (JSON fingerprint stripping, parsing JSON strings, form handling). Adjusts header handling (content-length, x-forwarded-for/anonymization, selected privacy masking for UA/Accept-Language/client hints) and uses duplex/raw stream as the fetch body for passthrough. Adds tests and fixtures (PostHog) and a unit suite validating byte-for-byte preservation of gzip/binary payloads and privacy-related JSON behavior.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🧹 Nitpick comments (1)
test/unit/proxy-handler-binary.test.ts (1)
117-161: Add one more regression case: `compression=gzip-js` with privacy enabled and no `content-encoding`.

Current coverage validates gzip-js only when privacy is off, and privacy-on only when `content-encoding` is present. Adding this combination will lock in the intended behavior for compressed text/plain uploads.

💡 Suggested test case
```diff
+ it('preserves gzip-js body when privacy is enabled without content-encoding', async () => {
+   const payload = JSON.stringify({ event: 'test', ua: 'fingerprint' })
+   const compressed = gzipSync(Buffer.from(payload))
+
+   await fetch(`http://localhost:${proxyPort}/batch?compression=gzip-js`, {
+     method: 'POST',
+     headers: {
+       'content-type': 'text/plain',
+       'x-test-privacy': 'true',
+     },
+     body: compressed,
+   })
+
+   expect(capturedUpstreamBody).not.toBeNull()
+   expect(Buffer.compare(capturedUpstreamBody!, compressed)).toBe(0)
+ })
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/unit/proxy-handler-binary.test.ts` around lines 117 - 161, Add a new test case mirroring "preserves gzip-compressed body sent without content-type" but with privacy enabled: send the same gzip-compressed payload to `http://localhost:${proxyPort}/batch?compression=gzip-js` including header `x-test-privacy: 'true'`, using the same `gzipSync` and `compressed` variables, then assert `capturedUpstreamBody` is not null and that `Buffer.compare(capturedUpstreamBody!, compressed) === 0`; place the test near the other binary/compression tests so it exercises the `compression=gzip-js` + privacy path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/runtime/server/proxy-handler.ts`:
- Around line 232-233: The current truthy checks on rawBody drop valid falsy
payloads (''/0/false); update both occurrences (the if (rawBody) guards around
the object branch and the later check at lines ~312-314) to test explicitly for
undefined (e.g., if (rawBody !== undefined)) so only an absent body is treated
as missing while preserving the existing object branch behavior (the typeof
rawBody === 'object' branch and subsequent handling should remain unchanged).
- Around line 105-113: The current isBinaryBody check in proxy-handler.ts
(computed using getHeaders and content-type) is too narrow and misses cases like
?compression=gzip-js with text/plain, causing privacy-enabled write requests to
still call readBody; update the isBinaryBody logic used before readBody() so it
also treats compressed/encoded bodies as binary: extend the condition in the
isBinaryBody computation (used around getHeaders, contentType, and before
readBody()) to include common compression tokens in content-type (e.g., match
/gzip|br|deflate|compress|x-gzip/ or +gzip/+br suffixes) and to detect a
compression query param on the request URL (e.g.,
URL(event.url).searchParams.has('compression') or
searchParams.get('compression') includes gzip/br), so any of those cases
short-circuit to passing raw bytes instead of calling readBody().
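The first finding above (truthy guards on `rawBody`) can be demonstrated with a tiny comparison. This is an illustrative snippet, not the handler's code; `hasBody` is a hypothetical helper showing the explicit `undefined` check the review asks for.

```typescript
// A truthy guard like `if (rawBody)` treats valid falsy payloads
// ('', 0, false) as missing. Checking explicitly for undefined
// only skips a genuinely absent body.
const truthyGuard = (rawBody: unknown): boolean => Boolean(rawBody)

function hasBody(rawBody: unknown): boolean {
  return rawBody !== undefined // only an absent body counts as missing
}

console.log(truthyGuard(''), hasBody('')) // '' is a real (empty) payload
console.log(truthyGuard(0), hasBody(0))
console.log(hasBody(undefined))
```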
---
Nitpick comments:
In `@test/unit/proxy-handler-binary.test.ts`:
- Around line 117-161: Add a new test case mirroring "preserves gzip-compressed
body sent without content-type" but with privacy enabled: send the same
gzip-compressed payload to
`http://localhost:${proxyPort}/batch?compression=gzip-js` including header
`x-test-privacy: 'true'`, using the same `gzipSync` and `compressed` variables,
then assert `capturedUpstreamBody` is not null and that
`Buffer.compare(capturedUpstreamBody!, compressed) === 0`; place the test near
the other binary/compression tests so it exercises the `compression=gzip-js` +
privacy path.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/runtime/server/proxy-handler.ts
- test/e2e/first-party.test.ts
- test/fixtures/first-party/nuxt.config.ts
- test/unit/proxy-handler-binary.test.ts
- Detect ?compression=gzip-js (PostHog) so compressed bodies pass through raw even when privacy is enabled
- Fix truthy checks on rawBody/body to avoid dropping falsy-but-valid payloads
- Add test for gzip-js + privacy enabled scenario
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/runtime/server/proxy-handler.ts`:
- Around line 219-220: The code treats request bodies as Record<string, unknown>
which causes top-level JSON arrays to be coerced into object-like shapes; update
the handling around the variables body and rawBody and any calls to
stripPayloadFingerprinting so that Array.isArray(rawBody) or Array.isArray(body)
is detected and arrays are preserved (e.g., allow body to be typed as unknown |
unknown[] | Record<string, unknown>) and pass arrays through to
stripPayloadFingerprinting or handle them with a separate array-aware branch
instead of casting to Record<string, unknown>, ensuring
stripPayloadFingerprinting receives the original array shape.
- Around line 111-117: Normalize the Content-Type header before doing
case-sensitive substring checks: change how contentType is derived from
originalHeaders (e.g., const contentType = (originalHeaders['content-type'] ||
'').toLowerCase()) and then use that lowercased value in isBinaryBody and any
other checks (such as the later includes check around line 257) so matches like
'Octet-Stream' or mixed-case types are handled correctly; update all occurrences
where contentType.includes(...) is used to reference the normalized variable.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/runtime/server/proxy-handler.ts
- test/unit/proxy-handler-binary.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- test/unit/proxy-handler-binary.test.ts
```typescript
const contentType = originalHeaders['content-type'] || ''
const compressionParam = new URL(event.path, 'http://localhost').searchParams.get('compression')
const isBinaryBody = Boolean(
  originalHeaders['content-encoding']
  || contentType.includes('octet-stream')
  || (compressionParam && /gzip|deflate|br|compress/i.test(compressionParam)),
)
```
Normalize Content-Type before matching.
At Line 115 and Line 257, includes(...) checks are case-sensitive. Content-Type is case-insensitive by spec, so mixed-case values can bypass binary/form handling and route through the wrong path.
💡 Suggested fix

```diff
- const contentType = originalHeaders['content-type'] || ''
+ const contentType = originalHeaders['content-type'] || ''
+ const normalizedContentType = contentType.toLowerCase()
@@
- || contentType.includes('octet-stream')
+ || normalizedContentType.includes('octet-stream')
@@
- else if (contentType.includes('application/x-www-form-urlencoded')) {
+ else if (normalizedContentType.includes('application/x-www-form-urlencoded')) {
```

Also applies to: 257-257
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/runtime/server/proxy-handler.ts` around lines 111 - 117, Normalize the
Content-Type header before doing case-sensitive substring checks: change how
contentType is derived from originalHeaders (e.g., const contentType =
(originalHeaders['content-type'] || '').toLowerCase()) and then use that
lowercased value in isBinaryBody and any other checks (such as the later
includes check around line 257) so matches like 'Octet-Stream' or mixed-case
types are handled correctly; update all occurrences where
contentType.includes(...) is used to reference the normalized variable.
PostHog's config.js endpoint returns JSON but the SDK requests it as a script, causing a MIME type error in strict-mode browsers. This is a known third-party behavior, not a proxy rewrite bug.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/first-party.test.ts`:
- Line 947: The test currently unconditionally suppresses the regex "/MIME type
.* is not executable/i", which hides real proxy/content-type regressions; change
the logic that builds the ignored-errors list (where "/MIME type .* is not
executable/i" is added) so that this pattern is only applied for the PostHog
config.js case — detect the PostHog provider or request path (e.g., provider
name 'posthog' or URL ending in "config.js") and add the MIME-ignore only in
that branch; leave all other providers/tests to surface the MIME execution error
normally.
Only suppress the MIME-not-executable error when the URL contains config.js, so real MIME type regressions on other providers still surface.
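That scoping can be sketched as a small predicate over the console error and the requesting URL. This is an illustrative sketch, assuming the test collects (message, url) pairs; `isIgnorableError` is a hypothetical helper, not the actual test code.

```typescript
// Only swallow the MIME-not-executable error for the known PostHog
// config.js case; every other provider still surfaces it as a failure.
const MIME_NOT_EXECUTABLE = /MIME type .* is not executable/i

function isIgnorableError(message: string, url: string): boolean {
  return MIME_NOT_EXECUTABLE.test(message) && url.includes('config.js')
}
```

Any other error message, or the same MIME error from a non-config.js URL, still fails the test.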
Arrays passed to stripPayloadFingerprinting were coerced into objects
({"0": ..., "1": ...}). Now arrays are detected and each element is
stripped individually, preserving the original array shape.
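The array-preserving dispatch described above can be sketched as follows. This is a sketch, not the handler's implementation: `stripObject` stands in for the real per-object fingerprint stripping in `proxy-handler.ts`, and `stripPayload` is a hypothetical name.

```typescript
// Recurse into arrays element by element instead of casting them to
// Record<string, unknown>, which would coerce [a, b] into {"0": a, "1": b}.
function stripPayload(
  body: unknown,
  stripObject: (o: Record<string, unknown>) => Record<string, unknown>,
): unknown {
  if (Array.isArray(body)) {
    // Strip each element individually, preserving the array shape.
    return body.map(item => stripPayload(item, stripObject))
  }
  if (body && typeof body === 'object') {
    return stripObject(body as Record<string, unknown>)
  }
  return body // primitives pass through untouched
}
```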
4xx responses from upstream APIs are expected with fake test API keys (e.g. PostHog returning 404/401 for phc_test). Only 5xx indicates actual proxy infrastructure failures.
Use the same real API key from the basic fixture so upstream returns valid responses. Scope failed proxy request assertions to only check the current provider's paths to avoid cross-provider noise from globally-registered scripts.
src/runtime/server/proxy-handler.ts
Outdated
```typescript
const isBinaryBody = Boolean(
  originalHeaders['content-encoding']
  || contentType.includes('octet-stream')
  || (compressionParam && /gzip|deflate|br|compress/i.test(compressionParam)),
```
They also use base64, see: https://posthog.com/docs/references/posthog-js-1.285.0/types/Compression
- Add base64 to compression query param regex (PostHog uses it)
- Replace placeholder IDs with real ones from playground across plausible, cloudflare, rybbit, fathom, intercom, and crisp
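The widened matcher might look like the following. This is a sketch mirroring the regex shown in the commented-on diff, extended with `base64` per the review note; `isCompressedParam` is an illustrative name, not the handler's actual helper.

```typescript
// Compression tokens a ?compression= query param may carry,
// now including PostHog's base64 mode.
const COMPRESSION_RE = /gzip|deflate|br|compress|base64/i

function isCompressedParam(param: string | null): boolean {
  return param !== null && COMPRESSION_RE.test(param)
}
```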
Every console error, uncaught exception, and failed proxy request is now treated as critical. No noise lists, no per-provider scoping — if it errors, the test fails.
…ider lists
- Use [input, { trigger: 'manual' }] registry format with runtimeConfig
- Remove all noise filtering from test assertions
- Fix Nitro publicAssets leading-dash filename bug
- Add PostHog to error check and bundle coverage provider lists
- Update snapshots
- Reddit pixel ID → a2_ilz4u0kbdr3v
- Rybbit siteId → 874 (was analyticsId, which doesn't exist)
- Umami websiteId → ae15c227-67e8-434a-831f-67e6df88bd6c
- Add pixel-config.reddit.com to reddit proxy config (CORS fix)
- Update page components to match config keys
🔗 Linked issue
Resolves #618
❓ Type of change
📚 Description
The first-party proxy handler unconditionally read request bodies via h3's `readBody()`, which decodes binary bytes as UTF-8 text. This corrupted compressed payloads (e.g. PostHog with `compression=gzip-js`) and caused repeated upstream timeouts (504).

When no privacy transforms are needed (`anyPrivacy=false`) or the body is binary/compressed (`content-encoding` header or `application/octet-stream`), the raw request stream is now piped directly to upstream via `getRequestWebStream()` — zero buffering, zero re-encoding. Privacy stripping still uses `readBody()` only for text bodies that actually need transforms.

Also adds PostHog to the first-party E2E test fixture and a new unit test suite (`proxy-handler-binary.test.ts`) covering gzip, binary, and passthrough scenarios.