## Summary

The chunk-upload code path double-compresses: (1) DEFLATE inside the artifact-bundle ZIP, (2) zstd/gzip on each wire chunk. With zstd now in the protocol (#823), the ZIP-level DEFLATE adds significant CPU cost while contributing essentially nothing to wire size — zstd extracts the same redundancy in a fraction of the time.

Switching the ZIP to STORED (`compression=zipfile.ZIP_STORED`) and letting per-chunk zstd L3 do all the compression is strictly better on every axis I measured: less CPU, less wire, simpler code, no protocol change.
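The change itself is one parameter at ZIP-build time. The CLI's writer is the TypeScript `ZipWriter`, but the switch is easiest to illustrate with Python's stdlib `zipfile` (a sketch only; entry contents are placeholders):

```python
import io
import zipfile

def write_bundle(entries: dict[str, bytes], compression: int) -> bytes:
    """Write an artifact-bundle-style ZIP entirely in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w", compression=compression) as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buf.getvalue()

entries = {"bin.js": b"...", "bin.js.map": b"..."}  # placeholder contents

current = write_bundle(entries, zipfile.ZIP_DEFLATED)  # today: DEFLATE per entry
proposed = write_bundle(entries, zipfile.ZIP_STORED)   # proposed: raw entries, zstd does the work on the wire
```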
## Measurements
Three real-world payload shapes, comparing the current architecture (DEFLATE inside the ZIP, then zstd on each wire chunk) against the proposed STORED + zstd:

| Payload | Current (DEFLATE + zstd) | Proposed (STORED + zstd) | CPU saved | Wire saved |
| --- | --- | --- | --- | --- |
| CLI binary (3.2 MB JS + 11.2 MB map) | 717 ms / 3,797,107 B | 177 ms / 3,591,491 B | −540 ms (75%) | −205 KB (5.4%) |
| Docs site (5 pairs, 2.77 MiB) | 173 ms / 789,228 B | 56 ms / 784,954 B | −117 ms (67%) | −4 KB (0.5%) |
| Synthetic JS+map (10 MiB) | 486 ms / 7,686,289 B | 134 ms / 7,626,670 B | −352 ms (72%) | −60 KB (0.8%) |
Methodology: rebuilt the actual artifact-bundle ZIPs from real source files, then chunked at the server-advertised 8 MiB and compressed each chunk per the wire codec. Times are encode-only (decompression measured separately in AGENTS.md lore: zstd L3 ~13 ms vs gzip L6 ~22 ms on equivalent server-side workload).
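A rough sketch of the chunk-and-compress half of that methodology, assuming the third-party `zstandard` package and two pre-built variants of the same bundle on disk (paths are placeholders):

```python
import time
import zstandard  # assumption: the third-party "zstandard" bindings, not stdlib

CHUNK_SIZE = 8 * 1024 * 1024  # server-advertised chunk size

def wire_cost(bundle: bytes) -> tuple[float, int]:
    """zstd-L3 each chunk of the bundle; return (encode seconds, total wire bytes)."""
    cctx = zstandard.ZstdCompressor(level=3)
    start = time.perf_counter()
    wire_bytes = sum(
        len(cctx.compress(bundle[off:off + CHUNK_SIZE]))
        for off in range(0, len(bundle), CHUNK_SIZE)
    )
    return time.perf_counter() - start, wire_bytes

# Placeholder paths for the two variants of the same artifact bundle
# (see the write_bundle sketch above).
for path in ("bundle_deflate.zip", "bundle_stored.zip"):
    with open(path, "rb") as f:
        seconds, wire_bytes = wire_cost(f.read())
    print(f"{path}: {seconds * 1000:.0f} ms chunk encode, {wire_bytes:,} B on the wire")
```

To reproduce the table's CPU column, the ZIP-build step would be timed as well, since for the DEFLATE variant that is where most of the cycles go.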
## Why the wire size barely changes
The current architecture's ZIP-level DEFLATE L6 already extracts ~93% of the redundancy from `bin.js` + `bin.js.map`:

```
RAW total               14,028,274 bytes   100.0%
DEFLATE L6 inside ZIP    3,797,010 bytes    27.1%   ← what the ZIP becomes
gzip L6 on top           3,797,267 bytes    27.1%   ← essentially unchanged
zstd L3 on top           3,797,107 bytes    27.1%   ← essentially unchanged
```
The per-chunk wire codec has near-zero work to do because DEFLATE has already done it. Switch to STORED and the wire codec does the actual work:
```
RAW total               14,028,274 bytes   100.0%
zstd L3 on raw chunks    3,591,491 bytes    25.6%   ← strictly smaller
```
The 5.4% savings on the CLI payload is consistent with the prior AGENTS.md benchmark ("zstd L3 vs gzip L6 on real 8 MiB sourcemap chunks: ~5% smaller"). That benchmark was on uncompressed input — it described the codec's behavior in isolation, not the architecture's behavior end-to-end.
## Why CPU drops 67–75%
DEFLATE L6 is doing real work compressing the ZIP entries on the encode side; that work is wasted because zstd then operates on already-compressed bytes (which also takes time, even though the output is essentially the same). Skipping the DEFLATE pass eliminates the wasted work. Total wall-clock CPU for the CLI binary case: 717 ms → 177 ms.
## Server side
Sentry's `ArtifactBundle` reader uses `zipfile.ZipFile(fileobj)` (in `src/sentry/models/artifactbundle.py`), which transparently handles both DEFLATE and STORED entries. No protocol change. STORED entries are also marginally cheaper to read on the server (skip the per-entry DEFLATE decompress on lookup).
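A minimal illustration of that read path (not Sentry's actual reader), showing that `zipfile` dispatches on each entry's `compress_type` and reads STORED and DEFLATE entries through the same call:

```python
import io
import zipfile

def read_entry(bundle_bytes: bytes, name: str) -> bytes:
    """Read one named entry from an artifact-bundle-style ZIP."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as zf:
        info = zf.getinfo(name)
        # 0 == ZIP_STORED, 8 == ZIP_DEFLATED; zf.read() handles both transparently.
        assert info.compress_type in (zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED)
        return zf.read(name)
```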
## Rollout considerations
- **Pre-zstd self-hosted servers (gzip-only):** STORED + gzip L6 gives identical wire size to the current architecture, just shifts the CPU work from the encode side to the wire side. Net: break-even on those servers; no regression (see the sketch after this list).
- **Concurrency:** the larger STORED bundle (14 MiB vs 3.6 MiB) splits into more 8 MiB chunks (2 vs 1 for the CLI binary). Servers advertise concurrency ≥ 8, so parallel uploads compensate. No latency penalty expected.
- **Chunk dedup:** STORED chunks contain raw bytes; DEFLATE chunks contain compressed bytes. Both reflow on tiny edits to the source, so dedup behavior is comparable.
- **Memory:** `ZipWriter` currently holds one entry's compressed output in memory at a time. STORED removes the DEFLATE buffer, reducing peak memory. Strictly better.
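For the gzip-only fallback in the first item above, the break-even claim can be sanity-checked with the same chunking loop, swapping zstd for a DEFLATE-level codec (a sketch; `zlib` level 6 stands in for the wire's gzip L6, which only adds a small per-chunk header):

```python
import zlib

CHUNK_SIZE = 8 * 1024 * 1024  # server-advertised chunk size

def gzip_wire_bytes(bundle: bytes) -> int:
    """Approximate gzip-L6 wire bytes for a chunked upload of this bundle."""
    return sum(
        len(zlib.compress(bundle[off:off + CHUNK_SIZE], 6))
        for off in range(0, len(bundle), CHUNK_SIZE)
    )

# Expectation on gzip-only servers: gzip_wire_bytes(stored_bundle) lands within a
# fraction of a percent of gzip_wire_bytes(deflate_bundle), since DEFLATE L6 inside
# the ZIP and gzip L6 on the wire extract the same redundancy.
```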
## Proposed implementation
Add `compress: boolean` (default `false` post-zstd, gated on a flag while gathering data) to `ZipWriter.addEntry()`, or simpler — add a static factory `ZipWriter.createStored()` that hardcodes STORED. Update `buildArtifactBundle()` in `src/lib/api/sourcemaps.ts` to use the STORED variant.
I'm happy to follow up with a PR. Filing as an issue first because:

- The numbers come from local benchmarks; want a sanity check on whether anyone sees a payload shape where DEFLATE inside the ZIP is doing real work that zstd doesn't replicate.
- May want a small protocol kill-switch (`SENTRY_CHUNK_UPLOAD_FORCE_DEFLATE_ZIP=1`) for early debugging if a server somewhere chokes on STORED entries.