## Summary

The chunk-upload code path double-compresses: (1) DEFLATE inside the artifact-bundle ZIP, (2) zstd/gzip on each wire chunk. With zstd now in the protocol (#823), the ZIP-level DEFLATE adds significant CPU cost while contributing essentially nothing to wire size — zstd extracts the same redundancy in a fraction of the time.

Switching the ZIP to STORED (`compression=zipfile.ZIP_STORED`) and letting per-chunk zstd L3 do all the compression is strictly better on every axis I measured: less CPU, less wire, simpler code, no protocol change.
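The change itself is one parameter at ZIP-build time. The CLI's writer is the TypeScript `ZipWriter`, but the switch is easiest to illustrate with Python's stdlib `zipfile` (a sketch only; entry contents are placeholders):

```python
import io
import zipfile

def write_bundle(entries: dict[str, bytes], compression: int) -> bytes:
    """Write an artifact-bundle-style ZIP entirely in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w", compression=compression) as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buf.getvalue()

entries = {"bin.js": b"...", "bin.js.map": b"..."}  # placeholder contents

current = write_bundle(entries, zipfile.ZIP_DEFLATED)  # today: DEFLATE per entry
proposed = write_bundle(entries, zipfile.ZIP_STORED)   # proposed: raw entries, zstd does the work on the wire
```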
## Measurements
Three real-world payload shapes, comparing the current architecture (DEFLATE inside the ZIP, then zstd on each wire chunk) against the proposed STORED + zstd:

| Payload | Current (DEFLATE + zstd) | Proposed (STORED + zstd) | CPU saved | Wire saved |
| --- | --- | --- | --- | --- |
| CLI binary (3.2 MB JS + 11.2 MB map) | 717 ms / 3,797,107 B | 177 ms / 3,591,491 B | −540 ms (75%) | −205 KB (5.4%) |
| Docs site (5 pairs, 2.77 MiB) | 173 ms / 789,228 B | 56 ms / 784,954 B | −117 ms (67%) | −4 KB (0.5%) |
| Synthetic JS+map (10 MiB) | 486 ms / 7,686,289 B | 134 ms / 7,626,670 B | −352 ms (72%) | −60 KB (0.8%) |
Methodology: rebuilt the actual artifact-bundle ZIPs from real source files, then chunked at the server-advertised 8 MiB and compressed each chunk per the wire codec. Times are encode-only (decompression measured separately in AGENTS.md lore: zstd L3 ~13 ms vs gzip L6 ~22 ms on equivalent server-side workload).
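A rough sketch of the chunk-and-compress half of that methodology, assuming the third-party `zstandard` package and two pre-built variants of the same bundle on disk (paths are placeholders):

```python
import time
import zstandard  # assumption: the third-party "zstandard" bindings, not stdlib

CHUNK_SIZE = 8 * 1024 * 1024  # server-advertised chunk size

def wire_cost(bundle: bytes) -> tuple[float, int]:
    """zstd-L3 each chunk of the bundle; return (encode seconds, total wire bytes)."""
    cctx = zstandard.ZstdCompressor(level=3)
    start = time.perf_counter()
    wire_bytes = sum(
        len(cctx.compress(bundle[off:off + CHUNK_SIZE]))
        for off in range(0, len(bundle), CHUNK_SIZE)
    )
    return time.perf_counter() - start, wire_bytes

# Placeholder paths for the two variants of the same artifact bundle
# (see the write_bundle sketch above).
for path in ("bundle_deflate.zip", "bundle_stored.zip"):
    with open(path, "rb") as f:
        seconds, wire_bytes = wire_cost(f.read())
    print(f"{path}: {seconds * 1000:.0f} ms chunk encode, {wire_bytes:,} B on the wire")
```

To reproduce the table's CPU column, the ZIP-build step would be timed as well, since for the DEFLATE variant that is where most of the cycles go.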
## Why the wire size barely changes
The current architecture's ZIP-level DEFLATE L6 already extracts ~93% of the redundancy from `bin.js` + `bin.js.map`:

```
RAW total               14,028,274 bytes   100.0%
DEFLATE L6 inside ZIP    3,797,010 bytes    27.1%   ← what the ZIP becomes
gzip L6 on top           3,797,267 bytes    27.1%   ← essentially unchanged
zstd L3 on top           3,797,107 bytes    27.1%   ← essentially unchanged
```
The per-chunk wire codec has near-zero work to do because DEFLATE has already done it. Switch to STORED and the wire codec does the actual work:
```
RAW total               14,028,274 bytes   100.0%
zstd L3 on raw chunks    3,591,491 bytes    25.6%   ← strictly smaller
```
The 5.4% savings on the CLI payload is consistent with the prior AGENTS.md benchmark ("zstd L3 vs gzip L6 on real 8 MiB sourcemap chunks: ~5% smaller"). That benchmark was on uncompressed input — it described the codec's behavior in isolation, not the architecture's behavior end-to-end.
## Why CPU drops 67–75%
DEFLATE L6 is doing real work compressing the ZIP entries on the encode side; that work is wasted because zstd then operates on already-compressed bytes (which also takes time, even though the output is essentially the same). Skipping the DEFLATE pass eliminates the wasted work. Total wall-clock CPU for the CLI binary case: 717 ms → 177 ms.
## Server side
Sentry's `ArtifactBundle` reader uses `zipfile.ZipFile(fileobj)` (in `src/sentry/models/artifactbundle.py`), which transparently handles both DEFLATE and STORED entries. No protocol change. STORED entries are also marginally cheaper to read on the server (skip the per-entry DEFLATE decompress on lookup).
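A minimal illustration of that read path (not Sentry's actual reader), showing that `zipfile` dispatches on each entry's `compress_type` and reads STORED and DEFLATE entries through the same call:

```python
import io
import zipfile

def read_entry(bundle_bytes: bytes, name: str) -> bytes:
    """Read one named entry from an artifact-bundle-style ZIP."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as zf:
        info = zf.getinfo(name)
        # 0 == ZIP_STORED, 8 == ZIP_DEFLATED; zf.read() handles both transparently.
        assert info.compress_type in (zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED)
        return zf.read(name)
```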
## Rollout considerations
- **Pre-zstd self-hosted servers (gzip-only):** STORED + gzip L6 gives identical wire size to the current architecture, just shifts the CPU work from the encode side to the wire side. Net: break-even on those servers; no regression (see the sketch after this list).
- **Concurrency:** the larger STORED bundle (14 MiB vs 3.6 MiB) splits into more 8 MiB chunks (2 vs 1 for the CLI binary). Servers advertise concurrency ≥ 8, so parallel uploads compensate. No latency penalty expected.
- **Chunk dedup:** STORED chunks contain raw bytes; DEFLATE chunks contain compressed bytes. Both reflow on tiny edits to the source, so dedup behavior is comparable.
- **Memory:** `ZipWriter` currently holds one entry's compressed output in memory at a time. STORED removes the DEFLATE buffer, reducing peak memory. Strictly better.
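For the gzip-only fallback in the first item above, the break-even claim can be sanity-checked with the same chunking loop, swapping zstd for a DEFLATE-level codec (a sketch; `zlib` level 6 stands in for the wire's gzip L6, which only adds a small per-chunk header):

```python
import zlib

CHUNK_SIZE = 8 * 1024 * 1024  # server-advertised chunk size

def gzip_wire_bytes(bundle: bytes) -> int:
    """Approximate gzip-L6 wire bytes for a chunked upload of this bundle."""
    return sum(
        len(zlib.compress(bundle[off:off + CHUNK_SIZE], 6))
        for off in range(0, len(bundle), CHUNK_SIZE)
    )

# Expectation on gzip-only servers: gzip_wire_bytes(stored_bundle) lands within a
# fraction of a percent of gzip_wire_bytes(deflate_bundle), since DEFLATE L6 inside
# the ZIP and gzip L6 on the wire extract the same redundancy.
```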
## Proposed implementation
Add `compress: boolean` (default `false` post-zstd, gated on a flag while gathering data) to `ZipWriter.addEntry()`, or simpler — add a static factory `ZipWriter.createStored()` that hardcodes STORED. Update `buildArtifactBundle()` in `src/lib/api/sourcemaps.ts` to use the STORED variant.
I'm happy to follow up with a PR. Filing as an issue first because:

- The numbers come from local benchmarks; want a sanity check on whether anyone sees a payload shape where DEFLATE inside the ZIP is doing real work that zstd doesn't replicate.
- May want a small protocol kill-switch (`SENTRY_CHUNK_UPLOAD_FORCE_DEFLATE_ZIP=1`) for early debugging if a server somewhere chokes on STORED entries.