
fix(preprod): Reduce snapshot download concurrency to prevent stream failures #116267

Merged: NicoHinderling merged 1 commit into master from nico/ref/snapshot-download-tune-concurrency on May 27, 2026

@NicoHinderling

Reduce FETCH_MAX_WORKERS from 32 → 16 and FETCH_BATCH_SIZE from 200 → 100 in the snapshot image download endpoint.
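The PR does not show the endpoint code, so the following is only a minimal sketch of how two constants like these typically bound a batched, thread-pool download stream. It assumes the endpoint fetches images in fixed-size batches through a shared thread pool and yields them in order; `iter_snapshot_images` and `fetch_image` are hypothetical names, not the actual Sentry functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence

# Values after this PR (previously 32 and 200).
FETCH_MAX_WORKERS = 16
FETCH_BATCH_SIZE = 100


def iter_snapshot_images(
    keys: Sequence[str],
    fetch_image: Callable[[str], bytes],  # hypothetical objectstore read
    max_workers: int = FETCH_MAX_WORKERS,
    batch_size: int = FETCH_BATCH_SIZE,
) -> Iterator[tuple[str, bytes]]:
    """Yield (key, image_bytes) pairs, fetching one batch at a time.

    At most `max_workers` fetches run concurrently, and the next batch is
    not submitted until every image in the current batch has been yielded.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for start in range(0, len(keys), batch_size):
            batch = keys[start : start + batch_size]
            # map() submits the whole batch and returns results in input order.
            for key, data in zip(batch, pool.map(fetch_image, batch)):
                yield key, data
```

Usage: `list(iter_snapshot_images(keys, fetch_image))` streams every image in order. In this model, `max_workers` caps concurrent reads (and so the number of image buffers being filled at once), while `batch_size` caps how many results can sit in memory waiting to be yielded.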

Large snapshots (~40K images) cause the streaming download to fail after ~15 seconds with an HTTP/2 INTERNAL_ERROR. The server-side logs show:

  • 2.37 GB yielded at ~157 MB/s throughput
  • 906 MB RSS (well past the 600 MB reload_on_rss threshold)
  • Only 3,947 / ~40,000 images delivered before client_disconnect
  • 0 fetch failures: the objectstore reads are fine, and the transport layer is the bottleneck
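As a quick sanity check, the logged figures are consistent with the ~600 KB average image size and the ~15 second failure time cited in this PR. This assumes decimal units (1 GB = 10⁹ bytes, 1 MB = 10⁶ bytes); the logs do not say which convention they use.

```python
# Figures copied from the server-side logs quoted above (decimal units assumed).
bytes_yielded = 2.37e9      # 2.37 GB yielded
throughput = 157e6          # ~157 MB/s
images_delivered = 3_947

avg_image_kb = bytes_yielded / images_delivered / 1e3
stream_seconds = bytes_yielded / throughput

print(f"average image size ~ {avg_image_kb:.0f} KB")  # average image size ~ 600 KB
print(f"stream duration   ~ {stream_seconds:.1f} s")  # stream duration   ~ 15.1 s
```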

The high concurrency causes Python memory fragmentation (32 threads churning through ~600 KB images), which pushes RSS past container limits, and the resulting throughput overwhelms HTTP/2 flow control. Halving both constants reduces memory pressure and the data rate while keeping each batch well within the 90s proxy timeout.
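A back-of-the-envelope model of what halving both constants changes, assuming (this is not confirmed by the PR) that each batch is fully buffered before it is yielded and that individual fetches take roughly equal time. `batch_profile` is a hypothetical helper. Under those assumptions the per-batch buffer halves, while the number of sequential fetch rounds per batch stays the same, since ceil(200/32) = ceil(100/16) = 7, so per-batch latency relative to the 90s proxy timeout should not grow. The cost is twice as many batches.

```python
import math

AVG_IMAGE_BYTES = 600_000  # ~600 KB per image, per the logs above
TOTAL_IMAGES = 40_000      # "~40K images" in a large snapshot


def batch_profile(workers: int, batch_size: int) -> dict[str, float]:
    """Rough per-batch figures under the simple model described above."""
    return {
        "batches": math.ceil(TOTAL_IMAGES / batch_size),
        # Sequential waves of fetches needed to finish one batch.
        "fetch_rounds_per_batch": math.ceil(batch_size / workers),
        # Upper bound if a whole batch is held in memory at once (assumption).
        "batch_mb": batch_size * AVG_IMAGE_BYTES / 1e6,
    }


before = batch_profile(workers=32, batch_size=200)
after = batch_profile(workers=16, batch_size=100)
print(before)  # {'batches': 200, 'fetch_rounds_per_batch': 7, 'batch_mb': 120.0}
print(after)   # {'batches': 400, 'fetch_rounds_per_batch': 7, 'batch_mb': 60.0}
```

This model only counts buffered image bytes; it does not capture the allocator fragmentation that the PR identifies as the main driver of the 906 MB RSS.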


Large snapshots (~40K images) cause the download stream to die after ~15s
with an HTTP/2 INTERNAL_ERROR. At 32 workers / 200 batch, the endpoint
produces ~157 MB/s and pushes RSS to ~906 MB (past the 600 MB
reload_on_rss threshold), overwhelming the HTTP/2 transport layer.

Halving to 16 workers / 100 batch reduces both throughput and memory
pressure while keeping each batch well within the 90s proxy timeout.
@github-actions (bot) added the Scope: Backend label (automatically applied to PRs that change backend components) May 26, 2026
@NicoHinderling marked this pull request as ready for review May 26, 2026 23:49
@NicoHinderling requested a review from a team as a code owner May 26, 2026 23:49
@NicoHinderling enabled auto-merge (squash) May 26, 2026 23:57
@NicoHinderling merged commit d74a8bc into master May 27, 2026
63 checks passed
@NicoHinderling deleted the nico/ref/snapshot-download-tune-concurrency branch May 27, 2026 00:02