feat: size-based download integrity + idle timeout by roziscoding · Pull Request #22 · roziscoding/jack

roziscoding · 2026-06-07T08:00:11Z

Summary

Hardens peer downloads against silent corruption and stalls. Targets
feat/qbittorrent-api-emulation (stacked on top of PR #21), not main.

Phase 1 — Size-based integrity

Resolve the expected size from the transfer header (Content-Length /
Content-Range total) falling back to the *arr releaseSize, so a peer that
doesn't advertise a size no longer yields expectedBytes: null.
The final downloadedBytes === expectedBytes check is now unconditional — a
truncated stream is no longer renamed into place and imported as complete.
Fail-fast with UnknownSizeError (permanent, non-retried) when no size is
known at all, before any byte is written.
Adds release_size as an ExpectedBytesSource (schema + migration 0003) for
accurate telemetry.
Resume edge cases via releaseSize: a .part larger than the file is discarded
and restarted (part_oversize); a .part already equal to the file is finalized
without re-downloading.

Phase 2 — Idle timeout

Replaces the 30-min whole-request AbortSignal.timeout with an inactivity
timeout (downloads.idleTimeoutMs, default 60s), armed only around the fetch and
each read so slow disk I/O never trips it.
The abort carries a sentinel reason so only it — not a later real error — becomes
a retryable IdleTimeoutError; the .part is preserved so retry resumes.
Reader is cancelled on error; the body is cancelled if the .part can't be opened.

Why not "Part 2" of the original design

The brainstorm proposed making the peer send Content-Length, but research found
the peer already does (peer.router.ts). The expectedBytes: null seen in
prod was an older peer / a stripped header — handled by the releaseSize fallback.
Content hashing was considered and dropped (HTTPS covers transport integrity;
size-only is the agreed bar).

Testing

bun test apps/backend/src/ — 172 pass; lint clean.
New/updated: size fallback, fail-fast, truncation, resume oversize/exact-finalize,
idle stall + slow-but-active, release_size DB round-trip, retry-policy
classification (IdleTimeoutError retryable, plain AbortError not).

Greptile Summary

Hardens peer downloads with two independent layers: size-based integrity (fail-fast on unknown size, unconditional truncation check, releaseSize fallback, and pre-fetch oversize/exact-finalize resume guards) and per-chunk inactivity timeout (replacing the old whole-request AbortSignal.timeout with an armIdle/clearIdle pattern so disk I/O doesn't count as idle). Two stacked SQLite migrations add release_size as a valid expected_bytes_source and then content_range, keeping the DB schema in lockstep.

Size integrity: transferSize (from Content-Range or Content-Length) now falls back to releaseSize, and the downloadedBytes === expectedBytes check is unconditional; UnknownSizeError fires before any bytes are written if no size source exists.
Idle timeout: A single AbortController is armed only during fetch() and each reader.read(), cleared immediately after, so slow disk writes never trip it; IdleTimeoutError (retryable, .part preserved) is distinguished from a plain AbortError (permanent) via a sentinel abort reason.
Resume guards: A .part larger than releaseSize is discarded before the request; one already equal is finalized without a network round-trip, emitting the expected headers+completed events for DB persistence.

Confidence Score: 5/5

The change is safe to merge — the download integrity and timeout logic are well-structured, thoroughly tested, and both new error types are correctly wired into the retry policy.

The idle-timeout design (per-chunk armIdle/clearIdle with a sentinel abort reason) correctly isolates network stalls from disk I/O, and the unconditional size check closes the silent-truncation gap. The two stacked migrations sequence correctly since both are applied before the service starts. The only findings are minor observability gaps (original errors not forwarded as cause to IdleTimeoutError), with no impact on correctness or retry behavior.

apps/backend/src/lib/servers/peer.ts — the idleTimeout() factory in both catch sites drops the original error rather than chaining it via cause.

Important Files Changed

Filename	Overview
apps/backend/src/lib/servers/peer.ts	Core download logic — adds idle timeout with armIdle/clearIdle per read, UnknownSizeError fail-fast, part_oversize resume guard, and moves releaseLock after completeness check. Logic is sound; the idleTimeout() factory doesn't chain the original error as cause.
apps/backend/src/modules/downloads/retry-policy.ts	Adds IdleTimeoutError as retryable; UnknownSizeError is correctly left out (permanent failure). Plain AbortError falls through to false as intended.
apps/backend/src/modules/downloads/downloads.service.ts	Threads idleTimeoutMs from config into downloadFile options. One-line addition, straightforward.
apps/backend/drizzle/0003_high_makkari.sql	Adds release_size column and CHECK constraint for content_length/release_size. content_range added in the follow-on 0004 migration; sequencing is correct since both migrations run before the service starts.
apps/backend/drizzle/0004_sleepy_retro_girl.sql	Expands the expected_bytes_source CHECK to include content_range, matching the final schema.ts definition.
apps/backend/src/database/schema.ts	ExpectedBytesSource type and CHECK constraint expanded to include content_range and release_size, consistent with 0004 migration.
apps/backend/src/tests/peer-download.test.ts	Comprehensive new tests for fail-fast, idle stall, slow-but-active, part_oversize, exact-finalize, and content_range source attribution. Coverage matches all new code paths.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[downloadFile called] --> B{existingBytes > 0 AND releaseSize known?}
    B -->|yes, part > releaseSize| C[unlink part → emitRestart part_oversize → existingBytes=0]
    B -->|yes, part == releaseSize| D[rename part→dest\nemit headers+completed\nreturn]
    B -->|no / unknown size| E[doFetch with idle timer]
    C --> E

    E --> F{response status}
    F -->|206 valid| G[resuming=true]
    F -->|206 invalid / 416| H[restartFresh → doFetch fresh]
    F -->|200 ok| I[resuming=false]
    H --> I

    G --> J[Parse Content-Range total as transferSize]
    I --> K[Parse Content-Length as transferSize]

    J & K --> L{transferSize != null?}
    L -->|yes| M[expectedBytes = transferSize\nexpectedBytesSource = content_range or content_length]
    L -->|no, releaseSize known| N[expectedBytes = releaseSize\nexpectedBytesSource = release_size]
    L -->|no, no releaseSize| O[cancel body → throw UnknownSizeError permanent]

    M & N --> P[Read loop\narmIdle per chunk\nclearIdle after each read]
    P -->|chunk received| Q[write to .part\nupdate progress]
    Q --> P
    P -->|idle timer fires| R[AbortController aborts\nreader.read rejects AbortError\ncatch: throw IdleTimeoutError retryable]
    P -->|done=true| S{downloadedBytes == expectedBytes?}
    S -->|no| T[throw IncompleteDownloadError retryable\ncatch: reader.cancel → releaseLock]
    S -->|yes| U[releaseLock → rename part→dest → emit completed]

_{Reviews (2): Last reviewed commit: "fix: address review — content_range sour..." | Re-trigger Greptile}

Resolve the expected size from the transfer header (Content-Length / Content-Range total) falling back to the *arr release size, and make the final size check unconditional so a truncated stream is no longer renamed into place and imported as complete. When no size is known at all, fail fast with UnknownSizeError before writing any bytes. Adds 'release_size' as an ExpectedBytesSource (schema + migration 0003) for accurate telemetry, and handles resume edge cases via releaseSize: a .part larger than the file is discarded and restarted; a .part already equal to the file is finalized without re-downloading.

The 30-min whole-request deadline both killed large active downloads and was slow to notice a stalled connection. Replace it with an inactivity timeout (downloads.idleTimeoutMs, default 60s) armed only around the fetch and each read, so slow disk writes never trip it. The abort carries a sentinel reason so only it — not a later real error — is reclassified as a retryable IdleTimeoutError; the .part is preserved so retry resumes. Also tears down the reader (cancel) on error and cancels the body if the .part file can't be opened.

- Label a 206 resume's size as 'content_range' (not 'content_length') for accurate telemetry; adds it to ExpectedBytesSource (migration 0004). - Release the stream reader only after the completeness check, so the error path's cancel+releaseLock runs against a still-locked reader on the IncompleteDownloadError (retried) path instead of being a swallowed no-op.

Squash of feat/download-integrity-monitoring onto the size-based integrity / idle-timeout work already landed via #22, so this adds only the net-new changes on top of that branch: Primary fix — failed qB downloads no longer silently report 'ok': addTorrent ignored startQbDownload's result, so an unknown peer or an unsafe peer-supplied filename created no row and was invisible to both jack and *arr, which waited out its full stuck-download grace period. startQbDownload now returns 'started' | 'duplicate' | 'failed'; 'failed' propagates HTTP 503 so *arr retries promptly, while an already-active 'duplicate' stays a success so a re-add of an in-flight release is not turned into a needless retry. Also: bump TypeScript to 6 and resolve the resulting type errors (getReader cast, fetch-spy typings, tsconfig types: "bun").

…adarr (#21) * feat: emulate qBittorrent auth + app endpoints Add /api/v2 routes for qBittorrent WebUI API emulation so Sonarr/Radarr can register jack as a qBittorrent download client. Mounts before the global requireApiKey middleware with its own SID-cookie auth. - POST /auth/login validates username against configured server names - GET /app/{version,webapiVersion,preferences} return contract values - GET /torrents/{info,categories} return empty/stub for connection test - Stub infohash helper for Phase 2 torrent mapping * feat: map jack downloads to qBittorrent torrents Add qb_category/qb_source_server columns (migration 0002) and the download→qBittorrent-torrent mapper so Sonarr/Radarr see real progress. - toQbTorrent: real stub infohash, pausedUP on completion, content_path != save_path, byte-based progress - torrents/info filters by category + hashes - torrents/properties (404 unknown) and torrents/files - chore: untrack accidentally-committed .claude lock + e2e sqlite db * feat: qBittorrent add/delete/setCategory with *arr-pull import switch Add the qBittorrent write API and switch qB-added downloads to *arr-pull imports (blackhole-added downloads keep the existing push). - DownloadsService: shared createDownload core + startQbDownload; skip triggerImport when qbSourceServer is set (per-download pull switch) - repository delete/setQbCategory - torrents/add accepts only jack stubs / jack download URLs (rejects magnets and foreign torrents); delete/setCategory scoped to the session - wire a single DownloadsService through getApp + the blackhole watcher * feat: auto-register jack as a qBittorrent download client Replace the TorrentBlackhole auto-registration with a qBittorrent client pointing at ${jack.baseUrl}/api/v2, keeping the indexer↔client binding so only the Jack client grabs the Jack indexer's releases. - registerDownloadClient builds a QBittorrentSettings client (host/port/ useSsl/urlBase/username/password + per-app category field); matches an existing "Jack" client by name to upgrade a prior blackhole client - best-effort setShareLimits/topPrio/setForceStart no-ops - e2e qBittorrent-flow test (login → add → poll → import) * refactor: clearer qB add failure + session sweep - torrents/add returns 503 (not 415) when the download pipeline is unconfigured (no downloads config) — a server misconfig, not a bad torrent; 415 stays for magnets/foreign torrents - QbSessionStore sweeps expired sessions on login so abandoned SIDs don't accumulate (get() only evicted lazily) * test(e2e): update auto-registration for qBittorrent client; drop blackhole flow - auto-registration e2e now asserts the qBittorrent download client (name "Jack", implementation QBittorrent, host/port = jack-beta) - remove download-flow.test.ts: the blackhole-via-stack flow is superseded by qbittorrent-flow.test.ts (real *arr integration) and conflicted with it on the shared fixture; blackhole watcher logic stays covered by downloads-service unit tests * refactor: remove blackhole watcher; jack is qBittorrent-only now *arr hands grabs to jack via POST /api/v2/torrents/add (parsed in-memory), so nothing writes to a watch folder anymore. Drop the folder-watching path entirely for a pure qB-driven, *arr-pull download service. - delete BlackholeWatcher; remove DownloadsService.processTorrentFile, the reenqueue bookkeeping, and the stub-file cleanup - remove the now-dead push-import path: DownloadsService.triggerImport + the destinations dependency, and ArrServerConnector.triggerImport / importCommandName (qB downloads import via *arr pull) - drop the watchPath config option (completedPath stays) - stop constructing/starting the watcher in index.ts (resume-on-startup and stale-row reconciliation are retained) - update tests, e2e setup/compose, and example config accordingly * fix: address review — refresh qB session TTL on access, gate qB mount on downloads config - QbSessionStore.get() now extends expiry on each successful access so an actively-polling *arr SID never expires mid-session (matches real qB) - mount the /api/v2 qB API only when config.downloads is set, so the reported save_path is always the real completedPath (never "") * chore(e2e): drop unused blackhole-watch volume dir from e2e task * fix: always register the Jack indexer + qB client (forceSave) The Jack indexer was registered unbound: registerDownloadClient ran with forceSave=false, so *arr's connection test to jack's qB API at registration time could fail and throw, leaving downloadClientId undefined — the indexer then registered with no Download Client ("Any"). - register both the indexer and the qBittorrent client with forceSave=true, so the client always saves (returns its id → indexer gets bound) and the indexer saves even when its test query returns no results - always run auto-registration (drop the no-peers skip); with forceSave the indexer is accepted with an empty catalog and starts working once peers exist - e2e: assert the Jack indexer's downloadClientId equals the Jack qB client id (regression guard for the binding) Verified against a real Radarr: in-place TorrentBlackhole→QBittorrent PUT and top-level downloadClientId binding both persist on create and update. * feat: tag auto-registered qBittorrent download client *arr round-robins torrent grabs across all enabled torrent clients, so real torrents from other indexers could be routed to jack's qB client (which rejects them). Tag the client with a configurable label (default "jack-internal") that no media carries, so *arr keeps those grabs out of it; Jack grabs still arrive via the indexer's explicit downloadClientId binding, which bypasses tag filtering. Adds autoregister.tag to the config (set "" to disable) and an ensureTag helper that creates the tag in each destination if missing. * docs: align README and compose examples with qBittorrent-only flow Rewrite the download flow, mounts, and troubleshooting for the qBittorrent API model (no watch folder); add commented depends_on service_healthy examples and document autoregister.tag. * docs: clarify ETA_UNKNOWN sentinel comment * fix: isolate Jack download client via lowest priority, not a tag The tag approach was broken: Radarr/Sonarr filter the indexer-bound download client by the *movie's* tags too (DownloadClientProvider builds the tag-filtered pool before resolving the indexer's DownloadClientId), so a "jack-internal" tag no movie carries made every grab fail with "Indexer specified download client does not exist". Instead, register the Jack qBittorrent client at *arr's lowest priority (50). The general client pool only round-robins the best-priority group, so real torrents from other indexers never land on Jack — while the indexer→client binding still routes Jack grabs to it (resolved before priority grouping). Tags are explicitly cleared so an upgrade heals a client previously broken by the tag. Reverts the autoregister.tag config/ensureTag added earlier. * feat: size-based download integrity + idle timeout (#22) * feat: verify peer downloads by size with fail-fast on unknown size Resolve the expected size from the transfer header (Content-Length / Content-Range total) falling back to the *arr release size, and make the final size check unconditional so a truncated stream is no longer renamed into place and imported as complete. When no size is known at all, fail fast with UnknownSizeError before writing any bytes. Adds 'release_size' as an ExpectedBytesSource (schema + migration 0003) for accurate telemetry, and handles resume edge cases via releaseSize: a .part larger than the file is discarded and restarted; a .part already equal to the file is finalized without re-downloading. * feat: replace total download timeout with a network-scoped idle timeout The 30-min whole-request deadline both killed large active downloads and was slow to notice a stalled connection. Replace it with an inactivity timeout (downloads.idleTimeoutMs, default 60s) armed only around the fetch and each read, so slow disk writes never trip it. The abort carries a sentinel reason so only it — not a later real error — is reclassified as a retryable IdleTimeoutError; the .part is preserved so retry resumes. Also tears down the reader (cancel) on error and cancels the body if the .part file can't be opened. * fix: address review — content_range source label + reader cleanup order - Label a 206 resume's size as 'content_range' (not 'content_length') for accurate telemetry; adds it to ExpectedBytesSource (migration 0004). - Release the stream reader only after the completeness check, so the error path's cancel+releaseLock runs against a still-locked reader on the IncompleteDownloadError (retried) path instead of being a swallowed no-op. * fix: surface failed qB downloads to *arr instead of silent 'ok' Squash of feat/download-integrity-monitoring onto the size-based integrity / idle-timeout work already landed via #22, so this adds only the net-new changes on top of that branch: Primary fix — failed qB downloads no longer silently report 'ok': addTorrent ignored startQbDownload's result, so an unknown peer or an unsafe peer-supplied filename created no row and was invisible to both jack and *arr, which waited out its full stuck-download grace period. startQbDownload now returns 'started' | 'duplicate' | 'failed'; 'failed' propagates HTTP 503 so *arr retries promptly, while an already-active 'duplicate' stays a success so a re-add of an in-flight release is not turned into a needless retry. Also: bump TypeScript to 6 and resolve the resulting type errors (getReader cast, fetch-spy typings, tsconfig types: "bun"). * fix: serve peer files with native backpressure to prevent OOM Serving a peer file via a hand-pumped Bun.file().stream() does not apply TCP backpressure: when the consumer stalls or aborts mid-download, the origin reads the entire file into memory as fast as it can, starving the event loop and exhausting RAM. On a swapless host a single stalled stream of a large file is enough to hard-freeze the process until the kernel OOM-kills it. Return the BunFile (or its sliced view) directly from streamFile and hand it to new Response, so Bun.serve streams it with native backpressure and sendfile. Verified: peak RSS under a stalled consumer drops from ~2x the file size to flat, while range serving stays byte-correct. * feat: widen peer download retry backoff to survive a peer outage A peer can be unreachable for ~15-20min (restart, tunnel hiccup). The old schedule (5 attempts, 60s max backoff) gave up in ~3min, terminally failing a resumable download long before the peer recovered. Raise maxDownloadAttempts to 13 and retryMaxDelayMs to 30min so the exponential schedule spans a worst-case ~64min window (~32min with jitter); early retries stay ~1s for ordinary network blips. The download keeps retrying and resumes from its .part once the peer is back.

roziscoding added 2 commits June 7, 2026 02:27

greptile-apps Bot reviewed Jun 7, 2026

View reviewed changes

Comment thread apps/backend/src/lib/servers/peer.ts Outdated

Comment thread apps/backend/src/lib/servers/peer.ts

roziscoding merged commit 4bc6eaf into feat/qbittorrent-api-emulation Jun 7, 2026
8 of 10 checks passed

roziscoding deleted the feat/download-integrity-monitoring branch June 7, 2026 08:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: size-based download integrity + idle timeout#22

feat: size-based download integrity + idle timeout#22
roziscoding merged 3 commits into
feat/qbittorrent-api-emulationfrom
feat/download-integrity-monitoring

roziscoding commented Jun 7, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

roziscoding commented Jun 7, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Phase 1 — Size-based integrity

Phase 2 — Idle timeout

Why not "Part 2" of the original design

Testing

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

roziscoding commented Jun 7, 2026 •

edited by greptile-apps Bot

Loading