Skip to content

feat: size-based download integrity + idle timeout#22

Merged
roziscoding merged 3 commits into
feat/qbittorrent-api-emulationfrom
feat/download-integrity-monitoring
Jun 7, 2026
Merged

feat: size-based download integrity + idle timeout#22
roziscoding merged 3 commits into
feat/qbittorrent-api-emulationfrom
feat/download-integrity-monitoring

Conversation

@roziscoding

@roziscoding roziscoding commented Jun 7, 2026

Copy link
Copy Markdown
Owner

Summary

Hardens peer downloads against silent corruption and stalls. Targets
feat/qbittorrent-api-emulation (stacked on top of PR #21), not main.

Phase 1 — Size-based integrity

  • Resolve the expected size from the transfer header (Content-Length /
    Content-Range total) falling back to the *arr releaseSize, so a peer that
    doesn't advertise a size no longer yields expectedBytes: null.
  • The final downloadedBytes === expectedBytes check is now unconditional — a
    truncated stream is no longer renamed into place and imported as complete.
  • Fail-fast with UnknownSizeError (permanent, non-retried) when no size is
    known at all, before any byte is written.
  • Adds release_size as an ExpectedBytesSource (schema + migration 0003) for
    accurate telemetry.
  • Resume edge cases via releaseSize: a .part larger than the file is discarded
    and restarted (part_oversize); a .part already equal to the file is finalized
    without re-downloading.

Phase 2 — Idle timeout

  • Replaces the 30-min whole-request AbortSignal.timeout with an inactivity
    timeout (downloads.idleTimeoutMs, default 60s), armed only around the fetch and
    each read so slow disk I/O never trips it.
  • The abort carries a sentinel reason so only it — not a later real error — becomes
    a retryable IdleTimeoutError; the .part is preserved so retry resumes.
  • Reader is cancelled on error; the body is cancelled if the .part can't be opened.

Why not "Part 2" of the original design

The brainstorm proposed making the peer send Content-Length, but research found
the peer already does (peer.router.ts). The expectedBytes: null seen in
prod was an older peer / a stripped header — handled by the releaseSize fallback.
Content hashing was considered and dropped (HTTPS covers transport integrity;
size-only is the agreed bar).

Testing

  • bun test apps/backend/src/ — 172 pass; lint clean.
  • New/updated: size fallback, fail-fast, truncation, resume oversize/exact-finalize,
    idle stall + slow-but-active, release_size DB round-trip, retry-policy
    classification (IdleTimeoutError retryable, plain AbortError not).

Greptile Summary

Hardens peer downloads with two independent layers: size-based integrity (fail-fast on unknown size, unconditional truncation check, releaseSize fallback, and pre-fetch oversize/exact-finalize resume guards) and per-chunk inactivity timeout (replacing the old whole-request AbortSignal.timeout with an armIdle/clearIdle pattern so disk I/O doesn't count as idle). Two stacked SQLite migrations add release_size as a valid expected_bytes_source and then content_range, keeping the DB schema in lockstep.

  • Size integrity: transferSize (from Content-Range or Content-Length) now falls back to releaseSize, and the downloadedBytes === expectedBytes check is unconditional; UnknownSizeError fires before any bytes are written if no size source exists.
  • Idle timeout: A single AbortController is armed only during fetch() and each reader.read(), cleared immediately after, so slow disk writes never trip it; IdleTimeoutError (retryable, .part preserved) is distinguished from a plain AbortError (permanent) via a sentinel abort reason.
  • Resume guards: A .part larger than releaseSize is discarded before the request; one already equal is finalized without a network round-trip, emitting the expected headers+completed events for DB persistence.

Confidence Score: 5/5

The change is safe to merge — the download integrity and timeout logic are well-structured, thoroughly tested, and both new error types are correctly wired into the retry policy.

The idle-timeout design (per-chunk armIdle/clearIdle with a sentinel abort reason) correctly isolates network stalls from disk I/O, and the unconditional size check closes the silent-truncation gap. The two stacked migrations sequence correctly since both are applied before the service starts. The only findings are minor observability gaps (original errors not forwarded as cause to IdleTimeoutError), with no impact on correctness or retry behavior.

apps/backend/src/lib/servers/peer.ts — the idleTimeout() factory in both catch sites drops the original error rather than chaining it via cause.

Important Files Changed

Filename Overview
apps/backend/src/lib/servers/peer.ts Core download logic — adds idle timeout with armIdle/clearIdle per read, UnknownSizeError fail-fast, part_oversize resume guard, and moves releaseLock after completeness check. Logic is sound; the idleTimeout() factory doesn't chain the original error as cause.
apps/backend/src/modules/downloads/retry-policy.ts Adds IdleTimeoutError as retryable; UnknownSizeError is correctly left out (permanent failure). Plain AbortError falls through to false as intended.
apps/backend/src/modules/downloads/downloads.service.ts Threads idleTimeoutMs from config into downloadFile options. One-line addition, straightforward.
apps/backend/drizzle/0003_high_makkari.sql Adds release_size column and CHECK constraint for content_length/release_size. content_range added in the follow-on 0004 migration; sequencing is correct since both migrations run before the service starts.
apps/backend/drizzle/0004_sleepy_retro_girl.sql Expands the expected_bytes_source CHECK to include content_range, matching the final schema.ts definition.
apps/backend/src/database/schema.ts ExpectedBytesSource type and CHECK constraint expanded to include content_range and release_size, consistent with 0004 migration.
apps/backend/src/tests/peer-download.test.ts Comprehensive new tests for fail-fast, idle stall, slow-but-active, part_oversize, exact-finalize, and content_range source attribution. Coverage matches all new code paths.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[downloadFile called] --> B{existingBytes > 0 AND releaseSize known?}
    B -->|yes, part > releaseSize| C[unlink part → emitRestart part_oversize → existingBytes=0]
    B -->|yes, part == releaseSize| D[rename part→dest\nemit headers+completed\nreturn]
    B -->|no / unknown size| E[doFetch with idle timer]
    C --> E

    E --> F{response status}
    F -->|206 valid| G[resuming=true]
    F -->|206 invalid / 416| H[restartFresh → doFetch fresh]
    F -->|200 ok| I[resuming=false]
    H --> I

    G --> J[Parse Content-Range total as transferSize]
    I --> K[Parse Content-Length as transferSize]

    J & K --> L{transferSize != null?}
    L -->|yes| M[expectedBytes = transferSize\nexpectedBytesSource = content_range or content_length]
    L -->|no, releaseSize known| N[expectedBytes = releaseSize\nexpectedBytesSource = release_size]
    L -->|no, no releaseSize| O[cancel body → throw UnknownSizeError permanent]

    M & N --> P[Read loop\narmIdle per chunk\nclearIdle after each read]
    P -->|chunk received| Q[write to .part\nupdate progress]
    Q --> P
    P -->|idle timer fires| R[AbortController aborts\nreader.read rejects AbortError\ncatch: throw IdleTimeoutError retryable]
    P -->|done=true| S{downloadedBytes == expectedBytes?}
    S -->|no| T[throw IncompleteDownloadError retryable\ncatch: reader.cancel → releaseLock]
    S -->|yes| U[releaseLock → rename part→dest → emit completed]
Loading

Fix All in Claude Code Fix All in Codex

Reviews (2): Last reviewed commit: "fix: address review — content_range sour..." | Re-trigger Greptile

Resolve the expected size from the transfer header (Content-Length /
Content-Range total) falling back to the *arr release size, and make the
final size check unconditional so a truncated stream is no longer renamed
into place and imported as complete. When no size is known at all, fail
fast with UnknownSizeError before writing any bytes.

Adds 'release_size' as an ExpectedBytesSource (schema + migration 0003)
for accurate telemetry, and handles resume edge cases via releaseSize: a
.part larger than the file is discarded and restarted; a .part already
equal to the file is finalized without re-downloading.
The 30-min whole-request deadline both killed large active downloads and
was slow to notice a stalled connection. Replace it with an inactivity
timeout (downloads.idleTimeoutMs, default 60s) armed only around the fetch
and each read, so slow disk writes never trip it. The abort carries a
sentinel reason so only it — not a later real error — is reclassified as
a retryable IdleTimeoutError; the .part is preserved so retry resumes.

Also tears down the reader (cancel) on error and cancels the body if the
.part file can't be opened.
Comment thread apps/backend/src/lib/servers/peer.ts Outdated
Comment thread apps/backend/src/lib/servers/peer.ts
- Label a 206 resume's size as 'content_range' (not 'content_length') for
  accurate telemetry; adds it to ExpectedBytesSource (migration 0004).
- Release the stream reader only after the completeness check, so the error
  path's cancel+releaseLock runs against a still-locked reader on the
  IncompleteDownloadError (retried) path instead of being a swallowed no-op.
@roziscoding roziscoding merged commit 4bc6eaf into feat/qbittorrent-api-emulation Jun 7, 2026
8 of 10 checks passed
@roziscoding roziscoding deleted the feat/download-integrity-monitoring branch June 7, 2026 08:22
roziscoding added a commit that referenced this pull request Jun 7, 2026
Squash of feat/download-integrity-monitoring onto the size-based
integrity / idle-timeout work already landed via #22, so this adds only
the net-new changes on top of that branch:

Primary fix — failed qB downloads no longer silently report 'ok':
addTorrent ignored startQbDownload's result, so an unknown peer or an
unsafe peer-supplied filename created no row and was invisible to both
jack and *arr, which waited out its full stuck-download grace period.
startQbDownload now returns 'started' | 'duplicate' | 'failed'; 'failed'
propagates HTTP 503 so *arr retries promptly, while an already-active
'duplicate' stays a success so a re-add of an in-flight release is not
turned into a needless retry.

Also: bump TypeScript to 6 and resolve the resulting type errors
(getReader cast, fetch-spy typings, tsconfig types: "bun").
roziscoding added a commit that referenced this pull request Jun 7, 2026
…adarr (#21)

* feat: emulate qBittorrent auth + app endpoints

Add /api/v2 routes for qBittorrent WebUI API emulation so Sonarr/Radarr
can register jack as a qBittorrent download client. Mounts before the
global requireApiKey middleware with its own SID-cookie auth.

- POST /auth/login validates username against configured server names
- GET /app/{version,webapiVersion,preferences} return contract values
- GET /torrents/{info,categories} return empty/stub for connection test
- Stub infohash helper for Phase 2 torrent mapping

* feat: map jack downloads to qBittorrent torrents

Add qb_category/qb_source_server columns (migration 0002) and the
download→qBittorrent-torrent mapper so Sonarr/Radarr see real progress.

- toQbTorrent: real stub infohash, pausedUP on completion, content_path
  != save_path, byte-based progress
- torrents/info filters by category + hashes
- torrents/properties (404 unknown) and torrents/files
- chore: untrack accidentally-committed .claude lock + e2e sqlite db

* feat: qBittorrent add/delete/setCategory with *arr-pull import switch

Add the qBittorrent write API and switch qB-added downloads to *arr-pull
imports (blackhole-added downloads keep the existing push).

- DownloadsService: shared createDownload core + startQbDownload; skip
  triggerImport when qbSourceServer is set (per-download pull switch)
- repository delete/setQbCategory
- torrents/add accepts only jack stubs / jack download URLs (rejects
  magnets and foreign torrents); delete/setCategory scoped to the session
- wire a single DownloadsService through getApp + the blackhole watcher

* feat: auto-register jack as a qBittorrent download client

Replace the TorrentBlackhole auto-registration with a qBittorrent client
pointing at ${jack.baseUrl}/api/v2, keeping the indexer↔client binding so
only the Jack client grabs the Jack indexer's releases.

- registerDownloadClient builds a QBittorrentSettings client (host/port/
  useSsl/urlBase/username/password + per-app category field); matches an
  existing "Jack" client by name to upgrade a prior blackhole client
- best-effort setShareLimits/topPrio/setForceStart no-ops
- e2e qBittorrent-flow test (login → add → poll → import)

* refactor: clearer qB add failure + session sweep

- torrents/add returns 503 (not 415) when the download pipeline is
  unconfigured (no downloads config) — a server misconfig, not a bad
  torrent; 415 stays for magnets/foreign torrents
- QbSessionStore sweeps expired sessions on login so abandoned SIDs
  don't accumulate (get() only evicted lazily)

* test(e2e): update auto-registration for qBittorrent client; drop blackhole flow

- auto-registration e2e now asserts the qBittorrent download client
  (name "Jack", implementation QBittorrent, host/port = jack-beta)
- remove download-flow.test.ts: the blackhole-via-stack flow is
  superseded by qbittorrent-flow.test.ts (real *arr integration) and
  conflicted with it on the shared fixture; blackhole watcher logic
  stays covered by downloads-service unit tests

* refactor: remove blackhole watcher; jack is qBittorrent-only now

*arr hands grabs to jack via POST /api/v2/torrents/add (parsed in-memory),
so nothing writes to a watch folder anymore. Drop the folder-watching path
entirely for a pure qB-driven, *arr-pull download service.

- delete BlackholeWatcher; remove DownloadsService.processTorrentFile,
  the reenqueue bookkeeping, and the stub-file cleanup
- remove the now-dead push-import path: DownloadsService.triggerImport +
  the destinations dependency, and ArrServerConnector.triggerImport /
  importCommandName (qB downloads import via *arr pull)
- drop the watchPath config option (completedPath stays)
- stop constructing/starting the watcher in index.ts (resume-on-startup
  and stale-row reconciliation are retained)
- update tests, e2e setup/compose, and example config accordingly

* fix: address review — refresh qB session TTL on access, gate qB mount on downloads config

- QbSessionStore.get() now extends expiry on each successful access so an
  actively-polling *arr SID never expires mid-session (matches real qB)
- mount the /api/v2 qB API only when config.downloads is set, so the
  reported save_path is always the real completedPath (never "")

* chore(e2e): drop unused blackhole-watch volume dir from e2e task

* fix: always register the Jack indexer + qB client (forceSave)

The Jack indexer was registered unbound: registerDownloadClient ran with
forceSave=false, so *arr's connection test to jack's qB API at registration
time could fail and throw, leaving downloadClientId undefined — the indexer
then registered with no Download Client ("Any").

- register both the indexer and the qBittorrent client with forceSave=true,
  so the client always saves (returns its id → indexer gets bound) and the
  indexer saves even when its test query returns no results
- always run auto-registration (drop the no-peers skip); with forceSave the
  indexer is accepted with an empty catalog and starts working once peers exist
- e2e: assert the Jack indexer's downloadClientId equals the Jack qB client id
  (regression guard for the binding)

Verified against a real Radarr: in-place TorrentBlackhole→QBittorrent PUT and
top-level downloadClientId binding both persist on create and update.

* feat: tag auto-registered qBittorrent download client

*arr round-robins torrent grabs across all enabled torrent clients, so
real torrents from other indexers could be routed to jack's qB client
(which rejects them). Tag the client with a configurable label (default
"jack-internal") that no media carries, so *arr keeps those grabs out of
it; Jack grabs still arrive via the indexer's explicit downloadClientId
binding, which bypasses tag filtering.

Adds autoregister.tag to the config (set "" to disable) and an ensureTag
helper that creates the tag in each destination if missing.

* docs: align README and compose examples with qBittorrent-only flow

Rewrite the download flow, mounts, and troubleshooting for the
qBittorrent API model (no watch folder); add commented depends_on
service_healthy examples and document autoregister.tag.

* docs: clarify ETA_UNKNOWN sentinel comment

* fix: isolate Jack download client via lowest priority, not a tag

The tag approach was broken: Radarr/Sonarr filter the indexer-bound
download client by the *movie's* tags too (DownloadClientProvider builds
the tag-filtered pool before resolving the indexer's DownloadClientId),
so a "jack-internal" tag no movie carries made every grab fail with
"Indexer specified download client does not exist".

Instead, register the Jack qBittorrent client at *arr's lowest priority
(50). The general client pool only round-robins the best-priority group,
so real torrents from other indexers never land on Jack — while the
indexer→client binding still routes Jack grabs to it (resolved before
priority grouping). Tags are explicitly cleared so an upgrade heals a
client previously broken by the tag.

Reverts the autoregister.tag config/ensureTag added earlier.

* feat: size-based download integrity + idle timeout (#22)

* feat: verify peer downloads by size with fail-fast on unknown size

Resolve the expected size from the transfer header (Content-Length /
Content-Range total) falling back to the *arr release size, and make the
final size check unconditional so a truncated stream is no longer renamed
into place and imported as complete. When no size is known at all, fail
fast with UnknownSizeError before writing any bytes.

Adds 'release_size' as an ExpectedBytesSource (schema + migration 0003)
for accurate telemetry, and handles resume edge cases via releaseSize: a
.part larger than the file is discarded and restarted; a .part already
equal to the file is finalized without re-downloading.

* feat: replace total download timeout with a network-scoped idle timeout

The 30-min whole-request deadline both killed large active downloads and
was slow to notice a stalled connection. Replace it with an inactivity
timeout (downloads.idleTimeoutMs, default 60s) armed only around the fetch
and each read, so slow disk writes never trip it. The abort carries a
sentinel reason so only it — not a later real error — is reclassified as
a retryable IdleTimeoutError; the .part is preserved so retry resumes.

Also tears down the reader (cancel) on error and cancels the body if the
.part file can't be opened.

* fix: address review — content_range source label + reader cleanup order

- Label a 206 resume's size as 'content_range' (not 'content_length') for
  accurate telemetry; adds it to ExpectedBytesSource (migration 0004).
- Release the stream reader only after the completeness check, so the error
  path's cancel+releaseLock runs against a still-locked reader on the
  IncompleteDownloadError (retried) path instead of being a swallowed no-op.

* fix: surface failed qB downloads to *arr instead of silent 'ok'

Squash of feat/download-integrity-monitoring onto the size-based
integrity / idle-timeout work already landed via #22, so this adds only
the net-new changes on top of that branch:

Primary fix — failed qB downloads no longer silently report 'ok':
addTorrent ignored startQbDownload's result, so an unknown peer or an
unsafe peer-supplied filename created no row and was invisible to both
jack and *arr, which waited out its full stuck-download grace period.
startQbDownload now returns 'started' | 'duplicate' | 'failed'; 'failed'
propagates HTTP 503 so *arr retries promptly, while an already-active
'duplicate' stays a success so a re-add of an in-flight release is not
turned into a needless retry.

Also: bump TypeScript to 6 and resolve the resulting type errors
(getReader cast, fetch-spy typings, tsconfig types: "bun").

* fix: serve peer files with native backpressure to prevent OOM

Serving a peer file via a hand-pumped Bun.file().stream() does not apply
TCP backpressure: when the consumer stalls or aborts mid-download, the
origin reads the entire file into memory as fast as it can, starving the
event loop and exhausting RAM. On a swapless host a single stalled stream
of a large file is enough to hard-freeze the process until the kernel
OOM-kills it.

Return the BunFile (or its sliced view) directly from streamFile and hand
it to new Response, so Bun.serve streams it with native backpressure and
sendfile. Verified: peak RSS under a stalled consumer drops from ~2x the
file size to flat, while range serving stays byte-correct.

* feat: widen peer download retry backoff to survive a peer outage

A peer can be unreachable for ~15-20min (restart, tunnel hiccup). The old
schedule (5 attempts, 60s max backoff) gave up in ~3min, terminally
failing a resumable download long before the peer recovered. Raise
maxDownloadAttempts to 13 and retryMaxDelayMs to 30min so the exponential
schedule spans a worst-case ~64min window (~32min with jitter); early
retries stay ~1s for ordinary network blips. The download keeps retrying
and resumes from its .part once the peer is back.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant