rhoopr edited this page Apr 30, 2026 · 11 revisions

Retry & Resilience

kei is designed to handle unreliable network conditions and transient iCloud API failures.

Error Classification

Errors are classified as transient or permanent:

| Type | Examples | Retried? |
|---|---|---|
| Transient | HTTP 5xx, 429 (rate limit), timeouts, connection resets, checksum mismatch | Yes |
| Permanent | HTTP 4xx (except 429), disk I/O errors | No |

Checksum mismatches are treated as transient because they typically indicate a truncated transfer or expired CDN URL, not actual data corruption.
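As an illustration only, the classification table could be expressed as a small function. This is a minimal sketch, not kei's implementation: the `ErrorClass` enum, the `classify` function, and the `kind` labels are hypothetical names, and treating unrecognized failures as permanent is an assumption of the sketch.

```python
from enum import Enum
from typing import Optional


class ErrorClass(Enum):
    TRANSIENT = "transient"   # retried
    PERMANENT = "permanent"   # not retried


# Hypothetical labels for failures that carry no HTTP status.
TRANSIENT_KINDS = {"timeout", "connection_reset", "checksum_mismatch"}


def classify(http_status: Optional[int] = None,
             kind: Optional[str] = None) -> ErrorClass:
    """Classify a failure following the table above."""
    if http_status is not None:
        # 5xx and 429 are worth retrying; every other 4xx is not.
        if 500 <= http_status < 600 or http_status == 429:
            return ErrorClass.TRANSIENT
        if 400 <= http_status < 500:
            return ErrorClass.PERMANENT
    if kind in TRANSIENT_KINDS:
        return ErrorClass.TRANSIENT
    # Disk I/O errors, and (by assumption) anything unrecognized,
    # are treated as permanent in this sketch.
    return ErrorClass.PERMANENT
```

For example, `classify(http_status=429)` and `classify(kind="checksum_mismatch")` both come out transient, while `classify(http_status=404)` is permanent.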

Exponential Backoff

Retries use exponential backoff with jitter. The initial delay is derived from --max-retries: a higher retry count gives a longer initial wait, on the assumption that a more patient configuration is meant to give the failing service time to recover, not to retry it faster.

| --max-retries | Initial delay | Pattern |
|---|---|---|
| 1 – 2 | 2s | ~2s, ~4s |
| 3 (default) | 5s | ~5s, ~10s, ~20s |
| 4 – 6 | 10s | ~10s, ~20s, ~40s, ~60s, ~60s, ~60s |
| 7+ | 30s | starts at 30s and caps at 60s |

Random jitter is added to prevent multiple concurrent downloads from retrying at the same time. The maximum delay is capped at 60 seconds.
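The delay schedule in the table can be reproduced with a few lines of arithmetic. The sketch below is an assumption-laden illustration, not kei's code: the function names are hypothetical, and both the jitter amount (up to 25% added on top of the base delay) and the choice to apply the 60-second cap after jitter are guesses, since the text above does not specify either.

```python
import random
from typing import Callable

MAX_DELAY_SECS = 60.0


def initial_delay_secs(max_retries: int) -> float:
    """Initial delay derived from the retry count, per the table above."""
    if max_retries <= 2:
        return 2.0
    if max_retries == 3:
        return 5.0
    if max_retries <= 6:
        return 10.0
    return 30.0


def base_delay_secs(max_retries: int, attempt: int) -> float:
    """Delay before retry number `attempt` (0-based), before jitter."""
    doubled = initial_delay_secs(max_retries) * 2 ** attempt
    return min(doubled, MAX_DELAY_SECS)


def jittered_delay_secs(max_retries: int, attempt: int,
                        rng: Callable[[], float] = random.random,
                        jitter_fraction: float = 0.25) -> float:
    """Base delay plus random jitter (assumed: up to 25%), capped at 60s."""
    base = base_delay_secs(max_retries, attempt)
    return min(base + base * jitter_fraction * rng(), MAX_DELAY_SECS)
```

With `--max-retries 6`, the un-jittered sequence is 10, 20, 40, 60, 60, 60 seconds: the fourth retry would be 80 seconds by doubling alone, so the cap takes over from there.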

The legacy --retry-delay flag is still accepted (deprecated, scheduled for removal in v0.20.0). While it remains available, setting it overrides the derived initial delay.

Two-Phase Recovery

The download pipeline has two phases:

  1. Main pass - Downloads all assets with per-file retries (up to --max-retries, default: 3, max 100)
  2. Cleanup pass - Re-fetches CDN URLs from iCloud for all failures, then retries them at concurrency 1

The cleanup pass addresses a common failure mode: CDN download URLs expire during long syncs. By re-fetching URLs from iCloud, the cleanup pass gets fresh URLs that are more likely to succeed. Running at concurrency 1 gives large files full bandwidth.
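The two passes can be sketched as follows. This is a simplified illustration under stated assumptions: `sync`, `download`, and `refresh_urls` are hypothetical names, the main pass is shown as a plain loop where the real pipeline runs concurrently, and per-file retries are assumed to happen inside `download`.

```python
from typing import Callable, Dict, List


def sync(
    assets: Dict[str, str],                                # asset id -> CDN URL
    download: Callable[[str, str], bool],                  # (id, url) -> success?
    refresh_urls: Callable[[List[str]], Dict[str, str]],   # ids -> fresh URLs
) -> List[str]:
    """Run both passes and return the ids that still failed."""
    # Main pass: try every asset with the URL obtained during enumeration.
    failed = [aid for aid, url in assets.items() if not download(aid, url)]
    if not failed:
        return []

    # Cleanup pass: request fresh CDN URLs for the failures, then retry
    # them one at a time (concurrency 1).
    fresh = refresh_urls(failed)
    return [
        aid for aid in failed
        if aid not in fresh or not download(aid, fresh[aid])
    ]
```

An asset whose URL expired mid-sync fails the main pass but succeeds in the cleanup pass once `refresh_urls` returns a new URL for it; only assets that fail both passes are reported as failed.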

Timeout Strategy

File downloads use a dedicated HTTP client with no total request timeout, preventing large files from being killed mid-transfer. Stalled connections are detected via a 120s read timeout (no bytes received for 120 seconds). API calls use a separate 30s total timeout for fast failure.
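The difference between the two timeout styles can be made concrete with a small model. This is a sketch only: `TimeoutPolicy`, `DOWNLOAD_POLICY`, and `API_POLICY` are hypothetical names, and a real HTTP client enforces these limits internally instead of through an explicit check like `expired`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimeoutPolicy:
    total_secs: Optional[float]  # None = no cap on the whole request
    read_secs: Optional[float]   # None = no cap on the gap between bytes

    def expired(self, started_at: float, last_byte_at: float,
                now: float) -> bool:
        """Return True if either configured limit has been exceeded."""
        if self.total_secs is not None and now - started_at > self.total_secs:
            return True
        if self.read_secs is not None and now - last_byte_at > self.read_secs:
            return True
        return False


# File downloads: no total limit, but give up after 120s without any bytes.
DOWNLOAD_POLICY = TimeoutPolicy(total_secs=None, read_secs=120.0)
# API calls: fail fast after 30s overall.
API_POLICY = TimeoutPolicy(total_secs=30.0, read_secs=None)
```

Under `DOWNLOAD_POLICY`, a transfer that has run for an hour but received bytes five seconds ago is still healthy, while one that has been silent for 121 seconds is treated as stalled.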

API Call Retries

Retries apply not just to downloads but also to API calls:

  • Album and photo enumeration
  • Zone listing
  • Library fetching

These use the same backoff strategy.
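A generic retry wrapper around an API call might look like the sketch below. The names `TransientError` and `call_with_retries` are hypothetical, the default delays mirror the `--max-retries 3` row of the backoff table, and jitter is left out to keep the example deterministic.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retries(
    op: Callable[[], T],
    max_retries: int = 3,
    delay_for: Callable[[int], float] = lambda attempt: min(5.0 * 2 ** attempt, 60.0),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `op`, retrying transient failures up to `max_retries` times."""
    attempt = 0
    while True:
        try:
            return op()
        except TransientError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            sleep(delay_for(attempt))
            attempt += 1
```

Any other exception type propagates immediately, which matches the rule that permanent errors are not retried. Passing `sleep` as a parameter makes the wrapper easy to exercise in tests without real waiting.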

CloudKit Server Errors

CloudKit sometimes returns a successful HTTP status code (200) but embeds an error code in the JSON response body. kei detects these server-side errors and retries them automatically:

| Error Code | Meaning |
|---|---|
| TRY_AGAIN_LATER | General server overload |
| CAS_OP_LOCK | Concurrent access conflict |
| RETRY_LATER | Throttle signal |
| THROTTLED | Rate limiting |

These errors are most common during photo enumeration of large libraries, where they can cause silent page loss if not retried. The same exponential backoff strategy is used.
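Detecting an error hidden inside a 200 response means inspecting the body, not the status line. The sketch below is an illustration only: `retryable_codes_in` is a hypothetical name, and the response shape is an assumption (an error code under a `serverErrorCode` key, either at the top level or on individual entries of a `records` list); the fields kei actually inspects may differ.

```python
import json
from typing import List

# The four codes from the table above.
RETRYABLE_CODES = {"TRY_AGAIN_LATER", "CAS_OP_LOCK", "RETRY_LATER", "THROTTLED"}


def retryable_codes_in(body: str) -> List[str]:
    """Return retryable error codes found in a JSON body, even on HTTP 200."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        return []
    records = payload.get("records")
    if not isinstance(records, list):
        records = []
    # Check the top-level object first, then each record entry.
    candidates = [payload] + [r for r in records if isinstance(r, dict)]
    return [
        c["serverErrorCode"]
        for c in candidates
        if c.get("serverErrorCode") in RETRYABLE_CODES
    ]
```

A non-empty result means the page should be fetched again after a backoff delay. Accepting the response as-is would drop whatever records that page was supposed to contain.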

Download Summary

After all downloads complete, a summary is printed with:

  • Total assets processed
  • Successful downloads (with EXIF failure count if any occurred)
  • Failed downloads
  • Elapsed time

Related Flags
