
State Tracking

rhoopr edited this page Jun 1, 2026 · 11 revisions

kei uses a SQLite database to track the state of every asset across sync runs.

Database Location

The state database is stored at {data_dir}/{username}.db (default: ~/.config/kei/{username}.db).
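
As a rough illustration of the path rule above, the following Python sketch builds the database path from a data directory and username. The function name `state_db_path` is hypothetical, and whether kei normalizes the username before using it as a filename is not covered here, so the sketch uses it verbatim.

```python
from pathlib import Path
from typing import Optional


def state_db_path(username: str, data_dir: Optional[str] = None) -> Path:
    """Return {data_dir}/{username}.db, defaulting data_dir to ~/.config/kei.

    Hypothetical helper for illustration; the username is used as-is.
    """
    base = Path(data_dir).expanduser() if data_dir else Path.home() / ".config" / "kei"
    return base / f"{username}.db"
```

For example, `state_db_path("alice", "/srv/kei")` yields `/srv/kei/alice.db`, while omitting `data_dir` resolves under the current user's home directory.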

What's Tracked

For each iCloud asset:

| Field | Description |
|---|---|
| library | CloudKit library zone, such as PrimarySync or a SharedSync-* zone |
| asset_id | Unique iCloud asset identifier |
| version_size | iCloud resource/version identifier for the downloaded variant |
| status | pending, downloaded, or failed |
| checksum | Apple's MMCS checksum string from iCloud, kept for provider traceability |
| download_checksum | SHA-256 of the downloaded bytes before metadata writes |
| local_checksum | SHA-256 of the final local file, used by verify --checksums |
| filename | Original filename |
| local_path | Where the file was downloaded |
| download_attempts | Number of retry attempts |
| last_error | Error message from last failure |
| created_at | Asset creation date in iCloud |
| downloaded_at | When the file was downloaded locally |
| metadata_hash | Hash of captured provider metadata for rewrite/backfill checks |
| imported_size, imported_mtime | Import snapshot used to skip rehashing unchanged adopted files |
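
Because the state is plain SQLite, the status field can be summarized with an ordinary query. The sketch below is a minimal example that assumes the table is named assets and has a status column, as described on this page; `demo_connection` builds a simplified, hypothetical stand-in table (with a made-up version_size value) rather than the real schema.

```python
import sqlite3


def count_by_status(conn: sqlite3.Connection) -> dict:
    """Return {status: row_count} for the assets table."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM assets GROUP BY status"
    ).fetchall()
    return dict(rows)


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in (hypothetical, simplified columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE assets ("
        " library TEXT, id TEXT, version_size TEXT, status TEXT,"
        " PRIMARY KEY (library, id, version_size))"
    )
    conn.executemany(
        "INSERT INTO assets VALUES (?, ?, ?, ?)",
        [
            ("PrimarySync", "A1", "original", "downloaded"),
            ("PrimarySync", "A2", "original", "downloaded"),
            ("PrimarySync", "A3", "original", "pending"),
            ("PrimarySync", "A4", "original", "failed"),
        ],
    )
    return conn
```

To inspect a real database without risking writes, one option is to open it read-only with `sqlite3.connect("file:/path/to/user.db?mode=ro", uri=True)`.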

For each sync run:

| Field | Description |
|---|---|
| started_at | When the sync began |
| completed_at | When the sync finished (null if interrupted) |
| assets_seen | Total assets enumerated from API |
| assets_downloaded | Successfully downloaded count |
| assets_failed | Failed download count |
| enumeration_errors | Count of hard enumeration errors observed during the run |
| interrupted | Whether the run was interrupted (legacy boolean; kept alongside status for back-compat) |
| status | Lifecycle: running, complete, or interrupted. A SIGKILL'd process leaves the row at running, which the next startup promotes to interrupted. |
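
The startup promotion of stale running rows can be pictured as a single UPDATE. This is a sketch under assumptions, not kei's actual code: the function names are hypothetical, the demo table is simplified, and whether kei also sets the legacy interrupted boolean at this point is not shown.

```python
import sqlite3


def promote_stale_runs(conn: sqlite3.Connection) -> int:
    """Mark runs left at 'running' by a killed process as 'interrupted'.

    Returns the number of rows promoted.
    """
    cur = conn.execute(
        "UPDATE sync_runs SET status = 'interrupted' WHERE status = 'running'"
    )
    conn.commit()
    return cur.rowcount


def demo_runs() -> sqlite3.Connection:
    """In-memory stand-in for sync_runs (hypothetical, simplified columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sync_runs ("
        " id INTEGER PRIMARY KEY, started_at TEXT,"
        " completed_at TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO sync_runs (started_at, completed_at, status) VALUES (?, ?, ?)",
        [
            ("2026-05-30T10:00:00Z", "2026-05-30T10:05:00Z", "complete"),
            ("2026-05-31T10:00:00Z", None, "running"),  # process was killed
        ],
    )
    return conn
```

Running the promotion a second time changes nothing, which makes it safe to call on every startup.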

Benefits

Near-Instant Subsequent Syncs

On the first run, every asset is enumerated and downloaded. On subsequent runs, assets already marked as downloaded are skipped without any filesystem checks. This makes re-running kei nearly instant for unchanged libraries.

Automatic Re-Download

If the database says a file is downloaded but the file is missing from disk, it's automatically re-downloaded. This handles cases where files were accidentally deleted or moved.
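
A minimal sketch of the missing-file check, assuming the caller has already selected (asset_id, local_path) pairs for rows marked downloaded. The helper name `find_missing` is hypothetical, and how kei schedules the re-download is not modeled.

```python
import os


def find_missing(downloaded_rows):
    """Return asset ids whose recorded local_path no longer exists on disk.

    downloaded_rows: iterable of (asset_id, local_path) pairs.
    """
    return [
        asset_id
        for asset_id, local_path in downloaded_rows
        if not os.path.exists(local_path)
    ]
```

The returned ids are the candidates to move back into the download queue.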

Failed Asset Tracking

Assets that fail to download are marked as failed with their error message. This includes assets that exhaust the retry budget configured in [download.retry]: they are moved from pending to failed with a descriptive error rather than being silently skipped.

You can view failed assets with kei status --failed. On the next sync, all failed assets are automatically reset to pending and retried with fresh attempt counts, so no manual intervention is needed.
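
The reset of failed rows can be sketched as one UPDATE over the assets table. The column names status and download_attempts come from the field list on this page; the function name and the simplified demo table are hypothetical, and whether kei also clears last_error is not assumed here.

```python
import sqlite3


def reset_failed_to_pending(conn: sqlite3.Connection) -> int:
    """Move failed rows back to pending with a fresh attempt count."""
    cur = conn.execute(
        "UPDATE assets SET status = 'pending', download_attempts = 0 "
        "WHERE status = 'failed'"
    )
    conn.commit()
    return cur.rowcount


def demo_assets() -> sqlite3.Connection:
    """In-memory stand-in for assets (hypothetical, simplified columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE assets ("
        " id TEXT PRIMARY KEY, status TEXT,"
        " download_attempts INTEGER, last_error TEXT)"
    )
    conn.executemany(
        "INSERT INTO assets VALUES (?, ?, ?, ?)",
        [
            ("A1", "downloaded", 1, None),
            ("A2", "failed", 3, "timeout"),
            ("A3", "failed", 5, "HTTP 503"),
        ],
    )
    return conn
```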

Resume After Interruption

If a sync is interrupted (Ctrl+C, crash, reboot), the next run picks up where it left off. Assets already downloaded are skipped, and only pending/failed assets are attempted.

Incremental Sync

After the first full sync, kei stores a CloudKit syncToken for each library zone. On the next run, it calls Apple's changes/database endpoint to check if anything changed. If nothing did, the sync completes in 1-2 API calls instead of the ~75 needed for full library enumeration.

When changes exist, the changes/zone endpoint returns only new, modified, or deleted records since the last token. Created assets are downloaded normally. Deletions and hidden assets are written to state and skipped. kei only advances the stored token after the relevant pass and cycle finish safely.

Tokens are stored per-zone in the metadata table as sync_token:{zone_name}.
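
The per-zone key format can be illustrated with a small key-value helper. The key pattern sync_token:{zone_name} is taken from the text above; the column names key and value, the helper names, and the sample token string are assumptions made for this sketch.

```python
import sqlite3
from typing import Optional


def token_key(zone_name: str) -> str:
    """Build the metadata key for a zone's sync token."""
    return f"sync_token:{zone_name}"


def store_token(conn: sqlite3.Connection, zone_name: str, token: str) -> None:
    """Insert or overwrite the token for one zone."""
    conn.execute(
        "INSERT OR REPLACE INTO metadata (key, value) VALUES (?, ?)",
        (token_key(zone_name), token),
    )
    conn.commit()


def load_token(conn: sqlite3.Connection, zone_name: str) -> Optional[str]:
    """Return the stored token, or None when the zone has none."""
    row = conn.execute(
        "SELECT value FROM metadata WHERE key = ?", (token_key(zone_name),)
    ).fetchone()
    return row[0] if row else None


def demo_metadata() -> sqlite3.Connection:
    """In-memory stand-in for the metadata table (assumed key/value columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metadata (key TEXT PRIMARY KEY, value TEXT)")
    return conn
```

A zone with no stored row loads as `None`, which corresponds to the no-token case that leads to a full enumeration.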

Watch mode also tracks Apple's database-level changes token separately as db_sync_token. It is advanced only after the selected-zone cycle completes safely. If Apple returns an empty complete precheck page, kei skips that wakeup but keeps the prior database token so the next wakeup rechecks from a known checkpoint.

Token withholding on partial failure

The sync token is only advanced when the run is safe. Unsafe cases include:

  • partial failures or interrupted shutdown
  • dry runs, read-only filename output, and recent-limited runs
  • hard enumeration errors, pagination shortfalls, or ambiguous empty pages
  • source delete or hidden-state writes that fail or update zero rows
  • full-query asset/master records that cannot be paired safely
  • album relation hydration that has not completed
  • path or enumeration config drift before a full reconciliation has run

When token advancement is blocked, sync_report.json includes sync_token_blocked and reason fields. If kei collected token receiver telemetry, the report includes the expected receiver count, receivers with tokens, missing receivers, blank receivers, dropped receivers, and unique token count even when advancement was not blocked.
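
The gating idea above can be sketched as a function that returns the first reason a run is unsafe, or None when the token may advance. Everything named here is hypothetical: the `RunOutcome` fields cover only some of the listed cases, and the reason strings are illustrative rather than the values kei writes to sync_report.json.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunOutcome:
    """Hypothetical summary of one sync pass (illustrative fields only)."""
    partial_failure: bool = False
    interrupted: bool = False
    dry_run: bool = False
    recent_limited: bool = False
    enumeration_errors: int = 0
    album_hydration_complete: bool = True


def token_block_reason(outcome: RunOutcome) -> Optional[str]:
    """Return why the token must be withheld, or None if it is safe to advance."""
    checks = [
        (outcome.partial_failure, "partial_failure"),
        (outcome.interrupted, "interrupted"),
        (outcome.dry_run, "dry_run"),
        (outcome.recent_limited, "recent_limited"),
        (outcome.enumeration_errors > 0, "enumeration_errors"),
        (not outcome.album_hydration_complete, "album_hydration_incomplete"),
    ]
    for blocked, reason in checks:
        if blocked:
            return reason
    return None
```

Defaulting every field to the safe value means a caller has to record a problem explicitly for the token to be withheld; a stricter design could default to unsafe instead.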

Fallback behavior

If Apple rejects a stored token (expired or invalid), kei logs a warning and automatically falls back to a full enumeration. The new token from the full sync is stored for next time. No manual intervention needed.

kei also falls back to full enumeration when pending or failed assets need another look. The changes/zone API only returns new modifications - it can't re-enumerate assets that were already seen but not yet downloaded. Once all pending assets are resolved, incremental sync resumes.

Path-affecting config drift clears stored zone tokens and forces a full reconciliation. That covers changes such as download directory, folder templates, filename policy, media/resource selection, and date/recent filters where known assets need to be planned into a new path shape.
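
One way to detect this kind of drift is to hash a canonical form of the relevant settings and compare it with the hash stored from the previous run. The sketch below shows that general technique only; the setting names, the JSON canonicalization, and the choice of SHA-256 are assumptions, not kei's actual hashing scheme.

```python
import hashlib
import json
from typing import Optional


def config_hash(settings: dict) -> str:
    """Hash settings in a key-order-independent canonical form."""
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_drifted(stored_hash: Optional[str], settings: dict) -> bool:
    """True when a previous hash exists and no longer matches the settings."""
    return stored_hash is not None and stored_hash != config_hash(settings)
```

Sorting keys before hashing keeps a reordered but otherwise identical config from being treated as drift.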

Album-filtered runs can avoid a full library enumeration once trusted album membership snapshots exist. If a selected album lacks a trusted snapshot, kei can run a targeted album backfill to build one, then use incremental routing later.

Transient zone-level errors (THROTTLED, RETRY_LATER, etc.) don't trigger a full re-enumeration. Only InvalidToken and ZoneNotFound cause a fallback. Other errors propagate and the sync retries with the existing token on the next cycle.
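
The error handling described above amounts to a three-way decision. In this sketch the error code strings are the ones named in the text, while the function name and the returned action labels are hypothetical.

```python
from typing import Optional

# Only these zone-level errors trigger a fallback, per the text above.
FULL_ENUMERATION_CODES = frozenset({"InvalidToken", "ZoneNotFound"})


def next_step(error_code: Optional[str]) -> str:
    """Map a zone-level error code to an illustrative action label."""
    if error_code is None:
        return "continue_incremental"
    if error_code in FULL_ENUMERATION_CODES:
        return "full_enumeration"
    # THROTTLED, RETRY_LATER, and anything else: keep the existing token.
    return "propagate_and_retry_next_cycle"
```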

Full enumeration reasons

When a run uses full enumeration instead of incremental sync, kei records a bounded full_enumeration_reason in the JSON report, structured logs, and Prometheus metrics. Current reasons include:

| Reason | Meaning |
|---|---|
| no_stored_token | First run, reset tokens, or no usable token for the zone |
| retry_failed_rows | Failed rows need re-enumeration before retry |
| pending_rows | Pending rows from a prior run need re-enumeration |
| metadata_backfill | Metadata rewrite/backfill work needs a full asset view |
| album_relation_hydration_incomplete | Album relation data is not trusted enough for incremental routing yet |
| enum_config_hash_drift | Enumeration-affecting config changed |
| download_config_hash_drift | Path-affecting download config changed |
| explicit_retry_failed | The run was started with --retry-failed |
| other_static_reason | A less specific safe fallback path was used |
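
A bounded reason set matters mostly for metrics, where an open-ended label would create unlimited time series. The sketch below shows one way a consumer could keep labels bounded; mapping unknown values to other_static_reason is an assumption for illustration, not documented kei behavior.

```python
# Reason strings copied from the table above.
KNOWN_REASONS = frozenset({
    "no_stored_token",
    "retry_failed_rows",
    "pending_rows",
    "metadata_backfill",
    "album_relation_hydration_incomplete",
    "enum_config_hash_drift",
    "download_config_hash_drift",
    "explicit_retry_failed",
    "other_static_reason",
})


def bounded_reason(reason: str) -> str:
    """Return reason if it is a known label, else the generic fallback label."""
    return reason if reason in KNOWN_REASONS else "other_static_reason"
```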

Forcing a full scan

Clear stored sync tokens, then run sync again:

kei reset sync-token
kei sync

See kei reset sync-token.

Subcommands

The state database enables several management commands:

| Command | Description |
|---|---|
| status | Show sync status and database summary |
| status --failed | List failed assets with error messages |
| sync --retry-failed | Reset failed assets to pending and re-sync |
| reset state | Delete the database and start fresh |
| reset sync-token | Clear stored sync tokens |
| import-existing | Scan local files and mark matching assets as downloaded |
| verify | Check that downloaded files still exist |
| verify --checksums | Also verify SHA-256 checksums |
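
To make the two verify levels concrete, here is a sketch of an existence check with an optional SHA-256 comparison against a stored hash such as local_checksum. The function names and result labels are hypothetical, and kei's own verify output is not reproduced.

```python
import hashlib
import os
import tempfile


def sha256_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: str, expected_sha256: str = None) -> str:
    """Return 'missing', 'mismatch', or 'ok' for one tracked file."""
    if not os.path.isfile(path):
        return "missing"
    if expected_sha256 is None:  # existence-only check
        return "ok"
    return "ok" if sha256_file(path) == expected_sha256.lower() else "mismatch"


def demo():
    """Exercise all three outcomes against a temporary file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "IMG_0001.JPG")
        with open(path, "wb") as handle:
            handle.write(b"abc")
        good = hashlib.sha256(b"abc").hexdigest()
        return (
            verify_file(path, good),
            verify_file(path, "0" * 64),
            verify_file(path + ".gone"),
        )
```

Reading in chunks keeps memory use flat for large videos, at the cost of a full read of every file when checksums are requested.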

Import Existing Files

If you have files from a previous tool (Python icloudpd, manual download, etc.), use import-existing to populate the database:

kei import-existing --config ~/.config/kei/config.toml

This scans the download directory, matches files to iCloud assets by filename and size, and marks them as downloaded. The next sync will skip these files.
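
The filename-and-size match can be sketched as an index built from a directory walk. This illustrates the matching rule only: the function names and the (asset_id, filename, size) input shape are hypothetical, and any hashing or snapshotting that import-existing performs is left out.

```python
import os
import tempfile


def match_existing(download_dir, assets):
    """Map asset_id -> local path for assets whose (filename, size) is on disk.

    assets: iterable of (asset_id, filename, size_in_bytes) tuples.
    """
    index = {}
    for root, _dirs, files in os.walk(download_dir):
        for name in files:
            path = os.path.join(root, name)
            # Keep the first path seen for each (name, size) pair.
            index.setdefault((name, os.path.getsize(path)), path)
    return {
        asset_id: index[(filename, size)]
        for asset_id, filename, size in assets
        if (filename, size) in index
    }


def demo():
    """Return the sorted ids matched against one 5-byte file in a temp tree."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = os.path.join(tmp, "2026", "06")
        os.makedirs(folder)
        with open(os.path.join(folder, "IMG_0001.JPG"), "wb") as handle:
            handle.write(b"12345")
        assets = [
            ("A1", "IMG_0001.JPG", 5),  # name and size match
            ("A2", "IMG_0001.JPG", 9),  # same name, different size
            ("A3", "IMG_0002.JPG", 5),  # same size, different name
        ]
        return sorted(match_existing(tmp, assets))
```

Matching on both fields avoids adopting a same-named file whose contents clearly differ, although two different files with equal name and size would still collide in this simplified version.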

Reset State

To start fresh and re-download everything:

kei reset state --yes

This deletes the database file. The next sync will treat all assets as new.

Database Schema

The database uses SQLite. Current schema version is v12. Migrations apply automatically when upgrading kei versions; each step runs inside a SAVEPOINT so a failure rolls back only that step.
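
The per-step SAVEPOINT pattern can be sketched with Python's sqlite3 module. This is a sketch of the general technique, not kei's migration code: the `MIGRATIONS` list is a toy example, and tracking the version with PRAGMA user_version is an assumption, since the page does not say where kei records its schema version.

```python
import sqlite3

# Toy migration list: (target_version, [statements]). Hypothetical content.
MIGRATIONS = [
    (1, ["CREATE TABLE sync_runs (id INTEGER PRIMARY KEY, started_at TEXT)"]),
    (2, ["CREATE TABLE metadata (key TEXT PRIMARY KEY, value TEXT)"]),
]


def migrate(conn: sqlite3.Connection, migrations) -> int:
    """Apply pending steps, each inside its own SAVEPOINT.

    Returns the schema version reached. A failing step is rolled back and
    the error re-raised; earlier steps stay applied.
    """
    conn.isolation_level = None  # manage transactions explicitly
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statements in migrations:
        if version <= current:
            continue
        conn.execute("SAVEPOINT migration_step")
        try:
            for sql in statements:
                conn.execute(sql)
            # Version bump lives in the same savepoint as the schema change.
            conn.execute(f"PRAGMA user_version = {int(version)}")
        except sqlite3.Error:
            conn.execute("ROLLBACK TO migration_step")
            conn.execute("RELEASE migration_step")
            raise
        conn.execute("RELEASE migration_step")
        current = version
    return current
```

Because each step is released before the next begins, a failure in a later step leaves the database at the last version that applied cleanly.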

Tables:

  • assets - one row per iCloud asset version, keyed by (library, id, version_size)
  • sync_runs - one row per sync execution, including lifecycle status and enumeration error counts
  • metadata - key-value store for sync tokens, schema markers, and the shared-library notice latch
  • asset_albums - many-to-many between assets and album names, scoped by library
  • asset_people - face-recognition labels per asset, scoped by library
  • album_containers, album_membership_snapshots, asset_album_memberships - trusted album membership cache for album-aware incremental routing

Schema-version highlights:

| Version | Change |
|---|---|
| v1 | assets (with (id, version_size) PK) and sync_runs |
| v2 | metadata key-value table |
| v3 | assets.local_checksum column (locally-computed SHA-256) |
| v4 | assets.download_checksum column (pre-EXIF download hash) |
| v5 | Provider-agnostic metadata columns (source, is_favorite, rating, GPS, media_subtype, keywords, description, provider_data, metadata_hash, etc.) plus the asset_albums and asset_people tables. Sync tokens are invalidated on the first crossing so the metadata backfill repopulates without re-downloading files. |
| v6 | assets.metadata_write_failed_at so the metadata-only rewrite path can re-drive failed EXIF/XMP embeds on subsequent syncs. |
| v7 | sync_runs.status lifecycle column (running / complete / interrupted); existing rows are backfilled from the (completed_at, interrupted) pair. |
| v8 | Adds library to the assets primary key so shared-library assets cannot collide with primary-library assets. |
| v9 | Adds library to asset_albums and asset_people primary keys. |
| v10 | Adds sync_runs.enumeration_errors for hard enumeration failure reporting. |
| v11 | Adds assets.imported_size and assets.imported_mtime so repeated import-existing runs can skip SHA-256 reads when size and mtime are unchanged. |
| v12 | Adds album container and membership snapshot tables used by album-aware incremental routing. |

Related

Commands

Getting Started

Features
