Fixes the "10h silent ingest burns Opus quota" UX problem. Ingest is
now a background job with observable progress and cooperative
cancellation, and the worker pins its own model so the
interactive session's model never leaks into ingest LLM calls.
Architecture:
* SQLite-backed jobs queue (migration 0013)
* asyncio worker runs inside the MCP server process, claims jobs,
executes ingest_file / ingest_repo with progress + cancel checks
* `ALEXANDRIA_CLAUDE_MODEL` is pinned per job from config.jobs.model
(default `haiku`) so background ingest stays cheap regardless of which
model powers the interactive session
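
A minimal sketch of the claim, progress, and cancel loop described above. The table layout, the `claim_next` and `run_job` names, and passing the pinned model to each step through an `env` dict are all assumptions made for illustration; the real migration 0013 schema and worker code may differ.

```python
import asyncio
import os
import sqlite3

# Hypothetical schema, not the actual migration 0013.
SCHEMA = """
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'queued',
    done INTEGER NOT NULL DEFAULT 0,
    total INTEGER NOT NULL DEFAULT 0,
    cancel_requested INTEGER NOT NULL DEFAULT 0
)
"""


def claim_next(db):
    """Move the oldest queued job to 'running' and return its id, or None."""
    with db:  # commit on success, roll back on error
        row = db.execute(
            "SELECT id FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status guard makes the claim safe if another worker raced us.
        cur = db.execute(
            "UPDATE jobs SET status = 'running' WHERE id = ? AND status = 'queued'",
            (row[0],),
        )
        return row[0] if cur.rowcount == 1 else None


async def run_job(db, job_id, steps, model="haiku"):
    """Run async steps in order, recording progress and honouring cancel requests."""
    # Assumption: the pinned model reaches each LLM call via this env mapping.
    env = {**os.environ, "ALEXANDRIA_CLAUDE_MODEL": model}
    with db:
        db.execute("UPDATE jobs SET total = ? WHERE id = ?", (len(steps), job_id))
    for i, step in enumerate(steps, start=1):
        # Cooperative cancellation: check the flag between steps, never mid-step.
        flag = db.execute(
            "SELECT cancel_requested FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()[0]
        if flag:
            with db:
                db.execute("UPDATE jobs SET status = 'cancelled' WHERE id = ?", (job_id,))
            return "cancelled"
        await step(env)
        with db:
            db.execute("UPDATE jobs SET done = ? WHERE id = ?", (i, job_id))
    with db:
        db.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
    return "done"
```

Checking the flag only between steps means a cancel takes effect after the current step finishes, which is the usual trade-off of cooperative cancellation.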
MCP tool surface:
* ingest() now enqueues the job and waits up to wait_s (default 60s)
for completion, so short URL ingests still feel synchronous while long
repo ingests return a job handle
* scope="docs" opt-in restricts repo ingests to README + top-level
markdown + docs/** (turns a 40h vllm ingest into a 20min one)
* jobs_list, jobs_status, jobs_cancel tools expose the queue to agents
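
The wait_s and scope="docs" behaviour above can be sketched roughly as follows. Both helpers are hypothetical: `ingest_and_wait` models the job as an asyncio task rather than a queue row, the returned dict shape is invented, and `in_docs_scope` is one plausible reading of "README + top-level markdown + docs/**".

```python
import asyncio
from pathlib import PurePosixPath


async def ingest_and_wait(job, job_id, wait_s=60.0):
    """Wait up to wait_s for the job; on timeout return a handle and let it keep running."""
    try:
        # shield() keeps the timeout from cancelling the underlying job.
        result = await asyncio.wait_for(asyncio.shield(job), timeout=wait_s)
    except asyncio.TimeoutError:
        return {"status": "running", "job_id": job_id}
    return {"status": "done", "job_id": job_id, "result": result}


def in_docs_scope(rel_path):
    """True for a top-level README, top-level markdown, or anything under docs/."""
    path = PurePosixPath(rel_path)
    parts = path.parts
    if not parts:
        return False
    if len(parts) == 1:
        name = path.name.lower()
        return name.startswith("readme") or path.suffix.lower() == ".md"
    return parts[0] == "docs"
```

With wait_s=0 the call returns a handle immediately unless the job has already finished, which matches the batch usage recommended in SKILL.md.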
CLI:
* `alxia jobs list / status / tail / cancel`
Skill:
* SKILL.md teaches agents: use wait_s=0 for batches, poll jobs_status
instead of blocking, default scope="docs" for repos, prefer
cancellation over silent waiting
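
As an illustration of "poll instead of blocking, cancel instead of waiting silently", an agent-side loop might look like the sketch below. `poll_or_cancel` is a hypothetical helper; `jobs_status` and `jobs_cancel` are passed in as plain callables, and the `"status"` field and its values are assumed rather than taken from the real tool output.

```python
import time

TERMINAL = {"done", "failed", "cancelled"}  # assumed status values


def poll_or_cancel(job_id, jobs_status, jobs_cancel,
                   budget_s=600.0, interval_s=5.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches a terminal state; cancel it once the budget is spent."""
    deadline = clock() + budget_s
    while True:
        status = jobs_status(job_id)
        if status["status"] in TERMINAL:
            return status
        if clock() >= deadline:
            # Out of budget: request cancellation rather than keep waiting.
            jobs_cancel(job_id)
            return jobs_status(job_id)
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.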
Docs:
* docs/guides/jobs.md covering architecture, scope control, model
pinning, and operational gotchas
Tests:
* 9 jobs-queue tests (enqueue, claim, progress, cancel, list, JSON
round-trip)
* migration version bump in e2e test
Version: 0.37.0