Skip to content

v0.7.0 — Bulk Ingest

Choose a tag to compare

@panjieke panjieke released this 07 May 07:34
· 107 commits to main since this release

Bulk Ingest from Local Storage

Ingest dozens or hundreds of files at once from a local directory — no MCP wire overhead, no running server required.

New features

  • BulkIngestor class — scan source directory, SHA-256 content dedup via manifest, copy to raw/, batch compile with asyncio.Semaphore concurrency control, single reindex at end (cortex/compiler/bulk.py)
  • cortex-bulk-ingest CLI--source, --concurrency, --force, --dry-run (scripts/bulk_ingest.py)
  • One-click shell script./scripts/bulk_ingest.sh /path/to/papers loads .env, prints summary, runs the full pipeline
  • POST /api/v1/bulk-ingest REST endpoint — programmatic bulk ingest without MCP (cortex/api/routes.py)
  • Docker INGEST_DIR volume mount — mount a local directory as /ingest in the container

Deduplication

Files are matched by SHA-256 content hash (not filename) via .cortex/ingest-manifest.json. Same content under a different name is skipped. --force bypasses the check.

Tests

31 new tests covering scan, hash, manifest, copy, compile batch, CLI parsing, REST endpoint, and Docker compose validation. Full suite: 286 passed.