v0.1.0 — provenance-first document parser

harish-ai-engineer released this 03 Jul 16:30

First release. Install from PyPI:

pip install agentcontext-core            # zero-dependency core
pip install "agentcontext-core[pdf,docx]"

(The distribution is named agentcontext-core because the plain name agentcontext is blocked on PyPI by an unrelated project. The import name is unchanged: import agentcontext.)

Highlights

  • Provenance on every block — source, page, hierarchical section path, char span; unknown fields are explicit null, never omitted
  • PDF (text-layer), DOCX, HTML, Markdown, plain text → clean Markdown + lossless UDM JSON
  • --cite inline: Markdown with provenance anchors on every block
  • SDK: Document.parse(), .blocks, .tables, .to_json()
  • Benchmark harness + seed results vs MarkItDown (see BENCHMARKS.md); provenance is the differentiator: 100% vs 0%
  • 50-page PDF parses in 0.51s; zero hard dependencies
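
To make the "explicit null, never omitted" rule concrete, here is a minimal standalone sketch of what a provenance-carrying block could look like when serialized. It does not use the agentcontext SDK: the class names (Provenance, Block), the field names, and the block_to_json helper are hypothetical and only mirror the fields listed above (source, page, section path, char span). The real UDM JSON schema may differ.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional, Tuple


@dataclass
class Provenance:
    # Hypothetical field names; every field defaults to None so that
    # "unknown" is always representable as an explicit null.
    source: Optional[str] = None
    page: Optional[int] = None
    section_path: Optional[List[str]] = None
    char_span: Optional[Tuple[int, int]] = None


@dataclass
class Block:
    text: str
    provenance: Provenance


def block_to_json(block: Block) -> str:
    # asdict() keeps None-valued fields, and json.dumps() writes them as
    # null, so unknown fields stay present in the output instead of vanishing.
    return json.dumps(asdict(block), sort_keys=True)


# A Markdown source has no page concept, so `page` is left unknown.
md_block = Block(
    text="Install from PyPI.",
    provenance=Provenance(
        source="README.md",
        section_path=["Getting started", "Install"],
        char_span=(120, 138),
    ),
)

encoded = block_to_json(md_block)
decoded = json.loads(encoded)
```

In this sketch a consumer can always read decoded["provenance"]["page"] without checking whether the key exists; it only has to handle a null value.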

Full pipeline preview (chunking → embeddings → retrieval → context packages) lives on the platform branch.