This repository was archived by the owner on May 21, 2026. It is now read-only.

v0.3.0

@lgruen-vcgs released this 18 Mar 23:11
· 12 commits to main since this release

Breaking changes

Replace full-document alignment with quote-only resolution.

New API

  • DocumentIndex(pdf_bytes) — extract per-character bounding boxes from a PDF (one-time cost)
  • doc.resolve(quotes) — resolve verbatim quotes to bounding boxes via Smith-Waterman alignment
  • from groundmark.convert import convert, Config — PDF-to-Markdown conversion via LLM (requires optional [bedrock]/[anthropic]/etc. extra)
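
For readers unfamiliar with the alignment step, here is a minimal pure-Python sketch of Smith-Waterman local alignment used to locate a quote inside extracted text. It is an illustration only, not groundmark's implementation (which delegates to seq-smith); the function name `locate_quote` and the scoring values are assumptions chosen for the example. The returned character span is the kind of offset range that could then be looked up against per-character bounding boxes.

```python
def locate_quote(text, quote, match=2, mismatch=-1, gap=-1):
    """Find the best local alignment of `quote` inside `text`.

    Returns (start, end, score), where text[start:end] is the aligned span.
    Hypothetical helper for illustration; scoring values are arbitrary.
    """
    rows, cols = len(quote) + 1, len(text) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; local alignment clamps every cell at zero.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if quote[i - 1] == text[j - 1] else mismatch
            cell = max(
                0,
                score[i - 1][j - 1] + sub,  # match or mismatch
                score[i - 1][j] + gap,      # quote character skipped
                score[i][j - 1] + gap,      # text character skipped
            )
            score[i][j] = cell
            if cell > best:
                best, best_pos = cell, (i, j)

    # Trace back from the best cell until the alignment score drops to zero.
    i, j = best_pos
    end = j
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if quote[i - 1] == text[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return j, end, best


text = "The quick brown fox jumps over the lazy dog."
start, end, _ = locate_quote(text, "brown f0x jumps")  # quote with one typo
print(text[start:end])
```

Because the alignment is local and tolerant of mismatches, a quote with a small transcription error still resolves to the intended span, which is the property that makes this approach suitable for quotes produced by an LLM.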

Removed

  • process() function and ProcessResult — replaced by convert() and DocumentIndex.resolve()
  • strip(), annotate(), resolve() re-exports from anchorite
  • PdfplumberAnchorProvider — pdfplumber dependency removed
  • Visualize module
  • anchorite dependency — alignment now uses seq-smith directly

Dependencies restructured

Core dependencies (seq-smith, pypdfium2) are always installed. LLM providers are optional extras:

uv add groundmark                          # resolve only
uv add groundmark --extra bedrock          # + Bedrock conversion
uv add groundmark --extra anthropic,bedrock # multiple providers