loganrooks/agentic-ocr

agentic-ocr

Pipeline, eval/checkers, experiments, and execution runner for an agentic OCR + semantic-segmentation system targeting humanities/philosophy libraries.

This is the system-under-test repo of a three-repo topology (see PLAN.md §11.1); it pins the schema and corpus-generator repos and records their pins on every eval result.
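
As a rough illustration of what "records their pins on every eval result" could look like, here is a minimal sketch. It assumes pins are commit identifiers and that an eval result is a plain dict. The names `RepoPins`, `stamp_result`, and the field names are hypothetical and not taken from this repo; PLAN.md §11.1 is the authority on the actual mechanism.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RepoPins:
    """Hypothetical container for the two pinned sibling repos."""

    schema_commit: str
    corpus_generator_commit: str


def stamp_result(result: dict, pins: RepoPins) -> dict:
    """Return a copy of `result` with the pins attached under a 'pins' key.

    The input dict is left unmodified so callers can keep the raw result.
    """
    stamped = dict(result)
    stamped["pins"] = asdict(pins)
    return stamped


# Placeholder values only; real pins would come from the repo's pin records.
pins = RepoPins(schema_commit="<schema-sha>", corpus_generator_commit="<corpus-sha>")
raw = {"checker": "example", "score": 0.0}
record = stamp_result(raw, pins)
print(json.dumps(record, sort_keys=True))
```

Stamping a copy rather than mutating the input keeps the raw checker output separate from provenance metadata, which makes it easier to compare results produced under different pins.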

This README holds pointers only — no claims that can go stale (PLAN §11.2).

Document                 Authority
PLAN.md                  Strategy. Edited only at phase gates.
STATE.md                 What is true now. Read this first.
ledger.md                Append-only predict→verdict log.
experiments/             Pre-registrations + results, immutable once verdict-labeled.
docs/prior-findings.md   Distilled empirical record carried from scholardoc.
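
The append-only discipline described for ledger.md can be sketched as follows. This is an assumption-laden illustration, not the repo's tooling: the `append_entry` helper and the one-line entry format are hypothetical. The only point it demonstrates is that opening a file in append mode adds new entries without touching earlier ones.

```python
import tempfile
from pathlib import Path


def append_entry(ledger: Path, experiment: str, prediction: str, verdict: str) -> str:
    """Append one predict→verdict line to `ledger` and return the line written.

    Mode "a" only ever writes at the end of the file, so earlier entries
    are never rewritten by this function.
    """
    line = f"- {experiment}: predicted {prediction!r} -> verdict {verdict!r}\n"
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(line)
    return line


if __name__ == "__main__":
    # Demonstrate against a throwaway file, not the real ledger.md.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "ledger.md"
        append_entry(demo, "E1", "placeholder prediction", "placeholder verdict")
        append_entry(demo, "E2", "placeholder prediction", "placeholder verdict")
        print(demo.read_text(encoding="utf-8"), end="")
```

Append mode guards against accidental overwrites by this helper only; it does not stop another tool or editor from rewriting the file, so review or tooling outside this sketch would still be needed to enforce immutability.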

Layout

  • eval/ — checker suite + eval/lib/ scoring core (ported from scholardoc).
  • eval/fixtures/ — JSON eval fixtures (no PDFs ever; PLAN §11.5).
  • tests/ — pytest suite for eval/.
  • experiments/E1…E7/ — one pre-registered experiment each (PLAN §9).
  • runner/ — SSH-over-Tailscale + rsync execution skeleton (PLAN §7.1).
  • docs/ — prior findings + ADRs.

Dev loop

uv sync
uv run pytest
uv run ruff check .
uv run mypy

Phase 0 status (apparatus, no models yet) is tracked in STATE.md.
