Releases: parsehawk/parsehawk
v0.1.1
What's Changed
- feat(web): surface copyable resource IDs in the UI by @simx11 in #4
- Tune vLLM runtime defaults by model profile by @francisrafal in #11
- fix(test): fail e2e runtime tests instead of skipping them by @benemanu in #16
Full Changelog: v0.1.0...v0.1.1
v0.1.0
Initial public release of ParseHawk.
ParseHawk is an open-source document extraction tool for turning messy PDFs and images into structured JSON. It includes a local-first CLI, REST API, Web UI, schema builder, and a bundled local inference runtime powered by NuExtract3 through vLLM.
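To make "structured JSON" concrete, here is a minimal sketch of checking an extracted record against a schema. It does not use ParseHawk's API or its schema format: the `RECEIPT_SCHEMA` field names, the `validate` helper, and the sample record are all hypothetical and only illustrate the general idea.

```python
import json

# Hypothetical receipt schema: field name -> expected Python type.
# These field names are illustrative and not taken from ParseHawk.
RECEIPT_SCHEMA = {
    "merchant": str,
    "date": str,
    "total": float,
    "items": list,
}


def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    # Check that every schema field is present and has the expected type.
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    # Flag fields the schema does not know about.
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


# Made-up extraction output, standing in for what a model might return.
extracted = json.loads(
    '{"merchant": "Corner Cafe", "date": "2024-05-01", "total": 12.5,'
    ' "items": [{"name": "Latte", "price": 4.5}]}'
)
print(validate(extracted, RECEIPT_SCHEMA))
```

A real schema validator would also handle nested objects, optional fields, and numeric coercion; this sketch checks top-level fields only.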
What’s included
- CLI for starting, stopping, restarting, checking status, and running one-shot extractions
- REST API for files, extractors, schemas, and extraction jobs
- Web UI for uploading documents, defining schemas, and inspecting extraction results
- Local SQLite-backed persistence
- Local file storage
- Built-in schema validation
- Docker-based app runtime for API, worker, and Web UI
- Local model runtime support:
  - vLLM Metal on Apple Silicon
  - vLLM Docker runtime on Linux x86_64 with NVIDIA GPUs
- Default extraction model: numind/NuExtract3-W4A16
- Example receipt extraction workflow
- Apache-2.0 license