Releases: parsehawk/parsehawk

v0.1.1

25 Jun 14:38

What's Changed

  • feat(web): surface copyable resource IDs in the UI by @simx11 in #4
  • Tune vLLM runtime defaults by model profile by @francisrafal in #11
  • fix(test): fail e2e runtime tests instead of skipping them by @benemanu in #16

Full Changelog: v0.1.0...v0.1.1

v0.1.0

25 Jun 09:48

Initial public release of ParseHawk.

ParseHawk is an open-source document extraction tool for turning messy PDFs and images into structured JSON. It includes a local-first CLI, REST API, Web UI, schema builder, and a bundled local inference runtime powered by NuExtract3 through vLLM.

What’s included

  • CLI for starting, stopping, and restarting the app, checking its status, and running one-shot extractions
  • REST API for files, extractors, schemas, and extraction jobs
  • Web UI for uploading documents, defining schemas, and inspecting extraction results
  • Local SQLite-backed persistence
  • Local file storage
  • Built-in schema validation
  • Docker-based app runtime for API, worker, and Web UI
  • Local model runtime support:
    • vLLM Metal on Apple Silicon
    • vLLM Docker runtime on Linux x86_64 with NVIDIA GPUs
  • Default extraction model: numind/NuExtract3-W4A16
  • Example receipt extraction workflow
  • Apache-2.0 license
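
To illustrate what "structured JSON" plus schema validation can look like for the receipt workflow listed above, here is a minimal standalone sketch. It does not use ParseHawk's API or schema format, which these notes do not describe; the `RECEIPT_SCHEMA` fields, the `validate` helper, and the sample receipt data are all hypothetical and exist only for illustration.

```python
import json

# Hypothetical receipt schema (an assumption, not ParseHawk's format):
# field name -> expected Python type after JSON decoding.
RECEIPT_SCHEMA = {
    "merchant": str,
    "date": str,
    "total": float,
    "items": list,
}


def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    errors = []
    # Every schema field must be present with the expected type.
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    # Fields that the schema does not declare are reported as well.
    for field in record:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors


# Made-up extraction result standing in for the output of an extraction job.
extracted = json.loads(
    '{"merchant": "Corner Cafe", "date": "2024-01-15", '
    '"total": 12.5, "items": ["coffee", "bagel"]}'
)

print(validate(extracted, RECEIPT_SCHEMA))  # → []
```

Returning a list of problems rather than raising on the first one makes it easier to show every mismatch for a document at once, which is convenient when inspecting extraction results.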