Releases: promptrails/parserails

v0.1.0

18 Jun 11:27

First public release of ParseRails — fast, light, cgo-free document parsing for Go.

ParseRails is the Go counterpart to run-llama/liteparse: it wraps Google's PDFium (the engine behind Chrome's PDF viewer), compiled to WebAssembly and run through wazero, so the default build needs no cgo and no system libraries.

Highlights

  • Spatial text extraction — every word with its bounding box, page, and (opt-in) font size. Available at word and line granularity.
  • Page rendering — RenderPage rasterizes any page to an image.Image.
  • Pluggable OCR with automatic fallback for scanned/image-only pages — bundled cgo-free Tesseract adapter (ocr/tesseract) + HTTP adapter (ocr/httpocr).
  • Office formats — DOCX/PPTX/XLSX/ODT/RTF via headless LibreOffice.
  • ExtractText — whole-page text fast path for RAG/search (no boxes), several times cheaper than full parsing.
  • Files & batches — ParseFile (PDF or office) and concurrent ParseFiles.
  • CLI — go install github.com/promptrails/parserails/cmd/parserails@latest (parse / render).
  • Two backends — cgo-free WASM by default, or native -tags parserails_cgo (links libpdfium) for max throughput on controlled hosts.

Docs & examples

  • Docs: https://promptrails.github.io/parserails/
  • Runnable examples (each its own module + Dockerfile): examples/extract-text, examples/parse-server, examples/native-cgo
  • Benchmarks vs ledongthuc/pdf, pdfcpu, unipdf under benchmark/

go get github.com/promptrails/parserails@v0.1.0