Release v0.1.0 · promptrails/parserails

First public release of ParseRails — fast, light, cgo-free document parsing for Go.

ParseRails is the Go counterpart to run-llama/liteparse. It wraps Google's PDFium (the engine behind Chrome's PDF viewer), compiled to WebAssembly and run via wazero, so the default build needs no cgo and no system libraries.

Highlights

Spatial text extraction: every word with its bounding box, page, and (opt-in) font size, at word or line granularity.
Page rendering: RenderPage rasterizes any page to an image.Image.
Pluggable OCR with automatic fallback for scanned/image-only pages: a bundled cgo-free Tesseract adapter (ocr/tesseract) and an HTTP adapter (ocr/httpocr).
Office formats: DOCX/PPTX/XLSX/ODT/RTF via headless LibreOffice.
ExtractText: a whole-page text fast path for RAG/search (no boxes), several times cheaper than full parsing.
Files & batches: ParseFile (PDF or office) and concurrent ParseFiles.
CLI: go install github.com/promptrails/parserails/cmd/parserails@latest (parse / render).
Two backends: cgo-free WASM by default, or native with -tags parserails_cgo (links libpdfium) for maximum throughput on controlled hosts.

Docs & examples

Docs: https://promptrails.github.io/parserails/
Runnable examples (each its own module + Dockerfile): examples/extract-text, examples/parse-server, examples/native-cgo
Benchmarks vs ledongthuc/pdf, pdfcpu, unipdf under benchmark/

go get github.com/promptrails/parserails@v0.1.0
