GlyphPDF v1.0.0 — First Public Release
GlyphPDF v1.0.0 — First Public Release
GlyphPDF is an open-source, privacy-first PDF workstation for Windows. No telemetry. No cloud
upload. No subscription. Your documents stay on your machine.
Licensed under Apache-2.0.
What's in v1.0.0
Core PDF Engine
- Three-backend architecture: PoDoFo for editing/annotations/forms/encryption,
PDFium for high-quality rendering (3-tier LRU tile cache, 256 MB), qpdf for
linearization and repair-on-open. - Incremental save preserves digital signatures.
- Corruption recovery with graceful repair-on-open warnings.
Security & Redaction
- Edact-Ray defense: glyph-advance normalization closes the side-channel attack
(Bland et al., PETS 2023 / arXiv 2206.02285) — redacted text cannot be reconstructed
from cursor-position gaps. - Invisible OCR text scrub: Tr==3 invisible text within redaction rectangles is
excised from the content stream, not merely painted over. - PAdES B-LT/B-LTA digital signatures with real DSS/VRI embedding, trust-chain
verification against the Windows system root store, OCSP response verification, and
ByteRange overlap detection (PDF shadow attack prevention). - 15+ sanitization vectors: metadata, JavaScript, embedded files, and more.
- Content-literal injection fixed:
pdfEscapeLiteralStringhardens watermark,
annotation, header/footer, and Bates numbering against injection.
OCR — Workstream 1
- PP-DocLayoutV2 layout detector (ONNX, Apache-2.0): 11-region-type detection
(Title/Paragraph/Table/Figure/List/Header/Footer/Equation/Reference/Caption/Other). - Tesseract 5.5.2 + RapidOCR PP-OCRv5 dual-engine fanout via LaneScheduler.
- ROVER word-level fusion for multi-engine consensus — the default OCR engine
is the ROVER ensemble (Tesseract primary + RapidOCR secondary, IoU-matched,
confidence-weighted), with automatic fallback to Tesseract-only when the
bundled PP-OCRv5 models are unavailable. - OCR result overlay in the viewer, with re-OCR on demand from the Edit ribbon.
- All models bundled — the installer ships the PP-OCRv5 and layout ONNX
weights plus Tesseract language data, so OCR works offline with no downloads. - Cross-page pipeline: layout ‖ OCR ‖ fusion overlap via
CrossPagePipeline.
Djot Document Interchange — Workstream 2
- Dual-model architecture: Structural PDF object graph ↔ Semantic
SemanticDocument. - OCR→Djot mapping (
OcrDjotMapper): reading-order layout regions → block structure,
per-word provenance encoding for MRC sandwich-text alignment. - Annotation rich text in Djot with /RC XHTML + /Contents plain text + /PieceInfo
Djot sidecar — perfect GlyphPDF roundtrip plus Acrobat/Foxit interoperability. - Comment threading:
/IRTin-reply-to linking, review-state roundtrip, depth-indent
view with date/status/author filters. - ProvenanceGuard refuses Djot save-back to signed born-PDF documents.
- Vendored Lua 5.4 (MIT) + Djot reference parser for spec-conformant parsing.
MRC Layered Compression — Workstream 3
- 30× compression ratio measured on synthetic A4 scanned page.
- JBIG2 lossless foreground (agl/jbig2enc, Apache-2.0) — pattern-matching mode
disabled (Xerox 2013 / German BSI ban). - JPEG2000 background at configurable ratio (Lossless 10:1 / Balanced 30:1 /
Aggressive 50:1) via OpenJPEG 2.5.4 (BSD-2-Clause). - Invisible sandwich text from WS1 OCR word boxes — output is searchable PDF/A-2b.
- veraPDF validation gate before finalization.
AI Assistant
- Local AI via Ollama (llama3 default): real async HTTPS calls to a locally running Ollama
instance — no data leaves your machine. - API key storage via Windows Credential Manager; "Test connection" verifies reachability before saving.
- Cloud providers (Anthropic, OpenAI, Google Gemini) are planned for a future release.
Office & Image Import
- Office→PDF via LibreOffice subprocess (docx/xlsx/pptx/odt/ods/odp/rtf/csv/txt).
- Images→PDF via PoDoFo PdfImage XObject embedding (PNG/JPEG/TIFF).
Modes & Tools
- 7 mode tabs: Home, View, Edit, Pages, Convert, Forms, Security
- Pages mode: split (by-page / every-N / range expression), reorder with drag-drop.
- Batch mode: 7 operation types, per-file progress, continue-on-failure, CSV/JSON
error log export. - Form Builder: 7 field types (Text/Checkbox/Radio/Dropdown/ListBox/Date/Numeric)
with tab-order editor and undo/redo. Signature widget and pushbutton fields are planned
for a future release. - Compare mode: Myers LCS diff with move detection (green/red/orange), NEXT/PREV
navigation. - Signature mode: PAdES B-LT/B-LTA signing + validation with trust indicators.
- PDF/A mode: veraPDF subprocess validation, conformance levels, export.
- Compress mode: MRC Off/Lossless/Balanced/Aggressive selector + live size estimate.
Branding & UI
- App icon, splash screen, and system-tray icon (Show/Quit).
- Dark, Light, and High Contrast themes.
- WCAG-aligned accessibility: screen reader names, F6 region cycling, F1 help dialog.
- Autosave every 5 minutes (configurable 1-30 min); crash recovery on next launch.
Localization
GlyphPDF v1.0 ships in English. Arabic, French, and German translations are planned for a future release.
Translation framework wired for Arabic (RTL), French, German (1394 source strings each; 0 translated in v1.0). Human-translated packages are pending commissioning for a future release.
Installation
System requirements: Windows 10/11 x64, ~500 MB disk space.
Option A — Installer (recommended)
- Download
GlyphPDF-1.0.0-x64.msibelow. - Verify the SHA-256 checksum (see SHA-256 Verification below).
- Run the MSI installer.
- Launch GlyphPDF from the Start Menu or by double-clicking any
.pdffile.
Option B — Portable (no installation)
- Download
GlyphPDF-1.0.0-x64-portable.zipbelow. - Verify the SHA-256 checksum.
- Unzip to any folder (e.g.
C:\GlyphPDFor a USB drive). - Run
GlyphPDF.exedirectly from that folder.
The portable edition is fully self-contained — all runtimes, models, and language data are included. Nothing is written outside the unzipped folder.
Optional dependencies (not bundled):
- LibreOffice (for Office→PDF import)
- veraPDF CLI (for PDF/A live validation in the PDF/A panel)
- Ollama (for local AI assistant without an API key)
SHA-256 Verification
SHA-256: 79286BA3F8756B166C7ED4F8D1F2342CE0AE41EA9018C73A26E16BE010AF827B
File: GlyphPDF-1.0.0-x64.msi
SHA-256: 530D24541824362CCD2EAFD55F1BB15ABABE461EB1662E32D80B1CDEB6E17BB7
File: GlyphPDF-1.0.0-x64-portable.zip
PowerShell verification:
(Get-FileHash "GlyphPDF-1.0.0-x64.msi" -Algorithm SHA256).Hash
# Compare with the hash above — must match exactlyCommand prompt:
certutil -hashfile GlyphPDF-1.0.0-x64.msi SHA256Build From Source
# MSYS2 ucrt64 toolchain (GCC 16.1.0, Qt 6.11.0)
pacman -S mingw-w64-ucrt-x86_64-qt6-base mingw-w64-ucrt-x86_64-podofo \
mingw-w64-ucrt-x86_64-tesseract-ocr mingw-w64-ucrt-x86_64-openjpeg2
mkdir build && cd build
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release
cmake --build . --parallel 8
QT_QPA_PLATFORM=offscreen ctest --output-on-failure -j4Full package list in README.md → "Build Instructions".
Acknowledgements
GlyphPDF would not exist without these open-source projects:
| Library | License | Use |
|---|---|---|
| Qt 6.11 | LGPL-3.0 | Application framework |
| PoDoFo 1.1.0 | LGPL-2.0 | PDF editing + annotations |
| PDFium | BSD-3-Clause | PDF rendering |
| qpdf 12.3.2 | Apache-2.0 | PDF linearization + repair |
| Tesseract 5.5.2 | Apache-2.0 | OCR engine |
| OpenJPEG 2.5.4 | BSD-2-Clause | JPEG2000 encoding |
| agl/jbig2enc | Apache-2.0 | JBIG2 lossless compression |
| Lua 5.4 | MIT | Djot reference parser host |
| ONNX Runtime | MIT | ML inference (layout + OCR) |
| OpenSSL | Apache-2.0 | Cryptographic operations |
Full dependency audit: docs/MRC-ENCODER-LICENSE-AUDIT.md
Contributions welcome — see CONTRIBUTING.md for the contribution guide and SECURITY.md for responsible disclosure.