v0.1.2 - Initial Release

@krockxz released this 17 Jun 23:48
· 2 commits to main since this release

First public release of pdf2docx-healer.

What's included:

  • Bullet/numbered list detection (Unicode + ASCII) with Word styles and OOXML numbering injection
  • Hyperlink injection for http, https, www., and mailto URLs, including runs that contain more than one URL
  • CJK font fallback mapping (SimSun, MS Gothic, Malgun Gothic, etc.)
  • OCR pipeline via PyMuPDF + Tesseract with graceful fallback
  • CLI: pdf2docx-heal with --ocr, --no-lists, --no-hyperlinks, --no-font-fix, --aggressive, --quiet flags
  • GitHub Actions trusted publishing workflow to PyPI
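
To give a feel for the list-detection and hyperlink items above, here is a minimal Python sketch of the kind of text matching they involve. It is an illustration only and is not taken from the pdf2docx-healer source: the regular expressions, the set of bullet characters, and the function names `classify_line` and `find_urls` are all assumptions made for this example.

```python
import re

# Hypothetical patterns for illustration; not pdf2docx-healer's actual rules.
# Unicode bullets (•, ◦, ▪, ‣) plus the ASCII markers "-" and "*".
BULLET_RE = re.compile(r"^\s*([\u2022\u25E6\u25AA\u2023\-\*])\s+(.*)$")
# "1." / "2)" / "a." style markers. Heuristic: "A. Smith" would also match.
NUMBERED_RE = re.compile(r"^\s*(\d+|[A-Za-z])[.)]\s+(.*)$")
# http://, https://, www. and mailto: prefixes, up to the next whitespace.
URL_RE = re.compile(r"(?:https?://|www\.|mailto:)[^\s<>\"']+", re.IGNORECASE)


def classify_line(line):
    """Return (kind, text) where kind is "bullet", "numbered" or "plain"."""
    m = BULLET_RE.match(line)
    if m:
        return "bullet", m.group(2)
    m = NUMBERED_RE.match(line)
    if m:
        return "numbered", m.group(2)
    return "plain", line.strip()


def find_urls(text):
    """Return (start, end, url) for every URL-like span in one text run."""
    spans = []
    for m in URL_RE.finditer(text):
        # Drop trailing sentence punctuation that is rarely part of the URL.
        url = m.group(0).rstrip(".,;:!?)")
        spans.append((m.start(), m.start() + len(url), url))
    return spans


if __name__ == "__main__":
    print(classify_line("• First item"))
    print(classify_line("1. Step one"))
    print(find_urls("See https://example.com, www.example.org and mailto:a@b.co."))
```

Returning character offsets from `find_urls` rather than just the matched strings is a deliberate choice in this sketch: a run holding several URLs can then be split at those offsets so each URL becomes its own hyperlink.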