v1.1.0 — Native-resolution OCR-driven fax pipeline
A fax-only relaunch of the skill, rebuilt around a native-resolution, OCR-driven 1-bit pipeline. The size-optimization mode from v1.0 is gone (delegated to a separate companion skill); everything in this release is about getting a PDF onto a real fax line so the receiver can actually read it.
Highlights
- Rebrand to fax-only. The skill folder is now pdf-fax-optimizer/ (was pdf-optimizer/) and the release asset is pdf-fax-optimizer.zip. If you have v1.0 installed, remove it before installing v1.1.
- Native-resolution OCR-driven bilevel pipeline. Pages render at their actual DPI (capped at 300 PPI), rapidocr-onnxruntime identifies text both outside and inside images, and a #808080 luma rule paints recovered glyphs black on light backgrounds and white on dark ones, so text reads on either polarity.
- 17 halftone screens via a SCREENS registry: clustered AM dots, square / diamond / ellipse screens, ordered Bayer, blue-noise, green-noise, Floyd-Steinberg, Atkinson, Jarvis-Judice-Ninke, Stucki, Sierra, EDD (edge-enhancing), line, crosshatch, mezzotint, plus the "none" reference. floyd, jarvis, and edd are highlighted as optimal picks for forms-and-photo pages.
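The #808080 polarity rule mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions, not the skill's actual code: the function names, the Rec. 601 luma weights, and the choice to treat a background sitting exactly at mid-grey as "light" are all hypothetical.

```python
# Hypothetical sketch of a mid-grey (#808080) polarity rule.
# Function names and the Rec. 601 weights are assumptions, not the skill's API.

MID_GREY = 0x80  # 128, the luma of #808080


def luma(rgb):
    """Integer luma (0-255) from 8-bit RGB using Rec. 601 weights."""
    r, g, b = rgb
    return (299 * r + 587 * g + 114 * b) // 1000


def glyph_colour_for(background_rgb):
    """Return 0 (black glyph) on light backgrounds, 255 (white glyph) on dark ones."""
    return 0 if luma(background_rgb) >= MID_GREY else 255


print(glyph_colour_for((255, 255, 255)))  # white paper -> 0
print(glyph_colour_for((0, 0, 128)))      # navy sign   -> 255
```

Integer arithmetic keeps the comparison at the threshold exact, which avoids floating-point surprises for backgrounds that sit right on mid-grey.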
New features
- preserve_text (default on). Detects small saturated-colour fills (slide highlight chips, dashboard badges, colored table cells, callout boxes, banners) carrying dark text, and lifts the fill to white before binarization so the dark text survives the 1-bit channel as crisp black-on-white. The threshold is background-relative, so dark-luma chips (navy, deep blue, forest green) rescue correctly instead of getting knocked out to solid black.
- recover_text (opt-in, --recover-text on). OCRs text baked into halftoned image regions (signs, billboards, captions on photos) and recolours each glyph to pure black or pure white per the #808080 polarity rule, layered above the halftoned photo so the photograph itself stays intact.
- --basic mode. Minimal grayscale → Otsu → CCITT G4 fallback for when you want a predictable, compact baseline with no opinion.
- --fax-heavy mode. Biases the auto-picker toward a clustered-dot screen that compresses tighter and survives noisy phone lines at the cost of photographic detail.
- --compare-page N. Renders one page through a curated 6-up subset of the screens (clustered / green-noise / blue-noise / atkinson / floyd / line) into one labelled contact sheet, each panel annotated with its real G4 size and transmission estimate, with the auto-recommended pick highlighted. Add --compare-original to lead with original-color + true-grayscale references.
- --sample N. Emits a 4-panel diagnostic (original / grayscale / standard fax baseline / optimized output) so you can confirm legibility before transmission.
- --recover-text-preview N. Side-by-side PNG of a page faxed with vs without the within-image text recolour, for diffing the feature.
- Office + image input. .doc/.docx/.rtf/.odt/.txt, .ppt/.pptx/.odp, .xls/.xlsx/.ods/.csv, and .png/.jpg/.tif/.bmp/.gif/.webp are all normalized to PDF first via to_pdf.py (LibreOffice headless when available, Pillow for images).
- Cloud-fax sending. send_fax.py provides a Phaxio integration so the optimized PDF can be transmitted in the same workflow.
- References docs. Three new long-form docs ship inside the skill: config-schema.md (every JSON-config key documented), fax-optimization.md (the why and how of every pass), sending.md (provider setup + retry semantics).
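For readers unfamiliar with the Otsu step in the --basic fallback, here is a minimal pure-Python sketch of the idea: pick the grey level that best separates dark pixels from light ones, then threshold. The function names are hypothetical and the flat list of 8-bit grey values is a simplification; the skill itself may well rely on a library routine instead, and the CCITT G4 encoding stage is not shown.

```python
# Hypothetical sketch of Otsu thresholding on a flat list of 8-bit grey values.
# Names are illustrative only; this is not the skill's implementation.


def otsu_threshold(pixels):
    """Return the grey level t that maximises between-class variance.

    Pixels <= t form the dark class, pixels > t the light class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels in the dark class so far
    sum_dark = 0  # sum of grey levels in the dark class so far
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map grey values to 1-bit: 0 = black ink, 1 = white paper."""
    return [0 if p <= threshold else 1 for p in pixels]


page = [10] * 50 + [200] * 50  # toy "page": dark text pixels and light paper
bits = binarize(page, otsu_threshold(page))
```

Because the threshold is derived from the page's own histogram, the same code adapts to light scans and dark scans without a tuning knob, which fits the "no opinion" framing of the basic mode.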
Performance and correctness
- Per-page processing for multi-page docs: each page is rendered, dithered, and encoded independently, then appended to the output, keeping peak memory bounded.
- DPI cap at 300 PPI for the bilevel raster, regardless of --fax-resolution. The receiving fax modem won't go higher anyway, and the cap keeps large docs from spinning.
- OCR is gated behind --recover-text on so the default conversion path doesn't pay the rapidocr cost when the feature isn't requested.
- Vectorised despeckle (despeckle_bw): a few hundred milliseconds saved per page on the cleanup pass.
- halftone_exclude mask: OCR-identified text background regions are excluded from the halftone screen, so glyphs sit on flat white instead of a tone band that the binarizer would have to fight.
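The per-page processing and the 300 PPI cap described in this list can be illustrated with a small sketch. Everything here is hypothetical: render_dpi, process_pages, and the process_one callback are illustrative names, and the real pipeline's rendering and encoding steps are stood in for by a plain function argument.

```python
# Hypothetical sketch: bounded-memory page loop plus a resolution cap.
# None of these names come from the skill itself.

MAX_FAX_PPI = 300  # the cap stated in these release notes


def render_dpi(native_dpi, cap=MAX_FAX_PPI):
    """Render at the page's own resolution, but never above the cap."""
    return min(native_dpi, cap)


def process_pages(pages, process_one):
    """Yield finished pages one at a time.

    Because this is a generator, a page is only processed when the caller
    asks for it, so at most one page's raster needs to be alive at once.
    """
    for page in pages:
        yield process_one(page)


# Toy usage: "pages" are native DPIs and processing just applies the cap.
for dpi in process_pages([150, 300, 600], render_dpi):
    print(dpi)
```

A caller that appends each yielded page to the output file before requesting the next one keeps peak memory roughly proportional to a single page rather than to the whole document.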
Breaking changes (vs v1.0)
- Skill folder renamed: pdf-optimizer/ → pdf-fax-optimizer/. Update your ~/.claude/skills/ or ~/.codex/skills/ install accordingly.
- Size-optimization mode removed. Use a dedicated size-shrinking skill for that workflow.
- Earlier feature names renamed before this tag: flatten_color_highlights → preserve_text, robust_text → recover_text. The CLI flags are --preserve-text / --no-preserve-text and --recover-text {auto,on,off}.
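If you have scripts or saved settings that still use the earlier names, a small user-side helper along these lines may ease the move. It is a sketch under assumptions: that the old names appear as keys in a dict-like config, that a default of "off" suits --recover-text (the notes only say the feature is opt-in), and that the program name shown is right. The parser mirrors the flag shapes listed above using standard argparse features; it is not the skill's own parser.

```python
import argparse

# Earlier name -> current name, as listed in the breaking changes above.
RENAMED_KEYS = {
    "flatten_color_highlights": "preserve_text",
    "robust_text": "recover_text",
}


def migrate_config(config):
    """Return a copy of config with earlier keys renamed; current keys win on conflict."""
    migrated = {k: v for k, v in config.items() if k not in RENAMED_KEYS}
    for old, new in RENAMED_KEYS.items():
        if old in config and new not in migrated:
            migrated[new] = config[old]
    return migrated


def build_parser():
    """Hypothetical parser reproducing the documented flag shapes."""
    parser = argparse.ArgumentParser(prog="pdf-fax-optimizer")  # prog name assumed
    # BooleanOptionalAction (Python 3.9+) creates both --preserve-text
    # and --no-preserve-text from a single declaration.
    parser.add_argument(
        "--preserve-text", action=argparse.BooleanOptionalAction, default=True
    )
    # The "off" default is an assumption, not taken from the release notes.
    parser.add_argument(
        "--recover-text", choices=["auto", "on", "off"], default="off"
    )
    return parser


args = build_parser().parse_args(["--no-preserve-text", "--recover-text", "on"])
print(args.preserve_text, args.recover_text)  # False on
```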
Install
Download pdf-fax-optimizer.zip from this release, then install it for your client:
- Claude.ai: upload via Settings → Capabilities → Skills.
- Claude Code: unzip into ~/.claude/skills/.
- Codex: unzip into ~/.codex/skills/.
Runtime requirements: Python 3.10+, qpdf, plus the Python deps in requirements.txt (PyMuPDF, Pillow, opencv-python, numpy, img2pdf, rapidocr-onnxruntime, optional python-docx etc. for Office inputs). LibreOffice headless is optional but recommended for .doc/.docx/.ppt/.pptx/.xls/.xlsx conversion.
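A quick preflight check for the non-Python requirements might look like the sketch below. The function names are hypothetical, and the launcher names soffice and libreoffice are the usual ones for LibreOffice but may differ on your system; the Python packages from requirements.txt are not checked here.

```python
import shutil
import sys


def check_runtime(min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d+ is required" % min_python)
    if shutil.which("qpdf") is None:
        problems.append("qpdf was not found on PATH")
    return problems


def libreoffice_available():
    """LibreOffice is optional; look for its usual launcher names on PATH."""
    return any(shutil.which(name) for name in ("soffice", "libreoffice"))


for problem in check_runtime():
    print("missing:", problem)
if not libreoffice_available():
    print("note: LibreOffice not found; Office inputs may not convert")
```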
Diff summary
26 commits since v1.0.0 across 31 files. The full reorganization migrated the skill from pdf-optimizer/ to pdf-fax-optimizer/, added ~5.3k lines of pipeline + references, and removed ~1.4k lines of the legacy size-mode skill.