Skip to content

Releases: petehottelet/pdf-fax-optimizer

v2.0.0 — installable PyPI package

20 Jun 08:04

Choose a tag to compare

pdf-fax-optimizer 2.0.0

First release as a proper, pip-installable package — alongside the existing agent skill.

pip install pdf-fax-optimizer
pdf-fax-optimizer input.pdf -o output.fax.pdf

Highlights

  • Installable CLI package. New pyproject.toml (hatchling) with console scripts pdf-fax-optimizer and pdf-fax-send, optional extras [ocr] / [send] / [all] / [dev], and bundled halftone assets shipped in the wheel.
  • One source tree, two distributions. Code now lives in an importable pdf_fax_optimizer package nested in the skill folder; thin shims keep python scripts/*.py and existing skill installs working.
  • Mixed-page DPI fix. A low-DPI embedded raster no longer drags down live vector text on the same page — the preset now acts as a floor for text-bearing pages. The JSON report records chosen_dpi and chosen_dpi_reason for auditability.
  • Safer check_deps. Detect-only by default; auto-install is gated behind --auto-install, and --break-system-packages is a separate explicit opt-in.
  • Tests + CI. New pytest suite over synthetic fixtures (smoke, pipeline, report, CLI, conversion, send, and a mixed-DPI regression guard) running on Python 3.10/3.11/3.12 with ruff, plus automated PyPI publishing via Trusted Publishing.

Docs

  • pip-first install + Development sections in the README, python -m pdf_fax_optimizer.* entry points in SKILL.md, recover_text clarified as opt-in, and the square vs. anisotropic rendering history reorganized into a Background note.

v1.2.0 - Vision-aware halftone picker, unified sample contact sheet, preserve_text fix

20 Jun 05:43

Choose a tag to compare

A maintenance release focused on making the halftone choice smarter, unifying every "look at one page through different settings" feature behind a single contact-sheet entry point, and fixing a legibility regression where saturated-coloured text on white paper was getting whitened away before binarization.

Highlights

  • Vision-aware halftone auto-picker. recommend_dither() is now a 9-branch decision tree fed by cheap content stats (mean/std luma, edge density, texture score, bimodality, dark/light fractions, connected-region count) extracted from the photo-mask pixels in _extract_photo_features(). Below the legacy gates (text-only → none, low-DPI → blue-noise, --fax-heavyclustered) the picker now selects within the detail family {clustered, atkinson, edd, floyd, jarvis, green-noise} based on what is actually on the page. The per-page features and the discriminating-feature values that drove the pick are surfaced in the --report JSON so you can always see WHY a page picked what it picked. Validated on Prestige_Estates_v3.pdf: now picks edd (correct for the billboard text-on-photo cover), instead of always defaulting to green-noise.

  • Unified --sample N --panels K contact sheet. A single entry point — render_contact_sheet — replaces the previous trio of competing flags. Render any page through 1 to ~20 panels, customise the recipe with --sample-include orig,gray,floyd,edd,line, and the strip above the grid shows every setting that produced the panels (input file, page number, resolution, preserve_text, ocr_text, recover_text, text_binarize, dither, auto-pick recommendation, photo_fraction). Default behaviour is still the 4-panel sample. The old --compare-page, --recover-text-preview, and --preview-page flags continue to work but are marked (legacy) in --help.

  • preserve_text legibility fix. preserve_text_mask was over-firing on saturated-coloured text on white paper (e.g. the gold "EXTRAORDINARY PROPERTIES" subhead beneath a logo lockup), classifying it as a "decorative accent" and whitening the strokes themselves. The fix adds a chroma-density gate: when a saturated component has no dark text strokes inside (frac < 0.005), it only counts as a decorative accent when the raw pre-morph chroma densely fills the closed component (density > 0.55). Text-shaped strokes have ~25 % chroma density (lots of paper between letters) and now fall through to the binarizer, which renders them as crisp black ink. The text_preserved warning on the Prestige cover dropped from 395 kpx to 2 kpx; the same chips-with-dark-text rescue case (the original use of preserve_text) is unaffected.

Documentation refresh

  • Halftone-styles table reframe. The "Noise robustness" column became "Channel character", with positive descriptive labels (long-run / line-tolerant, fine-grain / detail-first, edge-enhancing / type-friendly, scanline-aligned stripes, …) instead of negatively-loaded "worst"/"poor" terms. The previous wording made the best-fidelity error-diffusion screens (floyd, jarvis, edd) read as wrong picks, which is the opposite of what the auto-picker now chooses for the detail family. Synchronised across README.md, pdf-fax-optimizer/SKILL.md, and pdf-fax-optimizer/references/fax-optimization.md.

  • text_rescue.png cleanup. Dropped the redundant gray subhead between the title and the first section heading; rewrote the preserve_text and recover_text section headings into parallel, self-contained sentences that describe what each feature actually does. Tightened the title strip to remove the dead air left by the removed subhead.

  • halftone_grid.png regen. The "EXTRAORDINARY PROPERTIES" subhead beneath the Prestige logo now reads correctly in every halftoned panel (it was silently being whitened by the over-firing preserve_text before this release).

  • Reference docs. references/config-schema.md now marks --sample / --panels / --sample-include / --no-sample-header as the supported flags and the older flags as (legacy). requirements.txt notes the Python 3.10+ requirement up top.

CLI changes

  • New: --panels K (1, 2, 4, 6, 8, 12, 20, max) for --sample.
  • New: --sample-include orig,gray,clustered,floyd,line,… for custom recipes.
  • New: --no-sample-header to omit the settings strip above the grid.
  • Legacy and still working: --compare-page, --compare-methods, --compare-original, --recover-text-preview, --preview-page. Help text now points to the --sample equivalents.

JSON report

  • pages[i].photo_features is now present on any page that ran the full MRC pipeline: a dict of mean_luma, std_luma, dark_fraction, light_fraction, edge_density, texture_score, bimodal_score, photo_area_px, n_regions. This is what the auto-picker reasoned over to pick the dither.

No breaking changes

All v1.1.0 flags continue to work, including the legacy --compare-page / --recover-text-preview / --preview-page flags. The auto-picker change can surface a different default dither for any given page; if you want the v1.1.0 behaviour, pass --dither green-noise explicitly.

Install

Download pdf-fax-optimizer.zip from this release and either:

  • Claude.ai — upload via Settings → Capabilities → Skills.
  • Claude Code — unzip into ~/.claude/skills/.
  • Codex — unzip into ~/.codex/skills/.

Runtime requirements: Python 3.10+, qpdf, plus the deps in requirements.txt (PyMuPDF, Pillow, opencv-python, numpy, img2pdf, rapidocr-onnxruntime). LibreOffice headless is optional but recommended for .doc/.docx/.ppt/.pptx/.xls/.xlsx conversion.

Diff summary

2 commits since v1.1.0, 9 files changed (1137 insertions, 303 deletions). The bulk of the change is in pdf-fax-optimizer/scripts/fax_pipeline.py (smart picker + unified sampler + preserve_text fix); the rest is documentation and regenerated demo graphics.

v1.1.0 — Native-resolution OCR-driven fax pipeline

19 Jun 23:47

Choose a tag to compare

A fax-only relaunch of the skill, rebuilt around a native-resolution, OCR-driven 1-bit pipeline. The size-optimization mode from v1.0 is gone (delegated to a separate companion skill); everything in this release is about getting a PDF onto a real fax line so the receiver can actually read it.

Highlights

  • Rebrand to fax-only. The skill folder is now pdf-fax-optimizer/ (was pdf-optimizer/) and the release asset is pdf-fax-optimizer.zip. If you have v1.0 installed, remove it before installing v1.1.
  • Native-resolution OCR-driven bilevel pipeline. Pages render at their actual DPI (capped at 300 PPI), rapidocr-onnxruntime identifies text both outside and inside images, and a #808080 luma rule paints recovered glyphs black on light backgrounds, white on dark ones so text reads on either polarity.
  • 17 halftone screens via a SCREENS registry: clustered AM dots, square / diamond / ellipse screens, ordered Bayer, blue-noise, green-noise, Floyd-Steinberg, Atkinson, Jarvis-Judice-Ninke, Stucki, Sierra, EDD (edge-enhancing), line, crosshatch, mezzotint, plus the none reference. floyd, jarvis, and edd are highlighted as optimal picks for forms-and-photo pages.

New features

  • preserve_text (default on). Detects small saturated-colour fills (slide highlight chips, dashboard badges, colored table cells, callout boxes, banners) carrying dark text, and lifts the fill to white before binarization so the dark text survives the 1-bit channel as crisp black-on-white. The threshold is background-relative, so dark-luma chips (navy, deep blue, forest green) rescue correctly instead of getting knocked out to solid black.
  • recover_text (opt-in, --recover-text on). OCRs text baked into halftoned image regions (signs, billboards, captions on photos) and recolours each glyph to pure black or pure white per the #808080 polarity rule, layered above the halftoned photo so the photograph itself stays intact.
  • --basic mode. Minimal grayscale → Otsu → CCITT G4 fallback for when you want a predictable, compact baseline with no opinion.
  • --fax-heavy mode. Biases the auto-picker toward a clustered-dot screen that compresses tighter and survives noisy phone lines at the cost of photographic detail.
  • --compare-page N. Renders one page through a curated 6-up subset of the screens (clustered / green-noise / blue-noise / atkinson / floyd / line) into one labelled contact sheet, each panel annotated with its real G4 size and transmission estimate, with the auto-recommended pick highlighted. Add --compare-original to lead with original-color + true-grayscale references.
  • --sample N. Emits a 4-panel diagnostic (original / grayscale / standard fax baseline / optimized output) so you can confirm legibility before transmission.
  • --recover-text-preview N. Side-by-side PNG of a page faxed with vs without the within-image text recolour, for diffing the feature.
  • Office + image input. .doc/.docx/.rtf/.odt/.txt, .ppt/.pptx/.odp, .xls/.xlsx/.ods/.csv, .png/.jpg/.tif/.bmp/.gif/.webp are all normalized to PDF first via to_pdf.py (LibreOffice headless when available, Pillow for images).
  • Cloud-fax sending. send_fax.py provides a Phaxio integration so the optimized PDF can be transmitted in the same workflow.
  • References docs. Three new long-form docs ship inside the skill: config-schema.md (every JSON-config key documented), fax-optimization.md (the why and how of every pass), sending.md (provider setup + retry semantics).

Performance and correctness

  • Per-page processing for multi-page docs: each page is rendered, dithered, encoded, and concatenated independently, keeping peak memory bounded.
  • DPI cap at 300 PPI for the bilevel raster, regardless of --fax-resolution. The receiving fax modem won't go higher anyway, and the cap keeps large docs from spinning.
  • OCR is gated behind --recover-text on so the default conversion path doesn't pay the rapidocr cost when the feature isn't requested.
  • Vectorised despeckle (despeckle_bw) — a few hundred milliseconds saved per page on the cleanup pass.
  • halftone_exclude mask — OCR-identified text background regions are excluded from the halftone screen, so glyphs sit on flat white instead of a tone band that the binarizer would have to fight.

Breaking changes (vs v1.0)

  • Skill folder renamed: pdf-optimizer/pdf-fax-optimizer/. Update your ~/.claude/skills/ or ~/.codex/skills/ install accordingly.
  • Size-optimization mode removed. Use a dedicated size-shrinking skill for that workflow.
  • Earlier feature names renamed before this tag: flatten_color_highlightspreserve_text, robust_textrecover_text. The CLI flags are --preserve-text / --no-preserve-text and --recover-text {auto,on,off}.

Install

Download pdf-fax-optimizer.zip from this release and either:

  • Claude.ai — upload via Settings → Capabilities → Skills.
  • Claude Code — unzip into ~/.claude/skills/.
  • Codex — unzip into ~/.codex/skills/.

Runtime requirements: Python 3.10+, qpdf, plus the Python deps in requirements.txt (PyMuPDF, Pillow, opencv-python, numpy, img2pdf, rapidocr-onnxruntime, optional python-docx etc. for Office inputs). LibreOffice headless is optional but recommended for .doc/.docx/.ppt/.pptx/.xls/.xlsx conversion.

Diff summary

26 commits since v1.0.0 across 31 files. The full reorganization migrated the skill from pdf-optimizer/ to pdf-fax-optimizer/, added ~5.3k lines of pipeline + references, and removed ~1.4k lines of the legacy size-mode skill.

v1.0.0 — PDF Optimizer skill

15 Jun 23:20

Choose a tag to compare

Channel-aware PDF optimization agent skill (size + fax modes). Download pdf-optimizer.zip and upload it via Claude.ai Settings > Capabilities > Skills, or drop the pdf-optimizer/ folder into ~/.claude/skills/ (Claude Code) or ~/.codex/skills/ (Codex). See README for full install instructions and requirements (qpdf + the Python deps in requirements.txt).