Print-ready PDF pipeline (PowerPoint export)

This project defines a deterministic, folder-based workflow to convert a PDF exported from PowerPoint into a print-ready PDF for inside pages only. Covers are explicitly out of scope.

The workflow is designed for Linux, fully scriptable, and suitable for execution by an AI agent. It is written specifically for Ubuntu LTS; the install.sh installer relies on APT and will not work on systems without it.

The core principles are:

Deterministic and reproducible output
One folder per step, no in-place modification
Filenames always derived from the original input filename
Full audit trail with reports per step
Vector pages stay vector whenever possible
Rasterization and upscaling only where strictly necessary

Assumptions

The PowerPoint source already uses the correct final page size.
A minimum safe margin of 5 mm is already respected inside PowerPoint.
The PDF in 00-input was exported directly from PowerPoint using high-quality settings.
Inside pages are delivered as single pages, not spreads.
Target effective resolution for raster content is ≥300 dpi at final size.

Non-goals

Editing layout, text, or margins
Fixing PowerPoint design mistakes
Cover, spine, bleed, or binding calculations
Reflowing or reconstructing vector content

Repository structure

00-input/
01-validate/
02-analyze-dpi/
03-extract-images/
04-upscale-images/
05-verify-images/
06-resize-images/
07-resize-smasks/
08-replace-images/
09-normalize-pdf/
10-pdf-x4/
11-output/
preflight.sh (stdout only)

Each step reads only from earlier steps and writes only to its own folder.

Running the full pipeline

./convert.sh 00-input/boek.pdf

Runs steps 01–11 in order, skipping conditional steps that do not apply, and stops on the first failure. It then runs preflight on both the PDF/X-4 and PDF/X-1a outputs. All script output is streamed to the terminal.
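
The stop-on-first-failure behaviour can be sketched as below. This is only an illustration, not the contents of convert.sh; the helper name run_steps is hypothetical, and the conditional skipping of steps is left out.

```shell
# run_steps INPUT STEP...: run each step command with INPUT as its only
# argument and stop at the first one that fails (hypothetical helper).
run_steps() {
  local input="$1"
  shift
  local step
  for step in "$@"; do
    echo "==> $step $input"
    if ! "$step" "$input"; then
      echo "FAILED: $step" >&2
      return 1
    fi
  done
}

# Usage sketch with step scripts from this repository:
# run_steps 00-input/boek.pdf ./01-validate.sh ./02-analyze-dpi.sh ./09-normalize-pdf.sh
```

Returning a non-zero status instead of calling exit keeps the helper usable from both a script and an interactive shell.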

Removing converted artifacts

./remove-converted.sh 00-input/boek.pdf

Removes all pipeline outputs for the specified input (e.g., folders like 03-extract-images/boek/ and files like 08-replace-images/boek.*), while keeping the original file in 00-input/.

Naming conventions

Input file:

00-input/boek.pdf

Derived names always preserve the base name boek.

Image-based outputs are grouped in a folder named after the document:

03-extract-images/boek/...
04-upscale-images/boek/...
05-verify-images/boek/...
06-resize-images/boek/...
07-resize-smasks/boek/...
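
As a minimal sketch of this convention, the base name can be derived once from the input path and reused for every step. The helper names doc_base and step_output are illustrative and are not taken from the repository scripts.

```shell
# doc_base PATH: strip the directory and the .pdf extension.
# Example: 00-input/boek.pdf -> boek
doc_base() {
  basename "$1" .pdf
}

# step_output STEP INPUT SUFFIX: build the output path for a step
# (hypothetical helper). The base name always comes from the input file.
step_output() {
  printf '%s/%s.%s\n' "$1" "$(doc_base "$2")" "$3"
}

# Usage sketch:
# step_output 01-validate 00-input/boek.pdf validated.txt
# step_output 08-replace-images 00-input/boek.pdf pdf
```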

Step-by-step workflow

00-input

Purpose Starting point. Contains the PDF exported from PowerPoint.

Rules

Treat as read-only.
No renaming after ingestion.

Contents

00-input/boek.pdf

01-validate

Purpose Sanity check and baseline metadata extraction. No mutation.

Checks

Page count
Page size consistency
Encryption
Basic color space detection
File hash

Outputs

01-validate/boek.validated.txt

Fail if

Page sizes differ
PDF is encrypted or unreadable
Page count is zero
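
One possible way to implement the page size check is to count distinct sizes in per-page pdfinfo output. The sketch below assumes lines of the form "Page    1 size: 595.276 x 841.89 pts", which is what pdfinfo prints when given a page range; the function name is hypothetical and the actual 01-validate.sh may work differently.

```shell
# distinct_page_sizes: read per-page pdfinfo output on stdin and print the
# number of distinct "width x height" combinations found.
distinct_page_sizes() {
  awk '$1 == "Page" && $3 == "size:" { seen[$4 "x" $6] = 1 }
       END { n = 0; for (k in seen) n++; print n }'
}

# Usage sketch (requires poppler-utils):
# pages=$(pdfinfo 00-input/boek.pdf | awk '/^Pages:/ { print $2 }')
# n=$(pdfinfo -f 1 -l "$pages" 00-input/boek.pdf | distinct_page_sizes)
# [ "$n" -eq 1 ] || { echo "page sizes differ" >&2; exit 1; }
```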

02-analyze-dpi

Purpose Identify embedded raster images below the target DPI.

Definition

Effective DPI = pixel resolution of a raster image relative to the physical size it is placed at on the page.
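
As a worked example of this definition: PDF user space uses points of 1/72 inch, so an image that is 600 pixels wide and placed 144 pt (2 inches) wide has an effective resolution of 600 / 2 = 300 dpi. The function below is a small sketch of that arithmetic; its name and argument order are illustrative.

```shell
# effective_dpi PIXELS PLACED_PT: pixels divided by the placed size in inches,
# where the placed size is given in PDF points (1 pt = 1/72 inch).
effective_dpi() {
  awk -v px="$1" -v pt="$2" 'BEGIN { printf "%.1f\n", px / (pt / 72) }'
}

# Usage sketch:
# effective_dpi 600 144    # 600 px across 2 inches
# effective_dpi 1200 576   # 1200 px across 8 inches
```

In practice the horizontal and vertical values can differ, so it is reasonable to compute both and take the lower one.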

Outputs

02-analyze-dpi/boek.dpi.csv
02-analyze-dpi/boek.lowdpi.images.csv

boek.lowdpi.images.csv contains one image per line with page number, object id, and DPI.
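
A file of this shape could be produced from the output of pdfimages -list, which reports the object number, generation ID, and x-ppi/y-ppi for each image. The sketch below assumes poppler's usual column layout (two header lines, object in column 11, ID in column 12, x-ppi and y-ppi in columns 13 and 14); whether 02-analyze-dpi.sh uses this approach is not stated here, and the function name is hypothetical.

```shell
# low_dpi_rows TARGET: read `pdfimages -list` output on stdin and print
# "page,object,id,dpi" for every image whose lower ppi value is below TARGET.
# Soft masks (type "smask") are ignored in this sketch.
low_dpi_rows() {
  awk -v target="$1" '
    NR > 2 && $3 == "image" {
      dpi = ($13 + 0 < $14 + 0) ? $13 : $14
      if (dpi + 0 < target + 0) printf "%s,%s,%s,%s\n", $1, $11, $12, dpi
    }'
}

# Usage sketch (requires poppler-utils):
# pdfimages -list 00-input/boek.pdf | low_dpi_rows 300
```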

Fail if

DPI analysis cannot be computed

Control flow decision

If no low-DPI images are found in step 02:

Steps 03 through 08 are skipped.
The pipeline continues directly with step 09 (normalize PDF) using the original PDF.

In other words:

02-analyze-dpi
   ├─ low-DPI images found → 03 → 04 → 05 → 06 → 07 → 08 → 09 → 10 → 11
   └─ no low-DPI images    → 09 → 10 → 11

After step 11, run `preflight.sh` on the output PDF you want to validate.

This ensures:

No unnecessary rasterization
Output remains fully vector where possible
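
The branch can be sketched as a check on whether the low-DPI report contains any data rows. This assumes that data rows start with a page number, so an optional header line is ignored; both function names are hypothetical.

```shell
# has_low_dpi_images CSV: succeed when the report has at least one data row,
# that is, a line starting with a digit (the page number).
# A missing file counts as "no rows" here; a real script should rather fail.
has_low_dpi_images() {
  [ -f "$1" ] && grep -q '^[0-9]' "$1"
}

# next_step CSV: print the step folder the pipeline should continue with.
next_step() {
  if has_low_dpi_images "$1"; then
    echo "03-extract-images"
  else
    echo "09-normalize-pdf"
  fi
}

# Usage sketch:
# next_step 02-analyze-dpi/boek.lowdpi.images.csv
```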

03-extract-images (conditional)

Purpose Extract embedded raster images from pages that contain low-DPI content.

Inputs

Images listed in boek.lowdpi.images.csv

Outputs

03-extract-images/boek/obj-<object>-<id>.png
03-extract-images/boek.images.csv

Rules

Lossless output (PNG or TIFF)
Preserve original pixel dimensions

04-upscale-images (conditional)

Purpose Upscale only the extracted images so effective resolution meets or exceeds target DPI.

Strategy

Compute required scale factor per image
Clamp to reasonable bounds (e.g. max x4)
Do not blindly upscale everything
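
The strategy above amounts to a small calculation per image: divide the target DPI by the current effective DPI, never go below 1, and cap the result at the configured maximum. A minimal sketch, assuming the current DPI is greater than zero and using an illustrative function name:

```shell
# scale_factor CURRENT_DPI TARGET_DPI MAX: required upscale factor,
# clamped to the range [1, MAX]. CURRENT_DPI must be greater than zero.
scale_factor() {
  awk -v cur="$1" -v target="$2" -v max="$3" 'BEGIN {
    f = target / cur
    if (f < 1) f = 1       # already at or above target: no upscaling
    if (f > max) f = max   # clamp to the configured upper bound
    printf "%.2f\n", f
  }'
}

# Usage sketch:
# scale_factor 150 300 4.0   # needs x2
# scale_factor 50 300 4.0    # needs x6, clamped to x4
```

When the clamp applies, the image still misses the target after upscaling, which is the case the later verify and resize steps are there to catch.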

Outputs

04-upscale-images/boek/obj-<object>-<id>.up.png

Fail if

Upscaler fails
Output dimensions do not match expected scale

05-verify-images (conditional)

Purpose Verify that upscaled images meet the target DPI. Images that still fall short are copied into 05-verify-images/boek/ so that step 06 can resize them.

Outputs

05-verify-images/boek.verify.csv
05-verify-images/boek/obj-<object>-<id>.up.png

06-resize-images (conditional)

Purpose Non-AI resize for any images that still miss target DPI after AI upscaling.

Outputs

06-resize-images/boek/obj-<object>-<id>.up.png

07-resize-smasks (conditional)

Purpose Resize soft masks (SMask) to match the resized image dimensions.

Outputs

07-resize-smasks/boek/obj-<object>-<id>.up.png

08-replace-images (conditional)

Purpose Replace the original low-DPI image objects in the PDF with the upscaled versions, preserving vector content. Replacement images are converted to CMYK to avoid color conversion during normalization.

Outputs

08-replace-images/boek.pdf
08-replace-images/boek.replace.txt

09-normalize-pdf

Purpose Prepare final print-deliverable PDF.

For PDF/X-4, normalization embeds the required output intent (ICC profile) without rasterizing or resampling content.

Typical actions

Flatten transparency only where the target standard requires it (PDF/X-4 keeps live transparency)
Convert to CMYK
Export to required PDF standard (PDF/X-4)
Disable resampling

Outputs

09-normalize-pdf/boek.pdf
09-normalize-pdf/boek.normalize.txt

10-pdf-x4

Purpose Set print trim and bleed boxes. Uses a 3 mm trim inset by default.

Outputs

10-pdf-x4/boek.pdf
10-pdf-x4/boek.trim.txt
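
As a worked example of the default inset: 3 mm equals 3 × 72 / 25.4 ≈ 8.504 pt. The sketch below assumes the trim box is simply the page box moved inward by that amount on every side, with the origin at 0 0; the function name is hypothetical and 10-set-trim.sh may compute the boxes differently.

```shell
# trim_box WIDTH_PT HEIGHT_PT INSET_MM: print "x0 y0 x1 y1" in points for a
# box inset by INSET_MM on all four sides of a page starting at 0 0.
trim_box() {
  awk -v w="$1" -v h="$2" -v mm="$3" 'BEGIN {
    d = mm * 72 / 25.4   # millimetres to PDF points
    printf "%.3f %.3f %.3f %.3f\n", d, d, w - d, h - d
  }'
}

# Usage sketch for a 612 x 792 pt page:
# trim_box 612 792 3
```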

11-output

Purpose Convert PDF/X-4 output to PDF/X-1a.

Outputs

11-output/boek.pdf

preflight.sh

Purpose Final verification.

Checks

Page count and sizes
Color space
Remaining RGB objects
Remaining low-DPI issues
File integrity

Outputs

Prints to stdout only.

Fail if

Page sizes differ
Low-DPI issues remain

Use preflight directly on a concrete file:

./preflight.sh 10-pdf-x4/boek.pdf
./preflight.sh 11-output/boek.pdf

The default final deliverable is 10-pdf-x4/boek.pdf, with optional PDF/X-1a output in 11-output/.

Configuration

All tunable values must live in a single config file:

TARGET_DPI=300
RASTERIZE_DPI=400
MAX_UPSCALE=4.0
UPSCALER_MODEL=RealESRGAN_x4plus
IMAGE_FORMAT=png
COLOR_PROFILE=/usr/share/color/icc/colord/FOGRA39L_coated.icc

Each report must log the effective configuration used.
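
One way to log the effective configuration is to source the config file in a subshell and print each known key. This is a sketch under assumptions: the config file is taken to contain plain KEY=VALUE shell assignments, and the function name and the "unset" placeholder are illustrative.

```shell
# log_config FILE: source FILE in a subshell and print the effective value of
# each known key, one KEY=VALUE per line. FILE should be given with a path
# (for example ./pipeline.conf) so that the shell does not search PATH for it.
log_config() (
  . "$1"
  echo "# effective configuration"
  for key in TARGET_DPI RASTERIZE_DPI MAX_UPSCALE UPSCALER_MODEL IMAGE_FORMAT COLOR_PROFILE; do
    eval "value=\${$key:-unset}"
    printf '%s=%s\n' "$key" "$value"
  done
)

# Usage sketch (the config file name is hypothetical):
# log_config ./pipeline.conf >> 01-validate/boek.validated.txt
```

Running in a subshell keeps the sourced values from leaking into the calling script.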

Print Specs Summary (New Energy)

These requirements are summarized from New Energy’s Dutch print delivery specifications and are provided for convenience (inside pages only; covers are out of scope here).

Images should be ≥300 dpi; below 240 dpi there is a risk of visible quality loss. Avoid web‑sourced images, which are often low quality and may carry usage-rights restrictions.
Add 3 mm bleed for inside pages; bleed artwork must extend into the bleed area.
Minimum line thickness: 0.1 mm (or 0.4 mm for foil finishes).
Export as PDF/X-4 for print delivery in this pipeline (transparency preserved).
Use CMYK only (no RGB); total ink coverage should not exceed 280%.
Deep black (typically for covers): C50 M40 Y40 K100. Text/line art in body should be K100 only.
Include trim marks on export; keep offsets outside the bleed.
Deliver cPDF (certified PDF) and export using PDF/X‑4 presets. Profile used by default: /usr/share/color/icc/colord/FOGRA39L_coated.icc (Coated FOGRA39 / ISO 12647-2:2004).
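
As a quick arithmetic check of the ink limit above, total coverage is the sum of the four channel percentages: the deep black C50 M40 Y40 K100 adds up to 230%, which is within 280%. The helpers below are a sketch for whole-number percentages only, and their names are illustrative.

```shell
# total_ink C M Y K: sum of the four channel percentages (integers).
total_ink() {
  echo $(( $1 + $2 + $3 + $4 ))
}

# within_ink_limit C M Y K [LIMIT]: succeed when the total is at most LIMIT
# (default 280).
within_ink_limit() {
  [ "$(total_ink "$1" "$2" "$3" "$4")" -le "${5:-280}" ]
}

# Usage sketch:
# total_ink 50 40 40 100
# within_ink_limit 50 40 40 100 && echo "within limit"
```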

Agent instructions

An AI agent working on this repository must:

Never modify files in place
Always write outputs to the next step folder
Fail fast on invariant violations
Skip steps cleanly when no-op conditions apply
Produce human-readable reports at every step

Required tools (core)

These are needed to run the pipeline end to end.

PDF inspection and manipulation

poppler-utils Used for:
- pdfinfo, page size and page count
- pdfimages, extracting embedded raster images
- pdftoppm, rasterizing pages
ghostscript Used for:
- transparency flattening
- CMYK conversion
- PDF/X export
- final print normalization
qpdf Used for:
- safe PDF object inspection and replacement
- sanity checks
- metadata inspection

Image inspection and processing

ImageMagick Used for:
- verifying image dimensions
- confirming DPI after rasterization and upscaling
- format conversion (PNG, TIFF)
exiftool Used for:
- inspecting image metadata
- validating DPI and color info in images

AI upscaling

Real-ESRGAN (external, not via apt) Used for:
- upscaling extracted raster images that fall below target DPI
- GPU-accelerated where available

Recommended model:

RealESRGAN_x4plus

Scripting and orchestration

bash Primary orchestration language.
coreutils (cut, sort, uniq, wc) together with sed, awk, and grep Used for:
- parsing reports
- page list generation
- control flow decisions
jq (optional but recommended) Used if DPI reports are emitted as JSON instead of CSV.
