Release v0.2.0 · ningzimu/image-to-editable-ppt-skill

Features

Add the installable skill-local editppt CLI package with setup, doctor, config, prepare, run, image, and formula command groups. (#3)
Add a unified image backend through editppt image, using Codex OAuth when available and OpenAI-compatible API fallback credentials from ~/.editppt/config.yaml. (#3)
Add concurrent editppt image batch support for generate/edit jobs, including reference-image edit inputs. (#3)
Add editppt formula render-latex for rendering LaTeX formulas into PPT image assets and manifest fragments. (#3)
Add source-aspect-preserving slide preparation with automatic custom slide canvases and content boxes for non-widescreen inputs. (#3)
Add editppt page hints: dependency-free text-line detection and measurement on source.png (per-tile binarization, bridge-tolerant XY-cut segmentation, ink metrics). It outputs advisory text_hints.json and a labeled text_hints.png overlay so the page author fills text_boxes positions and font sizes from measurement instead of visual estimation. (#5)
Distribute text hints during prepare: every page directory receives text_hints.json and the labeled text_hints.png overlay alongside its source.png, so page workers start with measurements in place. With a PaddleOCR-VL token (PADDLE_OCR_TOKEN env var or ~/.editppt/config.yaml), every input type is OCR'd in a single batch job: PDFs are submitted directly, image and PPTX page sources are first bundled into a temporary PDF, and the returned text blocks are re-measured locally and rescaled to each page's resolution. Without a token, or when the OCR job fails, the built-in offline detector runs per page. Pages where the OCR layout model collapses (dense diagrams classified as one figure, yielding <=2 text lines while the offline detector finds 6+) automatically fall back to the offline result. --no-text-hints skips the step. (#5)
Add editppt run hints to regenerate a prepared run's text hints in place (e.g. right after configuring a PaddleOCR token mid-run), and make the missing-token notice an explicit ask-the-user-once checkpoint before page reconstruction instead of a fire-and-forget tip. (#5)
Add editppt config --paddle-ocr-token and first-use guidance: doctor reports the active text-hints backend and, when no token is configured, prepare and doctor point to the token application page (https://aistudio.baidu.com/account/accessToken). The token is stored masked in ~/.editppt/config.yaml alongside the image API credentials. (#5)
Snap measured font sizes to design levels: detected lines are clustered into size groups (same-level text gets exactly one font size instead of per-line jitter), exposed as size_group in the hints output. (#5)
Trust measured font sizes in the deterministic builder: text boxes tagged "font_size_source": "measured" are clamped only at the geometric fit limit instead of the conservative 0.9 safety shrink, which made correctly sized text systematically smaller than the source. Hand-written boxes keep the existing conservative behavior. (#5)
Add editppt run reset: return a failed or stuck page to pending, clearing its dispatch and result records so a new worker can be dispatched. This closes the previously dead-ended failure paths — a worker returning passed: false, a rejected record, or a lost worker. (#5)
Add editppt page build, editppt page contact-sheet, and editppt page validate: page workers build page.pptx/preview.png from manifest.json, create the origin-versus-preview contact sheet, and pre-check the page with the same manifest-contract checks that editppt run record applies, all through documented CLI commands instead of undocumented runtime scripts. (#5)
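
The OCR-versus-offline fallback described in the text-hints entry above can be read as a small per-page decision. The sketch below is illustrative only: the function name, its arguments, and the returned labels are hypothetical and not taken from the editppt source. Only the thresholds (at most 2 OCR lines against 6 or more offline lines) come from these notes.

```python
def choose_hint_backend(ocr_line_count, offline_line_count, ocr_available=True):
    """Return "ocr" or "offline" for one page (hypothetical helper, not editppt API)."""
    # No token or a failed OCR job: the offline detector is the only source.
    if not ocr_available or ocr_line_count is None:
        return "offline"
    # Layout collapse: OCR sees <=2 lines while the offline detector finds 6+.
    if ocr_line_count <= 2 and offline_line_count >= 6:
        return "offline"
    return "ocr"
```

Under these assumptions, a page with 2 OCR lines and 6 offline lines uses the offline result, while a page with 3 OCR lines keeps the OCR result.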

Improvements

Move deterministic runtime code from loose skill scripts into the self-contained editppt CLI package and remove legacy script entrypoints from the installable skill root. (#3)
Rework the workflow around CLI-managed run state: editppt prepare, editppt run next, prompt, dispatch, record, and finalize. (#3)
Dispatch multi-page inputs directly to page workers according to runtime concurrency slots, with a default concurrency of 6. (#3)
Rebuild the final PPTX from recorded page manifests during editppt run finalize, making manifest.json the authoritative final assembly source. (#3)
Validate each page PPTX against its page manifest during editppt run record so page-local outputs cannot bypass the manifest contract. (#3)
Require source-pixel coordinates for positioned manifest objects and reject manifests that omit required box_px, points_px, or polygon_px fields. (#3)
Add deterministic text fitting in the manifest builder to clamp oversized first-draft text boxes before preview and PPTX output. (#3)
Route foreground bitmap assets through source-faithful asset sheets and remove the public source-crop image workflow. (#3)
Store only page artifacts, hashes, and validation outputs in page result records. (#3)
Simplify page correction flow so page reconstructors fix page-local issues before record instead of creating repair queues. (#3)
Expose image backend usage, asset-sheet processing, formula rendering, and run orchestration guidance through agent-friendly CLI help. (#3)
Move page-worker prompt assembly out of the installable CLI and into a skill-local prompt builder script. (#5)
Remove the editppt run prompt subcommand so the CLI no longer reads skill prompt templates or reference files. (#5)
Keep CLI environment diagnostics scoped to CLI config, dependencies, and image backend readiness without requiring skill-root discovery. (#5)
Replace path-like prompt placeholders with explicit {{NAME}} tokens and fail the prompt build when any placeholder remains unfilled. (#5)
Dispatch every page to a page worker, including single-page inputs: editppt run next no longer returns a rebuild_page stage and editppt run record no longer accepts direct main-agent recording from pending, so single-page and multi-page runs follow one identical flow and one prompt contract. (#5)
Reject non-deliverable pages at record time: editppt run record fails with a run reset hint when validation.json does not contain top-level passed: true, so the recorded state always means deliverable and finalize can no longer be reached with failed pages aboard. (#5)
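
The {{NAME}} placeholder rule above (fill every token, fail if any remains) might look roughly like the following. This is a minimal sketch, not the skill's prompt builder: fill_prompt, the regular expression, and the assumption that token names use upper-case letters, digits, and underscores are all hypothetical.

```python
import re

# Assumed token shape: {{NAME}} with upper-case letters, digits, and underscores.
PLACEHOLDER = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def fill_prompt(template, values):
    """Substitute {{NAME}} tokens and raise if any token stays unfilled."""
    def substitute(match):
        name = match.group(1)
        # Leave unknown tokens in place so the check below can report them.
        return str(values[name]) if name in values else match.group(0)

    filled = PLACEHOLDER.sub(substitute, template)
    missing = sorted(set(PLACEHOLDER.findall(filled)))
    if missing:
        raise ValueError("unfilled placeholders: " + ", ".join(missing))
    return filled
```

Failing loudly here means a template change that introduces a new token cannot silently ship a prompt with a raw placeholder in it.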

Fixes

Resolve editppt image process-sheet --asset-sheet-source relative paths from the page directory. (#3)
Accept structured text_inventory entries during PPTX validation. (#3)
Align single-page direct recording, page-worker prompt paths, and asset-sheet helper examples with the actual editppt runtime state machine. (#3)
Reject recorded or final page manifests whose positioned objects would otherwise fall back to default top-left locations. (#3)
Preserve custom deck size metadata when finalizing decks from manifests instead of forcing all outputs into widescreen mode. (#3)
Fix page-worker prompt truncation: a nested code fence in prompts/page-worker.md cut the generated worker prompt off before the manifest field requirements, the pre-return checklist, and the return format. The prompt builder now matches the last closing fence so nested fences cannot truncate the template. (#5)
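
The truncation fix above comes down to closing the outer block at the last fence line rather than the first one that follows the opening fence. A minimal sketch of that idea, assuming the template file holds a single outer fenced block; extract_outer_block is a hypothetical name, not the actual builder function:

```python
FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def extract_outer_block(markdown):
    """Return the text between the first fence line and the LAST fence line."""
    lines = markdown.splitlines()
    fence_rows = [i for i, line in enumerate(lines) if line.lstrip().startswith(FENCE)]
    if len(fence_rows) < 2:
        raise ValueError("no fenced block found")
    first, last = fence_rows[0], fence_rows[-1]
    # Nested fences between first and last stay inside the returned body.
    return "\n".join(lines[first + 1:last])
```

Matching the first closing fence instead would end the body at the nested block and drop everything after it, which is the failure mode the fix describes.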

Documentation

Translate installable skill documentation and agent metadata to English. (#3)
Rewrite the skill workflow and page-worker prompt around the editppt CLI-first contract. (#3)
Replace legacy architecture, state-machine, subagent, repair, and imagegen references with a shorter cli-helper.md, manifest schema, page decision tree, and QA rubric. (#3)
Document that page manifests must be sufficient to rebuild page PPTX files and final decks. (#3)
Document source-pixel coordinate requirements and deterministic text-fitting behavior for page manifests. (#3)
Require absolute worker prompt paths, real page-worker dispatch for multi-page runs, and top-level passed in page validation outputs. (#3)
Update Chinese and English README files for CLI installation, update instructions, backend configuration, multi-agent usage, and reconstruction limits. (#3)
Clarify that image API fallback configuration is AI-assisted, without manual CLI installation or key-configuration commands in the READMEs. (#5)
Document the OCR token in both READMEs: text size/position correction relies on a free PaddleOCR-VL token (application URL, config command, free-quota reassurance), replacing the outdated "no third-party OCR dependency" claim; the offline detector remains the degraded fallback. (#5)
Lock the three-step execution order in the worker prompt and decision tree: text hints belong to step 3 and are consumed only after background and foreground decisions, with step-1/2 image jobs submitted first so the order costs no wall-clock time. (#5)
Clarify that final deck assembly rebuilds from recorded page manifests rather than concatenating page-level PPTX files. (#5)
Document the skill-local page-worker prompt builder script in the skill workflow and CLI helper. (#5)
Restructure skill documentation around single-ownership: every rule lives in exactly one file, other files carry pointers. The page decision tree absorbs the QA rubric (now a Final Self-Check plus a Fix versus Warning section), the manifest schema becomes the only home for JSON field contracts, the CLI helper becomes a pure command manual, and the worker prompt shrinks to hard-rule reminders plus pointers with a mandatory read-references-first instruction. (#5)
Drop SKILL.md from the page-worker required reading list so workers load only page-level references instead of the parent orchestration contract. (#5)
Broaden skill description triggers, add a CLI availability check to the entry contract, and document non-pipx install fallbacks (uv tool install, pip install --user). (#5)
Document the failure-handling loop in SKILL.md Phase 3: never re-dispatch a page unchanged, diagnose repeated same-root-cause failures autonomously instead of pushing debugging questions to the user, and define the worker failure contract — a failed page returns at minimum validation.json (passed: false with the concrete reason) plus page_result.json, and leftover artifacts from a failed attempt are untrusted by the next worker. (#5)
Document the asset_provenance field contract (the five allowed source_type values, required source and provenance_note) and the validator's substring-level keyword scanning of inventory and provenance text. (#5)
Treat formula rendering blocked by missing local TeX tooling as a recorded warning with passed: true instead of an undefined pass state. (#5)
Add a skill documentation architecture spec to AGENTS.md: design principles (single-ownership, reader-role split, docs-and-CLI-move-together) and per-file responsibility boundaries for future contributors. (#5)
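
Several entries above depend on validation.json carrying a top-level passed value. As a rough illustration of what "top-level passed: true" could mean in practice, the sketch below accepts only a real boolean at the top level. The helper name is hypothetical, the strict-boolean reading is an assumption, and the warnings key in the usage note is invented for the example.

```python
import json


def is_deliverable(validation_text):
    """True only when the JSON document has top-level "passed": true."""
    data = json.loads(validation_text)
    # Require a real boolean true at the top level; nested or truthy values do not count.
    return isinstance(data, dict) and data.get("passed") is True
```

With this reading, a document such as {"passed": true, "warnings": ["..."]} is deliverable, while {"result": {"passed": true}} or {"passed": "true"} is not.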
