v0.2.0-beta.2
Pre-release
Features
- Add `editppt page hints`: dependency-free text-line detection and measurement on `source.png` (per-tile binarization, bridge-tolerant XY-cut segmentation, ink metrics). It outputs advisory `text_hints.json` and a labeled `text_hints.png` overlay so the page author fills `text_boxes` positions and font sizes from measurement instead of visual estimation. (#5)
- Distribute text hints during prepare: every page directory receives `text_hints.json` and the labeled `text_hints.png` overlay alongside its `source.png`, so page workers start with measurements in place. With a PaddleOCR-VL token (`PADDLE_OCR_TOKEN` env var or `~/.editppt/config.yaml`), every input type is OCR'd in a single batch job (PDFs are submitted directly; image/PPTX page sources are bundled into a temporary PDF first), with text blocks locally re-measured and rescaled to each page's resolution. Without a token or on failure, the built-in offline detector runs per page; pages where the OCR layout model collapses (dense diagrams classified as one figure, <=2 text lines while the offline detector finds 6+) automatically fall back to the offline result. `--no-text-hints` skips the step. (#5)
- Add `editppt run hints` to regenerate a prepared run's text hints in place (e.g. right after configuring a PaddleOCR token mid-run), and make the missing-token notice an explicit ask-the-user-once checkpoint before page reconstruction instead of a fire-and-forget tip. (#5)
- Add `editppt config --paddle-ocr-token` and first-use guidance: doctor reports the active text-hints backend and, when no token is configured, prepare and doctor point to the token application page (https://aistudio.baidu.com/account/accessToken). The token is stored masked in `~/.editppt/config.yaml` alongside the image API credentials. (#5)
- Snap measured font sizes to design levels: detected lines are clustered into size groups (same-level text gets exactly one font size instead of per-line jitter), exposed as `size_group` in the hints output. (#5)
- Trust measured font sizes in the deterministic builder: text boxes tagged `"font_size_source": "measured"` are clamped only at the geometric fit limit instead of the conservative 0.9 safety shrink, which made correctly sized text systematically smaller than the source. Hand-written boxes keep the existing conservative behavior. (#5)
- Add `editppt run reset`: return a failed or stuck page to `pending`, clearing its dispatch and result records so a new worker can be dispatched. This closes the previously dead-ended failure paths: a worker returning `passed: false`, a rejected record, or a lost worker. (#5)
- Add `editppt page build`, `editppt page contact-sheet`, and `editppt page validate`: page workers build `page.pptx`/`preview.png` from `manifest.json`, create the origin-versus-preview contact sheet, and pre-check the page with the same manifest-contract checks `run record` runs, through documented CLI commands instead of undocumented runtime scripts. (#5)
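
To illustrate the font-size snapping described above, here is a minimal Python sketch of one way per-line measurements could be clustered into size groups. It is not the editppt implementation: the function name `snap_font_sizes`, the relative tolerance `rel_tol`, and the use of the group median as the snapped level are all assumptions made for illustration; only the idea of a shared `size_group` per design level comes from the notes.

```python
from statistics import median


def snap_font_sizes(measured_pt, rel_tol=0.08):
    """Cluster per-line font sizes and snap each line to its group's median.

    Returns a list of (snapped_size, size_group) tuples in input order.
    rel_tol is a hypothetical tolerance, not a value taken from editppt.
    """
    # Visit lines from smallest to largest measured size.
    order = sorted(range(len(measured_pt)), key=lambda i: measured_pt[i])
    groups = []  # each group is a list of input indices
    for i in order:
        # Compare against the smallest member of the current group so that
        # a long chain of slightly larger sizes cannot drift into one group.
        if groups and measured_pt[i] <= measured_pt[groups[-1][0]] * (1 + rel_tol):
            groups[-1].append(i)
        else:
            groups.append([i])
    result = [None] * len(measured_pt)
    for group_id, members in enumerate(groups):
        level = round(median(measured_pt[i] for i in members), 1)
        for i in members:
            result[i] = (level, group_id)
    return result


hints = snap_font_sizes([11.8, 12.1, 24.3, 12.0, 23.9])
```

With these sample measurements, the three body-sized lines share one group and one snapped size, and the two heading-sized lines share another, which removes the per-line jitter.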
Fixes
- Fix page-worker prompt truncation: a nested code fence in `prompts/page-worker.md` cut the generated worker prompt off before the manifest field requirements, the pre-return checklist, and the return format. The prompt builder now matches the last closing fence so nested fences cannot truncate the template. (#5)
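
The truncation is easy to reproduce: taking the body up to the first closing fence stops at the end of any nested block. The sketch below shows the "match the last closing fence" idea in Python. It is an illustration under assumptions, not the actual prompt builder: `extract_template` is a hypothetical helper, and it assumes the template is the content between the first and the last fence line of the file.

```python
FENCE = "`" * 3  # built this way to avoid a literal fence inside this example


def extract_template(markdown: str) -> str:
    """Return the text between the first fence line and the LAST fence line."""
    lines = markdown.splitlines()
    fence_rows = [i for i, line in enumerate(lines) if line.strip().startswith(FENCE)]
    if len(fence_rows) < 2:
        raise ValueError("template needs an opening and a closing fence")
    first, last = fence_rows[0], fence_rows[-1]
    # Any fence lines between first and last are nested and stay in the body.
    return "\n".join(lines[first + 1:last])


doc = "\n".join([
    "intro",
    FENCE + "text",
    "manifest field requirements",
    FENCE + "json",
    '{"passed": true}',
    FENCE,
    "pre-return checklist",
    FENCE,
    "outro",
])
template = extract_template(doc)
```

Here `template` keeps the nested JSON block and the checklist line that follows it, whereas stopping at the first closing fence would have dropped the checklist.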
Improvements
- Move page-worker prompt assembly out of the installable CLI and into a skill-local prompt builder script. (#5)
- Remove the `editppt run prompt` subcommand so the CLI no longer reads skill prompt templates or reference files. (#5)
- Keep CLI environment diagnostics scoped to CLI config, dependencies, and image backend readiness without requiring skill-root discovery. (#5)
- Replace path-like prompt placeholders with explicit `{{NAME}}` tokens and fail the prompt build when any placeholder remains unfilled. (#5)
- Dispatch every page to a page worker, including single-page inputs: `editppt run next` no longer returns a `rebuild_page` stage and `editppt run record` no longer accepts direct main-agent recording from `pending`, so single-page and multi-page runs follow one identical flow and one prompt contract. (#5)
- Reject non-deliverable pages at record time: `editppt run record` fails with a `run reset` hint when `validation.json` does not contain top-level `passed: true`, so the `recorded` state always means deliverable and finalize can no longer be reached with failed pages aboard. (#5)
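
As a rough illustration of the `{{NAME}}` placeholder rule above, the following Python sketch fills explicit tokens and refuses to build when any token has no value. The helper name `fill_placeholders`, the token character set, and the sample names `PAGE_ID` and `RUN_DIR` are hypothetical; the real prompt builder may differ.

```python
import re

# Assumption: placeholder names are upper-case letters, digits, and underscores.
PLACEHOLDER = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def fill_placeholders(template: str, values: dict[str, str]) -> str:
    """Substitute every {{NAME}} token, failing if any name has no value."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - values.keys())
    if missing:
        # Fail the build instead of shipping a prompt with raw tokens in it.
        raise ValueError("unfilled placeholders: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


prompt = fill_placeholders(
    "Rebuild page {{PAGE_ID}} under {{RUN_DIR}}.",
    {"PAGE_ID": "3", "RUN_DIR": "/tmp/run"},
)
```

Checking for missing names before substituting means a value that happens to contain braces cannot trigger a false failure.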
Documentation
- Clarify that image API fallback configuration is AI-assisted, without manual CLI installation or key-configuration commands in the READMEs. (#5)
- Document the OCR token in both READMEs: text size/position correction relies on a free PaddleOCR-VL token (application URL, config command, free-quota reassurance), replacing the outdated "no third-party OCR dependency" claim; the offline detector remains the degraded fallback. (#5)
- Lock the three-step execution order in the worker prompt and decision tree: text hints belong to step 3 and are consumed only after background and foreground decisions, with step-1/2 image jobs submitted first so the order costs no wall-clock time. (#5)
- Clarify that final deck assembly rebuilds from recorded page manifests rather than concatenating page-level PPTX files. (#5)
- Document the skill-local page-worker prompt builder script in the skill workflow and CLI helper. (#5)
- Restructure skill documentation around single-ownership: every rule lives in exactly one file, other files carry pointers. The page decision tree absorbs the QA rubric (now a Final Self-Check plus a Fix versus Warning section), the manifest schema becomes the only home for JSON field contracts, the CLI helper becomes a pure command manual, and the worker prompt shrinks to hard-rule reminders plus pointers with a mandatory read-references-first instruction. (#5)
- Drop SKILL.md from the page-worker required reading list so workers load only page-level references instead of the parent orchestration contract. (#5)
- Broaden skill description triggers, add a CLI availability check to the entry contract, and document non-pipx install fallbacks (`uv tool install`, `pip install --user`). (#5)
- Document the failure-handling loop in SKILL.md Phase 3: never re-dispatch a page unchanged, diagnose repeated same-root-cause failures autonomously instead of pushing debugging questions to the user, and define the worker failure contract. A failed page returns at minimum `validation.json` (`passed: false` with the concrete reason) plus `page_result.json`, and leftover artifacts from a failed attempt are untrusted by the next worker. (#5)
- Document the `asset_provenance` field contract (the five allowed `source_type` values, required `source` and `provenance_note`) and the validator's substring-level keyword scanning of inventory and provenance text. (#5)
- Treat formula rendering blocked by missing local TeX tooling as a recorded warning with `passed: true` instead of an undefined pass state. (#5)
- Add a skill documentation architecture spec to AGENTS.md: design principles (single-ownership, reader-role split, docs-and-CLI-move-together) and per-file responsibility boundaries for future contributors. (#5)
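
The worker failure contract above can be pictured as a small check on a failed page directory. The Python sketch below is only an illustration: `check_failed_page` is a hypothetical helper, and the `reason` key is an assumed field name, since the notes say only that `validation.json` carries `passed: false` with the concrete reason.

```python
import json
from pathlib import Path


def check_failed_page(page_dir):
    """Return a list of contract violations for a page reported as failed."""
    page_dir = Path(page_dir)
    problems = []
    validation_path = page_dir / "validation.json"
    if not validation_path.is_file():
        problems.append("validation.json is missing")
    else:
        validation = json.loads(validation_path.read_text(encoding="utf-8"))
        if not isinstance(validation, dict):
            problems.append("validation.json must be a JSON object")
        else:
            if validation.get("passed") is not False:
                problems.append("validation.json must carry top-level passed: false")
            # "reason" is a hypothetical field name used for illustration.
            if not str(validation.get("reason", "")).strip():
                problems.append("no concrete reason recorded")
    if not (page_dir / "page_result.json").is_file():
        problems.append("page_result.json is missing")
    return problems
```

An empty list means the failed page returned the minimum artifacts; anything else in the directory would still be treated as untrusted by the next worker.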