Skip to content

v0.2.0-beta.2

Pre-release
Pre-release

Choose a tag to compare

@github-actions github-actions released this 11 Jun 04:36
· 9 commits to main since this release
fed4b78

Features

  • Add editppt page hints: dependency-free text-line detection and measurement on source.png (per-tile binarization, bridge-tolerant XY-cut segmentation, ink metrics). It outputs advisory text_hints.json and a labeled text_hints.png overlay so the page author fills text_boxes positions and font sizes from measurement instead of visual estimation. (#5)
  • Distribute text hints during prepare: every page directory receives text_hints.json and the labeled text_hints.png overlay alongside its source.png, so page workers start with measurements in place. With a PaddleOCR-VL token (PADDLE_OCR_TOKEN env var or ~/.editppt/config.yaml), every input type is OCR'd in a single batch job — PDFs are submitted directly and image/PPTX page sources are bundled into a temporary PDF first — with text blocks locally re-measured and rescaled to each page's resolution. Without a token or on failure, the built-in offline detector runs per page; pages where the OCR layout model collapses (dense diagrams classified as one figure, <=2 text lines while the offline detector finds 6+) automatically fall back to the offline result. --no-text-hints skips the step. (#5)
  • Add editppt run hints to regenerate a prepared run's text hints in place (e.g. right after configuring a PaddleOCR token mid-run), and make the missing-token notice an explicit ask-the-user-once checkpoint before page reconstruction instead of a fire-and-forget tip. (#5)
  • Add editppt config --paddle-ocr-token and first-use guidance: doctor reports the active text-hints backend and, when no token is configured, prepare and doctor point to the token application page (https://aistudio.baidu.com/account/accessToken). The token is stored masked in ~/.editppt/config.yaml alongside the image API credentials. (#5)
  • Snap measured font sizes to design levels: detected lines are clustered into size groups (same-level text gets exactly one font size instead of per-line jitter), exposed as size_group in the hints output. (#5)
  • Trust measured font sizes in the deterministic builder: text boxes tagged "font_size_source": "measured" are clamped only at the geometric fit limit instead of the conservative 0.9 safety shrink, which made correctly sized text systematically smaller than the source. Hand-written boxes keep the existing conservative behavior. (#5)
  • Add editppt run reset: return a failed or stuck page to pending, clearing its dispatch and result records so a new worker can be dispatched. This closes the previously dead-ended failure paths — a worker returning passed: false, a rejected record, or a lost worker. (#5)
  • Add editppt page build, editppt page contact-sheet, and editppt page validate: page workers build page.pptx/preview.png from manifest.json, create the origin-versus-preview contact sheet, and pre-check the page with the same manifest-contract checks run record runs — through documented CLI commands instead of undocumented runtime scripts. (#5)

Fixes

  • Fix page-worker prompt truncation: a nested code fence in prompts/page-worker.md cut the generated worker prompt off before the manifest field requirements, the pre-return checklist, and the return format. The prompt builder now matches the last closing fence so nested fences cannot truncate the template. (#5)

Improvements

  • Move page-worker prompt assembly out of the installable CLI and into a skill-local prompt builder script. (#5)
  • Remove the editppt run prompt subcommand so the CLI no longer reads skill prompt templates or reference files. (#5)
  • Keep CLI environment diagnostics scoped to CLI config, dependencies, and image backend readiness without requiring skill-root discovery. (#5)
  • Replace path-like prompt placeholders with explicit {{NAME}} tokens and fail the prompt build when any placeholder remains unfilled. (#5)
  • Dispatch every page to a page worker, including single-page inputs: editppt run next no longer returns a rebuild_page stage and editppt run record no longer accepts direct main-agent recording from pending, so single-page and multi-page runs follow one identical flow and one prompt contract. (#5)
  • Reject non-deliverable pages at record time: editppt run record fails with a run reset hint when validation.json does not contain top-level passed: true, so the recorded state always means deliverable and finalize can no longer be reached with failed pages aboard. (#5)

Documentation

  • Clarify that image API fallback configuration is AI-assisted, without manual CLI installation or key-configuration commands in the READMEs. (#5)
  • Document the OCR token in both READMEs: text size/position correction relies on a free PaddleOCR-VL token (application URL, config command, free-quota reassurance), replacing the outdated "no third-party OCR dependency" claim; the offline detector remains the degraded fallback. (#5)
  • Lock the three-step execution order in the worker prompt and decision tree: text hints belong to step 3 and are consumed only after background and foreground decisions, with step-1/2 image jobs submitted first so the order costs no wall-clock time. (#5)
  • Clarify that final deck assembly rebuilds from recorded page manifests rather than concatenating page-level PPTX files. (#5)
  • Document the skill-local page-worker prompt builder script in the skill workflow and CLI helper. (#5)
  • Restructure skill documentation around single-ownership: every rule lives in exactly one file, other files carry pointers. The page decision tree absorbs the QA rubric (now a Final Self-Check plus a Fix versus Warning section), the manifest schema becomes the only home for JSON field contracts, the CLI helper becomes a pure command manual, and the worker prompt shrinks to hard-rule reminders plus pointers with a mandatory read-references-first instruction. (#5)
  • Drop SKILL.md from the page-worker required reading list so workers load only page-level references instead of the parent orchestration contract. (#5)
  • Broaden skill description triggers, add a CLI availability check to the entry contract, and document non-pipx install fallbacks (uv tool install, pip install --user). (#5)
  • Document the failure-handling loop in SKILL.md Phase 3: never re-dispatch a page unchanged, diagnose repeated same-root-cause failures autonomously instead of pushing debugging questions to the user, and define the worker failure contract — a failed page returns at minimum validation.json (passed: false with the concrete reason) plus page_result.json, and leftover artifacts from a failed attempt are untrusted by the next worker. (#5)
  • Document the asset_provenance field contract (the five allowed source_type values, required source and provenance_note) and the validator's substring-level keyword scanning of inventory and provenance text. (#5)
  • Treat formula rendering blocked by missing local TeX tooling as a recorded warning with passed: true instead of an undefined pass state. (#5)
  • Add a skill documentation architecture spec to AGENTS.md: design principles (single-ownership, reader-role split, docs-and-CLI-move-together) and per-file responsibility boundaries for future contributors. (#5)