
refactor(skills): split asset preprocessing out of hyperframes-cli#619

Merged
jrusso1020 merged 3 commits into main from split-cli-media-skills
May 5, 2026


@jrusso1020
Collaborator

What

Split the hyperframes-cli skill into two skills:

  • hyperframes-cli stays focused on the dev loop: init, lint, inspect, preview, render, doctor, browser, info, upgrade, compositions, docs, benchmark. Description rewritten so it no longer triggers on TTS / transcription / background-removal keywords.
  • hyperframes-media is new. It owns tts, transcribe, and remove-background — voice selection table, the critical .en-translates-non-English whisper rule, output codec choice (VP9 alpha WebM vs ProRes), the TTS → transcribe → captions chain, and composition integration snippets.

Doc references updated in README.md, CLAUDE.md, docs/quickstart.mdx, and docs/guides/prompting.mdx.

Why

Two problems with the current hyperframes-cli:

  1. Description bloat causes spurious agent loads. The skill description listed every subcommand as a trigger keyword (tts, transcribe, remove-background, ...). Agents auto-loading skills by description were pulling hyperframes-cli for any mention of audio, transcription, or backgrounds — even when the task was just rendering a composition. Internal agents in particular were loading the CLI skill unnecessarily.
  2. Body bloat penalizes the common path. The dev loop (lint/render) is invoked far more often than asset preprocessing, but voice tables, whisper model rules, and codec guidance loaded on every invocation. Now that the CLI ships three preprocessing commands (and they each download multi-hundred-MB models on first run), this content was only going to grow.

How

One sibling, not three. tts, transcribe, and remove-background share a workflow pattern (first-run model download → produce file → drop into composition) and chain naturally (TTS → transcribe → captions), so bundling them keeps the mental model coherent. The CLI skill references hyperframes-media from a one-paragraph "Asset Preprocessing" stub instead of inlining 50+ lines of substantive guidance.

Net change: hyperframes-cli 173 → 144 lines, new hyperframes-media at 147 lines. Tighter description on hyperframes-cli is the bigger win for spurious-load reduction.
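The routing effect of a tighter description can be illustrated with a toy model. This is only a sketch under loud assumptions: real agents pick skills with a language model rather than literal keyword overlap, and the descriptions below are hypothetical keyword lists, not the actual skill descriptions. `triggeredSkills` is an illustrative name, not an API from this repo.

```typescript
// Toy model only: real agents do not match skills by literal token overlap.
// The descriptions passed to this function are hypothetical keyword lists.
interface Skill {
  name: string;
  description: string;
}

// Lowercase a string and split it into word-like tokens (hyphens kept).
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9-]+/g) || [];
}

// Return the names of skills whose description shares a token with the task.
function triggeredSkills(task: string, skills: Skill[]): string[] {
  const taskTokens = tokenize(task);
  return skills
    .filter((skill) =>
      tokenize(skill.description).some((token) => taskTokens.indexOf(token) !== -1),
    )
    .map((skill) => skill.name);
}

// Before the split: one skill whose description lists every subcommand.
const beforeSplit: Skill[] = [
  { name: "hyperframes-cli", description: "init lint preview render tts transcribe" },
];

// After the split: preprocessing keywords live only in the sibling skill.
const afterSplit: Skill[] = [
  { name: "hyperframes-cli", description: "init lint inspect preview render" },
  { name: "hyperframes-media", description: "tts transcribe remove-background" },
];
```

Under this toy rule, a task such as "transcribe the interview" pulls in the whole CLI skill before the split and only hyperframes-media after it, which is the behavior the description rewrite is aiming for.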

Test plan

  • bunx oxfmt --check clean on all changed files
  • bunx oxlint clean on all changed files
  • Lefthook pre-commit + commit-msg hooks pass
  • Verify npx skills add heygen-com/hyperframes picks up hyperframes-media after merge
  • Smoke-test: invoke /hyperframes-media in Claude Code and confirm voice table + .en warning load
  • Smoke-test: invoke /hyperframes-cli and confirm body no longer mentions Kokoro/Whisper/u2net specifics

Move tts/transcribe/remove-background guidance into a new
hyperframes-media sibling skill so the CLI skill stays focused on
the dev loop (init/lint/inspect/preview/render/doctor).

Two motivations:

1. Description bloat. The CLI skill listed every subcommand as a
   trigger keyword, which made agents auto-load it for any mention
   of audio, transcription, or backgrounds — even when the task
   was just rendering a composition.
2. Body bloat. Voice tables, the .en-translates-non-English
   whisper rule, and codec selection guidance all loaded on
   every CLI invocation. With three preprocessing commands now
   in the CLI (tts, transcribe, remove-background), this is only
   going to grow.

The split keeps a single sibling (hyperframes-media), not three:
the commands share a workflow (preprocess asset → drop into
composition) and the same first-run-downloads-a-model pattern,
so they belong together. CLI skill now references hyperframes-media
from a one-paragraph "Asset Preprocessing" stub.

Doc references updated in README.md, CLAUDE.md,
docs/quickstart.mdx, and docs/guides/prompting.mdx.
@mintlify

mintlify Bot commented May 4, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
hyperframes 🟢 Ready View Preview May 4, 2026, 10:04 PM


…edia

Code review found the new hyperframes-media skill was parallel
content with skills/hyperframes/references/tts.md and the "Whisper
Model Guide" section of transcript-guide.md — same voice table, same
.en-translates-non-English warning, same TTS→transcribe chain in
both places. Plus some scope creep in hyperframes-media (audio/video
HTML snippets that duplicate the canonical track docs in
hyperframes/SKILL.md:265+).

Consolidation:

- hyperframes-media is now the single source of truth for CLI
  invocation, voice selection, multilingual phonemization, whisper
  model selection, and the .en gotcha. Picked up the multilingual
  prefix decoding from the deleted tts.md.
- skills/hyperframes/references/tts.md deleted; the bullet in
  hyperframes/SKILL.md is removed (no replacement — agents land on
  hyperframes-media via its own description).
- skills/hyperframes/references/transcript-guide.md keeps only the
  caption-side concerns: input-format table, mandatory quality
  check, cleaning JS, external-API import path, and the
  "if no transcript exists" flow. The intro bash recipe and Whisper
  Model Guide section both moved to hyperframes-media. Top of the
  file now points to hyperframes-media for CLI/model details.

Other tightening in hyperframes-media:

- Dropped WHAT-narration filler and the inline <audio>/<video> HTML
  snippets — they duplicate the canonical track-attribute docs in
  hyperframes/SKILL.md.
- Added the `id` field (`w0`, `w1`, ...) to the transcript output
  shape — the actual Word interface in
  packages/cli/src/whisper/normalize.ts includes it (optional for
  backwards compat), used by caption override logic.
- Compressed the TTS → Transcribe → Captions chain section.

Net: hyperframes-media 147 → 136 lines, transcript-guide.md 152 →
106 lines, tts.md gone (-75 lines).
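For readers unfamiliar with the transcript shape, the optional `id` field mentioned above might look roughly like the sketch below. Only the existence of an optional `id` (`w0`, `w1`, ...) comes from the commit message; the `text`, `start`, and `end` fields and the `withIds` helper are assumptions made for illustration, not the actual contents of packages/cli/src/whisper/normalize.ts.

```typescript
// Assumed shape: only the optional `id` field is confirmed by the commit
// message. `text`, `start`, and `end` are illustrative guesses.
interface Word {
  id?: string; // e.g. "w0", "w1"; optional for backwards compatibility
  text: string;
  start: number; // assumed to be seconds
  end: number; // assumed to be seconds
}

// Hypothetical helper: give every word an id, keeping any id already present
// and falling back to a positional "w<index>" otherwise.
function withIds(words: Word[]): Word[] {
  return words.map((word, index) => ({
    ...word,
    id: word.id !== undefined ? word.id : `w${index}`,
  }));
}
```

Keeping `id` optional means older transcripts without ids still type-check, while code that needs a stable handle per word (such as caption overrides) can normalize first.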
Collaborator

@vanceingalls vanceingalls left a comment


Verdict — Request changes (small, mechanical)

The split itself is sound — clear responsibility boundary, sensible bundling of tts / transcribe / remove-background (shared first-run-model-download pattern, natural TTS→transcribe chain), and hyperframes-media's 147 lines reads as a tight, useful skill. But two stale references on the PR branch directly undercut the stated goal of the split.

Blockers

  • skills/hyperframes/SKILL.md line 3 description still ends with: "For CLI commands (init, lint, preview, render, transcribe, tts) see the hyperframes-cli skill." The whole point of this PR is to stop transcribe/tts triggering hyperframes-cli autoloads. The parent skill is now telling agents the exact opposite. Update to point to hyperframes-media for those, or split the sentence.

  • packages/cli/src/templates/_shared/CLAUDE.md line 10 — the skills table that ships into every hyperframes init project still reads hyperframes-cli | CLI commands: init, lint, preview, render, transcribe, tts. Every new project bootstrapped after this lands will get the wrong mapping baked into its CLAUDE.md. Add the hyperframes-media row and trim the cli row to match the new README/quickstart wording.

Important

  • Test-plan checkboxes for the two smoke tests (invoke /hyperframes-media, invoke /hyperframes-cli and confirm body) are unchecked. For a skill-description refactor these are the only things that actually verify the behavior change — please run them before merge, since lint/format don't catch description-trigger regressions.

  • Consider whether hyperframes-media warrants a row in the docs feature-comparison / npx hyperframes skills output, if either enumerates skills statically (the skills command uses --all, so it's fine, but worth a grep for hardcoded lists).

Nits

  • New hyperframes-media/SKILL.md cross-links to ../hyperframes/references/transcript-guide.md — relative path traversal across sibling skills works but is a slightly leaky boundary. Probably fine since the guide is genuinely caption-authoring content; just flagging that if skills add ever flattens directories this breaks.
  • hyperframes-cli description: "For asset preprocessing commands (tts, transcribe, remove-background), invoke the hyperframes-media skill instead." — naming the commands here is clear for humans but does put those keywords back into the cli description. Probably unavoidable for the redirect to be useful, but worth eyes on whether spurious loads actually drop after merge.
  • README skills table description for hyperframes-media is great; the (Kokoro)/(Whisper)/(u2net) parentheticals duplicate trigger keywords but help humans scan — leave as-is.

What's good

  • The why/how in the PR description is excellent — clear problem statement (description bloat → spurious loads), clear bundling rationale.
  • hyperframes-media body is well-structured: voice-selection table, .en-translates-non-English rule called out as Non-Negotiable, codec-selection table.
  • transcript-guide.md correctly de-duped — caption-side vs CLI-side concerns now have distinct homes with a one-line redirect.
  • references/tts.md deletion is clean (no orphan links remaining in the parent skill).

Review by Vai

…nters

Review on PR #619 caught two places that still pointed transcribe/tts
at hyperframes-cli — directly undercutting the description-trigger
goal of the split:

- skills/hyperframes/SKILL.md description ended with "For CLI commands
  (init, lint, preview, render, transcribe, tts) see the
  hyperframes-cli skill." Now splits the redirect: dev-loop commands
  (init, lint, inspect, preview, render) → hyperframes-cli; asset
  preprocessing (tts, transcribe, remove-background) →
  hyperframes-media.
- packages/cli/src/templates/_shared/CLAUDE.md is the skills table
  baked into every project bootstrapped by `hyperframes init`. Its
  hyperframes-cli row still listed transcribe/tts. Trimmed to the
  dev-loop commands and added a hyperframes-media row beside it, so
  new projects pick up the correct mapping.

Also caught by grepping for stale skill lists:

- .codex-plugin/plugin.json longDescription bundled transcribe/tts
  into "use the CLI for ...". Split into "use the CLI for the dev
  loop (init/preview/render), preprocess assets
  (tts/transcribe/remove-background)" so the Codex plugin store
  surface matches reality.

Confirmed `npx hyperframes skills` shells out to `npx skills add
heygen-com/hyperframes --all` (packages/cli/src/commands/skills.ts),
so the skill list is read dynamically from the repo and picks up
hyperframes-media without code changes.
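The kind of check described above could be automated along these lines. This is a minimal sketch, not tooling from the repo: `findStaleReferences` and its types are hypothetical, and the heuristic only flags lines that name hyperframes-cli next to a preprocessing command without also naming hyperframes-media. It would not catch wording such as "use the CLI for ..." that omits the skill name, so a manual grep is still needed for those cases.

```typescript
// Hypothetical helper, not part of the hyperframes repo.
interface SourceFile {
  path: string;
  content: string;
}

interface StaleHit {
  path: string;
  line: number; // 1-based line number within the file
  text: string;
}

// Commands that moved to hyperframes-media in this PR.
const MEDIA_COMMAND = /\b(tts|transcribe|remove-background)\b/;

// Flag lines that mention hyperframes-cli together with a media command
// but never mention hyperframes-media on the same line.
function findStaleReferences(files: SourceFile[]): StaleHit[] {
  const hits: StaleHit[] = [];
  for (const file of files) {
    file.content.split("\n").forEach((text, index) => {
      const mentionsCli = text.indexOf("hyperframes-cli") !== -1;
      const mentionsMedia = text.indexOf("hyperframes-media") !== -1;
      if (mentionsCli && !mentionsMedia && MEDIA_COMMAND.test(text)) {
        hits.push({ path: file.path, line: index + 1, text: text.trim() });
      }
    });
  }
  return hits;
}
```

Passing file contents in as strings keeps the function independent of any file-system API, so it can be fed from whatever reader a script already uses.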
@jrusso1020
Collaborator Author

Thanks @vanceingalls — addressed both blockers in 94dc6c8:

Blockers

  • skills/hyperframes/SKILL.md:3 description split: dev-loop commands (init, lint, inspect, preview, render) → hyperframes-cli; asset preprocessing (tts, transcribe, remove-background) → hyperframes-media. The parent skill no longer tells agents to load hyperframes-cli for transcribe/tts.
  • packages/cli/src/templates/_shared/CLAUDE.md:10 — the skills table baked into every bootstrapped project. Trimmed the hyperframes-cli row to the dev-loop commands and added a hyperframes-media row alongside, so new projects from hyperframes init pick up the correct mapping.

Important

  • I also grepped for any other hardcoded skill lists and caught one more: .codex-plugin/plugin.json longDescription bundled transcribe/tts into "use the CLI for ...". Split it into dev-loop vs asset preprocessing in the same commit.
  • Verified npx hyperframes skills shells out to npx skills add heygen-com/hyperframes --all (packages/cli/src/commands/skills.ts:17), so the list is read dynamically from the repo — no code change needed there.

Test-plan checkboxes — fair point, leaving them for a manual pass after merge in a clean session, since the verification has to come from a fresh agent decision (not the same agent that just authored the split). Will follow up in the PR if the spurious-load behavior doesn't actually drop.

Nits

  • Relative path ../hyperframes/references/transcript-guide.md — agreed, a slightly leaky boundary. Leaving as-is for now since transcript-guide.md is genuinely caption-authoring content. If skills add ever flattens directories we'll need to inline the redirect or move the file.
  • CLI description naming tts/transcribe/remove-background in the redirect — necessary for the redirect to be useful. Will keep eyes on whether spurious loads actually drop after merge; if they don't, we can swap the explicit names for "preprocessing commands".
  • (Kokoro)/(Whisper)/(u2net) parentheticals in the README — leaving as-is per your suggestion.

@jrusso1020 jrusso1020 merged commit 06f5422 into main May 5, 2026
29 checks passed
@jrusso1020 jrusso1020 deleted the split-cli-media-skills branch May 5, 2026 03:27
@jrusso1020 jrusso1020 mentioned this pull request May 5, 2026
Collaborator

@vanceingalls vanceingalls left a comment


Re-review — All blockers resolved

Verified the three commits since my prior review (range 4413d25b..94dc6c8). Commit 94dc6c8 is a surgical, scoped fix touching exactly the three files where stale references lived — no scope creep.

Prior blockers

  • Resolved: skills/hyperframes/SKILL.md description (line 3). Confirmed at HEAD: now reads "For dev-loop CLI commands (init, lint, inspect, preview, render) see the hyperframes-cli skill; for asset preprocessing commands (tts, transcribe, remove-background) see the hyperframes-media skill." The split-redirect approach (instead of a single redirect) is the right call — agents searching for tts/transcribe keywords now hit hyperframes-media, which is exactly the autoload-routing fix this PR was about.

  • Resolved: packages/cli/src/templates/_shared/CLAUDE.md (lines 10–11). Confirmed at HEAD: hyperframes-cli row trimmed to init, lint, inspect, preview, render, doctor, and a new hyperframes-media row was added directly beneath it. Every project bootstrapped by hyperframes init after this lands will now ship the correct skill table — which was the failure mode I was worried about.

Bonus catch (good)

  • .codex-plugin/plugin.json longDescription previously said "use the CLI for init/preview/render/transcribe/tts". Now reads "use the CLI for the dev loop (init/preview/render), preprocess assets (tts/transcribe/remove-background) for compositions". Same class of stale reference as the two blockers — nice to have caught it in the same pass so the Codex plugin store surface isn't misleading.

New observations

  • nit — Worth a follow-up grep some weeks out for npx hyperframes skills output and any registry/site copy that enumerates skills; the dynamic skills add --all path covers most of it (good), but description/marketing copy elsewhere is the kind of thing that drifts.
  • nit — Test-plan smoke checkboxes (invoke /hyperframes-media, invoke /hyperframes-cli and confirm body) remain unchecked in the PR description. Since the PR is already merged, this is just a flag — the only real verification of a description-trigger refactor is running the agent against the new descriptions, so worth doing post-merge to confirm spurious-load reduction is real.

Verdict

Approve. Both blockers verified at the merge commit. Clean execution on the addressed-feedback pass.

Review by Vai
