feat(skills): teach temporal thinking and visual variety #716
miguel-heygen wants to merge 2 commits into
Addresses the gap where LLMs default to slide-like layouts (centered text over a dark background, repeated for every scene). The main skill now teaches:
- Temporal map: write what the viewer sees per second before any HTML
- Slideshow trap: explicit anti-patterns and how to break them
- Scene variety: a table of layout types to rotate between
- One focus per frame: the billboard-per-beat principle
- Beat duration guide: impact/content/atmosphere timing
- Easing vocabulary: intent-based ease selection instead of defaulting to power2.out

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agents default to CSS rectangles for illustrations, producing amateur visuals. The skill now:
- Mandates inline SVG over CSS shapes for any non-text visual
- Provides a table of SVG patterns per visual need (diagrams, node graphs, data viz, icons, decoratives, waveforms)
- Requires 3-layer depth per scene (background + content + accent)
- Includes the stroke draw-on pattern inline, since it's the most commonly needed SVG animation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
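The stroke draw-on pattern mentioned above usually works by setting an SVG path's stroke-dasharray to the path's length and animating stroke-dashoffset from that length down to 0. A minimal sketch of the per-frame values, assuming the length comes from the browser's SVGGeometryElement.getTotalLength(); drawOnStyle is a hypothetical helper, not the snippet that ships in SKILL.md:

```typescript
// Hypothetical helper for the stroke draw-on technique: stroke-dasharray
// stays equal to the path length while stroke-dashoffset shrinks to 0.
interface DrawOnStyle {
  strokeDasharray: string;
  strokeDashoffset: number;
}

function drawOnStyle(pathLength: number, progress: number): DrawOnStyle {
  if (!(pathLength > 0)) {
    throw new RangeError("pathLength must be positive");
  }
  // Clamp progress into [0, 1] so an overshooting ease cannot push the
  // offset negative (which would start hiding the beginning of the stroke).
  const p = Math.min(1, Math.max(0, progress));
  return {
    // One dash as long as the path, followed by an equally long gap.
    strokeDasharray: `${pathLength} ${pathLength}`,
    // Offset = length hides the stroke entirely; offset = 0 shows all of it.
    strokeDashoffset: pathLength * (1 - p),
  };
}
```

Any tween library, or a plain requestAnimationFrame loop, can drive progress from 0 to 1 and write the two values onto the path's style.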
vanceingalls left a comment
First review at e466b8d3. CI is mostly "skipping" (Format, Lint, Build, Test, Typecheck, CLI smoke, all Windows checks) because the PR only touches skills/hyperframes/SKILL.md, which doesn't trigger those code paths. Required checks that did run (Analyze, CodeQL, Detect changes, Format, Semantic PR title) are green.
Audited skills/hyperframes/SKILL.md end-to-end (+109/-4).
Strengths
- Temporal map step is the right shape. Step 3 in the Plan (SKILL.md:51-52) explicitly demands a one-line-per-second viewer description before any HTML. The example block (0.0s Black → title fades up…) is concrete enough that an agent can copy it, yet abstract enough that it adapts to any subject. This is the kind of forcing function that system-prompt rules can enforce: the agent CAN'T write HTML without first emitting the map.
- "Slideshow trap" anti-patterns are calibrated specifically. "Same layout repeated → restructure" / "Same animation repeated → each scene needs its own entrance character" / "Same color temp" / "No surprise": these are the four most common LLM defaults this skill is fighting. Naming them by their failure mode is more effective than abstract "make it dynamic" advice.
- Easing vocabulary table maps intent → ease instead of "default to power2.out everywhere." snap/overshoot/soft land/mechanical/spring/dramatic is the right level of abstraction: an agent can pick from six named affects without memorizing ease curves.
- The <HARD-GATE> block at :64 is preserved. The existing "verify you have a visual identity" gate stays, and the new step 3 doesn't slip past it. Good additive layering.
- Beat duration guide (impact 0.7-1.8s / content 2-4s / atmosphere 4-8s) gives the agent timing anchors. Without these, the default of "2s per beat" averages everything to slideshow rhythm.
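One way to encode the six named affects and the beat duration guide as lookup data. The affect names and the second ranges come from the review above; the GSAP 3 ease strings assigned to each affect are illustrative assumptions, not the table in SKILL.md, and beatDuration is a hypothetical helper:

```typescript
type Affect = "snap" | "overshoot" | "soft land" | "mechanical" | "spring" | "dramatic";

// Hypothetical affect -> GSAP 3 ease mapping; SKILL.md may choose differently.
const EASE_FOR: Record<Affect, string> = {
  snap: "power4.out", // fast start, quick settle
  overshoot: "back.out(1.7)", // passes the target, then returns
  "soft land": "sine.out", // gentle deceleration
  mechanical: "none", // GSAP 3's linear ease
  spring: "elastic.out(1, 0.4)", // oscillates before settling
  dramatic: "expo.inOut", // slow, sharp middle, slow
};

type BeatKind = "impact" | "content" | "atmosphere";

// Ranges in seconds, from the beat duration guide quoted in the review.
const BEAT_RANGE: Record<BeatKind, [number, number]> = {
  impact: [0.7, 1.8],
  content: [2, 4],
  atmosphere: [4, 8],
};

// Clamps a proposed beat length into its guide range.
function beatDuration(kind: BeatKind, proposedSeconds: number): number {
  const [min, max] = BEAT_RANGE[kind];
  return Math.min(max, Math.max(min, proposedSeconds));
}
```

Clamping a proposed 3s impact beat yields 1.8s, so beats of different kinds cannot all drift toward the same uniform rhythm.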
Important — this PR has been superseded by #762
#762 ("fix(cli): add source discriminator to telemetry events") includes the same two commits as this PR (3073d0ab + e466b8d3) plus one additional commit (33f809f0, the telemetry fix). #762's history is a strict superset of #716's.
If #762 merges first, this PR becomes a no-op. If this PR merges first, #762's skills-portion vanishes from the diff (becomes telemetry-only). Either flow works, but the merge queue should know — pick a target and close the other.
My recommendation: land this PR first (skills changes have separate review-and-rollback risk from telemetry; ship them independent). Then split #762 down to telemetry-only, fix its three failing required checks, and land that separately.
Important — no positive-pin test on the prompt-text changes (Rule 9)
This PR changes the prompt text the LLM agent reads to plan compositions. Per Rule 9, prompt-text changes need a positive-pin test that asserts on the specific wording — generic "the skill loads" coverage isn't enough.
Concrete asks:
- A test that asserts "Write a temporal map first" is present in the loaded skill.
- A test that asserts "slideshow trap" (lowercase, exact phrase) is present.
- A test that asserts the easing-vocabulary section has the six named affects.
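A sketch of these three pins as a pure function, so it can run under whatever test runner the repo uses (the review does not name one, so that part is an assumption). The two exact phrases are checked case-sensitively, matching the "lowercase, exact phrase" ask; the six affects are matched as case-insensitive whole words, which leans toward the concept-scoped pinning described in the carve-out caveat. missingPins is a hypothetical name:

```typescript
// Exact phrases from the review's concrete asks (case-sensitive).
const EXACT_PINS = ["Write a temporal map first", "slideshow trap"];

// The six named affects of the easing vocabulary.
const EASING_AFFECTS = ["snap", "overshoot", "soft land", "mechanical", "spring", "dramatic"];

// Returns every pin absent from the skill text; an empty array means all pins hold.
function missingPins(skillText: string): string[] {
  const missing = EXACT_PINS.filter((pin) => !skillText.includes(pin));
  for (const affect of EASING_AFFECTS) {
    // Whole-word, case-insensitive, so a capitalized table cell still counts.
    if (!new RegExp(`\\b${affect}\\b`, "i").test(skillText)) {
      missing.push(`easing affect: ${affect}`);
    }
  }
  return missing;
}
```

In the repo, the input would be the contents of skills/hyperframes/SKILL.md and the test would assert the returned array is empty. Searching the whole file instead of slicing out the easing-vocabulary section first is a simplification; a stricter pin would locate that section and search only its body.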
This is the kind of regression that ships silently otherwise — a future wording polish or merge conflict could drop the temporal-map gate and no one would catch it until the agent's output regresses to slideshows. The HF skill is the agent's primary input — pin it.
Carve-out caveat: if the team treats the hyperframes skill as still finding its voice and is doing wording polish per merge, scope the pins to the concept (temporal map, slideshow trap, easing vocabulary) rather than exact phrases. That trades brittleness for survival across polish passes.
Nit
- The "## Think in Frames, Not Pages" section starts at :64, but the cross-reference from step 3 is "See "Think in Frames" below" (different wording). Either rename the section or update the reference so it is grep-findable.
Verdict
Verdict: APPROVE
Reasoning: The temporal-map step + slideshow-trap anti-patterns + easing-vocabulary are exactly the right shape for fighting the LLM's default-to-slides bias. The Rule 9 prompt-text pinning is the only material gap. PR is a strict subset of #762 — pick one to merge and close the other.
— Vai
Summary
Addresses the core gap where LLMs default to slide-like layouts (centered text over dark background, same layout every scene). The main hyperframes skill now teaches agents to think through frames in time rather than composing pages.
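To make "thinking in frames" checkable, a planner could hold the temporal map and per-scene choices as data before any HTML exists. This is a hypothetical sketch: the interfaces, field names, and both functions are illustrative assumptions, not part of SKILL.md or the hyperframes CLI. uncoveredSeconds reports seconds that no map line describes, and slideshowTrapWarnings flags adjacent scenes that reuse a layout or an entrance:

```typescript
// Hypothetical shapes; the skill describes the temporal map in prose.
interface MapLine {
  atSeconds: number; // when this line of the map starts
  viewerSees: string; // one-line description of what is on screen
}

interface ScenePlan {
  layout: string; // e.g. "centered-title", "split", "full-bleed"
  entrance: string; // e.g. "fade-up", "wipe", "scale-in"
}

// Whole seconds in [0, durationSeconds) in which no map line starts.
function uncoveredSeconds(map: MapLine[], durationSeconds: number): number[] {
  const covered = new Set(map.map((line) => Math.floor(line.atSeconds)));
  const missing: number[] = [];
  for (let s = 0; s < durationSeconds; s++) {
    if (!covered.has(s)) missing.push(s);
  }
  return missing;
}

// Flags "same layout repeated" and "same animation repeated" between neighbors.
function slideshowTrapWarnings(scenes: ScenePlan[]): string[] {
  const warnings: string[] = [];
  for (let i = 1; i < scenes.length; i++) {
    const prev = scenes[i - 1];
    const cur = scenes[i];
    if (prev.layout === cur.layout) {
      warnings.push(`scene ${i}: same layout as scene ${i - 1} (${cur.layout})`);
    }
    if (prev.entrance === cur.entrance) {
      warnings.push(`scene ${i}: same entrance as scene ${i - 1} (${cur.entrance})`);
    }
  }
  return warnings;
}
```

For example, a 4-second map with lines at 0.0s, 1.5s, and 3.0s leaves second 2 undescribed, which is the gap a one-line-per-second map is meant to close.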
What's new in SKILL.md
Temporal map requirement (Step 3 in Plan)
"Think in Frames, Not Pages" section
Easing vocabulary table: intent-based ease selection instead of power2.out on everything
What's NOT changed
Test plan
🤖 Generated with Claude Code