feat(skills): teach temporal thinking and visual variety by miguel-heygen · Pull Request #716 · heygen-com/hyperframes

miguel-heygen · 2026-05-11T16:37:46Z

Summary

Addresses the core gap where LLMs default to slide-like layouts (centered text over dark background, same layout every scene). The main hyperframes skill now teaches agents to think through frames in time rather than composing pages.

What's new in SKILL.md

Temporal map requirement (Step 3 in Plan)

Write a one-line-per-second description of what the viewer sees before any HTML
Forces the agent to think about visual interest at each moment, not just content structure
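To make the idea concrete, here is a minimal Python sketch of what a temporal map might look like and how its per-second coverage could be checked. The map text, the `<time>s <description>` line format, and the helpers `parse_temporal_map` and `missing_seconds` are all assumptions for illustration; none of them come from SKILL.md.

```python
import re

# Hypothetical temporal map: one line per second, "<time>s <what the viewer sees>".
TEMPORAL_MAP = """\
0.0s Black; title fades up from the lower third
1.0s Title holds; thin accent line draws across
2.0s Hard cut to full-bleed product image
3.0s Image pushes in slowly; caption slides in from the right
"""

LINE_RE = re.compile(r"^(\d+(?:\.\d+)?)s\s+(.+)$")


def parse_temporal_map(text):
    """Return a list of (time_in_seconds, description) tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"not a temporal map line: {line!r}")
        entries.append((float(match.group(1)), match.group(2)))
    return entries


def missing_seconds(entries, duration_s):
    """List whole seconds in [0, duration_s) that have no map entry."""
    covered = {int(t) for t, _ in entries}
    return [s for s in range(duration_s) if s not in covered]


entries = parse_temporal_map(TEMPORAL_MAP)
print(missing_seconds(entries, 4))  # every second of a 4s piece is described
print(missing_seconds(entries, 6))  # a 6s piece would still need seconds 4 and 5
```

A check like this only confirms that each second has a line; whether each line describes something visually interesting remains a judgment call for the agent or a reviewer.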

"Think in Frames, Not Pages" section

Slideshow trap: explicit anti-patterns (same layout, same animation, same color temp, no surprise)
Scene variety checklist: 7 layout types to rotate between (statement, full-bleed image, split frame, kinetic type, data beat, terminal/code, atmospheric)
One focus per frame: billboard-per-beat principle
Beat duration guide: impact (0.7-1.8s), content (2-4s), atmosphere (4-8s)
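The beat duration guide can be read as a simple range check. The sketch below uses the three ranges listed above; the `check_beat` helper and the sample `plan` are hypothetical and only illustrate how a planned scene list might be compared against the guide.

```python
# Ranges in seconds, taken from the beat duration guide above.
BEAT_RANGES = {
    "impact": (0.7, 1.8),
    "content": (2.0, 4.0),
    "atmosphere": (4.0, 8.0),
}


def check_beat(kind, duration_s):
    """Return True if duration_s falls inside the guide's range for this beat kind."""
    low, high = BEAT_RANGES[kind]
    return low <= duration_s <= high


# Hypothetical scene plan: (beat kind, duration in seconds).
plan = [("impact", 1.2), ("content", 3.0), ("impact", 2.0), ("atmosphere", 6.0)]

# Beats whose planned duration falls outside the guide's range for their kind.
out_of_range = [(kind, d) for kind, d in plan if not check_beat(kind, d)]
print(out_of_range)
```

Here the third beat is flagged: a 2.0s "impact" beat is longer than the 1.8s upper bound, so it would either be shortened or reclassified as a content beat.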

Easing vocabulary table

Intent-based ease selection (snap, overshoot, soft land, mechanical, spring, dramatic) instead of defaulting to power2.out on everything

What's NOT changed

All existing rules, data attributes, composition structure, transition rules — untouched
House style and motion principles refs stay as-is (the new sections complement, not replace)

Test plan

A/B comparison: same prompt rendered with old vs new skill (in progress, will post results)
Verify temporal map step doesn't slow down simple edits (the "skip straight to rules" escape hatch is preserved)

🤖 Generated with Claude Code

Addresses the gap where LLMs default to slide-like layouts (centered text over dark background repeated for every scene). The main skill now teaches:

- Temporal map: write what the viewer sees per second before any HTML
- Slideshow trap: explicit anti-patterns and how to break them
- Scene variety: table of layout types to rotate between
- One focus per frame: billboard-per-beat principle
- Beat duration guide: impact/content/atmosphere timing
- Easing vocabulary: intent-based ease selection instead of power2.out

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Agents default to CSS rectangles for illustrations, producing amateur visuals. The skill now:

- Mandates inline SVG over CSS shapes for any non-text visual
- Provides a table of SVG patterns per visual need (diagrams, node graphs, data viz, icons, decoratives, waveforms)
- Requires 3-layer depth per scene (background + content + accent)
- Includes the stroke draw-on pattern inline since it's the most commonly needed SVG animation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vanceingalls

First review at e466b8d3. CI is mostly "skipping" (Format, Lint, Build, Test, Typecheck, CLI smoke, all Windows checks) because the PR only touches skills/hyperframes/SKILL.md, which doesn't trigger those code paths. Required checks that did run (Analyze, CodeQL, Detect changes, Format, Semantic PR title) are green.

Audited

skills/hyperframes/SKILL.md end-to-end (+109/-4)

Strengths

Temporal map step is the right shape. Step 3 in the Plan (SKILL.md:51-52) explicitly demands a one-line-per-second viewer description before any HTML. The example block (0.0s Black → title fades up…) is concrete enough that an agent can copy it; abstract enough that it adapts to any subject. This is the kind of forcing function that the system-prompt rules can enforce — the agent CAN'T write HTML without first emitting the map.
"Slideshow trap" anti-patterns are specific and well calibrated. "Same layout repeated → restructure" / "Same animation repeated → each scene needs its own entrance character" / "Same color temp" / "No surprise": these are the four most common LLM defaults this skill is fighting. Naming them by their failure mode is more effective than abstract "make it dynamic" advice.
Easing vocabulary table maps intent → ease instead of "default to power2.out everywhere." snap / overshoot / soft land / mechanical / spring / dramatic is the right level of abstraction — an agent can pick from six named affects without memorizing ease curves.
<HARD-GATE> block at :64 is preserved — the existing "verify you have a visual identity" gate stays, and the new step 3 doesn't slip past it. Good additive layering.
Beat duration guide (impact 0.7-1.8s / content 2-4s / atmosphere 4-8s) gives the agent timing anchors. Without these the default of "2s per beat" averages everything to slideshow rhythm.

Important — this PR has been superseded by #762

#762 ("fix(cli): add source discriminator to telemetry events") includes the same two commits as this PR (3073d0ab + e466b8d3) plus one additional commit (33f809f0, the telemetry fix). #762's history is a strict superset of #716's.

If #762 merges first, this PR becomes a no-op. If this PR merges first, #762's skills-portion vanishes from the diff (becomes telemetry-only). Either flow works, but the merge queue should know — pick a target and close the other.

My recommendation: land this PR first (skills changes have separate review-and-rollback risk from telemetry; ship them independently). Then split #762 down to telemetry-only, fix its three failing required checks, and land that separately.

Important — no positive-pin test on the prompt-text changes (Rule 9)

This PR changes the prompt text the LLM agent reads to plan compositions. Per Rule 9, prompt-text changes need a positive-pin test that asserts on the specific wording — generic "the skill loads" coverage isn't enough.

Concrete asks:

A test that asserts "Write a temporal map first" is present in the loaded skill.
A test that asserts "slideshow trap" (lowercase, exact phrase) is present.
A test that asserts the easing-vocabulary section has the six named affects.

This is the kind of regression that ships silently otherwise — a future wording polish or merge conflict could drop the temporal-map gate and no one would catch it until the agent's output regresses to slideshows. The HF skill is the agent's primary input — pin it.

Carve-out caveat: if the team treats the hyperframes skill as still finding its voice and is doing wording polish per merge, scope the pins to the concept (temporal map, slideshow trap, easing vocabulary) rather than exact phrases. That trades brittleness for survival across polish passes.
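A minimal Python sketch of both pinning styles follows. It is an illustration only: `SAMPLE_SKILL` is invented stand-in text (a real test would read skills/hyperframes/SKILL.md instead), the helper names are hypothetical, and the repo's actual test framework is not assumed. The pinned phrases and the six affect names are the ones quoted in this review.

```python
import re

# Stand-in for the loaded skill text. The wording here is invented for the
# example; a real test would load skills/hyperframes/SKILL.md instead.
SAMPLE_SKILL = """\
3. Write a temporal map first. See "Think in Frames" below.

## Think in Frames, Not Pages
Avoid the slideshow trap: same layout, same animation, same color temp, no surprise.

## Easing vocabulary
Snap | Overshoot | Soft land | Mechanical | Spring | Dramatic
"""

EXACT_PINS = ["Write a temporal map first", "slideshow trap"]
CONCEPT_PINS = [
    "temporal map", "slideshow trap", "easing vocabulary",
    "snap", "overshoot", "soft land", "mechanical", "spring", "dramatic",
]


def missing_exact_pins(skill_text, phrases):
    """Exact, case-sensitive pins: these fail on any rewording."""
    return [p for p in phrases if p not in skill_text]


def missing_concept_pins(skill_text, concepts):
    """Looser pins: case-insensitive, tolerating hyphens or line breaks between words."""
    missing = []
    for concept in concepts:
        pattern = r"[\s\-]+".join(re.escape(word) for word in concept.split())
        if re.search(pattern, skill_text, flags=re.IGNORECASE) is None:
            missing.append(concept)
    return missing


print(missing_exact_pins(SAMPLE_SKILL, EXACT_PINS))
print(missing_concept_pins(SAMPLE_SKILL, CONCEPT_PINS))
```

The exact pins catch a dropped or reworded gate immediately but will need updating on every polish pass; the concept pins survive capitalization and hyphenation changes at the cost of missing subtler rewrites.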

Nit

The ## Think in Frames, Not Pages section starts at :64 but the cross-reference from step 3 is See "Think in Frames" below (different wording). Either rename the section or update the reference for grep-findability.

Verdict

Verdict: APPROVE
Reasoning: The temporal-map step + slideshow-trap anti-patterns + easing-vocabulary are exactly the right shape for fighting the LLM's default-to-slides bias. The Rule 9 prompt-text pinning is the only material gap. PR is a strict subset of #762 — pick one to merge and close the other.

— Vai

miguel-heygen requested review from aszala-hg, jrusso1020 and vanceingalls May 11, 2026 16:39

vanceingalls mentioned this pull request May 13, 2026

fix(cli): add source discriminator to telemetry events #762

Closed

vanceingalls approved these changes May 13, 2026

miguel-heygen wants to merge 2 commits into main from feat/skill-temporal-thinking
