Context
These are field-tested lessons from regenerating 18 demo segments for tekton-dag using docgen. Every issue here was hit in production and required manual intervention. When docgen absorbs auto-scene generation (#1) and visual validation (#2), these lessons should inform the defaults and guardrails.
Lesson 1: Manim's default font is unusable
Problem: Manim uses Pango's default font, which renders text with terrible kerning: characters appear individually placed, like ransom-note typography. At 720p the text is blurry and nearly unreadable.
Fix applied: Text.set_default(font="Lato") at the start of construct(). Any clean sans-serif (Lato, Inter, Roboto, Liberation Sans) is dramatically better.
Recommendation for docgen:
- Set a default font in the Manim config/template — never ship Pango defaults
- Add manim.font to docgen.yaml so users can override (default: "Lato" or "Liberation Sans" for maximum availability)
- Document font requirements in docgen init scaffold output
- CI smoke test should verify the configured font is installed on the system
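One way the CI smoke test could work is sketched below. It assumes fontconfig's fc-list binary is available on the build machine; the function names and the fallback list are illustrative and not part of docgen.

```python
import subprocess
from typing import List, Optional, Set


def parse_families(fc_list_output: str) -> Set[str]:
    """Collect lower-cased family names from `fc-list : family` style output.

    A single line may hold several comma-separated names for one font.
    """
    families = set()
    for line in fc_list_output.splitlines():
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name.lower())
    return families


def pick_font(preferred: List[str], families: Set[str]) -> Optional[str]:
    """Return the first preferred font that is installed, or None."""
    for font in preferred:
        if font.lower() in families:
            return font
    return None


def installed_families() -> Set[str]:
    """Query fontconfig (assumption: `fc-list` is on PATH)."""
    result = subprocess.run(
        ["fc-list", ":", "family"], capture_output=True, text=True, check=True
    )
    return parse_families(result.stdout)
```

A smoke test could then fail the build when pick_font(["Lato", "Liberation Sans"], installed_families()) returns None.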
Lesson 2: Hardcoded coordinates cause overlap — use Manim's layout system
Problem: Positioning text with absolute coordinates (move_to(UP * 1.3 + LEFT * 2.5)) is fragile. Every font, size, and content change shifts bounding boxes, causing text-on-text collisions. We went through 3 full rewrites of a single scene (RoadmapScene) trying to fix overlaps with coordinate adjustments.
Fix applied: Replaced all absolute positioning with Manim's layout primitives:
- VGroup.arrange(DOWN, buff=0.25, aligned_edge=LEFT) for vertical lists
- mob.next_to(anchor, DOWN, buff=0.3) to chain elements below each other
- Never compute Y positions manually; let Manim measure actual text bounds
Recommendation for docgen:
- Auto-generated scenes must ONLY use arrange() and next_to() for layout; ban absolute coordinates except for the top-level anchor points (strip Y, content area Y)
- Define 2–3 layout zones (nav strip, title, content area) with fixed Y anchors; everything inside a zone uses relative positioning
- The layout engine should enforce a maximum content height per section and warn if content would overflow below the visible frame
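The overflow warning could be computed before rendering from measured item heights. The sketch below is plain Python and assumes Manim's default frame height of 8 scene units; the helper names and the bottom margin are hypothetical.

```python
from typing import List

FRAME_HEIGHT = 8.0  # Manim's default frame height in scene units


def stacked_height(item_heights: List[float], buff: float) -> float:
    """Total height of items stacked vertically with `buff` between neighbours."""
    if not item_heights:
        return 0.0
    return sum(item_heights) + buff * (len(item_heights) - 1)


def overflow(
    item_heights: List[float],
    buff: float,
    zone_top_y: float,
    bottom_margin: float = 0.5,
) -> float:
    """Units by which the stack extends below the visible frame (0.0 if it fits)."""
    frame_bottom = -FRAME_HEIGHT / 2 + bottom_margin
    stack_bottom = zone_top_y - stacked_height(item_heights, buff)
    return max(0.0, frame_bottom - stack_bottom)
```

In a real scene the heights would come from the mobjects themselves (for example each item's height attribute after creation), so the check stays correct when fonts or content change.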
Lesson 3: Font sizes must be minimum 14pt for video
Problem: Font sizes of 8–11pt are unreadable in video, even at 1080p. This is not print — viewers watch on screens at normal viewing distance, often in browser video players with compression artifacts.
Fix applied: Minimum body text 16pt, titles 20pt, section headings 36pt, pillar card labels 14pt.
Recommendation for docgen:
- Enforce minimum font sizes in the layout engine: body ≥ 14pt, subtitle ≥ 16pt, heading ≥ 20pt
- docgen validate should sample frames and flag text regions where OCR confidence is low (proxy for too-small or blurry text)
- docgen.yaml should allow manim.min_font_size override
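A minimal sketch of how a layout engine might enforce these floors. The role names mirror the minimums recommended above; the function and the dictionary are hypothetical, not an existing docgen API.

```python
import warnings
from typing import Dict

# Floors taken from the recommendation above; overridable per project.
MIN_FONT_SIZE: Dict[str, int] = {"body": 14, "subtitle": 16, "heading": 20}


def enforce_min_font_size(
    role: str, requested: int, minimums: Dict[str, int] = MIN_FONT_SIZE
) -> int:
    """Return `requested`, raised to the floor for `role` when it is too small.

    Unknown roles fall back to the body floor.
    """
    floor = minimums.get(role, minimums["body"])
    if requested < floor:
        warnings.warn(f"{role} font size {requested} raised to minimum {floor}")
        return floor
    return requested
```

Raising the size with a warning, rather than failing, keeps auto-generated scenes renderable while still surfacing the problem.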
Lesson 4: Render at 1080p minimum — 720p is too blurry for text-heavy content
Problem: 720p30 renders look blurry when the video contains dense text (bullet lists, code snippets, multi-line labels). Compression artifacts at 720p make small text illegible.
Fix applied: Render at 1920×1080 (or 2560×1440 for production quality). The compose step handles resolution normalization.
Recommendation for docgen:
- Default manim.quality in docgen.yaml should be 1080p30 not 720p30
- docgen manim should render at the configured quality and warn if below 1080p
- Document that 720p is only suitable for terminal recordings (VHS), not Manim text scenes
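The below-1080p warning only needs the quality string parsed. This sketch assumes quality values always look like 1080p30 (height, then frame rate); the function names are illustrative.

```python
import re
from typing import Optional, Tuple


def parse_quality(quality: str) -> Tuple[int, int]:
    """Split a quality string such as '1080p30' into (height, fps)."""
    match = re.fullmatch(r"(\d+)p(\d+)", quality)
    if match is None:
        raise ValueError(f"unrecognised quality string: {quality!r}")
    return int(match.group(1)), int(match.group(2))


def quality_warning(quality: str, min_height: int = 1080) -> Optional[str]:
    """Return a warning message if the render height is below `min_height`."""
    height, _fps = parse_quality(quality)
    if height < min_height:
        return (
            f"manim.quality is {quality}: below {min_height}p, "
            "text-heavy scenes may be blurry"
        )
    return None
```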
Lesson 5: _wait_until is essential but easy to miscalculate
Problem: Manim animations must fill the exact audio duration. _wait_until(self, target_t, current_t) is the mechanism, but timing errors accumulate. If any section runs over its allocated window, subsequent _wait_until calls become no-ops and the scene desynchronizes.
Fix applied: Conservative timing — each pillar section ends ~1s before the next starts, with an explicit fade-out transition to absorb timing drift.
Recommendation for docgen:
- Auto-generated scenes should budget animation time per section from Whisper segments, with 1–2s buffer between sections
- Add a _wait_until wrapper that logs a warning (not crash) if target_t < current_t; this catches timing overflows during development
- After rendering, compare actual scene duration to audio duration and warn if drift exceeds 2%
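The timing logic can be kept separate from Manim so it is easy to test. The sketch below shows the warn-instead-of-crash behaviour and the 2% drift check; the logger name and function names are assumptions, and the real _wait_until helper may differ.

```python
import logging

log = logging.getLogger("docgen.timing")  # hypothetical logger name


def wait_duration(target_t: float, current_t: float) -> float:
    """Seconds to wait to reach `target_t`.

    Returns 0.0 and logs a warning when the scene is already past the target,
    which is the overflow case that would otherwise pass silently.
    """
    remaining = target_t - current_t
    if remaining < 0:
        log.warning(
            "timing overflow: at %.2fs but target was %.2fs (over by %.2fs)",
            current_t, target_t, -remaining,
        )
        return 0.0
    return remaining


def drift_exceeds(
    scene_duration: float, audio_duration: float, limit: float = 0.02
) -> bool:
    """True when scene and audio durations differ by more than `limit` (2% default)."""
    return abs(scene_duration - audio_duration) / audio_duration > limit
```

Inside a scene, the wrapper would call self.wait(d) only when d = wait_duration(target_t, current_t) is greater than zero.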
Lesson 6: Pillar/section pattern should be a reusable template
Problem: Every pillar in segment 18 follows the same visual pattern: highlight card → show title → reveal bullet list → clear. We wrote this 7 times with slight variations, which is error-prone.
Fix applied: Extracted _show_pillar(), _add_subtitle(), _reveal_list(), _clear() helper methods.
Recommendation for docgen:
- Provide a SectionScene base class or mixin with built-in support for: nav strip, title card, bullet reveals, key-value pairs, flow diagrams, and section transitions
- Auto-scene generation should detect section boundaries from narration paragraphs and apply this pattern automatically
- Users can customize by overriding specific sections rather than writing the full construct() method
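The override-one-step idea is the template-method pattern. The sketch below illustrates it without Manim: a hypothetical SectionPlan records steps as strings where a real SectionScene would play animations, and a subclass changes only the bullet reveal.

```python
from typing import List, Tuple


class SectionPlan:
    """Hypothetical stand-in for a SectionScene base class.

    Each hook appends a description instead of animating, so the
    sequencing can be inspected without rendering anything.
    """

    def __init__(self, sections: List[Tuple[str, List[str]]]):
        self.sections = sections
        self.steps: List[str] = []

    def build(self) -> List[str]:
        # Fixed pattern: highlight card -> show title -> reveal bullets -> clear
        for index, (title, bullets) in enumerate(self.sections):
            self.highlight_card(index)
            self.show_title(title)
            self.reveal_list(bullets)
            self.clear()
        return self.steps

    def highlight_card(self, index: int) -> None:
        self.steps.append(f"highlight card {index}")

    def show_title(self, title: str) -> None:
        self.steps.append(f"show title {title}")

    def reveal_list(self, bullets: List[str]) -> None:
        for bullet in bullets:
            self.steps.append(f"reveal bullet {bullet}")

    def clear(self) -> None:
        self.steps.append("clear")


class CompactPlan(SectionPlan):
    """Customises a single step; the rest of the pattern is inherited."""

    def reveal_list(self, bullets: List[str]) -> None:
        self.steps.append(f"reveal {len(bullets)} bullets at once")
```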
Lesson 7: The overview "strip" of cards must scale dynamically
Problem: We started with 5 pillar cards at width=1.8. When expanding to 7 cards, they overflowed the frame width. Manual resizing to width=1.4 was needed.
Fix applied: Used VGroup.arrange(RIGHT, buff=0.15) to auto-space cards, then positioned the group as a unit.
Recommendation for docgen:
- Nav strip should auto-calculate card width based on (frame_width - margins) / num_cards
- If labels are too long, truncate or reduce font size automatically
- Max 8–10 cards before switching to a two-row layout
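A sketch of the width calculation, extended to subtract the gaps between cards, plus a simple row split. The max_per_row default of 8 is an assumption taken from the low end of the range above, and the function names are illustrative.

```python
import math
from typing import List


def card_width(
    frame_width: float, margins: float, num_cards: int, buff: float = 0.0
) -> float:
    """Width per card so that `num_cards` cards plus gaps fill the usable width.

    With buff=0 this reduces to (frame_width - margins) / num_cards.
    """
    if num_cards < 1:
        raise ValueError("num_cards must be at least 1")
    usable = frame_width - margins - buff * (num_cards - 1)
    return usable / num_cards


def strip_rows(num_cards: int, max_per_row: int = 8) -> List[int]:
    """Number of cards on each row, split as evenly as possible."""
    if num_cards < 1:
        raise ValueError("num_cards must be at least 1")
    rows = math.ceil(num_cards / max_per_row)
    base, extra = divmod(num_cards, rows)
    return [base + (1 if i < extra else 0) for i in range(rows)]
```

For a two-row strip, card_width would be called with the larger row's card count so both rows share one card size.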
Lesson 8: Contrast rules for dark backgrounds
Problem: Colored text on dark backgrounds (e.g., color=C_WARN on C_BG) has insufficient contrast when the element is dimmed to 0.2 opacity. Inactive elements become invisible.
Fix applied: White text with colored accents (icon, background fill at 0.25 opacity). Inactive opacity floor of 0.4 (not 0.2).
Recommendation for docgen:
- Default text color should always be WHITE on dark backgrounds; use color only for accents (icons, borders, fills)
- Minimum opacity for dimmed elements: 0.35–0.40
- docgen validate should check frame-level contrast ratios (WCAG AA: 4.5:1 for text)
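The contrast check can use the WCAG relative-luminance formula directly. The sketch below also includes a blend helper to estimate a dimmed element's effective color; it blends naively in sRGB space, which is an approximation of what the renderer actually produces.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: RGB, bg: RGB) -> float:
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def blend(fg: RGB, bg: RGB, opacity: float) -> RGB:
    """Approximate color of `fg` drawn over `bg` at `opacity` (naive sRGB blend)."""
    r, g, b = (round(opacity * f + (1 - opacity) * k) for f, k in zip(fg, bg))
    return (r, g, b)
```

A validator could compute contrast_ratio(blend(text, background, opacity), background) for each dimmed element and compare the result against the 4.5:1 threshold.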
Lesson 9: docgen compose path conventions must match render output
Problem: docgen compose looks for Manim output at animations/media/videos/scenes/720p30/<Scene>.mp4, but programmatic renders (via Python scripts) output to animations/media/videos/720p30/<Scene>.mp4 (no scenes/ subdirectory). This caused "FREEZE GUARD" failures that were actually just file-not-found falling through to stale cached files.
Fix applied: Manual cp to the expected path after each render.
Recommendation for docgen:
- docgen manim should handle the render and place the output in the canonical path; users should never need to run Manim directly
- If a stale file is found at the expected path, compare its duration to the audio and warn if they differ by more than 10%
- Support multiple resolution directories: look for 1080p30/, 1440p60/, 720p30/ in priority order
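A lookup covering both layouts described in this lesson might look like the following sketch. It checks each resolution directory in priority order, trying the scenes/ layout first and then the layout produced by programmatic renders; the function name is hypothetical.

```python
from pathlib import Path
from typing import Optional, Sequence

QUALITY_PRIORITY = ("1080p30", "1440p60", "720p30")


def find_scene_video(
    videos_root: Path,
    scene: str,
    qualities: Sequence[str] = QUALITY_PRIORITY,
) -> Optional[Path]:
    """Return the first existing render of `scene`, or None.

    `videos_root` is the directory that contains either `scenes/<quality>/`
    or `<quality>/` subdirectories (e.g. animations/media/videos).
    """
    for quality in qualities:
        candidates = (
            videos_root / "scenes" / quality / f"{scene}.mp4",
            videos_root / quality / f"{scene}.mp4",
        )
        for candidate in candidates:
            if candidate.is_file():
                return candidate
    return None
```

Returning None explicitly lets the caller report file-not-found as its own error instead of falling through to a cached file.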
Lesson 10: TTS regeneration invalidates everything downstream
Problem: After updating narration and regenerating TTS, the new audio has a different duration. This silently breaks all existing Manim timing, but nothing warns you. The compose step either pads with freeze frames (if shorter) or clips (if longer).
Fix applied: Full pipeline re-run: tts → timestamps → rewrite scene → render → compose → validate.
Recommendation for docgen:
- docgen tts should emit a duration-change summary: "18-roadmap: 205.0s → 314.6s (+53%)"
- If duration changed by more than 5%, print a WARNING that scenes and timestamps need regeneration
- docgen compose should refuse to compose if the scene MP4 was last modified before the audio MP4 (stale visual)
- Add a docgen rebuild <segment> command that runs the full pipeline: tts → timestamps → manim → compose → validate
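The duration summary, the 5% threshold, and the stale-visual check are each a few lines of Python. This is a sketch with hypothetical function names; the staleness test compares file modification times only.

```python
import os


def duration_change_summary(segment: str, old_s: float, new_s: float) -> str:
    """Format a one-line summary of how a segment's audio duration changed."""
    percent = (new_s - old_s) / old_s * 100
    return f"{segment}: {old_s:.1f}s → {new_s:.1f}s ({percent:+.0f}%)"


def needs_regeneration(old_s: float, new_s: float, threshold: float = 0.05) -> bool:
    """True when the duration changed by more than `threshold` (5% by default)."""
    return abs(new_s - old_s) / old_s > threshold


def is_stale(scene_path: str, audio_path: str) -> bool:
    """True when the scene video was last modified before the audio file."""
    return os.path.getmtime(scene_path) < os.path.getmtime(audio_path)
```

For the example in this lesson, duration_change_summary("18-roadmap", 205.0, 314.6) produces the summary line quoted above.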
Summary
The core theme: docgen should make the easy path the correct path. Good fonts, relative layout, adequate font sizes, proper resolution, and duration-aware validation should all be defaults — not things a user discovers after 5 hours of debugging blurry overlapping text.