In [1]:
import os
import pathlib
import pickle
import subprocess

import httpx
from dotenv import load_dotenv
from google import genai
from google.genai import types as gemini_types
from langchain_google_genai import ChatGoogleGenerativeAI

load_dotenv(dotenv_path="../../../.env", override=True)

aistudio_gemini_api_key = os.environ['GOOGLE_API_KEY']
print(aistudio_gemini_api_key[:3] + '...' + aistudio_gemini_api_key[-1:])
ai_studio_gemini_client = genai.Client(api_key=aistudio_gemini_api_key)


vertex_gemini_key = os.environ["VERTEX_AI_KEY"]
headers = {"Authorization": f"Basic {vertex_gemini_key}"}
base_url = os.environ["VERTEX_AI_BASE_URL"]

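http_options = gemini_types.HttpOptions(
    headers=headers,
    api_version="",
    base_url=base_url,
    # NOTE: verify=False disables TLS certificate verification; only use it against a trusted internal proxy.
    httpx_client=httpx.Client(verify=False),
)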


gemini_client = genai.Client(
    api_key="dummy_value_to_by_pass_validation",
    http_options=http_options
)

MODEL_NAME = "gemini-3-pro-preview"

langchain_client = ChatGoogleGenerativeAI(
    model=MODEL_NAME,
    temperature=1.0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)
langchain_client.client = gemini_client



AIz...8


In [2]:

from deepagents import create_deep_agent
from deepagents.backends import FilesystemBackend

manim_coding_agent_prompt = """You are an Expert Manim Animator with access to documentation and a workspace.

## Your Goal
Complete the `workspace/scene.py` file with Manim code that implements all the requested animations in a single Scene class.

## Workspace Structure
You have access to two folders:
- `manim_docs/` - **READ-ONLY** Manim documentation (tutorials, guides, API reference)
- `workspace/` - Contains `scene.py` file to complete (already has boilerplate code)

Your task is to **add a single Scene class with all animations below this boilerplate**. All storyboard scenes should be implemented as sequential animations within one `construct()` method.

## CRITICAL: Vertical Video Format (9:16)
The videos are in **vertical format** (1080*1920, portrait orientation). The scene measures **8 units in width and 14 units in height** (9:16 ratio). Keep this in mind:
- **Frame is tall and narrow** — you have much more vertical space than horizontal
- **Stack elements vertically** rather than spreading horizontally
- **Use more vertical space** — UP/DOWN positioning is preferred over LEFT/RIGHT
- **Text and objects should be sized appropriately** for the narrower width, use smaller fonts and objects if needed
- **Avoid wide layouts** — they will be cut off or cramped

## CRITICAL: Research Before Coding
You have documentation tools available and you MUST use them before writing any Manim code:
- **ALWAYS** look up the exact API for classes, methods, and parameters you plan to use
- **NEVER** guess or rely on memory—verify everything in the docs first
- **RESEARCH FIRST**, then write code based on what you find

## Your Tools

| Tool | Description |
|------|-------------|
| `ls` | List files in a directory with metadata (size, modified time) |
| `read_file` | Read file contents with line numbers, supports offset/limit for large files |
| `write_file` | Create new files |
| `edit_file` | Perform exact string replacements in files (with global replace mode) |
| `glob` | Find files matching patterns (e.g., `**/*.py`) |
| `grep` | Search file contents with multiple output modes (files only, content with context, or counts) |

## Documentation Structure (in `manim_docs/`)
- `tutorials/` - Getting started tutorials with code examples
  - `quickstart.md` - Basic Scene patterns, Circle, Square, Transform, .animate syntax
  - `building_blocks.md` - Mobjects, animations, scenes fundamentals
  - `output_and_config.md` - Rendering settings and CLI options
- `guides/` - In-depth how-to guides
  - `using_text.md` - Text, MarkupText, Tex, MathTex rendering
  - `configuration.md` - Config options and customization
  - `deep_dive.md` - Manim internals and advanced concepts
  - `add_voiceovers.md` - Adding audio narration
- `reference_index/` - Category indices (start here to find which module you need)
  - `animations.md` - Animation classes by category (creation, fading, transform, etc.)
  - `mobjects.md` - Mobject classes by category (geometry, text, graph, table, etc.)
  - `scenes.md` - Scene types (Scene, ThreeDScene, MovingCameraScene, etc.)
  - `cameras.md`, `configuration.md`, `utilities_misc.md`
- `reference/` - Detailed API docs for individual classes/functions (385 .md files)
  - Named as `manim.<module>.<class>.md` (e.g., `manim.animation.creation.Create.md`)
  - Contains parameters, methods, attributes, and usage examples
- `reference.md` - Full hierarchical module index (lists all classes organized by module)

## Required Workflow
1. **Read the scene file**: Use `read_file workspace/scene.py` to see the existing boilerplate
2. **Identify what you need**: What classes/methods will implement the storyboard animations?
3. **Search the docs**: Use `glob` or `grep` to find relevant documentation in `manim_docs/`
4. **Read the details**: Use `read_file` to get exact parameters, signatures, and examples
5. **Complete the scene**: Use `edit_file` to add your Scene class code to `workspace/scene.py`

## Key Manim Patterns (for reference)
- Scenes: Subclass `Scene`, implement `construct()`, use `self.play()` for animations
- Mobjects: Circle, Square, Text, MathTex, Arrow, VGroup, Matrix, Table, etc.
- Animations: Create, FadeIn, FadeOut, Transform, ReplacementTransform, Write, GrowArrow
- Positioning: `.move_to()`, `.next_to()`, `.shift()`, constants like UP, DOWN, LEFT, RIGHT
- Animate syntax: `mobject.animate.method()` to animate property changes

## Vertical Layout Tips
- Place titles at the top (UP * 5 or higher)
- Stack content vertically with consistent spacing
- Use `.scale()` to ensure objects fit within width=8 units
- Consider using `VGroup` with `.arrange(DOWN)` for vertical stacking

Remember: ALWAYS use your tools to research the documentation before writing code. Complete `workspace/scene.py` by adding your Scene class with all animations below the existing boilerplate."""


manim_animator_agent = create_deep_agent(
    model=langchain_client,
    system_prompt=manim_coding_agent_prompt,
    backend=FilesystemBackend(root_dir="./agent_workspace", virtual_mode=True),
)


In [32]:
with open('./attention_is_all_you_need_gemini_3_flash_storyboard_1.pkl', 'rb') as f:
    storyboard = pickle.load(f)


In [33]:
from eduly.models import TopicStoryboard

storyboard = TopicStoryboard.model_validate_json(storyboard.text)

In [34]:
# List all available scenes in the storyboard
print(f"Storyboard: {storyboard.topic_name}")
print(f"Total scenes: {len(storyboard.scenes)}")
print("=" * 60)
for i, scene in enumerate(storyboard.scenes):
    print(f"  [{i}] ({scene.scene_type}) {scene.title}")


Storyboard: The Core Mechanism: Scaled Dot-Product Attention
Total scenes: 5
  [0] (hook) The Library of Vectors
  [1] (mid) Measuring Similarity: The Dot Product
  [2] (mid) The Scaling Factor: Safety Valve
  [3] (mid) Softmax and the Weighted Sum
  [4] (closing) The Full Engine


In [36]:
# Clear any previous agent output so each run starts from a clean workspace.
subprocess.run("rm -rf ./workspace/*", shell=True, capture_output=True, text=True, cwd="./agent_workspace")

boilerplate_code = """# The videos are meant to be in vertical format (1080*1920, portrait orientation).
# Manually set the frame size, height, and width to ensure the scene is rendered correctly.
# Note: Now the scene measures 8 units in width and 14 units in height (9:16 ratio).

from manim import *

config.frame_size = [1080, 1920]
config.frame_height = 14
config.frame_width = 8
"""


with open('./agent_workspace/workspace/scene.py', 'w') as f:
    f.write(boilerplate_code)
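
The boilerplate describes the 8 x 14 unit frame as a 9:16 ratio. A quick arithmetic check, using only the numbers from the boilerplate, shows that 8:14 reduces to 4:7, which is close to but not exactly 9:16. The constant names below are illustrative and not part of the notebook; how Manim handles the small mismatch between horizontal and vertical pixels per unit is not verified here.

```python
from fractions import Fraction

# Values copied from the boilerplate written to scene.py.
PIXEL_WIDTH, PIXEL_HEIGHT = 1080, 1920
FRAME_WIDTH, FRAME_HEIGHT = 8, 14

pixel_ratio = Fraction(PIXEL_WIDTH, PIXEL_HEIGHT)  # reduces to 9/16
frame_ratio = Fraction(FRAME_WIDTH, FRAME_HEIGHT)  # reduces to 4/7

# Pixels per scene unit along each axis; they differ when the ratios differ.
px_per_unit_x = PIXEL_WIDTH / FRAME_WIDTH
px_per_unit_y = PIXEL_HEIGHT / FRAME_HEIGHT

# Frame height that would match the pixel aspect ratio exactly for width 8.
exact_height = FRAME_WIDTH / pixel_ratio

print(f"pixel ratio: {pixel_ratio}, frame ratio: {frame_ratio}")
print(f"px/unit x: {px_per_unit_x:.2f}, px/unit y: {px_per_unit_y:.2f}")
print(f"exact height for width {FRAME_WIDTH}: {float(exact_height):.3f}")
```

If exact proportions matter, a frame height of 128/9 (about 14.22) units would match the 1080 x 1920 pixel size for a width of 8.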


In [6]:

storyboard_text = ""
for i, scene in enumerate(storyboard.scenes):
    storyboard_text += f"# Scene {i+1}\n"
    storyboard_text += scene.to_text() + '\n'


storyboard_prompt = f"""## Storyboard

{storyboard_text}
## Scene File

The following file in `workspace/` is ready for you to complete:
- `scene.py`

Complete the file by adding your class and animation code below the existing boilerplate.
"""
print(storyboard_prompt)



## Storyboard

# Scene 1
## [hook] The Library of Vectors
On a dark background, three distinct clusters of colorful boxes appear. The first cluster, labeled 'Queries' (Q), is a set of deep blue 1x4 horizontal rectangles. The second, labeled 'Keys' (K), is a set of golden yellow 1x4 horizontal rectangles. The third, 'Values' (V), consists of emerald green 1x4 horizontal rectangles. A single blue Query vector (q1) floats to the center. Below it, three golden Key vectors (k1, k2, k3) are lined up horizontally. The Query vector pulses, and faint gray arrows point from it to each Key, suggesting a search process. Text appears: 'What are we looking for? (Query) vs. What information is available? (Key)'.

# Scene 2
## [mid] Measuring Similarity: The Dot Product
The blue Query (q1) and the first golden Key (k1) align vertically. They expand slightly to show their numerical components: q1 = [0.5, 0.1, 0.8, -0.2] and k1 = [0.4, 0.0, 0.9, 0.1]. As they overlap, the corresponding elements multiply
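
The prompt printed above is built by concatenating each scene's `to_text()` output under a `# Scene N` header. The sketch below reproduces that assembly with a stand-in dataclass. `StoryboardScene`, its `description` field, and the exact `to_text()` format (including the trailing newline) are assumptions inferred from the printed output, not the real `eduly.models` definitions.

```python
from dataclasses import dataclass


@dataclass
class StoryboardScene:
    """Hypothetical stand-in for a scene in eduly.models.TopicStoryboard."""
    scene_type: str
    title: str
    description: str  # assumed field name

    def to_text(self) -> str:
        # Assumed format, inferred from the printed prompt.
        return f"## [{self.scene_type}] {self.title}\n{self.description}\n"


def build_storyboard_text(scenes) -> str:
    """Concatenate all scenes into one prompt section, as the notebook loop does."""
    text = ""
    for i, scene in enumerate(scenes):
        text += f"# Scene {i+1}\n"
        text += scene.to_text() + "\n"
    return text


demo = [
    StoryboardScene("hook", "The Library of Vectors", "Three clusters appear."),
    StoryboardScene("mid", "Measuring Similarity: The Dot Product", "q1 and k1 align."),
]
print(build_storyboard_text(demo))
```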

In [37]:
result = manim_animator_agent.invoke({"messages": [{"role": "user", "content": storyboard_prompt}]})

In [38]:
for x in result['messages']:
    print(x)

content=[] additional_kwargs={'function_call': {'name': 'read_file', 'arguments': '{"file_path": "workspace/scene.py"}'}, '__gemini_function_call_thought_signatures__': {'112a9d5a-dec3-4c1d-9332-6cf5afed23e3': 'CuYQAY89a1+7gddfkDiF7nD1+hJ6sDFVHtm/aXBZsgBbrpPvIJWcyrTw4XL9n+mn91W2n7+Aqr86DuPnK3L/6FlC57EfU74SfFcCXIJwwMmPh9UCik1sJWoyOWJKwgEKVD6xM40dtPYZ7ArH6fl1xwiv/sEYSGAHJLYlsL6RSsChbGEkXTajdrcRD+rSXUeucRUbRDTf9YN4SkDX72qRPkBaO/XR7jORX4pa1J/CDdcFhdopkYKZvv9o0VCXb3BgA4DKcz7V1jM+CmRXm2DNLMwBmbnZ2h0DSyIgdsS4K349xAhZg2z0+wkesgk8xThhD+YeJqQUjg0vMqMR8yLxCjXXWyC70yklYbiVjx/wr0u6Dk0EkmbIHrq17gski1yshrBoSjP6xsot2sAWfDUoYCZljJqT0hAAw9aE+Qa+QFbusHz1nlEyr9ZzIKP9a7uszFkFkbWYaa++b3a0Om6IJo9TwQRv7zBITbj/NElnmYNXdGJgxkO6yjuVXMyeSOSkCFVNSjM/tCcMu7+MwAaHMpJqPHut65kro6+uJ7UPqveVLxCtsW7ptsBKxpmgmaWNTvjoj0038SkfOwEBLUoavayJ0FCOnCzh79vPqmfAb4oPuIIwCBFi44RQKHGpHfJog4NBU+8R8Cjld8csCOQl8894E7xV5wnnvudPE5gTZoLyyUfio5URnOXzBppsXugIOtYD9YJFHEFAErJsYVpx7a5YfH+5Kgro75L1rX2JreAL9LkIP47vCXOMgxKvS629BNQ3WxgCbmtUxa1tM938N

In [40]:
print(result['messages'][-1].content[0]['text'])

The `workspace/scene.py` file has been completed with the `AttentionMechanism` class. The scene implements all 5 requested animations in a vertical format (9:16), including:

1.  **The Library of Vectors**: Visualizes Q, K, and V clusters and the initial query search.
2.  **Measuring Similarity**: Animates the dot product calculation between Q1 and K vectors, showing vector expansion to matrices and score summation.
3.  **The Scaling Factor**: Demonstrates the vanishing gradient problem with large scores using an `Axes` graph and sigmoid curve, and applies the scaling factor `1/sqrt(d_k)`.
4.  **Softmax and Weighted Sum**: Shows the softmax transformation of scores to probabilities and uses these weights to modulate the opacity of Value vectors before merging them.
5.  **The Full Engine**: Displays the full attention formula and visualizes FlashAttention tiling on large matrices.

The code includes the necessary imports (`numpy`) and configures the frame size for vertical video.


In [41]:
agent_workspace_dir = './agent_workspace/workspace'

scene_files = os.listdir(agent_workspace_dir)
scene_files


['scene.py']

In [42]:
def render_scene():
    """Render workspace/scene.py with Manim at low quality (-ql) and return the CompletedProcess."""
    # Prepend the MacTeX binaries so Manim can find LaTeX for Tex/MathTex.
    my_env = os.environ.copy()
    my_env["PATH"] = "/Library/TeX/texbin:" + os.environ.get("PATH", "")

    manim_result = subprocess.run(
        ['uv', 'run', 'manim', '-ql', 'scene.py'],
        capture_output=True,
        text=True,
        cwd=os.path.abspath(agent_workspace_dir),
        env=my_env
    )

    return manim_result
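
Output captured with `capture_output=True` can carry ANSI color codes and terminal hyperlink sequences from the renderer's log formatting, which makes it hard to read when printed or passed to a model. A minimal sketch of a cleanup helper follows; `strip_ansi` is a hypothetical name, and the pattern covers only CSI sequences (colors, styles) and OSC sequences (such as hyperlinks), not every possible escape code.

```python
import re

# OSC sequences: ESC ] ... terminated by BEL or ESC \  (used for hyperlinks)
# CSI sequences: ESC [ params intermediates final-byte  (used for colors/styles)
_ANSI_RE = re.compile(
    r"\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"
    r"|\x1b\[[0-?]*[ -/]*[@-~]"
)


def strip_ansi(text: str) -> str:
    """Remove CSI and OSC escape sequences, leaving the visible text."""
    return _ANSI_RE.sub("", text)


sample = "\x1b[32mINFO\x1b[0m \x1b]8;id=1;file:///tmp/a.py\x1b\\a.py\x1b]8;;\x1b\\"
print(strip_ansi(sample))
```

Usage would be along the lines of `print(strip_ansi(manim_result.stdout))`.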

In [43]:
manim_result = render_scene()

In [44]:
print(manim_result.stdout)

Manim Community v0.19.1

[01/01/26 23:46:38] INFO     Animation 0 : Partial movie file written in                scene_file_writer.py:590
                             '/Users/nbilla/Documents/GitHub/eduly-ai/eduly/notebooks/prototyping/agent_workspace/workspace/media/vid

In [45]:
print(manim_result.stderr)


Animation 0: FadeIn(VGroup of 2 submobjects), etc.:   0%|          | 0/15 [00:00<?, ?it/s]
                                                                                          

Animation 2: _MethodAnimation(Rectangle), etc.:   0%|          | 0/15 [00:00<?, ?it/s]
                                                                                      

Animation 3: TransformFromCopy(VGroup of 3 submobjects), etc.:   0%|          | 0/15 [00:00<?, ?it/s]
                                                                                                     

Animation 4: Indicate(Rectangle):   0%|          | 0/15 [00:00<?, ?it/s]
                                                                        

Animation 5: Create(VGroup of 3 submobjects):   0%|          | 0/15 [00:00<?, ?it/s]
                                                                                    

Animation 6: Write(Text('What are we looking for?')), etc.:   0%|          | 0/30 [00:00<?, ?it/s]
Animation 6: Write(

In [46]:
import copy

history = copy.deepcopy(result['messages'])
# Feed the render log back as a user turn so the agent treats it as feedback to act on.
history.append({"role": "user", "content": manim_result.stderr})

result2 = manim_animator_agent.invoke({"messages": history})
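
The render's stderr contains progress-bar lines even when rendering succeeds, so passing it back unfiltered mixes noise with any real error. The sketch below builds a feedback message only when the process failed. `build_render_feedback` is a hypothetical helper, and it assumes the renderer exits with a non-zero return code on failure and that progress-bar lines can be recognized by the `it/s]` suffix seen in the output.

```python
import subprocess
from typing import Optional


def build_render_feedback(
    result: subprocess.CompletedProcess, max_lines: int = 40
) -> Optional[str]:
    """Return an error report for the agent, or None if the render succeeded."""
    if result.returncode == 0:
        return None
    # Drop blank lines and progress-bar lines; keep the tail, where the traceback ends.
    lines = [
        line
        for line in (result.stderr or "").splitlines()
        if line.strip() and "it/s]" not in line
    ]
    tail = "\n".join(lines[-max_lines:])
    return (
        f"Rendering `workspace/scene.py` failed (exit code {result.returncode}). "
        f"Fix the error below.\n\n{tail}"
    )


failed = subprocess.CompletedProcess(
    args=["manim", "-ql", "scene.py"],
    returncode=1,
    stdout="",
    stderr="Animation 0: FadeIn:   0%|   | 0/15 [00:00<?, ?it/s]\n\nIndexError: list index out of range\n",
)
print(build_render_feedback(failed))
```

With this helper, the follow-up agent call would only be made when the returned value is not `None`.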

In [47]:
result2['messages']

 AIMessage(content=[], additional_kwargs={'function_call': {'name': 'read_file', 'arguments': '{"file_path": "workspace/scene.py"}'}, '__gemini_function_call_thought_signatures__': {'112a9d5a-dec3-4c1d-9332-6cf5afed23e3': 'CuYQAY89a1+7gddfkDiF7nD1+hJ6sDFVHtm/aXBZsgBbrpPvIJWcyrTw4XL9n+mn91W2n7+Aqr86DuPnK3L/6FlC57EfU74SfFcCXIJwwMmPh9UCik1sJWoyOWJKwgEKVD6xM40dtPYZ7ArH6fl1xwiv/sEYSGAHJLYlsL6RSsChbGEkXTajdrcRD+rSXUeucRUbRDTf9YN4SkDX72qRPkBaO/XR7jORX4pa1J/CDdcFhdopkYKZvv9o0VCXb3BgA4DKcz7V1jM+CmRXm2DNLMwBmbnZ2h0DSyIgdsS4K349xAhZg2z0+wkesgk8xThhD+YeJqQUjg0vMqMR8yLxCjXXWyC70yklYbiVjx/wr0u6Dk0EkmbIHrq17gski1yshrBoSjP6xsot2sAWfDUoYCZljJqT0hAAw9aE+Qa+QFbusHz1nlEyr9ZzIKP9a7uszFkFkbWYaa++b3a0Om6IJo9TwQRv7zBITbj/NElnmYNXdGJgxkO6yjuVXMyeSOSkCFVNSjM/tCcMu7+MwAaHMpJqPHut65kro6+uJ7UPqveVLxCtsW7ptsBKxpmgmaWNTvjoj0038SkfOwEBLUoavayJ0FCOnCzh79vPqmfAb4oPuIIwCBFi44RQKHGpHfJog4NBU+8R8Cjld8csCOQl8894E7xV5wnnvudPE5gTZoLyyUfio5URnOXzBppsXugIOtYD9YJFHEFAErJsYVpx7a5YfH+5Kgro75L1rX2JreAL9LkIP47vCXOMgxKvS629BNQ3WxgCb

In [50]:
print(result2['messages'][-1].content[0]['text'])

The `workspace/scene.py` file is now complete and fixed. It implements the Attention Mechanism animation sequence in a vertical video format. I've corrected a bug in accessing matrix entries in Scene 4.

The scene includes:
1.  **Scene 1**: Library of vectors (Q, K, V) and query search.
2.  **Scene 2**: Dot product calculation and score generation.
4.  **Scene 4**: Softmax probability calculation and weighted aggregation of values.
5.  **Scene 5**: Full formula and FlashAttention tiling visualization.

The code is ready for rendering.
Animation 0: FadeIn(VGroup of 2 submobjects), etc.:   0%|          | 0/15 [00:00<?, ?it/s]
                                                                                          

Animation 2: _MethodAnimation(Rectangle), etc.:   0%|          | 0/15 [00:00<?, ?it/s]
                                                                                      

Animation 3: TransformFromCopy(VGroup of 3 submobjects), etc.:   0%|          | 0/15 [00:00<?, ?it/s]

In [51]:
manim_result = render_scene()

In [52]:
print(manim_result.stdout)

Manim Community v0.19.1

[01/01/26 23:48:43] INFO     Animation 0 : Using cached data (hash :                    cairo_renderer.py:94
                             1639470855_3834372253_223132457)
                    INFO     Animation 1 : Using cached

In [53]:
print(manim_result.stderr)


Animation 0: FadeIn(VGroup of 2 submobjects), etc.:   0%|          | 0/1 [00:00<?, ?it/s]
                                                                                         

Animation 2: _MethodAnimation(Rectangle), etc.:   0%|          | 0/1 [00:00<?, ?it/s]
                                                                                     

Animation 3: TransformFromCopy(VGroup of 3 submobjects), etc.:   0%|          | 0/1 [00:00<?, ?it/s]
                                                                                                    

Animation 4: Indicate(Rectangle):   0%|          | 0/1 [00:00<?, ?it/s]
                                                                       

Animation 5: Create(VGroup of 3 submobjects):   0%|          | 0/1 [00:00<?, ?it/s]
                                                                                   

Animation 6: Write(Text('What are we looking for?')), etc.:   0%|          | 0/1 [00:00<?, ?it/s]
                              

In [54]:
# Pick the first rendered .mp4 from Manim's output directory for this quality setting.
video_dir = os.path.join(agent_workspace_dir, 'media/videos/scene/1920p15')
video_path = os.path.join(video_dir, [x for x in os.listdir(video_dir) if x.endswith('.mp4')][0])
video_path

'./agent_workspace/workspace/media/videos/scene/1920p15/AttentionMechanism.mp4'
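
The `1920p15` directory name depends on the frame size and frame rate used for the render, so a hard-coded path breaks if either changes. A sketch of a lookup that does not depend on that name: search the media directory recursively and take the most recently modified `.mp4`, skipping the per-animation clips stored under `partial_movie_files`. `find_latest_video` is a hypothetical helper, not part of the notebook.

```python
from pathlib import Path
from typing import Optional


def find_latest_video(media_dir: str) -> Optional[Path]:
    """Return the most recently modified final .mp4 under media_dir, or None."""
    candidates = [
        path
        for path in Path(media_dir).rglob("*.mp4")
        # Skip the per-animation partial clips that are stitched into the final video.
        if "partial_movie_files" not in path.parts
    ]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


print(find_latest_video("./agent_workspace/workspace/media"))
```

The call returns `None` when nothing has been rendered yet, which the caller should check before uploading.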

In [None]:
# # Wrap the rendered video path for upload
# file_path = pathlib.Path(video_path)

# # Upload the video using the File API
# uploaded_video = ai_studio_gemini_client.files.upload(
#   file=file_path,
# )

In [None]:
# video_analysis_prompt = f"""
# You are a meticulous QA reviewer for educational Manim animations. Your job is to identify every discrepancy between the video and the storyboard specification. Be thorough—vague feedback is not useful. The development team needs actionable, detailed information to fix issues.

# ## Storyboard Reference

# {storyboard_text}

# ## Your Task

# Watch the video carefully and systematically compare it against the storyboard above. Examine each scene for:

# 1. **Content Accuracy** - Missing text, wrong equations, incorrect labels, missing visual elements
# 2. **Visual Fidelity** - Wrong colors, sizes, positions, fonts, or styling compared to storyboard specs
# 3. **Animation Quality** - Timing issues, abrupt transitions, overlapping animations, elements appearing/disappearing incorrectly
# 4. **Layout & Composition** - Text overlap, elements cut off screen, poor spacing, crowded visuals
# 5. **Scene Flow** - Wrong order of scenes, missing transitions, pacing problems

# ---

# ## Output Format

# Give a detailed summary of the issues found and what should be changed to fix them
# """

In [None]:
# response = ai_studio_gemini_client.models.generate_content(
#     model="gemini-2.5-flash",
#     # model="gemini-3-flash-preview", # only 20 req per day
#     config=gemini_types.GenerateContentConfig(
#         thinking_config=gemini_types.ThinkingConfig(thinking_level="high"),
#     ),
#     contents=[uploaded_video, video_analysis_prompt],
# )

In [None]:
# refine_prompt = f"""
# Update the ./workspace/scene.py file to fix the issues found in the video.

# ## Issues Summary

# {response.text}

# """

# refined_response = manim_animator_agent.invoke({"messages": [{"role": "user", "content": refine_prompt}]})

In [None]:
# refined_response['messages']

[HumanMessage(content='\nUpdate the ./workspace/scene.py file to fix the issues found in the video.\n\n## Issues Summary\n\nHere\'s a detailed QA review of the Manim animation against the provided storyboard:\n\n---\n\n**General Observation:** The overall aesthetic and clarity of the individual mathematical steps are good, but there are several discrepancies in visual elements, animation flow, and content as specified in the storyboard.\n\n---\n\n**Scene 1: The Library of Vectors**\n\n*   **Content Accuracy/Visual Fidelity (0:02):**\n    *   **Issue:** The storyboard states, "The first cluster, labeled \'Queries\' (Q), is a set of deep blue 1x4 horizontal rectangles. The second, labeled \'Keys\' (K), is a set of golden yellow 1x4 horizontal rectangles. The third, \'Values\' (V), consists of emerald green 1x4 horizontal rectangles."\n    *   **Discrepancy:** The `Keys (K)` and `Values (V)` clusters initially appear as dark brown and dark green, respectively, before transitioning to gold

In [None]:
# print(refined_response['messages'][-1].content[0]['text'])

The `workspace/scene.py` file has been updated to address all the feedback points from the QA review.

**Summary of Changes:**

1.  **Scene 1: The Library of Vectors**
    *   **Colors:** Updated clusters and vectors to use vivid colors: `BLUE` (Queries), `GOLD` (Keys), and `#50C878` (Emerald Green Values).
    *   **Layout & Animation:**
        *   `k_vectors` now originate from the `Keys (K)` cluster and animate to their position below the `Query` vector.
        *   Added a pulsing animation (`Indicate` with scale 1.2) to the `Query` vector.
        *   Corrected text to "What are we looking for?" and positioned it near the Query vector.

2.  **Scene 4: Softmax and Weighted Sum**
    *   **Weights Visualization:** Replaced the numeric matrix with **semi-transparent orange overlays**.
    *   **Animation:**
        *   Overlays now animate moving from the Softmax output down to the Value vectors.
        *   **Value Vectors Interaction:** The Value vectors now visually react to the 