## Code Generation Workflow

To generate the final video from the generated plan, a LlamaIndex Workflow is used in which a code LLM (Qwen2.5-Coder-7B-Instruct) iteratively generates valid Manim code, and a vision language model (Llama-3.2-11B-Vision-Instruct) analyzes individual aspects of the video and returns feedback to the code LLM.

### Code Generation Workflow Settings

In [1]:
# Settings for the Code Generation Workflow
code_generation_settings = {
    "project_name": "project-1",
    "max_generations": 10,
    "code_llm": "qwen2.5-coder:latest",
    "vlm": "llama3.2-vision:latest",
    "debug_mode": True,
    "show_user_prompts": True
}

In [2]:
EXAMPLE_PLAN = """1. **Create Grid**: Draw a coordinate system on paper or screen. Mark the x and y axes for orientation. The grid should be large enough to display all given points.

2. **Mark Points**:
   - **Point A**: Locate (1, 1) on the coordinate system. Move 1 unit along the x-axis from the origin to the right and then 1 unit up along the y-axis. Mark this point as A.
   - **Point B**: Find (5, 1) on the coordinate system. Move 5 units along the x-axis from the origin to the right and stay at the same height (y=1). Mark this point as B.
   - **Point C**: Find (3, 4) on the coordinate system. Move 3 units along the x-axis from the origin to the right and then 4 units up along the y-axis. Mark this point as C.

3. **Label Points**: Label each marked point with its corresponding letter to clearly identify them: Label the point at (1, 1) as A, at (5, 1) as B, and at (3, 4) as C.

4. **Draw Lines**:
   - Connect Point A and Point B with a straight line.
   - Connect Point B and Point C with a straight line.
   - Connect Point C and Point A with a straight line.

5. **Verification**:
   - Verify that each triangle segment is drawn correctly: AB horizontal (y-coordinates equal), AC and BC diagonal in their respective directions based on their x and y coordinates.

6. **Summary**: You should now have a closed triangle, labeled with A(1, 1), B(5, 1), and C(3, 4), within your coordinate system."""

### Imports

In [3]:
# LlamaIndex Workflow
from llama_index.core.workflow import Event, StartEvent, StopEvent, Workflow, step, Context
from llama_index.utils.workflow import draw_all_possible_flows, draw_most_recent_execution # to visualize the Workflow

# Ollama LLM
import ollama

# Utility Libraries
import ast # for Python Code Parsing
import subprocess
import os
import json
import cv2 # for Video Screenshot Extraction

# Typing and Structured Output Models
from typing import List, Tuple
from pydantic import BaseModel

### Utility Functions

Functions that post-process the LLM outputs during the workflow, are used for debugging, or serve as input for function calling (tool use in Ollama).

In [4]:
# Code Utility Functions
def strip_markdown_from_code(code: str) -> str:
    """
    Removes markdown formatting from generated code.

    Args:
        code (str): The code to strip markdown from.

    Returns:
        str: The code with markdown formatting removed.
    """
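    code = code.strip()
    # Drop an opening fence such as ``` or ```python.
    # str.lstrip("`python\n") would strip a character set, not a prefix.
    if code.startswith("```"):
        first_newline = code.find("\n")
        code = code[first_newline + 1:] if first_newline != -1 else ""
    # Drop the closing fence
    code = code.rstrip()
    if code.endswith("```"):
        code = code[:-3]
    return code.strip()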

def validate_generated_python_code(code: str) -> Tuple[ast.AST | None, SyntaxError | None]:
    """
    Validates generated Python code.

    Args:
        code (str): The generated Python code.

    Returns:
        tuple: A tuple containing:
            * `ast.AST`: The AST object if the code is valid, otherwise None.
            * `SyntaxError`: The syntax error if the code is invalid, otherwise None.
    """
    try:
        tree = ast.parse(code)
        return tree, None
    except SyntaxError as e:
        return None, e

def extract_class_name(tree: ast.AST) -> str:
    """
    Extracts the class name from a Python code.

    Args:
        tree (ast.AST): The AST object of the code.

    Returns:
        str: The class name if found, otherwise None.
    """
    class_name = None
    # An ast.Module is not iterable, so walk the tree to visit its nodes
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            class_name = node.name
            break
    return class_name

# Function Calling Tool
def extract_screenshot(timestamp: float, class_name: str, code_iteration: int) -> str:
    """
    Extracts a screenshot for significant moments in a video.

    Args:
        timestamp (float): Timestamp of a significant moment in a video.
        class_name (str): Name of the class of the generated scene.
        code_iteration (int): Current iteration of the generated code used to generate the video.

    Returns:
        str: The whole path where the image file was saved.
    """
    video_folder_path = f"video_generation/{code_generation_settings['project_name']}/media/videos/{class_name}_{str(code_iteration)}/480p15/"
    
    if not os.path.exists(video_folder_path):
        print("Video folder does not exist!")
        return None
    
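    videos = os.listdir(video_folder_path)
    class_name_videos = [video for video in videos if class_name in video]
    if not class_name_videos:
        print("No rendered video found for class:", class_name)
        return None
    video_path = os.path.join(video_folder_path, class_name_videos[-1])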
    
    output_dir: str = "screenshots"
    
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    
    video = cv2.VideoCapture(video_path)
    if not video.isOpened():
        raise ValueError(f"Could not open video file: {video_path}")

    try:
        fps = video.get(cv2.CAP_PROP_FPS)
        total_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
        if fps <= 0:
            return None
        duration = total_frames / fps

        # Reject timestamps that lie outside the video
        if timestamp < 0 or timestamp > duration:
            return None

        # Seek to the frame that corresponds to the requested timestamp
        frame_number = int(timestamp * fps)
        video.set(cv2.CAP_PROP_POS_FRAMES, frame_number)

        ret, frame = video.read()
        if not ret:
            return None

        output_path = os.path.join(output_dir, f"screenshot_{timestamp:.3f}s.png")
        cv2.imwrite(output_path, frame)
        return output_path
    finally:
        # Release the capture handle on every path, including early returns
        video.release()

# Debugging Utility Functions
def debug_log(title: str, content: str) -> None:
    """
    Prints a debug log message.

    Args:
        title (str): The title of the debug log message.
        content (str): The debug log message.
    """
    if content:
        print()
        print(f"\n{title}")
        print("=" * 100)
        print(content)
        print("=" * 100)
        print()

def show_user_prompt(prompt: str) -> None:
    """
    Prints the formatted user prompt.

    Args:
        prompt (str): The prompt formatted with all the inputs.
    """
    if prompt:
        print()
        print("=" * 100)
        print(prompt)
        print("=" * 100)
        print()
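
The validation helpers above rely on Python's `ast` module: parsing succeeds only for syntactically valid code, and the resulting tree can be searched for the scene class. The following is a minimal standalone sketch of that idea. The name `first_class_name` and the sample scene are illustrative assumptions and not part of the notebook; the sample is only parsed, never executed, so Manim does not need to be installed.

```python
import ast

# Illustrative sample of what the code LLM might return (assumption)
SAMPLE_CODE = '''
from manim import Scene

class TriangleScene(Scene):
    def construct(self):
        pass
'''

def first_class_name(code: str):
    """Return the name of the first class found in `code`, or None."""
    tree = ast.parse(code)  # raises SyntaxError for invalid Python
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            return node.name
    return None

print(first_class_name(SAMPLE_CODE))  # TriangleScene

try:
    first_class_name("def broken(:\n    pass")
except SyntaxError as error:
    # The message and line number are what a fixing prompt could be given
    print(f"Syntax error on line {error.lineno}: {error.msg}")
```

Because `ast.parse` only parses and never runs the code, this check is safe to apply to untrusted LLM output, but it cannot detect runtime errors such as wrong Manim API calls.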

In [5]:
# Ollama LLM Utility Function with optional image input, function calling and structured output generation
def call_ollama_model(
    model="qwen2.5-coder:latest",
    system_prompt=None,
    user_prompt=None,
    temperature=0.1,
    context_size=16384,
    images=None,
    tools=None,
    json_schema=None
):
    """
    Calls an Ollama model with the given parameters.

    Args:
        model (str): The name of the Ollama model to use.
        system_prompt (str): The system prompt to use in the generation.
        user_prompt (str): The user prompt to use in the generation.
        temperature (float): The temperature to use in the generation.
        context_size (int): The context size to use in the generation.
        images (list): A list of images to send with the request (for VLMs).
        tools (list): A list of tools to use (for function calling).
        json_schema (str): The JSON schema to use for the response (for structured output generation).

    Returns:
        ChatResponse: The message from the Ollama model.
    """
    response = ollama.chat(
        model=model,
        messages=[
            {
                "role": "system",
                "content": system_prompt
            },
            {
                "role": "user",
                "content": user_prompt,
                "images": images
            }
        ],
        options={
            "temperature": temperature,
            "num_ctx": context_size
        },
        format=json_schema,
        tools=tools
    )
    
    return response.message

### Model and Event Definition

LlamaIndex Workflows are driven by events. Each step in the workflow must emit at least one event, which in turn triggers the steps that accept this event as input. Events are Pydantic models that inherit from the `Event` class; like ordinary Pydantic classes, they can model arbitrary attributes and pass them on to the next step. The events and steps can be inspected in the HTML visualization in `Pipeline-Visualizations/manim_code_generation_flow.html`.
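
To make the event-driven idea concrete, here is a deliberately simplified plain-Python sketch of dispatching steps by event type. It is not how LlamaIndex is implemented, and every name in it (`CodeGenerated`, `CodeValid`, `Stop`, `STEPS`, `run`) is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class CodeGenerated:  # hypothetical event carrying generated code
    code: str

@dataclass
class CodeValid:  # hypothetical event emitted after validation
    code: str

@dataclass
class Stop:  # hypothetical final event carrying the result
    result: str

def validate(ev: CodeGenerated) -> CodeValid:
    return CodeValid(code=ev.code)

def render(ev: CodeValid) -> Stop:
    return Stop(result=f"rendered {len(ev.code)} characters")

# Each event type triggers the step that accepts it as input
STEPS = {CodeGenerated: validate, CodeValid: render}

def run(first_event):
    """Keep dispatching events to steps until a Stop event is produced."""
    event, trace = first_event, []
    while not isinstance(event, Stop):
        step = STEPS[type(event)]
        trace.append(step.__name__)
        event = step(event)
    return event.result, trace

result, trace = run(CodeGenerated(code="print('hi')"))
print(trace)   # ['validate', 'render']
print(result)  # rendered 11 characters
```

The real workflow differs in that steps are async methods, one step may emit one of several event types, and shared state lives in a `Context` object.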

The Pydantic models can also be used to generate structured output.
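
As a small sketch of what structured output means here, assuming Pydantic v2: a model can export a JSON schema that constrains the model reply, and the same model can then validate the returned JSON. The class mirrors the `IssueFormat` model of this notebook; the `raw_reply` string is an invented example, not real model output.

```python
from pydantic import BaseModel

class IssueFormat(BaseModel):
    has_issues: bool
    issue_description: str

# JSON schema that can be passed to the LLM call to constrain its reply
schema = IssueFormat.model_json_schema()
print(sorted(schema["properties"]))  # ['has_issues', 'issue_description']

# Invented reply for illustration; a real reply comes from the model
raw_reply = '{"has_issues": true, "issue_description": "Label C overlaps the line BC."}'

# Validation raises pydantic.ValidationError if the reply does not match
issue = IssueFormat.model_validate_json(raw_reply)
print(issue.has_issues, issue.issue_description)
```

Validating through the model, rather than calling `json.loads` alone, also catches replies that are valid JSON but miss a field or use the wrong type.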

In [6]:
# Events for Code Generation
class ManimCodeGeneratedEvent(Event):
    generated_code: str


class GeneratedCodeIsValidEvent(Event):
    generated_code: str


class GeneratedCodeIsInvalidEvent(Event):
    generated_code: str
    error_msg: str


# Events for Video Generation
class VideoGenerationSuccessfulEvent(Event):
    generated_code: str


class VideoGenerationFailedEvent(Event):
    generated_code: str
    error_msg: str


# Model and Event for Screenshot Extraction
class Screenshot(BaseModel):
    timestamp: float
    file_name: str


class VideoScreenshotsExtractedEvent(Event):
    generated_code: str
    screenshots: List[Screenshot]


# Models and Events for Video Issue Identification
class IssueFormat(BaseModel):
    has_issues: bool
    issue_description: str


class VideoIssue(BaseModel):
    timestamp: float
    has_issues: bool
    issue_description: str


class VideoHasIssuesEvent(Event):
    generated_code: str
    video_issues: List[VideoIssue]


class VideoHasNoIssuesEvent(Event):
    generated_code: str

### Prompt Definitions

To keep the workflows readable, the prompts are defined in a separate file in two Python classes (SystemPrompts and UserPrompts) and can be referenced through the respective attribute.
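
The contents of `code_generation_prompts` are not shown in this notebook, so the following is only a guess at how such a prompt class could be organized. The template text is taken from the printed user prompt; the body of `get_params` is an assumption that uses `string.Formatter` to list the placeholders of a template.

```python
from string import Formatter

class UserPrompts:
    """Hypothetical stand-in for the class in code_generation_prompts."""

    INITIAL_CODE_GENERATION = (
        "Generate Python Manim code based on the following plan:\n{video_plan}"
    )

    def get_params(self, prompt_name: str) -> list[str]:
        # Assumed behavior: return the placeholder names of the template
        template = getattr(self, prompt_name)
        return [field for _, field, _, _ in Formatter().parse(template) if field]

user_prompts = UserPrompts()
print(user_prompts.get_params("INITIAL_CODE_GENERATION"))  # ['video_plan']
print(user_prompts.INITIAL_CODE_GENERATION.format(video_plan="1. Draw a grid"))
```

Keeping templates as class attributes lets each workflow step fill them in with `str.format`, and listing the placeholders makes it easy to check that a step supplies every required value.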

In [7]:
# Import custom prompts
import code_generation_prompts

In [8]:
# Example system prompt
system_prompts = code_generation_prompts.SystemPrompts()
print(system_prompts.INITIAL_CODE_GENERATION)

'\nYou are a Python coding expert specializing in creating educational animations using the Manim library. Your role is to:\n    1. Generate complete, production-ready Python code using the latest Manim Community version \n    2. Follow instructional plans, converting the provided steps into animated scenes \n    3. Implement best practices for: \n        - Scene composition and organization \n        - Smooth transitions between mathematical elements \n        - Clear and readable text animations \n        - Consistent styling and theming \n        - Proper timing and pacing of animations \nWhen provided with an instructional plan, you will:\n    1. Create a ManimScene class structure \n    2. Convert each instruction into appropriate Manim objects and animations \n    3. Use appropriate animation methods (Create, Transform, FadeIn, etc.) \n    4. Include proper positioning, scaling, and grouping of elements \n    5. Add appropriate waiting times between animations \n    6. Implement 

In [9]:
# Example user prompt
user_prompts = code_generation_prompts.UserPrompts()
print(user_prompts.INITIAL_CODE_GENERATION)

'Generate Python Manim code based on the following plan:\n{video_plan}'

In [None]:
# Parameters expected by the example user prompt
user_prompts = code_generation_prompts.UserPrompts()
print(user_prompts.get_params("INITIAL_CODE_GENERATION"))

### Workflow Definition

The workflow is a class with several methods that represent the steps of the workflow. Starting with the StartEvent, each step emits another previously defined event and thereby triggers the next step, until the StopEvent returns the result, in this case the path of the generated video.
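
Stripped of events and prompts, the core control flow is a bounded generate, validate and retry loop. The sketch below shows that loop in plain Python under simplifying assumptions: `fake_code_llm` is a hypothetical stand-in for the code LLM whose first reply is deliberately broken, and only syntax validation is modeled.

```python
import ast

def fake_code_llm(attempt: int) -> str:
    """Hypothetical stand-in for the code LLM: the first reply is broken."""
    return "def scene(:" if attempt == 1 else "def scene():\n    return 'ok'"

def generate_until_valid(max_generations: int):
    """Return (code, attempts) or (None, attempts) if the budget runs out."""
    for attempt in range(1, max_generations + 1):
        code = fake_code_llm(attempt)
        try:
            ast.parse(code)
        except SyntaxError:
            # In the workflow, the error message is sent back to the LLM here
            continue
        return code, attempt
    return None, max_generations

code, attempts = generate_until_valid(max_generations=10)
print(attempts)  # 2
```

The iteration budget plays the same role as `max_generations` in the settings: it guarantees termination even if the model never produces valid code.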

In [10]:
class ManimCodeGenerationFlow(Workflow):
    @step
    async def generate_initial_manim_code(
        self, ctx: Context, ev: StartEvent
    ) -> ManimCodeGeneratedEvent:
        # Get input and settings from StartEvent
        video_plan = ev.video_plan
        settings = ev.settings
        debug_mode = settings.get("debug_mode", False)
        show_user_prompts = settings.get("show_user_prompts", False)
        max_generations = settings.get("max_generations", 10)
        project_name = settings.get("project_name", "default")
        code_llm = settings.get("code_llm", "qwen2.5-coder:latest")
        vlm = settings.get("vlm", "llama3.2-vision:latest")

        # Set initial global context
        await ctx.set("video_plan", video_plan)
        await ctx.set("max_generations", max_generations)
        await ctx.set("debug_mode", debug_mode)
        await ctx.set("show_user_prompts", show_user_prompts)
        await ctx.set("project_name", project_name)
        await ctx.set("code_llm", code_llm)
        await ctx.set("vlm", vlm)
        await ctx.set("code_iteration", 0) # To implement a maximum number of iterations before aborting the generation

        system_prompt = system_prompts.INITIAL_CODE_GENERATION
        user_prompt = user_prompts.INITIAL_CODE_GENERATION.format(video_plan=video_plan)

        if show_user_prompts:
            show_user_prompt(user_prompt)
            
        response_msg = call_ollama_model(
            model=code_llm,
            system_prompt=system_prompt,
            user_prompt=user_prompt,
            temperature=0.2,
            context_size=16384,
        )

        # Strip markdown from generated response
        generated_code = strip_markdown_from_code(response_msg.content)

        if debug_mode:
            debug_log("Initial Manim Code Generation Step", generated_code)

        return ManimCodeGeneratedEvent(generated_code=str(generated_code))

    @step
    async def validate_generated_code(
        self, ctx: Context, ev: ManimCodeGeneratedEvent
    ) -> GeneratedCodeIsValidEvent | GeneratedCodeIsInvalidEvent | StopEvent:
        # Get input from ManimCodeGeneratedEvent
        generated_code = ev.generated_code
        
        # Get debug mode from global context
        debug_mode = await ctx.get("debug_mode")

        # Validate the generated Python code (returns either a syntax tree or error message, the other one is None)
        syntax_tree, error_msg = validate_generated_python_code(
            generated_code
        )

        # Check if current code iterations exceed the maximum generations
        max_generations = await ctx.get("max_generations")
        code_iteration = await ctx.get("code_iteration")
        new_code_iteration = code_iteration + 1
        await ctx.set("code_iteration", new_code_iteration)

        if debug_mode:
            print("Number of code generations/changes: " + str(new_code_iteration))

        if new_code_iteration > max_generations:
            print("Too many tries to generate working code. Stopping generation now.")
            return StopEvent(result=str("No result"))

        if syntax_tree:
            try:
                # Extract class name and set global context class_name
                class_name = extract_class_name(syntax_tree)
                await ctx.set("class_name", class_name)
                return GeneratedCodeIsValidEvent(
                    generated_code=generated_code
                )
            except Exception as e:
                print(f"Error extracting class name: {str(e)}")
                return GeneratedCodeIsInvalidEvent(
                    generated_code=generated_code,
                    error_msg=f"Class extraction error: {str(e)}",
                )
        else:
            return GeneratedCodeIsInvalidEvent(
                generated_code=generated_code,
                error_msg=str(error_msg),
            )

    @step
    async def generate_video(
        self, ctx: Context, ev: GeneratedCodeIsValidEvent
    ) -> VideoGenerationSuccessfulEvent | VideoGenerationFailedEvent:
        # Get input from GeneratedCodeIsValidEvent
        generated_code = ev.generated_code
        
        # Get relevant data from global context
        class_name = await ctx.get("class_name")
        project_name = await ctx.get("project_name")
        code_iteration = await ctx.get("code_iteration")

        try:
            if not os.path.exists("video_generation"):
                os.makedirs("video_generation")

            if not os.path.exists("video_generation/" + project_name):
                os.makedirs("video_generation/" + project_name)

            # Construct the Docker volume path to store the generated videos
            current_folder = os.getcwd()
            python_file_name = class_name + "_" + str(code_iteration) + ".py"
            project_path = current_folder + "/video_generation/" + project_name
            docker_volume_path = project_path + ":/manim"

            # Save generated Manim code to python file
            try:
                with open(project_path + "/" + python_file_name, "w") as file:
                    file.write(generated_code)
            except IOError as e:
                return VideoGenerationFailedEvent(
                    generated_code=str(generated_code),
                    error_msg=f"Failed to write scene file: {str(e)}",
                )

            # Construct Docker command to generate the video
            docker_cmd = [
                "docker",
                "run",
                "--rm",
                "-v",
                docker_volume_path,
                "manimcommunity/manim",
                "manim",
                "-ql",
                python_file_name,
                class_name,
            ]

            # Call Docker to generate the video
            try:
                result = subprocess.run(
                    docker_cmd, capture_output=True, text=True, check=True
                )
                return VideoGenerationSuccessfulEvent(
                    generated_code=str(generated_code)
                )
            except subprocess.CalledProcessError as e:
                return VideoGenerationFailedEvent(
                    generated_code=str(generated_code), error_msg=str(e.stderr)
                )
        except Exception as e:
            print(f"Error in generate_video: {str(e)}")
            return VideoGenerationFailedEvent(
                generated_code=str(generated_code),
                error_msg=f"Unexpected error: {str(e)}",
            )

    @step
    async def fix_invalid_python_code(
        self, ctx: Context, ev: GeneratedCodeIsInvalidEvent
    ) -> ManimCodeGeneratedEvent:
        # Get input from GeneratedCodeIsInvalidEvent
        generated_code = ev.generated_code
        error_msg = ev.error_msg

        # Get data from global context
        debug_mode = await ctx.get("debug_mode")
        show_user_prompts = await ctx.get("show_user_prompts")
        code_llm = await ctx.get("code_llm")

        system_prompt = system_prompts.INVALID_PYTHON_CODE_FIXING
        user_prompt = user_prompts.INVALID_PYTHON_CODE_FIXING.format(
            generated_code=generated_code, error_msg=error_msg
        )

        if show_user_prompts:
            show_user_prompt(user_prompt)
            
        response_msg = call_ollama_model(
            model=code_llm,
            system_prompt=system_prompt,
            user_prompt=user_prompt,
            temperature=0.2,
            context_size=16384,
        )

        fixed_generated_code = strip_markdown_from_code(response_msg.content)

        if debug_mode:
            debug_log("Invalid Python Code Fixing Step", fixed_generated_code)

        return ManimCodeGeneratedEvent(generated_code=str(fixed_generated_code))

    @step
    async def fix_manim_errors_in_code(
        self, ctx: Context, ev: VideoGenerationFailedEvent
    ) -> ManimCodeGeneratedEvent:
        debug_mode = await ctx.get("debug_mode")
        show_user_prompts = await ctx.get("show_user_prompts")
        code_llm = await ctx.get("code_llm")

        generated_code = ev.generated_code
        error_msg = ev.error_msg

        system_prompt = system_prompts.MANIM_ERROR_FIXING
        user_prompt = user_prompts.MANIM_ERROR_FIXING.format(
            generated_code=generated_code, error_msg=error_msg
        )

        if show_user_prompts:
            show_user_prompt(user_prompt)

        response_msg = call_ollama_model(
            model=code_llm,
            system_prompt=system_prompt,
            user_prompt=user_prompt,
            temperature=0.1,
            context_size=16384,
        )

        # Strip markdown from generated response
        fixed_generated_code = strip_markdown_from_code(response_msg.content)

        if debug_mode:
            debug_log("Manim Code Fixing Step", fixed_generated_code)

        return ManimCodeGeneratedEvent(generated_code=str(fixed_generated_code))

    @step
    async def extract_screenshots_from_video(
        self, ctx: Context, ev: VideoGenerationSuccessfulEvent
    ) -> VideoScreenshotsExtractedEvent:
        # Get input from VideoGenerationSuccessfulEvent
        generated_code = ev.generated_code
        class_name = await ctx.get("class_name")

        # Get data from global context
        debug_mode = await ctx.get("debug_mode")
        show_user_prompts = await ctx.get("show_user_prompts")
        code_llm = await ctx.get("code_llm")
        code_iteration = await ctx.get("code_iteration")

        system_prompt = system_prompts.SCREENSHOT_EXTRACTION_FUNCTION_CALLING
        user_prompt = user_prompts.SCREENSHOT_EXTRACTION_FUNCTION_CALLING.format(
            class_name=class_name,
            code_iteration=str(code_iteration),
            generated_code=generated_code,
        )
        
        if show_user_prompts:
            show_user_prompt(user_prompt)

        # Call the Code LLM model to create function calls for screenshot extraction
        try:
            response_msg = call_ollama_model(
                model=code_llm,
                system_prompt=system_prompt,
                user_prompt=user_prompt,
                temperature=0.1,
                context_size=16384,
                tools=[extract_screenshot],
            )
        except Exception as e:
            print(f"Error in LLM response processing: {str(e)}")
            return VideoScreenshotsExtractedEvent(
                generated_code=str(generated_code), screenshots=[]
            )

        screenshots = []

        available_functions = {
            "extract_screenshot": extract_screenshot,
        }

        # Extract screenshots via function calls
        for tool in response_msg.tool_calls or []:
            function_to_call = available_functions.get(tool.function.name)
            if function_to_call:
                # Call the function
                arguments = tool.function.arguments
                file_name = function_to_call(**arguments)
                if file_name:
                    screenshots.append(
                        {
                            "timestamp": float(arguments["timestamp"]),
                            "file_name": file_name,
                        }
                    )
                    
                    if debug_mode:
                        print("Extracted screenshot", file_name)
            else:
                if debug_mode:
                    print("Function not found:", tool.function.name)

        return VideoScreenshotsExtractedEvent(
                generated_code=str(generated_code), screenshots=screenshots
            )

    @step
    async def evaluate_video_screenshots(
        self, ctx: Context, ev: VideoScreenshotsExtractedEvent
    ) -> VideoHasIssuesEvent | VideoHasNoIssuesEvent:
        # Get input from VideoScreenshotsExtractedEvent
        screenshots = ev.screenshots
        generated_code = ev.generated_code

        # Get data from global context
        video_plan = await ctx.get("video_plan")
        show_user_prompts = await ctx.get("show_user_prompts")
        vlm = await ctx.get("vlm")

        video_issues = []

        system_prompt = system_prompts.VIDEO_ISSUE_FINDING

        # Analyze every generated screenshot for issues
        for screenshot in screenshots:
            screenshot = dict(screenshot)
            
            user_prompt = user_prompts.VIDEO_ISSUE_FINDING.format(
                generated_code=generated_code,
                video_plan=video_plan,
                timestamp=screenshot["timestamp"],
            )

            
            if show_user_prompts:
                show_user_prompt(user_prompt)
            
            # Vision Language Model call with structured output generation to identify issues in the video
            try:
                response_msg = call_ollama_model(
                    model=vlm,
                    system_prompt=system_prompt,
                    user_prompt=user_prompt,
                    images=[screenshot["file_name"]],
                    temperature=0.1,
                    context_size=16384,
                    json_schema=IssueFormat.model_json_schema(),
                )

                # Parse the response JSON
                try:
                    video_issue = json.loads(response_msg.content)
                    video_issue["timestamp"] = screenshot["timestamp"]
                    video_issues.append(video_issue)
                except json.JSONDecodeError as e:
                    print(f"Error parsing response JSON: {str(e)}")
                    video_issues.append(
                        {
                            "timestamp": screenshot["timestamp"],
                            "has_issues": False,
                            "issue_description": "Failed to parse response",
                        }
                    )
            except Exception as e:
                print(
                    f"Error processing screenshot {screenshot['file_name']}: {str(e)}"
                )
                video_issues.append(
                    {
                        "timestamp": screenshot["timestamp"],
                        "has_issues": False,
                        "issue_description": f"Processing error: {str(e)}",
                    }
                )

        video_has_issues = any(video_issue.get("has_issues", False) for video_issue in video_issues)

        if video_has_issues:
            return VideoHasIssuesEvent(
                generated_code=str(generated_code), video_issues=video_issues
            )
        else:
            return VideoHasNoIssuesEvent(generated_code=str(generated_code))

    @step
    async def fix_semantic_video_issues_in_code(
        self, ctx: Context, ev: VideoHasIssuesEvent
    ) -> ManimCodeGeneratedEvent:
        # Get input from VideoHasIssuesEvent
        video_issues = ev.video_issues
        generated_code = ev.generated_code

        # Get data from global context
        debug_mode = await ctx.get("debug_mode")
        show_user_prompts = await ctx.get("show_user_prompts")
        code_llm = await ctx.get("code_llm")

        # Combine video issues into a single context string
        video_issue_strings = []
        for video_issue in video_issues:
            video_issue = dict(video_issue)
            if video_issue["has_issues"]:
                video_issue_strings.append(
                    f"- Timestamp {video_issue['timestamp']} | Issue description: {video_issue['issue_description']}"
                )

        video_issue_context = "\n".join(video_issue_strings)

        system_prompt = system_prompts.VIDEO_ISSUE_FIXING
        user_prompt = user_prompts.VIDEO_ISSUE_FIXING.format(
            video_issue_context=video_issue_context, generated_code=generated_code
        )

        if show_user_prompts:
            show_user_prompt(user_prompt)
            
        # Call the Code LLM to fix the semantic issues in the video
        response_msg = call_ollama_model(
            model=code_llm,
            system_prompt=system_prompt,
            user_prompt=user_prompt,
            temperature=0.1,
            context_size=16384,
        )

        # Strip markdown from generated response
        fixed_generated_code = strip_markdown_from_code(response_msg.content)

        if debug_mode:
            debug_log("Semantic Video Issue Fixing Step", fixed_generated_code)

        return ManimCodeGeneratedEvent(generated_code=str(fixed_generated_code))

    @step
    async def generate_final_video(
        self, ctx: Context, ev: VideoHasNoIssuesEvent
    ) -> StopEvent:
        # Get input from VideoHasNoIssuesEvent
        generated_code = ev.generated_code

        # Get data from global context
        class_name = await ctx.get("class_name")
        project_name = await ctx.get("project_name")
        code_iteration = await ctx.get("code_iteration")

        # Generate the final manim video in higher resolution
        try:
            if not os.path.exists("video_generation"):
                os.makedirs("video_generation")

            if not os.path.exists("video_generation/" + project_name):
                os.makedirs("video_generation/" + project_name)

            current_folder = os.getcwd()
            python_file_name = class_name + "_final.py"
            project_path = current_folder + "/video_generation/" + project_name
            docker_volume_path = project_path + ":/manim"

            try:
                with open(project_path + "/" + python_file_name, "w") as file:
                    file.write(generated_code)
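            except IOError as e:
                # This step is declared to return a StopEvent, so fall back to the last low-resolution video
                print(f"Failed to write scene file: {str(e)}")
                return StopEvent(result=project_path + "/media/videos/" + class_name + "_" + str(code_iteration) + "/480p15/" + class_name + ".mp4")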

            docker_cmd = [
                "docker",
                "run",
                "--rm",
                "-v",
                docker_volume_path,
                "manimcommunity/manim",
                "manim",
                "-qh",
                python_file_name,
                class_name,
            ]

            try:
                result = subprocess.run(
                    docker_cmd, capture_output=True, text=True, check=True
                )

                final_video_path = project_path + "/media/videos/" + class_name + "_final/1080p60/" + class_name + ".mp4"
                return StopEvent(result=str(final_video_path))
            except subprocess.CalledProcessError as e:
                return StopEvent(result=project_path + "/media/videos/" + class_name + "_" + str(code_iteration) + "/480p15/" + class_name + ".mp4")
        except Exception as e:
            print(f"Error in generate_final_video: {str(e)}")
            return StopEvent(result=project_path + "/media/videos/" + class_name + "_" + str(code_iteration) + "/480p15/" + class_name + ".mp4")
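
The preview and final render steps build nearly identical Docker commands and output paths. One possible way to factor this out is sketched below. The helper `build_render_job` and the `QUALITY_FOLDERS` mapping are hypothetical names, and the mapping only covers the two quality flags and folder names that appear in the workflow above.

```python
import os

# Output folders for the two quality flags used in the workflow above
QUALITY_FOLDERS = {"-ql": "480p15", "-qh": "1080p60"}

def build_render_job(project_path: str, file_stem: str, class_name: str, quality: str):
    """Return (docker_cmd, expected_video_path) for one Manim render."""
    docker_cmd = [
        "docker", "run", "--rm",
        "-v", f"{project_path}:/manim",  # mount the project folder into the container
        "manimcommunity/manim",
        "manim", quality, f"{file_stem}.py", class_name,
    ]
    video_path = os.path.join(
        project_path, "media", "videos", file_stem,
        QUALITY_FOLDERS[quality], f"{class_name}.mp4",
    )
    return docker_cmd, video_path

cmd, path = build_render_job("/tmp/project-1", "TriangleScene_final", "TriangleScene", "-qh")
print(" ".join(cmd))
print(path)
```

Building the path with `os.path.join` instead of string concatenation avoids missing or doubled separators between the project folder and `media/videos`.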

In [12]:
# Initialize workflow
manim_code_generation_workflow = ManimCodeGenerationFlow(timeout=None, verbose=True)

# Initialize Context object to share data between steps within the workflow
manim_code_generation_workflow_ctx = Context(manim_code_generation_workflow)

# Visualize all possible paths in workflow
draw_all_possible_flows(ManimCodeGenerationFlow, filename="manim_code_generation_flow.html")

<class 'NoneType'>
<class '__main__.VideoHasIssuesEvent'>
<class '__main__.VideoHasNoIssuesEvent'>
<class '__main__.VideoScreenshotsExtractedEvent'>
<class '__main__.ManimCodeGeneratedEvent'>
<class '__main__.ManimCodeGeneratedEvent'>
<class '__main__.ManimCodeGeneratedEvent'>
<class 'llama_index.core.workflow.events.StopEvent'>
<class '__main__.ManimCodeGeneratedEvent'>
<class '__main__.VideoGenerationSuccessfulEvent'>
<class '__main__.VideoGenerationFailedEvent'>
<class '__main__.GeneratedCodeIsValidEvent'>
<class '__main__.GeneratedCodeIsInvalidEvent'>
<class 'llama_index.core.workflow.events.StopEvent'>
manim_code_generation_flow.html


In [13]:
# Execute workflow
result = await manim_code_generation_workflow.run(video_plan=EXAMPLE_PLAN, settings=code_generation_settings)

# Save workflow execution path to HTML file
draw_most_recent_execution(manim_code_generation_workflow, filename="manim_code_generation_last_execution.html")

Running step generate_initial_manim_code

Generate Python Manim code based on the following plan:
1. **Create Grid**: Draw a coordinate system on paper or screen. Mark the x and y axes for orientation. The grid should be large enough to display all given points.

2. **Mark Points**:
   - **Point A**: Locate (1, 1) on the coordinate system. Move 1 unit along the x-axis from the origin to the right and then 1 unit up along the y-axis. Mark this point as A.
   - **Point B**: Find (5, 1) on the coordinate system. Move 5 units along the x-axis from the origin to the right and stay at the same height (y=1). Mark this point as B.
   - **Point C**: Find (3, 4) on the coordinate system. Move 3 units along the x-axis from the origin to the right and then 4 units up along the y-axis. Mark this point as C.

3. **Label Points**: Label each marked point with its corresponding letter to clearly identify them: Label the point at (1, 1) as A, at (5, 1) as B, and at (3, 4) as C.

4. **Draw Lines**:
   -

In [14]:
# Print the resulting final video path (or last generated video path if final video generation failed)
print(result)

'/home/ubuntu/video_generation/project-1/media/videos/TriangleScene_final/1080p60/TriangleScene.mp4'