# Prompt Testing

This notebook evaluates responses from various LLMs and prompts for generating dailies notes.

In [None]:
# VERSION_TO_REVIEW is the ID of the version under review.
# TRANSCRIPT_START and TRANSCRIPT_END bound the transcript window ("HH:MM:SS") used for that version.

VERSION_TO_REVIEW = 6720
TRANSCRIPT_START = "00:00:00"
TRANSCRIPT_END = "00:01:13"

In [13]:
# Load in Context Data

import json

# Assume we are loading from the location of this notebook.
# Context managers make sure the file handles are closed after reading.
with open('../../../sample_dailies_dataset/shotgrid_data.json') as f:
    version_data = json.load(f)
with open('../../../sample_dailies_dataset/transcript.json') as f:
    transcription = json.load(f)

# Find the version object in the shotgrid data by the "VERSION_TO_REVIEW" ID
version_obj = None
for v in version_data.get('versions', []):
    if v.get('id') == VERSION_TO_REVIEW:
        version_obj = v
        break

if version_obj is None:
    raise ValueError(f"Version with id {VERSION_TO_REVIEW} not found in shotgrid_data.json.")

print(version_obj)

# Collect the transcript segments whose timestamps fall between TRANSCRIPT_START and TRANSCRIPT_END.
# Zero-padded "HH:MM:SS" strings compare correctly as plain strings.
# The early break assumes the utterances are in chronological order.
segments = []
for t in transcription.get('utterances', []):
    ts = t.get('ts', '')
    if ts > TRANSCRIPT_END:
        break
    if ts >= TRANSCRIPT_START:
        segments.append(t)

print(segments)

{'type': 'Version', 'id': 6720, 'code': 'HSM_SATL_0010_TD', 'entity': {'id': 1162, 'name': 'HSM_SATL_0010', 'type': 'Shot', 'sg_cut_in': 1004, 'sg_cut_out': 1017}, 'sg_status_list': 'rev', 'description': 'Lighting Render with the sun in the background, reflecting on the spaceship', 'created_at': '2016-08-15T14:34:22-04:00', 'user': {'id': 123, 'name': 'Sonia Demo', 'type': 'HumanUser'}, 'project': {'id': 85, 'name': 'Hyperspace Mini', 'type': 'Project'}, 'sg_task': {'id': 5632, 'name': 'Lighting', 'type': 'Task', 'step': 'Light'}}
[{'ts': '00:00:00', 'speaker': 'Cameron', 'text': "All right, okay, guys, thank you so much for jumping on. We're going to talk about this demo project with the animation and cuts here. How you guys doing? You guys have a good week so far?"}, {'ts': '00:00:12', 'speaker': 'Lars', 'text': 'Doing well. Good week.'}, {'ts': '00:00:15', 'speaker': 'Cameron', 'text': 'Nice.'}, {'ts': '00:00:16', 'speaker': 'Cameron', 'text': "Okay, so let's start here with HSM, sa
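
The string comparison above only works while every timestamp is zero-padded. As a minimal sketch, assuming the same `ts` key and `HH:MM:SS` layout, the window can instead be selected on numeric seconds. The helper names `ts_to_seconds` and `select_segments` and the `sample` data are hypothetical, not part of the dataset.

```python
def ts_to_seconds(ts: str) -> int:
    """Convert an 'HH:MM:SS' timestamp into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in ts.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def select_segments(utterances, start, end):
    """Return the utterances whose 'ts' falls within [start, end], inclusive."""
    lo, hi = ts_to_seconds(start), ts_to_seconds(end)
    return [u for u in utterances if "ts" in u and lo <= ts_to_seconds(u["ts"]) <= hi]


# Hypothetical sample data; the second timestamp is deliberately not zero-padded.
sample = [
    {"ts": "00:00:00", "speaker": "A", "text": "first"},
    {"ts": "0:00:12", "speaker": "B", "text": "second"},
    {"ts": "00:01:14", "speaker": "A", "text": "third"},
]
selected = select_segments(sample, "00:00:00", "00:01:13")
```

Because this version does not break early, it also does not depend on the utterances being in chronological order.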

In [None]:
# Generate a prompt for the LLM

prompt = """
Purpose and Goals:

* Embody the role of a seasoned CG/VFX coordinator with extensive experience in high-end feature film VFX production.

* Demonstrate a deep understanding of visual effects pipelines and the interdependencies of various departments.

* Accurately interpret and translate notes and feedback from other supervisors into actionable instructions for relevant teams.

* Provide insightful knowledge about industry-standard software and tools used in each VFX department.

* Create notes in shotgrid, our production tracking system, that are clear, concise, and actionable using the provided tools.


Behaviors and Rules:


1) Understanding the Role:

a) Embody the role of a CG/VFX coordinator with a long track record of working on numerous feature film projects.

b) Emphasize your comprehensive understanding of the entire VFX process, from initial concept to final delivery.

c) Highlight your ability to bridge communication gaps between different VFX teams and disciplines.


2) Interpreting and Relaying Information:

a) When presented with notes or feedback, demonstrate the ability to analyze and synthesize the information effectively.

b) Explain your thought process in breaking down complex instructions into clear and concise tasks for specific departments.

c) Provide context and rationale behind the notes to ensure teams understand the creative or technical intent.


3) Knowledge of Tools and Pipelines:

a) When discussing specific tasks or challenges, use this guide as reference for the individual departments:

DMS (Digital Model Shop): This department is responsible for creating detailed 3D models of assets, including characters, vehicles, and environments. They work closely with art directors to ensure models meet the required aesthetic and technical specifications. Tools: Maya, ZBrush, Substance Painter.

CrDev (Creature Development): This group focuses on creating the rigs that drive the digital models, particularly for creatures. These rigs allow animators to pose and animate the models in a realistic and believable way. Tools: Maya, Python (for scripting), proprietary rigging tools.

Lookdev: The Look Development department sets up materials, lighting, and shaders to define the visual appearance of assets. This involves creating the textures, colors, and surface properties that determine how an object looks under different lighting conditions. Tools: Katana, RenderMan.

Viewpaint (Texture): These artists are responsible for creating and applying textures to the models. Tools: Mari, Substance Painter, Photoshop.

Layout: This department handles match-move, motion capture (mocap), tracking, camera work, and scene layout. They are responsible for recreating real-world camera movements in the digital environment and arranging the scene's elements. Tools: Zeno.

Animation: Animators bring the characters and creatures to life by creating their movements and performances. Tools: Maya, motion capture tools, proprietary animation tools.

Creature Simulation: This department deals with simulating the movement and behavior of hair, crowds, flesh, muscles, cloth, and other elements, particularly for creatures. Tools: Houdini, Maya, proprietary simulation tools. Typically we would refer to this if the animation is ready to pass over.

FX Simulations: This group is responsible for creating and simulating visual effects, such as explosions, fire, water, and other dynamic phenomena. Tools: Houdini, proprietary simulation tools.

Generalists / Environments: These artists work on a variety of tasks, often related to creating and integrating environments into the scenes. Tools: Maya, Houdini, Zeno, SpeedTree, terrain generation tools.

Lighting (TD): This department is responsible for lighting the scenes and rendering the final images. They work to create the desired mood and atmosphere. Tools: Katana, RenderMan, Nuke.

Roto/Paint: Roto artists create mattes to isolate elements in a scene, while paint artists remove unwanted elements or blemishes from the footage. Tools: Silhouette FX, Mocha Pro, Nuke.

Compositing: The compositing department combines all the different elements of a shot, such as live-action footage, CG elements, and visual effects, into a final image. Tools: Nuke

R&D (Research and Development): This department develops software at ILM. Tools: C++, Python, SDKs, various software development tools.

Core Pipeline: This department likely supports the integration and workflow of tools. Tools: Scripting languages (Python), database management systems, software deployment tools.

b) Explain how different software packages integrate within the broader VFX pipeline.

c) Demonstrate an understanding of data management and review processes using tools like Shotgrid and RV.


4) Communication Style:

a) Maintain a professional and knowledgeable tone, reflecting the expertise of a senior VFX professional.

b) Use clear and precise language, avoiding jargon where possible or explaining it when necessary.

c) Be solution-oriented and provide constructive guidance.

d) Be incredibly concise with your responses. 

e) Do not use any emojis and do not add any analysis to the note. For example, do not say "This appears to be a note",  "I think this is a good note" or "I think this is a bad note". Just output the note.

Overall Tone:

* Experienced and authoritative.

* Detail-oriented and analytical.

* Collaborative and communicative.

5) Common terminology and phrases:

- Version/Take: Refers to a specific iteration of a task. That is what we are reviewing.
- Shot: A single continuous piece of film or video footage. It is a specific segment of a scene.
- Asset: A digital object or element used in the production, such as a character model, environment, or prop.
- Task: A specific job or assignment within the production pipeline, often assigned to a particular department or artist.
- Feedback: Comments or suggestions provided by supervisors or peers regarding a specific task or version.
- Review: The process of evaluating a version or task to provide feedback and determine if it meets the required standards.
- Pipeline_step: A specific stage in the production process where tasks are completed and reviewed. Also sometimes referred to as a department.
- Unity: Our internal api for querying production data.
- UnityQL: A GraphQL API for accessing production data.
- Shotgrid: Our production tracking system.
- RV: A review tool used for viewing and annotating video footage.

6) Additional Rules:
- You are not allowed to make up any information. You must only use the information provided to you.
- You are not allowed to use any information that is not provided to you.
- If you are not provided with a transcript, you are not allowed to create a note. Return an empty string.
- If you are not provided with a version, you are not allowed to create a note. Return an empty string.
- Never output any text other than the notes. For example, do not output your thought process or analysis of the note. Just output the note.
- When you write notes, always include the department(s) the note is addressing, plus the artist(s) assigned to their respective task if the note concerns them.
- There may be times when elements inside of tasks are called out, e.g., tattoos on an asset. Use your best judgement to pair the comment with the asset in the shot.
- Never introduce yourself in the note. Just get straight to the point and generate the note.
- Do not add soft requests to the note, for example "Keep me up to date on..." or "Let me know if...". Only add hard requests, such as specific actions that need to be taken on the version (for example, what to fix).
- Prioritize actionable feedback over general comments on work quality.

7) Formatting:
- Always format the note in the following format:
<department (pipeline step derived from task)> | <Asset/Shot> | <ALL assigned user(s) assigned to the task>\n
<the note you are generating>

8) Acceptance Criteria:
- The note must be in the format of the example note provided.
- The note must be accurate and relevant to the transcript and version data.
- The note must be clear and concise.
- The note must be complete and not missing any information.
- The note must be free of any errors.
- The note must be free of any emojis.
- The note must be free of any analysis. For example, do not say "This appears to be a note",  "I think this is a good note" or "I think this is a bad note". Just output the header and note.
- The note should follow the guidelines for formatting. 
- The note highlights things that are working well and action items that need to be addressed.



Now, here is the transcript and the version data:
"""
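
The prompt asks the model to return an empty string when the transcript or the version is missing. A minimal sketch of enforcing the same rule on the client side, so that no request is sent at all, could look like the following. The function name `generate_note_or_empty` and the `fake_generate` stand-in are hypothetical; in the notebook, the callable would wrap the actual LLM request.

```python
def generate_note_or_empty(segments, version_obj, generate):
    """Call generate() only when both the transcript and the version are present."""
    if not segments or not version_obj:
        # Mirrors the prompt rule: without a transcript or a version, no note is created.
        return ""
    return generate(segments, version_obj)


def fake_generate(segments, version_obj):
    """Hypothetical stand-in for an LLM call, used only to exercise the guard."""
    return f"note for version {version_obj['id']} from {len(segments)} segment(s)"


empty_note = generate_note_or_empty([], {"id": 6720}, fake_generate)
real_note = generate_note_or_empty([{"ts": "00:00:00"}], {"id": 6720}, fake_generate)
```

Guarding in code is only a complement to the prompt rule: it avoids spending a request on input that cannot produce a valid note.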




In [61]:
# Setup the LLMs and make the requests

import os
import dotenv
import requests
import openai
from pprint import pprint


dotenv.load_dotenv()

PROXY_URL = os.environ.get('PROXY_URL')
PROXY_API_KEY = os.environ.get('PROXY_API_KEY')


LLM_MAP = {
    "gpt-4o": {
        "model": "gpt-4o",
        "api_key": os.environ.get('OPENAI_API_KEY'),
        "type": "openai"
    },
    "gemini-2.5-flash": {
        "model": "gemini-2.5-flash",
        "type": "litellm"
    },
    "gemini-2.5-pro": {
        "model": "gemini-2.5-pro",
        "type": "litellm"
    }
    
}


responses = {}

prompt = f"""
{prompt}

Transcript: 
{json.dumps(segments)}

Version:
{json.dumps(version_obj)}

"""


for model_name, settings in LLM_MAP.items():
    print(f"\n{'='*50}")
    print(f"Model: {settings.get('model')}")
    
    if settings.get('type') == "openai":
        openai_client = openai.OpenAI(api_key=settings.get('api_key'))
        response = openai_client.chat.completions.create(
            model=settings.get('model'),
            messages=[{"role": "user", "content": prompt}]
        )
        
        responses[model_name] = response.choices[0].message.content
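    elif settings.get('type') == "litellm":
        print(f"URL: {PROXY_URL}")
        response = requests.post(
            PROXY_URL,
            headers={
                "Content-Type": "application/json", 
                "Authorization": f"Bearer {PROXY_API_KEY}"
            },
            json={"model": settings.get('model'), "messages": [{"role": "user", "content": prompt}]},
            # NOTE: verify=False disables TLS certificate verification.
            # Prefer passing a CA bundle path where one is available.
            verify=False,
            timeout=120  # Arbitrary upper bound so a stalled proxy cannot hang the cell
        )
        # Fail loudly on HTTP errors instead of silently storing an empty note
        response.raise_for_status()
        response = response.json()
        responses[model_name] = response.get('choices', [{}])[0].get('message', {}).get('content', '')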
        

pprint(responses)





Model: gpt-4o

Model: gemini-2.5-flash
URL: $https://litellm.k8s-prod.ilm-sf.lucasfilm.com/v1/chat/completions





Model: gemini-2.5-pro
URL: $https://litellm.k8s-prod.ilm-sf.lucasfilm.com/v1/chat/completions




{'gemini-2.5-flash': 'Lighting (TD) | HSM_SATL_0010 | Sonia Demo\n'
                     '\n'
                     'Sun, background stars, and droid animation are looking '
                     'great. Team is still working on reflections, '
                     'specifically determining appropriate size. Reflection is '
                     'currently static; needs to be animated.',
 'gemini-2.5-pro': 'Lighting | HSM_SATL_0010 | Sonia Demo\n'
                   '\n'
                   'The sun and stars are looking great. The reflection on the '
                   'spaceship is currently static. Please animate the '
                   'reflection to move correctly with the ship and camera.',
 'gpt-4o': 'Lighting | HSM_SATL_0010 | Sonia Demo\n'
           '\n'
           'Great job on the sun lighting and the background stars. Continue '
           'refining the reflection to determine the appropriate size and '
           'ensure it moves realistically, as its current static nature '


In [62]:
review_prompt = f"""
We have generated a note for a version and transcript being reviewed from multiple LLMs. 

Please review the notes and provide a score for each note on a scale of 1-10, where 1 is the worst and 10 is the best.

Please provide a score for each note and a brief explanation for the score.

Grade notes based on the following criteria:

- Accuracy of the note
- Relevance to the version and transcript
- Clarity of the note
- Completeness of the note
- Adherence to the prompt. 
- Faithfulness: the note does not make up any information, matches the transcript, and adds nothing that was not mentioned in the transcript.

The original prompt was:

{prompt}

The notes generated were:

{json.dumps(responses)}

Return the scores and explanations in a JSON object.

After you have returned the scores and explanations, provide suggestions for improving the prompt to generate better notes.


"""

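# Look up the key explicitly: `settings` is a leftover loop variable that points at the last LLM_MAP entry, which has no api_key.
openai_client = openai.OpenAI(api_key=LLM_MAP["gpt-4o"]["api_key"])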

response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": review_prompt}]
)

pprint(response.choices[0].message.content)

('```json\n'
 '{\n'
 '  "gpt-4o": {\n'
 '    "score": 9,\n'
 '    "explanation": "The note is accurate, relevant, and directly aligned '
 'with the transcript. It clearly addresses the lighting aspect of the shot '
 'and highlights the need for refinement of the reflection. The note is '
 'complete and adheres to the prompt guidelines, using precise language '
 'without unnecessary additions. The only minor drawback is the inclusion of '
 'the comment about the sun, which could be considered outside the actionable '
 'request, but it still relates to feedback given in the meeting."\n'
 '  },\n'
 '  "gemini-2.5-flash": {\n'
 '    "score": 8,\n'
 '    "explanation": "This note accurately reflects the transcript and is '
 'relevant to the feedback discussed. It addresses key points such as the '
 'reflections needing animation. However, it repeats the un-actionable phrase '
 "'sun, background stars, and droid animation are looking great,' which "
 "doesn't set clear tasks without evident 