# Compare Model Performance using the Generative AI Evaluation Service: Challenge Lab

GENAI063

## Task 1

In [1]:
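# vertexai.evaluation (imported below) ships with the google-cloud-aiplatform[evaluation] extra
%pip install --upgrade --quiet google-genai "google-cloud-aiplatform[evaluation]" nest-asyncio==1.5.9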

In [2]:
import pandas as pd
from inspect import cleandoc
from IPython.display import display, Markdown

import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig
from vertexai.evaluation import (
    MetricPromptTemplateExamples,
    EvalTask,
    PairwiseMetric,
    PairwiseMetricPromptTemplate,
    PointwiseMetric,
    PointwiseMetricPromptTemplate,
)

pd.set_option("display.max_colwidth", None)

In [3]:
project_id = 'qwiklabs-gcp-00-ba11daa0a661'  # lab-provided project ID; replace with your own
region = 'us-central1'

vertexai.init(project=project_id, location=region)

## Task 2

In [4]:
hourly_rates = cleandoc("""
  Screenwriter: $40
  Actor: $25
  Director: $30
  Camera Operator: $35
  Sound Engineer: $20
  Editor: $30
  """)

planning_notes = cleandoc("""
 Phases of Production:
   Writing:
   The Screenwriter will write the script.
   They need 72 hours to do so.


   Pre-Production:
   The Director needs time to analyze the script.
   They will work on it for 36 hours.
   The Camera Operator will join the director for 24 hours of planning.


   Production Phase 1
   The first three days of filming will require the director, 4 actors, the camera operator, and the sound engineer


   Production Phase 2
   The next three days of filming will require the director, 8 actors, the camera operator, and the sound engineer


   Post-Production
   The editor will take 64 hours to edit the film.
   The director will work with the editor for 24 hours during this phase.
""")

In [5]:
tasks = [
    """What is the cost of each phase of production?
    If days are mentioned, assume an 8 hour work day.""",

    """How many days will each phase require? Assume an
    8 hour work day. If multiple people are working in parallel,
    do not add those times together, but only use the longest time.
    Also include a count of the total number of days of the entire
    project.""",

    """Prepare a text schedule for all phases of the film starting
    on Feb 3, 2025. The whole crew should be off Saturdays
    and Sundays."""
]
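The first two tasks have exact answers that follow from `hourly_rates` and `planning_notes`, so a small reference calculation gives something concrete to compare the model outputs against. This is a plain-Python sketch, not part of the lab or the Vertex AI SDK. The `PHASES` table is transcribed by hand from the notes, and the sketch assumes the listed rates are per hour (the notes give only dollar amounts) and that each filming day is 8 hours.

```python
HOURS_PER_DAY = 8
FILMING_HOURS = 3 * HOURS_PER_DAY  # "three days of filming" at 8 hours per day

# Assumed hourly rates in USD, copied from hourly_rates.
RATES = {
    "Screenwriter": 40,
    "Actor": 25,
    "Director": 30,
    "Camera Operator": 35,
    "Sound Engineer": 20,
    "Editor": 30,
}

# Each phase lists (role, headcount, hours per person), read off planning_notes.
PHASES = {
    "Writing": [("Screenwriter", 1, 72)],
    "Pre-Production": [("Director", 1, 36), ("Camera Operator", 1, 24)],
    "Production Phase 1": [
        ("Director", 1, FILMING_HOURS),
        ("Actor", 4, FILMING_HOURS),
        ("Camera Operator", 1, FILMING_HOURS),
        ("Sound Engineer", 1, FILMING_HOURS),
    ],
    "Production Phase 2": [
        ("Director", 1, FILMING_HOURS),
        ("Actor", 8, FILMING_HOURS),
        ("Camera Operator", 1, FILMING_HOURS),
        ("Sound Engineer", 1, FILMING_HOURS),
    ],
    "Post-Production": [("Editor", 1, 64), ("Director", 1, 24)],
}


def phase_cost(staff):
    """Labor cost of a phase: rate x headcount x hours, summed over roles."""
    return sum(RATES[role] * count * hours for role, count, hours in staff)


def phase_days(staff):
    """Parallel work is not summed: the longest single assignment sets the length."""
    return max(hours for _, _, hours in staff) / HOURS_PER_DAY


costs = {name: phase_cost(staff) for name, staff in PHASES.items()}
days = {name: phase_days(staff) for name, staff in PHASES.items()}

for name in PHASES:
    print(f"{name}: ${costs[name]:,} over {days[name]} days")
print(f"Total: ${sum(costs.values()):,} over {sum(days.values())} days")
```

Under these assumptions the per-phase costs are $2,880, $1,920, $4,440, $6,840, and $2,640 ($18,720 in total), and the durations are 9, 4.5, 3, 3, and 8 days (27.5 in total), which matches what both models report in the outputs that follow.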

In [6]:
prompt_template = cleandoc("""
  <instructions>
  Prepare a document to fulfill the task based on the context provided.
  </instructions>
  <task>
  {task}
  </task>
  <context>
  {context}
  </context>
  """)
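`inspect.cleandoc` removes only the indentation shared by every non-blank line after the first. If even one line of a template sits at column 0, the common margin is zero and every other line keeps its indentation. The two tiny templates below are illustrative only and show the difference.

```python
from inspect import cleandoc

# Every line shares a 2-space indent, so cleandoc strips it.
uniform = cleandoc("""
  <task>
  {task}
  </task>
  """)

# One line at column 0 makes the common margin zero, so nothing is stripped.
mixed = cleandoc("""
  <task>
  {task}
</task>
  """)

print(repr(uniform))
print(repr(mixed))
```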

In [7]:
llm_pro = GenerativeModel(
  "gemini-2.5-pro-preview-05-06",
  generation_config=GenerationConfig(temperature=0)
)

llm_flash = GenerativeModel(
  "gemini-2.0-flash-001",
  generation_config=GenerationConfig(temperature=0)
)

In [8]:
context = hourly_rates + "\n\n" + planning_notes

prompt = prompt_template.format(task=tasks[1], context=context)

In [9]:
Markdown(llm_pro.generate_content(prompt).text)

Based on the context provided, here is the breakdown of the number of days required for each phase and the total project duration, assuming an 8-hour workday.

### **Writing**

The Screenwriter requires 72 hours to write the script.

*   72 hours / 8 hours per day = **9 days**

### **Pre-Production**

The Director will work for 36 hours, and the Camera Operator will work for 24 hours. As these tasks are performed in parallel, the longest duration is used.

*   Longest time: 36 hours
*   36 hours / 8 hours per day = **4.5 days**

### **Production Phase 1**

The context explicitly states this phase will take **3 days**.

### **Production Phase 2**

The context explicitly states this phase will take **3 days**.

### **Post-Production**

The Editor will work for 64 hours, and the Director will work in parallel for 24 hours. The longest duration is used.

*   Longest time: 64 hours
*   64 hours / 8 hours per day = **8 days**

---

### **Total Project Duration**

The total number of days for the project is the sum of all phases.

*   **Writing:** 9 days
*   **Pre-Production:** 4.5 days
*   **Production Phase 1:** 3 days
*   **Production Phase 2:** 3 days
*   **Post-Production:** 8 days
*   **Total:** 9 + 4.5 + 3 + 3 + 8 = **27.5 days**

In [10]:
Markdown(llm_flash.generate_content(prompt).text)

**Phase Durations (8-hour workday):**

*   **Writing:** 72 hours / 8 hours/day = 9 days (Screenwriter)
*   **Pre-Production:**
    *   Director: 36 hours / 8 hours/day = 4.5 days
    *   Camera Operator: 24 hours / 8 hours/day = 3 days
    *   Longest time: 4.5 days
*   **Production Phase 1:** 3 days (Director, 4 Actors, Camera Operator, Sound Engineer)
*   **Production Phase 2:** 3 days (Director, 8 Actors, Camera Operator, Sound Engineer)
*   **Post-Production:**
    *   Editor: 64 hours / 8 hours/day = 8 days
    *   Director: 24 hours / 8 hours/day = 3 days
    *   Longest time: 8 days

**Total Project Duration:**

9 days (Writing) + 4.5 days (Pre-Production) + 3 days (Production Phase 1) + 3 days (Production Phase 2) + 8 days (Post-Production) = **27.5 days**
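Alongside the model-based comparison in Task 4, a cheap deterministic check can catch arithmetic slips in answers like these. The sketch below is a heuristic, not an SDK feature: it assumes the last `**<number> days**` in a response is the stated project total, which holds for both responses above but is not guaranteed in general. The excerpts are copied from those responses.

```python
import re
from typing import Optional

# Assumption: the final bolded "**N days**" in a response is its stated total.
BOLD_DAYS = re.compile(r"\*\*(\d+(?:\.\d+)?) days\*\*")

EXPECTED_TOTAL = 27.5  # 9 + 4.5 + 3 + 3 + 8


def stated_total_days(response_text: str) -> Optional[float]:
    """Return the last bolded day count in the text, or None if there is none."""
    matches = BOLD_DAYS.findall(response_text)
    return float(matches[-1]) if matches else None


pro_excerpt = (
    "*   72 hours / 8 hours per day = **9 days**\n"
    "*   **Total:** 9 + 4.5 + 3 + 3 + 8 = **27.5 days**"
)
flash_excerpt = (
    "9 days (Writing) + 4.5 days (Pre-Production) + 3 days (Production Phase 1) + "
    "3 days (Production Phase 2) + 8 days (Post-Production) = **27.5 days**"
)

for name, text in [("pro", pro_excerpt), ("flash", flash_excerpt)]:
    print(name, stated_total_days(text) == EXPECTED_TOTAL)
```

In the notebook, the same function could be applied to the full `.text` of each response instead of the excerpts.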


## Task 3

In [11]:
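prompts = []
baseline_model_responses = []  # Gemini 2.5 Pro output: the baseline
responses = []                 # Gemini 2.0 Flash output: the candidate being evaluated

# Send every task to both models with the same prompt.
for task in tasks:
  prompt = prompt_template.format(task=task, context=context)
  prompts.append(prompt)
  baseline_model_responses.append(llm_pro.generate_content(prompt).text)
  responses.append(llm_flash.generate_content(prompt).text)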


In [12]:
eval_dataset = pd.DataFrame(
  {
    "prompt":  prompts,
    "response":  responses,
    "baseline_model_response": baseline_model_responses,
  }
)

eval_dataset

,prompt,response,baseline_model_response
0,"<instructions>\n Prepare a document to fulfill the task based on the context provided.\n </instructions>\n<task>\n What is the cost of each phase of production?\n If days are mentioned, assume an 8 hour work day.\n </task>\n<context>\n Screenwriter: $40\nActor: $25\nDirector: $30\nCamera Operator: $35\nSound Engineer: $20\nEditor: $30\n\nPhases of Production:\n Writing:\n The Screenwriter will write the script.\n They need 72 hours to do so.\n\n\n Pre-Production:\n The Director needs time to analyze the script.\n They will work on it for 36 hours.\n The Camera Operator will join the director for 24 hours of planning.\n\n\n Production Phase 1\n The first three days of filming will require the director, 4 actors, the camera operator, and the sound engineer\n\n\n Production Phase 2\n The next three days of filming will require the director, 8 actors, the camera operator, and the sound engineer\n\n\n Post-Production\n The editor will take 64 hours to edit the film.\n The director will work with the editor for 24 hours during this phase.\n </context>\n",Here's a breakdown of the cost for each phase of production:\n\n**Rates:**\n\n* Screenwriter: $40/hour\n* Actor: $25/hour\n* Director: $30/hour\n* Camera Operator: $35/hour\n* Sound Engineer: $20/hour\n* Editor: $30/hour\n\n**Phase Costs:**\n\n* **Writing:**\n * Screenwriter: 72 hours * $40/hour = $2880\n * **Total Writing Cost: $2880**\n\n* **Pre-Production:**\n * Director: 36 hours * $30/hour = $1080\n * Camera Operator: 24 hours * $35/hour = $840\n * **Total Pre-Production Cost: $1920**\n\n* **Production Phase 1 (3 days = 24 hours):**\n * Director: 24 hours * $30/hour = $720\n * Actors (4): 24 hours * $25/hour * 4 actors = $2400\n * Camera Operator: 24 hours * $35/hour = $840\n * Sound Engineer: 24 hours * $20/hour = $480\n * **Total Production Phase 1 Cost: $4440**\n\n* **Production Phase 2 (3 days = 24 hours):**\n * Director: 24 hours * $30/hour = $720\n * Actors (8): 24 hours * $25/hour * 8 actors = $4800\n * Camera Operator: 24 hours * $35/hour = $840\n * Sound Engineer: 24 hours * $20/hour = $480\n * **Total Production Phase 2 Cost: $6840**\n\n* **Post-Production:**\n * Editor: 64 hours * $30/hour = $1920\n * Director: 24 hours * $30/hour = $720\n * **Total Post-Production Cost: $2640**\n,"Based on the context provided, here is the cost for each phase of production.\n\n### **Writing**\n\nThis phase involves the Screenwriter creating the script.\n\n* Screenwriter: 72 hours @ $40/hour = $2,880.00\n\n**Total Writing Cost: $2,880.00**\n\n---\n\n### **Pre-Production**\n\nThis phase involves the Director and Camera Operator planning and analyzing the script.\n\n* Director: 36 hours @ $30/hour = $1,080.00\n* Camera Operator: 24 hours @ $35/hour = $840.00\n\n**Total Pre-Production Cost: $1,920.00**\n\n---\n\n### **Production Phase 1**\n\nThis phase covers the first three days of filming (3 days x 8 hours/day = 24 hours).\n\n* Director: 24 hours @ $30/hour = $720.00\n* 4 Actors: 24 hours x 4 @ $25/hour = $2,400.00\n* Camera Operator: 24 hours @ $35/hour = $840.00\n* Sound Engineer: 24 hours @ $20/hour = $480.00\n\n**Total Production Phase 1 Cost: $4,440.00**\n\n---\n\n### **Production Phase 2**\n\nThis phase covers the next three days of filming (3 days x 8 hours/day = 24 hours).\n\n* Director: 24 hours @ $30/hour = $720.00\n* 8 Actors: 24 hours x 8 @ $25/hour = $4,800.00\n* Camera Operator: 24 hours @ $35/hour = $840.00\n* Sound Engineer: 24 hours @ $20/hour = $480.00\n\n**Total Production Phase 2 Cost: $6,840.00**\n\n---\n\n### **Post-Production**\n\nThis phase involves the Editor and Director finalizing the film.\n\n* Editor: 64 hours @ $30/hour = $1,920.00\n* Director: 24 hours @ $30/hour = $720.00\n\n**Total Post-Production Cost: $2,640.00**"
1,"<instructions>\n Prepare a document to fulfill the task based on the context provided.\n </instructions>\n<task>\n How many days will each phase require? Assume an\n 8 hour work day. If multiple people are working in parallel,\n do not add those times together, but only use the longest time.\n Also include a count of the total number of days of the entire\n project.\n </task>\n<context>\n Screenwriter: $40\nActor: $25\nDirector: $30\nCamera Operator: $35\nSound Engineer: $20\nEditor: $30\n\nPhases of Production:\n Writing:\n The Screenwriter will write the script.\n They need 72 hours to do so.\n\n\n Pre-Production:\n The Director needs time to analyze the script.\n They will work on it for 36 hours.\n The Camera Operator will join the director for 24 hours of planning.\n\n\n Production Phase 1\n The first three days of filming will require the director, 4 actors, the camera operator, and the sound engineer\n\n\n Production Phase 2\n The next three days of filming will require the director, 8 actors, the camera operator, and the sound engineer\n\n\n Post-Production\n The editor will take 64 hours to edit the film.\n The director will work with the editor for 24 hours during this phase.\n </context>\n","**Phase Durations (8-hour workday):**\n\n* **Writing:** 72 hours / 8 hours/day = 9 days (Screenwriter)\n* **Pre-Production:**\n * Director: 36 hours / 8 hours/day = 4.5 days\n * Camera Operator: 24 hours / 8 hours/day = 3 days\n * Longest time: 4.5 days\n* **Production Phase 1:** 3 days (Director, 4 Actors, Camera Operator, Sound Engineer)\n* **Production Phase 2:** 3 days (Director, 8 Actors, Camera Operator, Sound Engineer)\n* **Post-Production:**\n * Editor: 64 hours / 8 hours/day = 8 days\n * Director: 24 hours / 8 hours/day = 3 days\n * Longest time: 8 days\n\n**Total Project Duration:**\n\n9 days (Writing) + 4.5 days (Pre-Production) + 3 days (Production Phase 1) + 3 days (Production Phase 2) + 8 days (Post-Production) = **27.5 days**\n","Based on the context provided, here is a breakdown of the days required for each phase and the total project duration, assuming an 8-hour workday.\n\n### **Phase Durations**\n\n**Writing:**\nThe Screenwriter requires 72 hours to complete the script.\n* 72 hours / 8 hours per day = **9 days**\n\n**Pre-Production:**\nThe Director will work for 36 hours, and the Camera Operator will work for 24 hours. Since these activities overlap, we use the longest duration.\n* 36 hours / 8 hours per day = **4.5 days**\n\n**Production Phase 1:**\nThe duration for the first phase of filming is explicitly stated.\n* **3 days**\n\n**Production Phase 2:**\nThe duration for the second phase of filming is explicitly stated.\n* **3 days**\n\n**Post-Production:**\nThe Editor will work for 64 hours, and the Director will work alongside them for 24 of those hours. We use the longest duration.\n* 64 hours / 8 hours per day = **8 days**\n\n---\n\n### **Total Project Duration**\n\nTo find the total number of days for the project, we add the duration of each phase.\n\n* **Writing:** 9 days\n* **Pre-Production:** 4.5 days\n* **Production Phase 1:** 3 days\n* **Production Phase 2:** 3 days\n* **Post-Production:** 8 days\n\n**Total Project Days: 27.5 days**"
2,"<instructions>\n Prepare a document to fulfill the task based on the context provided.\n </instructions>\n<task>\n Prepare a text schedule for all phases of the film starting\n on Feb 3, 2025. The whole crew should be off Saturdays\n and Sundays.\n </task>\n<context>\n Screenwriter: $40\nActor: $25\nDirector: $30\nCamera Operator: $35\nSound Engineer: $20\nEditor: $30\n\nPhases of Production:\n Writing:\n The Screenwriter will write the script.\n They need 72 hours to do so.\n\n\n Pre-Production:\n The Director needs time to analyze the script.\n They will work on it for 36 hours.\n The Camera Operator will join the director for 24 hours of planning.\n\n\n Production Phase 1\n The first three days of filming will require the director, 4 actors, the camera operator, and the sound engineer\n\n\n Production Phase 2\n The next three days of filming will require the director, 8 actors, the camera operator, and the sound engineer\n\n\n Post-Production\n The editor will take 64 hours to edit the film.\n The director will work with the editor for 24 hours during this phase.\n </context>\n","Okay, here's a production schedule for the film, starting February 3rd, 2025, with weekends off:\n\n**Film Production Schedule (Starting Feb 3, 2025)**\n\n**Key:**\n\n* SC = Screenwriter\n* D = Director\n* CO = Camera Operator\n* SE = Sound Engineer\n* E = Editor\n* A = Actor(s)\n\n**Phase 1: Writing (Screenwriter - 72 hours)**\n\n* **Week 1:**\n * Mon, Feb 3: SC - 8 hours\n * Tue, Feb 4: SC - 8 hours\n * Wed, Feb 5: SC - 8 hours\n * Thu, Feb 6: SC - 8 hours\n * Fri, Feb 7: SC - 8 hours\n* **Week 2:**\n * Mon, Feb 10: SC - 8 hours\n * Tue, Feb 11: SC - 8 hours\n * Wed, Feb 12: SC - 8 hours\n * Thu, Feb 13: SC - 8 hours\n\n**Phase 2: Pre-Production (Director - 36 hours, Camera Operator - 24 hours)**\n\n* **Week 2:**\n * Fri, Feb 14: D - 8 hours\n* **Week 3:**\n * Mon, Feb 17: D - 8 hours, CO - 8 hours\n * Tue, Feb 18: D - 8 hours, CO - 8 hours\n * Wed, Feb 19: D - 8 hours, CO - 8 hours\n * Thu, Feb 20: D - 4 hours\n\n**Phase 3: Production Phase 1 (Director, 4 Actors, Camera Operator, Sound Engineer)**\n\n* **Week 3:**\n * Fri, Feb 21: D, 4A, CO, SE - 8 hours\n* **Week 4:**\n * Mon, Feb 24: D, 4A, CO, SE - 8 hours\n * Tue, Feb 25: D, 4A, CO, SE - 8 hours\n\n**Phase 4: Production Phase 2 (Director, 8 Actors, Camera Operator, Sound Engineer)**\n\n* **Week 4:**\n * Wed, Feb 26: D, 8A, CO, SE - 8 hours\n * Thu, Feb 27: D, 8A, CO, SE - 8 hours\n * Fri, Feb 28: D, 8A, CO, SE - 8 hours\n\n**Phase 5: Post-Production (Editor - 64 hours, Director - 24 hours)**\n\n* **Week 5:**\n * Mon, Mar 3: E - 8 hours\n * Tue, Mar 4: E - 8 hours\n * Wed, Mar 5: E - 8 hours\n * Thu, Mar 6: E - 8 hours\n * Fri, Mar 7: E - 8 hours\n* **Week 6:**\n * Mon, Mar 10: E - 8 hours\n * Tue, Mar 11: E - 8 hours\n * Wed, Mar 12: E - 8 hours\n * Thu, Mar 13: D, E - 8 hours\n * Fri, Mar 14: D, E - 8 hours\n* **Week 7:**\n * Mon, Mar 17: D, E - 8 hours\n\n**Notes:**\n\n* This schedule assumes 8-hour workdays.\n* This schedule is a template and can be adjusted based on the specific needs of the production.\n* Travel time for cast and crew is not factored in.\n* Buffer days are not included. Consider adding buffer days between phases to account for potential delays.\n* The number of actors needed each day is specified (4A, 8A).\n","Of course. Here is the text schedule for all phases of the film, starting on February 3, 2025.\n\n***\n\n### **Film Production Schedule**\n\n**Start Date:** Monday, February 3, 2025\n\n**General Notes:**\n* This schedule is based on a standard 8-hour workday.\n* The entire crew is off on Saturdays and Sundays.\n* All phases are scheduled sequentially.\n\n---\n\n### **Phase 1: Writing**\nThe Screenwriter will write the script.\n\n* **Personnel:** Screenwriter\n* **Duration:** 72 hours (9 workdays)\n* **Schedule:** Monday, Feb 3, 2025 – Thursday, Feb 13, 2025\n\n---\n\n### **Phase 2: Pre-Production**\nThe Director and Camera Operator will analyze the script and plan the shoot. The Director's time is the longest, determining the phase length.\n\n* **Personnel:** Director, Camera Operator\n* **Duration:** 36 hours (4.5 workdays)\n* **Schedule:** Friday, Feb 14, 2025 – Thursday, Feb 20, 2025\n * **Director (36 hours):** Works all 4.5 days of this phase.\n * **Camera Operator (24 hours):** Joins the Director for the final 3 days (Tuesday, Feb 18 – Thursday, Feb 20).\n\n---\n\n### **Phase 3: Production Phase 1**\nThe first block of filming.\n\n* **Personnel:** Director, 4 Actors, Camera Operator, Sound Engineer\n* **Duration:** 3 workdays\n* **Schedule:** Friday, Feb 21, 2025 – Tuesday, Feb 25, 2025\n * *Note: Skips the weekend of Feb 22-23.*\n\n---\n\n### **Phase 4: Production Phase 2**\nThe second block of filming.\n\n* **Personnel:** Director, 8 Actors, Camera Operator, Sound Engineer\n* **Duration:** 3 workdays\n* **Schedule:** Wednesday, Feb 26, 2025 – Friday, Feb 28, 2025\n\n---\n\n### **Phase 5: Post-Production**\nThe Editor assembles the film, with the Director joining for key review sessions. The Editor's time is the longest, determining the phase length.\n\n* **Personnel:** Editor, Director\n* **Duration:** 64 hours (8 workdays)\n* **Schedule:** Monday, Mar 3, 2025 – Wednesday, Mar 12, 2025\n * **Editor (64 hours):** Works all 8 days of this phase.\n * **Director (24 hours):** Joins the Editor for the final 3 days for review (Monday, Mar 10 – Wednesday, Mar 12).\n\n---\n**Project Conclusion:** Wednesday, March 12, 2025"
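Because no `baseline_model` is passed to the metric in Task 4, the pairwise comparison relies on the precomputed `response` and `baseline_model_response` columns of `eval_dataset`. A quick guard like the one below can catch a missing column or a blank response before any Evaluation Service requests are made. `find_dataset_problems` is a hypothetical helper, not part of the Vertex AI SDK.

```python
import pandas as pd

# Hypothetical helper; the column names mirror the eval_dataset built above.
REQUIRED_COLUMNS = ("prompt", "response", "baseline_model_response")


def find_dataset_problems(df: pd.DataFrame) -> list:
    """Return the required columns that are missing or contain blank text."""
    problems = []
    for col in REQUIRED_COLUMNS:
        if col not in df.columns:
            problems.append(col)
        elif df[col].fillna("").astype(str).str.strip().eq("").any():
            problems.append(col)
    return problems


good = pd.DataFrame({"prompt": ["p"], "response": ["a"], "baseline_model_response": ["b"]})
blank = good.assign(response=["   "])
missing = good.drop(columns=["baseline_model_response"])

print(find_dataset_problems(good))     # []
print(find_dataset_problems(blank))    # ['response']
print(find_dataset_problems(missing))  # ['baseline_model_response']
```

In the notebook, `find_dataset_problems(eval_dataset)` would be called just before building the `EvalTask`.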


## Task 4

In [13]:
eval_task = EvalTask(
    dataset=eval_dataset,
    metrics= [MetricPromptTemplateExamples.Pairwise.QUESTION_ANSWERING_QUALITY],
    experiment="indie-film-planning",
)

eval_result = eval_task.evaluate()

INFO:vertexai.evaluation._evaluation:Computing metrics with a total of 3 Vertex Gen AI Evaluation Service API requests.
100%|██████████| 3/3 [00:01<00:00,  2.11it/s]
INFO:vertexai.evaluation._evaluation:All 3 metric requests are successfully computed.
INFO:vertexai.evaluation._evaluation:Evaluation Took:1.436258837999958 seconds


In [14]:
from vertexai.preview.evaluation import notebook_utils
notebook_utils.display_eval_result(eval_result)

### Summary Metrics

| | row_count | pairwise_question_answering_quality/candidate_model_win_rate | pairwise_question_answering_quality/baseline_model_win_rate |
|---|---|---|---|
| 0 | 3.0 | 0.333333 | 0.0 |


### Row-based Metrics

,prompt,response,baseline_model_response,pairwise_question_answering_quality/explanation,pairwise_question_answering_quality/pairwise_choice
0,"<instructions>\n Prepare a document to fulfill the task based on the context provided.\n </instructions>\n<task>\n What is the cost of each phase of production?\n If days are mentioned, assume an 8 hour work day.\n </task>\n<context>\n Screenwriter: $40\nActor: $25\nDirector: $30\nCamera Operator: $35\nSound Engineer: $20\nEditor: $30\n\nPhases of Production:\n Writing:\n The Screenwriter will write the script.\n They need 72 hours to do so.\n\n\n Pre-Production:\n The Director needs time to analyze the script.\n They will work on it for 36 hours.\n The Camera Operator will join the director for 24 hours of planning.\n\n\n Production Phase 1\n The first three days of filming will require the director, 4 actors, the camera operator, and the sound engineer\n\n\n Production Phase 2\n The next three days of filming will require the director, 8 actors, the camera operator, and the sound engineer\n\n\n Post-Production\n The editor will take 64 hours to edit the film.\n The director will work with the editor for 24 hours during this phase.\n </context>\n",Here's a breakdown of the cost for each phase of production:\n\n**Rates:**\n\n* Screenwriter: $40/hour\n* Actor: $25/hour\n* Director: $30/hour\n* Camera Operator: $35/hour\n* Sound Engineer: $20/hour\n* Editor: $30/hour\n\n**Phase Costs:**\n\n* **Writing:**\n * Screenwriter: 72 hours * $40/hour = $2880\n * **Total Writing Cost: $2880**\n\n* **Pre-Production:**\n * Director: 36 hours * $30/hour = $1080\n * Camera Operator: 24 hours * $35/hour = $840\n * **Total Pre-Production Cost: $1920**\n\n* **Production Phase 1 (3 days = 24 hours):**\n * Director: 24 hours * $30/hour = $720\n * Actors (4): 24 hours * $25/hour * 4 actors = $2400\n * Camera Operator: 24 hours * $35/hour = $840\n * Sound Engineer: 24 hours * $20/hour = $480\n * **Total Production Phase 1 Cost: $4440**\n\n* **Production Phase 2 (3 days = 24 hours):**\n * Director: 24 hours * $30/hour = $720\n * Actors (8): 24 hours * $25/hour * 8 actors = $4800\n * Camera Operator: 24 hours * $35/hour = $840\n * Sound Engineer: 24 hours * $20/hour = $480\n * **Total Production Phase 2 Cost: $6840**\n\n* **Post-Production:**\n * Editor: 64 hours * $30/hour = $1920\n * Director: 24 hours * $30/hour = $720\n * **Total Post-Production Cost: $2640**\n,"Based on the context provided, here is the cost for each phase of production.\n\n### **Writing**\n\nThis phase involves the Screenwriter creating the script.\n\n* Screenwriter: 72 hours @ $40/hour = $2,880.00\n\n**Total Writing Cost: $2,880.00**\n\n---\n\n### **Pre-Production**\n\nThis phase involves the Director and Camera Operator planning and analyzing the script.\n\n* Director: 36 hours @ $30/hour = $1,080.00\n* Camera Operator: 24 hours @ $35/hour = $840.00\n\n**Total Pre-Production Cost: $1,920.00**\n\n---\n\n### **Production Phase 1**\n\nThis phase covers the first three days of filming (3 days x 8 hours/day = 24 hours).\n\n* Director: 24 hours @ $30/hour = $720.00\n* 4 Actors: 24 hours x 4 @ $25/hour = $2,400.00\n* Camera Operator: 24 hours @ $35/hour = $840.00\n* Sound Engineer: 24 hours @ $20/hour = $480.00\n\n**Total Production Phase 1 Cost: $4,440.00**\n\n---\n\n### **Production Phase 2**\n\nThis phase covers the next three days of filming (3 days x 8 hours/day = 24 hours).\n\n* Director: 24 hours @ $30/hour = $720.00\n* 8 Actors: 24 hours x 8 @ $25/hour = $4,800.00\n* Camera Operator: 24 hours @ $35/hour = $840.00\n* Sound Engineer: 24 hours @ $20/hour = $480.00\n\n**Total Production Phase 2 Cost: $6,840.00**\n\n---\n\n### **Post-Production**\n\nThis phase involves the Editor and Director finalizing the film.\n\n* Editor: 64 hours @ $30/hour = $1,920.00\n* Director: 24 hours @ $30/hour = $720.00\n\n**Total Post-Production Cost: $2,640.00**","Both responses followed the prompt and produced the exact same information. They are both equally well, and there is no reason to prefer one over the other.",TIE
1,"<instructions>\n Prepare a document to fulfill the task based on the context provided.\n </instructions>\n<task>\n How many days will each phase require? Assume an\n 8 hour work day. If multiple people are working in parallel,\n do not add those times together, but only use the longest time.\n Also include a count of the total number of days of the entire\n project.\n </task>\n<context>\n Screenwriter: $40\nActor: $25\nDirector: $30\nCamera Operator: $35\nSound Engineer: $20\nEditor: $30\n\nPhases of Production:\n Writing:\n The Screenwriter will write the script.\n They need 72 hours to do so.\n\n\n Pre-Production:\n The Director needs time to analyze the script.\n They will work on it for 36 hours.\n The Camera Operator will join the director for 24 hours of planning.\n\n\n Production Phase 1\n The first three days of filming will require the director, 4 actors, the camera operator, and the sound engineer\n\n\n Production Phase 2\n The next three days of filming will require the director, 8 actors, the camera operator, and the sound engineer\n\n\n Post-Production\n The editor will take 64 hours to edit the film.\n The director will work with the editor for 24 hours during this phase.\n </context>\n","**Phase Durations (8-hour workday):**\n\n* **Writing:** 72 hours / 8 hours/day = 9 days (Screenwriter)\n* **Pre-Production:**\n * Director: 36 hours / 8 hours/day = 4.5 days\n * Camera Operator: 24 hours / 8 hours/day = 3 days\n * Longest time: 4.5 days\n* **Production Phase 1:** 3 days (Director, 4 Actors, Camera Operator, Sound Engineer)\n* **Production Phase 2:** 3 days (Director, 8 Actors, Camera Operator, Sound Engineer)\n* **Post-Production:**\n * Editor: 64 hours / 8 hours/day = 8 days\n * Director: 24 hours / 8 hours/day = 3 days\n * Longest time: 8 days\n\n**Total Project Duration:**\n\n9 days (Writing) + 4.5 days (Pre-Production) + 3 days (Production Phase 1) + 3 days (Production Phase 2) + 8 days (Post-Production) = **27.5 days**\n","Based on the context provided, here is a breakdown of the days required for each phase and the total project duration, assuming an 8-hour workday.\n\n### **Phase Durations**\n\n**Writing:**\nThe Screenwriter requires 72 hours to complete the script.\n* 72 hours / 8 hours per day = **9 days**\n\n**Pre-Production:**\nThe Director will work for 36 hours, and the Camera Operator will work for 24 hours. Since these activities overlap, we use the longest duration.\n* 36 hours / 8 hours per day = **4.5 days**\n\n**Production Phase 1:**\nThe duration for the first phase of filming is explicitly stated.\n* **3 days**\n\n**Production Phase 2:**\nThe duration for the second phase of filming is explicitly stated.\n* **3 days**\n\n**Post-Production:**\nThe Editor will work for 64 hours, and the Director will work alongside them for 24 of those hours. We use the longest duration.\n* 64 hours / 8 hours per day = **8 days**\n\n---\n\n### **Total Project Duration**\n\nTo find the total number of days for the project, we add the duration of each phase.\n\n* **Writing:** 9 days\n* **Pre-Production:** 4.5 days\n* **Production Phase 1:** 3 days\n* **Production Phase 2:** 3 days\n* **Post-Production:** 8 days\n\n**Total Project Days: 27.5 days**","Both responses accurately calculate the durations for each phase and the total project duration based on the provided context, demonstrating a clear understanding of the instructions and constraints.",TIE
2,"<instructions>\n Prepare a document to fulfill the task based on the context provided.\n </instructions>\n<task>\n Prepare a text schedule for all phases of the film starting\n on Feb 3, 2025. The whole crew should be off Saturdays\n and Sundays.\n </task>\n<context>\n Screenwriter: $40\nActor: $25\nDirector: $30\nCamera Operator: $35\nSound Engineer: $20\nEditor: $30\n\nPhases of Production:\n Writing:\n The Screenwriter will write the script.\n They need 72 hours to do so.\n\n\n Pre-Production:\n The Director needs time to analyze the script.\n They will work on it for 36 hours.\n The Camera Operator will join the director for 24 hours of planning.\n\n\n Production Phase 1\n The first three days of filming will require the director, 4 actors, the camera operator, and the sound engineer\n\n\n Production Phase 2\n The next three days of filming will require the director, 8 actors, the camera operator, and the sound engineer\n\n\n Post-Production\n The editor will take 64 hours to edit the film.\n The director will work with the editor for 24 hours during this phase.\n </context>\n","Okay, here's a production schedule for the film, starting February 3rd, 2025, with weekends off:\n\n**Film Production Schedule (Starting Feb 3, 2025)**\n\n**Key:**\n\n* SC = Screenwriter\n* D = Director\n* CO = Camera Operator\n* SE = Sound Engineer\n* E = Editor\n* A = Actor(s)\n\n**Phase 1: Writing (Screenwriter - 72 hours)**\n\n* **Week 1:**\n * Mon, Feb 3: SC - 8 hours\n * Tue, Feb 4: SC - 8 hours\n * Wed, Feb 5: SC - 8 hours\n * Thu, Feb 6: SC - 8 hours\n * Fri, Feb 7: SC - 8 hours\n* **Week 2:**\n * Mon, Feb 10: SC - 8 hours\n * Tue, Feb 11: SC - 8 hours\n * Wed, Feb 12: SC - 8 hours\n * Thu, Feb 13: SC - 8 hours\n\n**Phase 2: Pre-Production (Director - 36 hours, Camera Operator - 24 hours)**\n\n* **Week 2:**\n * Fri, Feb 14: D - 8 hours\n* **Week 3:**\n * Mon, Feb 17: D - 8 hours, CO - 8 hours\n * Tue, Feb 18: D - 8 hours, CO - 8 hours\n * Wed, Feb 19: D - 8 hours, CO - 8 hours\n * Thu, Feb 20: D - 4 hours\n\n**Phase 3: Production Phase 1 (Director, 4 Actors, Camera Operator, Sound Engineer)**\n\n* **Week 3:**\n * Fri, Feb 21: D, 4A, CO, SE - 8 hours\n* **Week 4:**\n * Mon, Feb 24: D, 4A, CO, SE - 8 hours\n * Tue, Feb 25: D, 4A, CO, SE - 8 hours\n\n**Phase 4: Production Phase 2 (Director, 8 Actors, Camera Operator, Sound Engineer)**\n\n* **Week 4:**\n * Wed, Feb 26: D, 8A, CO, SE - 8 hours\n * Thu, Feb 27: D, 8A, CO, SE - 8 hours\n * Fri, Feb 28: D, 8A, CO, SE - 8 hours\n\n**Phase 5: Post-Production (Editor - 64 hours, Director - 24 hours)**\n\n* **Week 5:**\n * Mon, Mar 3: E - 8 hours\n * Tue, Mar 4: E - 8 hours\n * Wed, Mar 5: E - 8 hours\n * Thu, Mar 6: E - 8 hours\n * Fri, Mar 7: E - 8 hours\n* **Week 6:**\n * Mon, Mar 10: E - 8 hours\n * Tue, Mar 11: E - 8 hours\n * Wed, Mar 12: E - 8 hours\n * Thu, Mar 13: D, E - 8 hours\n * Fri, Mar 14: D, E - 8 hours\n* **Week 7:**\n * Mon, Mar 17: D, E - 8 hours\n\n**Notes:**\n\n* This schedule assumes 8-hour workdays.\n* This schedule is a template and can be adjusted based on the specific needs of the production.\n* Travel time for cast and crew is not factored in.\n* Buffer days are not included. Consider adding buffer days between phases to account for potential delays.\n* The number of actors needed each day is specified (4A, 8A).\n","Of course. Here is the text schedule for all phases of the film, starting on February 3, 2025.\n\n***\n\n### **Film Production Schedule**\n\n**Start Date:** Monday, February 3, 2025\n\n**General Notes:**\n* This schedule is based on a standard 8-hour workday.\n* The entire crew is off on Saturdays and Sundays.\n* All phases are scheduled sequentially.\n\n---\n\n### **Phase 1: Writing**\nThe Screenwriter will write the script.\n\n* **Personnel:** Screenwriter\n* **Duration:** 72 hours (9 workdays)\n* **Schedule:** Monday, Feb 3, 2025 – Thursday, Feb 13, 2025\n\n---\n\n### **Phase 2: Pre-Production**\nThe Director and Camera Operator will analyze the script and plan the shoot. The Director's time is the longest, determining the phase length.\n\n* **Personnel:** Director, Camera Operator\n* **Duration:** 36 hours (4.5 workdays)\n* **Schedule:** Friday, Feb 14, 2025 – Thursday, Feb 20, 2025\n * **Director (36 hours):** Works all 4.5 days of this phase.\n * **Camera Operator (24 hours):** Joins the Director for the final 3 days (Tuesday, Feb 18 – Thursday, Feb 20).\n\n---\n\n### **Phase 3: Production Phase 1**\nThe first block of filming.\n\n* **Personnel:** Director, 4 Actors, Camera Operator, Sound Engineer\n* **Duration:** 3 workdays\n* **Schedule:** Friday, Feb 21, 2025 – Tuesday, Feb 25, 2025\n * *Note: Skips the weekend of Feb 22-23.*\n\n---\n\n### **Phase 4: Production Phase 2**\nThe second block of filming.\n\n* **Personnel:** Director, 8 Actors, Camera Operator, Sound Engineer\n* **Duration:** 3 workdays\n* **Schedule:** Wednesday, Feb 26, 2025 – Friday, Feb 28, 2025\n\n---\n\n### **Phase 5: Post-Production**\nThe Editor assembles the film, with the Director joining for key review sessions. The Editor's time is the longest, determining the phase length.\n\n* **Personnel:** Editor, Director\n* **Duration:** 64 hours (8 workdays)\n* **Schedule:** Monday, Mar 3, 2025 – Wednesday, Mar 12, 2025\n * **Editor (64 hours):** Works all 8 days of this phase.\n * **Director (24 hours):** Joins the Editor for the final 3 days for review (Monday, Mar 10 – Wednesday, Mar 12).\n\n---\n**Project Conclusion:** Wednesday, March 12, 2025","Model A is superior as it accurately computes the end dates for each phase, adhering to the constraints of only scheduling work on weekdays, which Model B fails to do.",CANDIDATE
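Row 2 is the only row where the autorater states a preference, and the schedule task is also easy to check mechanically. The standard-library sketch below makes assumptions that the lab does not state: phases run back to back, each occupies `ceil(hours / 8)` weekdays (so Pre-Production's 36 hours uses part of a fifth day, as both responses do), and only Saturdays and Sundays are skipped, with no holidays.

```python
import math
from datetime import date, timedelta

HOURS_PER_DAY = 8


def next_workday(d: date) -> date:
    """Return d itself, or the following Monday if d is a Saturday or Sunday."""
    while d.weekday() >= 5:  # weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
        d += timedelta(days=1)
    return d


def build_schedule(start: date, phase_hours: dict) -> dict:
    """Lay phases end to end on weekdays; returns {phase: (first_day, last_day)}."""
    plan = {}
    day = next_workday(start)
    for phase, hours in phase_hours.items():
        # A partial day (e.g. 36 h = 4.5 days) still occupies its final workday.
        workdays = math.ceil(hours / HOURS_PER_DAY)
        first = last = day
        for _ in range(workdays - 1):
            last = next_workday(last + timedelta(days=1))
        plan[phase] = (first, last)
        day = next_workday(last + timedelta(days=1))
    return plan


# Longest single assignment per phase, in hours, taken from planning_notes.
phase_hours = {
    "Writing": 72,
    "Pre-Production": 36,
    "Production Phase 1": 3 * HOURS_PER_DAY,
    "Production Phase 2": 3 * HOURS_PER_DAY,
    "Post-Production": 64,
}

plan = build_schedule(date(2025, 2, 3), phase_hours)
for phase, (first, last) in plan.items():
    print(f"{phase}: {first:%a %b %d} - {last:%a %b %d}")
```

Under these assumptions the phases fall on Feb 3 to 13, Feb 14 to 20, Feb 21 to 25, Feb 26 to 28, and Mar 3 to 12, 2025. The baseline response above uses the same dates. The candidate response keeps the Editor working through Monday, March 17, which amounts to 11 editing days (88 hours) rather than the 8 days (64 hours) in the planning notes. This is worth keeping in mind when reading the autorater's explanation for this row.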


In [15]:
summary_table = eval_result.summary_metrics
display(summary_table)

{'row_count': 3,
 'pairwise_question_answering_quality/candidate_model_win_rate': np.float64(0.3333333333333333),
 'pairwise_question_answering_quality/baseline_model_win_rate': np.float64(0.0)}
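These win rates match a simple tally of the per-row `pairwise_choice` values that appear in the metrics table. One CANDIDATE verdict out of three rows gives 1/3 ≈ 0.333, and zero BASELINE verdicts give 0.0. The sketch below reproduces that arithmetic. It assumes that TIE rows count toward neither model, which is inferred from these numbers rather than taken from the SDK source. `win_rates` is a hypothetical helper, not part of `vertexai.evaluation`.

```python
# Hypothetical helper: recompute pairwise win rates from per-row verdicts.
# Assumption: TIE rows count toward neither model's win rate.
def win_rates(choices):
    n = len(choices)
    return {
        "row_count": n,
        "candidate_model_win_rate": choices.count("CANDIDATE") / n,
        "baseline_model_win_rate": choices.count("BASELINE") / n,
    }

# Verdicts as reported in the metrics table for rows 0, 1 and 2.
rates = win_rates(["TIE", "TIE", "CANDIDATE"])
print(rates)
```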

In [16]:
metrics_table = eval_result.metrics_table

display(metrics_table)

Unnamed: 0,prompt,response,baseline_model_response,pairwise_question_answering_quality/explanation,pairwise_question_answering_quality/pairwise_choice
0,"<instructions>\n Prepare a document to fulfill the task based on the context provided.\n </instructions>\n<task>\n What is the cost of each phase of production?\n If days are mentioned, assume an 8 hour work day.\n </task>\n<context>\n Screenwriter: $40\nActor: $25\nDirector: $30\nCamera Operator: $35\nSound Engineer: $20\nEditor: $30\n\nPhases of Production:\n Writing:\n The Screenwriter will write the script.\n They need 72 hours to do so.\n\n\n Pre-Production:\n The Director needs time to analyze the script.\n They will work on it for 36 hours.\n The Camera Operator will join the director for 24 hours of planning.\n\n\n Production Phase 1\n The first three days of filming will require the director, 4 actors, the camera operator, and the sound engineer\n\n\n Production Phase 2\n The next three days of filming will require the director, 8 actors, the camera operator, and the sound engineer\n\n\n Post-Production\n The editor will take 64 hours to edit the film.\n The director will work with the editor for 24 hours during this phase.\n </context>\n",Here's a breakdown of the cost for each phase of production:\n\n**Rates:**\n\n* Screenwriter: $40/hour\n* Actor: $25/hour\n* Director: $30/hour\n* Camera Operator: $35/hour\n* Sound Engineer: $20/hour\n* Editor: $30/hour\n\n**Phase Costs:**\n\n* **Writing:**\n * Screenwriter: 72 hours * $40/hour = $2880\n * **Total Writing Cost: $2880**\n\n* **Pre-Production:**\n * Director: 36 hours * $30/hour = $1080\n * Camera Operator: 24 hours * $35/hour = $840\n * **Total Pre-Production Cost: $1920**\n\n* **Production Phase 1 (3 days = 24 hours):**\n * Director: 24 hours * $30/hour = $720\n * Actors (4): 24 hours * $25/hour * 4 actors = $2400\n * Camera Operator: 24 hours * $35/hour = $840\n * Sound Engineer: 24 hours * $20/hour = $480\n * **Total Production Phase 1 Cost: $4440**\n\n* **Production Phase 2 (3 days = 24 hours):**\n * Director: 24 hours * $30/hour = $720\n * Actors (8): 24 hours * $25/hour * 8 actors = $4800\n * 
Camera Operator: 24 hours * $35/hour = $840\n * Sound Engineer: 24 hours * $20/hour = $480\n * **Total Production Phase 2 Cost: $6840**\n\n* **Post-Production:**\n * Editor: 64 hours * $30/hour = $1920\n * Director: 24 hours * $30/hour = $720\n * **Total Post-Production Cost: $2640**\n,"Based on the context provided, here is the cost for each phase of production.\n\n### **Writing**\n\nThis phase involves the Screenwriter creating the script.\n\n* Screenwriter: 72 hours @ $40/hour = $2,880.00\n\n**Total Writing Cost: $2,880.00**\n\n---\n\n### **Pre-Production**\n\nThis phase involves the Director and Camera Operator planning and analyzing the script.\n\n* Director: 36 hours @ $30/hour = $1,080.00\n* Camera Operator: 24 hours @ $35/hour = $840.00\n\n**Total Pre-Production Cost: $1,920.00**\n\n---\n\n### **Production Phase 1**\n\nThis phase covers the first three days of filming (3 days x 8 hours/day = 24 hours).\n\n* Director: 24 hours @ $30/hour = $720.00\n* 4 Actors: 24 hours x 4 @ $25/hour = $2,400.00\n* Camera Operator: 24 hours @ $35/hour = $840.00\n* Sound Engineer: 24 hours @ $20/hour = $480.00\n\n**Total Production Phase 1 Cost: $4,440.00**\n\n---\n\n### **Production Phase 2**\n\nThis phase covers the next three days of filming (3 days x 8 hours/day = 24 hours).\n\n* Director: 24 hours @ $30/hour = $720.00\n* 8 Actors: 24 hours x 8 @ $25/hour = $4,800.00\n* Camera Operator: 24 hours @ $35/hour = $840.00\n* Sound Engineer: 24 hours @ $20/hour = $480.00\n\n**Total Production Phase 2 Cost: $6,840.00**\n\n---\n\n### **Post-Production**\n\nThis phase involves the Editor and Director finalizing the film.\n\n* Editor: 64 hours @ $30/hour = $1,920.00\n* Director: 24 hours @ $30/hour = $720.00\n\n**Total Post-Production Cost: $2,640.00**","Both responses followed the prompt and produced the exact same information. They are both equally well, and there is no reason to prefer one over the other.",TIE
1,"<instructions>\n Prepare a document to fulfill the task based on the context provided.\n </instructions>\n<task>\n How many days will each phase require? Assume an\n 8 hour work day. If multiple people are working in parallel,\n do not add those times together, but only use the longest time.\n Also include a count of the total number of days of the entire\n project.\n </task>\n<context>\n Screenwriter: $40\nActor: $25\nDirector: $30\nCamera Operator: $35\nSound Engineer: $20\nEditor: $30\n\nPhases of Production:\n Writing:\n The Screenwriter will write the script.\n They need 72 hours to do so.\n\n\n Pre-Production:\n The Director needs time to analyze the script.\n They will work on it for 36 hours.\n The Camera Operator will join the director for 24 hours of planning.\n\n\n Production Phase 1\n The first three days of filming will require the director, 4 actors, the camera operator, and the sound engineer\n\n\n Production Phase 2\n The next three days of filming will require the director, 8 actors, the camera operator, and the sound engineer\n\n\n Post-Production\n The editor will take 64 hours to edit the film.\n The director will work with the editor for 24 hours during this phase.\n </context>\n","**Phase Durations (8-hour workday):**\n\n* **Writing:** 72 hours / 8 hours/day = 9 days (Screenwriter)\n* **Pre-Production:**\n * Director: 36 hours / 8 hours/day = 4.5 days\n * Camera Operator: 24 hours / 8 hours/day = 3 days\n * Longest time: 4.5 days\n* **Production Phase 1:** 3 days (Director, 4 Actors, Camera Operator, Sound Engineer)\n* **Production Phase 2:** 3 days (Director, 8 Actors, Camera Operator, Sound Engineer)\n* **Post-Production:**\n * Editor: 64 hours / 8 hours/day = 8 days\n * Director: 24 hours / 8 hours/day = 3 days\n * Longest time: 8 days\n\n**Total Project Duration:**\n\n9 days (Writing) + 4.5 days (Pre-Production) + 3 days (Production Phase 1) + 3 days (Production Phase 2) + 8 days (Post-Production) = **27.5 days**\n","Based on the 
context provided, here is a breakdown of the days required for each phase and the total project duration, assuming an 8-hour workday.\n\n### **Phase Durations**\n\n**Writing:**\nThe Screenwriter requires 72 hours to complete the script.\n* 72 hours / 8 hours per day = **9 days**\n\n**Pre-Production:**\nThe Director will work for 36 hours, and the Camera Operator will work for 24 hours. Since these activities overlap, we use the longest duration.\n* 36 hours / 8 hours per day = **4.5 days**\n\n**Production Phase 1:**\nThe duration for the first phase of filming is explicitly stated.\n* **3 days**\n\n**Production Phase 2:**\nThe duration for the second phase of filming is explicitly stated.\n* **3 days**\n\n**Post-Production:**\nThe Editor will work for 64 hours, and the Director will work alongside them for 24 of those hours. We use the longest duration.\n* 64 hours / 8 hours per day = **8 days**\n\n---\n\n### **Total Project Duration**\n\nTo find the total number of days for the project, we add the duration of each phase.\n\n* **Writing:** 9 days\n* **Pre-Production:** 4.5 days\n* **Production Phase 1:** 3 days\n* **Production Phase 2:** 3 days\n* **Post-Production:** 8 days\n\n**Total Project Days: 27.5 days**","Both responses accurately calculate the durations for each phase and the total project duration based on the provided context, demonstrating a clear understanding of the instructions and constraints.",TIE
2,"<instructions>\n Prepare a document to fulfill the task based on the context provided.\n </instructions>\n<task>\n Prepare a text schedule for all phases of the film starting\n on Feb 3, 2025. The whole crew should be off Saturdays\n and Sundays.\n </task>\n<context>\n Screenwriter: $40\nActor: $25\nDirector: $30\nCamera Operator: $35\nSound Engineer: $20\nEditor: $30\n\nPhases of Production:\n Writing:\n The Screenwriter will write the script.\n They need 72 hours to do so.\n\n\n Pre-Production:\n The Director needs time to analyze the script.\n They will work on it for 36 hours.\n The Camera Operator will join the director for 24 hours of planning.\n\n\n Production Phase 1\n The first three days of filming will require the director, 4 actors, the camera operator, and the sound engineer\n\n\n Production Phase 2\n The next three days of filming will require the director, 8 actors, the camera operator, and the sound engineer\n\n\n Post-Production\n The editor will take 64 hours to edit the film.\n The director will work with the editor for 24 hours during this phase.\n </context>\n","Okay, here's a production schedule for the film, starting February 3rd, 2025, with weekends off:\n\n**Film Production Schedule (Starting Feb 3, 2025)**\n\n**Key:**\n\n* SC = Screenwriter\n* D = Director\n* CO = Camera Operator\n* SE = Sound Engineer\n* E = Editor\n* A = Actor(s)\n\n**Phase 1: Writing (Screenwriter - 72 hours)**\n\n* **Week 1:**\n * Mon, Feb 3: SC - 8 hours\n * Tue, Feb 4: SC - 8 hours\n * Wed, Feb 5: SC - 8 hours\n * Thu, Feb 6: SC - 8 hours\n * Fri, Feb 7: SC - 8 hours\n* **Week 2:**\n * Mon, Feb 10: SC - 8 hours\n * Tue, Feb 11: SC - 8 hours\n * Wed, Feb 12: SC - 8 hours\n * Thu, Feb 13: SC - 8 hours\n\n**Phase 2: Pre-Production (Director - 36 hours, Camera Operator - 24 hours)**\n\n* **Week 2:**\n * Fri, Feb 14: D - 8 hours\n* **Week 3:**\n * Mon, Feb 17: D - 8 hours, CO - 8 hours\n * Tue, Feb 18: D - 8 hours, CO - 8 hours\n * Wed, Feb 19: D - 8 hours, CO - 8 
hours\n * Thu, Feb 20: D - 4 hours\n\n**Phase 3: Production Phase 1 (Director, 4 Actors, Camera Operator, Sound Engineer)**\n\n* **Week 3:**\n * Fri, Feb 21: D, 4A, CO, SE - 8 hours\n* **Week 4:**\n * Mon, Feb 24: D, 4A, CO, SE - 8 hours\n * Tue, Feb 25: D, 4A, CO, SE - 8 hours\n\n**Phase 4: Production Phase 2 (Director, 8 Actors, Camera Operator, Sound Engineer)**\n\n* **Week 4:**\n * Wed, Feb 26: D, 8A, CO, SE - 8 hours\n * Thu, Feb 27: D, 8A, CO, SE - 8 hours\n * Fri, Feb 28: D, 8A, CO, SE - 8 hours\n\n**Phase 5: Post-Production (Editor - 64 hours, Director - 24 hours)**\n\n* **Week 5:**\n * Mon, Mar 3: E - 8 hours\n * Tue, Mar 4: E - 8 hours\n * Wed, Mar 5: E - 8 hours\n * Thu, Mar 6: E - 8 hours\n * Fri, Mar 7: E - 8 hours\n* **Week 6:**\n * Mon, Mar 10: E - 8 hours\n * Tue, Mar 11: E - 8 hours\n * Wed, Mar 12: E - 8 hours\n * Thu, Mar 13: D, E - 8 hours\n * Fri, Mar 14: D, E - 8 hours\n* **Week 7:**\n * Mon, Mar 17: D, E - 8 hours\n\n**Notes:**\n\n* This schedule assumes 8-hour workdays.\n* This schedule is a template and can be adjusted based on the specific needs of the production.\n* Travel time for cast and crew is not factored in.\n* Buffer days are not included. Consider adding buffer days between phases to account for potential delays.\n* The number of actors needed each day is specified (4A, 8A).\n","Of course. 
Here is the text schedule for all phases of the film, starting on February 3, 2025.\n\n***\n\n### **Film Production Schedule**\n\n**Start Date:** Monday, February 3, 2025\n\n**General Notes:**\n* This schedule is based on a standard 8-hour workday.\n* The entire crew is off on Saturdays and Sundays.\n* All phases are scheduled sequentially.\n\n---\n\n### **Phase 1: Writing**\nThe Screenwriter will write the script.\n\n* **Personnel:** Screenwriter\n* **Duration:** 72 hours (9 workdays)\n* **Schedule:** Monday, Feb 3, 2025 – Thursday, Feb 13, 2025\n\n---\n\n### **Phase 2: Pre-Production**\nThe Director and Camera Operator will analyze the script and plan the shoot. The Director's time is the longest, determining the phase length.\n\n* **Personnel:** Director, Camera Operator\n* **Duration:** 36 hours (4.5 workdays)\n* **Schedule:** Friday, Feb 14, 2025 – Thursday, Feb 20, 2025\n * **Director (36 hours):** Works all 4.5 days of this phase.\n * **Camera Operator (24 hours):** Joins the Director for the final 3 days (Tuesday, Feb 18 – Thursday, Feb 20).\n\n---\n\n### **Phase 3: Production Phase 1**\nThe first block of filming.\n\n* **Personnel:** Director, 4 Actors, Camera Operator, Sound Engineer\n* **Duration:** 3 workdays\n* **Schedule:** Friday, Feb 21, 2025 – Tuesday, Feb 25, 2025\n * *Note: Skips the weekend of Feb 22-23.*\n\n---\n\n### **Phase 4: Production Phase 2**\nThe second block of filming.\n\n* **Personnel:** Director, 8 Actors, Camera Operator, Sound Engineer\n* **Duration:** 3 workdays\n* **Schedule:** Wednesday, Feb 26, 2025 – Friday, Feb 28, 2025\n\n---\n\n### **Phase 5: Post-Production**\nThe Editor assembles the film, with the Director joining for key review sessions. 
The Editor's time is the longest, determining the phase length.\n\n* **Personnel:** Editor, Director\n* **Duration:** 64 hours (8 workdays)\n* **Schedule:** Monday, Mar 3, 2025 – Wednesday, Mar 12, 2025\n * **Editor (64 hours):** Works all 8 days of this phase.\n * **Director (24 hours):** Joins the Editor for the final 3 days for review (Monday, Mar 10 – Wednesday, Mar 12).\n\n---\n**Project Conclusion:** Wednesday, March 12, 2025","Model A is superior as it accurately computes the end dates for each phase, adhering to the constraints of only scheduling work on weekdays, which Model B fails to do.",CANDIDATE
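The judge rated rows 0 and 1 as ties because both models produced the same figures. An independent way to check those figures is to recompute them from `hourly_rates` and `planning_notes`. The following is a minimal sketch under two assumptions: the staffing is transcribed into a dictionary by hand instead of being parsed from the notes, and each 3-day production phase is treated as 24 hours per person, following the task's 8-hour-day rule.

```python
HOURS_PER_DAY = 8

# Hourly rates transcribed from `hourly_rates`.
RATES = {
    "Screenwriter": 40,
    "Actor": 25,
    "Director": 30,
    "Camera Operator": 35,
    "Sound Engineer": 20,
    "Editor": 30,
}

# (role, headcount, hours per person), transcribed by hand from `planning_notes`.
# Production phases run 3 days each: 3 * 8 = 24 hours per person.
PHASES = {
    "Writing": [("Screenwriter", 1, 72)],
    "Pre-Production": [("Director", 1, 36), ("Camera Operator", 1, 24)],
    "Production Phase 1": [("Director", 1, 24), ("Actor", 4, 24),
                           ("Camera Operator", 1, 24), ("Sound Engineer", 1, 24)],
    "Production Phase 2": [("Director", 1, 24), ("Actor", 8, 24),
                           ("Camera Operator", 1, 24), ("Sound Engineer", 1, 24)],
    "Post-Production": [("Editor", 1, 64), ("Director", 1, 24)],
}

def phase_cost(staff):
    # Sum rate * headcount * hours over everyone working in the phase.
    return sum(RATES[role] * count * hours for role, count, hours in staff)

def phase_days(staff):
    # Parallel work is not added together: the longest assignment sets the length.
    return max(hours for _, _, hours in staff) / HOURS_PER_DAY

costs = {name: phase_cost(staff) for name, staff in PHASES.items()}
days = {name: phase_days(staff) for name, staff in PHASES.items()}

for name in PHASES:
    print(f"{name}: ${costs[name]:,} over {days[name]} days")
print(f"Total: ${sum(costs.values()):,} over {sum(days.values())} days")
```

The per-phase costs ($2,880, $1,920, $4,440, $6,840, $2,640) and the 27.5-day total agree with both models' answers in rows 0 and 1. That agreement is consistent with the TIE verdicts on those rows.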


In [17]:
display(eval_result.metrics_table[['pairwise_question_answering_quality/pairwise_choice']])

Unnamed: 0,pairwise_question_answering_quality/pairwise_choice
0,TIE
1,TIE
2,CANDIDATE
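With many prompts, tallying the verdicts is easier than reading each row. The sketch below counts the choice column with pandas. To keep it runnable without `eval_result`, it rebuilds only the column shown above in a stand-in DataFrame. In the notebook, you could call the same `value_counts()` chain on `eval_result.metrics_table` directly.

```python
import pandas as pd

CHOICE_COL = "pairwise_question_answering_quality/pairwise_choice"

# Stand-in for eval_result.metrics_table, holding only the choice column shown above.
metrics_table = pd.DataFrame({CHOICE_COL: ["TIE", "TIE", "CANDIDATE"]})

# Count each verdict; reindex so labels with zero rows (here BASELINE) still appear.
tally = (
    metrics_table[CHOICE_COL]
    .value_counts()
    .reindex(["CANDIDATE", "BASELINE", "TIE"], fill_value=0)
)
print(tally)
```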


In [18]:
display(eval_result.metrics_table[['pairwise_question_answering_quality/explanation']])

Unnamed: 0,pairwise_question_answering_quality/explanation
0,"Both responses followed the prompt and produced the exact same information. They are both equally well, and there is no reason to prefer one over the other."
1,"Both responses accurately calculate the durations for each phase and the total project duration based on the provided context, demonstrating a clear understanding of the instructions and constraints."
2,"Model A is superior as it accurately computes the end dates for each phase, adhering to the constraints of only scheduling work on weekdays, which Model B fails to do."
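Cells In [17] and In [18] display the verdict and the reasoning separately. It can help to print them side by side, one line per row. The sketch below again uses a stand-in DataFrame built from the two columns shown above, so it runs without `eval_result`. `report` is a hypothetical helper.

```python
import pandas as pd

PREFIX = "pairwise_question_answering_quality"

# Stand-in for eval_result.metrics_table with the two judge columns shown above.
metrics_table = pd.DataFrame({
    f"{PREFIX}/pairwise_choice": ["TIE", "TIE", "CANDIDATE"],
    f"{PREFIX}/explanation": [
        "Both responses followed the prompt and produced the exact same information. "
        "They are both equally well, and there is no reason to prefer one over the other.",
        "Both responses accurately calculate the durations for each phase and the total "
        "project duration based on the provided context, demonstrating a clear "
        "understanding of the instructions and constraints.",
        "Model A is superior as it accurately computes the end dates for each phase, "
        "adhering to the constraints of only scheduling work on weekdays, which Model B "
        "fails to do.",
    ],
})

def report(table):
    # One line per row: index, verdict, then the judge's explanation.
    lines = []
    for idx, row in table.iterrows():
        choice = row[f"{PREFIX}/pairwise_choice"]
        explanation = row[f"{PREFIX}/explanation"]
        lines.append(f"[{idx}] {choice}: {explanation}")
    return "\n".join(lines)

print(report(metrics_table))
```

In this layout it is easy to see that row 2's explanation refers to "Model A" and "Model B" rather than to the candidate or the baseline. Before relying on that verdict, check which response the metric's prompt template labels as A.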


In [19]:
print(summary_table)
print(metrics_table)

{'row_count': 3, 'pairwise_question_answering_quality/candidate_model_win_rate': np.float64(0.3333333333333333), 'pairwise_question_answering_quality/baseline_model_win_rate': np.float64(0.0)}