# Lab: Evaluate Gen AI Model and Agent Performance: Challenge Lab (GENAI098)

Link: https://partner.cloudskillsboost.google/paths/2311/course_templates/1293/labs/531997

## Your Challenge
Cymbal would like to use the Gen AI evaluation service in the following scenarios:
 - Evaluate a model's performance in response to prompts
 - Evaluate and compare two models' performance against each other for model selection
 - Evaluate the performance of agents

Your challenge is to provide the initial examples of how this can be achieved.

## Task 1. Initialize Vertex AI in a Colab Enterprise notebook

In [1]:
!pip install --upgrade google-cloud-aiplatform google-cloud-logging --quiet
!pip install "google-cloud-aiplatform[evaluation]" --quiet


In [1]:
# 1) Restart the runtime (!)
# 2) Import required libs

import pandas as pd
import logging
import google.cloud.logging
from IPython.display import display, Markdown

import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig
from vertexai.evaluation import (
    MetricPromptTemplateExamples,
    EvalTask,
    PairwiseMetric,
    PointwiseMetric,
)

# Do not remove logging section
client = google.cloud.logging.Client()
client.setup_logging()

pd.set_option("display.max_colwidth", None)

In [2]:
# initialize Vertex AI
PROJECT_ID = "qwiklabs-gcp-03-49581a752317"
LOCATION = "us-west1"

# Initialize Vertex AI with the lab project and region defined above
vertexai.init(project=PROJECT_ID, location=LOCATION)

# Do not remove logging section
log_message = f"Vertex AI initialize: {vertexai}"
logging.info(log_message)

INFO:root:Vertex AI initialize: <module 'vertexai' from '/usr/local/lib/python3.11/dist-packages/vertexai/__init__.py'>


## Task 2. Explore example data and make a simple summarization request to a Gen AI model

In this task, you will set up some sample product description data for a large language model to summarize, and then use a prompt to send a simple summarization request to the model. This configuration will then be used to evaluate the model's performance at summarization.

1. The prompt you will use for summarization is shown below. Run the following code in a new cell to set the prompt_template.

In [3]:
prompt_template="# System_prompt\n{system_prompt} # Question\n{question} # Description {description}"

2. Next, define the system prompt and question for the prompt template and some sample product data to be summarised.

In [4]:
system_prompt=["You are an retail domestic merchandise expert"]

question=["Provide a one sentence summary of the following text"]

description=[
  "Men’s Blue Dress Shorts Elevate your warm-weather wardrobe with these tailored men's blue dress shorts — where polished style meets everyday comfort. Designed with a modern slim fit and cut from a lightweight, breathable cotton-blend fabric, these shorts offer a refined silhouette perfect for smart-casual settings. Featuring a flat front, belt loops, and discreet side and back pockets, they pair effortlessly with a crisp button-down or a relaxed polo. Whether you're headed to a summer wedding, rooftop brunch, or casual Friday at the office, these shorts strike the perfect balance between laid-back and sophisticated",

  "Summer Floral Dress. Breathe life into your summer wardrobe with this effortlessly elegant floral midi dress. Crafted from lightweight, breathable fabric, this dress is designed to keep you cool and confident from sunny brunches to sunset strolls. The vibrant floral print pops against soft pastels, while the flattering silhouette — complete with a cinched waist and flowing A-line skirt — moves beautifully with every step. Featuring delicate spaghetti straps, a sweetheart neckline, and subtle ruffle detailing, this dress is the perfect balance of feminine charm and relaxed ease. Style it with strappy sandals for a daytime look or elevate it with wedges and a clutch for those golden-hour moments.",

  "Outdoor Garden Furniture Transform your backyard into a personal oasis with this elegant garden furniture set designed for comfort, durability, and timeless style. Whether you're hosting a summer soirée, enjoying a quiet morning coffee, or stretching out under the stars, this set offers the perfect blend of relaxation and sophistication. Crafted with weather-resistant materials and plush, all-season cushions, it’s built to withstand the elements while keeping you cozy. The modern, neutral design complements any garden, patio, or balcony — making it a seamless fit for both small spaces and open-air retreats.",

  "OLED 4K Ultra HD Smart TV. Step into the future of home entertainment with breathtaking clarity, vibrant color, and cinematic sound. This OLED 4K Ultra HD Smart TV transforms your living room into a private theater — where every scene comes alive with lifelike detail and stunning contrast. With self-lit pixels that turn on and off individually, OLED delivers perfect blacks and infinite contrast, revealing depth and dimension that LED TVs simply can’t match. Whether you're streaming your favorite show, gaming at high frame rates, or watching the big game in dazzling clarity, every moment is immersive. Powered by a next-gen AI processor, it automatically adjusts picture and sound based on your environment and content, so you always get the optimal viewing experience. And with an ultra-slim bezel and sleek modern design, it’s not just a TV — it’s a centerpiece.",

  "Smartwash Dishwasher. Let your kitchen work for you. Say goodbye to scrubbing and soaking — the SmartWash Dishwasher delivers a powerful, whisper-quiet clean that saves you time, energy, and water. Whether it’s post-dinner chaos or a pile of party plates, this dishwasher handles it all with precision and polish. With advanced spray technology, high-temperature sanitization, and intuitive smart cycles, every dish comes out sparkling — from delicate wine glasses to stubborn, baked-on pots. The sleek stainless steel finish adds a modern touch to any kitchen, while the adjustable racks and spacious interior let you load more in fewer cycles. Plus, with smart connectivity, you can monitor and control your wash right from your phone — because clean dishes should never slow you down. "
]
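
The template and the three lists above are combined with `str.format`, which replaces each `{placeholder}` with the matching value. The following is a minimal sketch of that step, assuming shortened stand-in descriptions and a hypothetical helper named `build_prompts`; neither is part of the lab. Concatenating the raw template with another string, by contrast, leaves the placeholders unfilled.

```python
# Shortened stand-ins for the lab data (assumption: the real strings are much longer).
prompt_template = "# System_prompt\n{system_prompt} # Question\n{question} # Description {description}"
system_prompt = ["You are a retail expert"]
question = ["Provide a one sentence summary of the following text"]
description = [
    "Men's Blue Dress Shorts. Tailored shorts for warm weather.",
    "Summer Floral Dress. A floral midi dress.",
]


def build_prompts(template, system_text, question_text, descriptions):
    """Hypothetical helper: fill the template once per product description."""
    return [
        template.format(
            system_prompt=system_text,
            question=question_text,
            description=item,
        )
        for item in descriptions
    ]


prompts = build_prompts(prompt_template, system_prompt[0], question[0], description)

# The second prompt corresponds to description[1].
print(prompts[1])
```

Each entry of `prompts` is a plain string that can be passed to a model call one at a time.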


3. Instantiate a flash_model variable with a GenerativeModel that uses Gemini version gemini-2.0-flash.
4. Add a generation configuration to the model to set the temperature to 0.

In [12]:
flash_model = GenerativeModel(
  "gemini-2.0-flash",
  generation_config={
      "temperature": 0,
  },
)
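
To see why a temperature of 0 is useful for evaluation, it helps to look at the textbook picture of temperature sampling. The sketch below is a generic illustration, not Gemini's actual decoding code: the function name and the example logits are assumptions. It shows that lowering the temperature concentrates probability on the highest-scoring token, and that the limit at 0 is a greedy pick.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Textbook temperature-scaled softmax (illustrative only)."""
    if temperature <= 0:
        # Limit case: all probability mass goes to the highest logit.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for temperature in (1.0, 0.5, 0.0):
    probabilities = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probabilities])
```

In practice, responses at temperature 0 are mostly repeatable, although a hosted model may still show small variations between runs.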

5. Using the prompt template and data, generate a response to the second product description (description[1]) from the gemini-2.0-flash model. Use the Markdown() class imported from IPython.display to wrap the response text to render Gemini's responses, which are often formatted as Markdown strings.

In [14]:
# First attempt (kept for reference): this call does not fill in the
# placeholders of prompt_template and feeds one response back in as a second
# prompt, so the output below is unrelated to the product descriptions.
llm_response = flash_model.generate_content(
  flash_model.generate_content(prompt_template + str(system_prompt[0])).text
)

Markdown(llm_response.text)

# Do not remove logging section
log_message = f"Markdown output: {llm_response.text}"
logging.info(log_message)

INFO:root:Markdown output: Okay, great! Let's start with something practical.

I'm a small boutique owner looking to refresh my home decor section for the upcoming fall season. I want to move beyond the typical pumpkins and leaves and offer something a bit more sophisticated and on-trend.

**What are 3-5 specific home decor trends for Fall 2024 that are *not* the usual fall tropes, and what types of products would I need to stock to capitalize on those trends?**



In [11]:
# Generate the prompt for the second product description
generated_prompt = prompt_template.format(
    system_prompt=system_prompt[0],
    question=question[0],
    description=description[1]
)

# Generate response from the Gemini 2.0 Flash model
llm_response = flash_model.generate_content(
    generated_prompt
)

# Display the response using Markdown
display(Markdown(llm_response.text))

# Do not remove logging section
log_message = f"Markdown output: {llm_response.text}"
logging.info(log_message)

This summer floral midi dress features a vibrant floral print, a flattering silhouette with a cinched waist and A-line skirt, and delicate details like spaghetti straps and ruffle accents, perfect for both casual and dressy occasions.


INFO:root:Markdown output: This summer floral midi dress features a vibrant floral print, a flattering silhouette with a cinched waist and A-line skirt, and delicate details like spaghetti straps and ruffle accents, perfect for both casual and dressy occasions.



In [17]:
import pandas as pd
import logging
from IPython.display import display, Markdown

import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig

# Initialize Vertex AI
# PROJECT_ID and LOCATION below are the values for this Qwiklabs session;
# replace them with your own project ID and region when running elsewhere.
PROJECT_ID = "qwiklabs-gcp-03-49581a752317"
LOCATION = "us-west1"

vertexai.init(project=PROJECT_ID, location=LOCATION)

# Set the prompt_template
prompt_template = "# System_prompt\n{system_prompt} # Question\n{question} # Description {description}"

# Define the system prompt, question, and sample product data
system_prompt = ["You are an retail domestic merchandise expert"]

question = ["Provide a one sentence summary of the following text"]

description = [
  "Men’s Blue Dress Shorts Elevate your warm-weather wardrobe with these tailored men's blue dress shorts — where polished style meets everyday comfort. Designed with a modern slim fit and cut from a lightweight, breathable cotton-blend fabric, these shorts offer a refined silhouette perfect for smart-casual settings. Featuring a flat front, belt loops, and discreet side and back pockets, they pair effortlessly with a crisp button-down or a relaxed polo. Whether you're headed to a summer wedding, rooftop brunch, or casual Friday at the office, these shorts strike the perfect balance between laid-back and sophisticated",

  "Summer Floral Dress. Breathe life into your summer wardrobe with this effortlessly elegant floral midi dress. Crafted from lightweight, breathable fabric, this dress is designed to keep you cool and confident from sunny brunches to sunset strolls. The vibrant floral print pops against soft pastels, while the flattering silhouette — complete with a cinched waist and flowing A-line skirt — moves beautifully with every step. Featuring delicate spaghetti straps, a sweetheart neckline, and subtle ruffle detailing, this dress is the perfect balance of feminine charm and relaxed ease. Style it with strappy sandals for a daytime look or elevate it with wedges and a clutch for those golden-hour moments.",

  "Outdoor Garden Furniture Transform your backyard into a personal oasis with this elegant garden furniture set designed for comfort, durability, and timeless style. Whether you're hosting a summer soirée, enjoying a quiet morning coffee, or stretching out under the stars, this set offers the perfect blend of relaxation and sophistication. Crafted with weather-resistant materials and plush, all-season cushions, it’s built to withstand the elements while keeping you cozy. The modern, neutral design complements any garden, patio, or balcony — making it a seamless fit for both small spaces and open-air retreats.",

  "OLED 4K Ultra HD Smart TV. Step into the future of home entertainment with breathtaking clarity, vibrant color, and cinematic sound. This OLED 4K Ultra HD Smart TV transforms your living room into a private theater — where every scene comes alive with lifelike detail and stunning contrast. With self-lit pixels that turn on and off individually, OLED delivers perfect blacks and infinite contrast, revealing depth and dimension that LED TVs simply can’t match. Whether you're streaming your favorite show, gaming at high frame rates, or watching the big game in dazzling clarity, every moment is immersive. Powered by a next-gen AI processor, it automatically adjusts picture and sound based on your environment and content, so you always get the optimal viewing experience. And with an ultra-slim bezel and sleek modern design, it’s not just a TV — it’s a centerpiece.",

  "Smartwash Dishwasher. Let your kitchen work for you. Say goodbye to scrubbing and soaking — the SmartWash Dishwasher delivers a powerful, whisper-quiet clean that saves you time, energy, and water. Whether it’s post-dinner chaos or a pile of party plates, this dishwasher handles it all with precision and polish. With advanced spray technology, high-temperature sanitization, and intuitive smart cycles, every dish comes out sparkling — from delicate wine glasses to stubborn, baked-on pots. The sleek stainless steel finish adds a modern touch to any kitchen, while the adjustable racks and spacious interior let you load more in fewer cycles. Plus, with smart connectivity, you can monitor and control your wash right from your phone — because clean dishes should never slow you down. "
]

# Instantiate a flash_model variable with a GenerativeModel that uses Gemini version gemini-2.0-flash.
# Add a generation configuration to the model to set the temperature to 0.
flash_model = GenerativeModel(
    "gemini-2.0-flash",
    generation_config={
        "temperature": 0.0,  # required by the task
        "top_p": 0.4,  # additional setting; not required by the task
    },
)

# Using the prompt template and data, generate a response to the first product description (description[0])
# from the gemini-2.0-flash model. The task itself asks for description[1], which the previous cell covers.
# Use the Markdown() class imported from IPython.display to wrap the response text to render Gemini's responses.

# Generate the full prompt for the first product description
generated_prompt = prompt_template.format(
    system_prompt=system_prompt[0],
    question=question[0],
    description=description[0]
)

# Generate response from the Gemini 2.0 Flash model
llm_response = flash_model.generate_content(
    generated_prompt
)

# Display the response using Markdown
print("--- Model's Summary for Description 1 ---")
display(Markdown(llm_response.text))

# Do not remove logging section
log_message = f"Markdown output: {llm_response.text}"
logging.info(log_message)

--- Model's Summary for Description 1 ---


These men's blue dress shorts offer a refined, comfortable style suitable for various smart-casual occasions with their slim fit and breathable cotton-blend fabric.


INFO:root:Markdown output: These men's blue dress shorts offer a refined, comfortable style suitable for various smart-casual occasions with their slim fit and breathable cotton-blend fabric.



In [19]:
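# Fails with the TypeError shown below: Markdown() needs a string, so it should
# receive the response's .text rather than the response object itself.
llm_response = Markdown(flash_model.generate_content(system_prompt[0] + str(description[0])))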

# Do not remove logging section
log_message = f"Markdown output: {llm_response.text}"
logging.info(log_message)


TypeError: Markdown expects text, not candidates {
  content {
    role: "model"
    parts {
      text: "Okay, I can definitely work with that description and expand on it from a retail domestic merchandise expert\'s perspective. Here\'s a breakdown of what I see, and how I\'d approach selling these shorts:\n\n**Strengths of the Current Description:**\n\n*   **Highlights Key Features:** Slim fit, lightweight fabric, flat front, pockets, belt loops.\n*   **Focuses on Versatility:** Suitable for various occasions (wedding, brunch, office).\n*   **Emphasizes Comfort and Style:** Balances \"polished style\" with \"everyday comfort.\"\n*   **Clear Target Audience:** Men looking for a refined, smart-casual look.\n\n**Areas for Improvement & Expansion (From a Retail Expert\'s View):**\n\n*   **Fabric Specifics:** Go beyond \"cotton-blend.\" What *kind* of cotton blend? (e.g., \"premium cotton-linen blend,\" \"stretch cotton twill,\" \"performance cotton-poly blend\"). This impacts feel, drape, and care.\n*   **Construction Details:** Mention things like:\n    *   **Seam Quality:** \"Durable, reinforced seams for lasting wear.\"\n    *   **Lining:** \"Partially lined for added comfort and structure.\" (If applicable)\n    *   **Closure:** \"Secure button and zip fly closure.\"\n    *   **Pocket Depth:** \"Deep front pockets to securely hold essentials.\"\n*   **Color Nuance:** \"Blue\" is broad. Is it navy, royal blue, sky blue, chambray? 
Be specific.\n*   **Sizing & Fit Details:**\n    *   \"Available in sizes [range].\"\n    *   \"Slim fit through the seat and thigh, with a tailored leg opening.\"\n    *   \"Sits comfortably at the natural waist.\"\n*   **Care Instructions:** \"Machine washable for easy care.\" (Or specific instructions if needed)\n*   **Styling Suggestions (More Specific):**\n    *   \"Pair with a linen shirt and loafers for a relaxed summer look.\"\n    *   \"Dress them up with a blazer and dress shoes for a more formal occasion.\"\n    *   \"Try them with a striped t-shirt and sneakers for a casual weekend vibe.\"\n*   **Call to Action:** \"Shop now and elevate your summer style!\" or \"Add these versatile shorts to your wardrobe today!\"\n*   **Consider adding lifestyle imagery:** Show the shorts being worn in different settings to inspire customers.\n\n**Revised Description (Example - Incorporating Improvements):**\n\n\"**Men\'s Navy Blue Stretch Cotton Twill Dress Shorts**\n\nElevate your warm-weather wardrobe with these tailored men\'s navy blue dress shorts — where polished style meets everyday comfort. Designed with a modern slim fit and cut from a lightweight, breathable stretch cotton twill fabric (97% Cotton, 3% Spandex), these shorts offer a refined silhouette perfect for smart-casual settings. The added spandex provides a comfortable range of motion.\n\nFeaturing a flat front, belt loops, and discreet side and back pockets, they pair effortlessly with a crisp button-down or a relaxed polo. Durable, reinforced seams ensure lasting wear. The shorts feature a secure button and zip fly closure and deep front pockets to securely hold essentials.\n\nAvailable in sizes 30-40. Slim fit through the seat and thigh, with a tailored leg opening. Sits comfortably at the natural waist. 
Machine washable for easy care.\n\n**Style Tips:**\n\n*   Pair with a light blue linen shirt and loafers for a relaxed summer look.\n*   Dress them up with a navy blazer, white dress shirt, and brown leather dress shoes for a more formal occasion.\n*   Try them with a striped t-shirt and white sneakers for a casual weekend vibe.\n\nWhether you\'re headed to a summer wedding, rooftop brunch, or casual Friday at the office, these shorts strike the perfect balance between laid-back and sophisticated.\n\n**Shop now and elevate your summer style!**\"\n\n**Why These Changes Matter (From a Retail Perspective):**\n\n*   **Increased Credibility:** Specific details build trust with the customer.\n*   **Reduced Returns:** Accurate sizing and fit information minimizes the chance of returns due to poor fit.\n*   **Improved SEO:** Specific keywords (e.g., \"stretch cotton twill,\" \"navy blue\") help the product rank higher in search results.\n*   **Enhanced Customer Experience:** Clear styling suggestions help customers visualize how to wear the shorts and create outfits.\n*   **Higher Conversion Rates:** A compelling description with a clear call to action encourages customers to make a purchase.\n\nBy focusing on these details, you transform a basic product description into a compelling sales tool that drives conversions and builds customer loyalty. Remember to always tailor the description to your specific target audience and brand voice.\n"
    }
  }
  finish_reason: STOP
  avg_logprobs: -0.16275280761718749
}
usage_metadata {
  prompt_token_count: 129
  candidates_token_count: 1000
  total_token_count: 1129
  prompt_tokens_details {
    modality: TEXT
    token_count: 129
  }
  candidates_tokens_details {
    modality: TEXT
    token_count: 1000
  }
}
model_version: "gemini-2.0-flash"
create_time {
  seconds: 1750715188
  nanos: 431367000
}
response_id: "NMtZaIeqGtWIi9YPytnoqQ8"


In [21]:
str(llm_response)

'<IPython.core.display.Markdown object>'
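
The last two cells point to the same pattern: keep the model response in its own variable and pass only its `.text` string to `Markdown()`. The sketch below illustrates this without calling Vertex AI. `StubResponse` and `markdown_source` are hypothetical names introduced here, and the stub models only the `.text` attribute of the real response object.

```python
from dataclasses import dataclass


@dataclass
class StubResponse:
    """Hypothetical stand-in for the SDK response; only `.text` is modelled."""
    text: str


def markdown_source(response):
    """Return the string to hand to Markdown(), failing early on the wrong type."""
    text = getattr(response, "text", None)
    if not isinstance(text, str):
        raise TypeError("expected a response object with a string .text attribute")
    return text


stub = StubResponse(text="**Summary:** a floral midi dress.")
print(markdown_source(stub))

# In the notebook, the equivalent steps would be (assumption: flash_model and
# generated_prompt are defined as in the earlier cells):
#   llm_response = flash_model.generate_content(generated_prompt)
#   display(Markdown(llm_response.text))
```

Keeping `llm_response` bound to the response object, rather than to the `Markdown` wrapper, also leaves `llm_response.text` available for the logging lines that follow.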