<a href="https://colab.research.google.com/github/postak/colazione-con-adk/blob/main/2025_09_Partners_ADK_Learning_session_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

```
Copyright 2025 Google LLC.
SPDX-License-Identifier: Apache-2.0
```

In [None]:
#@title Last Session
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Advanced Agent Development: From Deployment to Tracing on Vertex AI

Welcome! 🚀

This notebook is your comprehensive guide to the full lifecycle of building, deploying, and tracing sophisticated AI agents using the Google Agent Development Kit (ADK) and Vertex AI.

**Our journey will follow a specific, educational flow:**
1.  **Agent #1 (Standard Deployment)**: We'll create a multi-agent trip planner with **custom Python tools** and deploy it *without* tracing to see a baseline deployment.
2.  **Understanding Sessions**: We'll query our first agent with and without sessions to clearly demonstrate the difference between stateless and stateful (short-term memory) conversations.
3.  **Agent #2 (Memory & Tracing)**: We'll build a second agent with a long-term **Memory Bank** and deploy it *with* **tracing** enabled.
4.  **Comparing Tracing**: We'll see the practical difference in observability between an agent deployed with and without tracing.
5.  **Comprehensive Evaluation**: Finally, we'll run a rigorous evaluation on our first agent to measure its performance, tool selection, and even the accuracy of the arguments passed to its tools.


Let's get started!


-------------
### 🎁 🛑 Important Prerequisite: Setup Your Environment! 🛑 🎁
-----------------------------------------------------------------------------

You will need a **Google AI API Key** to run this notebook.

👉 Follow the instructions [here](https://github.com/postak/colazione-con-adk/blob/main/Setting%20Up%20Your%20GCP%20Project%20%26%20Gemini%20API%20Key.pdf)

 -----------------------------------------------------------------------------

---

## 0. Setup and Configuration

First, we'll install the necessary packages and configure our Google Cloud environment.

### Install Libraries
This command installs the Google Cloud AI Platform SDK, including support for the Agent Development Kit (`adk`), evaluation, and Agent Engines.

In [None]:
# --- 1. Installation ---
%pip install --upgrade --quiet 'google-cloud-aiplatform[evaluation,adk,agent_engines]'
%pip install --upgrade --quiet 'google-api-python-client'
%pip install --upgrade --quiet 'google-adk==1.15.1'
%pip install --upgrade --quiet 'google-cloud-aiplatform==1.117.0'


### Import Libraries
Now, let's import all the libraries we'll need for this session.

In [None]:
# --- 2. Import all necessary libraries ---
import os
import sys
import json
import asyncio
import nest_asyncio
import random
import string
from uuid import uuid4
from typing import Any, List

import pandas as pd
import plotly.graph_objects as go
import vertexai
from google.colab import auth
from IPython.display import HTML, Markdown, display

# --- ADK, Agent, and Evaluation Components ---
from google.adk.agents import Agent
from google.adk.events import Event
from google.adk.runners import Runner
import google.adk as adk
from google.adk.memory import VertexAiMemoryBankService
from google.adk.sessions import VertexAiSessionService, InMemorySessionService
from google.genai import types
from vertexai import agent_engines
from vertexai.preview import reasoning_engines
from vertexai.preview.evaluation import EvalTask

print("✅ All libraries are ready to go!")

✅ All libraries are ready to go!


### Authenticate and Configure Your Project
To use Vertex AI, you need an active Google Cloud project. This section handles authenticating your environment and setting up the necessary project configurations.

In [None]:
# --- 3. Authentication & Project Configuration ---

# Authenticate user in Colab
if "google.colab" in sys.modules:
    auth.authenticate_user()
    print("✅ Authenticated successfully.")

# @title Set Your Google Cloud Project Details
PROJECT_ID = "gen-ai-392513"  # @param {type:"string"}
LOCATION = "us-central1"          # @param {type:"string"}
STAGING_BUCKET = f"gs://{PROJECT_ID}-adk-staging" # A unique name for your bucket

# Set environment variables for the ADK and gcloud
os.environ["GOOGLE_CLOUD_PROJECT"] = PROJECT_ID
os.environ["GOOGLE_CLOUD_LOCATION"] = LOCATION
os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "True"

# Create the staging bucket if it doesn't exist
!gcloud storage buckets create {STAGING_BUCKET} --project={PROJECT_ID} --location={LOCATION} 2>/dev/null

# Initialize the Vertex AI SDK
# We'll also create an "Experiment" to group our evaluation runs
EVAL_EXPERIMENT_NAME = f"trip-planner-evaluation-{uuid4().hex[:6]}"
vertexai.init(project=PROJECT_ID, location=LOCATION, staging_bucket=STAGING_BUCKET, experiment=EVAL_EXPERIMENT_NAME)

print(f"🔬 Evaluation results will be saved to experiment: {EVAL_EXPERIMENT_NAME}")
print(f"\n✅ Vertex AI configured for project '{PROJECT_ID}' in '{LOCATION}'.")

✅ Authenticated successfully.


🔬 Evaluation results will be saved to experiment: trip-planner-evaluation-b2337b

✅ Vertex AI configured for project 'gen-ai-392513' in 'us-central1'.


---

## 1. Agent #1: The Trip Planner (Standard Deployment)

We'll build our first agent, a "team" of specialists. Crucially, we will now provide them with our own **custom Python functions as tools** instead of a generic search tool. This makes their behavior deterministic and much easier to evaluate accurately.

### Define Custom Tools

These functions act as our mock APIs or databases. The agent will learn to call these with the correct parameters based on the user's request.

In [None]:
# --- Custom Tool Definitions ---

def get_travel_attractions(location: str, interest: str = "any"):
    """
    Finds travel attractions in a given location based on an interest.

    Args:
        location: The city or place to search for attractions (e.g., "Paris", "Rome").
        interest: The type of attraction to look for (e.g., "museum", "historical site", "park").
    """
    # This is a mock database for demonstration
    attractions = {
        "paris": {"museum": "The Louvre", "historical site": "Eiffel Tower"},
        "rome": {"historical site": "The Colosseum", "museum": "Vatican Museums"},
        "tokyo": {"museum": "teamLab Borderless", "park": "Shinjuku Gyoen"},
        "san francisco": {"historical site": "Alcatraz Island", "park": "Golden Gate Park"}
    }

    city = attractions.get(location.lower(), {})
    if interest == "any":
        return city or f"Sorry, I don't have any information for {location}."
    else:
        return city.get(interest.lower(), f"Sorry, I couldn't find a {interest} in {location}.")

def get_restaurant_recommendations(location: str, cuisine: str):
    """
    Recommends a restaurant in a location based on the desired cuisine.

    Args:
        location: The city where the user wants to eat (e.g., "Paris", "Rome").
        cuisine: The type of food they want (e.g., "pasta", "seafood", "cafe").
    """
    # This is a mock database for demonstration
    restaurants = {
        "rome": {"pasta": "Trattoria Da Enzo al 29"},
        "san francisco": {"seafood": "Hog Island Oyster Co."},
        "paris": {"cafe": "Les Deux Magots", "fancy": "Le Cinq"}
    }

    city = restaurants.get(location.lower(), {})
    return city.get(cuisine.lower(), f"Sorry, I couldn't find a {cuisine} restaurant in {location}.")
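
Because the tools are plain Python functions, they can be exercised locally before any agent is involved. The sketch below uses a trimmed, stand-in copy of `get_travel_attractions` (two cities only, so that the block runs on its own); in the notebook you would call the functions defined above instead.

```python
# Trimmed stand-in for the notebook's get_travel_attractions (illustration only).
def get_travel_attractions(location: str, interest: str = "any"):
    attractions = {
        "rome": {"historical site": "The Colosseum", "museum": "Vatican Museums"},
        "tokyo": {"museum": "teamLab Borderless", "park": "Shinjuku Gyoen"},
    }
    city = attractions.get(location.lower(), {})
    if interest == "any":
        return city or f"Sorry, I don't have any information for {location}."
    return city.get(interest.lower(), f"Sorry, I couldn't find a {interest} in {location}.")

# Lookups are case-insensitive on both arguments.
hit = get_travel_attractions("ROME", "Historical Site")
# A known city with an unknown interest falls back to an apology string.
miss = get_travel_attractions("Tokyo", "beach")
# The default interest returns the whole per-city dictionary.
everything = get_travel_attractions("tokyo")

print(hit)
print(miss)
print(everything)
```

Note that the return type varies (a string for a single match or a miss, a dictionary for the default interest), which is worth keeping in mind when comparing tool outputs later.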

### Define and Deploy the Agent Team (Without Tracing)
Now, we'll update our specialist agents to use these new, more specific tools.

In [None]:
# --- Specialist Agent Definitions using Custom Tools ---
def create_activities_agent():
    return Agent(
        name="activities_agent", model="gemini-2.5-pro",
        description="Specialist for finding interesting activities, attractions, and historical sites.",
        instruction="",
        tools=[get_travel_attractions],
    )

def create_food_agent():
    return Agent(
        name="food_agent", model="gemini-2.5-pro",
        description="Specialist for finding great restaurants based on cuisine.",
        instruction="",
        tools=[get_restaurant_recommendations],
    )

# --- Coordinator Agent Definition ---
def create_trip_coordinator_agent(sub_agents: List[Agent]):
    return Agent(
        name="trip_coordinator_agent", model="gemini-2.5-pro",
        description="The master coordinator for trip planning. It delegates tasks to specialist agents.",
        sub_agents=sub_agents,
        instruction="""
        You are the master "Trip Coordinator" 🧭. Your job is to plan a full itinerary by delegating to your specialist agents.
        Extract the location, interests, and cuisine types from the user's prompt to use as arguments for your specialist agents.
        Once you have the information, combine it into a single, cohesive itinerary and present it in a clean MARKDOWN format.
        """,
    )


In [None]:
# --- Assemble and Deploy the Team ---
trip_coordinator_agent = create_trip_coordinator_agent(
    sub_agents=[create_activities_agent(), create_food_agent()]
)
print(f"👑 Agent '{trip_coordinator_agent.name}' is ready for deployment!")

deployment_name = f"trip-planner-basic-{uuid4().hex[:6]}"
print(f"🚀 Deploying agent as '{deployment_name}' without tracing...")

deployed_agent_no_trace = agent_engines.create(
    display_name=deployment_name,
    agent_engine=trip_coordinator_agent, # Pass the agent object directly
    requirements=["google-cloud-aiplatform[agent_engines]==1.117.0", "google-adk==1.15.1"],
    description="A multi-agent trip planning system with custom tools and tracing disabled."
)

print("-" * 50)
print("✅ Agent System deployed successfully!")
display(Markdown(f"**Resource Name:** `{deployed_agent_no_trace.resource_name}`"))

---

## 2. Interacting with Agent #1: Understanding Sessions

Now we'll query our deployed agent. The goal here is to understand the crucial difference between a single, stateless query and a stateful conversation that uses a session for short-term memory.
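As a conceptual illustration only, the toy below mimics session-scoped memory in plain Python. `ToyAgent` and its `ask` method are hypothetical names invented for this sketch; they are not part of the ADK or Agent Engine, and the real services store far richer state.

```python
from collections import defaultdict

class ToyAgent:
    """Hypothetical stand-in: answers "What is my name?" from session history only."""

    PREFIX = "My name is "

    def __init__(self):
        # History is keyed by session id, so sessions never see each other's turns.
        self._history = defaultdict(list)

    def ask(self, session_id: str, message: str) -> str:
        history = self._history[session_id]
        history.append(message)
        if message == "What is my name?":
            for past in history:
                if past.startswith(self.PREFIX):
                    # Take the text between the prefix and the first full stop.
                    return past[len(self.PREFIX):].split(".")[0]
            return "I don't know your name."
        return "OK."

agent = ToyAgent()
agent.ask("session-1", "My name is Annie. I want to plan a trip to Rome.")
same_session = agent.ask("session-1", "What is my name?")
new_session = agent.ask("session-2", "What is my name?")

print(same_session)
print(new_session)
```

The same question gets a different answer depending only on which session id it is sent with, which is the behaviour the queries in this section probe on the deployed agent.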

### Helper Function for Clean Interaction
To avoid repeating code, we'll create one reusable helper function.

In [None]:
async def stream_and_print_response(agent, message: str, user_id, session) -> str:

  events = []
  async for event in agent.async_stream_query(
      user_id=user_id,
      session_id = session["id"],
      message=message,
  ):
    events.append(event)

  print("--- Full Event Stream ---")
  for event in events:
      print(event)

  # For quick tests, you can extract just the final text response
  final_text_responses = [
      e for e in events
      if e.get("content", {}).get("parts", [{}])[0].get("text")
      and e.get("finish_reason", {})
      # and not e.get("content", {}).get("parts", [{}])[0].get("function_call")
  ]
  if final_text_responses:
      print("\n--- Final Response ---")
      output = ""
      for final_text_response in final_text_responses:
        print(final_text_response["content"]["parts"][0]["text"])
        output += final_text_response["content"]["parts"][0]["text"]
      return output
  print("No response from agent")
  return ""
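
The helper above inspects only the first part of each event. As a hedged alternative, the sketch below scans every part and tolerates missing content. The sample events are hand-written for illustration and only assume the same `content` → `parts` → `text` dictionary shape the helper relies on; unlike the helper, this sketch ignores `finish_reason`.

```python
def extract_text(events: list[dict]) -> str:
    """Concatenate every text part found in a list of event dictionaries."""
    chunks = []
    for event in events:
        # `content` can be missing or None, so fall back to an empty dict.
        parts = (event.get("content") or {}).get("parts") or []
        for part in parts:
            text = part.get("text")
            if text:
                chunks.append(text)
    return "".join(chunks)

# Hypothetical events (not real agent output).
sample_events = [
    {"content": {"role": "model", "parts": [
        {"function_call": {"name": "get_travel_attractions", "args": {"location": "Rome"}}}
    ]}},
    {"content": {"role": "model", "parts": [{"text": "Visit "}, {"text": "The Colosseum."}]}},
    {"content": None},
]

final_text = extract_text(sample_events)
print(final_text)
```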


In [None]:
# To speed things up, you can reuse an agent that is already deployed
# deployed_agent_no_trace = agent_engines.get("tbd")


### Querying With a Session
Here we'll use a session to give the agent short-term memory within a single, continuous conversation, and then repeat the last question in a second session to compare.

In [None]:
print("\n\n🗣️ Querying WITH a session (Stateful Conversation)")

user_id = f"session-demo-user-{uuid4().hex[:6]}"

session = await deployed_agent_no_trace.async_create_session(user_id=user_id)
print (session)

# --- First Query ---
query_1 = "My name is Annie. I want to plan a trip to Rome."
print(f"\n💬 User: {query_1}")
await stream_and_print_response(deployed_agent_no_trace, query_1, user_id, session)

print("\n" + "-" * 50)


# --- Follow-up Query (in the same session) ---
query_2 = "Find a famous historical site there and a good place for pasta."
print(f"\n💬 User: {query_2}")
await stream_and_print_response(deployed_agent_no_trace, query_2, user_id, session)

print("\n" + "-" * 50)

# --- Follow-up Query (in the same session) ---
query_3 = "What is my name?"
print(f"\n💬 User: {query_3}")
await stream_and_print_response(deployed_agent_no_trace, query_3,user_id, session)

print("\n" + "-" * 50)

# --- Follow-up Query (in a different session) ---
session2 = await deployed_agent_no_trace.async_create_session(user_id=user_id)
print(session2)
print(f"\n💬 User: {query_3} (now in a different session)")
await stream_and_print_response(deployed_agent_no_trace, query_3,user_id, session2)


print("\n" + "-" * 50)
# print("Success! The agent remembered the location ('Rome') from the first message to correctly answer the second.")

---

## 3. Agent #2: Adding Long-Term Memory & Tracing
Session memory is temporary. For an agent to remember preferences across *different conversations*, it needs a **Memory Bank**.

Let's build a new agent designed for this. This time, we **will enable tracing** so we can see its inner workings.
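Before wiring up the real services, the idea can be illustrated with a toy model. Everything in this sketch (`ToyMemoryBank`, `recall`, the "remember that" heuristic) is hypothetical and only loosely echoes the explicit save step used with the Memory Bank service in this section; it is not how Vertex AI extracts or stores memories.

```python
class ToyMemoryBank:
    """Hypothetical long-term store: facts are keyed by user, not by session."""

    def __init__(self):
        self._facts: dict[str, list[str]] = {}

    def add_session_to_memory(self, user_id: str, session_turns: list[str]) -> None:
        # Keep only turns that look like stated preferences (toy heuristic).
        preferences = [turn for turn in session_turns if "remember that" in turn]
        self._facts.setdefault(user_id, []).extend(preferences)

    def recall(self, user_id: str) -> list[str]:
        # Return a copy so callers cannot mutate the stored facts.
        return list(self._facts.get(user_id, []))

bank = ToyMemoryBank()
session_one = ["Please remember that I am a vegetarian.", "Thanks!"]

# Until the session is saved, a new conversation has nothing to recall.
before_save = bank.recall("traveler-1")
bank.add_session_to_memory("traveler-1", session_one)
# After saving, any later session for the same user can preload the fact.
after_save = bank.recall("traveler-1")
# Memories are scoped per user, so another user sees nothing.
other_user = bank.recall("traveler-2")

print(before_save, after_save, other_user)
```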

### Define and Run the Memory Agent

In [None]:
from google.genai.types import Content, Part


def create_memory_trip_agent():
    """Creates a trip planner that uses a long-term memory bank."""
    return Agent(
        name="memory_trip_agent", model="gemini-2.5-flash",
        instruction="""You are a helpful trip planning assistant with a perfect memory.
        Your Core Instructions:
        1. At the start of every conversation, use your memory to recall any user preferences.
        2. When the user tells you a new preference (e.g., "I'm a vegetarian"), save this fact to your memory.
        3. When planning, always incorporate the user's remembered preferences.
        """,
        tools=[adk.tools.preload_memory_tool.PreloadMemoryTool()]
    )

memory_agent = create_memory_trip_agent()
print("🧠 Agent with long-term memory is defined.")

In [None]:
# --- Configure and Run the Local Agent with Memory Bank ---
async def run_memory_test():
    # We will attach our new memory bank to the Agent Engine we already deployed.
    agent_engine_id = deployed_agent_no_trace.name
    app_name = memory_agent.name # Use the agent's name as the app identifier

    print(f"✅ Attaching Memory & Session services to Agent Engine: {agent_engine_id}")

    # Configure services to point to our existing deployment
    memory_bank_service = VertexAiMemoryBankService(
        project=PROJECT_ID, location=LOCATION, agent_engine_id=agent_engine_id
    )
    session_service = VertexAiSessionService(
        project=PROJECT_ID, location=LOCATION, agent_engine_id=agent_engine_id
    )

    # Initialize the local runner once
    runner = Runner(
        agent=memory_agent,
        app_name=app_name,
        session_service=session_service,
        memory_service=memory_bank_service,
    )

    # ---
    # Helper function to run a single turn in a chat
    async def run_turn(query: str, user_id: str, session_id: str):
        print(f"\n🗣️ You: {query}")
        print("🤖 Assistant: ", end="", flush=True) # Print prefix for streaming

        final_response_parts = []
        async for event in runner.run_async(
            user_id=user_id,
            session_id=session_id,
            new_message=Content(parts=[Part(text=query)], role="user")
        ):
            # Events from the local Runner are objects, so use attribute access rather than dict keys.
            if event.content and event.content.role == "model":
                part_text = event.content.parts[0].text
                if part_text:
                    final_response_parts.append(part_text)
                    print(part_text, end="", flush=True) # Print streaming parts on one line

        print() # Add a final newline after the response is complete
        return "".join(final_response_parts)
    # ---

    # == Step 1: The "Teach Me" Conversation ==
    user_id_memory = f"adk-traveler-{uuid4()}"
    print(f"\n--- Starting Session 1 for user '{user_id_memory}' ---")
    session_one = await session_service.create_session(user_id=user_id_memory, app_name=app_name)
    await run_turn("Hi there. When you help me plan trips, please remember that I am a vegetarian.", user_id_memory, session_one.id)

    # ⭐️ IMPORTANT: Explicitly save the completed session to long-term memory
    print("\n💾 Saving session 1 to Memory Bank...")
    completed_session = await session_service.get_session(app_name=app_name, user_id=user_id_memory, session_id=session_one.id)
    await memory_bank_service.add_session_to_memory(completed_session)
    print("✅ Session 1 saved.")

    # == Step 2: The "Recall" Conversation ==
    print(f"\n--- Starting a brand new Session 2 for the SAME user ---")
    session_two = await session_service.create_session(user_id=user_id_memory, app_name=app_name)
    await run_turn("Great. Now, can you plan a weekend dinner trip to Napa Valley?", user_id_memory, session_two.id)
    print("-" * 50)
    print("🎉 Memory test complete!")

# In a notebook cell, run the test with await
await run_memory_test()

### Tracing

In [None]:
# This time, we wrap the agent in AdkApp to enable tracing

from vertexai.agent_engines import AdkApp

app_with_tracing = AdkApp(agent=trip_coordinator_agent, enable_tracing=True)
deployment_name = f"trip-planner-tracing-{uuid4().hex[:6]}"

print(f"🚀 Deploying '{deployment_name}' with tracing enabled...")
deployed_agent = agent_engines.create(
    display_name=deployment_name,
    agent_engine=app_with_tracing,
    requirements=["google-cloud-aiplatform[agent_engines]==1.117.0", "google-adk==1.15.1"],
    description="A trip planning agent with tracing enabled."
)
print("-" * 50)
print("✅ Agent with Tracing deployed successfully!")
display(Markdown(f"**Resource Name:** `{deployed_agent.resource_name}`"))

---

## 4. Comparing Agents in Cloud Trace
Now, let's see the difference tracing makes.

1.  Navigate to the **[Cloud Trace](https://console.cloud.google.com/traces)** page in your Google Cloud Console.
2.  In the filter or search bar, look for traces related to your Reasoning Engines. You can often filter by the service name, which will contain your deployment name.

You will find detailed, step-by-step execution graphs for **`trip-planner-tracing-...`** (Agent #2). However, you will find **no detailed traces** for **`trip-planner-basic-...`** (Agent #1), because we deployed it without enabling tracing. This demonstrates the critical importance of the `AdkApp` wrapper for observability and debugging.

### Query The Remote Agent
We'll send a single query to the traced deployment so that it produces a trace we can inspect in Cloud Trace.

In [None]:
#deployed_agent = agent_engines.get("projects/418653888075/locations/us-central1/reasoningEngines/5308428944762994688")


In [None]:
# agent_client = agent_engines.get(deployed_agent.resource_name)
user_id_memory = f"trace-test-user-{uuid4().hex[:6]}"

# --- Send one query to generate a trace ---
print(f"--- 📚 Starting Conversation 1 for user '{user_id_memory}' ---")
query1 = "Find a famous historical site and a good place for pasta in Rome."
print(f"💬 User: {query1}")
session1 = await deployed_agent.async_create_session(user_id=user_id_memory)
await stream_and_print_response(deployed_agent,query1, user_id_memory, session1)


print("\n" + "-" * 50)
print("✅ Trace test complete.")

---

## 5. Comprehensive Evaluation of Agent #1

Now that we understand how to build and deploy agents, let's circle back to our first agent—the multi-agent trip planner. Because we gave it **custom tools with specific arguments**, we can now test it on a deeper level. We'll check if it:
1.  Chooses the correct sub-agent (`activities_agent` or `food_agent`).
2.  Calls the correct underlying tool (`get_travel_attractions` or `get_restaurant_recommendations`).
3.  Passes the **correct arguments** (`location`, `interest`, `cuisine`) to that tool.



In [None]:
nest_asyncio.apply()

# These two helper functions are needed because, in the latest release of the
# Agent Engine API, all invocations are async. Using nest_asyncio in Colab lets us
# wrap an asynchronous call in a synchronous function.

async def async_stream_query(agent, message: str, user_id: str) -> list:
    """Collects every event from the agent's async stream into a list."""
    events = []
    async for event in agent.async_stream_query(
        user_id=user_id,
        message=message,
    ):
        events.append(event)
    return events

def sync_stream_query(agent, message: str, user_id: str) -> list:
    """Runs async_stream_query to completion from synchronous code."""
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(async_stream_query(agent, message, user_id))
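
The pattern of draining an async stream from synchronous code can be tried in isolation. In the sketch below, `fake_stream` is a hypothetical stand-in for an agent's event stream, and the block assumes a plain Python script where no event loop is running yet; inside Colab a loop is already running, which is the situation `nest_asyncio` works around.

```python
import asyncio

async def fake_stream(message: str):
    """Hypothetical stand-in for an agent's async event stream."""
    for word in message.split():
        await asyncio.sleep(0)  # yield control, as a network call would
        yield {"text": word}

async def collect(message: str) -> list:
    # Drain the async generator into a plain list.
    return [event async for event in fake_stream(message)]

def collect_sync(message: str) -> list:
    # asyncio.run creates a fresh event loop; it raises RuntimeError if a loop
    # is already running in the current thread.
    return asyncio.run(collect(message))

events = collect_sync("plan my trip")
print(events)
```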

### Helpers and Dataset for a Deeper Evaluation

In [None]:
# --- Helper functions specifically for the evaluation process ---

def parse_multi_agent_stream_for_eval(stream_generator) -> dict:
    """Parses the stream, capturing full tool call details."""
    trajectory = []
    # This loop captures the tool calls made by the agent during its execution
    for chunk in stream_generator:
        try:
            # The tool call info is in the 'function_call' part of a chunk
            for part in chunk.get('content', {}).get('parts', []):
                if 'function_call' in part:
                    tool_call = part['function_call']
                    info = {
                        "tool_name": tool_call.get('name'),
                        "tool_input": dict(tool_call.get('args', {}))
                    }
                    if info not in trajectory: trajectory.append(info)
        except (KeyError, IndexError): pass
    return {"predicted_trajectory": trajectory}

def query_trip_planner_for_eval(prompt: str) -> dict:
    """Queries Agent #1 and formats the output for the evaluation service."""
    # We need to provide a user_id even for evaluation runs
    eval_user_id = f"eval-user-{uuid4().hex[:6]}"
    stream_generator = sync_stream_query(agent=deployed_agent_no_trace,message=prompt, user_id=eval_user_id)
    result = parse_multi_agent_stream_for_eval(stream_generator)
    result["predicted_trajectory"] = json.dumps(result.get("predicted_trajectory", []))
    return result

def display_eval_report(eval_result: Any):
    """Displays a formatted summary of the evaluation results."""
    display(Markdown("### Summary Metrics"))
    display(pd.DataFrame(eval_result.summary_metrics.items(), columns=["metric", "value"]))
    if hasattr(eval_result, "metrics_table"):
        display(Markdown("### Row-wise Metrics (Sample)"))
        display(eval_result.metrics_table.head())

def plot_bar_plot(eval_result: Any, title: str, metrics: list[str] = None):
    """Generates a bar plot for the given evaluation metrics."""
    fig = go.Figure()
    summary_metrics = eval_result.summary_metrics
    if metrics:
        summary_metrics = {k: v for k, v in summary_metrics.items() if any(m in k for m in metrics)}
    fig.add_trace(go.Bar(x=list(summary_metrics.keys()), y=list(summary_metrics.values()), name=title))
    fig.update_layout(barmode="group", title_text=title)
    fig.show()

# --- Create the detailed dataset for evaluation ---
eval_data = {
    "prompt": [
        "Find a historical site in Rome and a place for pasta.",
        "I'm in Tokyo and want to see a museum.",
        "I need a seafood restaurant in San Francisco.",
    ],
    # The ground truth now includes the tool and its exact arguments
    "reference_trajectory": [
        json.dumps([
            {"tool_name": "get_travel_attractions", "tool_input": {"location": "Rome", "interest": "historical site"}},
            {"tool_name": "get_restaurant_recommendations", "tool_input": {"location": "Rome", "cuisine": "pasta"}}
        ]),
        json.dumps([
            {"tool_name": "get_travel_attractions", "tool_input": {"location": "Tokyo", "interest": "museum"}}
        ]),
        json.dumps([
            {"tool_name": "get_restaurant_recommendations", "tool_input": {"location": "San Francisco", "cuisine": "seafood"}}
        ]),
    ],
}
eval_trip_planner_dataset = pd.DataFrame(eval_data)
print("✅ Detailed evaluation dataset created.")

✅ Detailed evaluation dataset created.


### Run the Evaluation
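We'll use two trajectory metrics. `trajectory_exact_match` is very strict: it scores 1.0 only if the agent calls exactly the expected tools, with exactly the expected arguments, in the expected order. `trajectory_recall` is more forgiving: it measures what fraction of the expected calls were actually made.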
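To build intuition for how strict and forgiving trajectory scores can diverge, here is a simplified plain-Python reading of an exact-match score and a recall score. It is a sketch under stated assumptions, not the Vertex Gen AI Evaluation Service implementation, whose matching rules may differ; the `transfer_to_agent` arguments in the predicted trajectory are hypothetical.

```python
import json

def exact_match(predicted: list, reference: list) -> float:
    # 1.0 only when both lists hold the same calls, same arguments, same order.
    return 1.0 if predicted == reference else 0.0

def recall(predicted: list, reference: list) -> float:
    # Fraction of reference calls that appear anywhere in the prediction.
    # Returning 0.0 for an empty reference is an arbitrary choice for this sketch.
    if not reference:
        return 0.0
    found = sum(1 for call in reference if call in predicted)
    return found / len(reference)

reference = [
    {"tool_name": "get_travel_attractions",
     "tool_input": {"location": "Tokyo", "interest": "museum"}},
]
# Hypothetical prediction: a delegation call precedes the expected tool call.
predicted = [
    {"tool_name": "transfer_to_agent",
     "tool_input": {"agent_name": "activities_agent"}},
    {"tool_name": "get_travel_attractions",
     "tool_input": {"location": "Tokyo", "interest": "museum"}},
]

em = exact_match(predicted, reference)
rc = recall(predicted, reference)
print(json.dumps({"exact_match": em, "recall": rc}))
```

Under this reading, a single extra call in the prediction is enough to break an exact match while leaving recall untouched, which is worth remembering when a multi-agent system records delegation steps in its trajectory.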

In [None]:
# Define the metrics for our multi-agent system
tool_eval_metrics = [
    "trajectory_exact_match", # Checks for perfect match of tools, args, and order.
    "trajectory_recall",      # Checks what percentage of required calls were made.
]

tool_eval_task = EvalTask(
    dataset=eval_trip_planner_dataset,
    metrics=tool_eval_metrics,
    experiment=EVAL_EXPERIMENT_NAME
)

print("\n🚀 Running detailed tool evaluation... This may take several minutes.")

tool_eval_result = tool_eval_task.evaluate(
    runnable=query_trip_planner_for_eval,
    experiment_run_name=f"detailed-tool-eval-{uuid4().hex[:6]}"
)

print("\n✅ Evaluation complete!")
display_eval_report(tool_eval_result)
plot_bar_plot(
    tool_eval_result,
    title="Agent Tool-Calling Accuracy",
    metrics=["trajectory_exact_match/mean", "trajectory_recall/mean"],
)


🚀 Running detailed tool evaluation... This may take several minutes.


100%|██████████| 3/3 [00:34<00:00, 11.54s/it]
INFO:vertexai.preview.evaluation._evaluation:All 3 responses are successfully generated from the runnable.
INFO:vertexai.preview.evaluation._evaluation:Computing metrics with a total of 6 Vertex Gen AI Evaluation Service API requests.
100%|██████████| 6/6 [00:00<00:00, 10.69it/s]
INFO:vertexai.preview.evaluation._evaluation:All 6 metric requests are successfully computed.
INFO:vertexai.preview.evaluation._evaluation:Evaluation Took:0.5766943380000384 seconds



✅ Evaluation complete!


### Summary Metrics

| | metric | value |
|---|---|---|
| 0 | row_count | 3.0 |
| 1 | trajectory_exact_match/mean | 0.0 |
| 2 | trajectory_exact_match/std | 0.0 |
| 3 | trajectory_recall/mean | 1.0 |
| 4 | trajectory_recall/std | 0.0 |
| 5 | latency_in_seconds/mean | 21.112104 |
| 6 | latency_in_seconds/std | 12.04831 |
| 7 | failure/mean | 0.0 |
| 8 | failure/std | 0.0 |


### Row-wise Metrics (Sample)

| | prompt | reference_trajectory | response | latency_in_seconds | failure | predicted_trajectory | trajectory_exact_match/score | trajectory_recall/score |
|---|---|---|---|---|---|---|---|---|
| 0 | Find a historical site in Rome and a place for... | [{"tool_name": "get_travel_attractions", "tool... | | 34.629127 | 0 | [{"tool_name": "transfer_to_agent", "tool_inpu... | 0.0 | 1.0 |
| 1 | I'm in Tokyo and want to see a museum. | [{"tool_name": "get_travel_attractions", "tool... | | 11.502393 | 0 | [{"tool_name": "transfer_to_agent", "tool_inpu... | 0.0 | 1.0 |
| 2 | I need a seafood restaurant in San Francisco. | [{"tool_name": "get_restaurant_recommendations... | | 17.204792 | 0 | [{"tool_name": "transfer_to_agent", "tool_inpu... | 0.0 | 1.0 |


---

## 6. Cleanup

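Finally, let's clean up the resources we've created to avoid incurring unnecessary costs. The cell below deletes the two deployed Agent Engines; it does not remove the staging bucket or the Vertex AI Experiment, so delete those separately if you no longer need them.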

In [None]:
# --- Delete the deployed Agent Engines ---
all_deployments = [deployed_agent_no_trace, deployed_agent]
for deployment in all_deployments:
    try:
        # force=True also removes child resources, such as the sessions created above
        deployment.delete(force=True)
        print(f"🗑️ Deleted Agent Engine: {deployment.display_name}")
    except Exception as e:
        print(f"Could not delete {deployment.display_name}. It may have been deleted already. Error: {e}")