# End-to-End Travel Agent: Vector Search 2.0 + ADK

This notebook demonstrates the complete workflow for building a **Gen AI Travel Agent** using **[Vertex AI Vector Search 2.0](https://cloud.google.com/vertex-ai/docs/vector-search-2/overview)** and **[Agent Development Kit (ADK)](https://google.github.io/adk-docs/)**.

## What is Vector Search 2.0?

Vector Search 2.0 is Google Cloud's fully managed, self-tuning vector database built on Google's [ScaNN (Scalable Nearest Neighbors)](https://github.com/google-research/google-research/tree/master/scann) algorithm - the same technology powering Google Search, YouTube, and Google Play.

### Key Features

| Feature | Description |
|---------|-------------|
| **Zero Indexing to Billion-Scale** | Start immediately with kNN (no indexing), scale to billions with ANN indexes |
| **Unified Data Storage** | Store vectors and metadata together (no separate database needed) |
| **Auto-Embeddings** | Automatic embedding generation using Vertex AI models like `gemini-embedding-001` |
| **Built-in Full Text Search** | No need to generate sparse embeddings yourself |
| **Hybrid Search** | Combine semantic + keyword search with [RRF ranking](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) |
| **Self-Tuning** | Auto-optimized performance without manual configuration |

### Core Architecture

Vector Search 2.0 has three main components:

1. **[Collections](https://cloud.google.com/vertex-ai/docs/vector-search-2/collections/collections)**: Schema-enforced containers for your data
2. **[Data Objects](https://cloud.google.com/vertex-ai/docs/vector-search-2/data-objects/data-objects)**: Individual items with data fields and vector embeddings
3. **[Indexes](https://cloud.google.com/vertex-ai/docs/vector-search-2/indexes/indexes)**: kNN (instant, dev) or ANN (fast, production-scale)
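
To make the kNN/ANN distinction concrete, the sketch below shows what an exact (brute-force) k-nearest-neighbor lookup does: it scores the query against every stored vector, so it needs no index but its cost grows linearly with the number of objects. This is a plain-Python illustration with made-up 3-dimensional vectors and hypothetical `cosine_similarity` and `knn` helpers. It is not how the service is implemented.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn(query, vectors, k=2):
    """Exact kNN: score every stored vector, then keep the k best."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]

# Toy "embeddings" (illustrative values only)
vectors = {
    "loft": [0.9, 0.1, 0.0],
    "garden_flat": [0.1, 0.9, 0.1],
    "studio": [0.8, 0.2, 0.1],
}

nearest = knn([1.0, 0.0, 0.0], vectors, k=2)
print(nearest)
```

An ANN index trades a small amount of recall for avoiding this full scan, which is why it becomes the practical choice as a collection grows.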

## What We'll Build

In this notebook, we'll:

1. **Build the Knowledge Core**: Ingest real Airbnb data into Vector Search 2.0, using auto-embeddings and metadata storage
2. **Build the Agent**: Use ADK to create an agent that reasons about user intent
3. **Connect Them**: Wrap Vector Search 2.0 as a **Tool** that the Agent calls autonomously to find rentals

**New to Vector Search 2.0?** For a comprehensive introduction, see: [Introduction to Vertex AI Vector Search 2.0](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/vector-search-2-intro.ipynb)

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/google/adk-samples/blob/main/python/notebooks/grounding/vectorsearch2_travel_agent.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Run in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/google/adk-samples/blob/main/python/notebooks/grounding/vectorsearch2_travel_agent.ipynb">
      <img width="32px" src="https://raw.githubusercontent.com/primer/octicons/refs/heads/main/icons/mark-github-24.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

---

## Prerequisites

Before running this notebook, ensure you have:

1. A Google Cloud project with billing enabled ([setup guide](https://cloud.google.com/vertex-ai/docs/start/cloud-environment))
2. The [Security Admin](https://cloud.google.com/iam/docs/roles-permissions/iam#iam.securityAdmin) (`roles/iam.securityAdmin`) IAM role on your project

### Important: Resource Cleanup

Vector Search 2.0 resources incur costs when active. **Make sure to run the cleanup section at the end** of this tutorial to delete all Collections and avoid unexpected charges.

---

# Part 1: Setup and Installation

## Install Required Packages

First, we'll install the necessary Python libraries:

- **google-cloud-vectorsearch**: The Vector Search 2.0 SDK for creating collections and searching
- **google-adk**: Agent Development Kit for building AI agents
- **pandas, requests**: Utilities for downloading and processing our Airbnb dataset

If running in Colab, this will also authenticate you and restart the runtime.

In [None]:
!pip install --upgrade --quiet google-cloud-vectorsearch pandas requests google-adk

import sys
if "google.colab" in sys.modules:
    # Authenticate to Google Cloud (required for Colab)
    from google.colab import auth
    auth.authenticate_user()

    # Restart runtime to pick up new packages
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

print("Libraries installed. Runtime restarted if on Colab.")


## Configure Project and Initialize SDK Clients

Here we set up the core configuration for our project:

1. Set your Google Cloud project ID and location
2. Configure environment variables for ADK/Gemini
3. Initialize the three Vector Search 2.0 SDK clients:
   - **admin_client**: For managing Collections and Indexes
   - **data_client**: For creating/updating/deleting Data Objects
   - **search_client**: For performing search queries

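> **Important**: Set `PROJECT_ID` in the cell below to your own Google Cloud project ID (the recorded run used `"gcp-samples-ic0"`). The cell raises an error if the value is empty or still the `"your-project-id"` placeholder.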

In [None]:
import os
from google.cloud import vectorsearch_v1beta

# --- PROJECT SETTINGS ---
# Replace with your Google Cloud project ID
PROJECT_ID = "gcp-samples-ic0"  # @param {type:"string"}
LOCATION = "us-central1"        # @param {type:"string"}
COLLECTION_ID = "london-travel-agent-demo"  # Unique name for our collection

# Validate PROJECT_ID
if PROJECT_ID == "your-project-id" or not PROJECT_ID:
    raise ValueError("Please set PROJECT_ID to your actual Google Cloud project ID")

# ADK / Gemini Configuration
os.environ["GOOGLE_CLOUD_PROJECT"] = PROJECT_ID
os.environ["GOOGLE_CLOUD_LOCATION"] = LOCATION
os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "True"

# --- SDK CLIENTS ---
# Vector Search 2.0 uses a modular client architecture with three specialized clients:

# 1. VectorSearchServiceClient: Manages Collections and Indexes (CRUD operations)
admin_client = vectorsearch_v1beta.VectorSearchServiceClient()

# 2. DataObjectServiceClient: Manages Data Objects (create, update, delete)
data_client = vectorsearch_v1beta.DataObjectServiceClient()

# 3. DataObjectSearchServiceClient: Performs search and query operations
search_client = vectorsearch_v1beta.DataObjectSearchServiceClient()

# Resource paths
parent = f"projects/{PROJECT_ID}/locations/{LOCATION}"
collection_path = f"{parent}/collections/{COLLECTION_ID}"

print(f"Project: {PROJECT_ID}")
print(f"Location: {LOCATION}")
print(f"Collection: {COLLECTION_ID}")
print(f"\nSDK clients initialized successfully.")

Project: gcp-samples-ic0
Location: us-central1
Collection: london-travel-agent-demo

SDK clients initialized successfully.


## Enable Required Google Cloud APIs

Before using Vector Search 2.0, we need to enable two APIs in your project:

- **vectorsearch.googleapis.com**: The Vector Search 2.0 API itself
- **aiplatform.googleapis.com**: Required for auto-embeddings with Vertex AI models like `gemini-embedding-001`

This command is idempotent - it's safe to run even if the APIs are already enabled.

In [None]:
!gcloud services enable vectorsearch.googleapis.com aiplatform.googleapis.com --project "{PROJECT_ID}"

Operation "operations/acat.p2-761793285222-6100c791-e4ee-48ae-8ef6-c26dcc7e21dc" finished successfully.





---

# Part 2: Data Pipeline

## Download and Clean the Airbnb Dataset

We'll use real data from [Inside Airbnb](http://insideairbnb.com/) - a project that provides data about Airbnb listings in cities worldwide. Our dataset contains London vacation rentals with:

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique listing ID |
| `name` | string | Listing title |
| `description` | string | Full listing description |
| `price` | number | Price per night in GBP |
| `neighborhood` | string | London neighborhood (e.g., "Hackney", "Islington") |
| `listing_url` | string | Airbnb URL |
| `instant_bookable` | string | "t" if instantly bookable, "f" otherwise |
| `neighborhood_overview` | string | Description of the area |

The cleaning process includes:

- Selecting relevant columns (name, description, price, neighborhood, etc.)
- Converting price from `"$1,234.00"` format to numeric
- Filling missing values to avoid API errors
- Limiting to 2,000 listings for demo speed (scales to 90K+ in production)
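
The price conversion step can be illustrated in isolation. The sketch below uses a hypothetical `parse_price` helper (not part of the notebook) that strips `$` and `,` and returns `None` for values it cannot parse, instead of substituting a number.

```python
import math
import re
from typing import Optional

def parse_price(raw: Optional[str]) -> Optional[float]:
    """Convert a string like "$1,234.00" to 1234.0; return None if unparseable."""
    if raw is None:
        return None
    cleaned = re.sub(r"[$,]", "", str(raw)).strip()
    try:
        value = float(cleaned)
    except ValueError:
        return None
    # str(NaN) parses back to NaN, so treat it as missing too
    return None if math.isnan(value) else value

samples = ["$1,234.00", "$85.00", "", None]
parsed = [parse_price(s) for s in samples]
print(parsed)
```

The cleaning cell below takes a different route and fills missing prices with `0.0`. That keeps every object schema-valid, but it also means listings without a price satisfy filters such as `price < 200`, which is likely why some results later in this notebook show a price of £0 or "Price not available".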

In [None]:
import pandas as pd
import requests
import io

# Source: Inside Airbnb (London, Sept 2025)
DATA_URL = "https://data.insideairbnb.com/united-kingdom/england/london/2025-09-14/data/listings.csv.gz"

print("Downloading dataset...")
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"}
response = requests.get(DATA_URL, headers=headers)
response.raise_for_status()

# Load GZIP directly
df = pd.read_csv(io.BytesIO(response.content), compression='gzip')

# --- CLEANING ---
cols = ['id', 'name', 'description', 'price', 'neighborhood_overview', 'listing_url', 'instant_bookable', 'neighbourhood_cleansed']
df = df[cols].copy()

# Clean Price (remove $ and ,) and convert to float
df['price'] = df['price'].astype(str).str.replace(r'[$,]', '', regex=True)
df['price'] = pd.to_numeric(df['price'], errors='coerce').fillna(0.0)

# Fill NaNs in text fields (Critical to avoid API errors)
str_cols = ['name', 'neighbourhood_cleansed', 'instant_bookable', 'listing_url', 'description', 'neighborhood_overview']
for col in str_cols:
    df[col] = df[col].fillna("").astype(str)

# Normalize boolean string
df['instant_bookable'] = df['instant_bookable'].str.lower()

# Subset for demo speed
df_demo = df.head(2000).reset_index(drop=True)
print(f"Loaded & Cleaned {len(df_demo)} listings.")

Downloading dataset...
Loaded & Cleaned 2000 listings.


---

# Part 3: Create Collection

## Create a Vector Search 2.0 Collection

A **Collection** is a schema-enforced container for your data in Vector Search 2.0. Think of it as a table in a traditional database, but optimized for vector operations.

### Collection Schemas

Each Collection has two schemas:

1. **[Data Schema](https://cloud.google.com/vertex-ai/docs/vector-search-2/collections/collections#data-schema)**: Defines the structure of your data fields using [JSON Schema](https://json-schema.org/) format. All Data Objects must conform to this schema.

2. **[Vector Schema](https://cloud.google.com/vertex-ai/docs/vector-search-2/collections/collections#vector-schema)**: Defines your embedding fields with their dimensions and configurations. You can have multiple vector fields per object (e.g., text_embedding, image_embedding).
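
As a rough local illustration of what "conforming to the data schema" means, the sketch below checks a record against a subset of this notebook's schema. The `schema_errors` helper is hypothetical and covers only the `string` and `number` types used here. How the service itself validates objects (for example, whether it rejects undeclared fields) is not shown in this notebook, so treat that part as an assumption.

```python
from typing import Any, Dict, List

# Mirrors a subset of the data schema used in this notebook
DATA_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "instant_bookable": {"type": "string"},
    },
}

def schema_errors(data: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable type mismatches (empty if none)."""
    errors = []
    for field, value in data.items():
        spec = schema["properties"].get(field)
        if spec is None:
            # Assumption: undeclared fields are treated as a problem here
            errors.append(f"{field}: not declared in schema")
        elif spec["type"] == "string" and not isinstance(value, str):
            errors.append(f"{field}: expected string, got {type(value).__name__}")
        elif spec["type"] == "number" and (
            isinstance(value, bool) or not isinstance(value, (int, float))
        ):
            errors.append(f"{field}: expected number, got {type(value).__name__}")
    return errors

good = {"name": "Shoreditch Loft", "price": 151.0, "instant_bookable": "t"}
bad = {"name": "Shoreditch Loft", "price": "151", "instant_bookable": True}

print(schema_errors(good, DATA_SCHEMA))
print(schema_errors(bad, DATA_SCHEMA))
```

Running a check like this before ingestion surfaces type problems (such as a price left as a string) without spending an API call.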

### Auto-Embeddings Feature

One of Vector Search 2.0's most powerful features is **automatic embedding generation**. When you configure `vertex_embedding_config` in your vector schema, the service automatically generates embeddings using Vertex AI models. This means you don't need to:

- Manage embedding model infrastructure
- Pre-compute embeddings before ingestion  
- Handle embedding API calls yourself

We use a `text_template` to combine `description` + `neighborhood_overview` for richer semantic embeddings.
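
To preview the text that would be sent to the embedding model for a listing, the template can be rendered locally. This sketch assumes the service substitutes field values for the `{field}` placeholders in roughly the way Python's `str.format` does. The listing values are invented for illustration.

```python
TEXT_TEMPLATE = "Description: {description}. Neighborhood: {neighborhood_overview}."

# Hypothetical listing, for illustration only
listing = {
    "description": "Bright warehouse loft with a roof terrace",
    "neighborhood_overview": "Quiet street near the canal",
}

embedding_input = TEXT_TEMPLATE.format(**listing)
print(embedding_input)
```

Note that a listing with an empty `neighborhood_overview` still produces the fixed "Neighborhood:" label, so the embedding input for sparse listings consists mostly of template text.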

In [None]:
# Define the Collection schema
collection_config = {
    # DATA SCHEMA: Defines the structure of your data fields
    # All fields in Data Objects must match these types
    "data_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},              # Listing title
            "price": {"type": "number"},             # Price per night (GBP)
            "neighborhood": {"type": "string"},      # London neighborhood
            "listing_url": {"type": "string"},       # Airbnb URL
            "instant_bookable": {"type": "string"},  # "t" or "f"
            "description": {"type": "string"},       # Full listing description
            "neighborhood_overview": {"type": "string"}  # Area description
        }
    },

    # VECTOR SCHEMA: Defines embedding fields and their configurations
    "vector_schema": {
        "description_embedding": {
            "dense_vector": {
                # Embedding dimensions (gemini-embedding-001 defaults to 3072;
                # 768 is a supported reduced size)
                "dimensions": 768,

                # AUTO-EMBEDDING CONFIGURATION
                # Vector Search 2.0 will automatically generate embeddings
                # using the specified Vertex AI model
                "vertex_embedding_config": {
                    "model_id": "gemini-embedding-001",

                    # text_template: Combines multiple fields into embedding input
                    # This creates richer semantic embeddings by including both
                    # the description AND neighborhood context
                    "text_template": "Description: {description}. Neighborhood: {neighborhood_overview}.",

                    # task_type: Optimizes embeddings for retrieval use cases
                    # Use RETRIEVAL_DOCUMENT for documents being indexed
                    "task_type": "RETRIEVAL_DOCUMENT"
                }
            }
        }
    }
}

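# Create the Collection (or skip if it already exists)
from google.api_core.exceptions import NotFound

try:
    existing = admin_client.get_collection(name=collection_path)
    print(f"Collection '{COLLECTION_ID}' already exists.")
except NotFound:
    # Only a missing Collection triggers creation; other errors
    # (permissions, network) propagate instead of being masked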
    print(f"Creating Collection '{COLLECTION_ID}'...")
    request = vectorsearch_v1beta.CreateCollectionRequest(
        parent=parent,
        collection_id=COLLECTION_ID,
        collection=collection_config
    )
    operation = admin_client.create_collection(request=request)
    operation.result()  # Wait for completion
    print(f"Collection '{COLLECTION_ID}' created successfully!")

Creating Collection 'london-travel-agent-demo'...
Collection 'london-travel-agent-demo' created successfully!


---

# Part 4: Ingest Data Objects

## Batch Ingest Data Objects with Auto-Embeddings

A **Data Object** represents a single item in your Collection. Each Data Object consists of:

1. **data_object_id**: Unique identifier for the object
2. **data**: Data fields (matching the data_schema)
3. **vectors**: Embedding vectors (matching the vector_schema, or empty for auto-generation)

### Batch Ingestion

For efficient data loading, Vector Search 2.0 supports batch operations:

- **BatchCreateDataObjectsRequest**: Add up to 250 objects per request
- **Auto-embeddings**: Pass `vectors: {}` to trigger automatic embedding generation

### Batch Size Limits

When using auto-embeddings, batch size is limited by the embedding model's "max texts per request":
- `gemini-embedding-001`: 250 texts per request
- Other models may have different limits

We add a small delay between batches to respect API quotas. This step may take a few minutes as embeddings are generated for each listing.
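
The batching itself is simple slicing. The sketch below uses a hypothetical `batched` helper to show how 2,000 items split into batches of 100, and how an uneven total leaves a smaller final batch.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])

batches = list(batched(list(range(2000)), 100))
print(len(batches), len(batches[-1]))

uneven = list(batched(list(range(250)), 100))
print([len(b) for b in uneven])
```

With a 2-second pause per batch, 20 batches add about 40 seconds of deliberate waiting on top of the time the service spends generating embeddings.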

In [None]:
from tqdm.auto import tqdm
import time

print(f"Ingesting {len(df_demo)} listings into '{COLLECTION_ID}'...")

# Batch size: Max 250 for gemini-embedding-001 auto-embeddings
BATCH_SIZE = 100  # Using 100 for safety margin

# Prepare Data Objects
# Each object needs: data_object_id + data (matching schema) + vectors (empty for auto-embedding)
data_objects = []
for _, row in df_demo.iterrows():
    data_objects.append({
        "data_object_id": str(row['id']),  # Unique ID (must be string)
        "data_object": {
            "data": {
                "name": row['name'],
                "price": float(row['price']),  # Ensure numeric type
                "neighborhood": row['neighbourhood_cleansed'],
                "instant_bookable": row['instant_bookable'],
                "listing_url": row['listing_url'],
                "description": row['description'],
                "neighborhood_overview": row['neighborhood_overview']
            },
            # Empty vectors = trigger auto-embedding generation
            # Vector Search 2.0 will use the vertex_embedding_config from our schema
            "vectors": {}
        }
    })

# Batch Upload with Progress Bar
failed_batches = 0
for i in tqdm(range(0, len(data_objects), BATCH_SIZE), desc="Uploading batches"):
    batch = data_objects[i:i + BATCH_SIZE]

    try:
        request = vectorsearch_v1beta.BatchCreateDataObjectsRequest(
            parent=collection_path,
            requests=batch
        )
        data_client.batch_create_data_objects(request)

        # Rate limiting: Pause between batches to respect embedding API quotas
        time.sleep(2)

    except Exception as e:
        # Skip "already exists" errors (useful for re-runs); count anything else
        if "already exists" not in str(e).lower():
            failed_batches += 1
            tqdm.write(f"Batch error: {str(e)[:80]}")

# Report failures instead of claiming success unconditionally
if failed_batches:
    print(f"\nIngestion finished with {failed_batches} failed batch(es); see errors above.")
else:
    print(f"\nIngestion complete! {len(data_objects)} listings loaded.")

Ingesting 2000 listings into 'london-travel-agent-demo'...




Ingestion complete! 2000 listings loaded.


---

# Part 5: Vector Search Tool

## Define the Vector Search Tool

This function will be used as a "tool" by our ADK agent. It performs **Hybrid Search**, which combines:

1. **Semantic Search**: Uses auto-generated query embeddings to find semantically similar listings
2. **Text Search**: Matches exact keywords across the `name`, `description`, and `neighborhood_overview` fields
3. **RRF Ranking**: Merges the two result lists using Reciprocal Rank Fusion, with weights of 0.6 for semantic results and 0.4 for keyword results
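
The fusion step can be sketched in a few lines. The example below implements a weighted form of Reciprocal Rank Fusion in which each list contributes `weight / (k + rank)` to a document's score. The constant `k = 60` follows the RRF paper linked earlier. How the service applies the weights internally is an assumption here, and the listing names are invented.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def weighted_rrf(
    rankings: Sequence[List[str]], weights: Sequence[float], k: int = 60
) -> List[str]:
    """Fuse ranked lists: each list adds weight / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

semantic = ["loft", "studio", "townhouse"]    # ranked by embedding similarity
keyword = ["garden_flat", "loft", "studio"]   # ranked by keyword match

fused = weighted_rrf([semantic, keyword], weights=[0.6, 0.4])
print(fused)
```

Documents that appear in both lists accumulate score from each, so "loft" and "studio" outrank items that only one search returned. Because RRF uses ranks rather than raw scores, the two searches do not need comparable scoring scales.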

### Search Types in Vector Search 2.0

| Search Type | Description | Use Case |
|-------------|-------------|----------|
| **[Semantic Search](https://cloud.google.com/vertex-ai/docs/vector-search-2/query-search/search#semantic-search)** | Natural language queries with auto-generated embeddings | "Find cozy artist lofts" |
| **[Text Search](https://cloud.google.com/vertex-ai/docs/vector-search-2/query-search/search#text-search)** | Traditional keyword matching | "garden flat" (exact match) |
| **[Hybrid Search](https://cloud.google.com/vertex-ai/docs/vector-search-2/query-search/search#hybrid-search)** | Combine semantic + keyword with RRF ranking | Best of both worlds |
| **[Vector Search](https://cloud.google.com/vertex-ai/docs/vector-search-2/query-search/search#vector-search)** | Provide your own query vector | Custom embeddings |

### Filter Syntax

Vector Search 2.0 supports [rich query operators](https://cloud.google.com/vertex-ai/docs/vector-search-2/query-search/query#filter-syntax) for filtering:

**Comparison**: `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`  
**Logical**: `$and`, `$or`  
**Array**: `$in`, `$nin`, `$all`

**Filter Examples**:

```json
// Price under 200
{"price": {"$lt": 200.0}}

// Specific neighborhood
{"neighborhood": {"$eq": "Hackney"}}

// Combined: Hackney + under 200 + instant bookable
{"$and": [
    {"neighborhood": {"$eq": "Hackney"}},
    {"price": {"$lt": 200.0}},
    {"instant_bookable": {"$eq": "t"}}
]}
```
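
To get a feel for how these operators combine, the sketch below evaluates a filter against a single record in plain Python. The `matches` helper is hypothetical and purely local: it covers the comparison operators, `$in`/`$nin`, and `$and`/`$or`, but not `$all`, and it says nothing about how the service evaluates filters internally.

```python
from typing import Any, Dict

COMPARATORS = {
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
    "$in": lambda a, b: a in b,
    "$nin": lambda a, b: a not in b,
}

def matches(record: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Evaluate a filter dict against one record (local illustration only)."""
    for key, condition in flt.items():
        if key == "$and":
            if not all(matches(record, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(record, sub) for sub in condition):
                return False
        else:
            # key is a field name; condition maps operators to expected values
            for op, expected in condition.items():
                if not COMPARATORS[op](record[key], expected):
                    return False
    return True

listing = {"neighborhood": "Hackney", "price": 151.0, "instant_bookable": "t"}
combined = {"$and": [
    {"neighborhood": {"$eq": "Hackney"}},
    {"price": {"$lt": 200.0}},
    {"instant_bookable": {"$eq": "t"}},
]}

print(matches(listing, combined))
print(matches(listing, {"price": {"$lt": 100.0}}))
```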

In [None]:
import json
from typing import Dict, List, Any
from google.cloud import vectorsearch_v1beta

def find_rentals(query: str, filter: str = "") -> List[Dict[str, Any]]:
    """
    Search for vacation rentals using Hybrid Search (Semantic + Keyword) with metadata filtering.

    This function demonstrates Vector Search 2.0's hybrid search capability, combining:
    1. Semantic Search: Understands query intent (e.g., "cozy" finds warm, inviting spaces)
    2. Text Search: Matches exact keywords (e.g., "garden" finds listings with gardens)
    3. RRF Ranking: Merges results using Reciprocal Rank Fusion for balanced relevance

    Args:
        query: Natural language description of desired rental (e.g., "artist loft with garden")
        filter: JSON string with metadata filters (e.g., '{"price": {"$lt": 200}}')

    Returns:
        List of matching rentals with name, price, neighborhood, and URL
    """
    print(f"\n>>> TOOL CALL: find_rentals (Hybrid Search)")
    print(f"    Query: {query}")
    print(f"    Filter: {filter if filter else 'None'}")

    # Parse Filter JSON (if provided)
    filter_dict = None
    if filter.strip():
        try:
            filter_dict = json.loads(filter)
        except json.JSONDecodeError:
            print("    Warning: Invalid JSON filter, ignoring.")

    try:
        # Configure Semantic Search
        # Uses auto-generated embeddings with QUESTION_ANSWERING task type
        # (pairs with RETRIEVAL_DOCUMENT used during indexing)
        semantic_search = vectorsearch_v1beta.SemanticSearch(
            search_text=query,
            search_field="description_embedding",  # The vector field to search
            filter=filter_dict,  # Metadata filtering supported
            task_type="QUESTION_ANSWERING",  # Optimized for query-document matching
            top_k=10,
            output_fields=vectorsearch_v1beta.OutputFields(
                data_fields=["name", "price", "neighborhood", "listing_url"]
            )
        )

        # Configure Text Search (Keyword Matching)
        # Searches across multiple text fields for exact keyword matches
        text_search = vectorsearch_v1beta.TextSearch(
            search_text=query,
            data_field_names=["name", "description", "neighborhood_overview"],
            top_k=10,
            output_fields=vectorsearch_v1beta.OutputFields(
                data_fields=["name", "price", "neighborhood", "listing_url"]
            )
        )

        # Execute Hybrid Search with RRF Ranking
        # BatchSearchDataObjectsRequest combines multiple searches
        # RRF (Reciprocal Rank Fusion) merges results based on position in each list
        # weights=[0.6, 0.4] gives slightly more importance to semantic search
        request = vectorsearch_v1beta.BatchSearchDataObjectsRequest(
            parent=collection_path,
            searches=[
                vectorsearch_v1beta.Search(semantic_search=semantic_search),
                vectorsearch_v1beta.Search(text_search=text_search)
            ],
            combine=vectorsearch_v1beta.BatchSearchDataObjectsRequest.CombineResultsOptions(
                ranker=vectorsearch_v1beta.Ranker(
                    rrf=vectorsearch_v1beta.ReciprocalRankFusion(weights=[0.6, 0.4])
                )
            )
        )

        response = search_client.batch_search_data_objects(request=request)

        # Format Results
        results = []
        if response.results and response.results[0].results:
            for res in response.results[0].results:
                data = res.data_object.data
                results.append({
                    "name": data.get("name"),
                    "price": data.get("price"),
                    "neighborhood": data.get("neighborhood"),
                    "url": data.get("listing_url")
                })

        print(f"    Found: {len(results)} listings")
        return results

    except Exception as e:
        print(f"    Error: {e}")
        return []
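
`find_rentals` only checks that the filter is valid JSON. Because the filter is written by the model, it can still contain a misspelled field or an unsupported operator. The sketch below shows one possible pre-flight check. The `filter_problems` helper is hypothetical, the field list comes from this notebook's schema, and the operator list comes from the Filter Syntax section above.

```python
import json
from typing import Any, List

ALLOWED_FIELDS = {"price", "neighborhood", "instant_bookable"}
ALLOWED_OPERATORS = {"$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin", "$all"}
LOGICAL_OPERATORS = {"$and", "$or"}

def filter_problems(node: Any) -> List[str]:
    """Collect problems in a parsed filter; an empty list means it looks usable."""
    if not isinstance(node, dict):
        return [f"expected an object, got {type(node).__name__}"]
    problems = []
    for key, value in node.items():
        if key in LOGICAL_OPERATORS:
            if not isinstance(value, list):
                problems.append(f"{key} expects a list")
                continue
            for clause in value:
                problems.extend(filter_problems(clause))
        elif key not in ALLOWED_FIELDS:
            problems.append(f"unknown field: {key}")
        elif not isinstance(value, dict):
            problems.append(f"{key} expects an operator object")
        else:
            problems.extend(
                f"unknown operator: {op}" for op in value if op not in ALLOWED_OPERATORS
            )
    return problems

ok = json.loads('{"$and": [{"neighborhood": {"$eq": "Hackney"}}, {"price": {"$lt": 200}}]}')
typo = json.loads('{"neighbourhood": {"$eq": "Hackney"}, "price": {"$below": 200}}')

print(filter_problems(ok))
print(filter_problems(typo))
```

Returning the problem list to the agent as the tool result, rather than silently dropping the filter, would give the model a chance to correct its own call.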

---

# Part 6: Build the ADK Agent

## Create the ADK Travel Agent

Now we bring everything together! We'll use **[Agent Development Kit (ADK)](https://google.github.io/adk-docs/)** to create an AI agent that:

1. **Understands user intent**: Parses natural language requests
2. **Constructs filters**: Generates appropriate metadata filters from the conversation
3. **Calls our tool**: Invokes `find_rentals` with the right parameters
4. **Summarizes results**: Presents findings in a helpful format

The agent's instructions teach it how to use the filter syntax and what fields are available.

In [None]:
from google.adk.agents import Agent
from google.adk.runners import InMemoryRunner

# Agent Instructions
# These teach the agent how to use the find_rentals tool effectively
AGENT_INSTRUCTION = '''
You are an expert London Travel Agent helping users find vacation rentals.

You have access to a tool called `find_rentals` with two arguments:
1. `query`: A description of the vibe/place (e.g., "artist loft", "garden flat", "cozy workspace")
2. `filter`: A JSON string to filter results by metadata

### AVAILABLE FILTER FIELDS
- `price` (number): Price per night in GBP
- `neighborhood` (string): London neighborhood (e.g., "Hackney", "Islington", "Camden")
- `instant_bookable` (string): "t" for instantly bookable, "f" for requires approval

### FILTER SYNTAX EXAMPLES
```
Price under £200:           {"price": {"$lt": 200.0}}
Specific neighborhood:      {"neighborhood": {"$eq": "Hackney"}}
Price range:                {"$and": [{"price": {"$gte": 100}}, {"price": {"$lte": 300}}]}
Hackney + Instant Book:     {"$and": [{"neighborhood": {"$eq": "Hackney"}}, {"instant_bookable": {"$eq": "t"}}]}
Complex filter:             {"$and": [{"neighborhood": {"$eq": "Hackney"}}, {"price": {"$lt": 200}}, {"instant_bookable": {"$eq": "t"}}]}
```

### GUIDELINES
- Extract the semantic/vibe part of the request for the `query` parameter
- Extract price, location, and booking constraints for the `filter` parameter
- Always summarize the results in a friendly, helpful manner
- Include prices and URLs when available
'''

# Create the Agent
travel_agent = Agent(
    model='gemini-2.5-flash',
    name='travel_agent',
    instruction=AGENT_INSTRUCTION,
    tools=[find_rentals],  # Bind our Vector Search tool
)

# Create the Runner with InMemoryRunner for simplified execution
runner = InMemoryRunner(agent=travel_agent, app_name="travel_agent")

print("Travel Agent initialized and ready!")

Travel Agent initialized and ready!


---

# Part 7: Test the Agent

Now let's test our agent with various queries! The agent will:
1. Parse your natural language request
2. Construct appropriate filters from constraints (price, location, booking)
3. Call the `find_rentals` tool with hybrid search
4. Summarize the results

## Example Queries

### Test 1: Simple Query with Location Filter

Let's test a basic query. The agent should:

- Extract "Hackney" as a neighborhood filter
- Use "inspiring workspace" as the semantic query
- Return listings that match both criteria

In [None]:
await runner.run_debug("I want an inspiring workspace in Hackney")


 ### Continue session: debug_session_id

User > I want an inspiring workspace in Hackney

>>> TOOL CALL: find_rentals (Hybrid Search)
    Query: inspiring workspace
    Filter: {"neighborhood": {"$eq": "Hackney"}}
    Found: 10 listings
travel_agent > Here are some options for an inspiring workspace in Hackney:

*   **Cosy Town house with garden in London, Hackney** - £275 - https://www.airbnb.com/rooms/379137
*   **Luxury 2 bed flat in Dalston Square** - Price not available - https://www.airbnb.com/rooms/520962
*   **Room + own shower and WC, Victorian villa** - Price not available - https://www.airbnb.com/rooms/907927
*   **Dble Room+Wi-Fi-Arsenal/Clissold park/North London** - £63 - https://www.airbnb.com/rooms/823097
*   **East London flat on the Canal** - £55 - https://www.airbnb.com/rooms/469219
*   **Stylish Private Retro Hackney Studio Apartment** - £102 - https://www.airbnb.com/rooms/605812
*   **Beautiful, Luxurious Art Deco +private bathroom** - Price not available - https:


### Test 2: Complex Query with Multiple Filters

This query has multiple constraints. The agent should construct a compound filter:

- `neighborhood`: "Hackney"
- `price`: less than 200
- `instant_bookable`: "t"

And use "creative artist workspace" as the semantic query.

In [None]:
await runner.run_debug(
    "Find me a creative artist workspace in Hackney under £200 that I can book instantly."
)


 ### Continue session: debug_session_id

User > Find me a creative artist workspace in Hackney under £200 that I can book instantly.

>>> TOOL CALL: find_rentals (Hybrid Search)
    Query: creative artist workspace
    Filter: {"$and": [{"neighborhood": {"$eq": "Hackney"}}, {"price": {"$lt": 200}}, {"instant_bookable": {"$eq": "t"}}]}
    Found: 10 listings
travel_agent > Great news! I found several creative artist workspaces in Hackney that are under £200 and available for instant booking:

*   **The Residential Suite Above Gallery** - £131 - https://www.airbnb.com/rooms/173082
*   **Characterful Warehouse Apartment in the Heart of Shoreditch** - £198 - https://www.airbnb.com/rooms/3008678
*   **Shoreditch with Garden!!!** - £122 - https://www.airbnb.com/rooms/727626
*   **Shoreditch Loft** - £151 - https://www.airbnb.com/rooms/227502

I also found some other interesting options, though their prices are currently listed as £0, which might mean they require direct inquiry:

*   **Fant


### Test 3: Different Phrasing, Same Intent

This query has the same constraints as Test 2 but with different phrasing. The agent should produce similar results, demonstrating its ability to understand varied natural language expressions of the same intent.
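
"Similar results" can be quantified. The sketch below compares the top matches that remain visible in this notebook's recorded outputs for Tests 2 and 3, using the listing IDs from the URLs. Both outputs are truncated, so this covers only the first four listings of each response and is a rough indication rather than a full comparison.

```python
# Listing IDs taken from the URLs in the recorded outputs of Test 2 and Test 3
test2_ids = {"173082", "3008678", "727626", "227502"}
test3_ids = {"173082", "3008678", "227502", "427584"}

shared = test2_ids & test3_ids
jaccard = len(shared) / len(test2_ids | test3_ids)

print(sorted(shared))
print(f"Jaccard overlap: {jaccard:.2f}")
```

Three of the four visible listings are shared between the two runs. The filters were identical, so the difference comes from the query text ("creative artist workspace" versus "creative artist vibe") and from how the model chose to present the results.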

In [None]:
await runner.run_debug(
    "Find me a place in Hackney under £200 that I can book instantly. I want a creative artist vibe.",
)


 ### Continue session: debug_session_id

User > Find me a place in Hackney under £200 that I can book instantly. I want a creative artist vibe.

>>> TOOL CALL: find_rentals (Hybrid Search)
    Query: creative artist vibe
    Filter: {"$and": [{"neighborhood": {"$eq": "Hackney"}}, {"price": {"$lt": 200}}, {"instant_bookable": {"$eq": "t"}}]}
    Found: 10 listings
travel_agent > I've found some great places in Hackney that fit your criteria for a creative artist vibe, are under £200, and are instantly bookable!

Here are the top matches:

*   **The Residential Suite Above Gallery** - £131 - https://www.airbnb.com/rooms/173082
*   **Characterful Warehouse Apartment in the Heart of Shoreditch** - £198 - https://www.airbnb.com/rooms/3008678
*   **Shoreditch Loft** - £151 - https://www.airbnb.com/rooms/227502
*   **Hackney Stylish & light 1 bedroom Victorian flat** - £123 - https://www.airbnb.com/rooms/427584

I also found some other listings in Hackney that might have a creative vibe and 

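As a quick sanity check that the two phrasings map to the same constraints, the filter strings logged in the Test 2 and Test 3 tool calls can be compared structurally. The sketch below is a minimal illustration and not part of the notebook's pipeline: the helper names `canonicalize` and `filters_match` are hypothetical, and the two filter strings are copied from the tool-call output shown above.

```python
import json

# Filter strings copied from the Test 2 and Test 3 tool calls in the output above.
TEST2_FILTER = '{"$and": [{"neighborhood": {"$eq": "Hackney"}}, {"price": {"$lt": 200}}, {"instant_bookable": {"$eq": "t"}}]}'
TEST3_FILTER = '{"$and": [{"neighborhood": {"$eq": "Hackney"}}, {"price": {"$lt": 200}}, {"instant_bookable": {"$eq": "t"}}]}'


def canonicalize(node):
    """Hypothetical helper: return a copy of a parsed filter with lists sorted.

    Sorting makes the comparison insensitive to the order in which the model
    happens to emit the clauses of an "$and" or "$or" list.
    """
    if isinstance(node, dict):
        return {key: canonicalize(value) for key, value in node.items()}
    if isinstance(node, list):
        items = [canonicalize(item) for item in node]
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    return node


def filters_match(filter_a: str, filter_b: str) -> bool:
    """Return True if two JSON filter strings express the same constraints."""
    return canonicalize(json.loads(filter_a)) == canonicalize(json.loads(filter_b))


print(filters_match(TEST2_FILTER, TEST3_FILTER))  # True
```

Only the free-text `query` argument differs between the two calls ("creative artist workspace" versus "creative artist vibe"), which is why the result lists overlap without being identical.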

---

# Part 8: Cleanup

## Clean Up Resources

Vector Search 2.0 resources incur costs when active. Run the cell below to delete the Collection and all its data.

**Note**: Data Objects must be deleted before the Collection can be deleted. The cleanup code handles this automatically by:
1. Querying and deleting all Data Objects in batches
2. Deleting the Collection after all objects are removed

> **Warning**: This action is irreversible and permanently deletes all data. The cell runs the cleanup only when `DELETE_COLLECTION` is `True`, which is its default value here; set it to `False` to keep your resources.

In [None]:
DELETE_COLLECTION = True  # @param {type:"boolean"}

if DELETE_COLLECTION:
    try:
        print(f"Deleting all Data Objects from '{COLLECTION_ID}'...")

        # Delete all Data Objects first (required before Collection deletion)
        deleted_count = 0
        while True:
            query_request = vectorsearch_v1beta.QueryDataObjectsRequest(
                parent=collection_path,
                page_size=100,
                output_fields=vectorsearch_v1beta.OutputFields(data_fields=[])
            )
            results = list(search_client.query_data_objects(query_request))

            if not results:
                break

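            batch_deleted = 0
            for obj in results:
                try:
                    delete_request = vectorsearch_v1beta.DeleteDataObjectRequest(
                        name=obj.name
                    )
                    data_client.delete_data_object(delete_request)
                    batch_deleted += 1
                except Exception as e:
                    # Report the failure instead of silently ignoring it
                    print(f"  Failed to delete {obj.name}: {e}")

            deleted_count += batch_deleted
            print(f"  Deleted {deleted_count} data objects...")

            # Stop if nothing was deleted; otherwise the next query would return
            # the same objects and the loop would never end
            if batch_deleted == 0:
                raise RuntimeError("No Data Objects could be deleted; aborting cleanup.")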

        print(f"Deleted {deleted_count} total data objects.")

        # Now delete the Collection
        print(f"Deleting Collection '{COLLECTION_ID}'...")
        request = vectorsearch_v1beta.DeleteCollectionRequest(
            name=collection_path
        )
        operation = admin_client.delete_collection(request=request)
        operation.result()
        print(f"Collection '{COLLECTION_ID}' deleted successfully.")

    except Exception as e:
        print(f"Error during cleanup: {e}")
else:
    print("Cleanup skipped. Set DELETE_COLLECTION = True to delete resources.")

Deleting all Data Objects from 'london-travel-agent-demo'...
  Deleted 2000 data objects...
Deleted 2000 total data objects.
Deleting Collection 'london-travel-agent-demo'...
Collection 'london-travel-agent-demo' deleted successfully.


---

# Summary

In this notebook, you learned how to:

1. **Set up Vector Search 2.0**: Initialize SDK clients and configure your project
2. **Create a Collection**: Define data and vector schemas with auto-embedding configuration
3. **Ingest Data**: Batch upload Data Objects with automatic embedding generation
4. **Implement Hybrid Search**: Combine semantic and keyword search with RRF ranking
5. **Build an Agent**: Use ADK to create an AI agent that autonomously searches your data
6. **Apply Filters**: Use rich query syntax for metadata filtering
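To make the filter step concrete, the following sketch builds the same JSON filter string that the agent passed to `find_rentals` in the tests. It is an illustration under stated assumptions: `build_filter` is a hypothetical helper, the field names `neighborhood`, `price`, and `instant_bookable` are taken from the tool calls in this notebook, and encoding a false flag as `"f"` is an assumption based on the `"t"` value seen there.

```python
import json
from typing import Optional


def build_filter(
    neighborhood: Optional[str] = None,
    max_price: Optional[float] = None,
    instant_bookable: Optional[bool] = None,
) -> Optional[str]:
    """Hypothetical helper: build a JSON filter string from optional constraints."""
    clauses = []
    if neighborhood is not None:
        clauses.append({"neighborhood": {"$eq": neighborhood}})
    if max_price is not None:
        clauses.append({"price": {"$lt": max_price}})
    if instant_bookable is not None:
        # The dataset stores this flag as "t"; using "f" for False is an assumption.
        clauses.append({"instant_bookable": {"$eq": "t" if instant_bookable else "f"}})

    if not clauses:
        return None  # No constraints: run an unfiltered search
    if len(clauses) == 1:
        return json.dumps(clauses[0])
    return json.dumps({"$and": clauses})


print(build_filter(neighborhood="Hackney", max_price=200, instant_bookable=True))
```

Called with the Test 2 constraints, this prints the same `$and` filter that appears in the tool-call logs.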

## Key Takeaways

| Concept | What You Learned |
|---------|-----------------|
| **Collections** | Schema-enforced containers with data + vector schemas |
| **Auto-Embeddings** | Automatic embedding generation via `vertex_embedding_config` |
| **Hybrid Search** | Combine semantic understanding with keyword precision |
| **RRF Ranking** | Reciprocal Rank Fusion for balanced result merging |
| **Filter Syntax** | `$eq`, `$lt`, `$and`, `$or` for metadata filtering |
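
For intuition about RRF ranking, here is a minimal sketch of Reciprocal Rank Fusion as described in the linked paper: each document scores the sum of `1 / (k + rank)` over every ranked list it appears in. The function name, the listing IDs, and the two input rankings are hypothetical, `k = 60` is the constant suggested in the paper, and the service's internal implementation may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list using RRF scores."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from a semantic search and a keyword search.
semantic_ranking = ["listing_a", "listing_b", "listing_c"]
keyword_ranking = ["listing_b", "listing_d", "listing_a"]

print(reciprocal_rank_fusion([semantic_ranking, keyword_ranking]))
# ['listing_b', 'listing_a', 'listing_d', 'listing_c']
```

Listings that rank well in both lists rise to the top, while a listing found by only one search still appears lower in the fused order.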

## Next Steps

- **[Vector Search 2.0 Introduction](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/vector-search-2-intro.ipynb)**: Deep dive into all Vector Search 2.0 features
- **[ANN Indexes](https://cloud.google.com/vertex-ai/docs/vector-search-2/indexes/indexes)**: Scale to billions of vectors with production-ready performance
- **[ADK Documentation](https://google.github.io/adk-docs/)**: Learn more about building AI agents
- **[Vector Search 2.0 Documentation](https://cloud.google.com/vertex-ai/docs/vector-search-2/overview)**: Complete API reference