# Vertex AI + Parallel Web Search Grounding Tutorial

This notebook teaches you how to ground Gemini responses with real-time web data using Parallel's web search API.

**What you'll learn:**
1. How the Parallel grounding integration works
2. How to make grounded API calls to Vertex AI
3. How to parse sources and citations from responses
4. How to compare grounded vs. ungrounded responses

## Prerequisites

1. A Google Cloud project with Vertex AI API enabled
2. A Parallel API key from https://parallel.ai/products/search
3. Google Cloud authentication configured (`gcloud auth application-default login`)

## Step 1: Setup

First, let's configure our credentials and imports.

In [None]:
import os

import google.auth
import google.auth.transport.requests
import requests

# Load from .env file if available
from dotenv import load_dotenv
from IPython.display import Markdown, display

load_dotenv()

# Configuration - set these or use environment variables
PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT")
PARALLEL_API_KEY = os.environ.get("PARALLEL_API_KEY")
LOCATION = "us-central1"

# Validate setup
assert PROJECT_ID, "Set GOOGLE_CLOUD_PROJECT environment variable"
assert PARALLEL_API_KEY, "Set PARALLEL_API_KEY environment variable"
print(f"Project: {PROJECT_ID}")

## Step 2: Understanding the Grounding Config

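To enable Parallel web search grounding, we add a tool entry to the `tools` list in the request body. The helper below builds that entry; the commented-out `customConfigs` block shows the optional settings (result count and domain filters):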

In [None]:
def build_grounding_config(api_key: str) -> dict:
    """
    Build the grounding configuration for Parallel web search.
    
    This config tells Vertex AI to use Parallel's web search API
    to ground the model's responses with real-time web data.
    """
    return {
        "parallelAiSearch": {
            "api_key": api_key,
            # Optional: customize search behavior
            # "customConfigs": {
            #     "max_results": 5,
            #     "source_policy": {
            #         "include_domains": ["reuters.com", "bbc.com"],
            #         "exclude_domains": ["twitter.com"]
            #     }
            # }
        }
    }

grounding_config = build_grounding_config(PARALLEL_API_KEY)
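Printing `grounding_config` directly would echo your Parallel API key into the notebook output. A minimal sketch, assuming you want to inspect the tool entry before sending it: the hypothetical helper `redact_config` returns a deep copy with every `api_key` value masked, which you can then pretty-print as JSON. The `example` dict (and its fake key) is made up for illustration.

```python
import copy
import json


def redact_config(config: dict, secret_keys: tuple = ("api_key",)) -> dict:
    """Return a deep copy of `config` with secret string values masked (hypothetical helper)."""
    redacted = copy.deepcopy(config)

    def _mask(node):
        # Walk nested dicts and lists, masking any key listed in `secret_keys`.
        if isinstance(node, dict):
            for key, value in node.items():
                if key in secret_keys and isinstance(value, str):
                    node[key] = "***REDACTED***"
                else:
                    _mask(value)
        elif isinstance(node, list):
            for item in node:
                _mask(item)

    _mask(redacted)
    return redacted


example = {
    "parallelAiSearch": {
        "api_key": "sk-test-1234567890",
        "customConfigs": {"max_results": 5},
    }
}
print(json.dumps(redact_config(example), indent=2))
```

In the notebook, `print(json.dumps(redact_config(grounding_config), indent=2))` shows the real tool entry without leaking the key. The original dict is left untouched because the masking runs on a deep copy.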

## Step 3: Making a Grounded API Call

Now let's build a function to call the Vertex AI API with grounding enabled.

In [None]:
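def get_access_token():
    """Get a Google Cloud access token for API authentication."""
    # Request the cloud-platform scope explicitly; some credential types
    # (e.g. service accounts) cannot mint a token without a scope.
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    auth_req = google.auth.transport.requests.Request()
    credentials.refresh(auth_req)
    return credentials.token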


def generate_with_grounding(prompt: str, model_id: str = "gemini-2.5-flash") -> dict:
    """
    Call Vertex AI Gemini with Parallel web search grounding.
    
    Args:
        prompt: The question to ask
        model_id: Which Gemini model to use
        
    Returns:
        The raw API response as a dictionary
    """
    # Build the API endpoint URL
    url = f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/{model_id}:generateContent"
    
    # Build the request body
    request_body = {
        "contents": [
            {
                "role": "user",
                "parts": [{"text": prompt}]
            }
        ],
        # This is the key part - adding the grounding tool
        "tools": [build_grounding_config(PARALLEL_API_KEY)],
        "generationConfig": {
            "temperature": 0.2
        }
    }
    
    # Make the API call
    response = requests.post(
        url,
        headers={
            "Authorization": f"Bearer {get_access_token()}",
            "Content-Type": "application/json",
        },
        json=request_body,
        timeout=120,
    )
    response.raise_for_status()
    return response.json()


def generate_without_grounding(prompt: str, model_id: str = "gemini-2.5-flash") -> dict:
    """
    Call Vertex AI Gemini WITHOUT grounding (for comparison).
    """
    url = f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/{model_id}:generateContent"
    
    request_body = {
        "contents": [
            {
                "role": "user",
                "parts": [{"text": prompt}]
            }
        ],
        # No "tools" parameter = no grounding
        "generationConfig": {
            "temperature": 0.2
        }
    }
    
    response = requests.post(
        url,
        headers={
            "Authorization": f"Bearer {get_access_token()}",
            "Content-Type": "application/json",
        },
        json=request_body,
        timeout=120,
    )
    response.raise_for_status()
    return response.json()

print("Functions defined")
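The grounded and ungrounded functions differ only in whether the request body contains a `tools` list. A minimal sketch of how that shared part could be factored out, assuming the same request shape used above; `build_request_body` and `generate_content_url` are hypothetical names, and only the body and URL are built here so the logic can be checked without network access.

```python
from typing import Optional


def build_request_body(
    prompt: str,
    tools: Optional[list] = None,
    temperature: float = 0.2,
) -> dict:
    """Build a generateContent request body; omit `tools` for an ungrounded call."""
    body = {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {"temperature": temperature},
    }
    # Only attach tools when provided: no "tools" key means no grounding.
    if tools:
        body["tools"] = tools
    return body


def generate_content_url(project_id: str, location: str, model_id: str) -> str:
    """Regional Vertex AI endpoint for a Google publisher model's generateContent method."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/{model_id}:generateContent"
    )


grounded_body = build_request_body(
    "Who won the most recent Super Bowl?",
    tools=[{"parallelAiSearch": {"api_key": "YOUR_KEY"}}],
)
ungrounded_body = build_request_body("Who won the most recent Super Bowl?")
print(sorted(grounded_body), sorted(ungrounded_body))
# ['contents', 'generationConfig', 'tools'] ['contents', 'generationConfig']
```

With these helpers, a grounded call would post `build_request_body(prompt, tools=[build_grounding_config(PARALLEL_API_KEY)])` to `generate_content_url(PROJECT_ID, LOCATION, model_id)` using the same headers as above, and an ungrounded call would simply leave `tools` out.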

## Step 4: Make Your First Grounded Request

Let's ask a question that requires recent information.

In [None]:
# Ask a question about recent events
question = "Who won the most recent Super Bowl?"
raw_response = generate_with_grounding(question)

# Let's look at the raw response structure
print("Response keys:", raw_response.keys())
print("\nCandidate keys:", raw_response["candidates"][0].keys())
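Printing only the top-level keys hides the nesting you will parse in the next step. A minimal sketch, using a hypothetical `outline` helper, that walks a response dict and lists each key path with its value's type (and list length) without dumping long text. The `sample` dict is made up for illustration; in the notebook you would run `print("\n".join(outline(raw_response)))`.

```python
def outline(node, path: str = "$", max_depth: int = 6, depth: int = 0) -> list:
    """Return 'path: type' lines describing the shape of nested dicts and lists."""
    lines = []
    if depth > max_depth:
        return lines
    if isinstance(node, dict):
        lines.append(f"{path}: dict")
        for key, value in node.items():
            lines.extend(outline(value, f"{path}.{key}", max_depth, depth + 1))
    elif isinstance(node, list):
        lines.append(f"{path}: list[{len(node)}]")
        # Describe only the first element; list items usually share a shape.
        if node:
            lines.extend(outline(node[0], f"{path}[0]", max_depth, depth + 1))
    else:
        lines.append(f"{path}: {type(node).__name__}")
    return lines


sample = {
    "candidates": [{
        "content": {"role": "model", "parts": [{"text": "..."}]},
        "groundingMetadata": {"webSearchQueries": ["q"], "groundingChunks": []},
    }]
}
print("\n".join(outline(sample)))
```

Limiting the walk to the first list element keeps the outline short even when a response has dozens of grounding chunks.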

## Step 5: Parsing the Response

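Each candidate in the response carries the generated text under `content.parts` and the grounding details (search queries and sources) under `groundingMetadata`. Let's write a function to extract the useful parts.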

In [None]:
def parse_grounded_response(response: dict) -> dict:
    """
    Parse a grounded API response into a clean format.
    
    Response structure:
    {
        "candidates": [{
            "content": {
                "parts": [{"text": "The answer..."}]
            },
            "groundingMetadata": {
                "webSearchQueries": ["query1", "query2"],
                "groundingChunks": [
                    {"web": {"uri": "https://...", "title": "Page Title"}}
                ],
                "groundingSupports": [...]
            }
        }]
    }
    """
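    # Guard against a missing or empty "candidates" list
    candidates = response.get("candidates") or [{}]
    candidate = candidates[0]
    
    # Extract the generated text, joining all text parts (thought parts, if any, are skipped)
    content = candidate.get("content", {})
    parts = content.get("parts", [])
    text = "".join(part.get("text", "") for part in parts if not part.get("thought"))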
    
    # Extract grounding metadata
    grounding = candidate.get("groundingMetadata", {})
    
    # Extract search queries the model executed
    queries = grounding.get("webSearchQueries", [])
    
    # Extract sources (URLs and titles)
    sources = []
    for chunk in grounding.get("groundingChunks", []):
        web_info = chunk.get("web", {})
        if web_info:
            sources.append({
                "uri": web_info.get("uri", ""),
                "title": web_info.get("title", "Untitled")
            })
    
    return {
        "text": text,
        "sources": sources,
        "queries": queries
    }


# Parse our response
result = parse_grounded_response(raw_response)

# Display nicely
sources_md = "\n".join([f"- [{s['title']}]({s['uri']})" for s in result["sources"][:5]])

display(Markdown(f"""
### Answer

{result["text"]}

---

**Sources ({len(result['sources'])}):**

{sources_md}

**Search queries:** {', '.join(result['queries']) or 'none'}
"""))

## Step 6: Compare Grounded vs. Ungrounded

Let's see the difference grounding makes for time-sensitive questions.

In [None]:
def compare_responses(question: str):
    """Compare grounded vs ungrounded responses side by side."""
    
    # Get both responses
    grounded_raw = generate_with_grounding(question)
    ungrounded_raw = generate_without_grounding(question)
    
    # Parse them
    grounded = parse_grounded_response(grounded_raw)
    ungrounded_text = parse_grounded_response(ungrounded_raw)["text"]
    
    sources_md = "\n".join([f"- [{s['title']}]({s['uri']})" for s in grounded["sources"][:3]])
    
    display(Markdown(f"""
## Question: {question}

---

### Without Grounding (training data only)

{ungrounded_text}

---

### With Parallel Grounding (real-time web)

{grounded["text"]}

**Sources:**

{sources_md}
"""))

# Try it!
compare_responses("What was the final score of the most recent Los Angeles Lakers game?")

In [None]:
# Try another question
compare_responses("What were the results of the most recent NBA Finals?")
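Both calls in `compare_responses` assume that each response contains at least one candidate with text. If a prompt is blocked, or generation stops early, there may be nothing to display. A minimal sketch of a more defensive extractor, assuming the standard `generateContent` response fields `candidates[].finishReason` and `promptFeedback.blockReason`; `extract_text_or_reason` is a hypothetical name.

```python
def extract_text_or_reason(response: dict) -> str:
    """Return the first candidate's text, or a short explanation when there is none."""
    candidates = response.get("candidates") or []
    if not candidates:
        # The prompt itself can be blocked before any candidate is produced.
        reason = response.get("promptFeedback", {}).get("blockReason", "UNKNOWN")
        return f"(no candidates returned; blockReason={reason})"
    candidate = candidates[0]
    parts = candidate.get("content", {}).get("parts", [])
    # Join all text parts, skipping any parts flagged as model "thoughts".
    text = "".join(p.get("text", "") for p in parts if not p.get("thought"))
    if text:
        return text
    finish = candidate.get("finishReason", "UNKNOWN")
    return f"(empty response; finishReason={finish})"


print(extract_text_or_reason({"candidates": [{"finishReason": "MAX_TOKENS"}]}))
# (empty response; finishReason=MAX_TOKENS)
```

This helper could stand in for the text extraction in `compare_responses`, so that a blocked or truncated answer shows up as an explanation instead of an exception or a blank section.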

## Step 7: Custom Grounding Configuration

You can customize the grounding behavior, for example by restricting results to specific domains or capping how many results are returned.

In [None]:
from typing import Optional


def generate_with_custom_grounding(
    prompt: str,
    include_domains: Optional[list] = None,
    exclude_domains: Optional[list] = None,
    max_results: Optional[int] = None,
    model_id: str = "gemini-2.5-flash",
) -> dict:
    """
    Call Vertex AI with a customized grounding configuration.
    
    Only the options you pass are added to `customConfigs`; anything left
    as None uses the service's default behavior.
    """
    url = f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/{model_id}:generateContent"
    
    # Build custom grounding config
    grounding_config = {
        "parallelAiSearch": {
            "api_key": PARALLEL_API_KEY,
        }
    }
    
    # Add custom configurations if specified
    custom_configs = {}
    if max_results is not None:
        custom_configs["max_results"] = max_results
    
    source_policy = {}
    if include_domains:
        source_policy["include_domains"] = include_domains
    if exclude_domains:
        source_policy["exclude_domains"] = exclude_domains
    if source_policy:
        custom_configs["source_policy"] = source_policy
        
    if custom_configs:
        grounding_config["parallelAiSearch"]["customConfigs"] = custom_configs
    
    request_body = {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "tools": [grounding_config],
        "generationConfig": {"temperature": 0.2}
    }
    
    response = requests.post(
        url,
        headers={
            "Authorization": f"Bearer {get_access_token()}",
            "Content-Type": "application/json",
        },
        json=request_body,
        timeout=120,
    )
    response.raise_for_status()
    return response.json()


# Example: Only use trusted news sources
response = generate_with_custom_grounding(
    prompt="What is the latest AI news?",
    include_domains=["theverge.com", "techcrunch.com", "wired.com", "reuters.com"],
    max_results=5
)

result = parse_grounded_response(response)
sources_md = "\n".join([f"- {s['uri']}" for s in result["sources"]])

display(Markdown(f"""
### News from trusted sources only

{result["text"]}

**Sources used:**

{sources_md}
"""))

## Step 8: Try Your Own Questions!

Experiment with different questions to see how grounding helps.

In [None]:
# Try your own question!
your_question = "What were the key announcements at the latest Google I/O?"

response = generate_with_grounding(your_question)
result = parse_grounded_response(response)

sources_md = "\n".join([f"- [{s['title']}]({s['uri']})" for s in result["sources"][:5]])

display(Markdown(f"""
### Q: {your_question}

{result["text"]}

---

**Sources:**

{sources_md}
"""))

## Summary

You've learned:

1. **Grounding config** - Add `tools: [{"parallelAiSearch": {"api_key": "..."}}]` to your request
2. **Making calls** - Use the standard Vertex AI REST API with the grounding tool
3. **Parsing responses** - Extract text from `candidates[0].content.parts[0].text` and sources from `groundingMetadata.groundingChunks`
4. **Customization** - Use `customConfigs` to filter domains and limit results

## Next Steps

- Check out `quickstart.py` for a minimal example using the helper library
- See `demo.py` for a command-line demo
- Read the [Vertex AI documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/grounding/grounding-with-parallel) for more options