<a href="https://colab.research.google.com/github/pratik-gond/temp_files/blob/main/GeminiURL_Config.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://drive.google.com/uc?export=view&id=1wYSMgJtARFdvTt5g7E20mE4NmwUFUuog" width="200">

[![Build Fast with AI](https://img.shields.io/badge/BuildFastWithAI-GenAI%20Bootcamp-blue?style=for-the-badge&logo=artificial-intelligence)](https://www.buildfastwithai.com/genai-course)
[![EduChain GitHub](https://img.shields.io/github/stars/satvik314/educhain?style=for-the-badge&logo=github&color=gold)](https://github.com/satvik314/educhain)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1KxCJz4FIwiiR6kjuk8w8SiQJzu8uZuzB?usp=sharing)
## Master Generative AI in 6 Weeks

**What You'll Learn:**
- Build with Latest LLMs
- Create Custom AI Apps
- Learn from Industry Experts
- Join Innovation Community

Transform your AI ideas into reality through hands-on projects and expert mentorship.

[Start Your Journey](https://www.buildfastwithai.com/genai-course)

*Empowering the Next Generation of AI Innovators*

# Gemini URL Context Starter Notebook

Use this notebook to learn and quickly prototype with the URL Context tool in the Google GenAI SDK. The URL Context tool lets models fetch and reason over live content from URLs you provide, so the model can ground its answers in fresh or specific sources.

What you'll do:
- Understand URL Context in a sentence or two
- Set up the SDK and authentication
- Run a basic example (compare two pages)
- Inspect retrieval metadata
- Synthesize across multiple URLs with structured JSON output
- Combine URL Context with Google Search for agentic browsing

Prerequisites:
- A Gemini API key stored as a Colab secret named `GEMINI_API_KEY` (the setup cell reads it with `google.colab.userdata`)


## What is URL Context?

- The URL Context tool allows Gemini models to retrieve and use content from URLs that you explicitly provide in your request.
- Retrieval uses an optimized cache first and falls back to live fetch for newer content.
- Retrieved content counts toward input tokens; the tool supports up to 20 URLs per request and up to ~34MB per URL.
- Safety checks are enforced. Unsupported sources include paywalled pages and certain media types.

When to use it:
- Extract facts, compare documents, build summaries across multiple sources
- Power agentic workflows that browse and then analyze pages
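
The limits listed above (full URLs, at most 20 per request) can be checked locally before a request is sent. The following is a minimal sketch using only the standard library; `validate_urls` and `MAX_URLS_PER_REQUEST` are hypothetical names introduced here, not part of the SDK, and the check does not verify that a page is public or within the size limit.

```python
from urllib.parse import urlparse

MAX_URLS_PER_REQUEST = 20  # limit stated in this notebook


def validate_urls(urls):
    """Return de-duplicated URLs, or raise ValueError if a check fails."""
    seen = set()
    cleaned = []
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        # Require an explicit http(s) scheme and a host, i.e. a full URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not a full http(s) URL: {raw!r}")
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    if len(cleaned) > MAX_URLS_PER_REQUEST:
        raise ValueError(
            f"{len(cleaned)} URLs given; at most {MAX_URLS_PER_REQUEST} are accepted"
        )
    return cleaned
```

Calling `validate_urls(urls)` before building the prompt surfaces a missing protocol or an oversized list as a local error instead of a failed retrieval.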


In [1]:
%pip install -U -q google-genai


In [2]:
from google import genai
from google.genai import types
from google.colab import userdata

API_KEY = userdata.get('GEMINI_API_KEY')
if not API_KEY:
    raise RuntimeError("GEMINI_API_KEY not found")

client = genai.Client(api_key=API_KEY)
MODEL_ID = "gemini-2.5-flash"
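
Outside Colab, `google.colab` is not importable, so the setup cell above fails before a client is created. A minimal sketch of a more portable lookup follows, assuming the key may alternatively be exported as an environment variable with the same name. `load_api_key` is a hypothetical helper, not part of the SDK.

```python
import os


def load_api_key(name="GEMINI_API_KEY"):
    """Look up the key in Colab secrets first, then in the environment."""
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        userdata = None

    key = None
    if userdata is not None:
        try:
            key = userdata.get(name)
        except Exception:
            # Assumption: a missing or inaccessible secret raises here,
            # in which case we fall through to the environment variable.
            key = None

    key = key or os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} not found in Colab secrets or environment")
    return key
```

With this in place, `client = genai.Client(api_key=load_api_key())` should work in both settings.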

## Basic: Compare two pages

We'll ask the model to compare two recipe pages using URL Context. The tool is enabled by adding `url_context` to the `tools` configuration.


In [5]:
# Enable the URL Context tool; it is passed with an empty configuration.

tools = [
    {"url_context": {}},
]

url1 = "https://www.foodnetwork.com/recipes/ina-garten/perfect-roast-chicken-recipe-1940592"
url2 = "https://www.foodnetwork.com/recipes/perfect-roast-chicken-3645195"

response = client.models.generate_content(
    model=MODEL_ID,
    contents=f"Compare the ingredients and cooking times from the recipes at {url1} and {url2}",
    config=types.GenerateContentConfig(
        tools=tools,
    ),
)

print(response.text)


Here's a comparison of the ingredients and cooking times for the two "Perfect Roast Chicken" recipes from Food Network:

**Ina Garten's Perfect Roast Chicken**
*   **Ingredients:**
    *   5 to 6 pound roasting chicken
    *   Kosher salt
    *   Freshly ground black pepper
    *   1 large bunch fresh thyme, plus 20 sprigs
    *   1 lemon, halved
    *   1 head garlic, cut in half crosswise
    *   2 tablespoons (1/4 stick) butter, melted
    *   1 large yellow onion, thickly sliced
    *   4 carrots cut into 2-inch chunks
    *   1 bulb of fennel, tops removed, and cut into wedges
    *   Olive oil
*   **Cooking Time:** 1 hour 30 minutes
*   **Total Time:** 2 hours 10 minutes (Prep: 20 min, Inactive: 20 min, Cook: 1 hour 30 min)
*   **Oven Temperature:** 425 degrees F
*   **Yield:** 8 servings

**Food Network's Perfect Roast Chicken (Emeril Lagasse)**
*   **Ingredients:**
    *   3 carrots, peeled and cut into thirds
    *   3 ribs celery, peeled and cut into thirds
    *   3 onions, 

## Inspect URL retrieval metadata

You can verify which URLs were fetched, and the retrieval status of each, via `url_context_metadata` on the first response candidate (`response.candidates[0]`).


In [6]:
metadata = getattr(response.candidates[0], "url_context_metadata", None)
if metadata and getattr(metadata, "url_metadata", None):
    for item in metadata.url_metadata:
        print(f"retrieved_url: {item.retrieved_url}")
        print(f"status: {item.url_retrieval_status}")
        print("-")
else:
    print("No url_context_metadata present or tool not used in this response.")


retrieved_url: https://www.foodnetwork.com/recipes/perfect-roast-chicken-3645195
status: UrlRetrievalStatus.URL_RETRIEVAL_STATUS_SUCCESS
-
retrieved_url: https://www.foodnetwork.com/recipes/ina-garten/perfect-roast-chicken-recipe-1940592
status: UrlRetrievalStatus.URL_RETRIEVAL_STATUS_SUCCESS
-
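
When many URLs are involved, it can be more useful to list only the ones that did not load. The sketch below assumes each metadata item exposes the same two attributes read in the cell above (`retrieved_url` and `url_retrieval_status`). `failed_urls` is a hypothetical helper, and the sample objects are stand-ins rather than real SDK types.

```python
from types import SimpleNamespace


def failed_urls(url_metadata):
    """Return (url, status) pairs for entries whose retrieval did not succeed."""
    failures = []
    for item in url_metadata or []:
        status = str(item.url_retrieval_status)
        # Matches both the bare "URL_RETRIEVAL_STATUS_SUCCESS" value and the
        # "UrlRetrievalStatus.URL_RETRIEVAL_STATUS_SUCCESS" form printed above.
        if not status.endswith("URL_RETRIEVAL_STATUS_SUCCESS"):
            failures.append((item.retrieved_url, status))
    return failures


# Stand-in objects that mimic the two attributes used above (illustrative only).
sample = [
    SimpleNamespace(
        retrieved_url="https://example.com/ok",
        url_retrieval_status="URL_RETRIEVAL_STATUS_SUCCESS",
    ),
    SimpleNamespace(
        retrieved_url="https://example.com/failed",
        url_retrieval_status="URL_RETRIEVAL_STATUS_ERROR",
    ),
]
print(failed_urls(sample))
```

With a real response, the call would be `failed_urls(metadata.url_metadata)` after the `metadata` lookup shown in the cell above.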


## Intermediate: Multi-URL synthesis with structured output

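Ask the model to read several URLs and return a normalized JSON summary. We describe the target shape with Pydantic models and pass it as `response_schema`, and the prompt also asks for JSON only. The cell below does not set `response_mime_type`, and the sample output arrives wrapped in a Markdown code fence, so the text may need cleaning before it is parsed.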
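
Model replies sometimes wrap JSON in a Markdown code fence, as the sample output for the next cell does, and `json.loads` fails on such text directly. A minimal sketch of a tolerant parser, using only the standard library, is shown here. `parse_json_text` is a hypothetical helper name and is not part of the SDK.

```python
import json
import re

TICKS = "`" * 3  # fence marker, built indirectly to keep this block fence-free
_FENCE = re.compile(rf"^{TICKS}(?:json)?\s*\n(.*?)\n?{TICKS}\s*$", re.DOTALL)


def parse_json_text(text):
    """Parse JSON that may or may not be wrapped in a Markdown code fence."""
    stripped = text.strip()
    match = _FENCE.match(stripped)
    if match:
        # Keep only the payload between the opening and closing fence.
        stripped = match.group(1)
    return json.loads(stripped)


# Illustrative input shaped like the fenced reply shown in this section.
fenced = TICKS + 'json\n{"pages": [], "cross_findings": ["a"]}\n' + TICKS
print(parse_json_text(fenced)["cross_findings"])
```

The resulting dictionary could then be handed to `Synthesis.model_validate(...)` (assuming Pydantic v2) to check it against the models defined in the next cell.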


In [30]:
from pydantic import BaseModel
import json

class PageSummary(BaseModel):
    url: str
    title: str | None = None
    key_points: list[str]

class Synthesis(BaseModel):
    pages: list[PageSummary]
    cross_findings: list[str]

urls = [
    "https://www.foodnetwork.com/recipes/ina-garten/perfect-roast-chicken-recipe-1940592",
    "https://www.foodnetwork.com/recipes/perfect-roast-chicken-3645195",
]

response = client.models.generate_content(
    model=MODEL_ID,
    contents=(
        "Read the following URLs and return ONLY concise JSON with: for each page (url, title, 3-5 key_points) and 3-5 cross_findings that compare/contrast them.\n"
        + "\n".join(urls)
    ),
    config=types.GenerateContentConfig(
        tools=[{"url_context": {}}],
        response_schema=Synthesis,
    ),
)

print(response.text)


```json
{
  "pages": [
    {
      "url": "https://www.foodnetwork.com/recipes/ina-garten/perfect-roast-chicken-recipe-1940592",
      "title": "Perfect Roast Chicken Recipe | Ina Garten | Food Network",
      "key_points": [
        "Recipe is by Ina Garten, adapted from 'The Barefoot Contessa Cookbook'. [1]",
        "Calls for a 5 to 6-pound roasting chicken. [1]",
        "The chicken cavity is stuffed with fresh thyme, lemon halves, and garlic. [1]",
        "The chicken is roasted at 425 degrees F for 1 1/2 hours. [1]",
        "Vegetables like onions, carrots, and fennel are roasted in the pan with the chicken. [1]"
      ]
    },
    {
      "url": "https://www.foodnetwork.com/recipes/perfect-roast-chicken-3645195",
      "title": "Perfect Roast Chicken Recipe | Food Network",
      "key_points": [
        "Recipe is by Emeril Lagasse. [2]",
        "Uses a smaller chicken, 3 1/2 to 4 pounds. [2]",
        "The chicken cavity is stuffed with lemon rinds and bay leaves, and rubb
```

## Advanced: Combine URL Context with Google Search

Enable both `url_context` and `google_search` so the model can find relevant sources first, then fetch the pages for deeper analysis.


In [31]:
tools = [
    {"url_context": {}},
    {"google_search": {}},
]

prompt = (
    "Find 2-3 recent articles about urban micromobility trends (e-scooters, e-bikes), "
    "then provide a 5-bullet synthesis with a short source attribution after each bullet."
)

response = client.models.generate_content(
    model=MODEL_ID,
    contents=prompt,
    config=types.GenerateContentConfig(tools=tools),
)

print(response.text)
print("\n\n")

# Inspect which URLs were fetched
metadata = getattr(response.candidates[0], "url_context_metadata", None)
if metadata and getattr(metadata, "url_metadata", None):
    for item in metadata.url_metadata:
        print(f"retrieved_url: {item.retrieved_url}")
        print(f"status: {item.url_retrieval_status}")
        print("-")


Here's a 5-bullet synthesis of recent urban micromobility trends:

*   **Growing Popularity and Market Growth:** Urban micro-commuting, driven by e-scooters and e-bikes, is rapidly gaining traction as cities seek sustainable and efficient transportation options. The global micro-mobility market was valued at USD 101.02 billion in 2023 and is projected to reach USD 303.47 billion by 2032.
*   **Solving "Last-Mile" and Congestion Challenges:** E-scooters and e-bikes are crucial for bridging the "last mile" gap between public transit and destinations, and they help reduce traffic congestion by taking up less road space than cars.
*   **Environmental and Cost Benefits:** These electric vehicles produce zero direct emissions, contributing to cleaner air and aligning with urban sustainability goals. They also offer a cost-effective alternative to car ownership or frequent ridesharing for users.
*   **Infrastructure and Regulatory Evolution:** As micromobility grows, cities are investing in i

## Notes, limits, and tips

- Provide full URLs (with protocol) that are publicly accessible.
- Limit each request to 20 URLs; the content retrieved from each URL can be up to ~34MB.
- Retrieved content counts toward input tokens (cost). Monitor `usage_metadata` if needed.
- Some URLs may be blocked by safety filters; check `url_context_metadata` for status.
- To reuse the same sources across turns, include the URLs again or store the content with other features (e.g., uploaded files or context caches).
- Consider structured outputs (JSON) for downstream automation.
