##### Copyright 2025 Google LLC.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

## Gemini API: Getting started with information grounding for Gemini models

<a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Grounding.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=30/></a>

In this notebook you will learn how to use information grounding with [Gemini models](https://ai.google.dev/gemini-api/docs/models/).

Information grounding is the process of connecting these models to specific, verifiable information sources to enhance the accuracy, relevance, and factual correctness of their responses. While LLMs are trained on vast amounts of data, this knowledge can be general, outdated, or lack specific context for particular tasks or domains. Grounding helps to bridge this gap by providing the LLM with access to curated, up-to-date information.

Here you will experiment with:
- Grounding information using <a href="#search_grounding">Google Search grounding</a>
- Adding <a href="#yt_links">YouTube links</a> to your prompt as context
- Using <a href="#url_context">URL context</a> to include website, PDF, or image URLs as context in your prompt

## Set up the SDK and the client

### Install SDK

This guide uses the [`google-genai`](https://pypi.org/project/google-genai) Python SDK to connect to the Gemini models.

In [None]:
%pip install -q -U "google-genai>=1.16.0"

### Set up your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see the [Authentication](https://github.com/google-gemini/gemini-api-cookbook/blob/main/quickstarts/Authentication.ipynb) quickstart for an example.

In [None]:
from google.colab import userdata

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
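The cell above only works inside Colab. As a minimal sketch for running the notebook elsewhere, the hypothetical helper `load_api_key` below tries the Colab Secret first and falls back to an environment variable of the same name. The environment-variable fallback is an assumption for illustration, not part of the original setup.

```python
import os


def load_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from a Colab Secret, or from the environment otherwise."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    key = None
    if userdata is not None:
        try:
            key = userdata.get(name)
        except Exception:
            # The secret is missing or notebook access was not granted.
            key = None

    # Assumption: outside Colab the key is exported as an environment variable.
    if not key:
        key = os.environ.get(name)
    if not key:
        raise RuntimeError(
            f"No API key found: set the {name} Colab Secret or environment variable."
        )
    return key
```

You could then write `GOOGLE_API_KEY = load_api_key()` in place of the direct `userdata.get` call.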

### Select model and initialize SDK client

Select the model you want to use in this guide, either by picking one from the list or by typing its name. Keep in mind that some models, such as the 2.5 ones, are thinking models and therefore take slightly longer to respond (see the [thinking notebook](./Get_started_thinking.ipynb) for more details, in particular how to switch thinking off).

In [None]:
from google import genai
from google.genai import types

client = genai.Client(api_key=GOOGLE_API_KEY)

MODEL_ID = "gemini-2.5-flash" # @param ["gemini-2.5-flash-lite", "gemini-2.5-flash-lite-preview-09-2025", "gemini-2.5-flash", "gemini-2.5-flash-preview-09-2025", "gemini-2.5-pro"] {"allow-input":true, isTemplate: true}

## Use Google Search grounding

<a name="search_grounding"></a>

Google Search grounding is particularly useful for queries that require current information or external knowledge. Using Google Search, Gemini can access near real-time information and give better responses.

To enable Google Search, add the `google_search` tool to the `config` of `generate_content`, like this:
```
    config={
      "tools": [
        {
          "google_search": {}
        }
      ]
    },
```

In [None]:
from IPython.display import HTML, Markdown

response = client.models.generate_content(
    model=MODEL_ID,
    contents='What was the latest Indian Premier League match and who won?',
    config={"tools": [{"google_search": {}}]},
)

# print the response
display(Markdown(f"**Response**:\n {response.text}"))
# print the search details
print(f"Search Query: {response.candidates[0].grounding_metadata.web_search_queries}")
# titles of the pages used for grounding
print(f"Search Pages: {', '.join([site.web.title for site in response.candidates[0].grounding_metadata.grounding_chunks])}")

display(HTML(response.candidates[0].grounding_metadata.search_entry_point.rendered_content))

**Response**:
 The latest Indian Premier League (IPL) match was the final of the IPL 2025 season, which took place on June 3, 2025. In this match, Royal Challengers Bengaluru defeated Punjab Kings by 6 runs to win their maiden title.

Search Query: ['latest Indian Premier League match and winner', 'when did IPL 2025 finish', 'IPL 2024 final match and winner']
Search Pages: olympics.com, wikipedia.org, thehindu.com, olympics.com, skysports.com, wikipedia.org, thehindu.com
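
The list of search pages above contains repeats (for example, `olympics.com` appears twice). Here is a minimal sketch of a helper that returns each title once, assuming the response exposes `grounding_metadata.grounding_chunks[*].web.title` as used in the cell above. The name `unique_source_titles` is hypothetical, and the `None` checks are a precaution in case a response carries no grounding metadata.

```python
def unique_source_titles(response) -> list[str]:
    """Return grounding source titles in first-seen order, without duplicates."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return []

    metadata = getattr(candidates[0], "grounding_metadata", None)
    chunks = getattr(metadata, "grounding_chunks", None) or []

    titles = []
    for chunk in chunks:
        # Skip chunks that have no web source or no title.
        web = getattr(chunk, "web", None)
        title = getattr(web, "title", None)
        if title and title not in titles:
            titles.append(title)
    return titles
```

Calling `print(", ".join(unique_source_titles(response)))` on the grounded response would then list each source once.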


You can see that running the same prompt without search grounding gives you outdated information:

In [None]:
from IPython.display import Markdown

response = client.models.generate_content(
    model=MODEL_ID,
    contents='What was the latest Indian Premier League match and who won?',
)

# print the response
display(Markdown(response.text))

The latest Indian Premier League (IPL) match was the **Final of the IPL 2024 season**.

*   **Match:** Kolkata Knight Riders (KKR) vs. Sunrisers Hyderabad (SRH)
*   **Date:** May 26, 2024
*   **Winner:** **Kolkata Knight Riders (KKR)** won by 8 wickets.

For more examples, please refer to the [dedicated notebook](./Search_Grounding.ipynb).

## Grounding with YouTube links

<a name="yt_links"></a>

You can directly include a public YouTube URL in your prompt. The Gemini models will then process the video content to perform tasks like summarization and answering questions about the content.

This capability leverages Gemini's multimodal understanding, allowing it to analyze and interpret video data alongside any text prompts provided.

You need to explicitly declare the video URL you want the model to process as part of the request contents, using a `FileData` part. Here is a simple interaction where you ask the model to summarize a YouTube video:

In [None]:
yt_link = "https://www.youtube.com/watch?v=XV1kOFo1C8M"

response = client.models.generate_content(
    model=MODEL_ID,
    contents= types.Content(
        parts=[
            types.Part(text="Summarize this video."),
            types.Part(
                file_data=types.FileData(
                    file_uri=yt_link
                )
            )
        ]
    )
)

Markdown(response.text)

This video introduces "Gemma Chess," demonstrating how Google's large language model, Gemma, can enhance the game of chess by leveraging its linguistic abilities.

The speaker, Ju-yeong Ji from Google DeepMind, explains that Gemma isn't intended to replace powerful chess engines that excel at calculating moves. Instead, it aims to bring a "new dimension" to chess through understanding and creating text.

The video highlights three key applications:

1.  **Explainer:** Gemma can analyze chess games (e.g., Kasparov vs. Deep Blue) and explain the "most interesting" or strategically significant moves in plain language, detailing their impact, tactical considerations, and psychological aspects, making complex analyses more understandable.
2.  **Storytellers:** Gemma can generate narrative stories about chess games, transforming raw move data into engaging accounts that capture the tension, emotions, and key moments of a match, bringing the game to life beyond just the moves.
3.  **Supporting Chess Learning:** Gemma can act as a personalized chess tutor, explaining concepts like specific openings (e.g., Sicilian Defense) or tactics in an accessible way, even adapting to the user's language and skill level, effectively serving as an always-available, intelligent chess encyclopedia and coach.

By combining the computational strength of traditional chess AI with Gemma's advanced language capabilities, this approach offers a more intuitive and human-friendly way to learn, analyze, and engage with chess.

You can also use the link as the source of truth for your request. In this example, you ask how Gemma models can help with chess games:

In [None]:
yt_link = "https://www.youtube.com/watch?v=XV1kOFo1C8M"

response = client.models.generate_content(
    model=MODEL_ID,
    contents=types.Content(
        parts=[
            types.Part(text="In 2 paragraph, how Gemma models can help on chess games?"),
            types.Part(
                file_data=types.FileData(file_uri=yt_link)
            )
        ]
    )
)

Markdown(response.text)

Gemma models, as large language models (LLMs), can significantly enhance the chess experience by bridging the gap between raw computational power and human understanding. Unlike traditional chess engines that excel at brute-force calculation and generating optimal moves (often in cryptic notation or complex numerical evaluations), Gemma's strength lies in processing and generating human-like text. This allows it to translate intricate chess engine outputs into intuitive, prose-based explanations, elucidating the strategic and tactical rationale behind moves, clarifying complex game concepts like openings and endgames, and providing accessible insights for players of all skill levels, significantly enhancing understanding beyond mere data.

Furthermore, Gemma can serve as an invaluable tool for personalized chess learning and engagement. It can act as a dynamic, interactive coach, offering tailored explanations of specific positions, identifying weaknesses in a player's understanding, or even detailing the historical and psychological context of famous matches. By summarizing complex game analyses, highlighting pivotal moments, and even crafting narrative descriptions of entire games, Gemma can make chess more approachable, immersive, and educational, transforming how players learn, analyze, and appreciate the strategic depth of the game.

The answer is now more focused on your topic, because it draws on knowledge shared in the video that is not necessarily part of the model's own knowledge.
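
Both requests in this section pair a text part with a `file_data` part. Here is a minimal sketch of a reusable builder, assuming the SDK accepts a plain dictionary mirroring `types.Content` for `contents`, in the same way the dict-style `config` is used elsewhere in this notebook. The helper name `youtube_contents` and the host allow-list are hypothetical and for illustration only.

```python
from urllib.parse import urlparse

# Illustrative allow-list; extend it if you use other YouTube hostnames.
YOUTUBE_HOSTS = {"www.youtube.com", "youtube.com", "youtu.be"}


def youtube_contents(prompt: str, video_url: str) -> dict:
    """Build a dict-form request body pairing a text prompt with a YouTube URL."""
    host = urlparse(video_url).netloc.lower()
    if host not in YOUTUBE_HOSTS:
        raise ValueError(f"Not a YouTube URL: {video_url}")
    return {
        "parts": [
            {"text": prompt},
            {"file_data": {"file_uri": video_url}},
        ]
    }
```

Under that assumption, `contents=youtube_contents("Summarize this video.", yt_link)` would stand in for the nested `types.Content(...)` construction above.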

## Grounding information using URL context

<a name="url_context"></a>

The URL Context tool lets Gemini models directly access and process content from specific web page URLs that you provide in your API requests. This is useful because your applications can interact with live web information without you having to manually pre-process that content and feed it to the model.

URL Context is effective because it lets the model base its responses and analysis directly on the content of the designated web pages. Instead of relying solely on its general training data or broad web searches (which are also valuable grounding tools), URL Context anchors the model's understanding to the specific information present at those URLs.

### Process website URLs

If you want Gemini to ground its answers in the content of a specific website, add the URLs to your prompt and enable the tool by adding it to your config:
```
config = {
  "tools": [
    {
      "url_context": {}
    }
  ],
}
```

You can add up to 20 links in your prompt.
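
To guard against exceeding that limit before sending a request, you could count the links in a prompt first. The following is a minimal sketch: the regular expression is a simple approximation of URL matching, the helper names are hypothetical, and it assumes that a URL repeated in the prompt counts only once.

```python
import re

MAX_URLS = 20  # limit stated above
URL_PATTERN = re.compile(r"https?://\S+")


def extract_urls(prompt: str) -> list[str]:
    """Return the distinct URLs found in a prompt, in order of appearance."""
    urls = []
    for match in URL_PATTERN.findall(prompt):
        # Drop punctuation that commonly trails a URL inside a sentence.
        url = match.rstrip(".,;:!?)\"'")
        if url and url not in urls:
            urls.append(url)
    return urls


def check_url_budget(prompt: str, limit: int = MAX_URLS) -> list[str]:
    """Raise ValueError if the prompt holds more distinct URLs than the limit."""
    urls = extract_urls(prompt)
    if len(urls) > limit:
        raise ValueError(f"Prompt contains {len(urls)} URLs; the limit is {limit}.")
    return urls
```

Running `check_url_budget(prompt)` before `generate_content` fails early with a clear message instead of sending a request that exceeds the limit.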

In [None]:
prompt = """
  Based on https://ai.google.dev/gemini-api/docs/models, what are the key
  differences between Gemini 1.5, Gemini 2.0 and Gemini 2.5 models?
  Create a markdown table comparing the differences.
"""

config = {
    "tools": [{"url_context": {}}],
}

response = client.models.generate_content(
    contents=[prompt],
    model=MODEL_ID,
    config=config
)

display(Markdown(response.text))

The provided document details various Gemini model variants, including Gemini 1.5, Gemini 2.0, and Gemini 2.5, each with different "Flash," "Pro," and "Lite" versions optimized for specific use cases.

Here's a comparison of the key differences:

| Feature           | Gemini 1.5 Pro                                  | Gemini 1.5 Flash                                  | Gemini 2.0 Flash                                     | Gemini 2.5 Pro                                                                  | Gemini 2.5 Flash                                                            | Gemini 2.5 Flash-Lite                                                     |
| :---------------- | :---------------------------------------------- | :------------------------------------------------ | :--------------------------------------------------- | :------------------------------------------------------------------------------ | :-------------------------------------------------------------------------- | :------------------------------------------------------------------------ |
| **Description**   | Mid-size multimodal model, optimized for reasoning tasks, can process large amounts of data. | Fast and versatile multimodal model for diverse tasks. | Next-gen features, improved capabilities, superior speed, and native tool use. | Most powerful thinking model, maximum accuracy, state-of-the-art performance. | Best model in terms of price-performance, well-rounded capabilities. | Optimized for cost-efficiency and high throughput. |
| **Input(s)**      | Audio, images, video, text.                 | Audio, images, video, text.                   | Audio, images, video, text.                      | Audio, images, video, text, and PDF.                                        | Audio, images, video, and text.                                       | Text, image, video, audio.                                            |
| **Output(s)**     | Text.                                       | Text.                                         | Text.                                            | Text.                                                                       | Text.                                                                   | Text.                                                                 |
| **Input Token Limit** | 2,097,152.                                  | 1,048,576.                                    | 1,048,576.                                       | 1,048,576.                                                                  | 1,048,576.                                                              | 1,048,576.                                                            |
| **Output Token Limit** | 8,192.                                      | 8,192.                                        | 8,192.                                           | 65,536.                                                                     | 65,536.                                                                 | 65,536.                                                               |
| **Key Use Cases** | Complex reasoning tasks.                    | Scaling across diverse tasks.                 | Next generation features, speed, realtime streaming. | Complex coding, reasoning, multimodal understanding, analyzing large data.  | Low latency, high volume tasks that require thinking.                   | Real time, low latency use cases.                                     |
| **Thinking**      | Not explicitly mentioned as a core capability, but optimized for reasoning tasks. | Not explicitly mentioned.                     | Experimental.                                    | Supported (default on).                                                     | Supported (default on, can configure thinking budget).                  | Supported.                                                            |
| **Live API**      | Not supported.                              | Not supported.                                | Supported.                                       | Not supported.                                                              | Not explicitly mentioned for the base Flash model, but Live variants exist. | Not supported.                                                        |
| **Knowledge Cutoff** | September 2024.                             | September 2024.                               | August 2024.                                     | January 2025.                                                               | January 2025.                                                           | January 2025.                                                         |
| **Deprecation** | September 2025.                             | September 2025.                               | Not deprecated.                                  | Not deprecated.                                                             | Not deprecated.                                                         | Not deprecated.                                                       |


You can see the status of the retrieval using `url_context_metadata`:

In [None]:
# get URLs retrieved for context
print(response.candidates[0].url_context_metadata)


url_metadata=[UrlMetadata(
  retrieved_url='https://ai.google.dev/gemini-api/docs/models',
  url_retrieval_status=<UrlRetrievalStatus.URL_RETRIEVAL_STATUS_SUCCESS: 'URL_RETRIEVAL_STATUS_SUCCESS'>
)]
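
Rather than reading the printed metadata by eye, you could check the statuses programmatically. This is a minimal sketch, assuming the field names shown in the output above (`url_metadata`, `retrieved_url`, `url_retrieval_status`) and the success value `URL_RETRIEVAL_STATUS_SUCCESS`. The helper name `failed_url_retrievals` is hypothetical.

```python
SUCCESS_STATUS = "URL_RETRIEVAL_STATUS_SUCCESS"


def failed_url_retrievals(response) -> list[str]:
    """Return the URLs whose retrieval status is anything other than success."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return []

    metadata = getattr(candidates[0], "url_context_metadata", None)
    entries = getattr(metadata, "url_metadata", None) or []

    failed = []
    for entry in entries:
        status = entry.url_retrieval_status
        # Enum members expose .name; plain strings are compared as-is.
        status_name = getattr(status, "name", str(status))
        if status_name != SUCCESS_STATUS:
            failed.append(entry.retrieved_url)
    return failed
```

An empty list from `failed_url_retrievals(response)` indicates that every URL listed in the metadata was retrieved.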


### Add PDFs by URL

Gemini can also process PDFs from a URL. Here's an example:

In [None]:
prompt = """
  Can you give me an overview of the content of this pdf?
  https://abc.xyz/assets/cc/27/3ada14014efbadd7a58472f1f3f4/2025q2-alphabet-earnings-release.pdf

"""

config = {
    "tools": [{"url_context": {}}],
}

response = client.models.generate_content(
    contents=[prompt],
    model=MODEL_ID,
    config=config
)

# escape dollar signs so the Markdown renderer does not treat them as math delimiters
display(Markdown(response.text.replace('$', '\\$')))



The PDF is Alphabet Inc.'s Second Quarter 2025 Earnings Release. It details the company's financial performance for the quarter ended June 30, 2025.

Key highlights include:
*   **Total Revenues:** Consolidated Alphabet revenues increased 14% year-over-year to \$96.4 billion.
*   **Google Services:** Revenues grew 12% to \$82.5 billion, driven by strong performance in Google Search & other, Google subscriptions, platforms, devices, and YouTube ads.
*   **Google Cloud:** Revenues increased 32% to \$13.6 billion, with growth in Google Cloud Platform (GCP) across core GCP products, AI Infrastructure, and Generative AI Solutions. Google Cloud's annual revenue run-rate is now over \$50 billion.
*   **Operating Income:** Total operating income rose 14%, and the operating margin was 32.4%.
*   **Net Income and EPS:** Net income increased 19%, and diluted EPS grew 22% to \$2.31.
*   **AI Impact:** CEO Sundar Pichai highlighted that AI is positively impacting every part of the business, driving strong momentum, with new features like AI Overviews and AI Mode performing well in Search.
*   **Capital Expenditures:** Alphabet plans to increase capital expenditures to approximately \$85 billion in 2025 due to strong demand for Cloud products and services.
*   **Issuance of Senior Unsecured Notes:** In May 2025, Alphabet issued \$12.5 billion in fixed-rate senior unsecured notes.

The document also provides detailed financial tables, including consolidated balance sheets, statements of income, and statements of cash flows, as well as segment results for Google Services, Google Cloud, and Other Bets. It also includes reconciliations of GAAP to non-GAAP financial measures.
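
The cell above escapes dollar signs before rendering, because notebook Markdown renderers treat text between `$` characters as LaTeX math. Here is a minimal sketch of that step as a reusable helper. The name `escape_dollars` is hypothetical, and the `None` guard is an assumption for responses that carry no text.

```python
def escape_dollars(text) -> str:
    """Escape dollar signs so Markdown renderers with math support show them literally."""
    # Treat a missing response text as an empty string.
    return (text or "").replace("$", "\\$")
```

With it, the display call becomes `display(Markdown(escape_dollars(response.text)))`.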

### Add images by URL

Gemini can also process images from a URL. Here's an example:

In [None]:
prompt = """
  Can you help me name the numbered parts of that instrument, in French?
  https://upload.wikimedia.org/wikipedia/commons/thumb/4/40/Trombone.svg/960px-Trombone.svg.png

"""

config = {
    "tools": [{"url_context": {}}],
}

response = client.models.generate_content(
    contents=[prompt],
    model=MODEL_ID,
    config=config
)

display(Markdown(response.text))

I cannot directly interpret the numbered parts within the image you provided. However, I can give you the common names of trombone parts in French, which you can then match to the numbers on your image:

Here are some common parts of a trombone in French:
*   **Embouchure** (Mouthpiece)
*   **Pavillon** (Bell)
*   **Coulisse** (Slide)
*   **Coulisse d'accord** or **Pompe d'accord** (Tuning slide)
*   **Clé d'eau** or **Barillet** (Water key or spit valve)
*   **Entretoise** (Brace/Cross-stay, often used for various connecting rods)
*   **Manchon** (Ferrule/Sleeve, connecting parts)

Please match these names to the numbered parts in your image.

## Mix Search grounding and URL context

The different tools can also be used together by adding them both to the config. It's a good way to steer Gemini in the right direction with specific URLs and then let it broaden its answer using search grounding.

In [None]:
prompt = """
  Can you give me an overview of the content of this pdf?
  https://abc.xyz/assets/cc/27/3ada14014efbadd7a58472f1f3f4/2025q2-alphabet-earnings-release.pdf
  Search on the web for the reaction of the main financial analysts, what's the trend?
"""

config = {
  "tools": [
      {"url_context": {}},
      {"google_search": {}}
  ],
}

response = client.models.generate_content(
  contents=[prompt],
  model=MODEL_ID,
  config=config
)

# escape dollar signs so the Markdown renderer does not treat them as math delimiters
display(Markdown(response.text.replace('$', '\\$')))
display(HTML(response.candidates[0].grounding_metadata.search_entry_point.rendered_content))

Alphabet Inc. announced strong financial results for the second quarter of 2025, ending June 30, 2025. Consolidated revenues increased by 14% year-over-year to \$96.4 billion, or 13% in constant currency, with double-digit growth seen across Google Search & other, YouTube ads, Google subscriptions, platforms, and devices, and Google Cloud.

Key financial highlights include:
*   **Total revenues** of \$96.428 billion, up from \$84.742 billion in Q2 2024.
*   **Net income** increased by 19% to \$28.196 billion.
*   **Diluted EPS** rose by 22% to \$2.31.
*   **Operating income** increased by 14% to \$31.271 billion, with an **operating margin** of 32.4%.
*   **Google Services revenues** increased by 12% to \$82.5 billion.
*   **Google Cloud revenues** significantly increased by 32% to \$13.6 billion, driven by growth in Google Cloud Platform (GCP), AI Infrastructure, and Generative AI Solutions. Its annual revenue run-rate now exceeds \$50 billion.
*   The company announced an increase in **capital expenditures** to approximately \$85 billion for 2025 due to strong demand for Cloud products and services.

Sundar Pichai, CEO of Alphabet, highlighted the company's "standout quarter" with robust growth, attributing success to leadership in AI and rapid shipping. He noted the positive impact of AI across the business, strong momentum in Search (including AI Overviews and AI Mode), and continued strong performance in YouTube and subscriptions.

**Financial Analyst Reactions and Trends:**

Financial analysts generally maintain a positive outlook on Alphabet, with a majority (43 out of 55) recommending "buy" or "strong buy" ratings. However, the average target price has seen a slight decline from approximately \$215 in March to \$202.05, indicating increased uncertainty. Despite this, the current consensus suggests a potential upside of 11% from recent trading levels as of mid-July 2025.

Prior to the earnings release, analysts expected a moderation in growth for Q2 2025, with projected revenue of \$93.8 billion (+10.7% YoY) and net income of \$26.5 billion (+12.2% YoY). Alphabet's actual Q2 2025 results surpassed these expectations, with EPS of \$2.31 beating the forecasted \$2.17 and revenue of \$96.43 billion exceeding projections.

Despite the positive earnings beat, Alphabet's stock experienced a modest increase of 0.57% in after-hours trading, and in some instances, a slight decline (around 1.5%) post-announcement. This dip was primarily attributed to the company's decision to raise its 2025 capital expenditures guidance by \$10 billion to \$85 billion, reflecting increased investments in AI and technology infrastructure, which raised some investor concerns about higher costs.

The key areas of focus for analysts include Alphabet's ability to expand its GenAI model's user base without impacting traditional Search revenue, and the continued growth of Google Cloud, which is seen as the company's largest growth opportunity. Analysts are closely monitoring AI developments, cloud growth, and regulatory challenges. The overall trend appears to be one of cautious optimism, with strong underlying business performance balanced by increased investment needs for future growth in AI and cloud services.
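
The config dictionaries used throughout this notebook differ only in which tools they list. Here is a minimal sketch of a small builder for them, using only the `google_search` and `url_context` keys shown above. The helper name `build_tools_config` and its keyword arguments are hypothetical.

```python
def build_tools_config(*, google_search: bool = False, url_context: bool = False) -> dict:
    """Return a dict-style config enabling the requested grounding tools."""
    tools = []
    if url_context:
        tools.append({"url_context": {}})
    if google_search:
        tools.append({"google_search": {}})
    # With no tool requested, return an empty config rather than an empty tool list.
    return {"tools": tools} if tools else {}
```

For example, `build_tools_config(google_search=True, url_context=True)` produces the same dictionary as the combined config in this section.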

## Next steps

<a name="next_steps"></a>

* For more details about using Google Search grounding, check out the [Search Grounding cookbook](./Search_Grounding.ipynb).
* If you are looking for other scenarios using videos, take a look at the [Video understanding cookbook](./Video_understanding.ipynb).

Also check the other Gemini capabilities that you can find in the [Gemini quickstarts](https://github.com/google-gemini/cookbook/tree/main/quickstarts/).