##### Copyright 2025 Google LLC.

In [26]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

## Gemini API: Getting started with information grounding for Gemini models

<a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Grounding.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=30/></a>

In this notebook you will learn how to use information grounding with [Gemini models](https://ai.google.dev/gemini-api/docs/models/).

Information grounding is the process of connecting models like Gemini to specific, verifiable information sources to enhance the accuracy, relevance, and factual correctness of their responses. While LLMs are trained on vast amounts of data, this knowledge can be general, outdated, or lack specific context for particular tasks or domains. Grounding helps to bridge this gap by providing the LLM with access to curated, up-to-date information.

Here you will experiment with:
- Grounding information using <a href="#search_grounding">Google Search grounding</a>
- Adding <a href="#yt_links">YouTube links</a> to your prompt to provide video context
- Using <a href="#url_context">URL context</a> to include website, PDF, or image URLs as context for your prompt

## Set up the SDK and the client

### Install SDK

This guide uses the [`google-genai`](https://pypi.org/project/google-genai) Python SDK to connect to the Gemini models.

In [27]:
%pip install -q -U "google-genai>=1.16.0"

### Set up your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see the [Authentication](https://github.com/google-gemini/gemini-api-cookbook/blob/main/quickstarts/Authentication.ipynb) quickstart for an example.

In [28]:
from google.colab import userdata

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')

### Select model and initialize SDK client

Select the model you want to use in this guide, either by picking one from the list or by typing its name. Keep in mind that some models, like the 2.5 ones, are thinking models and thus take slightly more time to respond (see the [thinking notebook](./Get_started_thinking.ipynb) for more details, in particular to learn how to switch thinking off).

In [29]:
from google import genai
from google.genai import types

client = genai.Client(api_key=GOOGLE_API_KEY)

MODEL_ID = "gemini-2.0-flash" # @param ["gemini-2.5-flash-lite","gemini-2.0-flash","gemini-2.5-pro"] {"allow-input":true}
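If you pick a 2.5 Flash model and want faster answers, you can switch thinking off through the request config. The helper below is a minimal sketch that builds the same kind of dict config used throughout this notebook. `build_config` is a hypothetical name, and the model-name check encodes an assumption to verify against the current thinking documentation: a `thinking_budget` of 0 disables thinking on 2.5 Flash models, while 2.5 Pro cannot turn thinking off and 2.0 Flash is not a thinking model.

```python
def build_config(model_id: str, tools=None, disable_thinking: bool = False) -> dict:
    """Build a dict config in the same style as the ones used in this notebook."""
    config = {}
    if tools:
        # e.g. [{"google_search": {}}] or [{"url_context": {}}]
        config["tools"] = list(tools)
    if disable_thinking and "2.5-flash" in model_id:
        # Assumption: 2.5 Flash models accept a thinking budget of 0 (thinking off);
        # 2.5 Pro cannot turn thinking off and 2.0 Flash is not a thinking model,
        # so no thinking_config is added for them.
        config["thinking_config"] = {"thinking_budget": 0}
    return config
```

For example, pass `config=build_config(MODEL_ID, tools=[{"google_search": {}}], disable_thinking=True)` to `client.models.generate_content`.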

## Use Google Search grounding

<a name="search_grounding"></a>

Google Search grounding is particularly useful for queries that require current information or external knowledge. With Google Search, Gemini can access nearly real-time information and give better-grounded responses.

To enable Google Search, add the `google_search` tool to the `config` of `generate_content`, like this:
```
    config={
      "tools": [
        {
          "google_search": {}
        }
      ]
    },
```

In [30]:
from IPython.display import HTML, Markdown

response = client.models.generate_content(
    model=MODEL_ID,
    contents='What was the latest Indian Premier League match and who won?',
    config={"tools": [{"google_search": {}}]},
)

# print the response
display(Markdown(f"**Response**:\n {response.text}"))
# print the search details
print(f"Search Query: {response.candidates[0].grounding_metadata.web_search_queries}")
# urls used for grounding
print(f"Search Pages: {', '.join([site.web.title for site in response.candidates[0].grounding_metadata.grounding_chunks])}")

display(HTML(response.candidates[0].grounding_metadata.search_entry_point.rendered_content))

**Response**:
 The latest Indian Premier League (IPL) match was the final of the 2025 season, played on June 3, 2025, at the Narendra Modi Stadium in Ahmedabad. The Royal Challengers Bengaluru (RCB) won the match by 6 runs against the Punjab Kings (PBKS). This was the Royal Challengers Bengaluru's first IPL title.


Search Query: ['latest Indian Premier League match who won']
Search Pages: business-standard.com, iplt20.com, business-standard.com, wikipedia.org, thehindu.com
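The cell above assumes the model actually ran a search. When it answers from its own knowledge, `grounding_metadata` (or some of its fields) can be empty, and the attribute chain then raises an error. The sketch below is a defensive variant; `grounding_sources` is a hypothetical helper that relies only on the fields used above plus `web.uri`, the link of each grounding source. It is exercised here on a stand-in object with made-up data, not on a real API response.

```python
from types import SimpleNamespace


def grounding_sources(response):
    """Return (search_queries, [(title, uri), ...]) for the first candidate.

    Returns empty lists when the model answered without searching,
    in which case grounding_metadata (or its fields) may be None.
    """
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return [], []
    metadata = getattr(candidates[0], "grounding_metadata", None)
    if metadata is None:
        return [], []
    queries = list(getattr(metadata, "web_search_queries", None) or [])
    sources = []
    for chunk in getattr(metadata, "grounding_chunks", None) or []:
        web = getattr(chunk, "web", None)
        if web is not None:
            sources.append((web.title, web.uri))
    return queries, sources


# Offline check with a stand-in object shaped like a real response
# (hypothetical data, not an actual API result).
fake = SimpleNamespace(candidates=[SimpleNamespace(grounding_metadata=SimpleNamespace(
    web_search_queries=["example query"],
    grounding_chunks=[SimpleNamespace(web=SimpleNamespace(title="example.com", uri="https://example.com/a"))],
))])
queries, sources = grounding_sources(fake)
```

With a real response, call `queries, sources = grounding_sources(response)` instead of reading the attributes directly.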


You can see that running the same prompt without search grounding does not give you up-to-date information:

In [31]:
from IPython.display import Markdown

response = client.models.generate_content(
    model=MODEL_ID,
    contents='What was the latest Indian Premier League match and who won?',
)

# print the response
display(Markdown(response.text))

As an AI, I am unable to provide you with real-time information, including live IPL match results. The information changes rapidly. 

To find the latest IPL match result, I recommend checking these resources:

*   **Official IPL Website:** The official IPL website is the most reliable source.
*   **Reputable Sports News Websites/Apps:** Sites like ESPNcricinfo, and others will have up-to-the-minute scores and match reports.
*   **Sports Apps:** Many sports apps (e.g., ESPN, etc.) provide live scores and updates.

These sources will give you the accurate result of the most recent IPL match.

For more examples, please refer to the [dedicated notebook](./Search_Grounding.ipynb).

## Grounding with YouTube links

<a name="yt_links"></a>

You can directly include a public YouTube URL in your prompt. The Gemini models will then process the video content to perform tasks like summarization and answering questions about the content.

This capability leverages Gemini's multimodal understanding, allowing it to analyze and interpret video data alongside any text prompts provided.

You do need to explicitly declare the video URL you want the model to process as part of the request's contents, using a `FileData` part. Here is a simple interaction where you ask the model to summarize a YouTube video:

In [32]:
yt_link = "https://www.youtube.com/watch?v=XV1kOFo1C8M"

response = client.models.generate_content(
    model=MODEL_ID,
    contents=types.Content(
        parts=[
            types.Part(text="Summarize this video."),
            types.Part(
                file_data=types.FileData(
                    file_uri=yt_link
                )
            )
        ]
    )
)

Markdown(response.text)

Ju-yeong Ji, from Google Deepmind, discusses in this video some ways Gemma Chess can be applied to the world of Chess to add a new dimension to the game.

He says that Gemma, a language model, can be used to make Chess easier to understand by taking all the technical information and turning it into plain text.
Gemma can also tell stories about Chess games. 
And Gemma can act like a 24/7 Chess study buddy or personal Chess coach and answer questions in your native language.
It is also possible to combine the analytical strengths of Chess AI with Gemma's linguistic skills for a more intuitive approach to learning and analysis.
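The `types.Content` object used above can also be written as plain dictionaries, the same way this notebook passes `config` as a dict. The sketch below assumes the SDK accepts dict-shaped contents and converts them into its `types` models; `youtube_contents` and its host allow-list are hypothetical, client-side checks, not a rule enforced by the API in this form.

```python
from urllib.parse import urlparse

# Hypothetical allow-list of YouTube hosts, checked before sending the request.
YOUTUBE_HOSTS = {"www.youtube.com", "youtube.com", "m.youtube.com", "youtu.be"}


def youtube_contents(question: str, video_url: str) -> dict:
    """Build a dict-shaped Content with a text part and a YouTube file_data part."""
    host = urlparse(video_url).netloc.lower()
    if host not in YOUTUBE_HOSTS:
        raise ValueError(f"Not a YouTube URL: {video_url}")
    return {
        "parts": [
            {"text": question},
            # Mirrors types.Part(file_data=types.FileData(file_uri=...)) above.
            {"file_data": {"file_uri": video_url}},
        ]
    }
```

Usage: `client.models.generate_content(model=MODEL_ID, contents=youtube_contents("Summarize this video.", yt_link))`.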

You can also use the video as the source of truth for your request. In this example, you ask how Gemma models can help with chess games:

In [33]:
yt_link = "https://www.youtube.com/watch?v=XV1kOFo1C8M"

response = client.models.generate_content(
    model=MODEL_ID,
    contents=types.Content(
        parts=[
            types.Part(text="In 2 paragraph, how Gemma models can help on chess games?"),
            types.Part(
                file_data=types.FileData(file_uri=yt_link)
            )
        ]
    )
)

Markdown(response.text)

Okay, here's a summary of how the Gemma model can help with Chess games in 2 paragraphs:

Firstly, Gemma can be used to make Chess analysis easier to understand. When traditional engines display results in numeric and abstract move sequences, Gemma can convert these results into a plain text explanation that describes strategic and tactical advantages. This could include why a move is good, dangers, or summaries of difficult portions of the game. 

Secondly, Gemma can help tell stories about Chess games. Gemma can analyze a game's moves, players, tournament and then describe the game in words that bring it to life. If you're trying to get better at Chess, Gemma could help you study your strategy, understand your ideas in real time or learn about chess defenses in different languages.

Now your answer is more insightful on the topic you asked about, because it draws on the knowledge shared in the video, which is not necessarily part of the model's own knowledge.

## Grounding information using URL context

<a name="url_context"></a>

The URL Context tool lets Gemini models directly access and process content from specific web page URLs you provide in your API requests. This is useful because your applications can work with live web content without you having to manually fetch, pre-process, and feed that content to the model.

URL Context is effective because it lets the model base its responses and analysis directly on the content of the designated web pages. Instead of relying solely on its general training data or on broad web searches (which are also valuable grounding tools), URL Context anchors the model's understanding to the specific information present at those URLs.

### Process website URLs

If you want Gemini to ground its answers in the content of specific websites, add the URLs to your prompt and enable the tool in your config:
```
config = {
  "tools": [
    {
      "url_context": {}
    }
  ],
}
```

You can add up to 20 links in your prompt.
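To stay within that limit when prompts are assembled programmatically, you could check the URLs before sending the request. This is a minimal sketch: `check_prompt_urls` is a hypothetical helper, its regular expression is a simple approximation that will not catch every URL form, and the limit of 20 is taken from the sentence above.

```python
import re

MAX_URLS = 20  # limit stated above for the URL context tool

# Rough URL matcher: stops at whitespace, quotes, brackets and angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def check_prompt_urls(prompt: str, max_urls: int = MAX_URLS) -> list:
    """Return the distinct URLs found in a prompt, raising if there are too many."""
    # Drop trailing sentence punctuation such as the comma in "...docs/models, what".
    urls = [u.rstrip(".,;:!?") for u in URL_PATTERN.findall(prompt)]
    urls = list(dict.fromkeys(urls))  # deduplicate while keeping order
    if len(urls) > max_urls:
        raise ValueError(f"{len(urls)} URLs found; the URL context tool accepts at most {max_urls}.")
    return urls
```

For instance, `check_prompt_urls(prompt)` on the prompt in the next cell should return the single models-page URL.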

In [45]:
prompt = """
  Based on https://ai.google.dev/gemini-api/docs/models, what are the key
  differences between Gemini 1.5, Gemini 2.0 and Gemini 2.5 models?
  Create a markdown table comparing the differences.
"""

config = {
    "tools": [{"url_context": {}}],
}

response = client.models.generate_content(
    contents=[prompt],
    model=MODEL_ID,
    config=config
)

display(Markdown(response.text))

The Gemini API offers several model variants, each optimized for specific use cases and capabilities. Here's a comparison of key differences between Gemini 1.5, Gemini 2.0, and Gemini 2.5 models:

| Feature           | Gemini 1.5 Pro                                                                                                                                                                                                            | Gemini 1.5 Flash                                                                                              | Gemini 2.0 Flash                                                                                                                        | Gemini 2.5 Pro                                                                                                                           | Gemini 2.5 Flash                                                                                                          |
| :---------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | :------------------------------------------------------------------------------------------------------------ | :-------------------------------------------------------------------------------------------------------------------------------------- | :--------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------ |
| **Optimization**  | Mid-size multimodal model optimized for a wide range of reasoning tasks; processes large amounts of data (2 hours video, 19 hours audio, 60,000 lines code, 2,000 pages text).                                          | Fast and versatile multimodal model for scaling across diverse tasks.                                     | Delivers next-gen features, superior speed, native tool use, and a 1M token context window.                                       | Our most powerful thinking model with maximum response accuracy and state-of-the-art performance.                                | Best in terms of price-performance, offering well-rounded capabilities. Optimized for low latency, high volume tasks that require thinking. |
| **Input(s)**      | Audio, images, video, and text.                                                                                                                                                                                         | Audio, images, video, and text.                                                                           | Audio, images, video, and text.                                                                                                     | Audio, images, video, text, and PDF.                                                                                               | Audio, images, video, and text.                                                                                       |
| **Output(s)**     | Text.                                                                                                                                                                                                                 | Text.                                                                                                     | Text.                                                                                                                               | Text.                                                                                                                                | Text.                                                                                                                 |
| **Token Limits**  | Input: 2,097,152 tokens. Output: 8,192 tokens.                                                                                                                                                                          | Input: 1,048,576 tokens. Output: 8,192 tokens.                                                              | Input: 1,048,576 tokens. Output: 8,192 tokens.                                                                                      | Input: 1,048,576 tokens. Output: 65,536 tokens.                                                                                      | Input: 1,048,576 tokens. Output: 65,536 tokens.                                                                       |
| **Capabilities**  | System instructions, JSON mode, JSON schema, adjustable safety settings, caching, tuning, function calling, code execution.                                                                                           | System instructions, JSON mode, JSON schema, adjustable safety settings, caching, tuning, function calling, code execution. | Structured outputs, caching, function calling, code execution, search. Thinking is experimental. Live API supported.              | Structured outputs, caching, function calling, code execution, search grounding, thinking. Batch Mode supported.                    | Structured outputs, caching, code execution, function calling, search grounding, thinking. Batch Mode supported.      |
| **Primary Use Case** | Complex reasoning tasks, analyzing large datasets, codebases, and documents. | Scaling across diverse tasks. | Next-gen features, speed, and real-time streaming. | Complex coding, reasoning, and multimodal understanding. | Large-scale processing, low-latency, high-volume tasks that require thinking, agentic use cases. |


### Add PDFs by URL

Gemini can also process PDFs from a URL. Here's an example:

In [35]:
prompt = """
  Can you give me an overview of the content of this pdf?
  https://abc.xyz/assets/cc/27/3ada14014efbadd7a58472f1f3f4/2025q2-alphabet-earnings-release.pdf

"""

config = {
    "tools": [{"url_context": {}}],
}

response = client.models.generate_content(
    contents=[prompt],
    model=MODEL_ID,
    config=config
)

# Escape $ so the Markdown renderer does not treat it as a LaTeX math delimiter
display(Markdown(response.text.replace('$', r'\$')))

Alphabet Inc. announced its Q2 2025 financial results, reporting a 14% increase in consolidated revenues to \$96.4 billion. Google Services revenues grew by 12% to \$82.5 billion, driven by Google Search & other, YouTube ads, and Google subscriptions. Google Cloud revenues increased by 32% to \$13.6 billion, fueled by growth in Google Cloud Platform. The total operating income increased by 14%, with an operating margin of 32.4%. Net income rose by 19%, and EPS increased by 22% to \$2.31.
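The `replace` call in the previous cell is there because Jupyter and Colab render Markdown with LaTeX math support, so text between two dollar signs can be typeset as math. A slightly more careful sketch, using a hypothetical `escape_dollars` helper, leaves dollar signs that are already escaped untouched:

```python
import re


def escape_dollars(text: str) -> str:
    """Escape $ signs so Markdown rendering does not treat them as math delimiters.

    A dollar sign already preceded by a backslash is left unchanged.
    """
    # (?<!\\) is a negative lookbehind: only match $ not preceded by a backslash.
    return re.sub(r"(?<!\\)\$", r"\\$", text)
```

Usage: `display(Markdown(escape_dollars(response.text)))`.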


### Add images by URL

Gemini can also process images from a URL. Here's an example (depending on the model you select, the image may not be retrieved, as in the sample output below):

In [47]:
prompt = """
  Can you help me name the numbered parts of that instrument, in French?
  https://upload.wikimedia.org/wikipedia/commons/thumb/4/40/Trombone.svg/960px-Trombone.svg.png

"""

config = {
    "tools": [{"url_context": {}}],
}

response = client.models.generate_content(
    contents=[prompt],
    model=MODEL_ID,
    config=config
)

display(Markdown(response.text))

I am unable to directly access the content of the image URL provided. Therefore, I cannot identify the numbered parts of the trombone and translate them into French. To assist you, I need a resource that provides the names of the trombone parts in either English or French. If you can provide a link to a webpage with this information, I can help you translate the English terms to French or identify the French terms directly.
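When a URL cannot be fetched, as with the image in this run, the model may say so in its text, but it is easier to check programmatically. The sketch below assumes that, in recent `google-genai` versions, each candidate carries a `url_context_metadata` field whose `url_metadata` entries expose `retrieved_url` and `url_retrieval_status`; verify these names against your installed SDK. `url_retrieval_report` and `failed_urls` are hypothetical helpers, tested here on a stand-in object with made-up data.

```python
from types import SimpleNamespace


def url_retrieval_report(response) -> dict:
    """Map each URL the URL context tool tried to fetch to its retrieval status."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return {}
    # Assumed field name; see the note above.
    metadata = getattr(candidates[0], "url_context_metadata", None)
    if metadata is None:
        return {}
    report = {}
    for entry in getattr(metadata, "url_metadata", None) or []:
        status = entry.url_retrieval_status
        # SDK enums carry the raw name in .value; plain strings are used as-is.
        report[entry.retrieved_url] = getattr(status, "value", status)
    return report


def failed_urls(report: dict) -> list:
    """Return the URLs whose status does not end with SUCCESS."""
    return [url for url, status in report.items() if not str(status).endswith("SUCCESS")]


# Offline check with a stand-in response (made-up data, not an API result).
fake = SimpleNamespace(candidates=[SimpleNamespace(url_context_metadata=SimpleNamespace(
    url_metadata=[
        SimpleNamespace(retrieved_url="https://example.com/page",
                        url_retrieval_status="URL_RETRIEVAL_STATUS_SUCCESS"),
        SimpleNamespace(retrieved_url="https://example.com/image.png",
                        url_retrieval_status="URL_RETRIEVAL_STATUS_ERROR"),
    ]))])
report = url_retrieval_report(fake)
```

On a real call, `failed_urls(url_retrieval_report(response))` lists the URLs the tool could not use.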


## Mix Search grounding and URL context

The different tools can also be used together by adding both of them to the config:

In [42]:
prompt = """
  Can you give me an overview of the content of this pdf?
  https://abc.xyz/assets/cc/27/3ada14014efbadd7a58472f1f3f4/2025q2-alphabet-earnings-release.pdf
  Search on the web for the reaction of the main financial analysts, what's the trend?
"""

config = {
  "tools": [
      {"url_context": {}},
      {"google_search": {}}
  ],
}

response = client.models.generate_content(
  contents=[prompt],
  model=MODEL_ID,
  config=config
)

display(Markdown(response.text.replace('$', r'\$')))
display(HTML(response.candidates[0].grounding_metadata.search_entry_point.rendered_content))

Okay, I have browsed the Alphabet 2025 Q2 earnings release. Here's an overview of the content:

**Key Highlights from the Earnings Release:**

*   **Revenue Growth:** Consolidated Alphabet revenues in Q2 2025 increased 14% year-over-year to \$96.4 billion. Google Services revenues increased 12% to \$82.5 billion, and Google Cloud revenues increased 32% to \$13.6 billion.
*   **Operating Income:** Total operating income increased 14%, with an operating margin of 32.4%.
*   **Net Income and EPS:** Net income increased 19%, and EPS increased 22% to \$2.31.
*   **CEO Statement:** Sundar Pichai highlighted the company's strong performance, particularly in AI, Search, YouTube, and Cloud. He also mentioned increased capital expenditures for 2025, expected to be approximately \$85 billion.
*   **Segment Results:** The release provides a breakdown of revenues and operating income for Google Services, Google Cloud, and Other Bets.
*   **Financial Tables:** The document includes consolidated balance sheets, statements of income, and statements of cash flows.
*   **Non-GAAP Measures:** The release discusses and reconciles certain non-GAAP financial measures, such as free cash flow and constant currency revenues.
*   **Forward-Looking Statements:** Standard cautionary language regarding forward-looking statements and associated risks and uncertainties is included.

To provide information on financial analysts' reactions and trends, I will conduct a web search.
Based on the search results, here's an overview of analyst reactions to Alphabet's Q2 2025 earnings:

*   **Positive Outlook:** Several analysts expected Alphabet to deliver strong Q2 2025 results, with double-digit growth in both EPS and revenue.
*   **Key Growth Drivers:** The integration of Generative AI (Gen AI) into the search engine and the robust growth of Google Cloud were expected to be key drivers of growth.
*   **Bullish Sentiment:** Some analysts remain bullish on Alphabet due to its diversified business, including Google Cloud Platform and Waymo self-driving cars. KeyBanc analyst Justin Patterson raised the price target on Alphabet's stock to \$215.00, citing strong performance in Search, YouTube, and Cloud segments, as well as positive commentary on AI Mode, Waymo, and expense efficiencies. Tickeron's AI Trading Bots also reinforce strong bullish potential for GOOG.
*   **Increased Capex:** Alphabet is increasing its 2025 capital expenditure forecast to \$85 billion to build out its AI infrastructure, which could concern some investors, despite the positive impact on results.
*   **Initial Market Reaction:** Despite the strong business performance, the initial market reaction to Alphabet's earnings report was slightly negative, with shares trading about 1% lower.
*   **Potential Risks:** Investors should be cautious about potential risks, such as Alphabet's shares underperforming the Zacks Internet Services industry and the Zacks Computer & Technology sector year-to-date, and its stretched valuation.

**Overall Trend:**

The overall trend seems to be cautiously optimistic. Analysts generally anticipated strong earnings growth driven by AI and cloud services, but some express concerns regarding increased capital expenditure and valuation. The initial market reaction was slightly negative, possibly due to the capex increase.
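The request above lists both tools by hand. If you switch tools on and off often, a tiny helper keeps the structure consistent. `tools_config` is a hypothetical name, and it only knows the two tools used in this notebook.

```python
def tools_config(*tool_names: str) -> dict:
    """Build a dict config enabling the named tools, in the order given."""
    supported = {"google_search", "url_context"}  # the two tools used in this notebook
    unknown = set(tool_names) - supported
    if unknown:
        raise ValueError(f"Unsupported tool(s): {sorted(unknown)}")
    # Each tool is its own entry in the "tools" list, with an empty settings dict;
    # dict.fromkeys drops duplicates while keeping order.
    return {"tools": [{name: {}} for name in dict.fromkeys(tool_names)]}
```

For example, `tools_config("url_context", "google_search")` produces the same config as the previous cell.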


## Next steps

<a name="next_steps"></a>

* For more details about using Google Search grounding, check out the [Search Grounding cookbook](./Search_Grounding.ipynb).
* If you are looking for other scenarios using videos, take a look at the [Video understanding cookbook](./Video_understanding.ipynb).

Also check the other Gemini capabilities that you can find in the [Gemini quickstarts](https://github.com/google-gemini/cookbook/tree/main/quickstarts/).