##### Copyright 2025 Google LLC.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Video understanding with Gemini

<a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Video_understanding.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=30/></a>

Gemini has been a multimodal model from the beginning, capable of analyzing all sorts of media using its [long context window](https://developers.googleblog.com/en/new-features-for-the-gemini-api-and-google-ai-studio/).

[Gemini models](https://ai.google.dev/gemini-api/docs/models/) bring video analysis to a whole new level as illustrated in [this video](https://www.youtube.com/watch?v=Mot-JEU26GQ):


In [None]:
#@title Building with Gemini 2.0: Video understanding
%%html
<iframe width="560" height="315" src="https://www.youtube.com/embed/Mot-JEU26GQ?si=pcb7-_MZTSi_1Zkw" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

This notebook shows you how to use Gemini to perform the same kinds of video analysis. Each example comes with several prompts you can select from the dropdown; feel free to experiment with your own as well.

You can also check the [live demo](https://aistudio.google.com/starter-apps/video) and try it on your own videos on [AI Studio](https://aistudio.google.com/starter-apps/video).

## Setup

This section installs the SDK, sets it up using your [API key](../quickstarts/Authentication.ipynb), imports the relevant libraries, downloads the sample videos, and uploads them to Gemini.

Expand the section if you are curious, but you can also just run it (it should take a couple of minutes since the files are large) and go straight to the examples.

### Install SDK

The new **[Google Gen AI SDK](https://ai.google.dev/gemini-api/docs/sdks)** provides programmatic access to Gemini 2.0 (and previous models) using both the [Google AI for Developers](https://ai.google.dev/gemini-api/docs) and [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/overview) APIs. With a few exceptions, code that runs on one platform will run on both. This means that you can prototype an application using the Developer API and then migrate the application to Vertex AI without rewriting your code.

You can find more details about this new SDK in the [documentation](https://ai.google.dev/gemini-api/docs/sdks) or in the [Getting started](../quickstarts/Get_started.ipynb) notebook.

In [1]:
%pip install -U -q "google-genai>=1.16.0"


### Setup your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](../quickstarts/Authentication.ipynb) for an example.

In [2]:
from google.colab import userdata

GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')

### Initialize SDK client

With the new SDK, you only need to initialize a client with your API key (or OAuth if using [Vertex AI](https://cloud.google.com/vertex-ai)). The model is now set in each call.

In [3]:
from google import genai
from google.genai import types

client = genai.Client(api_key=GOOGLE_API_KEY)

### Select the Gemini model

Video understanding works best with Gemini 2.5 models. You can also select older models to compare their behavior, but using at least a 2.0 model is recommended.

For more information about all Gemini models, check the [documentation](https://ai.google.dev/gemini-api/docs/models/gemini).


In [4]:
MODEL_ID = "gemini-2.5-flash-preview-05-20" # @param ["gemini-2.5-flash-preview-05-20", "gemini-2.5-pro-preview-06-05","gemini-2.0-flash","gemini-2.0-flash-lite"] {"allow-input":true, isTemplate: true}

### Get sample videos

You will start with uploaded videos, as that is the more common use case, but you will see later that you can also use YouTube videos.

In [5]:
# Download the sample videos
!wget https://storage.googleapis.com/generativeai-downloads/videos/Pottery.mp4 -O Pottery.mp4 -q
!wget https://storage.googleapis.com/generativeai-downloads/videos/Jukin_Trailcam_Videounderstanding.mp4 -O Trailcam.mp4 -q
!wget https://storage.googleapis.com/generativeai-downloads/videos/post_its.mp4 -O Post_its.mp4 -q
!wget https://storage.googleapis.com/generativeai-downloads/videos/user_study.mp4 -O User_study.mp4 -q

### Upload the videos

Upload all the videos using the File API. You can find more details about how to use it in the [Get Started](../quickstarts/Get_started.ipynb#scrollTo=KdUjkIQP-G_i) notebook.

This can take a couple of minutes as the videos will need to be processed and tokenized.

In [6]:
import time

def upload_video(video_file_name):
    """Upload a local video with the File API and wait until it is ready to use."""
    video_file = client.files.upload(file=video_file_name)

    # Videos are processed asynchronously: poll until processing finishes.
    while video_file.state == "PROCESSING":
        print('Waiting for video to be processed.')
        time.sleep(10)
        video_file = client.files.get(name=video_file.name)

    if video_file.state == "FAILED":
        raise ValueError(video_file.state)
    print(f'Video processing complete: {video_file.uri}')

    return video_file

pottery_video = upload_video('Pottery.mp4')
trailcam_video = upload_video('Trailcam.mp4')
post_its_video = upload_video('Post_its.mp4')
user_study_video = upload_video('User_study.mp4')

Waiting for video to be processed.
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/0w87yxp3d257
Waiting for video to be processed.
Waiting for video to be processed.
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/qxfsdb4uy9jv
Waiting for video to be processed.
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/zbay420108wn
Waiting for video to be processed.
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/65xctq6o81eh
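The `upload_video` helper above polls until processing finishes but has no upper bound on how long it waits. Below is a minimal sketch of the same polling loop with a timeout. The function name `wait_for_file` and its parameters are hypothetical, not part of the SDK; the current state is supplied by a callable so the loop can be exercised without network access.

```python
import time
from typing import Callable


def wait_for_file(
    get_state: Callable[[], str],
    poll_interval: float = 10.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it leaves PROCESSING; raise if it takes longer than `timeout` seconds."""
    waited = 0.0
    state = get_state()
    while state == "PROCESSING":
        if waited >= timeout:
            raise TimeoutError(f"File still processing after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval
        state = get_state()
    if state == "FAILED":
        raise ValueError("File processing failed")
    return state  # e.g. "ACTIVE"
```

In this notebook you could call it as `wait_for_file(lambda: client.files.get(name=video_file.name).state.name)`, assuming `state` is the SDK's `FileState` enum, whose `.name` is a string such as `"PROCESSING"`, `"ACTIVE"`, or `"FAILED"`.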


### Imports

In [7]:
import json
from PIL import Image
from IPython.display import display, Markdown, HTML

# Search within videos

First, try using the model to search within your videos and describe all the animal sightings in the trailcam video.

<video controls width="500"><source src="https://storage.googleapis.com/generativeai-downloads/videos/Jukin_Trailcam_Videounderstanding.mp4" type="video/mp4"></video>

In [8]:
prompt = "For each scene in this video, generate captions that describe the scene along with any spoken text placed in quotation marks. Place each caption into an object with the timecode of the caption in the video."  # @param ["For each scene in this video, generate captions that describe the scene along with any spoken text placed in quotation marks. Place each caption into an object with the timecode of the caption in the video.", "Organize all scenes from this video in a table, along with timecode, a short description, a list of objects visible in the scene (with representative emojis) and an estimation of the level of excitement on a scale of 1 to 10"] {"allow-input":true}

video = trailcam_video # @param ["trailcam_video", "pottery_video", "post_its_video", "user_study_video"] {"type":"raw","allow-input":true}

response = client.models.generate_content(
    model=MODEL_ID,
    contents=[
        video,
        prompt,
    ]
)

Markdown(response.text)

```json
[
  {
    "time": "00:00 - 00:17",
    "caption": "Two gray foxes in the wild, foraging. One comes into view from the right, followed by another. They are sniffing the ground, and one climbs onto a rock."
  },
  {
    "time": "00:17 - 00:34",
    "caption": "A mountain lion is seen sniffing the ground in a forest, then briefly looking up and walking off. (Night vision)"
  },
  {
    "time": "00:34 - 00:50",
    "caption": "Two foxes are captured at night. One digs in the ground, and then they engage in a brief, aggressive interaction before running out of frame. (Night vision with IR flash)"
  },
  {
    "time": "00:50 - 01:04",
    "caption": "A bright flash occurs, followed by two foxes in a rocky area at night. They move around, one looks at the camera, and then another bright flash illuminates the scene. (Night vision with IR flash)"
  },
  {
    "time": "01:04 - 01:17",
    "caption": "A mountain lion walks from right to left across the frame in the dark. (Night vision)"
  },
  {
    "time": "01:17 - 01:29",
    "caption": "Two mountain lions are seen at night. The larger one walks past the camera in the foreground, while a smaller one (possibly a cub) walks on top of a rock in the background. (Night vision)"
  },
  {
    "time": "01:29 - 01:51",
    "caption": "A bobcat is seen at night, foraging on the ground, then digging a hole, and looking directly at the camera with glowing eyes. (Night vision)"
  },
  {
    "time": "01:51 - 01:56",
    "caption": "A brown bear walks away from the camera through a sun-dappled forest. (Daylight)"
  },
  {
    "time": "01:56 - 02:04",
    "caption": "A mountain lion walks into the frame from the left, looks at the camera, and then walks out of frame to the right. (Night vision)"
  },
  {
    "time": "02:04 - 02:22",
    "caption": "Two bears, possibly a mother and cub, are walking through the forest. One briefly obstructs the camera's view before they both move off into the distance. (Daylight)"
  },
  {
    "time": "02:22 - 02:34",
    "caption": "A fox is seen at night on a hill overlooking a city with twinkling lights. It sniffs the ground and then sits up to look out over the city. (Night vision)"
  },
  {
    "time": "02:34 - 02:41",
    "caption": "A bear walks past the camera at night, with a city lights landscape visible in the background. (Night vision)"
  },
  {
    "time": "02:41 - 02:51",
    "caption": "A mountain lion walks past the camera at night, with the illuminated city in the distance. (Night vision)"
  },
  {
    "time": "02:51 - 03:04",
    "caption": "A mountain lion walks towards a tree and then sniffs around on the ground. (Night vision)"
  },
  {
    "time": "03:04 - 03:22",
    "caption": "A brown bear stands in the forest, looks around, then directly at the camera, before walking off. (Daylight)"
  },
  {
    "time": "03:22 - 03:40",
    "caption": "Two brown bears are seen foraging on the ground in the forest. One bear briefly obstructs the camera's view as it moves closer. (Daylight)"
  },
  {
    "time": "03:40 - 04:03",
    "caption": "Two brown bears walk away from the camera. One sits down and scratches itself, then they both continue walking into the distance. (Daylight)"
  },
  {
    "time": "04:03 - 04:22",
    "caption": "Two brown bears walk towards the camera. One walks past, while the other remains in view, sniffing the ground. (Daylight)"
  },
  {
    "time": "04:22 - 04:30",
    "caption": "A bobcat with bright, glowing eyes looks at the camera, then walks past and out of frame. (Night vision)"
  },
  {
    "time": "04:30 - 04:49",
    "caption": "A fox appears in the distance with glowing eyes, walks closer to the camera, and then suddenly dashes out of frame. (Night vision)"
  },
  {
    "time": "04:49 - 04:57",
    "caption": "A fox is seen walking away from the camera into the dark forest. (Night vision)"
  },
  {
    "time": "04:57 - 05:10",
    "caption": "A mountain lion walks towards a tree, sniffs the ground, and then walks past the camera. (Night vision)"
  }
]
```

The prompt used here is quite generic, but you can get even better results by customizing it to your needs (for example, by asking specifically about foxes).

The [live demo on AI Studio](https://aistudio.google.com/starter-apps/video) shows how you can postprocess this output to jump directly to a specific part of the video by clicking on the timecodes. If you are interested, you can check the [code of that demo on GitHub](https://github.com/google-gemini/starter-applets/tree/main/video).
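If you want to postprocess the scene list yourself, a minimal sketch follows. It assumes the reply is a JSON array wrapped in a `json` code fence and that each object has `time` (formatted `MM:SS - MM:SS`) and `caption` keys, as in the output above; the model chose those keys, so they are not guaranteed across runs. The helper names are hypothetical.

```python
import json
import re

TICKS = "`" * 3  # a Markdown code fence, built here to avoid a literal fence in this cell
# Matches an opening fence (optionally tagged "json") at the start and a closing fence at the end.
FENCE_RE = re.compile(rf"^{TICKS}(?:json)?\s*|\s*{TICKS}$")


def parse_json_reply(text: str):
    """Strip an optional Markdown code fence and parse the JSON inside it."""
    return json.loads(FENCE_RE.sub("", text.strip()))


def to_seconds(timecode: str) -> int:
    """Convert 'MM:SS' or 'H:MM:SS' into seconds."""
    total = 0
    for part in timecode.strip().split(":"):
        total = total * 60 + int(part)
    return total


def scene_ranges(scenes, keyword=None):
    """Return (start_s, end_s, caption) tuples, optionally keeping only captions that mention `keyword`."""
    results = []
    for scene in scenes:
        caption = scene["caption"]
        if keyword and keyword.lower() not in caption.lower():
            continue
        start, end = scene["time"].split("-", 1)
        results.append((to_seconds(start), to_seconds(end), caption))
    return results
```

Run right after the trailcam cell, `scene_ranges(parse_json_reply(response.text), keyword="fox")` would keep only the fox scenes, with start and end times in seconds that you could use to seek a video player.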

# Extract and organize text

Gemini models can also read text that appears in a video and extract it in an organized way. You can even use Gemini's reasoning capabilities to generate new ideas for you.

<video controls width="400"><source src="https://storage.googleapis.com/generativeai-downloads/videos/post_its.mp4" type="video/mp4"></video>

In [9]:
prompt = "Transcribe the sticky notes, organize them and put it in a table. Can you come up with a few more ideas?" # @param ["Transcribe the sticky notes, organize them and put it in a table. Can you come up with a few more ideas?", "Which of those names would fit an AI product that can resolve complex questions using its thinking abilities?"] {"allow-input":true}

video = post_its_video # @param ["trailcam_video", "pottery_video", "post_its_video", "user_study_video"] {"type":"raw","allow-input":true}

response = client.models.generate_content(
    model=MODEL_ID,
    contents=[
        video,
        prompt,
    ]
)

Markdown(response.text)

Here are the transcribed project names from the sticky notes, organized alphabetically in a table, along with a few more ideas:

## Brainstorm: Project Names

| Project Name         | Project Name         |
| :------------------- | :------------------- |
| Aether               | Leo Minor            |
| Andromeda's Reach    | Lunar Eclipse        |
| Astral Forge         | Lyra                 |
| Athena               | Lynx                 |
| Athena's Eye         | Medusa               |
| Bayes Theorem        | Odin                 |
| Canis Major          | Orion's Belt         |
| Celestial Drift      | Orion's Sword        |
| Centaurus            | Pandora's Box        |
| Cerberus             | Persius Shield       |
| Chaos Field          | Phoenix              |
| Chaos Theory         | Prometheus Rising    |
| Chimera Dream        | Riemann's Hypothesis |
| Comets Tail          | Sagitta              |
| Convergence          | Serpens              |
| Delphinus            | Stellar Nexus        |
| Draco                | Stokes Theorem       |
| Echo                 | Supernova Echo       |
| Equilibrium          | Symmetry             |
| Euler's Path         | Taylor Series        |
| Fractal              | Titan                |
| Galactic Core        | Vector               |
| Golden Ratio         | Zephyr               |
| Hera                 |                      |
| Infinity Loop        |                      |

---

## A Few More Project Name Ideas:

1.  **Pulsar:** (Astronomical, suggests powerful and rhythmic energy)
2.  **Axiom:** (Mathematical/logical, implies a fundamental truth or starting point)
3.  **Artemis:** (Mythological, associated with precision, exploration, and the moon)
4.  **Quantum Leap:** (Scientific, indicates a significant and sudden advancement)
5.  **Vortex:** (Implies a central point of activity, energy, or convergence)

# Structure information

Gemini is not only able to read text but also to reason about real-world objects and structure information about them, as in this video of a ceramics display with handwritten prices and notes.

<video controls width="500"><source src="https://storage.googleapis.com/generativeai-downloads/videos/Pottery.mp4" type="video/mp4"></video>

In [10]:
prompt = "Give me a table of my items and notes" # @param ["Give me a table of my items and notes", "Help me come up with a selling pitch for my potteries"] {"allow-input":true}

video = pottery_video # @param ["trailcam_video", "pottery_video", "post_its_video", "user_study_video"] {"type":"raw","allow-input":true}

response = client.models.generate_content(
    model=MODEL_ID,
    contents=[
        video,
        prompt,
    ],
    config = types.GenerateContentConfig(
        system_instruction="Don't forget to escape the dollar signs",
    )
)

Markdown(response.text)

Here's a table summarizing the items and notes from the image:

| Category          | Item                | Description                                                                                                                                                                                                                             | Dimensions                     | Price   | Additional Notes                  |
| :---------------- | :------------------ | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----------------------------- | :------ | :-------------------------------- |
| Drinkware         | Tumblers            | Stacked and individual tumblers with an earthy brown/beige base and a light blue/white wavy glaze towards the top. Two small, round ceramic samples displaying the base and blue/grey glaze are shown next to them. | 4"h x 3"d (approx.)            | \$20    | \#5 Artichoke double dip          |
| Bowls             | Small Bowls         | Two bowls with a speckled, rustic brown/orange exterior and a darker, possibly iridescent, interior with hints of blue/green.                                                                                                        | 3.5"h x 6.5"d                  | \$35    |                                   |
| Bowls             | Medium Bowls        | Two larger bowls, similar in appearance to the small bowls with a speckled, rustic brown/orange exterior and a darker, iridescent interior with hints of blue/green.                                                               | 4"h x 7"d                      | \$40    |                                   |
| Glaze Sample/Test | Gemini Double Dip   | A rectangular ceramic tile with "6b6" inscribed, displaying a brown/rust speckled glaze on one side and a blue/grey glaze on the other.                                                                                              | N/A (sample tile)              | N/A     | \#6 Gemini double dip, Slow Cool |

As you can see, Gemini is able to work out which item each note corresponds to, including the last one.
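The system instruction in the pottery cell asks the model to escape dollar signs, most likely so that `$` is not rendered as LaTeX math by the Markdown display. If you want to reuse the table programmatically, a minimal sketch like the one below parses the first pipe table in `response.text` and turns escaped prices back into numbers. The helper names are hypothetical, and the parser assumes simple tables with no `|` characters inside cells.

```python
import re


def parse_markdown_table(markdown: str) -> list[dict[str, str]]:
    """Parse the first pipe table in `markdown` into a list of row dicts."""
    lines = []
    for line in markdown.splitlines():
        if line.strip().startswith("|"):
            lines.append(line.strip())
        elif lines:
            break  # the first table has ended
    if len(lines) < 3:
        return []  # need a header row, an alignment row and at least one data row

    def cells(line: str) -> list[str]:
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = cells(lines[0])
    return [dict(zip(header, cells(line))) for line in lines[2:]]  # lines[1] is the :--- row


def parse_price(cell: str):
    """Turn an escaped or plain dollar price into a float; return None when no price is present."""
    match = re.search(r"\\?\$(\d+(?:\.\d+)?)", cell)
    return float(match.group(1)) if match else None
```

The column names (such as `Item` and `Price`) come from whatever headers the model chose, so check them before indexing into the rows.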

# Analyze screen recordings for key moments

You can also use the model to analyze screen recordings. Let's say you're doing user studies on how people use your product, so you end up with lots of screen recordings, like this one, that you have to manually comb through.
With just one prompt, the model can describe all the actions in your video.

<video controls width="400"><source src="https://storage.googleapis.com/generativeai-downloads/videos/user_study.mp4" type="video/mp4"></video>

In [11]:
prompt = "Generate a paragraph that summarizes this video. Keep it to 3 to 5 sentences with corresponding timecodes." # @param ["Generate a paragraph that summarizes this video. Keep it to 3 to 5 sentences with corresponding timecodes.", "Choose 5 key shots from this video and put them in a table with the timecode, text description of 10 words or less, and a list of objects visible in the scene (with representative emojis).", "Generate bullet points for the video. Place each bullet point into an object with the timecode of the bullet point in the video."] {"allow-input":true}

video = user_study_video # @param ["trailcam_video", "pottery_video", "post_its_video", "user_study_video"] {"type":"raw","allow-input":true}

response = client.models.generate_content(
    model=MODEL_ID,
    contents=[
        video,
        prompt,
    ]
)

Markdown(response.text)

The "My Garden App" is presented, showcasing a list of various plants with their prices, alongside "Like" and "Add to Cart" buttons. (0:00-0:09) The user demonstrates interacting with the app by liking several plants, which changes the "Like" button to red, and adding multiple items to the shopping cart, confirmed by an "Added!" message. (0:09-0:25, 0:41-0:45)
After adding items, the user navigates to the "Cart" tab to view the selected plants and their total cost. (0:30-0:33) Finally, the "Profile" tab provides a summary of the user's activity, displaying the number of liked plants and items in the cart. (0:33-0:35)
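The summary above embeds timecodes as free text, for example `(0:09-0:25, 0:41-0:45)`. A minimal sketch for pulling those ranges out as seconds could use a regular expression, assuming the model writes them as `M:SS-M:SS` or `H:MM:SS-H:MM:SS`; other formats would need a different pattern. The function names are hypothetical.

```python
import re

# Two timecodes separated by a hyphen, e.g. "0:09-0:25" or "1:02:03 - 1:02:10".
RANGE_RE = re.compile(r"(\d{1,2}(?::\d{2}){1,2})\s*-\s*(\d{1,2}(?::\d{2}){1,2})")


def to_seconds(timecode: str) -> int:
    """Convert 'M:SS' or 'H:MM:SS' into seconds."""
    total = 0
    for part in timecode.split(":"):
        total = total * 60 + int(part)
    return total


def find_ranges(text: str) -> list[tuple[int, int]]:
    """Return every (start, end) range found in `text`, in order of appearance, as seconds."""
    return [(to_seconds(a), to_seconds(b)) for a, b in RANGE_RE.findall(text)]


summary = "Liking plants (0:09-0:25, 0:41-0:45). Cart tab (0:30-0:33)."
print(find_ranges(summary))
```

Applied to `response.text` from the screen-recording cell, this gives you a list of intervals you could review first instead of watching the whole recording.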

# Analyze YouTube videos

In addition to your own videos, you can also ask Gemini to fetch a video from YouTube and analyze it. Here's an example using the keynote from Google I/O 2023. Can you guess what the main theme was?


In [12]:
response = client.models.generate_content(
    model=MODEL_ID,
    contents=types.Content(
        parts=[
            types.Part(text="Find all the instances where Sundar says \"AI\". Provide timestamps and broader context for each instance."),
            types.Part(
                file_data=types.FileData(file_uri='https://www.youtube.com/watch?v=ixRanV-rdAQ')
            )
        ]
    )
)

Markdown(response.text)

Sundar says "AI" at the following instances:

1.  **0:29 - 0:32**: "As you may have heard, AI is having a very busy year."
    *   **Broader context**: Sundar is beginning his keynote address, noting the significant activity in the field of AI, indicating it will be a major topic of discussion.
2.  **0:39 - 0:40**: "Seven years into our journey as an AI-first company."
    *   **Broader context**: He emphasizes Google's long-standing commitment to AI, framing their current advancements within this established focus.
3.  **0:46 - 0:47**: "We have an opportunity to make AI even more helpful..."
    *   **Broader context**: He highlights Google's vision for AI to be universally beneficial for individuals, businesses, and communities.
4.  **0:54 - 0:56**: "We've been applying AI to make our products radically more helpful for a while."
    *   **Broader context**: He states that AI has already been integrated into Google's products to enhance their utility.
5.  **0:59 - 1:00**: "With generative AI, we are taking the next step."
    *   **Broader context**: He introduces generative AI as the next phase of development for Google's products, starting with Gmail.
6.  **1:16 - 1:19**: "Let me start with few examples of how generative AI is helping to evolve our products, starting with Gmail."
    *   **Broader context**: He transitions to practical examples of how generative AI is being used in Google's core products.
7.  **1:40 - 1:42**: "Smart Compose led to more advanced writing features powered by AI."
    *   **Broader context**: Discussing the evolution of Gmail's smart features, he attributes their advanced capabilities to AI.
8.  **3:03 - 3:05**: "Since the early days of Street View, AI has stitched together billions of panoramic images..."
    *   **Broader context**: He explains how AI has been instrumental in creating immersive experiences in Google Maps' Street View.
9.  **3:14 - 3:16**: "Immersive View, which uses AI to create a high fidelity representation of a place..."
    *   **Broader context**: He describes Immersive View's technology, crediting AI for generating realistic 3D representations.
10. **5:15 - 5:17**: "It was one of our first AI-native products."
    *   **Broader context**: Referring to Google Photos, he emphasizes its foundational reliance on AI since its inception.
11. **5:30 - 5:32**: "We also want to help you make them better. In fact, every month, 1.7 billion images are edited in Google Photos."
    *   **Broader context**: Sundar explains how Google Photos leverages AI to enable advanced photo editing features like Magic Eraser.
12. **5:41 - 5:43**: "AI advancements give us more powerful ways to do this."
    *   **Broader context**: He highlights that recent AI breakthroughs are enhancing photo editing capabilities within Google Photos.
13. **5:48 - 5:50**: "Magic Eraser, launched first on Pixel, uses AI-powered computational photography to remove unwanted distractions."
    *   **Broader context**: He specifies Magic Eraser's use of AI for computational photography.
14. **7:40 - 7:44**: "These are just a few examples of how AI can help you in moments that matter."
    *   **Broader context**: He summarizes the various product examples (Gmail, Photos, Maps) as demonstrations of AI's helpfulness.
15. **7:47 - 7:49**: "And there is so much more we can do to deliver the full potential of AI..."
    *   **Broader context**: He expresses optimism about the future potential of AI across Google's product ecosystem.
16. **8:24 - 8:26**: "Making AI helpful for everyone is the most profound way we will advance our mission."
    *   **Broader context**: He articulates Google's overarching mission: to make AI accessible and beneficial to all.
17. **8:31 - 8:33**: "And we are doing this in four important ways. First, by improving your knowledge and learning..."
    *   **Broader context**: He begins to outline Google's four key strategies for deploying AI helpfully and responsibly.
18. **8:53 - 8:57**: "And finally, by building and deploying AI responsibly so that everyone can benefit equally."
    *   **Broader context**: He concludes the four strategic pillars with the emphasis on responsible AI development and deployment.
19. **9:03 - 9:08**: "Our ability to make AI helpful for everyone relies on continuously advancing our foundation models."
    *   **Broader context**: He reiterates the goal of making AI helpful for everyone, linking it directly to the development of foundation models.
20. **11:26 - 11:31**: "It uses AI to better detect malicious scripts and can help security experts understand and resolve threats."
    *   **Broader context**: He discusses Sec-PaLM, an AI model fine-tuned for security, demonstrating AI's application in cybersecurity.
21. **13:00 - 13:02**: "These teams have contributed to a significant number of them: AlphaGo, Transformers, word2vec, WaveNet, AlphaFold, Sequence to sequence models, Distillation, Deep reinforcement learning."
    *   **Broader context**: Sundar reviews Google's historical contributions to AI breakthroughs, highlighting the achievements of their Brain and DeepMind teams.
22. **13:10 - 13:13**: "All this helps set the stage for the inflection point we are at today."
    *   **Broader context**: He concludes a segment on Google's AI foundational models and their impact, setting the stage for future developments.
23. **13:25 - 13:28**: "They are focused on building more capable systems safely and responsibly."
    *   **Broader context**: Discussing the unified Google DeepMind team, he emphasizes their commitment to building advanced AI systems with safety and responsibility.
24. **14:07 - 14:11**: "As we invest in more advanced models, we are also deeply investing in AI responsibility."
    *   **Broader context**: He highlights Google's commitment to responsible AI development alongside advancements in model capabilities.
25. **15:10 - 15:12**: "James will talk about our responsible approach to AI later."
    *   **Broader context**: He introduces an upcoming segment dedicated to discussing Google's responsible AI strategies.
26. **15:28 - 15:30**: "That's the opportunity we have with Bard, our experiment for conversational AI."
    *   **Broader context**: He introduces Bard as Google's experimental conversational AI.
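The cells in this notebook pass YouTube links in the `https://www.youtube.com/watch?v=...` form. If your links come in other shapes (short `youtu.be` links, or embed URLs like the one in the iframe at the top of this notebook), a small helper can normalize them first. This is a sketch under the assumption that the watch form is what you want to send; `canonical_youtube_url` is hypothetical, and it deliberately drops extra query parameters such as `t=`, since the next section shows how to clip with `video_metadata` offsets instead.

```python
from urllib.parse import parse_qs, urlparse


def canonical_youtube_url(url: str) -> str:
    """Normalize common YouTube URL shapes to https://www.youtube.com/watch?v=<id>."""
    parsed = urlparse(url)
    host = parsed.netloc.lower().removeprefix("www.").removeprefix("m.")
    if host == "youtu.be":
        video_id = parsed.path.lstrip("/")
    elif host == "youtube.com" and parsed.path == "/watch":
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    elif host == "youtube.com" and parsed.path.startswith(("/embed/", "/shorts/")):
        video_id = parsed.path.split("/")[2]
    else:
        raise ValueError(f"Not a recognized YouTube URL: {url}")
    if not video_id:
        raise ValueError(f"No video ID found in: {url}")
    return f"https://www.youtube.com/watch?v={video_id}"
```

The result can be passed as `file_uri` in `types.FileData`, exactly like the hard-coded URLs in the cells above and below.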

# Customizing video preprocessing

The Gemini API lets you define some preprocessing steps that help the model understand and extract information from videos.

You can use clipping intervals (time offsets that focus the model on specific parts of a video) and a custom FPS (how many frames per second are sampled to analyze the video).

For more details about these features, see the [Customizing video preprocessing](https://ai.google.dev/gemini-api/docs/video-understanding#customize-video-preprocessing) section of the Gemini API documentation.

## Analyze specific parts of videos using clipping intervals

Sometimes you only care about specific parts of a video. You can add time offsets to your request to tell the model which interval you are interested in.

**Note:** The offsets in `video_metadata` must be expressed in seconds (for example, `'1250s'`).

In this example, you use a video of the [Google I/O 2025 keynote](https://www.youtube.com/watch?v=XEzRZ35urlk) and ask the model to consider only the interval between 20min50s (`1250s`) and 26min10s (`1570s`).
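Converting a `MM:SS` timestamp into seconds is simple arithmetic: 20min50s is 20 × 60 + 50 = 1250 seconds, and 26min10s is 26 × 60 + 10 = 1570 seconds. The hypothetical helper below does that conversion and returns the keyword arguments for `types.VideoMetadata` as a plain dict, so the sketch runs without the SDK installed.

```python
def to_seconds(timestamp: str) -> int:
    """Convert 'SS', 'MM:SS' or 'H:MM:SS' into a whole number of seconds."""
    total = 0
    for part in timestamp.strip().split(":"):
        total = total * 60 + int(part)
    return total


def clip_offsets(start: str, end: str) -> dict[str, str]:
    """Build start/end offsets in the '<seconds>s' string form used in this notebook."""
    start_s, end_s = to_seconds(start), to_seconds(end)
    if end_s <= start_s:
        raise ValueError(f"end ({end}) must be after start ({start})")
    return {"start_offset": f"{start_s}s", "end_offset": f"{end_s}s"}


print(clip_offsets("20:50", "26:10"))
```

You could then write `types.VideoMetadata(**clip_offsets("20:50", "26:10"))`, which produces the same offsets as the ones written by hand in the next cell.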

In [18]:
response = client.models.generate_content(
    model=MODEL_ID,
    contents=types.Content(
        parts=[
            types.Part(
                file_data=types.FileData(file_uri='https://www.youtube.com/watch?v=XEzRZ35urlk'),
                video_metadata=types.VideoMetadata(
                    start_offset='1250s',
                    end_offset='1570s'
                )
            ),
            types.Part(text='Please summarize the video in 3 sentences.')
        ]
    )
)

Markdown(response.text)

Here is a 3 sentence summary of the video:

Demis Hassabis introduces Google DeepMind's newest developments in AI, focusing on improving AI’s understanding and interaction with the world. He highlights the launch of Gemini 1.5 Flash, a multimodal model for efficiency and fast service, alongside the announcement of Project Astra, aimed to develop a universal AI agent helpful in everyday life. The talk emphasizes enhancing natural, conversational interaction and personalized responsiveness in AI assistants.

You can also use clipping intervals with videos uploaded through the File API, as well as with inline videos in your prompts (keep in mind that inline data cannot exceed 20MB).
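Before deciding whether to send a clip inline or through the File API, you can check the file size locally. This is a minimal sketch: it reads the 20MB figure as 20 MiB and keeps 10% headroom because, as an assumption, the limit applies to the whole request (prompt text included) and not only to the video bytes. `should_inline` is a hypothetical helper.

```python
import os

# Assumption: "20MB" read as 20 MiB and applied to the whole request, so keep some headroom.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def should_inline(path: str, limit: int = INLINE_LIMIT_BYTES, headroom: float = 0.9) -> bool:
    """Return True when the file is comfortably below the inline size limit."""
    return os.path.getsize(path) < limit * headroom


# Check the sample videos downloaded during setup, if they are present.
for name in ("Pottery.mp4", "Trailcam.mp4", "Post_its.mp4", "User_study.mp4"):
    if os.path.exists(name):
        print(name, "inline" if should_inline(name) else "File API")
```

If it returns `True`, you could pass the bytes with `types.Part(inline_data=types.Blob(data=..., mime_type='video/mp4'), video_metadata=...)`; otherwise, use `upload_video` from the setup section.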

In [19]:
prompt = "Summarize this video in a few short bullets"  # @param ["For each scene in this video, generate captions that describe the scene along with any spoken text placed in quotation marks. Place each caption into an object with the timecode of the caption in the video.", "Organize all scenes from this video in a table, along with timecode, a short description, a list of objects visible in the scene (with representative emojis) and an estimation of the level of excitement on a scale of 1 to 10"] {"allow-input":true}

video = trailcam_video # @param ["trailcam_video", "pottery_video", "post_its_video", "user_study_video"] {"type":"raw","allow-input":true}

response = client.models.generate_content(
    model=MODEL_ID,
    contents=types.Content(
        parts=[
            types.Part(
                file_data=types.FileData(
                    file_uri=video.uri,
                    mime_type=video.mime_type),
                video_metadata=types.VideoMetadata(
                    start_offset='60s',
                    end_offset='120s'
                )
            ),
            types.Part(text=prompt)
        ]
    )
)

Markdown(response.text)

Here are the key events from the trail camera footage:

* **0:00-0:16:** Two grey foxes move around a rocky area.
* **0:17-0:34:** A mountain lion explores a wooded area.
* **0:35-0:50:** Two foxes play/fight, one gets tossed into the air.
* **0:51-1:16:** Two mountain lions, likely a mother and cub, move through a rocky area at night.
* **1:17-1:28:** Two mountain lions walk toward the camera.
* **1:29-1:50:** A bobcat explores a wooded area at night.
* **1:51-2:22:** Two young bears walk toward and investigate the trail camera.
* **2:23-2:51:** A fox and then a bear pass by a scenic overlook with a city lit up in the distance.
* **2:52-3:04:** A mountain lion approaches and scratches at something in the ground.
* **3:05-3:21:** A bear walks toward the trail camera and starts panting.
* **3:22-4:19:** A bear followed by a bear cub are seen walking in the woods, followed by a mountain lion approaching the camera. 
* **4:22-4:56:** A bobcat walks along a log and looks at the trail camera.
* **4:57-5:09:** A mountain lion explores the area, smelling the ground.

## Customize the number of video frames per second (FPS) analyzed

By default, the Gemini API samples 1 frame per second (FPS) from your videos. For videos with little activity, such as a lecture, that may be more than you need and a lower FPS can work; for fast-changing visuals, select a higher FPS to preserve more detail.

In this example, you use a specific interval showing a NASCAR pit stop and sample it at a higher rate (24 FPS).
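Raising the FPS multiplies the number of frames the model receives. For the 20-second clip in the next cell (15s to 35s), 24 FPS yields roughly 20 × 24 = 480 frames instead of 20 at the default 1 FPS. The sketch below makes that arithmetic explicit. The per-frame token cost is an assumption (258 tokens per frame at default media resolution, as the Gemini documentation stated at the time of writing); check the current docs before relying on it. The function names are hypothetical.

```python
# Assumption: ~258 tokens per sampled frame at default media resolution.
TOKENS_PER_FRAME = 258


def frames_sampled(start_s: float, end_s: float, fps: float = 1.0) -> int:
    """Approximate number of frames sampled from the interval [start_s, end_s)."""
    if end_s <= start_s or fps <= 0:
        raise ValueError("expected end_s > start_s and fps > 0")
    return int((end_s - start_s) * fps)


def estimate_frame_tokens(start_s: float, end_s: float, fps: float = 1.0) -> int:
    """Rough token estimate for the visual frames only (audio and text are excluded)."""
    return frames_sampled(start_s, end_s, fps) * TOKENS_PER_FRAME


for fps in (1, 24):
    print(f"{fps:>2} FPS: {frames_sampled(15, 35, fps)} frames, "
          f"~{estimate_frame_tokens(15, 35, fps)} frame tokens")
```

The same helper shows the other direction: a 10-minute lecture sampled at 0.5 FPS gives about 300 frames instead of 600.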

In [20]:
response = client.models.generate_content(
    model=MODEL_ID,
    contents=types.Content(
        parts=[
            types.Part(
                file_data=types.FileData(file_uri='https://www.youtube.com/watch?v=McN0-DpyHzE'),
                video_metadata=types.VideoMetadata(
                    start_offset='15s',
                    end_offset='35s',
                    fps=24
                )
            ),
            types.Part(text='How many tires were changed? Front tires or rear tires?')
        ]
    )
)

Markdown(response.text)

According to the video, only the tires on the left side of the car were replaced. 

Once again, you can check the [live demo on AI Studio](https://aistudio.google.com/starter-apps/video) for an example of how to postprocess this output, and the [code of that demo](https://github.com/google-gemini/starter-applets/tree/main/video) for more details.

# Next Steps

Try it with your own videos using [AI Studio's live demo](https://aistudio.google.com/starter-apps/video), or play with the examples in this notebook (if you haven't noticed yet, there are other prompts you can try in the dropdowns).

For more examples of Gemini's capabilities, check the other guides in the [Cookbook](https://github.com/google-gemini/cookbook/). You'll learn how to use the [Live API](../quickstarts/Get_started_LiveAPI.ipynb), juggle [multiple tools](../quickstarts/Get_started_LiveAPI_tools.ipynb), or use Gemini 2.0's [spatial understanding](../quickstarts/Spatial_understanding.ipynb) abilities.

The [examples](https://github.com/google-gemini/cookbook/tree/main/examples/) folder from the cookbook is also full of nice code samples illustrating creative ways to use Gemini multimodal capabilities and long-context.