##### Copyright 2025 Google LLC.

In [12]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Video understanding with Gemini

<a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Video_understanding.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=30/></a>

Gemini has been a multimodal model from the beginning, capable of analyzing all sorts of media using its [long context window](https://developers.googleblog.com/en/new-features-for-the-gemini-api-and-google-ai-studio/).

[Gemini 2.0](https://ai.google.dev/gemini-api/docs/models/gemini-v2) and later bring video analysis to a whole new level as illustrated in [this video](https://www.youtube.com/watch?v=Mot-JEU26GQ):


In [13]:
#@title Building with Gemini 2.0: Video understanding
%%html
<iframe width="560" height="315" src="https://www.youtube.com/embed/Mot-JEU26GQ?si=pcb7-_MZTSi_1Zkw" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

This notebook shows you how to use Gemini to perform the same kind of video analysis. Each example comes with several prompts that you can select from a dropdown; feel free to experiment with your own as well.

You can also check the [live demo](https://aistudio.google.com/starter-apps/video) and try it on your own videos on [AI Studio](https://aistudio.google.com/starter-apps/video).

## Setup

This section installs the SDK, sets it up using your [API key](../quickstarts/Authentication.ipynb), imports the relevant libraries, downloads the sample videos, and uploads them to Gemini.

Expand the section if you are curious, but you can also just run it and go straight to the examples. It should take a couple of minutes, since the video files are large.

### Install SDK

The new **[Google Gen AI SDK](https://ai.google.dev/gemini-api/docs/sdks)** provides programmatic access to Gemini 2.0 (and previous models) using both the [Google AI for Developers](https://ai.google.dev/gemini-api/docs) and [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/overview) APIs. With a few exceptions, code that runs on one platform will run on both. This means that you can prototype an application using the Developer API and then migrate the application to Vertex AI without rewriting your code.

You can find more details about this new SDK in the [documentation](https://ai.google.dev/gemini-api/docs/sdks) or in the [Getting started](../quickstarts/Get_started.ipynb) notebook.

In [14]:
%pip install -U -q 'google-genai'

### Setup your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](../quickstarts/Authentication.ipynb) for an example.

In [15]:
from google.colab import userdata

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')

### Initialize SDK client

With the new SDK, you only need to initialize a client with your API key (or OAuth if using [Vertex AI](https://cloud.google.com/vertex-ai)). The model is then specified in each call.

In [16]:
from google import genai
from google.genai import types

client = genai.Client(api_key=GOOGLE_API_KEY)

### Select the Gemini model

Video understanding works best with the Gemini 2.5 Pro model. You can also select earlier models to compare their behavior, but it is recommended to use at least a 2.0 model.

For more information about all Gemini models, check the [documentation](https://ai.google.dev/gemini-api/docs/models/gemini).


In [17]:
model_name = "gemini-2.5-pro-preview-05-06" # @param ["gemini-1.5-flash-latest","gemini-2.0-flash-lite","gemini-2.0-flash","gemini-2.5-flash","gemini-2.5-pro-preview-05-06"] {"allow-input":true, isTemplate: true}

### Get sample videos

You will start with uploaded videos, as that is the more common use case, but later you will see that you can also use YouTube videos.

In [18]:
# Download sample videos
!wget https://storage.googleapis.com/generativeai-downloads/videos/Pottery.mp4 -O Pottery.mp4 -q
!wget https://storage.googleapis.com/generativeai-downloads/videos/Jukin_Trailcam_Videounderstanding.mp4 -O Trailcam.mp4 -q
!wget https://storage.googleapis.com/generativeai-downloads/videos/post_its.mp4 -O Post_its.mp4 -q
!wget https://storage.googleapis.com/generativeai-downloads/videos/user_study.mp4 -O User_study.mp4 -q

### Upload the videos

Upload all the videos using the File API. You can find more details about how to use it in the [Get Started](../quickstarts/Get_started.ipynb#scrollTo=KdUjkIQP-G_i) notebook.

This can take a couple of minutes as the videos will need to be processed and tokenized.

In [19]:
import time

def upload_video(video_file_name):
  # Upload the local file through the File API
  video_file = client.files.upload(file=video_file_name)

  # Videos must finish server-side processing before they can be used in a prompt
  while video_file.state == "PROCESSING":
    print('Waiting for video to be processed.')
    time.sleep(10)
    video_file = client.files.get(name=video_file.name)

  if video_file.state == "FAILED":
    raise ValueError(video_file.state)
  print(f'Video processing complete: {video_file.uri}')

  return video_file

pottery_video = upload_video('Pottery.mp4')
trailcam_video = upload_video('Trailcam.mp4')
post_its_video = upload_video('Post_its.mp4')
user_study_video = upload_video('User_study.mp4')

Waiting for video to be processed.
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/6dw3dfnmbrxv
Waiting for video to be processed.
Waiting for video to be processed.
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/l5wpvhpk9zrc
Waiting for video to be processed.
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/28ujx90sx524
Waiting for video to be processed.
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/2z6ip1rv1s9c
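The `while` loop in `upload_video` keeps polling for as long as the file reports `PROCESSING`, with no upper bound. If you want a guard against a file that never finishes, here is a minimal sketch of a poller with a timeout. `wait_for_processing` is a hypothetical helper, not part of the SDK: it takes any zero-argument callable returning the current state, so you could pass something like `lambda: client.files.get(name=video_file.name).state`. The example simulates the states so it runs without an API key.

```python
import time
from typing import Callable


def wait_for_processing(
    get_state: Callable[[], str],
    poll_interval: float = 10.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it is no longer "PROCESSING".

    Raises TimeoutError once roughly `timeout` seconds of polling have passed
    (elapsed time is approximated by summing poll intervals) and ValueError
    if the final state is "FAILED".
    """
    waited = 0.0
    state = get_state()
    while state == "PROCESSING":
        if waited >= timeout:
            raise TimeoutError(f"Still processing after ~{waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval
        state = get_state()
    if state == "FAILED":
        raise ValueError(state)
    return state


# Simulated states so the example runs offline; the sleep is a no-op here
states = iter(["PROCESSING", "PROCESSING", "ACTIVE"])
print(wait_for_processing(lambda: next(states), sleep=lambda seconds: None))  # ACTIVE
```

Injecting `sleep` keeps the helper testable without real waiting; in the notebook you would simply leave the default `time.sleep`.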


### Imports

In [20]:
import json
from PIL import Image
from IPython.display import display, Markdown, HTML

# Search within videos

First, try using the model to search within your videos and describe all the animal sightings in the trailcam video.

<video controls width="500"><source src="https://storage.googleapis.com/generativeai-downloads/videos/Jukin_Trailcam_Videounderstanding.mp4" type="video/mp4"></video>

In [21]:
prompt = "For each scene in this video, generate captions that describe the scene along with any spoken text placed in quotation marks. Place each caption into an object with the timecode of the caption in the video."  # @param ["For each scene in this video, generate captions that describe the scene along with any spoken text placed in quotation marks. Place each caption into an object with the timecode of the caption in the video.", "Organize all scenes from this video in a table, along with timecode, a short description, a list of objects visible in the scene (with representative emojis) and an estimation of the level of excitement on a scale of 1 to 10"] {"allow-input":true}

video = trailcam_video # @param ["trailcam_video", "pottery_video", "post_its_video", "user_study_video"] {"type":"raw","allow-input":true}

response = client.models.generate_content(
    model=model_name,
    contents=[
        video,
        prompt,
    ]
)

Markdown(response.text)

```json
[
  {"time_code": [0, 0, 0, 766], "caption": "The camera is very close to the fur of an animal, obscuring the view."},
  {"time_code": [0, 0, 766, 0, 16, 866], "caption": "Two gray foxes are in a rocky, wooded area during the day. One fox enters from the left, sniffs the ground, and is joined by the second fox from the right. The first fox jumps onto a large rock, and the second fox looks up at it."},
  {"time_code": [0, 17, 266, 0, 34, 366], "caption": "In a black and white night vision shot, a cougar walks through a wooded area, sniffing the ground. It stops, looks around, and then continues walking out of frame to the right."},
  {"time_code": [0, 34, 866, 0, 49, 396], "caption": "In a black and white night vision shot, two foxes are in a wooded area. One fox is sniffing the ground while the other rolls on its back. The first fox then pounces on the second, and they playfully chase each other out of frame."},
  {"time_code": [0, 49, 996, 1, 4, 366], "caption": "The camera is knocked over or obscured. In a black and white night vision shot, three foxes are visible in a rocky, wooded area. Two foxes are interacting, and a third fox is in the background on a rock. One fox then jumps onto a smaller rock in the foreground, looks around, and then jumps down. The camera is obscured again briefly."},
  {"time_code": [1, 5, 6, 1, 17, 596], "caption": "In a black and white night vision shot, a cougar stands in a rocky, wooded area, looking around. It then walks to the left and out of frame."},
  {"time_code": [1, 18, 166, 1, 28, 826], "caption": "In a black and white night vision shot, a cougar is in a rocky, wooded area. Another cougar walks past it in the foreground, moving from right to left and then out of frame. The first cougar then walks to the right and out of frame."},
  {"time_code": [1, 29, 226, 1, 50, 656], "caption": "In a black and white night vision shot, a bobcat stands in a wooded area, sniffing the ground. It looks up, then continues sniffing and walks around before looking up again and then walking out of frame to the right."},
  {"time_code": [1, 51, 256, 1, 56, 156], "caption": "A large brown bear walks towards the camera in a wooded area during the day, then turns and walks away to the right."},
  {"time_code": [1, 57, 56, 2, 4, 186], "caption": "In a black and white night vision shot, a cougar walks from left to right across the frame in a wooded area."},
  {"time_code": [2, 4, 786, 2, 12, 86], "caption": "The camera is very close to the fur of an animal, obscuring the view."},
  {"time_code": [2, 12, 86, 2, 22, 816], "caption": "Two bear cubs are in a wooded area during the day. One cub walks towards the camera, sniffs the ground, and is joined by the second cub. They both sniff the ground and then walk away from the camera."},
  {"time_code": [2, 23, 226, 2, 34, 156], "caption": "In a black and white night vision shot, a fox stands on a rocky outcrop overlooking city lights in the distance. It sniffs the ground and looks around."},
  {"time_code": [2, 34, 956, 2, 41, 386], "caption": "In a black and white night vision shot, a bear walks from right to left across a rocky outcrop overlooking city lights, then walks out of frame to the left."},
  {"time_code": [2, 42, 386, 2, 51, 416], "caption": "In a black and white night vision shot, a cougar walks from left to right across a rocky outcrop overlooking city lights."},
  {"time_code": [2, 51, 916, 3, 4, 446], "caption": "In a black and white night vision shot, a cougar is in a wooded area, sniffing the ground. It then looks up and around before continuing to sniff the ground."},
  {"time_code": [3, 5, 6, 3, 21, 986], "caption": "A large brown bear stands in a wooded area during the day, looking around and sniffing the air. It then looks directly at the camera, opens its mouth slightly, and turns its head."},
  {"time_code": [3, 22, 516, 3, 32, 66], "caption": "A brown bear stands in a wooded area during the day, looking to its right. Another bear walks into the frame from the right, and they both sniff the ground."},
  {"time_code": [3, 32, 566, 4, 21, 896], "caption": "Two brown bears are in a wooded area during the day. One bear is in the background looking to its right. The bear in the foreground sniffs the ground, then looks up and around before approaching the camera. The camera is briefly obscured. The two bears are then seen from behind, sniffing the ground. One bear sits down and scratches itself, while the other continues sniffing. A third bear walks into the frame from the right, and all three bears are seen sniffing the ground before two walk towards the camera and then away."},
  {"time_code": [4, 22, 446, 4, 29, 466], "caption": "In a black and white night vision shot, a bobcat sits in a wooded area, looking around. It then walks towards the camera and out of frame to the right."},
  {"time_code": [4, 30, 266, 4, 49, 296], "caption": "In a black and white night vision shot, a fox stands in a wooded area, looking around. It then walks to the right, then back to the left, before turning and running out of frame to the right."},
  {"time_code": [4, 50, 6, 4, 56, 456], "caption": "In a black and white night vision shot, a fox walks from right to left in a wooded area and then disappears into the undergrowth."},
  {"time_code": [4, 57, 26, 5, 9, 836], "caption": "In a black and white night vision shot, a cougar sniffs the ground in a wooded area, then looks up and walks towards the camera and out of frame to the right."}
]
```

The prompt used is quite generic, but you can get even better results if you customize it to your needs (for example, by asking specifically about foxes).

The [live demo on AI Studio](https://aistudio.google.com/starter-apps/video) shows how you can postprocess this output to jump directly to the specific part of the video by clicking on the timecodes. If you are interested, you can check the [code of that demo on GitHub](https://github.com/google-gemini/starter-applets/tree/main/video).
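The response above is a JSON array wrapped in a Markdown code fence tagged `json`. To work with the captions in Python instead of only displaying them, you can strip the fence and parse the array. This is a minimal sketch: the six-integer `time_code` layout `[start_min, start_sec, start_ms, end_min, end_sec, end_ms]` is an assumption inferred from the sample output, not a documented format, and the first entry in the sample has only four values, so the sketch skips codes of any other length. If you prefer bare JSON, setting `response_mime_type="application/json"` in `types.GenerateContentConfig` asks the API to return JSON without the fence.

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built here to avoid nesting fences


def parse_json_response(text):
    """Strip an optional Markdown code fence (e.g. one tagged json) and parse the JSON inside."""
    text = text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, then everything from the closing fence on
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    return json.loads(text)


def time_code_to_seconds(code):
    """Convert a six-integer time_code into (start_seconds, end_seconds).

    ASSUMPTION: layout [start_min, start_sec, start_ms, end_min, end_sec, end_ms],
    inferred from the sample output. Returns None for any other length.
    """
    if len(code) != 6:
        return None

    def to_seconds(minutes, seconds, millis):
        return minutes * 60 + seconds + millis / 1000

    return to_seconds(*code[:3]), to_seconds(*code[3:])


# Example with a shortened response; in the notebook you would pass response.text
raw = FENCE + 'json\n[{"time_code": [1, 5, 6, 1, 17, 596], "caption": "A cougar stands..."}]\n' + FENCE
for item in parse_json_response(raw):
    span = time_code_to_seconds(item["time_code"])
    if span is not None:
        print(f"{span[0]:.3f}s to {span[1]:.3f}s: {item['caption']}")
```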

# Extract and organize text

Gemini can also read text that appears in a video and extract it in an organized way. You can even use Gemini's reasoning capabilities to generate new ideas for you.

<video controls width="400"><source src="https://storage.googleapis.com/generativeai-downloads/videos/post_its.mp4" type="video/mp4"></video>

In [22]:
prompt = "Transcribe the sticky notes, organize them and put it in a table. Can you come up with a few more ideas?" # @param ["Transcribe the sticky notes, organize them and put it in a table. Can you come up with a few more ideas?", "Which of those names would fit an AI product that can resolve complex questions using its thinking abilities?"] {"allow-input":true}

video = post_its_video # @param ["trailcam_video", "pottery_video", "post_its_video", "user_study_video"] {"type":"raw","allow-input":true}

response = client.models.generate_content(
    model=model_name,
    contents=[
        video,
        prompt,
    ]
)

Markdown(response.text)

Okay, here are the transcribed sticky notes from the whiteboard, organized into a table, followed by some additional project name ideas in a similar vein.

**Title on Whiteboard:** Brainstorm: Project Name

**Transcribed Sticky Notes:**

| Column 1 (Roughly Left) | Column 2 (Roughly Middle-Left) | Column 3 (Roughly Middle-Right) | Column 4 (Roughly Right) |
| :---------------------- | :----------------------------- | :------------------------------ | :----------------------- |
| Leo Minor               | Convergence                    | Supernova Echo                  | Chaos Field              |
| Canis Major             | Lunar Eclipse                  | Astral Forge                    | Lynx                     |
| Andromeda's Reach       | Comet's Tail                   | Draco                           | Prometheus Rising        |
| Stellar Nexus           | Centaurus                      | Delphinus                       | Chimera Dream            |
| Orion's Belt            | Symmetry                       | Serpens                         | Persius Shield           |
| Bayes Theroem           | Golden Ratio                   | Equilibrium                     | Euler's Path             |
| Lyra                    | Fractal                        | Athena's Eye                    | Galactic Core            |
| Chaos Theory            | Infinity Loop                  | Medusa                          | Titan                    |
| Riemann's Hypothesis    | Taylor Series                  | Hera                            | Zephyr                   |
| Sagitta                 | Stokes Theorem                 | Athena                          | Echo                     |
| Pandora's Box           |                                | Orion's Sword                   | Odin                     |
| Celestial Drift         |                                | Vector                          | Aether                   |
|                         |                                | Chaos Theory                    | Phoenix                  |
|                         |                                |                                 | Cerberus                 |

*(Note: "Chaos Theory" and "Echo" appear twice. "Bayes Theroem" is transcribed as written, though "Theorem" is the standard spelling.)*

---

**Additional Project Name Ideas (Based on Themes Observed):**

**Theme: Astronomy/Cosmology**
1.  **Nebula's Heart**
2.  **Quasar Pulse**
3.  **Event Horizon**
4.  **Dark Matter**
5.  **Cosmic Ray**
6.  **Cygnus Rift**
7.  **Orion Nebula**

**Theme: Mythology (Greek, Norse, etc.)**
8.  **Icarus Ascent**
9.  **Argus Panoptes** (Argus the all-seeing)
10. **Mjolnir Strike**
11. **Valhalla Gate**
12. **Griffin's Flight**
13. **Hydra's Coil**

**Theme: Mathematical/Scientific Concepts**
14. **Quantum Leap**
15. **Boolean Gate**
16. **Fibonacci Spiral**
17. **Axiom Core**
18. **Heisenberg Matrix**
19. **Singularity Point**

**Theme: Evocative/Abstract**
20. **Keystone Initiative**
21. **Vanguard Protocol**
22. **Zenith Engine**
23. **Apex Construct**
24. **Catalyst Prime**
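If you want to reuse the transcribed notes programmatically rather than just display them, you can turn a Markdown pipe table like the one above into a list of dictionaries. This is a minimal sketch that assumes the text contains a single table, no escaped `|` characters inside cells, and an alignment row (such as `| :--- | --- |`) directly under the header; `parse_markdown_table` is a hypothetical helper, not part of any library.

```python
def parse_markdown_table(md: str) -> list[dict[str, str]]:
    """Parse a single Markdown pipe table into one dict per body row."""
    # Keep only the lines that belong to the table
    rows = [line.strip() for line in md.splitlines() if line.strip().startswith("|")]
    if len(rows) < 2:
        return []

    def cells(line: str) -> list[str]:
        # Remove the outer pipes, split on the inner ones, trim whitespace
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = cells(rows[0])
    # rows[1] is the alignment row, so the data starts at rows[2]
    return [dict(zip(header, cells(line))) for line in rows[2:]]


sample = """
| Column 1 (Roughly Left) | Column 2 (Roughly Middle-Left) |
| :---------------------- | :----------------------------- |
| Leo Minor               | Convergence                    |
| Canis Major             | Lunar Eclipse                  |
"""
for row in parse_markdown_table(sample):
    print(row)
```

Empty cells, like the blank slots at the bottom of the sticky-note table, come back as empty strings, which makes them easy to filter out.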

# Structure information

Gemini 2.0 is not only able to read text but also to reason about real-world objects and structure information about them, as in this video of a ceramics display with handwritten prices and notes.

<video controls width="500"><source src="https://storage.googleapis.com/generativeai-downloads/videos/Pottery.mp4" type="video/mp4"></video>

In [23]:
prompt = "Give me a table of my items and notes" # @param ["Give me a table of my items and notes", "Help me come up with a sales pitch for my pottery"] {"allow-input":true}

video = pottery_video # @param ["trailcam_video", "pottery_video", "post_its_video", "user_study_video"] {"type":"raw","allow-input":true}

response = client.models.generate_content(
    model=model_name,
    contents=[
        video,
        prompt,
    ],
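    # Colab's Markdown renderer treats text between dollar signs as LaTeX,
    # so ask the model to escape the "$" in prices
    config=types.GenerateContentConfig(
        system_instruction="Don't forget to escape the dollar signs",
    )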
)

Markdown(response.text)

Okay, here's a table of the items and their notes from the image:

| Item | Notes / Dimensions | Price | Glaze/Firing Notes |
|---|---|---|---|
| Tumblers | 4"h x 3"d ~ish | \$20 | (Implied from nearby tile) #5 Artichoke double dip |
| Small bowls | 3.5"h x 6.5"d | \$35 | (Implied from nearby tile) #6 Gemini double dip, SLOW COOL |
| Med bowls | 4"h x 7"d | \$40 | (Implied from nearby tile) #6 Gemini double dip, SLOW COOL |
| Glaze Test Tile (#5) | --- | N/A | #5 Artichoke double dip |
| Glaze Test Tile (#6) | Marked "6 b 6" on tile | N/A | #6 Gemini double dip, SLOW COOL |

As you can see, Gemini is able to work out which item each note corresponds to, including the last one.
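The system instruction in the cell above relies on the model remembering to escape `$`. Because Colab renders text between dollar signs as LaTeX math, two unescaped prices on one line can come out garbled. A deterministic alternative is to escape the dollar signs yourself before rendering, for example with `Markdown(escape_dollars(response.text))`. Here `escape_dollars` is a hypothetical helper sketched below; it leaves an already-escaped `\$` untouched.

```python
import re


def escape_dollars(text: str) -> str:
    """Put a backslash before every "$" that is not already escaped."""
    # (?<!\\) is a negative lookbehind: skip "$" already preceded by a backslash
    return re.sub(r"(?<!\\)\$", r"\\$", text)


print(escape_dollars("Tumblers $20, Small bowls $35, Med bowls \\$40"))
# Tumblers \$20, Small bowls \$35, Med bowls \$40
```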

# Analyze screen recordings for key moments

You can also use the model to analyze screen recordings. Let's say you're doing user studies on how people use your product, so you end up with lots of screen recordings, like this one, that you have to manually comb through.
With just one prompt, the model can describe all the actions in your video.

<video controls width="400"><source src="https://storage.googleapis.com/generativeai-downloads/videos/user_study.mp4" type="video/mp4"></video>

In [24]:
prompt = "Generate a paragraph that summarizes this video. Keep it to 3 to 5 sentences with corresponding timecodes." # @param ["Generate a paragraph that summarizes this video. Keep it to 3 to 5 sentences with corresponding timecodes.", "Choose 5 key shots from this video and put them in a table with the timecode, text description of 10 words or less, and a list of objects visible in the scene (with representative emojis).", "Generate bullet points for the video. Place each bullet point into an object with the timecode of the bullet point in the video."] {"allow-input":true}

video = user_study_video # @param ["trailcam_video", "pottery_video", "post_its_video", "user_study_video"] {"type":"raw","allow-input":true}

response = client.models.generate_content(
    model=model_name,
    contents=[
        video,
        prompt,
    ]
)

Markdown(response.text)

The video showcases "My Garden App," an e-commerce platform for purchasing plants, displaying various options with descriptions, prices, and interaction buttons. (0:00-0:08) Users can "Like" plants, causing the like button to turn red, and add items to their cart, which temporarily changes the "Add to Cart" button to "Added!" (0:09-0:14). Several plants, including Fern, Cactus, and Hibiscus, are liked and added to the shopping cart by the user. (0:12-0:25) The app allows navigation to a "Cart" tab, which lists the added items and their total price, and a "Profile" tab, which shows the count of liked plants and items in the cart. (0:29-0:35) Finally, the user returns to the home screen, interacts with more plant listings by liking a "Snake Plant" and adding an "Orchid" to the cart. (0:37-0:46)
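The summary above embeds its timecodes as `(M:SS-M:SS)` ranges. If you want to build an index over many user-study recordings, you can pull those ranges out as seconds. This sketch assumes the model keeps that exact format, which the prompt only asks for loosely, so check the output before relying on it; `extract_ranges` is a hypothetical helper.

```python
import re

# Matches ranges like "(0:09-0:14)"; an en dash between the times is also accepted
RANGE_RE = re.compile(r"\((\d+):(\d{2})\s*[-\u2013]\s*(\d+):(\d{2})\)")


def extract_ranges(text: str) -> list[tuple[int, int]]:
    """Return (start_seconds, end_seconds) for every "(M:SS-M:SS)" range in text."""
    return [
        (int(m1) * 60 + int(s1), int(m2) * 60 + int(s2))
        for m1, s1, m2, s2 in RANGE_RE.findall(text)
    ]


summary = "Users can like plants (0:09-0:14). The app has a Cart tab (0:29-0:35)."
print(extract_ranges(summary))  # [(9, 14), (29, 35)]
```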

# Analyze YouTube videos

On top of using your own videos, you can also ask Gemini to get a video from YouTube and analyze it. Here's an example using the keynote from Google I/O 2023. Guess what the main theme was?


In [25]:
response = client.models.generate_content(
    model=model_name,
    contents=types.Content(
        parts=[
            types.Part(text="Find all the instances where Sundar says \"AI\". Provide timestamps and broader context for each instance."),
            types.Part(
                file_data=types.FileData(file_uri='https://www.youtube.com/watch?v=ixRanV-rdAQ')
            )
        ]
    )
)

Markdown(response.text)

Okay, here are all the instances where Sundar Pichai says "AI" in the provided video, along with timestamps and the broader context:

1.  **Timestamp:** 0:29
    *   **Quote:** "As you may have heard, AI is having a very busy year."
    *   **Broader Context:** Sundar is beginning his keynote, acknowledging the significant developments and public attention surrounding Artificial Intelligence in the current year.

2.  **Timestamp:** 0:38
    *   **Quote:** "Seven years into our journey as an AI-first company, we're at an exciting inflection point."
    *   **Broader Context:** He's framing Google's long-term strategic focus on AI and highlighting that the current period represents a major turning point in AI development and application.

3.  **Timestamp:** 0:45
    *   **Quote:** "We have an opportunity to make AI even more helpful for people, for businesses, for communities, for everyone."
    *   **Broader Context:** Sundar is outlining the potential positive impact of AI, emphasizing its utility across various aspects of life and society.

4.  **Timestamp:** 0:54
    *   **Quote:** "We've been applying AI to make our products radically more helpful for a while."
    *   **Broader Context:** He is reiterating Google's history of integrating AI into its products to enhance their functionality and user experience, setting the stage for discussing newer generative AI advancements.

5.  **Timestamp:** 1:40
    *   **Quote:** "Smart Compose led to more advanced writing features powered by AI."
    *   **Broader Context:** Sundar is tracing the evolution of AI-powered writing assistance in Gmail, from Smart Reply to Smart Compose, and mentioning its widespread use in Google Workspace.

6.  **Timestamp:** 3:02
    *   **Quote:** "Since the early days of Street View, AI has stitched together billions of panoramic images so people can explore the world from their device."
    *   **Broader Context:** He's explaining how AI has been a core technology in Google Maps for features like Street View, enabling the creation of immersive digital representations of the world.

7.  **Timestamp:** 3:14
    *   **Quote:** "At last year's I/O, we introduced Immersive View, which uses AI to create a high-fidelity representation of a place so you can experience it before you visit."
    *   **Broader Context:** Sundar is describing the AI technology behind the Immersive View feature in Google Maps, which offers detailed 3D models of locations.

8.  **Timestamp:** 5:08
    *   **Quote:** "Another product made better by AI is Google Photos."
    *   **Broader Context:** He is transitioning to discuss how AI has significantly enhanced the capabilities of Google Photos.

9.  **Timestamp:** 5:15
    *   **Quote:** "It was one of our first AI-native products."
    *   **Broader Context:** Sundar is referring to Google Photos, highlighting that it was designed from its inception with AI at its core, particularly for features like photo search and organization.

10. **Timestamp:** 5:38
    *   **Quote:** "AI advancements give us more powerful ways to do this."
    *   **Broader Context:** He is discussing the editing capabilities within Google Photos and how ongoing AI progress allows for more sophisticated photo enhancement tools.

11. **Timestamp:** 5:47
    *   **Quote:** "Magic Eraser, launched first on Pixel, uses AI-powered computational photography to remove unwanted distractions."
    *   **Broader Context:** Sundar is explaining the technology behind the Magic Eraser feature, which leverages AI and computational photography for photo editing.

12. **Timestamp:** 5:58
    *   **Quote:** "...and later this year, using a combination of semantic understanding and generative AI, you can do much more with a new experience called Magic Editor."
    *   **Broader Context:** He is introducing the upcoming Magic Editor feature for Google Photos, emphasizing its use of more advanced generative AI capabilities for complex photo manipulations.

13. **Timestamp:** 7:40
    *   **Quote:** "From Gmail and Photos to Maps, these are just a few examples of how AI can help you in moments that matter."
    *   **Broader Context:** Sundar is summarizing the product examples he's shared, reiterating the theme of AI's practical helpfulness in everyday applications.

14. **Timestamp:** 7:47
    *   **Quote:** "And there is so much more we can do to deliver the full potential of AI across the products you know and love."
    *   **Broader Context:** He is expressing Google's ambition to further integrate and leverage AI's capabilities across its entire suite of products.

15. **Timestamp:** 8:22
    *   **Quote:** "And looking ahead, making AI helpful for everyone is the most profound way we will advance our mission."
    *   **Broader Context:** Sundar is connecting Google's mission to organize the world's information with their goal of making AI beneficial and accessible to all.

16. **Timestamp:** 8:53
    *   **Quote:** "...and finally, by building and deploying AI responsibly so that everyone can benefit equally."
    *   **Broader Context:** He is outlining one of Google's four key approaches to AI development, stressing the importance of ethical and responsible deployment to ensure equitable benefits.

17. **Timestamp:** 9:02
    *   **Quote:** "Our ability to make AI helpful for everyone relies on continuously advancing our foundation models."
    *   **Broader Context:** Sundar is highlighting that the progress in making AI broadly useful is dependent on the ongoing development and improvement of their core AI models.

18. **Timestamp:** 12:27
    *   **Quote:** "It uses AI to better detect malicious scripts and can help security experts understand and resolve threats."
    *   **Broader Context:** He is describing Sec-PaLM, a specialized version of their PaLM 2 model, which uses AI for cybersecurity applications like identifying malicious code.

19. **Timestamp:** 12:45
    *   **Quote:** "PaLM 2 is the latest step in our decade-long journey to bring AI in responsible ways to billions of people."
    *   **Broader Context:** Sundar is positioning the PaLM 2 model as a significant milestone in Google's long-term commitment to developing and deploying AI ethically and broadly.

20. **Timestamp:** 14:09
    *   **Quote:** "As we invest in more advanced models, we're also deeply investing in AI responsibility."
    *   **Broader Context:** He is reiterating Google's dual commitment: advancing AI capabilities while simultaneously ensuring its responsible development and deployment.

21. **Timestamp:** 15:04
    *   **Quote:** "We'll ensure every one of our AI-generated images has that metadata."
    *   **Broader Context:** Sundar is discussing methods like watermarking and metadata to help identify AI-generated content, emphasizing transparency.

22. **Timestamp:** 15:11
    *   **Quote:** "James will talk about our responsible approach to AI later."
    *   **Broader Context:** He is teeing up a later segment of the keynote where another speaker (James) will delve deeper into Google's framework for responsible AI.

23. **Timestamp:** 15:29
    *   **Quote:** "That's the opportunity we have with Bard, our experiment for conversational AI."
    *   **Broader Context:** Sundar is introducing Bard as Google's experimental platform for exploring and developing conversational AI applications.

Once again, the [live demo on AI Studio](https://aistudio.google.com/starter-apps/video) shows an example of how to postprocess this output. Check the [code of that demo](https://github.com/google-gemini/starter-applets/tree/main/video) for more details.
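To make the timestamps in a response like this one clickable, you can turn each one into a YouTube link that starts playback at that point using the `t` query parameter (in seconds). This is a minimal sketch assuming the `**Timestamp:** M:SS` layout used in the response above; `timestamp_links` and `to_seconds` are hypothetical helpers, and the links are only as accurate as the timestamps the model returns.

```python
import re
from urllib.parse import parse_qs, urlencode, urlparse

# Captures "M:SS" or "H:MM:SS" after a bold "Timestamp:" label
TIMESTAMP_RE = re.compile(r"\*\*Timestamp:\*\*\s*(\d+(?::\d{2}){1,2})")


def to_seconds(timestamp: str) -> int:
    """Convert "M:SS" or "H:MM:SS" into a number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def timestamp_links(text: str, video_url: str) -> list[str]:
    """Build a watch URL with t=<seconds>s for each timestamp found in text."""
    video_id = parse_qs(urlparse(video_url).query)["v"][0]
    return [
        "https://www.youtube.com/watch?" + urlencode({"v": video_id, "t": f"{to_seconds(ts)}s"})
        for ts in TIMESTAMP_RE.findall(text)
    ]


response_text = "1.  **Timestamp:** 0:29\n8.  **Timestamp:** 5:08"  # in the notebook: response.text
for link in timestamp_links(response_text, "https://www.youtube.com/watch?v=ixRanV-rdAQ"):
    print(link)
```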

# Next Steps

Try it with your own videos using the [AI Studio's live demo](https://aistudio.google.com/starter-apps/video) or play with the examples from this notebook (if you haven't noticed, the dropdowns contain other prompts you can try).

For more examples of Gemini's capabilities, check the other guides in the [Cookbook](https://github.com/google-gemini/cookbook/). You'll learn how to use the [Live API](../quickstarts/Get_started_LiveAPI.ipynb), juggle [multiple tools](../quickstarts/Get_started_LiveAPI_tools.ipynb), or use Gemini 2.0's [spatial understanding](../quickstarts/Spatial_understanding.ipynb) abilities.

The [examples](https://github.com/google-gemini/cookbook/tree/main/examples/) folder of the cookbook is also full of code samples illustrating creative ways to use Gemini's multimodal capabilities and long context.