# Lesson: Video Intelligence and Understanding

Welcome to this lesson on analyzing video with the Gemini API. Gemini's powerful multimodal capabilities allow it to "watch" a video, understand its content, and answer detailed questions about it.

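In this notebook, we will:
1.  **Process a "How-To" Video:** We'll upload a short video about making a paper airplane, ask questions about it, and extract a time-stamped list of steps.
2.  **Perform Creative Analysis:** We'll use a short, scenic drone video of Yosemite National Park as the basis for a creative writing task.
3.  **Analyze YouTube Videos:** We'll pass a YouTube URL directly to the model and ask it to find time-stamped moments in a keynote.
4.  **Reason About Real-World Objects and Text:** We'll ask the model to read price tags and labels in a video of an antique display and return a structured list.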

In [1]:
#@title 1. Setup
# Install the Google AI Python SDK
!pip install -q -U google-genai

[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/45.3 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m45.3/45.3 kB[0m [31m2.0 MB/s[0m eta [36m0:00:00[0m
[?25h[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/236.2 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m236.2/236.2 kB[0m [31m9.3 MB/s[0m eta [36m0:00:00[0m
[?25h[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-adk 1.15.1 requires google-genai!=1.37.0,!=1.38.0,!=1.39.0,<=1.40.0,>=1.21.1, but you have google-genai 1.42.0 which is incompatible.[0m[31m
[0m

In [2]:
#@title 2. Configure your API Key
# Use the "Secrets" tab in Colab (click the key icon on the left) to store your
# API key with the name "GOOGLE_API_KEY".

from google import genai
from google.colab import userdata

try:
    GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
    client = genai.Client(api_key=GOOGLE_API_KEY)
except userdata.SecretNotFoundError as e:
    print('Secret not found. Please add your GOOGLE_API_KEY to the Colab Secrets Manager.')

## Part 1: Analyzing a "How-To" Video

First, we'll download a short instructional video from the course's GitHub repository, upload it to the Gemini File API, and ask questions about its content.

In [3]:
#@title Download the "How-To" Video
# A short video showing how to fold an easy paper airplane

URL = "https://raw.githubusercontent.com/gopidon/gemini-advanced-api-course/main/Section_2_Advanced_Multimodality/videos/airplane.mp4"
!wget -q $URL -O airplane.mp4

In [5]:
#@title Upload the airplane video
print("Uploading video file to the File API...")
airplane_video = client.files.upload(file='airplane.mp4')

print("File uploaded successfully")

Uploading video file to the File API...
File uploaded successfully
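
The upload call returns as soon as the bytes are transferred, but the File API may still be processing the video, and a request that references a file in that state can fail. The sketch below polls until the file is ready. `wait_until_active` is a hypothetical helper, not part of the SDK; it assumes the uploaded file object exposes `name` and `state.name` (`PROCESSING`, `ACTIVE`, or `FAILED`) and that `client.files.get(name=...)` returns refreshed metadata.

```python
import time


def wait_until_active(client, uploaded_file, poll_seconds=2.0, timeout_seconds=300.0):
    """Poll the File API until the file leaves the PROCESSING state.

    Hypothetical helper: assumes `uploaded_file.state.name` is one of
    "PROCESSING", "ACTIVE", or "FAILED".
    """
    deadline = time.monotonic() + timeout_seconds
    current = uploaded_file
    while current.state.name == "PROCESSING":
        if time.monotonic() > deadline:
            raise TimeoutError(
                f"{current.name} is still processing after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)
        # Fetch fresh metadata for the same file.
        current = client.files.get(name=current.name)
    if current.state.name != "ACTIVE":
        raise RuntimeError(f"{current.name} ended in state {current.state.name}")
    return current


# Usage in this notebook (requires the `client` configured earlier):
# airplane_video = wait_until_active(client, airplane_video)
```

Short clips like the ones in this lesson are often ready almost immediately, so the loop usually exits on the first check. The timeout is there so a stuck upload fails loudly instead of hanging the cell.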


In [6]:
#@title Describe what's in the video!
from IPython.display import display, Markdown

prompt = "Provide a short description of what this video teaches."
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        airplane_video,
        prompt,
    ]
)

Markdown(response.text)

This video teaches viewers how to fold a high-performance paper airplane called the "Road Runner" in under 60 seconds. It demonstrates a simple, quick folding method for a plane designed for long flights (over 100 feet) with excellent speed and glide.

In [7]:
#@title Extract a time-stamped list of steps

prompt = "Provide a list of the key steps shown in this video, including their start and end timestamps."
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        airplane_video,
        prompt,
    ]
)

display(Markdown(response.text))


Here are the key steps for making the paper airplane, with their corresponding timestamps:

*   **0:00 - 0:19**: Introduction to the paper airplane and the challenge.
*   **0:20 - 0:24**: Fold the paper in half lengthwise.
*   **0:25 - 0:35**: Open the paper and fold the top corners inwards towards the center crease, leaving a small gap.
*   **0:36 - 0:48**: Fold the new side edges inwards again towards the center, maintaining a small gap.
*   **0:49 - 1:02**: Fold the outer edges inwards one more time, making the plane narrow and leaving a gap.
*   **1:03 - 1:05**: Fold the entire paper airplane in half lengthwise.
*   **1:05 - 1:11**: Fold down the first wing by aligning the top edge with the bottom edge.
*   **1:11 - 1:17**: Unfold the first wing, then fold down the second wing in the same manner.
*   **1:18 - 1:25**: The paper airplane is complete (mention of optional tape).
*   **1:26 - 1:39**: Demonstrating the paper airplane flying.
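
A time-stamped list like the one above is easier to reuse (for chapter markers, clipping, or search) once it is converted to structured data. The following is a minimal sketch that assumes the model keeps the `**M:SS - M:SS**: description` bullet format shown in this output; the format is not guaranteed, so lines that do not match are skipped. `parse_steps` is a hypothetical helper name.

```python
import re

# Matches bullets such as "*   **0:20 - 0:24**: Fold the paper in half."
STEP_PATTERN = re.compile(
    r"\*\*(\d+):(\d{2})\s*-\s*(\d+):(\d{2})\*\*:\s*(.+)"
)


def parse_steps(markdown_text):
    """Return a list of dicts with start/end offsets in seconds and a description."""
    steps = []
    for line in markdown_text.splitlines():
        match = STEP_PATTERN.search(line)
        if match is None:
            continue  # Skip headings, blank lines, and unexpected formats.
        start_min, start_sec, end_min, end_sec, description = match.groups()
        steps.append({
            "start_s": int(start_min) * 60 + int(start_sec),
            "end_s": int(end_min) * 60 + int(end_sec),
            "description": description.strip(),
        })
    return steps


sample = """*   **0:20 - 0:24**: Fold the paper in half lengthwise.
*   **1:03 - 1:05**: Fold the entire paper airplane in half lengthwise."""

for step in parse_steps(sample):
    print(step)
```

In the notebook, passing `response.text` instead of `sample` would apply the same parsing to the live model output.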

## Part 2: Creative Analysis of a Scenic Video (Yosemite)

Now, let's use a different video to perform a more creative task. We'll download a short, beautiful drone video of Yosemite National Park and ask Gemini to write a travel blog introduction based on the visuals.

In [8]:
#@title Download the Yosemite Video
URL = "https://raw.githubusercontent.com/gopidon/gemini-advanced-api-course/main/Section_2_Advanced_Multimodality/videos/yosemite.mp4"
!wget -q $URL -O yosemite.mp4

In [9]:
print("Uploading video file to the File API...")
yosemite_video = client.files.upload(file='yosemite.mp4')

print("File uploaded successfully")

Uploading video file to the File API...
File uploaded successfully


In [10]:
#@title Perform a Creative Writing Task
print("\n--- Asking for a creative travel blog intro based on the Yosemite video ---")

prompt = "Based on this scenic video clip of Yosemite National Park, write a short and captivating introduction for a travel blog post."
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        yosemite_video,
        prompt,
    ]
)

Markdown(response.text)


--- Asking for a creative travel blog intro based on the Yosemite video ---


Here's a captivating introduction for a travel blog post about Yosemite National Park, inspired by the video:

---

**Yosemite: Where Nature's Grandeur Reaches New Heights**

Prepare to be awe-struck. Tucked away in the heart of California's Sierra Nevadas lies Yosemite National Park, a sprawling masterpiece covering over 1,000 square miles of untamed beauty. This isn't just a park; it's a symphony of majestic granite cliffs, thunderous waterfalls, tranquil meadows, and ancient forests, all bursting with vibrant life.

From the dizzying heights of Glacier Point, offering panoramic views of Half Dome and Yosemite Valley, to the thundering spray of North America's tallest waterfall, Yosemite Falls, every corner of this park promises an unforgettable experience. Whether you're navigating scenic routes, walking among giant sequoias that have stood for millennia, or discovering alpine wildflowers in bloom, Yosemite truly is a marvel. And as you explore, keep your eyes peeled – deer graze peacefully, bears fish in the rivers, and golden eagles soar overhead, reminding you that you're in one of nature's last great wild sanctuaries.

Ready to discover the magic? Join us as we explore the 5 must-see attractions that make Yosemite an essential stop on any nature lover's itinerary.

## Part 3: Analyze YouTube Videos

In addition to uploading your own videos, you can pass a YouTube URL and ask Gemini to analyze that video directly. Here's an example using the keynote from Google I/O 2023. Guess what the main theme was?

In [11]:
from google.genai import types

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=types.Content(
        parts=[
            types.Part(text="Find all the instances where Sundar says \"AI\". Provide timestamps and broader context for each instance."),
            types.Part(
                file_data=types.FileData(file_uri='https://www.youtube.com/watch?v=ixRanV-rdAQ')
            )
        ]
    )
)

Markdown(response.text)

Here are all the instances where Sundar says "AI" in the video, along with timestamps and broader context:

1.  **0:29**: "...As you may have heard, **AI** is having a very busy year."
    *   **Context**: Sundar is opening Google I/O and immediately highlights the importance and rapid development of AI, setting the tone for the keynote.

2.  **0:39**: "...Seven years into our journey as an **AI** first company, we are at an exciting inflection point."
    *   **Context**: He refers to Google's long-term commitment to AI and positions the current moment as a significant advancement in AI capabilities.

3.  **0:47**: "...We have an opportunity to make **AI** even more helpful..."
    *   **Context**: He emphasizes the potential of AI to benefit people, businesses, communities, and everyone.

4.  **0:54**: "...We've been applying **AI** to make our products radically more helpful for a while."
    *   **Context**: He explains that AI has long been integrated into Google products to enhance their utility.

5.  **0:59**: "...With generative **AI**, we are taking the next step."
    *   **Context**: He introduces generative AI as the next phase of AI integration into Google's core products, including Search.

6.  **1:17**: "...Let me start with few examples of how generative **AI** is helping to evolve our products..."
    *   **Context**: He transitions to showcasing specific product examples, starting with Gmail, where generative AI is used for features like "Help me write."

7.  **1:41**: "...Smart Compose led to more advanced writing features powered by **AI**."
    *   **Context**: He references previous AI-powered features in Gmail like Smart Compose and how they laid the groundwork for new generative AI capabilities.

8.  **1:45**: "...**AI** features used 180 billion times last year."
    *   **Context**: He highlights the widespread adoption of AI features in Google Workspace, indicating significant user engagement.

9.  **2:54**: "...And just like with Smart Compose, you will see it get better over time."
    *   **Context**: Sundar Pichai mentions how Smart Compose uses AI and hints at the continuous improvement of such features.

10. **3:02**: "...Since the early days of Street View, **AI** has stitched together billions of panoramic images..."
    *   **Context**: He talks about how AI has been instrumental in creating immersive experiences like Street View in Google Maps.

11. **3:13**: "...At last year's I/O, we introduced Immersive View, which uses **AI** to create a high-fidelity representation of a place..."
    *   **Context**: He discusses the role of AI in generating realistic 3D representations of cities and landmarks in Google Maps.

12. **5:07**: "...Another product made better by **AI** is Google Photos."
    *   **Context**: He introduces Google Photos as another example of a product enhanced by AI.

13. **5:16**: "...It was one of our first **AI**-native products."
    *   **Context**: He recalls that Google Photos was designed with AI at its core from its inception in 2015.

14. **5:40**: "...**AI** advancements give us more powerful ways to do this."
    *   **Context**: He is talking about improving photos and how AI enables more advanced editing features like Magic Eraser and Magic Editor.

15. **5:48**: "...Magic Eraser, launched first on Pixel, uses **AI**-powered computational photography to remove unwanted distractions."
    *   **Context**: He explains the AI technology behind Magic Eraser in Google Photos.

16. **7:41**: "...These are just a few examples of how **AI** can help you in moments that matter."
    *   **Context**: He concludes a segment showcasing various Google products (Gmail, Maps, Photos) and their AI enhancements, emphasizing AI's practical benefits.

17. **7:49**: "...And there is so much more we can do to deliver the full potential of **AI** across the products you know and love."
    *   **Context**: He looks ahead to future possibilities of integrating AI further into Google's product ecosystem.

18. **8:24**: "...Making **AI** helpful for everyone is the most profound way we will advance our mission."
    *   **Context**: He outlines Google's core mission with AI: to make it universally helpful.

19. **8:52**: "...And finally, by building and deploying **AI** responsibly so that everyone can benefit equally."
    *   **Context**: He emphasizes the importance of responsible AI development and deployment.

20. **9:04**: "...Our ability to make **AI** helpful for everyone relies on continuously advancing our foundation models."
    *   **Context**: He connects Google's AI mission to the development of powerful foundation models.

21. **11:27**: "...It uses **AI** to better detect malicious scripts and can help security experts understand and resolve threats."
    *   **Context**: He describes how Sec-PaLM, an AI model, is fine-tuned for security use cases.

22. **12:14**: "...You can imagine an **AI** collaborator that helps radiologists interpret images and communicate the results."
    *   **Context**: He discusses future capabilities of Med-PaLM 2, an AI model fine-tuned for medical knowledge, suggesting its role as a medical imaging assistant.

23. **12:47**: "...PaLM 2 is the latest step in our decade-long journey to bring **AI** in responsible ways to billions of people."
    *   **Context**: He reiterates Google's long-term commitment to AI development and its responsible deployment.

24. **13:00**: "...Looking back at the defining **AI** breakthroughs over the last decade..."
    *   **Context**: He acknowledges the significant contributions of Google's AI teams to various breakthroughs in the field.

25. **13:28**: "...This includes our next generation foundation model, Gemini, which is still in training."
    *   **Context**: Sundar Pichai introduces Google's next-generation AI foundation model, Gemini.

26. **14:10**: "...As we invest in more advanced models, we are also deeply investing in **AI** responsibility."
    *   **Context**: He emphasizes Google's commitment to responsible AI development alongside technological advancements.

27. **15:06**: "...We'll ensure every one of our **AI**-generated images has that metadata."
    *   **Context**: He discusses embedding metadata into AI-generated images as part of responsible AI practices.
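
Timestamps reported by the model are worth spot-checking against the source video. The sketch below turns a timestamp from the list above into a link that opens the video at that moment. It assumes the common YouTube convention of a `t=<seconds>s` query parameter; `timestamp_to_seconds` and `youtube_link_at` are hypothetical helper names.

```python
from urllib.parse import urlencode


def timestamp_to_seconds(timestamp):
    """Convert 'M:SS' or 'H:MM:SS' into a total number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def youtube_link_at(video_id, timestamp):
    """Build a watch URL that starts playback at the given timestamp."""
    query = urlencode({"v": video_id, "t": f"{timestamp_to_seconds(timestamp)}s"})
    return f"https://www.youtube.com/watch?{query}"


# Video ID taken from the URL used in the request above.
print(youtube_link_at("ixRanV-rdAQ", "7:41"))
# → https://www.youtube.com/watch?v=ixRanV-rdAQ&t=461s
```

Checking a handful of entries this way is a quick way to gauge how far the reported timestamps drift from the spoken words.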

## Part 4: Reasoning About Real-World Objects & Text

This part demonstrates Gemini's advanced ability to not only "see" objects but also to read handwritten text or labels associated with them, and then structure that information. This is powerful for tasks like inventory management, price checking, or digitizing real-world displays.

In [12]:
#@title Ask for Structured Information about Objects and Text
# The prompt instructs the model to identify items, read their prices/labels,
# and return the information in a structured format.

prompt = """
Analyze this video of an antique display. For each unique item where a price or label is visible,
list the item and its associated price or note. If no price is visible, state "Price not visible".
Provide the output as a numbered list.
"""

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=types.Content(
        parts=[
            types.Part(text=prompt),
            types.Part(
                file_data=types.FileData(file_uri='https://www.youtube.com/watch?v=0DGGwo_qe4A')
            )
        ]
    )
)

Markdown(response.text)

Here is a numbered list of unique items from the video with visible prices or notes:

1.  **Black apothecary chest with brass drawer pulls:** Price: $55 (visible at 2:28)
2.  **Magnolia Home green canister:** Price: $4.99 (visible at 3:57)
3.  **White floral embossed tall dresser with gold knobs:** Price: $195 (visible at 5:39)
4.  **Silver ornate oval mirror:** Price: $70 (visible at 5:17)
5.  **White floral embossed three-drawer dresser with dark metal pulls:** Price: $175 (visible at 5:43)
6.  **Black and brown secretary desk with gold hardware:** Price: $225 (visible at 6:54)
7.  **Tall natural wood candle holder:** Price: $25 (visible at 7:23)
8.  **Shorter natural wood candle holder:** Price: $20 (visible at 7:23)
9.  **Black round corner cabinet with marble top:** Price: $150 (visible at 7:44)
10. **Strand of pearls:** Price: $15 (visible at 8:03)

The following items are visible but do not have an associated price tag shown in the video:

11. **Three ornate oval gold mirrors:** Price not visible
12. **Small rectangular wooden window-frame-like mirror:** Price not visible
13. **Green glass pedestal bowl:** Price not visible
14. **White display cabinet (far left):** Price not visible
15. **Ornate gold rectangular mirror (vertical):** Price not visible
16. **Ornate white rectangular mirror:** Price not visible
17. **Ornate gold rectangular mirror (horizontal):** Price not visible
18. **Vintage Hollywood Regency lamp (cherub base, green glass top):** Price not visible
19. **Vintage mister bottle:** Price not visible
20. **Gray coffee table with glass inserts:** Price not visible
21. **White display cabinet with blue and white porcelain:** Price not visible
22. **Small rectangular table with blue and white design drawers:** Price not visible
23. **White octagonal side table with gold floral detail:** Price not visible
24. **White corner shelf with assorted decor:** Price not visible
25. **Decorative signs on white shelf:** Price not visible
26. **Black ornate oval mirror:** Price not visible
27. **Wooden letter "A" (hollow):** Price not visible
28. **Wooden letter "A" (grooved):** Price not visible
29. **Large antique keys:** Price not visible
30. **Gold hand statue:** Price not visible
31. **White bust statue:** Price not visible
32. **Black bust statue:** Price not visible
33. **White wooden bench with heart cutouts and blue pillow:** Price not visible
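
For inventory or price-checking workflows, the numbered list needs to become records that code can work with. The sketch below parses the `N. **Item:** note` layout shown above into dictionaries. It assumes the model keeps that layout and that prices are written with a leading `$`; `parse_inventory` is a hypothetical helper name, and items without a dollar amount get a price of `None`.

```python
import re

# Matches lines such as "1.  **Silver ornate oval mirror:** Price: $70 (visible at 5:17)"
ITEM_PATTERN = re.compile(r"^\s*(\d+)\.\s+\*\*(.+?):\*\*\s*(.+)$")
PRICE_PATTERN = re.compile(r"\$(\d+(?:\.\d{2})?)")


def parse_inventory(markdown_text):
    """Return one record per numbered item, with a numeric price when present."""
    records = []
    for line in markdown_text.splitlines():
        match = ITEM_PATTERN.match(line)
        if match is None:
            continue  # Skip prose lines and unexpected formats.
        number, item, note = match.groups()
        price = PRICE_PATTERN.search(note)
        records.append({
            "number": int(number),
            "item": item.strip(),
            "price_usd": float(price.group(1)) if price else None,
        })
    return records


sample = """1.  **Black apothecary chest with brass drawer pulls:** Price: $55 (visible at 2:28)
2.  **Magnolia Home green canister:** Price: $4.99 (visible at 3:57)
11. **Three ornate oval gold mirrors:** Price not visible"""

for record in parse_inventory(sample):
    print(record)
```

Parsing free-form text this way is brittle if the model changes its layout between runs, so treat unmatched lines as a signal to inspect the raw response.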