In [66]:
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

## Uncovering the YouTube Moments that matter

| | |
|-|-|
| Author(s) | [Laxmi Harikumar](https://github.com/laxmi-genai) |

## Overview

[Gemini 2.0 Flash-Lite](https://cloud.google.com/vertex-ai/generative-ai/docs/gemini-v2#2.0-flash-lite) is a new multimodal generative AI model from the Gemini family developed by [Google DeepMind](https://deepmind.google/). It is Google's fastest and most cost-efficient Flash model, and it offers an upgrade path for 1.5 Flash users who want better quality for the same price and speed.

Gemini 2.0 Flash-Lite includes:

- Multimodal input, text output
- 1M token input context window
- 8k token output context window

### Pricing
Information on the pricing for Gemini 2.0 Flash-Lite is available on our [Pricing page](https://cloud.google.com/vertex-ai/generative-ai/pricing).


## Objective

In this notebook, you'll explore how to analyze publicly available [YouTube](https://www.youtube.com/) videos directly with Gemini 2.0 Flash-Lite, using the Google Gen AI SDK.

You will complete the following tasks:
- Summarize a single YouTube video using Gemini 2.0 Flash-Lite.
- Identify the commercial entities appearing in the video and record their timestamps.

## Getting Started

### Install Google Gen AI SDK for Python

In [None]:
!pip install --upgrade --quiet google-genai

### Restart runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.

The restart might take a minute or longer. After it's restarted, continue to the next step.

In [None]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️</b>
</div>

### Authenticate your notebook environment (Colab only)
If you are running this notebook on Google Colab, run the cell below to authenticate your environment.

In [None]:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

### Connect with GenAI SDK

This notebook shows how to use the Google Gen AI SDK with Vertex AI to build enterprise-ready projects on Google Cloud.

### Import libraries

In [None]:
import os

from IPython.display import HTML, Markdown, display
from google import genai
from google.genai.types import (
    GenerateContentConfig,
    Part,
    SafetySetting,
)

### Set Google Cloud project information

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [67]:
PROJECT_ID = ""  # @param {type: "string", placeholder: "[your-project-id]", isTemplate: true}
if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    PROJECT_ID = str(os.environ.get("GOOGLE_CLOUD_PROJECT"))

LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")
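
Note that the cell above wraps `os.environ.get(...)` in `str()`, so an unset `GOOGLE_CLOUD_PROJECT` produces the literal string `"None"` rather than an error. The following is a minimal sketch of a stricter lookup that fails early instead. `resolve_project_id` is a hypothetical helper introduced here for illustration; it is not part of the Gen AI SDK.

```python
import os


def resolve_project_id(explicit: str = "", env_var: str = "GOOGLE_CLOUD_PROJECT") -> str:
    """Return the explicit project ID, else the environment value, else raise."""
    placeholder = "[your-project-id]"
    if explicit and explicit != placeholder:
        return explicit
    # Fall back to the environment variable, treating blank values as unset.
    from_env = os.environ.get(env_var, "").strip()
    if not from_env:
        raise ValueError(
            f"No project ID given and {env_var} is not set; "
            "set PROJECT_ID before creating the client."
        )
    return from_env
```

With this helper, `PROJECT_ID = resolve_project_id(PROJECT_ID)` would either yield a usable ID or stop the notebook with a clear message before any request is sent.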

In [68]:
client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)

## Use the Gemini 2.0 Flash-Lite model

In [69]:
MODEL_ID = "gemini-2.0-flash-lite-preview-02-05"  # @param {type: "string"}

## Safety Filters

The Gemini API provides safety filters that you can adjust across multiple filter categories to restrict or allow certain types of content. You can use these filters to adjust what's appropriate for your use case. See the Configure safety filters page for details.

When you make a request to Gemini, the content is analyzed and assigned a safety rating. You can inspect the safety ratings of the generated content by printing out the model responses.

In [None]:
safety_settings = [
    SafetySetting(
        category="HARM_CATEGORY_DANGEROUS_CONTENT",
        threshold="BLOCK_LOW_AND_ABOVE",
    ),
    SafetySetting(
        category="HARM_CATEGORY_HARASSMENT",
        threshold="BLOCK_LOW_AND_ABOVE",
    ),
    SafetySetting(
        category="HARM_CATEGORY_HATE_SPEECH",
        threshold="BLOCK_LOW_AND_ABOVE",
    ),
    SafetySetting(
        category="HARM_CATEGORY_SEXUALLY_EXPLICIT",
        threshold="BLOCK_LOW_AND_ABOVE",
    ),
]
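
To inspect the safety ratings mentioned above, one option is a small formatter over the response object. This is a sketch under assumptions: it assumes the response exposes `candidates`, each with a `safety_ratings` list whose items carry `category`, `probability`, and `blocked` attributes, which matches my understanding of the SDK's response types but should be checked against the version you have installed. `summarize_safety_ratings` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in used only so the example runs without calling the API.

```python
from types import SimpleNamespace


def summarize_safety_ratings(response) -> list[str]:
    """Return one "category: probability" line per safety rating in the response."""
    lines = []
    for candidate in getattr(response, "candidates", None) or []:
        for rating in getattr(candidate, "safety_ratings", None) or []:
            blocked = " (blocked)" if getattr(rating, "blocked", False) else ""
            lines.append(f"{rating.category}: {rating.probability}{blocked}")
    return lines


# Stand-in object for illustration only; a real call would pass the value
# returned by client.models.generate_content(...).
fake_response = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            safety_ratings=[
                SimpleNamespace(
                    category="HARM_CATEGORY_HARASSMENT",
                    probability="NEGLIGIBLE",
                    blocked=None,
                ),
            ]
        )
    ]
)
print("\n".join(summarize_safety_ratings(fake_response)))
```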

## Configure model parameters

You can include parameter values in each call that you send to a model to control how the model generates a response. The model can generate different results for different parameter values. You can experiment with different model parameters to see how the results change.

- Learn more about [experimenting with parameter values](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/adjust-parameter-values).

- See a list of all [Gemini API parameters](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#parameters).

In [None]:
generate_content_config = GenerateContentConfig(
    temperature=0.3,
    top_p=0.95,
    top_k=20,
    max_output_tokens=4096,
    safety_settings=safety_settings,
)

## Load the YouTube Video URL

In [70]:
# Provide link to a public YouTube video
YOUTUBE_VIDEO_URL = "https://www.youtube.com/watch?v=8peKcSEDFB4"  # @param {type:"string"}


youtube_video_embed_url = YOUTUBE_VIDEO_URL.replace("/watch?v=", "/embed/")

# Create HTML code to directly embed video
youtube_video_embed_html_code = f"""
<iframe width="560" height="315" src="{youtube_video_embed_url}"
title="YouTube video player" frameborder="0" allow="accelerometer; autoplay;
clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen>
</iframe>
"""

# Display embedded YouTube video
display(HTML(youtube_video_embed_html_code))
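
The string replacement above works for a plain `watch?v=` URL, but it would leave extra query parameters (such as `&t=10s`) in the embed path and would not handle `youtu.be` short links. A minimal sketch of a more tolerant conversion, using only the standard library, might look like this. `to_embed_url` is a hypothetical helper name, and the set of URL shapes it handles is an assumption rather than an exhaustive list.

```python
from urllib.parse import parse_qs, urlparse


def to_embed_url(video_url: str) -> str:
    """Build a YouTube embed URL from a watch URL or a youtu.be short link."""
    parsed = urlparse(video_url)
    host = parsed.netloc.lower()
    if host.endswith("youtu.be"):
        # Short links carry the video ID in the path.
        video_id = parsed.path.lstrip("/")
    else:
        # Watch URLs carry the video ID in the "v" query parameter.
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    if not video_id:
        raise ValueError(f"Could not find a video ID in {video_url!r}")
    return f"https://www.youtube.com/embed/{video_id}"


print(to_embed_url("https://www.youtube.com/watch?v=8peKcSEDFB4&t=10s"))
```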

## Summarize the YouTube video

Ask the model for a short summary of what happens in the video.

In [71]:
# Call Gemini API with prompt to summarize the video

prompt = """
Summarize the video
"""

# Reference the public YouTube video by URL
yt_video = Part.from_uri(
    file_uri=YOUTUBE_VIDEO_URL,
    mime_type="video/mp4",
)

response = client.models.generate_content(
    model=MODEL_ID,
    config=generate_content_config,
    contents=[
        yt_video,
        prompt,
    ],
)

display(Markdown(response.text))

Sure, here's a summary of the video:

The video is a commercial for Hyundai. It begins with a man looking at a sign that says "2022 World Car of the Year, Search Hyundai". He then uses voice search on his phone to search for "Hyundai". The video then shows a series of people mispronouncing the name "Hyundai" in different ways, and then the correct pronunciation is revealed. The video ends with a shot of a Hyundai car.

## Identify the commercial entities

Identify all the commercial entities appearing in the video, along with their timestamps.

In [88]:
# Call Gemini API with prompt to identify the commercial entities appearing and record their timestamps.

prompt = """
Analyze the video and:
1. List all unique commercial entities present.
2. Generate a markdown table "TimeOnScreen" with columns "Entity," "TimeOnScreenStart" (seconds),
and "TimeOnScreenEnd" (seconds). Each entity appearance should have its own row.
3. Timestamp Justification:  For each timestamp in the "TimeOnScreen" table,
briefly explain your reasoning.  For example, "TimeOnScreenStart: 15 seconds
- Logo appears in the top right corner."
"""

yt_video = Part.from_uri(
    file_uri=YOUTUBE_VIDEO_URL,
    mime_type="video/mp4",
)

response = client.models.generate_content(
    model=MODEL_ID,
    config=generate_content_config,
    contents=[
        yt_video,
        prompt,
    ],
)

display(Markdown(response.text))

Here's the analysis of the video:

**1. Unique Commercial Entities:**

*   Hyundai
*   High 'n' Dye
*   Hawaiian Tie
*   High End Pie
*   Highland Eye

**2. TimeOnScreen Table:**

| Entity          | TimeOnScreenStart (seconds) | TimeOnScreenEnd (seconds) |
|-----------------|---------------------------|-------------------------|
| Hyundai         | 0                         | 1                       |
| Hyundai         | 1                         | 1                       |
| High 'n' Dye   | 5                         | 8                       |
| Hyundai         | 24                        | 29                      |
| Hawaiian Tie    | 10                        | 11                      |
| High End Pie    | 15                        | 16                      |
| Highland Eye    | 19                        | 20                      |
| Hyundai         | 27                        | 29                      |

**3. Timestamp Justification:**

*   **Hyundai, 0-1 seconds:** The Hyundai logo appears on the advertisement.
*   **Hyundai, 1-1 seconds:** The Hyundai logo appears on the advertisement.
*   **High 'n' Dye, 5-8 seconds:** The storefront of "High 'n' Dye" is shown.
*   **Hyundai, 24-29 seconds:** The Hyundai car is shown, and the Hyundai logo appears on the screen.
*   **Hawaiian Tie, 10-11 seconds:** The storefront of "Hawaiian Tie" is shown.
*   **High End Pie, 15-16 seconds:** The food truck "High End Pie" is shown.
*   **Highland Eye, 19-20 seconds:** The storefront of "Highland Eye" is shown.
*   **Hyundai, 27-29 seconds:** The Hyundai logo appears on the screen.
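
The rows in the table above are not in chronological order, so for downstream use it can help to parse the markdown table and sort it by start time. The following is a minimal sketch, assuming the model returns a three-column pipe table with integer seconds as requested in the prompt; model output can vary between runs, so this parser is illustrative rather than robust. `parse_time_on_screen` is a hypothetical helper, and `sample` is a shortened copy of the table layout shown above.

```python
def parse_time_on_screen(markdown_text: str) -> list[dict]:
    """Extract entity/start/end rows from three-column markdown table lines."""
    rows = []
    for line in markdown_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 3:
            continue
        entity, start, end = cells
        # Skip the header and the |---|---| separator: their time cells are not integers.
        if not (start.isdigit() and end.isdigit()):
            continue
        rows.append({"entity": entity, "start": int(start), "end": int(end)})
    # Sort chronologically by start time, then by end time.
    return sorted(rows, key=lambda row: (row["start"], row["end"]))


sample = """
| Entity | TimeOnScreenStart (seconds) | TimeOnScreenEnd (seconds) |
|--------|-----------------------------|---------------------------|
| Hyundai | 24 | 29 |
| High 'n' Dye | 5 | 8 |
| Hyundai | 0 | 1 |
"""

for row in parse_time_on_screen(sample):
    print(f'{row["start"]:>3}s to {row["end"]:>3}s  {row["entity"]}')
```

In the notebook, the same function could be applied to `response.text` in place of `sample`.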