# Disclaimer & Copyright

Copyright 2024 Forusone : shins777@gmail.com

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

# Gemini - YouTube video analysis
* This notebook shows how to use Gemini's multimodal capabilities to understand video content, using the analysis of a YouTube video as the example.
* The YouTube video used in this demo is "https://www.youtube.com/watch?v=nXVvvRhiGjI". It is used here for educational purposes only.
* The video owner is Google. Please contact them for permission if you want to use it.
* Do not use this YouTube video for any other purpose.
* Refer to the link below for more information about Gemini.
 * ***https://cloud.google.com/vertex-ai/generative-ai/docs/learn/overview***

# Configuration
## Install Python packages
* Vertex AI SDK for Python
  * https://cloud.google.com/python/docs/reference/aiplatform/latest
* Vertex AI initialization : aiplatform.init(..)
  * https://cloud.google.com/python/docs/reference/aiplatform/latest#initialization
* Install pytube to download the YouTube video
  * https://github.com/pytube/pytube
  * pip install pytube

In [1]:
%pip install --upgrade --quiet google-cloud-aiplatform

In [2]:
%pip install -q -U pytube

In [3]:
from IPython.display import display, Markdown

## Authenticate to access GCP and Google Drive

* Use OAuth to access the GCP environment.
 * For the authentication methods available in GCP, see: https://cloud.google.com/docs/authentication?hl=ko

In [4]:
# Colab only: authenticate the user to get access to GCP.
import sys

if "google.colab" in sys.modules:
    from google.colab import auth
    auth.authenticate_user()

* Mount Google Drive to access the .ipynb files in the repository.



In [5]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Download the YouTube video to analyze
* The YouTube video used in this demo is "https://www.youtube.com/watch?v=nXVvvRhiGjI". It is used here for educational purposes only.
* The video owner is Google. Please contact them for permission if you want to use it.

In [6]:
from pytube import YouTube

# Download the YouTube video to the local Colab file system.
YouTube('https://www.youtube.com/watch?v=nXVvvRhiGjI').streams.first().download()


'/content/Project Astra Our vision for the future of AI assistants.mp4'

In [7]:
import moviepy.editor
moviepy.editor.ipython_display("Project Astra Our vision for the future of AI assistants.mp4", maxduration = 150)

Output hidden; open in https://colab.research.google.com to view.

# Execute the example
## Set up the GCP project environment
* Configure project information
  * Model name : name of the Gemini model to use : https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models
  * Project Id : project ID in GCP
  * Region : region name in GCP

In [8]:
MODEL_NAME="gemini-1.5-flash"
PROJECT_ID="ai-hangsik"
REGION="asia-northeast3"
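The three settings above are hard-coded in the cell. As a minimal sketch, assuming you would rather keep them out of the notebook, they could be read from environment variables instead. The function `load_settings` and the variable names `GCP_PROJECT_ID`, `GCP_REGION`, and `GEMINI_MODEL_NAME` are hypothetical names chosen for this illustration; they are not defined by Vertex AI or by this notebook.

```python
import os


def load_settings(env=os.environ):
    """Read model, project, and region settings from a mapping of environment variables.

    All variable names here are hypothetical; adapt them to your own setup.
    """
    project_id = env.get("GCP_PROJECT_ID")
    if not project_id:
        # The project ID has no sensible default, so fail early with a clear message.
        raise ValueError("Set GCP_PROJECT_ID to your Google Cloud project ID.")
    return {
        # Fall back to the values used in this notebook when a variable is unset.
        "model_name": env.get("GEMINI_MODEL_NAME", "gemini-1.5-flash"),
        "project_id": project_id,
        "region": env.get("GCP_REGION", "asia-northeast3"),
    }


# Example with an explicit mapping instead of the real environment.
settings = load_settings({"GCP_PROJECT_ID": "my-project"})
print(settings)
```

The returned values can then be assigned to `MODEL_NAME`, `PROJECT_ID`, and `REGION` before Vertex AI is initialized.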

### Vertex AI initialization
Initialize Vertex AI and create a handle to the foundation model.

In [9]:
import vertexai
from vertexai.preview.generative_models import GenerativeModel, Part
import vertexai.preview.generative_models as generative_models

# Initialize the Vertex AI execution environment for this project and region.
vertexai.init(project=PROJECT_ID, location=REGION)

# Create a handle to the generative model.
model = GenerativeModel(MODEL_NAME)

### Encoding function for multimodality

In [10]:
def get_encoded_content(location_type, location, mime_type):
  """
  Build the content Part that is sent to the model.

  location_type :
    The type of the location ("local" or "GCS").
  location :
    The file path (local) or gs:// URI (GCS) of the content.
  mime_type :
    The MIME type of the content, e.g. "video/mp4".

  Returns:
    The content as a Part object.

  """

  if location_type == "local":
    # Part.from_data takes raw bytes, so no base64 round trip is needed.
    with open(location, 'rb') as f:
      content_obj = Part.from_data(data=f.read(), mime_type=mime_type)

  elif location_type == "GCS":
    content_obj = Part.from_uri(location, mime_type=mime_type)
  else:
    raise ValueError("Invalid location type.")

  return content_obj
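
`get_encoded_content` expects the caller to pass both the location type and the MIME type. As a small sketch, assuming the file extension is a reliable hint, both values could be inferred from the location string with the standard library. The helper name `infer_content_args` and the fallback MIME type are hypothetical choices made for this illustration.

```python
import mimetypes


def infer_content_args(location, default_mime="application/octet-stream"):
    """Return (location_type, mime_type) for a local path or a gs:// URI."""
    # Treat anything that starts with gs:// as a Cloud Storage object.
    location_type = "GCS" if location.startswith("gs://") else "local"
    # guess_type looks only at the file extension; it does not open the file.
    mime_type, _ = mimetypes.guess_type(location)
    return location_type, mime_type or default_mime


print(infer_content_args("./demo.mp4"))
print(infer_content_args("gs://bucket_name/demo.mp4"))
```

The resulting pair can be passed straight to `get_encoded_content` together with the location itself.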

### Function to get the response

In [11]:
def generate(content_obj, query:str):
    """
    Generate a response from the model.

    content_obj :
      encoded object being analyzed in the process
    query :
      query to be sent to the model

    Returns:
      The generated response.

    """

    # Set model parameters : https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/send-multimodal-prompts#set_model_parameters
    generation_config = {
        "max_output_tokens": 8192,
        "temperature": 1,
        "top_p": 0.95,
    }

    # Configure safety settings : https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/configure-safety-attributes
    # To remove automated response blocking, refer to : https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/configure-safety-attributes#how_to_remove_automated_response_blocking_for_select_safety_attributes
    safety_settings = {
        generative_models.HarmCategory.HARM_CATEGORY_HATE_SPEECH: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        generative_models.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        generative_models.HarmCategory.HARM_CATEGORY_HARASSMENT: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    }

    # With stream=False the call returns a single response object.
    response = model.generate_content(
        [content_obj, query],
        generation_config=generation_config,
        safety_settings=safety_settings,
        stream=False,
    )

    return response.text
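
The `temperature` and `top_p` values in `generation_config` shape how the next token is sampled. As a rough illustration only, the sketch below applies temperature scaling and top-p (nucleus) filtering to a toy list of logits in plain Python. The function name `nucleus_filter` and the example logits are hypothetical, and this is not a description of how Vertex AI implements sampling internally.

```python
import math


def nucleus_filter(logits, temperature=1.0, top_p=0.95):
    """Return {token_index: probability} for the tokens kept by top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")

    # Temperature scaling: lower values sharpen the distribution, higher values flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the most likely tokens until their cumulative probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1.
    return {i: probs[i] / cumulative for i in kept}


kept = nucleus_filter([2.0, 1.0, 0.0], temperature=1.0, top_p=0.9)
print(sorted(kept))
```

With a smaller `top_p`, fewer candidate tokens survive the filter, which generally makes the output less varied.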

## Run example

In [12]:

from time import perf_counter

t1_start = perf_counter()

# When using local storage for the file location.
location_type = "local"
mime_type = "video/mp4"

repository_root = "."
file_path = "/Project Astra Our vision for the future of AI assistants.mp4"
location = repository_root + file_path

content_obj = get_encoded_content(location_type, location, mime_type )

prompt = """
You are a helpful assistant that finds the factual information in the video.
Follow the instructions below to generate a response.

1. Transcribe this video in English and summarize it in Korean.
2. What was the object beside the glasses on the desk?
3. What was the colour of the speaker?
4. What kind of system architecture was drawn on the whiteboard?

"""

outcome = generate(content_obj, prompt)

t1_end  = perf_counter()
print(f"Time : {t1_end - t1_start} seconds\n\n")

display(Markdown(outcome))


# When using Google Cloud Storage for the location type.

# location_type = "GCS"
# mime_type = "video/mp4"
# repository_root = "gs://bucket_name"
# file_path = "/Project Astra Our vision for the future of AI assistants.mp4"
# location = repository_root + file_path

Time : 29.535318595000035 seconds




Okay, let’s do some tests. Tell me when you see something that makes sound. I see a speaker which makes sound. What is that part of the speaker called? That is the tweeter. It produces high-frequency sounds. Give me a creative alliteration about these. Creative crayons colour cheerfully. They certainly craft colorful creations. What does that part of the code do? This code defines encryption and decryption functions. It seems to use AES-CBC encryption to encode and decode data based on a key and an initialization vector, IV. That’s right. What neighborhood do you think I’m in? This appears to be the King’s Cross area of London. It is known for its railway station and transportation connections. Do you remember where you saw my glasses? Yes, I do. Your glasses were on the desk near a red apple. Do you remember where you saw my glasses? Yes, I do. Your glasses were on the desk near a red apple. All right, give me a band name for this duo. Golden Stripes. Nice. Thanks Gemini.


## Korean Summary

이 영상은 구글 딥마인드의 AI 어시스턴트 미래 비전인 프로젝트 아스트라의 데모 영상입니다.  영상 속 사람은 AI 어시스턴트와 여러 가지 대화를 나눕니다. 스피커의 부품 이름을 묻거나, 물건에 대한 재미있는 말을 만들어달라고 요청합니다. 또한, 코드의 역할을 설명해달라고 하거나, 현재 위치를 추측해달라고 하기도 합니다. 마지막으로,  강아지와 호랑이 인형을 보고 듀오 밴드 이름을 지어달라고 요청합니다. AI 어시스턴트는 이러한 모든 질문에 명확하고 정확하게 답변하며, 사용자와 자연스럽게 대화를 나눕니다. 

## Answers to the questions

1. The object beside the glasses was a red apple.
2. The speaker was white.
3. The system architecture was client-NLB-server-DB.