# Gemini API: Prompting with Video

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Video.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

This notebook provides a quick example of how to prompt Gemini 1.5 Pro using a video file. In this case, you'll use a short clip of [Big Buck Bunny](https://peach.blender.org/about/).

In [1]:
!pip install h5py
!pip install typing-extensions
!pip install wheel
!pip install -U -q google-generativeai
!pip install -qq pyannote.audio==3.1.1
!pip install -qq ipython==7.34.0
!wget -q "https://github.com/pyannote/pyannote-audio/raw/develop/tutorials/assets/sample.wav"
!wget -q "https://github.com/pyannote/pyannote-audio/raw/develop/tutorials/assets/sample.rttm"
!wget -q -P ./assets/ "https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/assets/download-model.png"
!wget -q -P ./assets/ "https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/assets/download-pipeline.png"


Collecting h5py
  Downloading h5py-3.11.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.3 MB)
Collecting numpy>=1.17.3
  Using cached numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
Installing collected packages: numpy, h5py
Successfully installed h5py-3.11.0 numpy-1.26.4
Collecting wheel
  Downloading wheel-0.43.0-py3-none-any.whl (65 kB)
Installing collected packages: wheel
Successfully installed wheel-0.43.0


In [4]:
import google.generativeai as genai

### Authentication Overview

**Important:** The File API uses API keys for authentication and access. Uploaded files are associated with the API key's cloud project. Unlike other Gemini APIs that use API keys, your API key also grants access to data you've uploaded to the File API, so take extra care to keep your API key secure. For best practices on securing API keys, refer to Google's [documentation](https://support.google.com/googleapi/answer/6310037).

### Set up your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example.

In [6]:
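# Read the key from the Colab Secret described above instead of hard-coding it.
from google.colab import userdata
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')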
genai.configure(api_key=GOOGLE_API_KEY)

## Extract frames

The Gemini API currently does not support video files directly. Instead, you can provide a series of timestamps and image files.

We will extract 1 frame per second from the short film "Big Buck Bunny" using [OpenCV](https://docs.opencv.org/4.x/d6/d00/tutorial_py_root.html).

> "Big Buck Bunny" is (c) copyright 2008, Blender Foundation / www.bigbuckbunny.org and [licensed](https://peach.blender.org/about/) under the [Creative Commons Attribution 3.0](http://creativecommons.org/licenses/by/3.0/) License.

Note: You can also [upload your own files](https://github.com/google-gemini/cookbook/tree/main/examples/Upload_files.ipynb) to use.

In [7]:
video_file_name = "DG Check-in-20230705_084639-Meeting Recording (online-video-cutter.com) (1).mp4"

Use OpenCV to extract image frames from the video at 1 frame per second.

In [10]:
import cv2
import os
import shutil

# Create or cleanup existing extracted image frames directory.
FRAME_EXTRACTION_DIRECTORY = "frames"
FRAME_PREFIX = "_frame"
def create_frame_output_dir(output_dir):
  if not os.path.exists(output_dir):
    os.makedirs(output_dir)
  else:
    shutil.rmtree(output_dir)
    os.makedirs(output_dir)

def extract_frame_from_video(video_file_path):
  print(f"Extracting {video_file_path} at 1 frame per second. This might take a bit...")
  create_frame_output_dir(FRAME_EXTRACTION_DIRECTORY)
  vidcap = cv2.VideoCapture(video_file_path)
  fps = vidcap.get(cv2.CAP_PROP_FPS)
  frame_duration = 1 / fps  # Time interval between frames (in seconds)
  output_file_prefix = os.path.basename(video_file_path).replace('.', '_')
  frame_count = 0
  count = 0
  while vidcap.isOpened():
      success, frame = vidcap.read()
      if not success: # End of video
          break
      if int(count / fps) == frame_count: # Extract a frame every second
          # Avoid shadowing the built-in min() with a local variable.
          minutes = frame_count // 60
          seconds = frame_count % 60
          time_string = f"{minutes:02d}:{seconds:02d}"
          image_name = f"{output_file_prefix}{FRAME_PREFIX}{time_string}.jpg"
          output_filename = os.path.join(FRAME_EXTRACTION_DIRECTORY, image_name)
          cv2.imwrite(output_filename, frame)
          frame_count += 1
      count += 1
  vidcap.release() # Release the capture object
  print(f"Completed video frame extraction!\n\nExtracted: {frame_count} frames")

extract_frame_from_video(video_file_name)

Extracting DG Check-in-20230705_084639-Meeting Recording (online-video-cutter.com) (1).mp4 at 1 frame per second. This might take a bit...
Completed video frame extraction!

Extracted: 271 frames
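
The selection rule used in `extract_frame_from_video` can be checked without a video file. The sketch below is only an illustration: `select_frames` is a hypothetical helper (not part of the notebook or of OpenCV) that replays the same `int(count / fps)` test over a range of frame indices and returns which frames would be kept, along with their `MM:SS` labels.

```python
def select_frames(total_frames, fps):
    """Return (frame_index, "MM:SS") for each frame kept at 1 frame per second.

    Hypothetical helper that mirrors the test used in the extraction loop:
    a frame is kept when int(index / fps) equals the next second to capture.
    """
    kept = []
    next_second = 0
    for index in range(total_frames):
        if int(index / fps) == next_second:
            minutes, seconds = divmod(next_second, 60)
            kept.append((index, f"{minutes:02d}:{seconds:02d}"))
            next_second += 1
    return kept


# Three seconds of 30 fps video keeps frames 0, 30 and 60.
print(select_frames(90, 30.0))
# [(0, '00:00'), (30, '00:01'), (60, '00:02')]
```

The same rule also works for non-integer frame rates such as 29.97, because the comparison is made on whole seconds rather than on a fixed frame stride.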


## Upload frames using the File API

Once we have the frames extracted, we are ready to upload the frames to the API.

The File API accepts files under 2GB in size and can store up to 20GB of files per project. Files last for 2 days and cannot be downloaded from the API.

As written, the code below uploads every extracted frame. To make the example run quickly, set `full_video` to `False` so that only a 10-frame slice is uploaded.
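
The size limits above can be checked locally before uploading anything. The following is a minimal sketch, not part of the File API: `split_by_upload_limits` and both constants are hypothetical names, and it assumes "GB" means 1024³ bytes, which the text does not specify.

```python
# Assumed interpretation of the limits quoted above (GB taken as 1024**3 bytes).
MAX_FILE_BYTES = 2 * 1024**3       # each file must be under 2 GB
MAX_PROJECT_BYTES = 20 * 1024**3   # a project can store up to 20 GB


def split_by_upload_limits(files, already_stored=0):
    """Split (name, size_bytes) pairs into names that fit and names that do not.

    Hypothetical helper: a file is rejected if it is not under the per-file
    limit or if accepting it would push the running total past the project
    limit.
    """
    accepted, rejected = [], []
    total = already_stored
    for name, size in files:
        if size >= MAX_FILE_BYTES or total + size > MAX_PROJECT_BYTES:
            rejected.append(name)
        else:
            accepted.append(name)
            total += size
    return accepted, rejected


candidates = [("a.jpg", 1_000), ("big.mp4", 2 * 1024**3), ("b.jpg", 2_000)]
print(split_by_upload_limits(candidates))
# (['a.jpg', 'b.jpg'], ['big.mp4'])
```

For real frames, the sizes could come from `os.path.getsize` on each path in the frames directory.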

In [11]:
import os

class File:
  def __init__(self, file_path: str, display_name: str = None):
    self.file_path = file_path
    # Always set the attribute so later reads never raise AttributeError.
    self.display_name = display_name
    self.timestamp = get_timestamp(file_path)

  def set_file_response(self, response):
    self.response = response

def get_timestamp(filename):
  """Extracts the timestamp (as an 'MM:SS' string) from a filename with the
     format 'output_file_prefix_frame00:00.jpg'.
  """
  parts = filename.split(FRAME_PREFIX)
  if len(parts) != 2:
      return None  # Indicates the filename might be incorrectly formatted
  return parts[1].split('.')[0]

# Process each frame in the output directory
files = os.listdir(FRAME_EXTRACTION_DIRECTORY)
files = sorted(files)
files_to_upload = []
for file in files:
  files_to_upload.append(
      File(file_path=os.path.join(FRAME_EXTRACTION_DIRECTORY, file)))

# Upload the files to the API
# With full_video = True, every extracted frame is uploaded.
# Set full_video to False to upload only a 10-second slice and reduce upload time.
full_video = True

uploaded_files = []
print(f'Uploading {len(files_to_upload) if full_video else 10} files. This might take a bit...')

for file in files_to_upload if full_video else files_to_upload[40:50]:
  print(f'Uploading: {file.file_path}...')
  response = genai.upload_file(path=file.file_path)
  file.set_file_response(response)
  uploaded_files.append(file)

print(f"Completed file uploads!\n\nUploaded: {len(uploaded_files)} files")

Uploading 271 files. This might take a bit...
Uploading: frames/DG Check-in-20230705_084639-Meeting Recording (online-video-cutter_com) (1)_mp4_frame00:00.jpg...
Uploading: frames/DG Check-in-20230705_084639-Meeting Recording (online-video-cutter_com) (1)_mp4_frame00:01.jpg...
Uploading: frames/DG Check-in-20230705_084639-Meeting Recording (online-video-cutter_com) (1)_mp4_frame00:02.jpg...
Uploading: frames/DG Check-in-20230705_084639-Meeting Recording (online-video-cutter_com) (1)_mp4_frame00:03.jpg...
Uploading: frames/DG Check-in-20230705_084639-Meeting Recording (online-video-cutter_com) (1)_mp4_frame00:04.jpg...
Uploading: frames/DG Check-in-20230705_084639-Meeting Recording (online-video-cutter_com) (1)_mp4_frame00:05.jpg...
Uploading: frames/DG Check-in-20230705_084639-Meeting Recording (online-video-cutter_com) (1)_mp4_frame00:06.jpg...
Uploading: frames/DG Check-in-20230705_084639-Meeting Recording (online-video-cutter_com) (1)_mp4_frame00:07.jpg...
Uploading: frames/DG Check

## List Files

After uploading the files, you can verify that the API has received them by calling `files.list`.

`files.list` lets you see all files that have been uploaded to the File API that are associated with the Cloud project your API key belongs to. Only the `name` (and by extension, the `uri`) are unique.

In [12]:
# List files uploaded in the API
for n, f in zip(range(len(uploaded_files)), genai.list_files()):
  print(f.uri)

https://generativelanguage.googleapis.com/v1beta/files/q6lm3az0psvk
https://generativelanguage.googleapis.com/v1beta/files/6h08tphddk39
https://generativelanguage.googleapis.com/v1beta/files/jxiho2av7dln
https://generativelanguage.googleapis.com/v1beta/files/3b41m5ceoyf3
https://generativelanguage.googleapis.com/v1beta/files/s0qags8me65n
https://generativelanguage.googleapis.com/v1beta/files/b8jn9j19ff5
https://generativelanguage.googleapis.com/v1beta/files/ymbkhsaa1md3
https://generativelanguage.googleapis.com/v1beta/files/glipci61f84o
https://generativelanguage.googleapis.com/v1beta/files/u2nnot9yiqwi
https://generativelanguage.googleapis.com/v1beta/files/fl4j57a6kyx4
https://generativelanguage.googleapis.com/v1beta/files/wp9vpdc90i75
https://generativelanguage.googleapis.com/v1beta/files/5pqr0dg1egoc
https://generativelanguage.googleapis.com/v1beta/files/dr3dmr455yn2
https://generativelanguage.googleapis.com/v1beta/files/2qdj8lae771v
https://generativelanguage.googleapis.com/v1beta/

## Generate Content

After the file has been uploaded, you can make `GenerateContent` requests that reference the File API URI.

To understand videos with Gemini 1.5 Pro, provide 2 consecutive `Part`s for each frame: a `text` part with the **timestamp** and a `fileData` part with the frame's **image URI**:

```
part { text = "00:00" }
part { fileData = fileData {
  fileUri = "https://generativelanguage.googleapis.com/v1/files/frame-0"
  mimeType = "image/jpeg"
}}
```
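
To make the interleaving concrete, here is a minimal sketch that builds the same structure as plain Python dictionaries. `build_parts` is a hypothetical helper, the second URI is a made-up placeholder, and the result is only printed here, not sent to the API.

```python
def build_parts(prompt, frames, mime_type="image/jpeg"):
    """Return a list of parts: the prompt, then a timestamp and image per frame.

    Hypothetical helper. `frames` is an iterable of (timestamp, file_uri)
    pairs, and the keys mirror the fileData / fileUri / mimeType fields
    shown above.
    """
    parts = [{"text": prompt}]
    for timestamp, file_uri in frames:
        parts.append({"text": timestamp})
        parts.append({"fileData": {"fileUri": file_uri, "mimeType": mime_type}})
    return parts


# Placeholder URIs for illustration only.
frames = [
    ("00:00", "https://generativelanguage.googleapis.com/v1/files/frame-0"),
    ("00:01", "https://generativelanguage.googleapis.com/v1/files/frame-1"),
]
for part in build_parts("Describe this video.", frames):
    print(part)
```

Keeping each timestamp immediately before its image is what lets the model associate a frame with a point in time.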

In [16]:
ROOT_DIR = "<path-to-pyannote-github-repo>/pyannote-audio"
AUDIO_FILE = "DG Check-in-20230705_084639-Meeting Recording (online-video-cutter.com) (1).wav"

In [18]:
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="hugging_face_kry")

# send pipeline to GPU (when available)
import torch
pipeline.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))

# apply pretrained pipeline
diarization = pipeline(AUDIO_FILE)

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...

start=0.4s stop=2.1s speaker_SPEAKER_02
start=6.3s stop=8.5s speaker_SPEAKER_02
start=9.2s stop=9.9s speaker_SPEAKER_02
start=11.5s stop=12.6s speaker_SPEAKER_02
start=11.9s stop=12.4s speaker_SPEAKER_01
start=13.4s stop=13.4s speaker_SPEAKER_01
start=13.4s stop=15.2s speaker_SPEAKER_02
start=14.0s stop=14.3s speaker_SPEAKER_01
start=16.3s stop=17.8s speaker_SPEAKER_02
start=18.6s stop=23.9s speaker_SPEAKER_01
start=19.6s stop=21.0s speaker_SPEAKER_02
start=24.7s stop=27.0s speaker_SPEAKER_01
start=27.6s stop=32.5s speaker_SPEAKER_01
start=34.0s stop=34.2s speaker_SPEAKER_01
start=34.7s stop=36.0s speaker_SPEAKER_01
start=37.8s stop=41.9s speaker_SPEAKER_01
start=42.4s stop=45.0s speaker_SPEAKER_01
start=43.0s stop=43.5s speaker_SPEAKER_02
start=46.3s stop=46.7s speaker_SPEAKER_03
start=47.9s stop=50.2s speaker_SPEAKER_03
start=50.9s stop=52.0s speaker_SPEAKER_03
start=55.2s stop=56.8s speaker_SPEAKER_02
start=57.9s stop=59.2s speaker_SPEAKER_02
start=59.3s stop=59.4s speaker_SPEAKER_0
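
The turns printed above can be totalled per speaker to get a rough measure of who speaks the most. This is a sketch under assumptions: `speaking_time` is a hypothetical helper that parses lines in the exact `start=… stop=… speaker_…` format printed by the cell above, and it counts overlapping turns independently for each speaker.

```python
import re
from collections import defaultdict

# Matches lines such as "start=0.4s stop=2.1s speaker_SPEAKER_02".
TURN_PATTERN = re.compile(r"start=([\d.]+)s stop=([\d.]+)s speaker_(\S+)")


def speaking_time(lines):
    """Sum the duration in seconds of every turn, grouped by speaker label.

    Hypothetical helper; lines that do not match the pattern are skipped.
    """
    totals = defaultdict(float)
    for line in lines:
        match = TURN_PATTERN.search(line)
        if match is None:
            continue
        start, stop, speaker = match.groups()
        totals[speaker] += float(stop) - float(start)
    return dict(totals)


sample = [
    "start=0.4s stop=2.1s speaker_SPEAKER_02",
    "start=6.3s stop=8.5s speaker_SPEAKER_02",
    "start=11.9s stop=12.4s speaker_SPEAKER_01",
]
for speaker, seconds in sorted(speaking_time(sample).items()):
    print(f"{speaker}: {seconds:.1f}s")
```

Totals like these could be compared against the model's own assessment of participation.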

In [20]:
audio_file = genai.upload_file(path='DG Check-in-20230705_084639-Meeting Recording (online-video-cutter.com) (1).mp3')

In [22]:
# Create the prompt.
prompt = "based on whose name is highlighted in the meeting (on right side of screen) at each point in the audio file, determine accurately who speaks a lot and who does not. name lighting up = speaking. also give feedback for this meeting. you have to give feedback to each member who spoke, level of participation, as well as overall meeting productivity. also note important things discussed, and possible future tasks"
#prompt = "What was the first thing said in the meeting?"
# Set the model to Gemini 1.5 Pro.
model = genai.GenerativeModel(model_name="models/gemini-1.5-pro-latest")

# Make GenerateContent request with the structure described above.
def make_request(prompt, files):
  request = [prompt]
  for file in files:
    request.append(file.timestamp)
    request.append(file.response)
  return request

# Make the LLM request.
request = make_request(prompt, uploaded_files)
request.append(audio_file)
response = model.generate_content(request,
                                  request_options={"timeout": 600})
print(response.text)

RetryError: Timeout of 60.0s exceeded, last exception: 503 failed to connect to all addresses; last error: UNKNOWN: ipv6:%5B2607:f8b0:4009:81a::200a%5D:443: Network is unreachable

## Delete Files

Files are automatically deleted after 2 days, or you can delete them manually using `files.delete()`.

In [None]:
print(f'Deleting {len(uploaded_files)} images. This might take a bit...')
for file in uploaded_files:
  genai.delete_file(file.response.name)
  print(f'Deleted {file.file_path} at URI {file.response.uri}')
print(f"Completed deleting files!\n\nDeleted: {len(uploaded_files)} files")

## Learning more

The File API lets you upload a variety of multimodal MIME types, including images and audio formats. The File API handles inputs that can be used to generate content with [`model.generateContent`](https://ai.google.dev/api/rest/v1/models/generateContent) or [`model.streamGenerateContent`](https://ai.google.dev/api/rest/v1/models/streamGenerateContent).


* Learn more about the [File API](https://github.com/google-gemini/cookbook/blob/main/quickstarts/File_API.ipynb) with the quickstart.

* Learn more about prompting with [media files](https://ai.google.dev/tutorials/prompting_with_media) in the docs, including the supported formats and maximum length.