## Overview

The Gemini File API provides a simple way for developers to upload files and use them with the Gemini API in multimodal scenarios. This notebook shows how to use the File API to upload image frames extracted from a video and include them in a `GenerateContent` call to the Gemini API. For more information, refer to the [File API FAQ](https://docs.google.com/document/d/1WBVc5W6PZvgaHLV43UGSrtwHqUmofPT0K0oHuNd7GHA) or the [API documentation](https://ai.google.dev/api/rest/v1beta/files).

Note: This API is currently in beta and is [only available in certain regions](https://ai.google.dev/available_regions).

## Setup


### Install the Gemini API Python SDK



In [None]:
# Install the Python SDK
!pip install google-generativeai

### Authentication Overview

**Important:** The File API uses API keys for authentication and access. Uploaded files are associated with the API key's cloud project. Unlike other Gemini APIs that use API keys, your API key also grants access to data you've uploaded to the File API, so take extra care to keep your API key secure. For best practices on securing API keys, refer to Google's [documentation](https://support.google.com/googleapi/answer/6310037).

### Set up your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/gemini-api-cookbook/blob/main/quickstarts/Authentication.ipynb) for an example.

In [None]:
import google.generativeai as genai
from IPython.display import Markdown
from google.colab import userdata

GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)

## Extract Video Image Frames

The Gemini API currently does not support video files directly. Instead, you can provide a series of timestamps and image files. In this quickstart, we will extract 1 frame per second from the short film "Big Buck Bunny" using [OpenCV](https://docs.opencv.org/4.x/d6/d00/tutorial_py_root.html).

> "Big Buck Bunny" is (c) copyright 2008, Blender Foundation / www.bigbuckbunny.org and [licensed](https://peach.blender.org/about/) under the [Creative Commons Attribution 3.0](http://creativecommons.org/licenses/by/3.0/) License.

In [None]:
video_file_name = "https://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4"

To upload your own video, see the [Appendix section](#scrollTo=OexqR86FIiCu).

### Extract frames using OpenCV

The following code uses OpenCV to extract image frames from the video at 1 frame per second.

If you uploaded your own file to Colab, change `video_file_name` to the full file name, for example `gemini.mp4`.
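The sampling rule can be sketched on its own, without OpenCV: a frame is kept whenever the running frame counter is a multiple of the integer frame rate. The helper name `sampled_frame_indices` and the guard for a frame rate below 1 are illustrative assumptions, not part of the notebook.

```python
def sampled_frame_indices(total_frames: int, fps: float) -> list[int]:
    """Return the frame counters that are kept when sampling one frame per second."""
    # The frame rate is truncated to an integer; the max() guard is an added
    # assumption that avoids a zero step when no frame rate is reported.
    step = max(1, int(fps))
    return list(range(0, total_frames, step))


indices = sampled_frame_indices(total_frames=100, fps=24)
print(indices)  # [0, 24, 48, 72, 96]
```

Because the frame rate is truncated, a 29.97 fps video is sampled every 29 frames, so the saved frames drift slightly ahead of whole seconds over a long video.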

In [None]:
import cv2
import os
import shutil

# Create or cleanup existing extracted image frames directory.
FRAME_EXTRACTION_DIRECTORY = "/content/frames"
FRAME_PREFIX = "_frame"
def create_frame_output_dir(output_dir):
  if not os.path.exists(output_dir):
    os.makedirs(output_dir)
  else:
    shutil.rmtree(output_dir)
    os.makedirs(output_dir)

def extract_frame_from_video(video_file_path):
  print(f"Extracting {video_file_path} at 1 frame per second. This might take a bit...")
  create_frame_output_dir(FRAME_EXTRACTION_DIRECTORY)
  vidcap = cv2.VideoCapture(video_file_path)
  fps = int(vidcap.get(cv2.CAP_PROP_FPS))
  output_file_prefix = os.path.basename(video_file_path).replace('.', '_')
  frame_count = 0  # Number of frames saved so far
  count = 0  # Number of frames read so far
  while vidcap.isOpened():
      success, frame = vidcap.read()
      if not success:  # End of video
          break
      if count % int(fps) == 0:  # Extract a frame every second
          image_name = f"{output_file_prefix}{FRAME_PREFIX}{frame_count:04d}.jpg"
          output_filename = os.path.join(FRAME_EXTRACTION_DIRECTORY, image_name)
          cv2.imwrite(output_filename, frame)
          frame_count += 1
      count += 1
  vidcap.release()  # Release the capture object
  print(f"Completed video frame extraction!\n\nExtracted: {frame_count} frames")

extract_frame_from_video(video_file_name)

## Upload files to the File API

Once the frames are extracted, we are ready to upload them to the File API.

The File API lets you upload a variety of multi-modal MIME types including images. The File API is only intended as input to generate content and has the following attributes:

* Can only be used with [`model.generateContent`](https://ai.google.dev/api/rest/v1beta/models/generateContent) or [`model.streamGenerateContent`](https://ai.google.dev/api/rest/v1beta/models/streamGenerateContent)
* Automatic file deletion after 2 days
* Maximum 2GB per file, 20GB limit per project
* No downloads allowed
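Since the size limits above are enforced by the service, it can help to check file sizes locally before starting a long upload. The following is a minimal sketch; it assumes the limits are binary gigabytes (the list does not say which unit is meant), and `check_upload_sizes` is a hypothetical helper, not part of the SDK.

```python
import os

# Assumed interpretation of the limits listed above (binary gigabytes).
MAX_FILE_BYTES = 2 * 1024**3
MAX_PROJECT_BYTES = 20 * 1024**3


def check_upload_sizes(paths, already_stored_bytes=0):
    """Return the projected total size in bytes, or raise ValueError on a limit breach."""
    total = already_stored_bytes
    for path in paths:
        size = os.path.getsize(path)
        if size > MAX_FILE_BYTES:
            raise ValueError(f"{path} is {size} bytes, over the per-file limit")
        total += size
    if total > MAX_PROJECT_BYTES:
        raise ValueError(f"Total of {total} bytes is over the per-project limit")
    return total
```

In this notebook it could be called with the list of extracted frame paths before the upload loop runs.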

The following code uploads all the extracted frames to the File API, recording each frame's timestamp and upload response (which includes its URI) on a `File` object. Uploading the "Big Buck Bunny" frames may take about 4 minutes.

Note: The uploads in this colab are made serially. For faster uploads, you can make parallel `CreateFile` requests.
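One possible way to parallelize the uploads is a thread pool. This is a sketch under stated assumptions: `upload_all` and `fake_upload` are hypothetical names, the stand-in upload function replaces the real network call so the example runs offline, and whether the SDK client tolerates concurrent calls or hits rate limits is not confirmed here.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_all(paths, upload_fn, max_workers=8):
    """Run upload_fn over every path concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `paths`, so each response
        # still lines up with the frame it belongs to.
        return list(pool.map(upload_fn, paths))


# Stand-in for the real upload call so the sketch runs without network access.
def fake_upload(path):
    return f"uploaded:{path}"


responses = upload_all(["frame0000.jpg", "frame0001.jpg"], fake_upload)
print(responses)  # ['uploaded:frame0000.jpg', 'uploaded:frame0001.jpg']
```

In the notebook, `upload_fn` could be something like `lambda p: genai.upload_file(path=p)`. Keeping the results in input order matters because each frame is later paired with its timestamp.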

In [None]:
import os

class File:
  def __init__(self, file_path: str, display_name: str = None,
               timestamp_seconds: int = None):
    self.file_path = file_path
    if display_name:
      self.display_name = display_name
    if timestamp_seconds is not None:
      self.timestamp = seconds_to_time_string(timestamp_seconds)

  def set_file_response(self, response):
    self.response = response

def seconds_to_time_string(seconds):
  """Converts an integer number of seconds to a string in the format '00:00'.
     Format is the expected format for Gemini 1.5.
  """
  minutes = seconds // 60
  seconds = seconds % 60
  return f"{minutes:02d}:{seconds:02d}"

def get_timestamp_seconds(filename):
  """Extracts the frame count (as an integer) from a filename with the format
     'output_file_prefix_frame0000.jpg'.
  """
  parts = filename.split(FRAME_PREFIX)
  if len(parts) != 2:
      return None  # Indicate that the filename might be incorrectly formatted

  frame_count_str = parts[1].split(".")[0]
  return int(frame_count_str)

# Process each frame in the output directory
files = os.listdir(FRAME_EXTRACTION_DIRECTORY)
files = sorted(files)  # Sort alphabetically
files_to_upload = []
for file in files:
  files_to_upload.append(
      File(file_path=os.path.join(FRAME_EXTRACTION_DIRECTORY, file),
           timestamp_seconds=get_timestamp_seconds(file)))

# Upload the files to the API
uploaded_files = []
print(f'Uploading {len(files_to_upload)} files. This might take a bit...')
for file in files_to_upload:
  print(f'Uploading: {file.file_path}...')
  response = genai.upload_file(path=file.file_path)
  file.set_file_response(response)
  uploaded_files.append(file)

print(f"Completed file uploads!\n\nUploaded: {len(uploaded_files)} files")

## List Files

`files.list` lets you see all files that have been uploaded to the File API that are associated with the Cloud project your API key belongs to. Only the `name` (and by extension, the `uri`) are unique. Only use the `displayName` to identify files if you manage uniqueness yourself.
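If you do rely on display names, a quick check for collisions can be useful. The sketch below groups listed files by display name; it uses `SimpleNamespace` stand-ins so it runs offline, and it assumes the listed file objects expose `name` and `display_name` attributes. The helper `group_by_display_name` is hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_display_name(files):
    """Map each display name to the list of resource names that share it."""
    groups = defaultdict(list)
    for f in files:
        groups[f.display_name].append(f.name)
    return dict(groups)


# Stand-in objects; in the notebook these would come from the file listing.
listed = [
    SimpleNamespace(name="files/abc", display_name="frame"),
    SimpleNamespace(name="files/def", display_name="frame"),
    SimpleNamespace(name="files/ghi", display_name="logo"),
]
groups = group_by_display_name(listed)
duplicates = {k: v for k, v in groups.items() if len(v) > 1}
print(duplicates)  # {'frame': ['files/abc', 'files/def']}
```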

In [None]:
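# List files uploaded to the API. zip() stops after as many entries as were
# uploaded above, so other files in the project are not all printed.
for _, f in zip(range(len(uploaded_files)), genai.list_files()):
  print(f.uri)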

## Generate Content

After the files have been uploaded, you can make `GenerateContent` requests that reference their File API URIs.

To understand videos with Gemini 1.5 Pro, provide 2 consecutive `Part`s for each frame: a `text` part with the **timestamp** and a `fileData` part with the frame's **image URI**:

```
part { text = "00:00" }
part { fileData = fileData {
  fileUri = "https://generativelanguage.googleapis.com/v1/files/frame-0"
  mimeType = "image/jpeg"
}}
```

In [None]:
# Create the prompt.
prompt = "Tell me about this video."

# Set the model to Gemini 1.5 Pro.
model = genai.GenerativeModel(model_name="models/gemini-1.5-pro-latest")

# Make GenerateContent request with the structure described above.
def makeRequest(prompt, files):
  request = [prompt]
  for file in files:
    request.append(file.timestamp)
    request.append(file.response)
  return request

# Make the LLM request.
response = model.generate_content(makeRequest(prompt, uploaded_files),
                                  request_options={"timeout": 600})
Markdown(">" + response.text)

## Delete Files

Files are automatically deleted after 2 days, or you can delete them manually using `files.delete()`.

In [None]:
# Delete each file by its resource name
print(f'Deleting {len(uploaded_files)} images. This might take a bit...')
for file in uploaded_files:
  genai.delete_file(file.response.name)
  print(f'Deleted {file.file_path} as URI {file.response.uri}')
print(f"Completed deleting files!\n\nDeleted: {len(uploaded_files)} files")

## Appendix: Use your own files by uploading to Colab

This notebook uses the File API with files that were downloaded from the internet. If you're running this in Colab and want to use your own files, you first need to upload them to the Colab instance.

First, click **Files** on the left sidebar, then click the **Upload** button:

<img width=400 src="https://ai.google.dev/tutorials/images/colab_upload.png">

Next, we'll extract the frames from your video and upload them to the File API. In the form for the code cell below, enter the filename for the file you uploaded and provide an appropriate display name for the file, then run the cell.


In [None]:
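# Specify the video file to upload (example name; replace with your own file)
my_filename = "gemini.mp4" # @param {type:"string"}
my_file_display_name = "Gemini Logo" # @param {type:"string"}

# Extract frames from the uploaded file rather than the sample video
extract_frame_from_video(my_filename)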

# Process each frame in the output directory
files = os.listdir(FRAME_EXTRACTION_DIRECTORY)
files = sorted(files)  # Sort alphabetically
files_to_upload = []
for file in files:
  files_to_upload.append(
      File(file_path=os.path.join(FRAME_EXTRACTION_DIRECTORY, file),
           timestamp_seconds=get_timestamp_seconds(file)))

# Upload the files to the API.
uploaded_files = []
print(f'Uploading {len(files_to_upload)} files. This might take a bit...')
for file in files_to_upload:
  print(f'Uploading: {file.file_path}...')
  response = genai.upload_file(path=file.file_path)
  file.set_file_response(response)
  uploaded_files.append(file)
print(f"Completed file uploads!\n\nUploaded: {len(uploaded_files)} files")

# Make the LLM request, allowing up to 10 minutes as in the earlier request.
response = model.generate_content(makeRequest(prompt, uploaded_files),
                                  request_options={"timeout": 600})
Markdown(">" + response.text)