<a href="https://colab.research.google.com/github/mapsguy/programming-gemini/blob/main/understanding_images_audio_video.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [32]:
#step 1: install/upgrade the latest genai SDK
%pip install google-genai --upgrade --quiet

In [2]:
#import the genai library
from google import genai

In [4]:
#step 2: AI Studio: read the API key from Colab user secrets
from google.colab import userdata
client = genai.Client(api_key=userdata.get("GEMINI_API_KEY"))

#Alternatively, read the key from an environment variable
#import os
#client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
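
If the environment-variable route is used, a missing key otherwise surfaces as a bare `KeyError`. The sketch below shows one way to fail with a clearer message. The helper name `load_api_key` and its `environ` parameter are hypothetical and not part of the SDK.

```python
import os


def load_api_key(env_var="GEMINI_API_KEY", environ=None):
    """Return the API key stored in `env_var`, or raise a descriptive error.

    `environ` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating the client."
        )
    return key
```

The result can then be passed to the client, for example `client = genai.Client(api_key=load_api_key())`.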

In [5]:
#step 3: Get model details
model_name = "models/gemini-2.5-flash-preview-05-20"
try:
    model_details = client.models.get(model=model_name)
    print(f"Details for model '{model_name}':")
    print(f"Model Name: {model_details.name}")
    print(f"Input Token Limit: {model_details.input_token_limit}")
    print(f"Output Token Limit: {model_details.output_token_limit}")
except Exception as e:
    print(f"Error retrieving model details for '{model_name}': {e}")

Details for model 'models/gemini-2.5-flash-preview-05-20':
Model Name: models/gemini-2.5-flash-preview-05-20
Input Token Limit: 1048576
Output Token Limit: 65536
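
The input token limit printed above can be compared against the size of a prompt before sending it. The following is a minimal sketch, assuming the SDK's `client.models.count_tokens` call returns an object with a `total_tokens` attribute. The helper name `fits_input_limit` is hypothetical.

```python
def fits_input_limit(client, model_name, contents):
    """Return (fits, used_tokens, limit) for `contents` against the model's input limit.

    Assumes `client` exposes `models.get` (as used in step 3) and
    `models.count_tokens` with a `total_tokens` field on its response.
    """
    limit = client.models.get(model=model_name).input_token_limit
    used = client.models.count_tokens(model=model_name, contents=contents).total_tokens
    return used <= limit, used, limit
```

A call such as `fits_input_limit(client, model_name, prompt)` could be run before `generate_content` for large video or audio prompts.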


In [10]:
#step 4: inline image example (smaller, single-use images)

import PIL.Image
img = PIL.Image.open('/content/geostationaryimage.jpeg')
prompt = ["What is in this picture?", img]
response = client.models.generate_content(
    model=model_name,
    contents = prompt)
print(response.text)

This is a picture of **Earth**, viewed from space.

You can clearly see the continents of Africa and parts of Europe and Asia, surrounded by oceans and swirling white clouds.
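
This notebook uses inline images for smaller, single-use files and the File API for larger or reused ones. That rule of thumb can be sketched as a small helper. The 4 MB threshold is taken from the comment in step 5 and is an assumption rather than a documented limit. The function name `choose_strategy` is hypothetical.

```python
import os

# Threshold taken from this notebook's "> 4MB" comment; treat it as an assumption.
INLINE_LIMIT_BYTES = 4 * 1024 * 1024


def choose_strategy(path, reused=False, inline_limit_bytes=INLINE_LIMIT_BYTES):
    """Return "file_api" for large or reused files, otherwise "inline"."""
    if reused or os.path.getsize(path) > inline_limit_bytes:
        return "file_api"
    return "inline"
```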


In [15]:
#step 5: File API image example (for files > 4MB, or files referenced multiple times)
# 1. Upload the file
sample_file = client.files.upload(file="/content/geostationaryimage.jpeg")

# 2. Include the file in the prompt
prompt = ["Describe this drawing.", sample_file]
response = client.models.generate_content(
    model = model_name,
    contents = prompt)

print(response.text)

This drawing depicts a vibrant, full-disc view of **planet Earth** set against a solid black background, suggesting the vastness of space.

Here are the key features:

*   **Shape and Background:** The Earth is rendered as a perfect sphere, taking up most of the frame, with the deep black background providing a stark contrast that highlights its features.
*   **Continents:**
    *   **Africa** is prominently centered, showcasing its diverse terrains from the arid browns of the Sahara Desert in the north to more verdant, darker shades towards the equator and south.
    *   **Europe** is visible above Africa, partially obscured by clouds.
    *   A portion of **South America** can be seen on the left side of the sphere, with its eastern coastline and some interior landmass.
    *   To the right of Africa, parts of the **Arabian Peninsula** are visible, extending into what would be Asia.
*   **Oceans:** The vast expanse of the Atlantic Ocean is visible to the west of Africa, appearing dar

In [16]:
# Optional: Delete the file after use
client.files.delete(name=sample_file.name)

DeleteFileResponse()
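
The upload and delete calls above can be paired so that cleanup runs even if the prompt raises an error. This is a minimal sketch using a context manager. The name `temporary_upload` is hypothetical; it relies only on the `client.files.upload` and `client.files.delete` calls already used in this notebook.

```python
from contextlib import contextmanager


@contextmanager
def temporary_upload(client, path):
    """Upload `path`, yield the uploaded file, and delete it on exit."""
    uploaded = client.files.upload(file=path)
    try:
        yield uploaded
    finally:
        # Runs on normal exit and when the body raises.
        client.files.delete(name=uploaded.name)
```

Usage would look like `with temporary_upload(client, "/content/geostationaryimage.jpeg") as f:` followed by a `generate_content` call that includes `f` in the prompt.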

In [31]:
#step 6: File API video/audio example

# 1. Upload video or audio file
media_file = client.files.upload(file="/content/video.mp4")

# Or: media_file = client.files.upload(file="audio.mp3")

# 2. Wait for processing (important for video/audio)
import time

while media_file.state.name == "PROCESSING":
    print('.', end='')
    time.sleep(10)  # Pause between status checks instead of polling in a tight loop
    media_file = client.files.get(name=media_file.name)
if media_file.state.name == "FAILED":
    raise ValueError(media_file.state.name)

# 3. Prompt with the file
prompt = ["Summarize the message in this video.", media_file]
# Or: prompt = ["Transcribe the first minute of this audio.", media_file]
response = client.models.generate_content(
    model = model_name,
    contents = prompt)

print(response.text)

....................The speaker describes a "healthy competitive culture" where the success of one team motivates and drives other teams to also achieve success.
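
The wait loop in step 6 runs until the file leaves the `PROCESSING` state, with no upper bound. A bounded variant might look like the sketch below. The helper name `wait_until_processed`, the default interval, and the default timeout are assumptions; the `sleep` and `clock` parameters exist only to make the helper testable.

```python
import time


def wait_until_processed(client, media_file, poll_seconds=10, timeout_seconds=600,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll the File API until `media_file` leaves PROCESSING, or time out."""
    deadline = clock() + timeout_seconds
    while media_file.state.name == "PROCESSING":
        if clock() >= deadline:
            raise TimeoutError(f"{media_file.name} is still processing")
        sleep(poll_seconds)
        # Refresh the file metadata, as in step 6.
        media_file = client.files.get(name=media_file.name)
    if media_file.state.name == "FAILED":
        raise ValueError(f"{media_file.name} failed processing")
    return media_file
```

With this helper, step 6 would reduce to `media_file = wait_until_processed(client, media_file)` before building the prompt.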


In [23]:
prompt = ["Transcribe the first one minute of this audio.", media_file]
response = client.models.generate_content(
    model = model_name,
    contents = prompt)

print(response.text)

They have a very healthy competitive culture where the success of one team starts driving that that motivation to experience success on another team. Um and so they really uh benefit, I think, from that that uh


In [30]:
#step 7: combining modalities
prompt = [
    "Describe the differences between these two images.",
    client.files.upload(file="/content/tabbycat.jpeg"), # PIL.Image object or File object
    client.files.upload(file="/content/calicocat.jpeg") # PIL.Image object or File object
]
response = client.models.generate_content(
    model = model_name,
    contents = prompt)
print(response.text)

Here are the differences between the two images:

1.  **Cat's Coat Pattern and Color:**
    *   **Image 1 (Top):** Features a **tabby cat** with a brown and grey striped (mackerel tabby) coat pattern.
    *   **Image 2 (Bottom):** Features a **calico cat** with distinct patches of black, orange/brown, and white fur.

2.  **Cat's Pose and Framing:**
    *   **Image 1 (Top):** Shows the cat in a full body shot, sitting upright, looking directly at the viewer.
    *   **Image 2 (Bottom):** Is a closer-up shot, focusing on the cat's head and upper body, with its head slightly tilted.

3.  **Cat's Eye Color:**
    *   **Image 1 (Top):** The tabby cat has predominantly green eyes.
    *   **Image 2 (Bottom):** The calico cat has more yellowish-green eyes.

4.  **Background Environment:**
    *   **Image 1 (Top):** The background consists of bare tree branches against a light blue or cloudy sky, possibly indicating a cooler season (winter or early spring) or a specific type of tree. The cat a
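
The response above refers to the inputs as "Image 1 (Top)" and "Image 2 (Bottom)", labels the model chose on its own. One option is to interleave explicit text labels with the images so that the response can refer to them by name. The helper `build_comparison_prompt` below is a hypothetical sketch, and whether labeling improves the answer is an assumption, not something this notebook tested.

```python
def build_comparison_prompt(instruction, images, label="Image"):
    """Return a contents list: the instruction, then a text label before each image."""
    prompt = [instruction]
    for index, image in enumerate(images, start=1):
        prompt.append(f"{label} {index}:")
        prompt.append(image)
    return prompt
```

The `images` argument can hold PIL.Image objects or File objects, matching the inputs used in step 7.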