<a href="https://colab.research.google.com/github/mapsguy/programming-gemini/blob/main/image_understanding.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [15]:
# Step 1: install or upgrade to the latest google-genai SDK
%pip install google-genai --upgrade --quiet

In [2]:
#import the genai library
from google import genai

In [3]:
# Step 2 (AI Studio): read the API key from Colab user data
from google.colab import userdata
client = genai.Client(api_key=userdata.get("GEMINI_API_KEY"))

# Alternatively, read the key from an environment variable
#import os
#client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

In [4]:
model_name = "models/gemini-2.5-flash-preview-05-20"

Methods for Providing Images to Gemini

In [5]:
#PIL Objects: convenient for local image files
import PIL.Image
img = PIL.Image.open("/content/cat.jpeg")

prompt = ["What is in this image?", img]

response = client.models.generate_content(
    model=model_name,
    contents=prompt)

print(response.text)

This image contains a close-up of a **cat's face**.

It appears to be a brown and black tabby cat with prominent stripes, especially on its forehead. It has striking yellow or amber eyes and long white whiskers. The cat is looking directly at the viewer with a somewhat serious or intense expression.


In [6]:
# Inline base64 strings: embed the image data in the request as a base64-encoded string
import base64

with open("/content/cat.jpeg", "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode('utf-8')

image_part = {"inline_data": {"mime_type": "image/jpeg", "data": image_data}}

response = client.models.generate_content(
    model=model_name,
    contents=["Describe this scene.", image_part])

print(response.text)


This is a striking close-up portrait of a domestic cat, focusing on its head and upper chest.

Here's a breakdown of the scene:

*   **Subject:** The central figure is a beautiful **brown tabby cat**. Its fur features distinct dark brown/black stripes and swirls against a lighter, warm brown or tan base. The classic "M" pattern is visible on its forehead.
*   **Eyes:** The most captivating feature is its large, round, **amber or golden eyes**, which stare directly forward with an intense, almost stern, or unimpressed expression. The slight narrowing of the eyelids contributes to a look often described as "grumpy" or "thoughtful."
*   **Facial Features:** The cat has a broad, somewhat rounded face. A small, pinkish-brown nose sits centrally. Numerous long, prominent white whiskers fan out widely from its muzzle. Its ears are erect, pointed, and covered in fur, with some lighter tufts visible inside.
*   **Body:** Only the cat's head and upper chest are clearly visible, suggesting it mig
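
The inline approach above hard-codes the MIME type. As a minimal sketch, the same dictionary can be built for any local image by guessing the MIME type from the file extension. The helper name `make_inline_image_part` is hypothetical and not part of the SDK; it only uses the Python standard library and produces the same `inline_data` shape as the cell above.

```python
import base64
import mimetypes
from pathlib import Path


def make_inline_image_part(path):
    """Build an inline_data dict shaped like the one used in the cell above.

    Hypothetical helper for illustration; not part of the google-genai SDK.
    """
    # Guess the MIME type from the file extension and reject non-image types.
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    # Read the raw bytes and encode them as a base64 text string.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {"inline_data": {"mime_type": mime_type, "data": encoded}}
```

Under these assumptions, the result could be passed in place of `image_part`, for example `contents=["Describe this scene.", make_inline_image_part("/content/cat.jpeg")]`.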

In [9]:
#File API: larger files, reuse across calls

#Upload the file first
sample_file = client.files.upload(file="/content/cat.jpeg")

#Then use the file in a prompt
response = client.models.generate_content(
    model=model_name,
    contents=["Caption this image:", sample_file])

print(response.text)


Here are a few options for a caption, playing on the cat's very serious expression:

**Playing on Disapproval/Judgment:**

*   "I'm not mad, I'm just disappointed."
*   "The ultimate judge of your life choices."
*   "We need to talk."
*   "That look when you've clearly messed up."
*   "My face when I hear unsolicited advice."
*   "Pretty sure this cat just audited my life."

**Relatable & Humorous:**

*   "Monday morning vibes."
*   "When the Zoom meeting could have been an email."
*   "The librarian cat who shushes you with a glare."
*   "Still waiting for that second breakfast."
*   "Plotting world domination, one glare at a time."

**Short & Punchy:**

*   "Unimpressed."
*   "Serious business."
*   "The intense stare."
*   "Judging you silently."
*   "No nonsense."

Choose the one that best fits the vibe you're going for!
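
The File API comment above says it suits larger files and reuse across calls. A rough sketch of that decision follows. The function name `choose_upload_method` is hypothetical, and the 20 MB limit is an assumption about the inline request size cap that should be checked against the current Gemini documentation.

```python
import os

# Assumed inline request size limit (about 20 MB); verify against current docs.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def choose_upload_method(path, will_reuse=False, limit_bytes=INLINE_LIMIT_BYTES):
    """Return "file_api" or "inline" for the image at `path`.

    Hypothetical helper for illustration; not part of the google-genai SDK.
    """
    raw_size = os.path.getsize(path)
    # Base64 turns every 3 raw bytes into 4 characters (with padding),
    # so compare the encoded size rather than the size on disk.
    encoded_size = 4 * ((raw_size + 2) // 3)
    if will_reuse or encoded_size > limit_bytes:
        return "file_api"
    return "inline"
```

This ignores the size of the text prompt and any other parts in the request, so it is only a first approximation.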


Image Interpretation: Captioning, VQA, and Spatial Awareness

In [10]:
#image captioning
img = PIL.Image.open("/content/cat.jpeg")

response = client.models.generate_content(
    model=model_name,
    contents=["Generate a short caption for this image.", img])

print(response.text)


Here are a few short caption options:

*   Those piercing golden eyes.
*   Serious cat is serious.
*   Silently judging.
*   Intense gaze.
*   Not amused.


In [13]:
#visual question answering (VQA): ask specific questions about content

response = client.models.generate_content(
    model=model_name,
    contents=["What is the main color of the cat in the image?", img])
print(response.text)

The cat in the image is a **dark brown and black tabby**.

*   Its fur is predominantly **dark brown** with distinct **black stripes** (characteristic of a tabby pattern).
*   There are also lighter brown/tan areas, particularly around its muzzle and under its chin.
*   Its eyes are a striking **amber/gold** color.


In [14]:
#spatial awareness and object recognition
img = PIL.Image.open("/content/animals.png")

response = client.models.generate_content(
    model=model_name,
    contents=["What is the animal to the left of the dog called?", img])
print(response.text)


The animal to the left of the dog is a **cat**.
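
Spatial questions can also be phrased so that the model returns coordinates rather than a name. The sketch below assumes the model was asked for bounding boxes and replied with `[ymin, xmin, ymax, xmax]` values normalized to a 0-1000 range; that convention is an assumption here and should be confirmed for the model in use. The helper `to_pixel_box` is hypothetical and converts such a box to pixel coordinates.

```python
def to_pixel_box(box_2d, image_width, image_height, scale=1000):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel coordinates.

    Hypothetical helper; assumes coordinates are normalized to 0..scale.
    Returns (left, top, right, bottom) in pixels.
    """
    ymin, xmin, ymax, xmax = box_2d
    # x values scale with the image width, y values with the image height.
    left = round(xmin * image_width / scale)
    top = round(ymin * image_height / scale)
    right = round(xmax * image_width / scale)
    bottom = round(ymax * image_height / scale)
    return (left, top, right, bottom)
```

The returned tuple follows the `(left, top, right, bottom)` order that `PIL.Image.crop` expects, so with the `img` opened above one could call `img.crop(to_pixel_box(box, *img.size))`.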
