<a href="https://colab.research.google.com/github/samw8/aiie2025-/blob/main/Copy_of_ImageGeneration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Gemini for images

References:

* https://ai.google.dev/gemini-api/docs/image-generation#before_you_begin
* https://ai.google.dev/gemini-api/docs/image-understanding

# Setting up

In [1]:
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

In [4]:
from google.colab import userdata

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')

In [5]:
client = genai.Client(api_key=GOOGLE_API_KEY)
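The cell above relies on `google.colab.userdata`, which only exists inside Colab. A minimal sketch of a fallback, assuming the same key is exported as the `GOOGLE_API_KEY` environment variable when the notebook runs elsewhere. The helper name `load_api_key` and its `env` parameter are hypothetical and are not part of the `google-genai` SDK.

```python
import os


def load_api_key(name="GOOGLE_API_KEY", env=None):
    """Return the key from Colab secrets if possible, else from the environment.

    `env` is a hypothetical hook that lets callers pass a plain dict
    instead of os.environ (useful for testing).
    """
    env = os.environ if env is None else env
    try:
        # Only importable inside Colab; any failure falls through to env.
        from google.colab import userdata
        key = userdata.get(name)
        if key:
            return key
    except Exception:
        pass
    key = env.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set in Colab secrets or the environment")
    return key
```

With this in place, `genai.Client(api_key=load_api_key())` would behave the same in Colab and in a local Jupyter session, and the key never has to appear in the notebook itself.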

In [6]:
# Uncomment to list the models available to this API key:
# for model_info in client.models.list():
#     print(model_info.name)

# Generate an image

In [7]:
# Prompt
contents = ('Hi, can you create a 3d rendered image of a cat '
            'with long ears on top of an airplane '
            'realistic and bright')

In [8]:
response = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",
    contents=contents,
    config=types.GenerateContentConfig(
      response_modalities=['TEXT', 'IMAGE']
    )
)

In [11]:
for part in response.candidates[0].content.parts:
  if part.text is not None:
    print(part.text)
  elif part.inline_data is not None:
    image = Image.open(BytesIO(part.inline_data.data))
    image.save('gemini-native-image2.png')  # output file name
    image.show()

I will generate a 3D rendering of a realistic scene featuring a cat with exceptionally long ears perched on the wing of a bright, modern airplane under a clear sky. The cat will be detailed with realistic fur texture, and the airplane will reflect sunlight, giving the scene a vibrant and playful yet believable quality.
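The loop above mixes printing, decoding, and saving. As a sketch, the same traversal can be factored into a small helper that only separates text from raw image bytes, which makes it easy to reuse in the editing cell later. `split_parts` is a hypothetical name; it assumes only that each part exposes `text` and `inline_data.data`, as the loop above does, and it treats a missing parts list as empty.

```python
from types import SimpleNamespace


def split_parts(parts):
    """Return (texts, images): text strings and raw image bytes, in order."""
    texts, images = [], []
    for part in parts or []:
        text = getattr(part, "text", None)
        inline = getattr(part, "inline_data", None)
        if text is not None:
            texts.append(text)
        elif inline is not None and getattr(inline, "data", None):
            images.append(inline.data)
    return texts, images


# Stand-in objects shaped like response parts, so this runs without an API call.
fake_parts = [
    SimpleNamespace(text="I will generate a cat.", inline_data=None),
    SimpleNamespace(text=None, inline_data=SimpleNamespace(data=b"\x89PNG\r\n\x1a\n")),
]
texts, images = split_parts(fake_parts)
```

In the notebook it would be called as `split_parts(response.candidates[0].content.parts)`, after which each entry of `images` can be opened with `Image.open(BytesIO(data))`.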



# Describe a local image

In [13]:
my_file = client.files.upload(file="/content/gemini-native-image2.png")

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[my_file, "Caption this image in less than 5 words."],
)

print(response.text)

Sky-high feline adventure.


In [14]:
my_file = client.files.upload(file="/content/gemini-native-image2.png")

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[my_file, "Caption this image."],
)

print(response.text)

Here are a few captions for the image:

* "Buckle up, we're in for a purr-fectly turbulent flight."
* "Looks like someone's not afraid of heights."
* "New fear unlocked: Cat on a plane's wing."
* "Just a cat, living the high life."
* "This is my purr-sonal jet."


# Describe an image from a URL

In [15]:
import requests

image_path = "https://goo.gle/instrument-img"
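http_response = requests.get(image_path, timeout=30)
http_response.raise_for_status()  # fail early instead of sending an error page to the model
image_bytes = http_response.content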
image = types.Part.from_bytes(
  data=image_bytes, mime_type="image/jpeg"
)

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=["What is this image?", image],
)

print(response.text)

The image shows a console for a pipe organ. It features multiple keyboards (manuals), ranks of stops for selecting different sounds, foot pedals, and various controls. The console is made of wood and has an intricate design.
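The cell above hardcodes `mime_type="image/jpeg"`, which is only correct if the URL really serves a JPEG. A minimal sketch of checking the downloaded bytes first, using the well-known file signatures for JPEG, PNG, and WebP. `sniff_image_mime` is a hypothetical helper, not part of `google-genai` or `requests`, and it deliberately covers only these three formats.

```python
def sniff_image_mime(data):
    """Guess an image MIME type from its leading bytes, or return None."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None
```

The result could then be passed as `mime_type=sniff_image_mime(image_bytes) or "image/jpeg"`, keeping the original value as a fallback.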


# Edit an image

In [16]:
image = Image.open('/content/gemini-native-image2.png') #path to file

# Reuse the key loaded from Colab secrets; never hardcode an API key in a notebook.
client = genai.Client(api_key=GOOGLE_API_KEY)

# Edit prompt. Adjacent string literals are concatenated, so the first one ends with a space.
text_input = ('Hi, this is a picture of a cat on an airplane. '
              'Can you add a llama next to it?')

response = client.models.generate_content(
    model="gemini-2.0-flash-preview-image-generation",
    contents=[text_input, image],
    config=types.GenerateContentConfig(
      response_modalities=['TEXT', 'IMAGE']
    )
)

for part in response.candidates[0].content.parts:
  if part.text is not None:
    print(part.text)
  elif part.inline_data is not None:
    image = Image.open(BytesIO(part.inline_data.data))
    image.show()
    image.save('gemini-edited-image.png') #new file name

I will add a fluffy white llama standing calmly next to the tabby cat on the airplane wing, against the backdrop of the bright blue sky and puffy white clouds below.




# Differences between images

In [19]:
# Upload the first image
image1_path = "/content/gemini-edited-image.png"
uploaded_file = client.files.upload(file=image1_path)

# Prepare the second image as inline data
image2_path = "/content/gemini-native-image2.png"
with open(image2_path, 'rb') as f:
    img2_bytes = f.read()

# Create the prompt with text and multiple images
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        "What is different between these two images?",
        uploaded_file,  # Use the uploaded file reference
        types.Part.from_bytes(
            data=img2_bytes,
            mime_type='image/png'
        )
    ]
)

print(response.text)

The two images show different animals on the wing of an airplane.

*   **Image 1:** Shows a llama standing on an airplane wing.
*   **Image 2:** Shows a cat standing on an airplane wing.
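
The comparison cell sends one image through `client.files.upload` and the other as inline bytes. A rough sketch of choosing between the two by file size follows. The 20 MB threshold is an assumption based on the Gemini documentation's guidance at the time of writing, and it applies to the total request size, so a single file's size is only a proxy. `prefer_upload` and `INLINE_LIMIT_BYTES` are hypothetical names.

```python
import os
import tempfile

# Assumed limit for inline requests; check the current Gemini docs before relying on it.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def prefer_upload(path, limit=INLINE_LIMIT_BYTES):
    """Return True if the file is larger than `limit` and should be uploaded."""
    return os.path.getsize(path) > limit


# Demo with a small throwaway file, so no real image is needed.
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(b"\x89PNG\r\n\x1a\n" + b"\0" * 100)
    demo_path = tmp.name

small_file_needs_upload = prefer_upload(demo_path)
tiny_limit_needs_upload = prefer_upload(demo_path, limit=10)
os.remove(demo_path)
```

Both images in this notebook are small, so either route works here; the helper only matters once files approach the limit.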
