# Gemini 2.5 Flash Image Generation using REST API

This notebook demonstrates how to use Gemini 2.5 Flash Image generation capabilities through REST API calls instead of the Google Gen AI SDK.

## Overview

Gemini 2.5 Flash Image is a powerful, generalist multimodal model that offers state-of-the-art image generation and conversational image editing capabilities. This enables you to converse with Gemini and create or edit images with interwoven text.

In this tutorial, you'll learn how to use Gemini 2.5 Flash Image in Vertex AI using REST API calls to try out the following scenarios:
  - Image generation:
    - Text-to-image generation
    - Interleaved image and text sequences
  - Image editing:
    - Image-to-image with subject customization and style transfer
    - Multi-turn image editing with localization
    - Editing with multiple reference images

## Setup and Configuration

In [None]:
# Install required libraries
!pip install --upgrade --quiet requests pillow matplotlib ipython google-auth google-cloud-aiplatform

In [None]:
# Standard library
import base64
import io
import json
import os
import time
from io import BytesIO

# Third-party
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import requests
from IPython.display import Markdown, display
from PIL import Image
from PIL import Image as PIL_Image  # Alias used by the image-loading cells

# Google authentication
import google.auth
import google.auth.transport.requests
from google.auth.transport.requests import Request
from google.colab import auth

In [None]:
auth.authenticate_user()
creds, PROJECT = google.auth.default()
auth_req = google.auth.transport.requests.Request()
creds.refresh(auth_req)

### Understanding HTTP Requests with Python Requests Library

This notebook demonstrates how to use the **`requests`** library to make HTTP calls to the Gemini 2.5 Flash Image API. `requests` is a widely used third-party Python library (not part of the standard library) that provides a simple API for making HTTP requests and interacting with REST endpoints.

#### Key `requests` Features Used:
- **`requests.get()`**: Download images from URLs  
- **`requests.post()`**: Send JSON payloads to the Gemini API
- **Headers Management**: Set Content-Type and authentication headers
- **Error Handling**: Check HTTP status codes and handle exceptions
- **JSON Handling**: Automatic JSON encoding/decoding
- **Timeout Support**: Prevent hanging requests
- **Retry Logic**: Retry on rate limiting and transient errors (written by hand around `requests.post()` in the error-handling section)

#### HTTP Request Structure:
```python
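# Basic pattern used throughout this notebook:
response = requests.post(
    url=BASE_URL,  # Full model URL, defined in the configuration cell
    headers={
        'Authorization': f'Bearer {access_token}',  # OAuth access token
        'Content-Type': 'application/json',
    },
    json=payload,
    timeout=300
)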
```

### Set API Configuration

In [None]:
# Configuration
API_ENDPOINT = "aiplatform.googleapis.com"
MODEL_ID = "gemini-2.5-flash-image-preview"
GENERATE_CONTENT_API = "generateContent"
PROJECT_ID = "frankie1-422709"  # Replace with your own Google Cloud project ID

# Base URL for API calls
BASE_URL = f"https://{API_ENDPOINT}/v1/projects/{PROJECT_ID}/locations/global/publishers/google/models/{MODEL_ID}:{GENERATE_CONTENT_API}"
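
The `global` location above uses the bare `aiplatform.googleapis.com` host, while regional Vertex AI endpoints prefix the host with the region name. The following is a minimal sketch of a URL builder that covers both cases. The helper name `build_endpoint_url` and the project ID `my-project` are illustrative assumptions, and whether this model is served in a given region is something to confirm in the Vertex AI documentation.

```python
def build_endpoint_url(project_id, model_id, method="generateContent", location="global"):
    """Build a Vertex AI publisher-model URL (hypothetical helper)."""
    # The global location uses the bare host; regional locations prefix it.
    host = (
        "aiplatform.googleapis.com"
        if location == "global"
        else f"{location}-aiplatform.googleapis.com"
    )
    return (
        f"https://{host}/v1/projects/{project_id}/locations/{location}"
        f"/publishers/google/models/{model_id}:{method}"
    )


# "my-project" is a placeholder project ID.
print(build_endpoint_url("my-project", "gemini-2.5-flash-image-preview"))
```

With the default arguments, the result has the same shape as `BASE_URL`.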

### Helper Functions

In [None]:
def encode_image_to_base64(image_path):
    """Encode local image file to base64 string"""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

def download_image(url, filename, timeout=60):
    """Download image from URL and save locally"""
    # A timeout keeps the cell from hanging on an unresponsive server.
    response = requests.get(url, timeout=timeout)
    if response.status_code != 200:
        raise RuntimeError(f"Failed to download image: {response.status_code}")
    with open(filename, 'wb') as f:
        f.write(response.content)
    return filename

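def display_image_from_base64(base64_data, width=400):
    """Display image from base64 data at the given width in pixels"""
    # Imported locally under an alias because `Image` also names PIL's module.
    from IPython.display import Image as IPythonImage

    image_data = base64.b64decode(base64_data)
    display(IPythonImage(data=image_data, width=width))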

def create_request_payload(contents, temperature=1, max_output_tokens=32768):
    """Create the request payload for Gemini API"""
    return {
        "contents": contents,
        "generationConfig": {
            "temperature": temperature,
            "maxOutputTokens": max_output_tokens,
            "responseModalities": ["TEXT", "IMAGE"],
            "topP": 0.95
        },
        "safetySettings": [
            {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "OFF"},
            {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "OFF"},
            {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "OFF"},
            {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "OFF"}
        ]
    }

def send_gemini_request(payload, timeout=300):
    """Send request to Gemini API and return response"""
    # Refresh the OAuth access token so it is valid for this call.
    creds.refresh(Request())
    headers = {
        'Authorization': f'Bearer {creds.token}',
        'Content-Type': 'application/json;charset=utf-8',
    }

    # Image generation can be slow, so allow a generous timeout instead of waiting forever.
    response = requests.post(BASE_URL, headers=headers, json=payload, timeout=timeout)

    if response.status_code == 200:
        return response.json()
    else:
        raise Exception(f"API request failed: {response.status_code} - {response.text}")

def process_response(response_data):
    """Process and display the response from Gemini API"""
    # A streaming response is a list of chunks; a non-streaming response is a single object.
    items = response_data if isinstance(response_data, list) else [response_data]

    # Collect the parts of the first candidate from every chunk
    all_parts = []
    for item in items:
        candidates = item.get('candidates') or []
        if candidates:
            all_parts.extend(candidates[0].get('content', {}).get('parts', []))

    # Display text and images in the order the model returned them
    for part in all_parts:
        if 'text' in part:
            display(Markdown(part['text']))
        elif 'data' in part.get('inlineData', {}):
            display_image_from_base64(part['inlineData']['data'])

    return all_parts
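
The parts returned by `process_response` hold each image as base64 text, so they can also be written to disk. The sketch below is one way to do that, assuming each image part carries `inlineData` with `data` and, usually, `mimeType` (the shape the helper above reads). `save_images_from_parts`, the `generated` directory, and the `image` prefix are illustrative names, not part of the API.

```python
import base64
import mimetypes
from pathlib import Path


def save_images_from_parts(parts, output_dir="generated", prefix="image"):
    """Write every inlineData part to disk and return the saved paths."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for part in parts:
        inline = part.get("inlineData")
        if not inline or "data" not in inline:
            continue  # Skip text parts and image parts without data.
        mime_type = inline.get("mimeType", "image/png")
        # Fall back to .png when the MIME type is unknown to mimetypes.
        extension = mimetypes.guess_extension(mime_type) or ".png"
        path = out / f"{prefix}_{len(saved)}{extension}"
        path.write_bytes(base64.b64decode(inline["data"]))
        saved.append(path)
    return saved
```

For example, `save_images_from_parts(process_response(response))` would display the response and then store its images under `generated/`.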

## Image Generation

First, you'll send text prompts to Gemini 2.5 Flash Image describing the images you'd like to generate.

### Text to Image

In the cell below, you'll create a request payload and call the Gemini API with the following configuration:
- `responseModalities`: To generate an image, you must include `IMAGE` in the `responseModalities` list. Note that `IMAGE` cannot be the only value specified; it must be accompanied by `TEXT`.
- `safetySettings`: Configuration for content safety filtering

All generated images include a SynthID watermark, which can be verified via the Media Studio in Vertex AI Studio.

In [None]:
# Text to image generation example
text_prompt = "a cartoon infographic on flying sneakers"

contents = [
    {
        "role": "user",
        "parts": [
            {"text": text_prompt}
        ]
    }
]

payload = create_request_payload(contents)

print(f"Generating image for prompt: '{text_prompt}'")
print(json.dumps(payload, indent=2))
print("Please wait...")

try:
    response = send_gemini_request(payload)
    process_response(response)
except Exception as e:
    print(f"Error: {e}")

### Text to Image and Text (Interleaved Sequences)

In addition to generating images, Gemini can also create interleaved sequences of images and text.

For example, you could ask the model to generate a recipe for banana bread with images showing different stages of the cooking process. Or, you could ask the model to generate images of different wildflowers with accompanying titles and descriptions.

Let's try out the interleaved text and image functionality by prompting Gemini 2.5 Flash Image to create a tutorial for assembling a peanut butter and jelly sandwich.

In [None]:
# Interleaved text and image generation
tutorial_prompt = "Create a tutorial explaining how to make a peanut butter and jelly sandwich in three easy steps. For each step, provide a title with the number of the step, an explanation, and also generate an image to illustrate the content. Label each image with the step number but no other words."

contents = [
    {
        "role": "user",
        "parts": [
            {"text": tutorial_prompt}
        ]
    }
]

payload = create_request_payload(contents)

print("Generating tutorial with interleaved text and images...")
print("Please wait...")

try:
    response = send_gemini_request(payload)
    process_response(response)
except Exception as e:
    print(f"Error: {e}")

## Image Editing

Gemini 2.5 Flash Image can generate image-to-image outputs from multiple reference images. This is useful for tasks like ensuring character consistency, generating logos, transferring styles, and inserting or removing objects.

### Subject Customization

Let's try out a subject customization example by asking Gemini 2.5 Flash Image to create an image of a dog in both a pencil sketch and watercolor style.

In [None]:
# Download the dog image
dog_image_url = "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/dog-1.jpg"
dog_image_path = "dog-1.jpg"

try:
    download_image(dog_image_url, dog_image_path)
    print(f"Downloaded dog image to {dog_image_path}")

    # Display the downloaded image
    img = mpimg.imread(dog_image_path)
    plt.figure(figsize=(6, 8))
    plt.imshow(img)
    plt.axis('off')
    plt.title('Original Dog Image')
    plt.show()

except Exception as e:
    print(f"Error downloading image: {e}")

In [None]:
# Subject customization with the dog image
dog_base64 = encode_image_to_base64(dog_image_path)

customization_prompt = "Create a pencil sketch image of this dog wearing a cowboy hat in a western-themed setting. Generate another image of this dog in a watercolor style floating down a river on a paddle board."

contents = [
    {
        "role": "user",
        "parts": [
            {
                "inlineData": {
                    "mimeType": "image/jpeg",
                    "data": dog_base64
                }
            },
            {"text": customization_prompt}
        ]
    }
]

payload = create_request_payload(contents)

print("Generating customized images of the dog...")
print("Please wait...")

try:
    response = send_gemini_request(payload)
    process_response(response)
except Exception as e:
    print(f"Error: {e}")

### Style Transfer

In this next example, you'll use the style from a living room to reimagine a kitchen in the same style.

In [None]:
# Download the living room image
living_room_url = "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/living-room.png"
living_room_path = "living-room.png"

try:
    download_image(living_room_url, living_room_path)
    print(f"Downloaded living room image to {living_room_path}")

    # Display the downloaded image
    img = mpimg.imread(living_room_path)
    plt.figure(figsize=(6, 8))
    plt.imshow(img)
    plt.axis('off')
    plt.title('Living Room Style Reference')
    plt.show()

except Exception as e:
    print(f"Error downloading image: {e}")

In [None]:
# Style transfer example
living_room_base64 = encode_image_to_base64(living_room_path)

style_transfer_prompt = "Using the concepts, colors, and themes from this living room generate a kitchen and dining room with the same aesthetic."

contents = [
    {
        "role": "user",
        "parts": [
            {
                "inlineData": {
                    "mimeType": "image/png",
                    "data": living_room_base64
                }
            },
            {"text": style_transfer_prompt}
        ]
    }
]

payload = create_request_payload(contents)

print("Generating kitchen with living room style transfer...")
print("Please wait...")

try:
    response = send_gemini_request(payload)
    process_response(response)
except Exception as e:
    print(f"Error: {e}")

### Multi-turn Image Editing

In this next section, you supply a starting image and iteratively alter certain aspects of the image by chatting with Gemini 2.5 Flash Image.

Each REST call is independent of the previous one, so the cells in this section simulate a chat by passing the image generated in one turn as the input image of the next request.
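
The cells in this section pass only the most recent image to each request. If you want the model to see the whole exchange, one option is to keep every turn in `contents`, alternating `user` and `model` roles. The sketch below illustrates that idea under the assumption that earlier model turns, including their image parts, can be sent back in `contents`. `add_turn` is a hypothetical helper, and the model turn uses a placeholder string where the base64 image data returned by the API would go.

```python
def add_turn(history, role, parts):
    """Append one turn to the conversation and return the history."""
    if role not in ("user", "model"):
        raise ValueError(f"Unsupported role: {role}")
    history.append({"role": role, "parts": parts})
    return history


history = []
add_turn(history, "user", [
    {"fileData": {"mimeType": "image/jpeg",
                  "fileUri": "gs://cloud-samples-data/generative-ai/image/perfume.jpg"}},
    {"text": "change the perfume color to a light purple"},
])

# In the notebook, these would be the parts returned by process_response().
model_parts = [{"inlineData": {"mimeType": "image/png", "data": "<base64 image>"}}]
add_turn(history, "model", model_parts)

add_turn(history, "user", [
    {"text": "inscribe the word flowers in French on the perfume bottle"},
])

# `history` can now be passed as `contents` to create_request_payload().
print([turn["role"] for turn in history])
```

Resending earlier images makes each request larger, so this trades payload size for context.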

In [None]:
# Display the perfume bottle image that we'll be editing
perfume_url = "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/perfume.jpg"

try:
    response = requests.get(perfume_url)
    if response.status_code == 200:
        perfume_image = PIL_Image.open(BytesIO(response.content))
        plt.figure(figsize=(6, 8))
        plt.imshow(perfume_image)
        plt.axis('off')
        plt.title('Original Perfume Bottle')
        plt.show()
    else:
        print(f"Failed to load image: {response.status_code}")
except Exception as e:
    print(f"Error loading image: {e}")

In [None]:
# First edit: Change perfume color to light purple
first_edit_prompt = "change the perfume color to a light purple"

contents = [
    {
        "role": "user",
        "parts": [
            {
                "fileData": {
                    "mimeType": "image/jpeg",
                    "fileUri": "gs://cloud-samples-data/generative-ai/image/perfume.jpg"
                }
            },
            {"text": first_edit_prompt}
        ]
    }
]

payload = create_request_payload(contents)

print("First edit: Changing perfume color to light purple...")
print("Please wait...")

try:
    response = send_gemini_request(payload)
    parts = process_response(response)

    # Save the generated image data and its MIME type for the next step
    generated_image_data = None
    generated_image_mime_type = None
    for part in parts:
        if 'inlineData' in part and 'data' in part['inlineData']:
            generated_image_data = part['inlineData']['data']
            # Fall back to PNG if the response omits the MIME type
            generated_image_mime_type = part['inlineData'].get('mimeType', 'image/png')
            break

    print("First edit completed!")

except Exception as e:
    print(f"Error: {e}")

In [None]:
# Second edit: Add French text to the perfume bottle
if 'generated_image_data' in locals() and generated_image_data:
    second_edit_prompt = "inscribe the word flowers in French on the perfume bottle in a delicate white cursive font"

    contents = [
        {
            "role": "user",
            "parts": [
                {
                    "inlineData": {
                        # Reuse the MIME type reported by the API instead of assuming JPEG
                        "mimeType": generated_image_mime_type,
                        "data": generated_image_data
                    }
                },
                {"text": second_edit_prompt}
            ]
        }
    ]

    payload = create_request_payload(contents)

    print("Second edit: Adding French text to the perfume bottle...")
    print("Please wait...")

    try:
        response = send_gemini_request(payload)
        process_response(response)
        print("Second edit completed!")
    except Exception as e:
        print(f"Error: {e}")
else:
    print("No image data from previous step. Please run the previous cell first.")

### Multiple Reference Images

When editing images with Gemini 2.5 Flash Image, you can also supply multiple input images to create new ones. In this next example, you'll prompt Gemini with an image of a woman and a suitcase. You'll then ask Gemini to combine the objects from these images in order to create a new one.

In [None]:
# Display the reference images
person_url = "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/woman.jpg"
suitcase_url = "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/suitcase.png"

try:
    # Load and display both images
    person_response = requests.get(person_url)
    suitcase_response = requests.get(suitcase_url)

    if person_response.status_code == 200 and suitcase_response.status_code == 200:
        person_image = PIL_Image.open(BytesIO(person_response.content))
        suitcase_image = PIL_Image.open(BytesIO(suitcase_response.content))

        fig, axes = plt.subplots(1, 2, figsize=(12, 6))
        axes[0].imshow(person_image)
        axes[0].set_title('Woman')
        axes[0].axis('off')

        axes[1].imshow(suitcase_image)
        axes[1].set_title('Suitcase')
        axes[1].axis('off')

        plt.tight_layout()
        plt.show()
    else:
        print("Failed to load one or both reference images")

except Exception as e:
    print(f"Error loading images: {e}")

In [None]:
# Generate new image combining multiple references
multi_ref_prompt = "Generate an image of the woman pulling the suitcase in an airport. Separately, write a short caption for this image that would be suitable for a social media post."

contents = [
    {
        "role": "user",
        "parts": [
            {
                "fileData": {
                    "mimeType": "image/png",
                    "fileUri": "gs://cloud-samples-data/generative-ai/image/suitcase.png"
                }
            },
            {
                "fileData": {
                    "mimeType": "image/jpeg",
                    "fileUri": "gs://cloud-samples-data/generative-ai/image/woman.jpg"
                }
            },
            {"text": multi_ref_prompt}
        ]
    }
]

payload = create_request_payload(contents)

print("Generating image combining multiple references...")
print("Please wait...")

try:
    response = send_gemini_request(payload)
    process_response(response)
except Exception as e:
    print(f"Error: {e}")
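
The reference images in this example are passed as `fileData` with `gs://` URIs, while earlier cells embed local files as `inlineData`. A small helper can choose between the two forms based on the source string. The sketch below is an illustration only: `image_part` is a hypothetical name, and it infers the MIME type from the file extension with the standard `mimetypes` module rather than inspecting the file contents.

```python
import base64
import mimetypes


def image_part(source):
    """Return a request part for a gs:// URI or a local image path."""
    mime_type, _ = mimetypes.guess_type(source)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type from: {source}")
    if source.startswith("gs://"):
        # Cloud Storage objects are passed by reference.
        return {"fileData": {"mimeType": mime_type, "fileUri": source}}
    # Local files are embedded as base64 text.
    with open(source, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return {"inlineData": {"mimeType": mime_type, "data": data}}


parts = [
    image_part("gs://cloud-samples-data/generative-ai/image/suitcase.png"),
    image_part("gs://cloud-samples-data/generative-ai/image/woman.jpg"),
    {"text": "Generate an image of the woman pulling the suitcase in an airport."},
]
```

The resulting `parts` list can be placed in a `user` turn in the same way as the hand-written `contents` above.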

### Additional Image Editing Example (Croissant with Chocolate Drizzle)

This example edits an image stored in Cloud Storage: it adds chocolate drizzle to the croissants and overlays text across the top of the image.

In [None]:
# Edit a Cloud Storage image and add a text overlay
croissant_edit_prompt = 'Add some chocolate drizzle to the croissants. Include text across the top of the image that says "Made Fresh Daily".'

contents = [
    {
        "role": "user",
        "parts": [
            {
                "fileData": {
                    "mimeType": "image/jpeg",
                    "fileUri": "gs://cloud-samples-data/generative-ai/image/croissant.jpeg"
                }
            },
            {"text": croissant_edit_prompt}
        ]
    }
]

payload = create_request_payload(contents)

print("Editing croissant image: Adding chocolate drizzle and text...")
print("Please wait...")

try:
    response = send_gemini_request(payload)
    process_response(response)
except Exception as e:
    print(f"Error: {e}")

## Advanced REST API Examples

### Using Raw cURL Commands (Alternative Approach)

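If you prefer to call the API with cURL directly, the cell below writes the request body to `request.json` and prints a matching command. Note that this command authenticates with an API key in the `key` query parameter and calls `streamGenerateContent`, whereas the Python cells above use an OAuth access token and `generateContent`.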

In [None]:
# Create a request.json file for cURL usage
curl_request = {
    "contents": [
        {
            "role": "user",
            "parts": [
                {
                    "fileData": {
                        "mimeType": "image/jpeg",
                        "fileUri": "gs://cloud-samples-data/generative-ai/image/croissant.jpeg"
                    }
                },
                {
                    "text": "Add some chocolate drizzle to the croissants. Include text across the top of the image that says 'Made Fresh Daily'."
                }
            ]
        }
    ],
    "generationConfig": {
        "temperature": 1,
        "maxOutputTokens": 32768,
        "responseModalities": ["TEXT", "IMAGE"],
        "topP": 0.95
    },
    "safetySettings": [
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "OFF"},
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "OFF"},
        {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "OFF"},
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "OFF"}
    ]
}

# Save to request.json file
with open('request.json', 'w') as f:
    json.dump(curl_request, f, indent=2)

print("Created request.json file for cURL usage")
print("You can now use the following cURL command:")
print(f"""curl \\
-X POST \\
-H "Content-Type: application/json" \\
"https://aiplatform.googleapis.com/v1/publishers/google/models/gemini-2.5-flash-image-preview:streamGenerateContent?key=YOUR_API_KEY" \\
-d '@request.json'""")

### Error Handling and Best Practices

In [None]:
def robust_gemini_request(payload, max_retries=3, timeout=300):
    """Send request to Gemini API with retry logic and better error handling"""
    for attempt in range(max_retries):
        wait_seconds = 0
        try:
            # Refresh the access token on every attempt; without the
            # Authorization header the request is not authenticated.
            creds.refresh(Request())
            headers = {
                "Authorization": f"Bearer {creds.token}",
                "Content-Type": "application/json",
            }
            response = requests.post(BASE_URL, headers=headers, json=payload, timeout=timeout)

            if response.status_code == 200:
                return response.json()
            elif response.status_code == 429:  # Rate limit
                print(f"Rate limited. Waiting before retry {attempt + 1}/{max_retries}...")
                wait_seconds = 60  # Wait 1 minute
            elif response.status_code >= 500:  # Server error
                print(f"Server error. Retrying {attempt + 1}/{max_retries}...")
                wait_seconds = 10
            else:
                # Other client errors will not succeed on retry, so fail immediately
                raise Exception(f"API request failed: {response.status_code} - {response.text}")

        except requests.exceptions.Timeout:
            print(f"Request timeout. Retrying {attempt + 1}/{max_retries}...")
        except requests.exceptions.RequestException as e:
            print(f"Request exception: {e}. Retrying {attempt + 1}/{max_retries}...")

        # Skip the wait after the final attempt
        if wait_seconds and attempt < max_retries - 1:
            time.sleep(wait_seconds)

    raise Exception(f"Failed after {max_retries} attempts")

# Example usage with robust error handling
def generate_image_with_retry(prompt):
    """Generate image with robust error handling"""
    contents = [
        {
            "role": "user",
            "parts": [{"text": prompt}]
        }
    ]

    payload = create_request_payload(contents)

    try:
        response = robust_gemini_request(payload)
        return process_response(response)
    except Exception as e:
        print(f"Failed to generate image: {e}")
        return None

print("Robust error handling functions defined.")
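
A common alternative to fixed sleep intervals between retries is exponential backoff with jitter, where the maximum wait doubles on each attempt and the actual wait is drawn at random below that maximum. The sketch below shows only the delay calculation. `backoff_delay` and its default `base` and `cap` values are illustrative assumptions, not values recommended by the API.

```python
import random


def backoff_delay(attempt, base=2.0, cap=60.0, rng=random.random):
    """Return a 'full jitter' delay in seconds for a zero-based attempt."""
    # The ceiling doubles on each attempt until it reaches the cap.
    ceiling = min(cap, base * (2 ** attempt))
    # Pick a random point in [0, ceiling) so clients do not retry in lockstep.
    return rng() * ceiling


for attempt in range(4):
    print(f"attempt {attempt}: sleeping {backoff_delay(attempt):.2f}s")
```

A retry loop could call `time.sleep(backoff_delay(attempt))` in place of a constant wait. The `rng` parameter exists only so the function can be tested deterministically.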

## Summary and Next Steps

This notebook has demonstrated how to use Gemini 2.5 Flash Image generation capabilities through REST API calls, covering:

### Features Implemented:
1. **Text-to-Image Generation**: Create images from text descriptions
2. **Interleaved Text and Images**: Generate tutorials with mixed content
3. **Subject Customization**: Style transfer and character consistency
4. **Multi-turn Image Editing**: Conversational image editing
5. **Multiple Reference Images**: Combine multiple images into new creations

### Key Benefits of REST API Approach:
- **Direct HTTP Control**: Full control over request/response handling
- **Language Agnostic**: Can be used with any programming language
- **Custom Error Handling**: Implement your own retry logic and error management
- **Debugging**: Easy to inspect raw API requests and responses

### Usage Notes:
1. **Authentication**: The Python cells authenticate with an OAuth access token. The cURL example uses an API key instead; replace `YOUR_API_KEY` with your own key
2. **File Access**: For Google Cloud Storage files, use `fileUri` with `gs://` prefix
3. **Local Files**: Use `inlineData` with base64 encoded image data
4. **Safety Settings**: Adjust safety thresholds based on your requirements
5. **Response Modalities**: Always include both `TEXT` and `IMAGE` for image generation

### Best Practices:
- Implement proper error handling and retry logic
- Use appropriate timeouts for image generation requests
- Handle rate limiting gracefully
- Validate image file formats and sizes before sending
- Store generated images appropriately for your use case
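
As an illustration of validating files before sending them, the sketch below checks a file's size and its leading bytes ("magic numbers") for PNG, JPEG, and WebP. The function names are hypothetical, and the 5 MiB default for `max_bytes` is an arbitrary placeholder rather than a documented API limit, so check the current Vertex AI documentation for the real constraints.

```python
import os

# Leading bytes of common image formats.
SIGNATURES = {
    "image/png": b"\x89PNG\r\n\x1a\n",
    "image/jpeg": b"\xff\xd8\xff",
}


def sniff_image_mime_type(data):
    """Return the MIME type implied by the leading bytes, or None."""
    for mime_type, signature in SIGNATURES.items():
        if data.startswith(signature):
            return mime_type
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


def validate_image_file(path, max_bytes=5 * 1024 * 1024):
    """Return the detected MIME type, or raise ValueError if the file is unusable."""
    size = os.path.getsize(path)
    if size > max_bytes:  # max_bytes is an assumed limit, not an API constant
        raise ValueError(f"{path} is {size} bytes; limit is {max_bytes}")
    with open(path, "rb") as f:
        header = f.read(12)
    mime_type = sniff_image_mime_type(header)
    if mime_type is None:
        raise ValueError(f"{path} is not a PNG, JPEG, or WebP image")
    return mime_type
```

The returned MIME type can be used for the `mimeType` field, which avoids relying on a file extension that may be wrong.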

### Further Exploration:
- Experiment with different temperature and topP values
- Try various prompt engineering techniques
- Implement batch processing for multiple images
- Integrate with your application's authentication flow
- Explore advanced safety settings and content filtering

This REST API approach provides the same powerful image generation capabilities as the SDK while giving you maximum flexibility and control over the integration.