In [None]:
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# EAP: Anon Bob on Vertex AI

| Author |
| --- |
| [Katie Nguyen](https://github.com/katiemn) |

## Overview

**CONFIDENTIAL, DO NOT CIRCULATE**

<font color='red'> Per the Early Access Program terms, do not publicly share or publish information about this feature (including its outputs) until given permission by the Vertex AI team. Do not use the model in production or with external end users.</font>

This notebook shows how to use the upcoming image model (codenamed `anon-bob`). The model can show its work, so you can inspect the "thought process" behind each generated output.

In this tutorial, you'll use the Google Gen AI SDK to try the model on Vertex AI in the following scenarios:

- Image generation:
  - Text-to-image generation
  - Model thoughts
  - Grounding with search
- Image editing:
  - Localization
  - Multi-turn image editing (chat)
  - Editing with multiple reference images

**NOTE:** Expect higher latency when using this model compared to Gemini 2.5 Flash Image (Nano Banana) as a result of the more advanced capabilities.

**DISCLAIMER: Features are subject to change at any time. This model is experimental, so it may be unstable. Testing capacity is very limited, so please keep usage to <font color='red'>10 QPM</font> (queries per minute) or less.**
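
One way to stay within the 10 QPM limit is to space requests out on the client side. The sketch below is only an illustration: `MinIntervalThrottle` is a hypothetical helper, not part of the Gen AI SDK, and it assumes that keeping calls at least 60 / 10 = 6 seconds apart is an acceptable way to respect the limit.

```python
import time


class MinIntervalThrottle:
    """Hypothetical helper: enforce a minimum gap between consecutive calls."""

    def __init__(self, max_per_minute=10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until at least `min_interval` seconds have passed since the last call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


throttle = MinIntervalThrottle(max_per_minute=10)
# Call throttle.wait() immediately before each generate_content request.
```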

# We want your feedback!

Your experience and insights are important to us as we refine Image-out for general availability. We'd particularly love to hear about:

* What use cases do you have in mind for Anon-Bob? Please provide a quote and example image for how you are using Anon-Bob and where it was helpful.
* How does the new text rendering capability impact your workflows?
* Are you finding the new Google Search integration useful for creating grounded imagery?
* What are your thoughts on the latency versus quality trade-off for this larger model?


Give us your feedback on this [form](https://forms.gle/HFVNzat1NaMZLPaL8). Also tell us if you are interested in **being featured in our launch communications**.

## Get started

### Install Google Gen AI SDK for Python

For confidentiality reasons, the SDK is in a private GCP bucket.

In [None]:
%pip install --upgrade --quiet google-genai

### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the following cell to authenticate your environment.

In [None]:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

### Import libraries

In [None]:
from IPython.display import Image, Markdown, display
from google import genai
from google.genai import types

### Set Google Cloud project information and create client

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [None]:
import os

PROJECT_ID = ""  # @param {type: "string", placeholder: "[your-project-id]", isTemplate: true}
if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    PROJECT_ID = str(os.environ.get("GOOGLE_CLOUD_PROJECT"))

LOCATION = "global"

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
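
If neither the form field nor `GOOGLE_CLOUD_PROJECT` is set, `str(os.environ.get(...))` in the cell above evaluates to the string `"None"`, and the failure only surfaces on the first request. A small guard can fail fast instead. This is a minimal sketch; `resolve_project_id` is a hypothetical helper, not part of the Gen AI SDK.

```python
import os


def resolve_project_id(explicit="", env=None):
    """Return a project ID from the explicit value or GOOGLE_CLOUD_PROJECT.

    Illustrative helper only; raises ValueError when no project is configured.
    """
    env = os.environ if env is None else env
    placeholder = "[your-project-id]"
    if explicit and explicit != placeholder:
        return explicit
    project = env.get("GOOGLE_CLOUD_PROJECT", "")
    if not project:
        raise ValueError(
            "Set PROJECT_ID or the GOOGLE_CLOUD_PROJECT environment variable."
        )
    return project
```

You could then write `PROJECT_ID = resolve_project_id(PROJECT_ID)` before creating the client.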

### Load the model

In [None]:
MODEL_ID = "anon-bob"

## Image generation

### Text-to-image

In the cell below, you'll call the `generate_content` method with the following arguments:

  - `contents`: The input to the model. Here it is `prompt`, a text-only user message describing the image to generate.
  - `config`: A `GenerateContentConfig` that specifies content settings.
    - `response_modalities`: To generate an image, you must include `IMAGE` in the `response_modalities` list. To get both text and images, specify `IMAGE` and `TEXT`.
    - `image_config`: An `ImageConfig` that sets the `aspect_ratio`. Valid ratios are 1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, and 21:9.
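
Because only the ratios listed above are accepted, it can help to snap a desired width and height to the nearest supported ratio before building the request. The following is a minimal sketch that assumes the list above is complete; `closest_aspect_ratio` is a hypothetical helper, not an SDK function, and it compares ratios by simple absolute difference.

```python
VALID_ASPECT_RATIOS = [
    "1:1", "3:2", "2:3", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
]


def closest_aspect_ratio(width, height):
    """Return the supported ratio string closest to width / height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = width / height

    def distance(ratio):
        w, h = ratio.split(":")
        return abs(int(w) / int(h) - target)

    return min(VALID_ASPECT_RATIOS, key=distance)


print(closest_aspect_ratio(1920, 1080))  # 16:9
print(closest_aspect_ratio(1000, 1000))  # 1:1
```

The returned string can be passed as `aspect_ratio` in `types.ImageConfig`.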

All generated images include a [SynthID watermark](https://deepmind.google/technologies/synthid/), which can be verified via the Media Studio in [Vertex AI Studio](https://cloud.google.com/generative-ai-studio?hl=en).

In [None]:
prompt = """
Generate a hyper-realistic infographic of a gourmet cheeseburger, deconstructed to show the texture of the toasted brioche bun, the seared crust of the patty, and the glistening melt of the cheese.
"""
response = client.models.generate_content(
    model=MODEL_ID,
    contents=prompt,
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE', 'TEXT'],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",
        ),
    ),
)

# Check for errors if an image is not generated
if response.candidates[0].finish_reason != types.FinishReason.STOP:
    reason = response.candidates[0].finish_reason
    raise ValueError(f"Prompt Content Error: {reason}")

for part in response.candidates[0].content.parts:
    if part.thought:
        continue # skip displaying thoughts
    if part.inline_data:
        display(Image(data=part.inline_data.data, width=1000))
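
To keep a generated image rather than only display it, you can write the bytes from `inline_data` to disk. This is a minimal sketch, assuming each image part exposes `inline_data.data` (bytes) and `inline_data.mime_type` as in the loop above; `save_image_parts` is a hypothetical helper, not part of the SDK.

```python
import mimetypes
from pathlib import Path


def save_image_parts(parts, out_dir=".", prefix="image"):
    """Write every non-thought image part to disk and return the file paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for part in parts:
        blob = getattr(part, "inline_data", None)
        if getattr(part, "thought", None) or blob is None:
            continue  # skip thought parts and text-only parts
        # Fall back to .png when the MIME type is missing or unknown.
        ext = mimetypes.guess_extension(blob.mime_type or "") or ".png"
        path = out / f"{prefix}_{len(paths)}{ext}"
        path.write_bytes(blob.data)
        paths.append(path)
    return paths
```

For example, `save_image_parts(response.candidates[0].content.parts, out_dir="outputs")` would save the final image from the request above.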

### See the thoughts

This is a thinking model, so you can inspect the thoughts that led to the generated image.

In [None]:
# Display only the thought parts, which can be text or intermediate images
for part in response.candidates[0].content.parts:
    if part.thought:
        if part.text:
            display(Markdown(part.text))
        elif part.inline_data:
            display(Image(data=part.inline_data.data, width=500))
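
If you want the thoughts and the final output as separate collections instead of filtering the same response twice, a small helper can partition the parts. This sketch assumes parts carry the `thought` attribute used above; `split_thoughts` is a hypothetical name.

```python
def split_thoughts(parts):
    """Partition response parts into (thought_parts, output_parts)."""
    thoughts, outputs = [], []
    for part in parts:
        if getattr(part, "thought", None):
            thoughts.append(part)
        else:
            outputs.append(part)
    return thoughts, outputs


# Example usage with a real response:
# thoughts, outputs = split_thoughts(response.candidates[0].content.parts)
```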

### Use search grounding

Note that grounding uses only text results from Google Search. Images found through Search are not used.

In [None]:
prompt = """
Visualize the current weather forecast for the next 5 days in San Francisco in a clean, modern weather chart. Add a visual on what I could wear each day.
"""
google_search = types.Tool(google_search=types.GoogleSearch())

response = client.models.generate_content(
    model=MODEL_ID,
    contents=prompt,
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="21:9",
        ),
        tools=[google_search],
    )
)

for part in response.candidates[0].content.parts:
    if part.text:
        display(Markdown(part.text))
    if part.inline_data:
        display(Image(data=part.inline_data.data, width=500))
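
When grounding is enabled, the Gen AI SDK can attach `grounding_metadata` to a candidate, with web sources listed under `grounding_chunks`. Whether this preview model populates those fields is an assumption, so the sketch below tolerates missing values; `list_grounding_sources` is a hypothetical helper.

```python
def list_grounding_sources(candidate):
    """Return (title, uri) pairs for the web sources attached to a candidate."""
    metadata = getattr(candidate, "grounding_metadata", None)
    chunks = getattr(metadata, "grounding_chunks", None) or []
    sources = []
    for chunk in chunks:
        web = getattr(chunk, "web", None)
        if web is not None and getattr(web, "uri", None):
            sources.append((getattr(web, "title", None) or "", web.uri))
    return sources


# Example usage with a real response:
# for title, uri in list_grounding_sources(response.candidates[0]):
#     print(title, uri)
```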

## Image editing

You can also edit images with this model by passing the original image as part of the prompt.

### Localization

Image editing can translate the text in an image. Start by downloading the sample image and displaying it.

In [None]:
!wget https://storage.googleapis.com/cloud-samples-data/generative-ai/image/flying-sneakers.png

starting_image = "flying-sneakers.png"
display(Image(filename=starting_image, width=500))

In [None]:
with open(starting_image, "rb") as f:
    image = f.read()

response = client.models.generate_content(
    model=MODEL_ID,
    contents=[
        types.Part.from_bytes(
            data=image,
            mime_type="image/png",
        ),
        "Change the text in this infographic from English to Spanish.",
    ],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            image_size="1K",
        ),
    )
)

for part in response.candidates[0].content.parts:
    if part.text:
        display(Markdown(part.text))
    if part.inline_data:
        display(Image(data=part.inline_data.data, width=500))
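
The cell above hard-codes `mime_type="image/png"` for `Part.from_bytes`. When you edit your own files, you could derive the MIME type from the file name instead. This is a minimal sketch using the standard-library `mimetypes` module; `image_mime_type` is a hypothetical helper, and it trusts the file extension rather than inspecting the bytes.

```python
import mimetypes


def image_mime_type(path):
    """Guess an image MIME type from a file name, or raise ValueError."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    return mime


print(image_mime_type("flying-sneakers.png"))  # image/png
```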

### Multi-turn image editing (chat)

In this next section, you'll generate a starting image and iteratively alter certain aspects of the image by chatting with the model.

In [None]:
chat = client.chats.create(
    model=MODEL_ID,
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

message = "Create an image of a clear perfume bottle sitting on a vanity."
response = chat.send_message(message)

for part in response.candidates[0].content.parts:
    if part.text:
        display(Markdown(part.text))
    if part.inline_data:
        display(Image(data=part.inline_data.data, width=500))

In [None]:
message = "Make the perfume bottle purple and add a vase of hydrangeas next to the bottle."
response = chat.send_message(message)

for part in response.candidates[0].content.parts:
    if part.text:
        display(Markdown(part.text))
    if part.inline_data:
        display(Image(data=part.inline_data.data, width=500))
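
The two turns above repeat the same send-and-display pattern. For a longer editing session, you could loop over a list of instructions and collect the image from each turn. This is a sketch under the assumption that `chat` behaves like the object returned by `client.chats.create`; `apply_edits` is a hypothetical helper.

```python
def apply_edits(chat, messages):
    """Send each message in order and return the image bytes from each turn.

    A turn that produces no image contributes None.
    """
    results = []
    for message in messages:
        response = chat.send_message(message)
        image_bytes = None
        for part in response.candidates[0].content.parts:
            blob = getattr(part, "inline_data", None)
            if blob is not None and not getattr(part, "thought", None):
                image_bytes = blob.data  # keep the last non-thought image
        results.append(image_bytes)
    return results


# Example usage with the chat created above:
# images = apply_edits(chat, ["Make the bottle green.", "Add a mirror behind it."])
```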

### Multiple reference images

While Gemini 2.5 Flash Image is limited to 3 reference images, this model supports up to 6.

Run the following cell to visualize the starting images stored in Cloud Storage.

In [None]:
import requests
from PIL import Image as PIL_Image
from io import BytesIO
import matplotlib.pyplot as plt

image_urls = [
    "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/woman.jpg",
    "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/suitcase.png",
    "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/armchair.png",
    "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/man-in-field.png",
    "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/shoes.jpg",
    "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/living-room.png",
]

fig, axes = plt.subplots(2, 3, figsize=(12, 8))
for i, ax in enumerate(axes.flatten()):
    ax.imshow(PIL_Image.open(BytesIO(requests.get(image_urls[i]).content)))
    ax.axis("off")
plt.show()

The process for sending the request is similar to previous image editing calls. The main difference is that you will provide multiple `Part.from_uri` instances, one for each reference image.

In [None]:
response = client.models.generate_content(
    model=MODEL_ID,
    contents=[
        types.Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/image/woman.jpg",
            mime_type="image/jpeg",
        ),
        types.Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/image/suitcase.png",
            mime_type="image/png",
        ),
        types.Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/image/armchair.png",
            mime_type="image/png",
        ),
        types.Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/image/man-in-field.png",
            mime_type="image/png",
        ),
        types.Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/image/shoes.jpg",
            mime_type="image/jpeg",
        ),
        types.Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/image/living-room.png",
            mime_type="image/png",
        ),
        "Generate an image of a woman sitting in a living room with a man, both wearing sneakers. The woman is sitting in a white armchair with a blue suitcase next to her.",
    ],
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",
        ),
    ),
)


for part in response.candidates[0].content.parts:
    if part.text:
        display(Markdown(part.text))
    if part.inline_data:
        display(Image(data=part.inline_data.data, width=500))
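
The six `Part.from_uri` entries above mirror the public URLs used for the preview grid. To avoid maintaining both lists by hand, you could derive the `gs://` URI and MIME type from each URL and enforce the six-image limit in one place. This is a minimal sketch; `to_reference_specs` is a hypothetical helper, and it assumes every URL starts with `https://storage.googleapis.com/` and has a recognizable image extension.

```python
import mimetypes

MAX_REFERENCE_IMAGES = 6  # limit stated in this notebook
HTTPS_PREFIX = "https://storage.googleapis.com/"


def to_reference_specs(urls):
    """Map public Cloud Storage URLs to (gs_uri, mime_type) pairs."""
    if len(urls) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"At most {MAX_REFERENCE_IMAGES} reference images are supported, got {len(urls)}"
        )
    specs = []
    for url in urls:
        if not url.startswith(HTTPS_PREFIX):
            raise ValueError(f"Not a storage.googleapis.com URL: {url}")
        mime, _ = mimetypes.guess_type(url)
        if mime is None:
            raise ValueError(f"Unknown file type: {url}")
        specs.append(("gs://" + url[len(HTTPS_PREFIX):], mime))
    return specs


# Example usage with the SDK:
# parts = [
#     types.Part.from_uri(file_uri=uri, mime_type=mime)
#     for uri, mime in to_reference_specs(image_urls)
# ]
```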
