# Image Analysis Agent with GCS Artifact Persistence and Vertex AI Agent Engine Deployment

This notebook demonstrates how to build and deploy an image analysis application using the **Google Agent Development Kit (ADK)** and **Vertex AI Agent Engine**. It specifically showcases how to handle persistent image data using **Google Cloud Storage (GCS)** as an artifact service.

---

### Key Components

* **LlmAgent (Root Agent):** Configured with `gemini-2.5-flash`. It is instructed to analyze images and is equipped with the `load_artifacts` tool to retrieve image content when that content is not already in the prompt context.
* **GcsArtifactService:** Manages the storage of files in a GCS bucket. This allows the application to handle large files (like 4K images) or multiple files without bloating the immediate chat history.
* **SaveFilesAsArtifactsPlugin:** An ADK plugin that automatically intercepts files sent in a message and saves them as managed artifacts in the GCS bucket.
* **AdkApp:** The core application wrapper that integrates the agent, the artifact service, and the plugins into a single deployable unit.

### Workflow Summary

1. **Local Execution:**
    * The user sends a base64-encoded image to the `AdkApp` via a `ChatClient`.
    * The `SaveFilesAsArtifactsPlugin` saves this image to GCS.
    * The agent uses the `load_artifacts` tool to "see" the image, then provides a detailed description.
    * In subsequent turns, the agent maintains context, correctly identifying the number of people in the image without needing the file re-uploaded.
2. **Remote Deployment:**
    * The application is packaged and deployed to **Vertex AI Agent Engine**.
    * The notebook verifies the remote deployment by running the same image analysis query against the cloud-hosted instance.
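
The save-then-load flow in the local execution steps can be pictured with a toy model. The sketch below is not the ADK implementation: `save_inline_files`, `load_from_store`, the in-memory `store` dict, and the placeholder text are all hypothetical stand-ins, and the artifact name pattern is only chosen to resemble the names that appear in the notebook output.

```python
import base64

def save_inline_files(message: dict, invocation_id: str, store: dict) -> dict:
    """Toy stand-in for the plugin step: move inline files out of the message."""
    new_parts = []
    for index, part in enumerate(message['parts']):
        inline = part.get('inline_data')
        if inline is None:
            new_parts.append(part)  # plain text parts pass through unchanged
            continue
        # Hypothetical naming, modeled on names seen in the notebook output.
        name = f'artifact_{invocation_id}_{index}'
        store[name] = (inline['mime_type'], base64.b64decode(inline['data']))
        # Leave a short text reference in place of the raw bytes.
        new_parts.append({'text': f'[saved artifact: {name}]'})
    return {**message, 'parts': new_parts}

def load_from_store(names, store: dict):
    """Toy stand-in for the load step: fetch saved files by name."""
    return [store[name] for name in names]

store = {}
message = {
    'role': 'user',
    'parts': [
        {'text': 'describe the image'},
        {'inline_data': {'mime_type': 'image/png',
                         'data': base64.b64encode(b'fake-bytes').decode('utf-8')}},
    ],
}
slim = save_inline_files(message, 'e-123', store)
loaded = load_from_store(['artifact_e-123_1'], store)
```

In this toy version the dict plays the role that the GCS bucket plays in the notebook: the message keeps only a short reference, and the bytes are fetched by name when needed.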



---

### Why this architecture matters

By using `GcsArtifactService`, you avoid passing massive amounts of raw data back and forth in every turn of a conversation. Instead, the data stays in storage, and the agent only "loads" what it needs, making the application more scalable and cost-effective for multi-modal tasks.
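
To put a rough number on the cost of inline data, the sketch below estimates the size of a base64-encoded payload. It is illustrative only: `base64_size` is a hypothetical helper, and the 5 MB image resent on each of 10 turns is an assumed scenario, not a measurement from this notebook.

```python
import base64

def base64_size(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of `raw_bytes` bytes."""
    return 4 * ((raw_bytes + 2) // 3)

# Check the formula against the real encoder on a small payload.
sample = b'\x00' * 1000
assert base64_size(len(sample)) == len(base64.b64encode(sample))

# Assumed scenario: a 5 MB image sent inline on each of 10 turns.
image_bytes = 5 * 1024 * 1024
turns = 10
inline_total = base64_size(image_bytes) * turns
print(f'{inline_total / 1e6:.1f} MB of base64 text over {turns} turns')
```

Base64 encodes every 3 bytes as 4 characters, so an inline image costs about a third more than its raw size each time it is sent.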

## Install packages

In [None]:
%pip install --upgrade --user google-adk

In [None]:
# Reboot kernel
import IPython
app = IPython.Application.instance()
_ = app.kernel.do_shutdown(True)

## Preparation

In [1]:
import base64
import os
import vertexai
from vertexai.agent_engines.templates.adk import AdkApp
from google.adk.agents import LlmAgent
from google.adk.artifacts import GcsArtifactService
from google.adk.plugins.save_files_as_artifacts_plugin import SaveFilesAsArtifactsPlugin
from google.adk.tools import load_artifacts
from google.genai.types import Part, Content

[PROJECT_ID] = !gcloud config list --format 'value(core.project)'
LOCATION = 'us-central1'

vertexai.init(project=PROJECT_ID, location=LOCATION)

os.environ['GOOGLE_CLOUD_PROJECT'] = PROJECT_ID
os.environ['GOOGLE_CLOUD_LOCATION'] = LOCATION
os.environ['GOOGLE_GENAI_USE_VERTEXAI'] = 'True'

BUCKET = f'{PROJECT_ID}_artifacts'
!gsutil ls -b gs://{BUCKET} 2>/dev/null || \
 gsutil mb -l {LOCATION} gs://{BUCKET}

gs://etsuji-15pro-poc_artifacts/
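
The `gsutil` check-then-create step can also be written in Python. The following is a minimal sketch, assuming the `google-cloud-storage` package is installed and application default credentials are configured; `ensure_bucket` is a hypothetical helper name, not part of the notebook.

```python
def ensure_bucket(client, bucket_name: str, location: str):
    """Return the bucket, creating it in `location` if it does not exist yet.

    `client` is expected to be a google.cloud.storage.Client (assumption).
    """
    bucket = client.lookup_bucket(bucket_name)  # None when the bucket is missing
    if bucket is None:
        bucket = client.create_bucket(bucket_name, location=location)
    return bucket

# Usage in the notebook (not executed here):
# from google.cloud import storage
# ensure_bucket(storage.Client(project=PROJECT_ID), BUCKET, LOCATION)
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object when no cloud credentials are available.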


In [2]:
# Chat client to test AdkApp
class ChatClient:
    def __init__(self, app, user_id='default_user'):
        self._app = app
        self._user_id = user_id
        self._session_id = None
        
    async def async_stream_query(self, message):
        if not self._session_id:
            session = await self._app.async_create_session(
                user_id=self._user_id,
            )
            self._session_id = getattr(session, 'id', None) or session['id']

        result = []
        async for event in self._app.async_stream_query(
            user_id=self._user_id,
            session_id=self._session_id,
            message=message,
        ):
            print('====')
            print(event)
            print('====')
            if ('content' in event and 'parts' in event['content']):
                response = '\n'.join(
                    [p['text'] for p in event['content']['parts'] if 'text' in p]
                )
                if response:
                    print(response)
                    result.append(response)
        return result
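
The raw events printed by `ChatClient` are verbose. As a reading aid, the sketch below pulls out the text and tool calls from one event. `summarize_event` is a hypothetical helper, and the sample event is a trimmed copy of the dictionary shape seen in this notebook's output, so other event shapes may need extra handling.

```python
def summarize_event(event: dict) -> dict:
    """Collect the author, text parts, and function calls of one event dict."""
    parts = (event.get('content') or {}).get('parts') or []
    return {
        'author': event.get('author'),
        'texts': [p['text'] for p in parts if 'text' in p],
        'tool_calls': [
            (p['function_call']['name'], p['function_call'].get('args', {}))
            for p in parts if 'function_call' in p
        ],
    }

# Trimmed sample modeled on the first-turn output (signature fields omitted).
sample_event = {
    'content': {
        'parts': [{
            'function_call': {
                'name': 'load_artifacts',
                'args': {'artifact_names': ['artifact_e-0000_1']},
            },
        }],
        'role': 'model',
    },
    'author': 'image_analyst_agent',
}
summary = summarize_event(sample_event)
print(summary)
```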

## Define root agent and AdkApp

In [3]:
root_agent = LlmAgent(
    name='image_analyst_agent',
    model='gemini-2.5-flash',
    instruction='''
Your role is to analyze given image files.
Use load_artifacts() if the image content is not in the context.
''',
    tools=[load_artifacts]
)

def artifact_builder():
    return GcsArtifactService(bucket_name=BUCKET)

app = AdkApp(
    agent=root_agent,
    app_name='image_analyzer_app',
    artifact_service_builder=artifact_builder,
    plugins=[SaveFilesAsArtifactsPlugin()],
)

## Test the local AdkApp

In [4]:
def get_image_data(file_path: str):
    with open(file_path, 'rb') as f:
        image_bytes = f.read()
    return base64.b64encode(image_bytes).decode('utf-8')
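
The message dictionary used in the following cells is built by hand with a fixed `image/png` MIME type. A minimal sketch of a reusable builder is shown below; `build_image_message` is a hypothetical helper, and inferring the MIME type from the file extension with `mimetypes` is an assumption that suits common image formats.

```python
import base64
import mimetypes

def build_image_message(file_path: str, prompt: str) -> dict:
    """Build a user message with a text prompt and one inline image."""
    mime_type, _ = mimetypes.guess_type(file_path)
    if mime_type is None or not mime_type.startswith('image/'):
        raise ValueError(f'Cannot infer an image MIME type for {file_path!r}')
    with open(file_path, 'rb') as f:
        data = base64.b64encode(f.read()).decode('utf-8')
    return {
        'role': 'user',
        'parts': [
            {'text': prompt},
            {'inline_data': {'mime_type': mime_type, 'data': data}},
        ],
    }

# Usage (assumes testimage.png exists next to the notebook):
# message_input = build_image_message('testimage.png', 'describe the image')
```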

In [5]:
client = ChatClient(app)

image_base64 = get_image_data('testimage.png')
message_input = {
        'role': 'user',
        'parts': [
            {'text': 'describe the image'},
            {
                'inline_data': {
                    'mime_type': 'image/png',
                    'data': image_base64
                }
            }
        ]
}

# For the first turn, the agent loads the image content from artifacts into the session context.
_ = await client.async_stream_query(message_input)



====
{'model_version': 'gemini-2.5-flash', 'content': {'parts': [{'function_call': {'id': 'adk-c25d88c5-a7df-485f-a7db-3547960ffe10', 'args': {'artifact_names': ['artifact_e-8b78d592-1c74-443d-a454-8b4f08048a07_1']}, 'name': 'load_artifacts'}, 'thought_signature': 'CvkBAY89a1_SieRM-xcgUPpZnXTGoYJo1xmGjHvmx2wjKehJg5mD0G8qd4B39TN2YXc_Go50e4xXLZqwSFi2NUiBsltFk1CDBPnmVCahFs9uNcsxQdgPANZ-2sygNicglf0jdezGXFD-chWkqibYFD3YfTHIlzwZuJg-hjKvxoz6BWJf5x986fLpDpU_EwBPAInyZR26DOF6y2kLLWK_463Z_MFZlBUasZc1IeEnbCay7UZZag12qRHQwqcdH6kHKc8pnS-oIwz2if56FO_ZKV5PPNTKuLslcT4t_GgwXDagN0XOCcuirk0hTHQkQ0slNqUvUcshvK3RWTmL'}], 'role': 'model'}, 'finish_reason': 'STOP', 'usage_metadata': {'candidates_token_count': 48, 'candidates_tokens_details': [{'modality': 'TEXT', 'token_count': 48}], 'prompt_token_count': 1550, 'prompt_tokens_details': [{'modality': 'IMAGE', 'token_count': 1290}, {'modality': 'TEXT', 'token_count': 260}], 'thoughts_token_count': 79, 'total_token_count': 1677, 'traffic_type': 'ON_DEMAND'}, 'av

In [6]:
# For the second turn, the agent reuses the image content in the session context.
_ = await client.async_stream_query('The number of people in the image?')

====
{'model_version': 'gemini-2.5-flash', 'content': {'parts': [{'text': 'There are three people in the image: one adult (father) and two children.', 'thought_signature': 'CuEBAY89a18AjCCvjCU5gW2P2JZo2UibJkaBX7zXqOvhtoBNkcLHpEjQYaW2TGsZ0cwPwDagR0yt_UZ4QJFgsPfHotEFLOkQSKeRBjvFOFlR0MbB72qI4P19T1ufL9-j3iHhMjvuBreCm21DNO5jw_UCUa-0kZiEpYhmOtcUbkhWOYiQ_3-iCrKMpkw4caPBoA1yAR9he87GV-kFkOIhzWMwtDIu4jwXb5GCuHKp9Sp3Q136Ovn6KbJpVOyt6HzUeNM1ThaXpy0tZEtmAkM0PCbxIbWoNzzrIyj2X1FFpM0m2nQu'}], 'role': 'model'}, 'finish_reason': 'STOP', 'usage_metadata': {'cache_tokens_details': [{'modality': 'IMAGE', 'token_count': 1103}, {'modality': 'TEXT', 'token_count': 491}], 'cached_content_token_count': 1594, 'candidates_token_count': 17, 'candidates_tokens_details': [{'modality': 'TEXT', 'token_count': 17}], 'prompt_token_count': 1786, 'prompt_tokens_details': [{'modality': 'IMAGE', 'token_count': 1290}, {'modality': 'TEXT', 'token_count': 575}], 'thoughts_token_count': 38, 'total_token_count': 1841, 'traffic_t
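
The `usage_metadata` in the second-turn output still lists IMAGE tokens in the prompt, which is consistent with the image remaining in the session context. The sketch below reads those fields; the numbers are copied from the output above, while `tokens_by_modality` and `cache_ratio` are hypothetical helpers added for illustration.

```python
usage = {
    'cached_content_token_count': 1594,
    'prompt_token_count': 1786,
    'prompt_tokens_details': [
        {'modality': 'IMAGE', 'token_count': 1290},
        {'modality': 'TEXT', 'token_count': 575},
    ],
}

def tokens_by_modality(details: list) -> dict:
    """Sum token counts per modality from a *_tokens_details list."""
    totals = {}
    for item in details:
        totals[item['modality']] = totals.get(item['modality'], 0) + item['token_count']
    return totals

def cache_ratio(usage_metadata: dict) -> float:
    """Fraction of prompt tokens reported as served from cache."""
    prompt = usage_metadata.get('prompt_token_count', 0)
    cached = usage_metadata.get('cached_content_token_count', 0)
    return cached / prompt if prompt else 0.0

by_modality = tokens_by_modality(usage['prompt_tokens_details'])
ratio = cache_ratio(usage)
print(by_modality, f'{ratio:.0%} of prompt tokens cached')
```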

## Deploy AdkApp on Agent Engine

In [7]:
agent_engines = vertexai.Client().agent_engines
display_name = 'Image Analyzer App'

remote_app = None
for item in agent_engines.list():
    if item.api_resource.display_name == display_name:
        remote_app = agent_engines.get(name=item.api_resource.name)
        break

if not remote_app:
    remote_app = agent_engines.create(
        agent=app,
        config={
            'agent_framework': 'google-adk',
            'requirements': ['google-adk==1.21.0'],
            # Staging bucket for the deployment package; this notebook does not create it.
            'staging_bucket': f'gs://{PROJECT_ID}',
            'display_name': display_name,
        }
    )

## Test the deployed AdkApp

In [8]:
client = ChatClient(remote_app)

image_base64 = get_image_data('testimage.png')
message_input = {
        'role': 'user',
        'parts': [
            {'text': 'describe the image'},
            {
                'inline_data': {
                    'mime_type': 'image/png',
                    'data': image_base64
                }
            }
        ]
}

_ = await client.async_stream_query(message_input)

====
{'model_version': 'gemini-2.5-flash', 'content': {'parts': [{'function_call': {'id': 'adk-7b8811a4-31ff-4aaa-9ddd-ef70a7d64d32', 'args': {'artifact_names': ['artifact_e-a09308fc-899e-4553-9887-cdde304a3029_1']}, 'name': 'load_artifacts'}, 'thought_signature': 'Cm4Bjz1rX4o4c5SpnflSCf7eF-6NCwlCeMsIV8J7Coz-JA5La3Mf8e_auYavqMl--RWNF1inRtMSMY2MO9ULnZbkjEbD9jcUszpAQnK8-hoWPlYaZslokaCXEd85fGV2k0K5sgRK8jhAXq1EXu7t_A=='}], 'role': 'model'}, 'finish_reason': 'STOP', 'usage_metadata': {'candidates_token_count': 45, 'candidates_tokens_details': [{'modality': 'TEXT', 'token_count': 45}], 'prompt_token_count': 1544, 'prompt_tokens_details': [{'modality': 'IMAGE', 'token_count': 1290}, {'modality': 'TEXT', 'token_count': 254}], 'thoughts_token_count': 17, 'total_token_count': 1606, 'traffic_type': 'ON_DEMAND'}, 'avg_logprobs': -0.16514309777153863, 'invocation_id': 'e-a09308fc-899e-4553-9887-cdde304a3029', 'author': 'image_analyst_agent', 'actions': {'state_delta': {}, 'artifact_delta': {}, 'req

In [9]:
_ = await client.async_stream_query('The number of people in the image?')

====
{'model_version': 'gemini-2.5-flash', 'content': {'parts': [{'text': 'There are three people in the image: one adult and two children.', 'thought_signature': 'CuMBAY89a18MM66vQY__1wk7Y0twAXETEG1MM0S84izRSRqO7QR1VP61W5IsYaVNA9PUThz2oQR-EDqh6QlLfqhABTp4CT0onfR8yqhKd7QXR6iseZoq4VtgjBDQTSCyKrUS8aQnIgag6yNhdTs_wPV_hS-otvr_2p-9-LziGTPlBCx4mu0ulm2I0jNTwKmWyGWwo-3fiKGXhoY3uFjiZlvo1_psNAQ_E4DXkD6H6vcwNnoLK9q174C1_20UX1SqBUBPYHZKuaSRKRMg-rwT2UcQTa-VGNrN8a0pCUT1wLjSUaRIEeM='}], 'role': 'model'}, 'finish_reason': 'STOP', 'usage_metadata': {'cache_tokens_details': [{'modality': 'TEXT', 'token_count': 477}, {'modality': 'IMAGE', 'token_count': 1112}], 'cached_content_token_count': 1589, 'candidates_token_count': 14, 'candidates_tokens_details': [{'modality': 'TEXT', 'token_count': 14}], 'prompt_token_count': 1827, 'prompt_tokens_details': [{'modality': 'TEXT', 'token_count': 554}, {'modality': 'IMAGE', 'token_count': 1290}], 'thoughts_token_count': 37, 'total_token_count': 1878, 'traffic_type':