# Building an Agent with the Gemini Interactions API

This notebook walks through building an AI agent step by step using the Gemini Interactions API.

**Prerequisites:** 
- Install the SDK: `pip install google-genai`
- Set your `GEMINI_API_KEY` environment variable ([Get it in AI Studio](https://aistudio.google.com/app/apikey))

In [None]:
# Install the SDK if needed
!pip install google-genai

## Step 1: Basic Text Generation

We start with a simple Agent class that uses the Interactions API's server-side state to maintain conversation history.

In [1]:
from google import genai

class Agent:
    def __init__(self, model: str):
        self.model = model
        self.client = genai.Client()
        self.last_interaction_id = None

    def run(self, contents: str):
        response = self.client.interactions.create(
            model=self.model,
            input=contents,
            previous_interaction_id=self.last_interaction_id
        )
        self.last_interaction_id = response.id
        return response

agent = Agent(model="gemini-3-flash-preview")
response1 = agent.run(
    contents="Hello, what are the top 3 cities in Germany to visit? Only return the names of the cities."
)

print(f"Model: {response1.outputs[-1].text}")


Model: Berlin
Munich
Hamburg


In [2]:
# Test conversation history - the model should remember the previous response
response2 = agent.run(
    contents="Tell me something about the second city."
)

print(f"Model: {response2.outputs[-1].text}")

Model: Munich, the capital of Bavaria, is famous for its blend of traditional culture and modern sophistication. Here are some key highlights:

*   **Culture and Traditions:** It is home to the world-renowned **Oktoberfest** and historic beer halls like the **Hofbräuhaus**.
*   **Architecture:** The city center is anchored by **Marienplatz**, featuring the New Town Hall (Neues Rathaus) and its famous Glockenspiel show.
*   **Parks:** The **English Garden** (Englischer Garten) is one of the world's largest urban parks, even featuring a river wave where people surf year-round.
*   **Industry:** Munich is a major hub for technology and automotive history, serving as the headquarters for **BMW**, which has a massive museum and delivery center (BMW Welt) open to visitors.
*   **Art and History:** It boasts world-class museums, such as the **Alte Pinakothek** (art) and the **Deutsches Museum** (the world's largest museum of science and technology).


This is *not* an agent yet - it's a standard chatbot. It maintains state but cannot take action.

## Step 2: Adding Tools (Function Calling)

To turn this into an agent, we add **Tool Use**. We define tools with:
1. The **implementation** (Python code)
2. The **definition** (JSON schema the LLM sees)

**Best Practice:** Use clear `description` fields - the model relies on these to understand when and how to use each tool.

In [3]:
import os

# Tool definitions (JSON schema for the LLM)
read_file_tool = {
    "type": "function",
    "name": "read_file",
    "description": "Reads a file and returns its contents.",
    "parameters": {
        "type": "object",
        "properties": {
            "file_path": {
                "type": "string",
                "description": "Path to the file to read.",
            }
        },
        "required": ["file_path"],
    },
}

list_dir_tool = {
    "type": "function",
    "name": "list_dir",
    "description": "Lists the contents of a directory.",
    "parameters": {
        "type": "object",
        "properties": {
            "directory_path": {
                "type": "string",
                "description": "Path to the directory to list.",
            }
        },
        "required": ["directory_path"],
    },
}

write_file_tool = {
    "type": "function",
    "name": "write_file",
    "description": "Writes a file with the given contents.",
    "parameters": {
        "type": "object",
        "properties": {
            "file_path": {
                "type": "string",
                "description": "Path to the file to write.",
            },
            "contents": {
                "type": "string",
                "description": "Contents to write to the file.",
            },
        },
        "required": ["file_path", "contents"],
    },
}

# Tool implementations (actual Python code)
def read_file(file_path: str) -> str:
    with open(file_path, "r") as f:
        return f.read()

def write_file(file_path: str, contents: str) -> bool:
    with open(file_path, "w") as f:
        f.write(contents)
    return True

def list_dir(directory_path: str) -> list[str]:
    full_path = os.path.expanduser(directory_path)
    return os.listdir(full_path)

# Registry mapping tool names to definitions and implementations
file_tools = {
    "read_file": {"definition": read_file_tool, "function": read_file},
    "write_file": {"definition": write_file_tool, "function": write_file},
    "list_dir": {"definition": list_dir_tool, "function": list_dir},
}
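
Because each tool has two halves, a schema and a function, the two can drift apart: a renamed parameter in one and not the other only shows up when the model actually calls the tool. The sketch below is one possible sanity check, not part of the SDK. `check_tool_registry`, `shout`, and `demo_tools` are hypothetical names introduced here, and the check assumes the registry layout used above (`"definition"` and `"function"` keys).

```python
import inspect


def check_tool_registry(tools: dict) -> list[str]:
    """Return a list of mismatches between each tool's schema and its function."""
    problems = []
    for name, entry in tools.items():
        definition = entry["definition"]
        func = entry["function"]
        if definition.get("name") != name:
            problems.append(f"{name}: definition is named {definition.get('name')!r}")
        if not definition.get("description"):
            problems.append(f"{name}: missing description")
        # Parameter names in the schema must match the Python signature,
        # because the agent calls the function with **arguments.
        schema_params = set(definition["parameters"]["properties"])
        func_params = set(inspect.signature(func).parameters)
        if schema_params != func_params:
            problems.append(
                f"{name}: schema parameters {sorted(schema_params)} "
                f"do not match function parameters {sorted(func_params)}"
            )
    return problems


# Hypothetical tool whose schema uses the wrong parameter name on purpose
def shout(text: str) -> str:
    return text.upper()


demo_tools = {
    "shout": {
        "definition": {
            "type": "function",
            "name": "shout",
            "description": "Returns the text in upper case.",
            "parameters": {
                "type": "object",
                "properties": {"message": {"type": "string"}},  # should be "text"
                "required": ["message"],
            },
        },
        "function": shout,
    }
}

for problem in check_tool_registry(demo_tools):
    print(problem)
```

Running the same check on `file_tools` before creating the agent should return an empty list if the definitions and implementations agree.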

In [4]:
# Agent with tools - but no execution loop yet
class Agent:
    def __init__(self, model: str, tools: dict):
        self.model = model
        self.client = genai.Client()
        self.last_interaction_id = None
        self.tools = tools

    def run(self, contents: str):
        response = self.client.interactions.create(
            model=self.model,
            input=contents,
            tools=[tool["definition"] for tool in self.tools.values()],
            previous_interaction_id=self.last_interaction_id
        )
        self.last_interaction_id = response.id
        return response

agent = Agent(model="gemini-3-flash-preview", tools=file_tools)

response = agent.run(
    contents="Can you list my files in the current directory?"
)
for output in response.outputs:
    if output.type == "function_call":
        print(f"Function call: {output.name} with arguments {output.arguments}")

Function call: list_dir with arguments {'directory_path': '.'}


The model successfully requested a tool call! Now we need to execute it and send the result back.

## Step 3: Closing the Loop (The Full Agent)

An agent runs in a loop: the model requests a tool call, our code executes it and returns the result, and this repeats until the model answers without requesting another tool.

**Key Concept: `previous_interaction_id`**  
Instead of re-sending the entire conversation history, the Interactions API uses `previous_interaction_id` to chain interactions. The server maintains the context.
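
To build intuition for the chaining, here is a toy in-memory model of the idea. It is not the real service and says nothing about how Gemini stores state internally. `ToyInteractionStore` and its methods are hypothetical, and the only assumption is that each interaction is saved under an id together with a pointer to its parent.

```python
import uuid


class ToyInteractionStore:
    """In-memory stand-in for server-side state (illustration only)."""

    def __init__(self):
        self._interactions = {}  # id -> (input, previous id)

    def create(self, input, previous_interaction_id=None):
        # The parameter names mirror the keywords used in the notebook's calls.
        interaction_id = uuid.uuid4().hex
        self._interactions[interaction_id] = (input, previous_interaction_id)
        return interaction_id

    def history(self, interaction_id):
        """Walk the chain backwards and return the turns, oldest first."""
        turns = []
        while interaction_id is not None:
            turn, interaction_id = self._interactions[interaction_id]
            turns.append(turn)
        return list(reversed(turns))


store = ToyInteractionStore()
first = store.create("What are the top 3 cities in Germany to visit?")
second = store.create(
    "Tell me something about the second city.",
    previous_interaction_id=first,
)

# Only the new message was sent on the second call, yet the id is enough
# to recover the whole conversation.
print(store.history(second))
```

This is why the `Agent` class only needs to remember `last_interaction_id` rather than a growing list of messages.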

In [9]:
class Agent:
    def __init__(self, model: str, tools: dict, system_instruction: str = "You are a helpful assistant."):
        self.model = model
        self.client = genai.Client()
        self.last_interaction_id = None
        self.tools = tools
        self.system_instruction = system_instruction

    def run(self, contents: str | list):        
        response = self.client.interactions.create(
            model=self.model,
            input=contents,
            system_instruction=self.system_instruction,
            tools=[tool["definition"] for tool in self.tools.values()],
            previous_interaction_id=self.last_interaction_id
        )
        self.last_interaction_id = response.id

        # Execute any tool calls
        tool_results = []
        for output in response.outputs:
            if output.type == "function_call":
                print(f"[Function Call] {output.name}({output.arguments})")
                
                if output.name in self.tools:
                    result = self.tools[output.name]["function"](**output.arguments)
                else:
                    result = "Error: Tool not found"
                
                print(f"[Function Response] {result}")
                tool_results.append({
                    "type": "function_result",
                    "call_id": output.id,
                    "name": output.name,
                    "result": str(result)
                })
        
        # If there were tool calls, send results back to the model
        if tool_results:
            return self.run(tool_results)
        
        return response
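
One gap in the loop above: if a tool raises (for example, `read_file` on a missing path), the exception propagates and the whole run stops. A possible alternative is to catch the failure and report it to the model as the tool result, so it can retry or explain. The helper below is a sketch under that assumption. `execute_tool_call` and `math_tools` are hypothetical names, and the returned dictionary simply copies the `function_result` shape used in the class above.

```python
def execute_tool_call(tools: dict, name: str, arguments: dict, call_id: str) -> dict:
    """Run one tool call and always return a function_result payload."""
    if name not in tools:
        result = f"Error: tool '{name}' not found"
    else:
        try:
            result = tools[name]["function"](**arguments)
        except Exception as exc:
            # Hand the failure back to the model instead of crashing the agent.
            result = f"Error: {type(exc).__name__}: {exc}"
    return {
        "type": "function_result",
        "call_id": call_id,
        "name": name,
        "result": str(result),
    }


# Hypothetical registry with one tool that fails on bad input
math_tools = {"divide": {"function": lambda a, b: a / b}}

ok = execute_tool_call(math_tools, "divide", {"a": 6, "b": 3}, call_id="call-1")
failed = execute_tool_call(math_tools, "divide", {"a": 1, "b": 0}, call_id="call-2")
print(ok["result"])      # 2.0
print(failed["result"])  # Error: ZeroDivisionError: division by zero
```

Inside `Agent.run`, the body of the `function_call` branch could then be reduced to a single call such as `execute_tool_call(self.tools, output.name, output.arguments, output.id)`.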

In [10]:
# Test the full agent loop
agent = Agent(
    model="gemini-3-flash-preview", 
    tools=file_tools, 
    system_instruction="You are a helpful Coding Assistant. Respond like you are Linus Torvalds."
)

response = agent.run(
    contents="Can you list my files in the current directory?"
)
print(f"\nFinal Response:\n{response.outputs[-1].text}")

[Function Call] list_dir({'directory_path': '.'})
[Function Response] ['gemini-batch-api.ipynb', 'gemini-adk-mcp.ipynb', 'gemini-mcp-example.ipynb', 'gemini-with-openai-sdk.ipynb', 'gemini-text-to-speech.ipynb', 'gemini-sequential-function-calling.ipynb', 'gemini-meta-prompt-structured-outputs.ipynb', 'gemini-pydanticai-agent.ipynb', 'gemini-context-caching.ipynb', 'interactions-build-agents.ipynb', 'gemini-context-url.ipynb', 'interactions-deep-research-getting-started.ipynb', 'gemini-fewshot-pdf.ipynb', 'gemini-file-editing.ipynb', 'gemini-structured-outputs.ipynb', 'gemini-google-search.ipynb', 'gemini-analyze-transcribe-youtube.ipynb', 'gemini-crewai.ipynb', 'gemini-native-image-out.ipynb', 'gemini-code-executor-data-analysis.ipynb', 'gemma-function-calling.ipynb', 'gemma-with-genai-sdk.ipynb', 'gemini-langchain.ipynb']

Final Response:
Alright, here's your list. It looks like a damn graveyard of Jupyter notebooks. I hope you're actually building something useful and not just playi

🎉 **Congratulations!** You just built your first functioning agent using the Interactions API.

## Step 4: Multi-turn CLI Agent

Now we can run the agent in a simple interactive loop. Uncomment and run the cell below to try it.

Type `exit` or `quit` to stop.

In [None]:
# Uncomment to run the interactive CLI agent

# agent = Agent(
#     model="gemini-3-flash-preview", 
#     tools=file_tools, 
#     system_instruction="You are a helpful Coding Assistant. Respond like you are Linus Torvalds."
# )

# print("Agent ready. Ask it to check files in this directory.")
# while True:
#     user_input = input("You: ")
#     if user_input.lower() in ['exit', 'quit']:
#         break
#     response = agent.run(user_input)
#     print(f"Linus: {response.outputs[-1].text}\n")
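
The loop above is hard to exercise without a terminal and a live API key. One option, sketched here, is to pass the reader, the writer, and the agent call in as parameters so the same loop can run against scripted input. `chat_loop` and its parameters are hypothetical names, and skipping empty lines is an added design choice, not something the original loop does.

```python
from typing import Callable


def chat_loop(
    run: Callable[[str], str],
    read: Callable[[str], str] = input,
    write: Callable[[str], None] = print,
) -> int:
    """Read user turns until 'exit' or 'quit'; return the number of turns handled."""
    turns = 0
    while True:
        user_input = read("You: ").strip()
        if user_input.lower() in {"exit", "quit"}:
            return turns
        if not user_input:
            continue  # ignore empty lines instead of sending them to the model
        write(f"Agent: {run(user_input)}\n")
        turns += 1


# Scripted stand-ins for the agent and the terminal
scripted = iter(["list my files", "", "quit"])
replies = []
handled = chat_loop(
    run=lambda text: f"echo: {text}",
    read=lambda prompt: next(scripted),
    write=replies.append,
)
print(handled, replies)
```

With the agent from this notebook, the call would look like `chat_loop(lambda text: agent.run(text).outputs[-1].text)`.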