# Lesson 6: Tools

This notebook explores **Tools (Function Calling)**, one of the key building blocks of an AI agent. 

We will use the `google-genai` library to interact with Google's Gemini models.

**Learning Objectives:**

1.  **Understand and implement tool use (function calling)** from scratch to allow an LLM to interact with external systems.
2.  **Build a custom tool calling framework** using decorators similar to production frameworks like LangGraph.
3.  **Use Gemini's native tool calling API** for production-ready implementations.
4.  **Implement structured data extraction** using Pydantic models as tools for reliable structured outputs.
5.  **Run tools in a loop** to handle multi-step tasks and understand the limitations that lead to the popular ReAct pattern.

## 1. Setup

First, we define some standard Magic Python commands to autoreload Python packages whenever they change:

In [1]:
%load_ext autoreload
%autoreload 2

### Set Up Python Environment

To set up your Python virtual environment using `uv` and load it into the notebook, follow the step-by-step instructions in the `Course Admin` lesson at the beginning of the course.

**TL;DR:** Make sure the correct kernel, pointing to your `uv` virtual environment, is selected.

### Configure Gemini API

To configure the Gemini API, follow the step-by-step instructions in the `Course Admin` lesson.

Here is a quick checklist of what you need to run this notebook:

1.  Get your key from [Google AI Studio](https://aistudio.google.com/app/apikey).
2.  From the root of your project, run: `cp .env.example .env` 
3.  Within the `.env` file, fill in the `GOOGLE_API_KEY` variable:

Now, the code below will load the key from the `.env` file:

In [None]:
from utils import env

env.load(required_env_vars=["GOOGLE_API_KEY"])

Trying to load environment variables from `/Users/pauliusztin/Documents/01_projects/TAI/course-ai-agents/.env`
Environment variables loaded successfully.


### Import Key Packages

In [None]:
import json
from typing import Any

from google import genai
from google.genai import types
from pydantic import BaseModel, Field

from utils import pretty_print

### Initialize the Gemini Client

In [4]:
client = genai.Client()

### Define Constants

We will use the `gemini-2.5-flash` model, which is fast and cost-effective. We also define a sample financial document that will be used throughout our examples.

In [5]:
MODEL_ID = "gemini-2.5-flash"

DOCUMENT = """
# Q3 2023 Financial Performance Analysis

The Q3 earnings report shows a 20% increase in revenue and a 15% growth in user engagement, 
beating market expectations. These impressive results reflect our successful product strategy 
and strong market positioning.

Our core business segments demonstrated remarkable resilience, with digital services leading 
the growth at 25% year-over-year. The expansion into new markets has proven particularly 
successful, contributing to 30% of the total revenue increase.

Customer acquisition costs decreased by 10% while retention rates improved to 92%, 
marking our best performance to date. These metrics, combined with our healthy cash flow 
position, provide a strong foundation for continued growth into Q4 and beyond.
"""

## 2. Implementing tool calls from scratch

LLMs are trained on text and can't perform actions in the real world on their own. Tools (or function calling) are the mechanism we use to bridge this gap. We provide the LLM with a list of available tools, and it can decide which one to use and with what arguments to fulfill a user's request.

The process of calling a tool looks as follows:

1. **You:** Send the LLM a prompt and a list of available tools.
2. **LLM:** Responds with a function call request, specifying the tool and arguments.
3. **You:** Execute the requested function in your code.
4. **You:** Send the function's output back to the LLM.
5. **LLM:** Uses the tool's output to generate a final, user-facing response.


### Define Mock Tools

Let's create three simple, mocked functions. One simulates searching Google Drive, another simulates sending a Discord message, and the last one simulates summarizing a document. 

The function signature (input parameters and output type) and docstrings are crucial, as the LLM uses them to understand what each tool does.

In [6]:
def search_google_drive(query: str) -> dict:
    """
    Searches for a file on Google Drive and returns its content or a summary.

    Args:
        query (str): The search query to find the file, e.g., 'Q3 earnings report'.

    Returns:
        dict: A dictionary representing the search results, including file names and summaries.
    """

    # Here, we mock the response for demonstration.
    # In a real scenario, this would interact with the Google Drive API.
    return {
        "files": [
            {
                "name": "Q3_Earnings_Report_2024.pdf",
                "id": "file12345",
                "content": DOCUMENT,
            }
        ]
    }


def send_discord_message(channel_id: str, message: str) -> dict:
    """
    Sends a message to a specific Discord channel.

    Args:
        channel_id (str): The ID of the channel to send the message to, e.g., '#finance'.
        message (str): The content of the message to send.

    Returns:
        dict: A dictionary confirming the action, e.g., {"status": "success"}.
    """

    # Mocking a successful API call to Discord.
    return {
        "status": "success",
        "status_code": 200,
        "channel": channel_id,
        "message_preview": f"{message[:50]}...",
    }


def summarize_financial_report(text: str) -> str:
    """
    Summarizes a financial report.

    Args:
        text (str): The text to summarize.

    Returns:
        str: The summary of the text.
    """

    # Mocked summary for demonstration.
    return "The Q3 2023 earnings report shows strong performance across all metrics \
with 20% revenue growth, 15% user engagement increase, 25% digital services growth, and \
improved retention rates of 92%."

Now, we need to define the metadata for each function, which will be used as input to the LLM to understand which tool to use and how to call it:

In [7]:
search_google_drive_schema = {
    "name": "search_google_drive",
    "description": "Searches for a file on Google Drive and returns its content or a summary.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query to find the file, e.g., 'Q3 earnings report'.",
            }
        },
        "required": ["query"],
    },
}

send_discord_message_schema = {
    "name": "send_discord_message",
    "description": "Sends a message to a specific Discord channel.",
    "parameters": {
        "type": "object",
        "properties": {
            "channel_id": {
                "type": "string",
                "description": "The ID of the channel to send the message to, e.g., '#finance'.",
            },
            "message": {
                "type": "string",
                "description": "The content of the message to send.",
            },
        },
        "required": ["channel_id", "message"],
    },
}

summarize_financial_report_schema = {
    "name": "summarize_financial_report",
    "description": "Summarizes a financial report.",
    "parameters": {
        "type": "object",
        "properties": {
            "text": {
                "type": "string",
                "description": "The text to summarize.",
            },
        },
        "required": ["text"],
    },
}


Ultimately, we will aggregate all the tools in a single dictionary, known as the tools registry:

In [8]:
TOOLS = {
    "search_google_drive": {
        "handler": search_google_drive,
        "schema": search_google_drive_schema,
    },
    "send_discord_message": {
        "handler": send_discord_message,
        "schema": send_discord_message_schema,
    },
    "summarize_financial_report": {
        "handler": summarize_financial_report,
        "schema": summarize_financial_report_schema,
    },
}
TOOLS_BY_NAME = {tool_name: tool["handler"] for tool_name, tool in TOOLS.items()}
TOOLS_SCHEMA = [tool["schema"] for tool in TOOLS.values()]

Let's take a look at them:

In [9]:
for tool_name, tool in TOOLS_BY_NAME.items():
    print(f"Tool name: {tool_name}")
    print(f"Tool handler: {tool}")
    print("-" * 75)

Tool name: search_google_drive
Tool handler: <function search_google_drive at 0x10ad91e40>
---------------------------------------------------------------------------
Tool name: send_discord_message
Tool handler: <function send_discord_message at 0x10fef5b20>
---------------------------------------------------------------------------
Tool name: summarize_financial_report
Tool handler: <function summarize_financial_report at 0x10fef5bc0>
---------------------------------------------------------------------------


In [10]:
pretty_print.wrapped(json.dumps(TOOLS_SCHEMA[0], indent=2), title="`search_google_drive` Tool Schema")

[93m-------------------------------- `search_google_drive` Tool Schema --------------------------------[0m
  {
  "name": "search_google_drive",
  "description": "Searches for a file on Google Drive and returns its content or a summary.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "The search query to find the file, e.g., 'Q3 earnings report'."
      }
    },
    "required": [
      "query"
    ]
  }
}
[93m----------------------------------------------------------------------------------------------------[0m


In [11]:
pretty_print.wrapped(json.dumps(TOOLS_SCHEMA[1], indent=2), title="`send_discord_message` Tool Schema")

[93m-------------------------------- `send_discord_message` Tool Schema --------------------------------[0m
  {
  "name": "send_discord_message",
  "description": "Sends a message to a specific Discord channel.",
  "parameters": {
    "type": "object",
    "properties": {
      "channel_id": {
        "type": "string",
        "description": "The ID of the channel to send the message to, e.g., '#finance'."
      },
      "message": {
        "type": "string",
        "description": "The content of the message to send."
      }
    },
    "required": [
      "channel_id",
      "message"
    ]
  }
}
[93m----------------------------------------------------------------------------------------------------[0m


Now, let's see how to call these tools using an LLM. First, we need to define the system prompt:

In [12]:
TOOL_CALLING_SYSTEM_PROMPT = """
You are a helpful AI assistant with access to tools that enable you to take actions and retrieve information to better 
assist users.

## Tool Usage Guidelines

**When to use tools:**
- When you need information that is not in your training data
- When you need to perform actions in external systems and environments
- When you need real-time, dynamic, or user-specific data
- When computational operations are required

**Tool selection:**
- Choose the most appropriate tool based on the user's specific request
- If multiple tools could work, select the one that most directly addresses the need
- Consider the order of operations for multi-step tasks

**Parameter requirements:**
- Provide all required parameters with accurate values
- Use the parameter descriptions to understand expected formats and constraints
- Ensure data types match the tool's requirements (strings, numbers, booleans, arrays)

## Tool Call Format

When you need to use a tool, output ONLY the tool call in this exact format:

```tool_call
{{"name": "tool_name", "args": {{"param1": "value1", "param2": "value2"}}}}
```

**Critical formatting rules:**
- Use double quotes for all JSON strings
- Ensure the JSON is valid and properly escaped
- Include ALL required parameters
- Use correct data types as specified in the tool definition
- Do not include any additional text or explanation in the tool call

## Response Behavior

- If no tools are needed, respond directly to the user with helpful information
- If tools are needed, make the tool call first, then provide context about what you're doing
- After receiving tool results, provide a clear, user-friendly explanation of the outcome
- If a tool call fails, explain the issue and suggest alternatives when possible

## Available Tools

<tool_definitions>
{tools}
</tool_definitions>

Your goal is to be maximally helpful to the user. Use tools when they add value, but don't use them unnecessarily.
"""

Let's try the prompt with a few examples.

In [13]:
USER_PROMPT = """
Can you help me find the latest quarterly report and share key insights with the team?
"""

messages = [TOOL_CALLING_SYSTEM_PROMPT.format(tools=str(TOOLS_SCHEMA)), USER_PROMPT]

response = client.models.generate_content(
    model=MODEL_ID,
    contents=messages,
)

pretty_print.wrapped(response.text, title="LLM Tool Call Response")

[93m-------------------------------------- LLM Tool Call Response --------------------------------------[0m
  ```tool_call
{"name": "search_google_drive", "args": {"query": "latest quarterly report"}}
```
[93m----------------------------------------------------------------------------------------------------[0m


In [14]:
USER_PROMPT = """
Send a greeting message to the #finance channel on Discord.
"""

messages = [TOOL_CALLING_SYSTEM_PROMPT.format(tools=str(TOOLS_SCHEMA)), USER_PROMPT]

response = client.models.generate_content(
    model=MODEL_ID,
    contents=messages,
)
pretty_print.wrapped(response.text, title="LLM Tool Call Response")

[93m-------------------------------------- LLM Tool Call Response --------------------------------------[0m
  ```tool_call
{"name": "send_discord_message", "args": {"channel_id": "#finance", "message": "Hello everyone!"}}
```
[93m----------------------------------------------------------------------------------------------------[0m


The next step is to parse the LLM response and call the tool using Python.

First, we parse the LLM output to extract the JSON from the response:

In [15]:
def extract_tool_call(response_text: str) -> str:
    """
    Extracts the tool call from the response text.
    """
    return response_text.split("```tool_call")[1].split("```")[0].strip()


tool_call_str = extract_tool_call(response.text)
tool_call_str

'{"name": "send_discord_message", "args": {"channel_id": "#finance", "message": "Hello everyone!"}}'

Next, we parse the stringified JSON to a Python dict:

In [16]:
tool_call = json.loads(tool_call_str)
tool_call

{'name': 'send_discord_message',
 'args': {'channel_id': '#finance', 'message': 'Hello everyone!'}}

Now, we retrieve the tool handler, which is a Python function:

In [17]:
tool_handler = TOOLS_BY_NAME[tool_call["name"]]
tool_handler

<function __main__.send_discord_message(channel_id: str, message: str) -> dict>

Ultimately, we call the Python function using the arguments generated by the LLM:

In [18]:
tool_result = tool_handler(**tool_call["args"])
pretty_print.wrapped(tool_result, indent=2, title="LLM Tool Call Response")

[93m-------------------------------------- LLM Tool Call Response --------------------------------------[0m
  {
  "status": "success",
  "status_code": 200,
  "channel": "#finance",
  "message_preview": "Hello everyone!..."
}
[93m----------------------------------------------------------------------------------------------------[0m


We can summarize the tool execution in the following function:

In [19]:
def call_tool(response_text: str, tools_by_name: dict) -> Any:
    """
    Call a tool based on the response from the LLM.

    Args:
        response_text (str): The raw response text from the LLM containing the tool call.
        tools_by_name (dict): Dictionary mapping tool names to their handler functions.

    Returns:
        Any: The result of executing the tool with the provided arguments.
    """

    tool_call_str = extract_tool_call(response_text)
    tool_call = json.loads(tool_call_str)
    tool_name = tool_call["name"]
    tool_args = tool_call["args"]
    tool = tools_by_name[tool_name]

    return tool(**tool_args)

In [20]:
pretty_print.wrapped(
    json.dumps(call_tool(response.text, tools_by_name=TOOLS_BY_NAME), indent=2), title="LLM Tool Call Response"
)

[93m-------------------------------------- LLM Tool Call Response --------------------------------------[0m
  {
  "status": "success",
  "status_code": 200,
  "channel": "#finance",
  "message_preview": "Hello everyone!..."
}
[93m----------------------------------------------------------------------------------------------------[0m


Usually, before showing it to the user, we want the LLM to interpret the tool output:

In [21]:
response = client.models.generate_content(
    model=MODEL_ID,
    contents=f"Interpret the tool result: {json.dumps(tool_result, indent=2)}",
)
pretty_print.wrapped(response.text, title="LLM Tool Call Response")

[93m-------------------------------------- LLM Tool Call Response --------------------------------------[0m
  This tool result indicates the following:

*   **Overall Success:** The operation performed by the tool completed successfully, as indicated by `"status": "success"` and the standard HTTP success code `"status_code": 200`.
*   **Target Channel:** The action was specifically directed towards, or involved, the Slack/chat channel named **`#finance`**.
*   **Message Content:** A message was likely sent or processed, and its beginning content is previewed as **`"Hello everyone!..."`**.

In summary, **a message starting with "Hello everyone!..." was successfully sent to or processed within the #finance channel.**
[93m----------------------------------------------------------------------------------------------------[0m


That's the basic concept of tool calling! We've successfully implemented function calling from scratch.

## 3. Implementing a tool calling framework from scratch

For a better analogy with what we see in frameworks such as LangGraph or MCP, let's define a `@tool` decorator that automatically computes the schemas defined above based on the function signature and docstring.

First, we will define the `ToolFunction` class that aggregates the function's schema:

In [22]:
from inspect import Parameter, signature
from typing import Any, Callable, Dict


class ToolFunction:
    def __init__(self, func: Callable, schema: Dict[str, Any]) -> None:
        self.func = func
        self.schema = schema
        self.__name__ = func.__name__
        self.__doc__ = func.__doc__

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self.func(*args, **kwargs)


Now, let's define a `tools` registry that will aggregate all our decorated tools:

In [23]:
tools: list[ToolFunction] = []

Ultimately, let's define the actual `@tool` decorator:

In [24]:
def tool() -> Callable[[Callable], ToolFunction]:
    """
    A decorator that creates a tool schema from a function.

    Returns:
        A decorator function that wraps the original function and adds a schema
    """

    def decorator(func: Callable) -> ToolFunction:
        # Get function signature
        sig = signature(func)

        # Create parameters schema
        properties = {}
        required = []

        for param_name, param in sig.parameters.items():
            # Skip self for methods
            if param_name == "self":
                continue

            param_schema = {
                "type": "string",  # Default to string, can be enhanced with type hints
                "description": f"The {param_name} parameter",  # Default description
            }

            # Add to required if parameter has no default value
            if param.default == Parameter.empty:
                required.append(param_name)

            properties[param_name] = param_schema

        # Create the tool schema
        schema = {
            "name": func.__name__,
            "description": func.__doc__,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        }

        # Create the tool function and add it to the tools registry
        tool = ToolFunction(func, schema)
        tools.append(tool)

        return tool

    return decorator

Let's redefine our tools leveraging the `@tool` decorator:

In [25]:
@tool()
def search_google_drive_example(query: str) -> dict:
    """Search for files in Google Drive."""
    return {"files": ["Q3 earnings report"]}


@tool()
def send_discord_message_example(channel_id: str, message: str) -> dict:
    """Send a message to a Discord channel."""
    return {"message": "Message sent successfully"}


@tool()
def summarize_financial_report_example(text: str) -> str:
    """Summarize the contents of a financial report."""
    return "Financial report summarized successfully"

Let's inspect the `tools` registry to look at all the available tools:

In [26]:
tools

[<__main__.ToolFunction at 0x10ffd9010>,
 <__main__.ToolFunction at 0x10ffdd6d0>,
 <__main__.ToolFunction at 0x10ffdc050>]

The first tool from the registry:

In [27]:
tools[0].schema["name"]

'search_google_drive_example'

We can see that the first tool from the registry is `search_google_drive_example`. As expected, after the function has been decorated, it has been wrapped into a `ToolFunction` object:

In [28]:
type(tools[0])

__main__.ToolFunction

It has automatically computed the tool schema that will be passed to the LLM:

In [29]:
pretty_print.wrapped(json.dumps(tools[0].schema, indent=2), title="Search Google Drive Example")

[93m----------------------------------- Search Google Drive Example -----------------------------------[0m
  {
  "name": "search_google_drive_example",
  "description": "Search for files in Google Drive.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "The query parameter"
      }
    },
    "required": [
      "query"
    ]
  }
}
[93m----------------------------------------------------------------------------------------------------[0m


...and contains the actual function handler:

In [30]:
search_google_drive_example.func

<function __main__.search_google_drive_example(query: str) -> dict>

Let's see how this new method works with LLMs. First, we have to create our tool mappings:

In [31]:
tools_by_name = {tool.schema["name"]: tool.func for tool in tools}
tools_schema = [tool.schema for tool in tools]

In [32]:
pretty_print.wrapped(json.dumps(tools_schema, indent=2), title="Tools Schema")

[93m------------------------------------------- Tools Schema -------------------------------------------[0m
  [
  {
    "name": "search_google_drive_example",
    "description": "Search for files in Google Drive.",
    "parameters": {
      "type": "object",
      "properties": {
        "query": {
          "type": "string",
          "description": "The query parameter"
        }
      },
      "required": [
        "query"
      ]
    }
  },
  {
    "name": "send_discord_message_example",
    "description": "Send a message to a Discord channel.",
    "parameters": {
      "type": "object",
      "properties": {
        "channel_id": {
          "type": "string",
          "description": "The channel_id parameter"
        },
        "message": {
          "type": "string",
          "description": "The message parameter"
        }
      },
      "required": [
        "channel_id",
        "message"
      ]
    }
  },
  {
    "name": "summarize_financial_report_example",
    "descri

Now, let's call the LLM by passing the tool schemas, as before:

In [33]:
USER_PROMPT = """
Can you help me find the latest quarterly report and share key insights with the team?
"""

messages = [TOOL_CALLING_SYSTEM_PROMPT.format(tools=str(tools_schema)), USER_PROMPT]

response = client.models.generate_content(
    model=MODEL_ID,
    contents=messages,
)
pretty_print.wrapped(response.text, title="LLM Tool Call Response")

[93m-------------------------------------- LLM Tool Call Response --------------------------------------[0m
  ```tool_call
{"name": "search_google_drive_example", "args": {"query": "latest quarterly report"}}
```
[93m----------------------------------------------------------------------------------------------------[0m


In [34]:
pretty_print.wrapped(
    json.dumps(call_tool(response.text, tools_by_name=tools_by_name), indent=2), title="LLM Tool Call Response"
)

[93m-------------------------------------- LLM Tool Call Response --------------------------------------[0m
  {
  "files": [
    "Q3 earnings report"
  ]
}
[93m----------------------------------------------------------------------------------------------------[0m


Voilà! We have our little tool calling framework.

## 4. Implementing production-level tool calls with Gemini

In production, most of the time, we don't implement tool calling from scratch. Instead, we leverage the native interface of a specific API such as Gemini or OpenAI. So, let's see how we can use Gemini's built-in tool calling capabilities instead of our custom implementation.

In [35]:
tools = [
    types.Tool(
        function_declarations=[
            types.FunctionDeclaration(**search_google_drive_schema),
            types.FunctionDeclaration(**send_discord_message_schema),
        ]
    )
]
config = types.GenerateContentConfig(
    tools=tools,
    # Constrained to always predict a function call
    tool_config=types.ToolConfig(function_calling_config=types.FunctionCallingConfig(mode="ANY")),
)


As you can see, when calling the LLM, we don't have to explicitly define a system prompt that guides the LLM on how to use the tools. Instead, we pass the tool schema to the LLM provider through the config, which will handle tool calling internally. This is more efficient, as they take care of optimizing tool/function calling for each specific model:

In [36]:
pretty_print.wrapped(USER_PROMPT, title="User Prompt")
response = client.models.generate_content(
    model=MODEL_ID,
    contents=USER_PROMPT,
    config=config,
)
pretty_print.wrapped(str(response.candidates[0].content.parts[0].function_call), title="LLM Response - Function Call")

[93m------------------------------------------- User Prompt -------------------------------------------[0m
  
Can you help me find the latest quarterly report and share key insights with the team?

[93m----------------------------------------------------------------------------------------------------[0m
[93m----------------------------------- LLM Response - Function Call -----------------------------------[0m
  id=None args={'query': 'latest quarterly report'} name='search_google_drive'
[93m----------------------------------------------------------------------------------------------------[0m


To simplify the implementation even more, Google's genai supports taking Python functions directly as input. Now, the SDK creates the schema based on the signature, type hints and pydocs:

In [37]:
client = genai.Client()
config = types.GenerateContentConfig(
    tools=[search_google_drive, send_discord_message],
    tool_config=types.ToolConfig(function_calling_config=types.FunctionCallingConfig(mode="ANY")),
)

Now, let's call the LLM again using the new config:

In [38]:
pretty_print.wrapped(USER_PROMPT, title="User Prompt")
response = client.models.generate_content(
    model=MODEL_ID,
    contents=USER_PROMPT,
    config=config,
)
pretty_print.wrapped(str(response.candidates[0].content.parts[0].function_call), title="LLM Response - Function Call")

[93m------------------------------------------- User Prompt -------------------------------------------[0m
  
Can you help me find the latest quarterly report and share key insights with the team?

[93m----------------------------------------------------------------------------------------------------[0m
[93m----------------------------------- LLM Response - Function Call -----------------------------------[0m
  id=None args={'channel_id': '#finance', 'message': 'Key insights from the latest quarterly report:\n- 20% increase in revenue\n- 15% growth in user engagement\n- Beating market expectations\n- Successful product strategy and strong market positioning\n- Digital services leading growth at 25% year-over-year\n- Expansion into new markets contributing to 30% of the total revenue increase.\n- Customer acquisition costs decreased by 10%.\n- Retention rates improved to 92%.\n- Healthy cash flow position, providing a strong foundation for continued growth into Q4 and beyond.'} n

Let's look at the LLM response better:

In [39]:
response_message_part = response.candidates[0].content.parts[0]
function_call = response_message_part.function_call
function_call

FunctionCall(id=None, args={'channel_id': '#finance', 'message': 'Key insights from the latest quarterly report:\n- 20% increase in revenue\n- 15% growth in user engagement\n- Beating market expectations\n- Successful product strategy and strong market positioning\n- Digital services leading growth at 25% year-over-year\n- Expansion into new markets contributing to 30% of the total revenue increase.\n- Customer acquisition costs decreased by 10%.\n- Retention rates improved to 92%.\n- Healthy cash flow position, providing a strong foundation for continued growth into Q4 and beyond.'}, name='send_discord_message')

In [40]:
pretty_print.wrapped(function_call.args, title="Function Call Args")

[93m---------------------------------------- Function Call Args ----------------------------------------[0m
  {
  "channel_id": "#finance",
  "message": "Key insights from the latest quarterly report:\n- 20% increase in revenue\n- 15% growth in user engagement\n- Beating market expectations\n- Successful product strategy and strong market positioning\n- Digital services leading growth at 25% year-over-year\n- Expansion into new markets contributing to 30% of the total revenue increase.\n- Customer acquisition costs decreased by 10%.\n- Retention rates improved to 92%.\n- Healthy cash flow position, providing a strong foundation for continued growth into Q4 and beyond."
}
[93m----------------------------------------------------------------------------------------------------[0m


In [41]:
tool_handler = TOOLS_BY_NAME[function_call.name]
tool_handler

<function __main__.send_discord_message(channel_id: str, message: str) -> dict>

In [42]:
tool_handler(**function_call.args)

{'status': 'success',
 'status_code': 200,
 'channel': '#finance',
 'message_preview': 'Key insights from the latest quarterly report:\n- 2...'}

Now let's create a simplified function that works with Gemini's native function call objects:

In [43]:
def call_tool(function_call) -> Any:
    tool_name = function_call.name
    tool_args = function_call.args

    tool_handler = TOOLS_BY_NAME[tool_name]

    return tool_handler(**tool_args)

In [44]:
tool_result = call_tool(response_message_part.function_call)
pretty_print.wrapped(tool_result, indent=2, title="Tool Result")

[93m------------------------------------------- Tool Result -------------------------------------------[0m
  {
  "status": "success",
  "status_code": 200,
  "channel": "#finance",
  "message_preview": "Key insights from the latest quarterly report:\n- 2..."
}
[93m----------------------------------------------------------------------------------------------------[0m


## 5. Using Pydantic models as tools for on-demand structured outputs

When it comes to structured outputs, a more elegant and powerful pattern is to treat our Pydantic model *as a tool*. We can ask the model to "call" this Pydantic tool, and the arguments it generates will be our structured data.

This combines the power of function calling with the robustness of Pydantic for structured data extraction. It's the recommended approach for complex data extraction tasks.

Let's define the same Pydantic model as in the structured outputs lesson:

In [45]:
class DocumentMetadata(BaseModel):
    """Pydantic class to hold structured metadata for a document."""

    summary: str = Field(description="A concise, 1-2 sentence summary of the document.")
    tags: list[str] = Field(description="A list of 3-5 high-level tags relevant to the document.")
    keywords: list[str] = Field(description="A list of specific keywords or concepts mentioned.")
    quarter: str = Field(description="The quarter of the financial year described in the document (e.g., Q3 2023).")
    growth_rate: str = Field(description="The growth rate of the company described in the document (e.g., 10%).")

Now, let's see how to use it as a tool:

In [46]:
# The Pydantic class 'DocumentMetadata' is now our 'tool'
extraction_tool = types.Tool(
    function_declarations=[
        types.FunctionDeclaration(
            name="extract_metadata",
            description="Extracts structured metadata from a financial document.",
            parameters=DocumentMetadata.model_json_schema(),
        )
    ]
)

Ultimately, we define the config:

In [47]:
config = types.GenerateContentConfig(
    tools=[extraction_tool],
    tool_config=types.ToolConfig(function_calling_config=types.FunctionCallingConfig(mode="ANY")),
)

Now we call the LLM:

In [48]:
prompt = f"""
Please analyze the following document and extract its metadata.

Document:
<document>
{DOCUMENT}
</document>
"""

response = client.models.generate_content(model=MODEL_ID, contents=prompt, config=config)
response_message_part = response.candidates[0].content.parts[0]

Print the output:

In [49]:
function_call = response_message_part.function_call
pretty_print.function_call(function_call, title="Function Call")

[93m------------------------------------------ Function Call ------------------------------------------[0m
  [38;5;208mFunction Name:[0m `extract_metadata
  [38;5;208mFunction Arguments:[0m `{
  "summary": "The Q3 2023 earnings report shows a significant 20% increase in revenue and 15% growth in user engagement, exceeding market expectations due to successful product strategy and market expansion. Improved customer acquisition costs and retention rates further strengthen the company's financial position for continued growth.",
  "keywords": [
    "Q3 earnings report",
    "revenue",
    "user engagement",
    "product strategy",
    "market positioning",
    "digital services",
    "new markets",
    "customer acquisition costs",
    "retention rates",
    "cash flow"
  ],
  "quarter": "Q3 2023",
  "growth_rate": "20%",
  "tags": [
    "Financial Performance",
    "Earnings Report",
    "Revenue Growth",
    "User Engagement",
    "Market Expansion"
  ]
}`
[93m------------------

Let's validate the output using Pydantic:

In [50]:
try:
    document_metadata = DocumentMetadata(**function_call.args)
    pretty_print.wrapped("Validation successful!")
except Exception as e:
    pretty_print.wrapped(str(e), title="Validation Error")

[93m----------------------------------------------------------------------------------------------------[0m
  Validation successful!
[93m----------------------------------------------------------------------------------------------------[0m


## 6. The downsides of running tools in a loop

Now, let's implement a more sophisticated approach where we put tool calling in a loop with a conversation history. This allows the agent to perform multi-step tasks by calling multiple tools in sequence. Let's create a scenario where we ask the agent to find a report on Google Drive and then communicate its findings on Discord.

First, we define the config:

In [51]:
tools = [
    types.Tool(
        function_declarations=[
            types.FunctionDeclaration(**search_google_drive_schema),
            types.FunctionDeclaration(**send_discord_message_schema),
            types.FunctionDeclaration(**summarize_financial_report_schema),
        ]
    )
]
config = types.GenerateContentConfig(
    tools=tools,
    tool_config=types.ToolConfig(function_calling_config=types.FunctionCallingConfig(mode="ANY")),
)

Next, the user prompt:

In [52]:
USER_PROMPT = """
Please find the Q3 earnings report on Google Drive and send a summary of it to 
the #finance channel on Discord.
"""

Now, we make the first LLM call as always:

In [53]:
messages = [USER_PROMPT]

pretty_print.wrapped(USER_PROMPT, title="User Prompt")
response = client.models.generate_content(
    model=MODEL_ID,
    contents=messages,
    config=config,
)
response_message_part = response.candidates[0].content.parts[0]
pretty_print.function_call(response_message_part.function_call, title="Function Call")

messages.append(response.candidates[0].content)

[93m------------------------------------------- User Prompt -------------------------------------------[0m
  
Please find the Q3 earnings report on Google Drive and send a summary of it to 
the #finance channel on Discord.

[93m----------------------------------------------------------------------------------------------------[0m
[93m------------------------------------------ Function Call ------------------------------------------[0m
  [38;5;208mFunction Name:[0m `search_google_drive
  [38;5;208mFunction Arguments:[0m `{
  "query": "Q3 earnings report"
}`
[93m----------------------------------------------------------------------------------------------------[0m


Ultimately, we add the LLM in a loop until it doesn't return new `function_call` objects or it hits the `max_iterations` limit:

In [54]:
max_iterations = 3
while hasattr(response_message_part, "function_call") and max_iterations > 0:
    tool_result = call_tool(response_message_part.function_call)
    pretty_print.wrapped(tool_result, title="Tool Result", indent=2)

    # Add the tool result to the messages creating the following structure:
    # - user prompt
    # - tool call
    # - tool result
    # - tool call
    # - tool result
    # ...
    function_response_part = types.Part.from_function_response(
        name=response_message_part.function_call.name,
        response={"result": tool_result},
    )
    messages.append(function_response_part)

    # Ask the LLM to continue with the next step (which may involve calling another tool)
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=messages,
        config=config,
    )

    response_message_part = response.candidates[0].content.parts[0]
    pretty_print.function_call(response_message_part.function_call, only_name=True, title="Function Call")

    messages.append(response.candidates[0].content)

    max_iterations -= 1

pretty_print.function_call(response.candidates[0].content.parts[0].function_call, title="Final Agent Response")


[93m------------------------------------------- Tool Result -------------------------------------------[0m
  {
  "files": [
    {
      "name": "Q3_Earnings_Report_2024.pdf",
      "id": "file12345",
      "content": "\n# Q3 2023 Financial Performance Analysis\n\nThe Q3 earnings report shows a 20% increase in revenue and a 15% growth in user engagement, \nbeating market expectations. These impressive results reflect our successful product strategy \nand strong market positioning.\n\nOur core business segments demonstrated remarkable resilience, with digital services leading \nthe growth at 25% year-over-year. The expansion into new markets has proven particularly \nsuccessful, contributing to 30% of the total revenue increase.\n\nCustomer acquisition costs decreased by 10% while retention rates improved to 92%, \nmarking our best performance to date. These metrics, combined with our healthy cash flow \nposition, provide a strong foundation for continued growth into Q4 and beyond.\n"


Running tools in a loop is powerful for multi-step tasks, but this naive approach has limitations. 

It doesn't provide explicit opportunities for the model to reason about tool outputs before deciding on the next action. The agent immediately moves to the next function call without pausing to think about what it learned or whether it should change strategy.

This limitation leads us to more sophisticated patterns like **ReAct** (Reasoning and Acting), which explicitly interleaves reasoning steps with tool calls, allowing the agent to think through problems more deliberately. We will explore ReAct patterns in lessons 7 and 8.