##### Copyright 2024 Google LLC.

In [1]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Gemma 3 Function Calling: Local File Reader

Gemma 3 is a family of open models from Google that adds multimodal input, support for 140 languages, a 128K-token context window, and agentic capabilities such as structured responses and function calling. In this notebook, we use function calling with the Gemma 3 4B-it model, loaded through Hugging Face Transformers, to read and summarize local text files.

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/Gemma/[Gemma_3]Function_Calling_with_HF_document_summarizer.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

## Setup

### Select the Colab runtime
To complete this tutorial, you'll need to have a Colab runtime with sufficient resources to run the Gemma model. In this case, you can use a T4 GPU:

1. In the upper-right of the Colab window, select **▾ (Additional connection options)**.
2. Select **Change runtime type**.
3. Under **Hardware accelerator**, select **T4 GPU**.

### Gemma-3 setup

To complete this tutorial, you'll first need to complete the setup instructions at [Gemma setup](https://ai.google.dev/gemma/docs/setup). The Gemma setup instructions show you how to do the following:

* Get access to Gemma on kaggle.com.
* Select a Colab runtime with sufficient resources to run the Gemma 3 4B-it model.
* Generate and configure a Kaggle username and an API key as Colab secrets.

After you've completed the Gemma setup, move on to the next section, where you'll install the dependencies for this notebook.


### Install dependencies

Run the cell below to install all the required dependencies.

- Transformers: Latest version that supports Gemma-3
- Accelerate
- No external APIs required; uses standard Python file I/O

In [2]:
!pip install git+https://github.com/huggingface/transformers.git

In [3]:
!pip install accelerate
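If you want to confirm that the installed Transformers build is recent enough, a quick check along these lines can help. It is a sketch that assumes Gemma 3 support first shipped in Transformers 4.50.0; the helper names `version_tuple` and `installed_version` are illustrative and not part of any library.

```python
import re
from importlib import metadata

# Assumption: Gemma 3 support first shipped in Transformers 4.50.0.
MIN_TRANSFORMERS = (4, 50, 0)

def version_tuple(version: str) -> tuple:
    """Return the leading numeric components, e.g. '4.51.0.dev0' -> (4, 51, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

found = installed_version("transformers")
if found is None:
    print("transformers is not installed")
elif version_tuple(found) < MIN_TRANSFORMERS:
    print(f"transformers {found} may be too old for Gemma 3")
else:
    print(f"transformers {found} looks recent enough")
```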

# Gemma 3

Gemma 3 is the successor to Google's earlier Gemma models, with a focus on openly available weights and multimodal (vision) input. It is designed to handle complex tasks over long contexts and in many languages, which makes it suitable for a wide range of applications.

Google reports that Gemma 3 outperforms models such as DeepSeek V3, o3-mini, and Llama 3 405B in preliminary human-preference evaluations on the LMArena leaderboard. Its main features are:

- Context length extended from 32K to 128K tokens by raising the RoPE base frequency from 10K to 1M on the global self-attention layers.
- Available in 1B, 4B, 12B, and 27B sizes (Vision capability included, except for 1B, with bi-directional attention).
- Supports an extensive range of 140 global languages.
- Additional Features: Supports function calling, where the model can generate code or instructions to call external functions, and structured outputs, enabling it to produce data in specific formats like JSON or tables.
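
To get a feel for why the RoPE base frequency matters for long contexts, the sketch below computes the longest rotary wavelength for both base values using the standard RoPE formula, in which dimension pair `i` rotates with period `2π · base^(2i/d)`. The head dimension of 128 is an illustrative assumption, not a value taken from the Gemma 3 configuration.

```python
import math

def longest_rope_wavelength(base: float, head_dim: int) -> float:
    """Longest rotation period (in tokens) among the RoPE dimension pairs."""
    # Pair i uses angular frequency base ** (-2 * i / head_dim);
    # the last pair (i = head_dim / 2 - 1) rotates the slowest.
    i = head_dim // 2 - 1
    return 2 * math.pi * base ** (2 * i / head_dim)

HEAD_DIM = 128          # illustrative assumption, not read from the model config
CONTEXT = 128 * 1024    # 128K tokens

for base in (10_000, 1_000_000):
    wavelength = longest_rope_wavelength(base, HEAD_DIM)
    covers = wavelength >= CONTEXT
    print(f"base={base:>9,}: longest wavelength ~ {wavelength:,.0f} tokens, "
          f"spans 128K context: {covers}")
```

Under these assumptions, only the larger base produces a wavelength longer than the 128K window, which is one common intuition for raising the base when extending context.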

## Load Model weights

In [4]:
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

In [5]:
model_id = "google/gemma-3-4b-it"

In [6]:
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, device_map="auto"
).eval()

processor = AutoProcessor.from_pretrained(model_id)

In [7]:
torch.cuda.empty_cache()

## Function Calling Gemma

Function calling is a feature that allows large language models to interact with external systems, APIs, and tools in a structured way. This feature enables LLMs to not just generate text responses but to invoke specific functions when appropriate based on user queries. In this notebook, we’ll use function calling to read and summarize a local text file uploaded to the Colab environment. For example, a user can ask the model to summarize a file named `notes.txt`.

In [8]:
def read_file(filename: str) -> str:
    """
    Reads the contents of a local text file and returns it as a string.

    Args:
        filename: The name of the file to read (e.g., 'notes.txt')
    Returns:
        The contents of the file as a string, or an error message if the file is not found.
    """
    try:
        with open(filename, 'r', encoding='utf-8') as file:
            return file.read()
    except FileNotFoundError:
        return f"Error: File '{filename}' not found."
    except Exception as e:
        return f"Error: {str(e)}"
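
To try `read_file` without uploading anything, you can create a small sample file first. The sketch below repeats the function so that it runs on its own; the file contents and the temporary directory are illustrative.

```python
import os
import tempfile

def read_file(filename: str) -> str:
    """Same behavior as the function above, repeated so this snippet is standalone."""
    try:
        with open(filename, "r", encoding="utf-8") as file:
            return file.read()
    except FileNotFoundError:
        return f"Error: File '{filename}' not found."
    except Exception as e:
        return f"Error: {e}"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a throwaway file, then read it back through the tool.
    sample_path = os.path.join(tmp_dir, "notes.txt")
    with open(sample_path, "w", encoding="utf-8") as file:
        file.write("Meeting moved to Friday.\nBring the quarterly report.\n")

    contents = read_file(sample_path)
    missing = read_file(os.path.join(tmp_dir, "missing.txt"))

print(contents)
print(missing)
```

Returning the error as a string, rather than raising, lets the model see what went wrong and react to it in its next turn.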

In [9]:
tools = [read_file]

In [10]:
# Example query
query = "Summarize the contents of notes.txt"

## Define prompt for function calling

Function Calling follows these steps:

- Your application sends a prompt to the LLM along with function definitions
- The LLM analyzes the prompt and decides whether to respond directly or use defined functions
- If using functions, the LLM generates structured arguments for the function call
- Your application receives the function call details and executes the actual function
- The function results are sent back to the LLM
- The LLM provides a final response incorporating the function results
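
The steps above can be sketched end to end with a stand-in for the model. Everything here is hypothetical: `fake_llm` imitates the model's two replies, and `get_greeting` is a toy tool, so the snippet only illustrates the control flow rather than Gemma's actual behavior.

```python
import re

FENCE = "`" * 3  # built at runtime so this snippet contains no literal fence

def fake_llm(messages):
    """Hypothetical stand-in for the model: asks for a tool once, then answers."""
    last = messages[-1]["content"]
    if "tool_output" in last:
        return "The tool returned: " + last.split("\n")[1]
    return f"{FENCE}tool_code\nget_greeting(name='Ada')\n{FENCE}"

def get_greeting(name: str) -> str:
    """Hypothetical tool exposed to the model."""
    return f"Hello, {name}!"

TOOLS = {"get_greeting": get_greeting}

def run_turn(user_message):
    messages = [{"role": "user", "content": user_message}]   # step 1: send prompt
    reply = fake_llm(messages)                                # steps 2-3: model decides
    call = re.search(
        FENCE + r"tool_code\s*(\w+)\(name='(\w+)'\)\s*" + FENCE, reply
    )
    if call is None:
        return reply                                          # model answered directly
    result = TOOLS[call.group(1)](name=call.group(2))         # step 4: run the function
    messages.append({"role": "assistant", "content": reply})
    messages.append({                                         # step 5: return the result
        "role": "user",
        "content": f"{FENCE}tool_output\n{result}\n{FENCE}",
    })
    return fake_llm(messages)                                 # step 6: final response

print(run_turn("Greet Ada"))  # prints "The tool returned: Hello, Ada!"
```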

In [11]:
conversation = [
    {
        "role": "system",
        "content": [{"type": "text", "text": """
        You are an expert file assistant. Use `read_file` to access and process local text files.
        At each turn, if you decide to invoke any of the function(s), it should be wrapped with ```tool_code```. The Python methods described below are imported and available,
        you can only use defined methods. The generated code should be readable and efficient. The response to a method will be wrapped in ```tool_output```; use it to call more tools or generate a helpful, friendly response.
        When using a ```tool_code``` block, think step by step about why and how it should be used.

        The following Python methods are available:
        ```python
        def read_file(filename: str) -> str:
          '''
          Reads the contents of a local text file and returns it as a string.

          Args:
              filename: The name of the file to read (e.g., 'notes.txt')
          Returns:
              The contents of the file as a string, or an error message if the file is not found.
          '''
        ```

        """}]
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": query}
        ]
    }
]
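
Copying the function stub into the system prompt by hand can drift out of sync with the real function. One possible alternative, sketched below, is to render the stub from the function object with the standard `inspect` module. The helper name `describe_tool` is an assumption made for this example, and the `read_file` body is a placeholder because only its signature and docstring are used.

```python
import inspect
import textwrap

def describe_tool(func) -> str:
    """Render a function's signature and docstring as a Python stub."""
    signature = inspect.signature(func)
    doc = inspect.getdoc(func) or ""
    body = textwrap.indent(f'"""\n{doc}\n"""', "    ")
    return f"def {func.__name__}{signature}:\n{body}"

def read_file(filename: str) -> str:
    """Reads the contents of a local text file and returns it as a string.

    Args:
        filename: The name of the file to read (e.g., 'notes.txt')
    """
    raise NotImplementedError  # placeholder body; only the metadata is used here

print(describe_tool(read_file))
```

The returned string can then be interpolated into the system prompt in place of the hand-written stub.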

In [12]:
inputs = processor.apply_chat_template(
            conversation,
            tools=tools,
            add_generation_prompt=True,
            return_dict=True,
            tokenize=True,
            return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

In [13]:
outputs = model.generate(**inputs, max_new_tokens=512)

In [14]:
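# Decode only the newly generated tokens, skipping the prompt tokens.
prompt_len = inputs["input_ids"].shape[-1]
output = processor.decode(outputs[0][prompt_len:], skip_special_tokens=True)

In [15]:
# Splitting the full transcript on "model" would break if the reply contained that word.
response = output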

In [16]:
print(response)


Okay, I can help with that. I will read the file "notes.txt" and provide a summary of its contents.

```tool_code
summary = read_file(filename='notes.txt')
print(summary)
```


## Execute the function calling code

In [20]:
import re
import io
from contextlib import redirect_stdout

def extract_tool_call(text):
    """Runs the first tool_code block found in `text` and returns its printed output."""
    pattern = r"```tool_code\s*(.*?)\s*```"
    match = re.search(pattern, text, re.DOTALL)
    if match:
        code = match.group(1).strip()
        # Capture everything the code prints in a string buffer.
        f = io.StringIO()
        with redirect_stdout(f):
            # exec() always returns None, so the printed output is the result.
            # Caution: this runs model-generated code; use it only in a sandbox.
            exec(code, globals())
        output = f.getvalue()
        return f'```tool_output\n{output}\n```'
    return None
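
Because `exec` runs whatever the model writes, a more restrictive variant may be preferable outside a sandbox. The sketch below parses the tool code with the standard `ast` module and runs only allow-listed functions whose arguments are plain literals. The names `run_allowed_calls` and `fake_read_file` are illustrative, and the approach deliberately ignores everything else in the generated code (including `print`). It also does not handle `**kwargs` unpacking.

```python
import ast
import re

FENCE = "`" * 3  # built at runtime so this snippet contains no literal fence

def run_allowed_calls(text: str, allowed: dict) -> list:
    """Run only allow-listed function calls with literal arguments."""
    pattern = FENCE + r"tool_code\s*(.*?)\s*" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        return []
    results = []
    for node in ast.walk(ast.parse(match.group(1))):
        is_named_call = isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
        if is_named_call and node.func.id in allowed:
            # literal_eval raises ValueError for anything that is not a plain literal.
            args = [ast.literal_eval(arg) for arg in node.args]
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            results.append(allowed[node.func.id](*args, **kwargs))
    return results

def fake_read_file(filename: str) -> str:
    """Hypothetical stand-in for read_file, so the snippet needs no real file."""
    return f"<contents of {filename}>"

reply = (
    f"Sure.\n{FENCE}tool_code\n"
    "summary = read_file(filename='notes.txt')\n"
    f"print(summary)\n{FENCE}"
)
print(run_allowed_calls(reply, {"read_file": fake_read_file}))
```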

In [21]:
# Upload a sample file to Colab first (e.g., notes.txt)
from google.colab import files
uploaded = files.upload()

# Extract tool call from the response
keyword = extract_tool_call(response)

Saving notes.txt to notes (2).txt



In [22]:
keyword

'```tool_output\nAdapt AI :A personalized solution to sense your stress, fix your mess, and , boost productivity.\nMaintaining productivity while managing stress is a significant challenge in todays fast paced and multi tasking work environment.\n\nConventional tools are often static in nature, and offer generic solutions that fail to consider the nuances of individual differences in behavior, and stress triggers.\n\nTherefore in this Late breaking work we introduce Adapt AI: a groundbreaking, personalized AI solution designed to boost productivity and enhance well-being through tailored, personalized, interventions.\n\nAdapt AI employs a sophisticated multimodal sensing system, integrating egocentric vision, audio, and physiological data such as heart and motion activities of the individuals.\n\nThis allows Adapt AI to understand and respond to the unique context of each user with the help of Large language models.\n\nBy monitoring real-time data, Adapt AI detects stress levels and ac

In [23]:
prompt = [{
        "role": "user",
        "content": [
            {"type": "text", "text": f"Results from the file read:{keyword} User_query: {query}"}
        ]
    }
]
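
The prompt above starts a fresh conversation, so the system instructions and the model's first reply are not carried over. An alternative is to extend the existing message list instead. The sketch below shows one way to do that, assuming the chat template accepts an `assistant` turn followed by a `user` turn that carries the tool output; the helper name `extend_conversation` and the sample messages are illustrative.

```python
import copy

FENCE = "`" * 3  # built at runtime so this snippet contains no literal fence

def extend_conversation(conversation, model_reply, tool_output):
    """Return a new message list with the model's reply and the tool result appended."""
    extended = copy.deepcopy(conversation)  # leave the original list untouched
    extended.append({
        "role": "assistant",
        "content": [{"type": "text", "text": model_reply}],
    })
    extended.append({
        "role": "user",
        "content": [{"type": "text", "text": tool_output}],
    })
    return extended

# Illustrative stand-ins for the notebook's conversation, response, and tool output.
history = [
    {"role": "user",
     "content": [{"type": "text", "text": "Summarize the contents of notes.txt"}]},
]
model_reply = f"{FENCE}tool_code\nprint(read_file(filename='notes.txt'))\n{FENCE}"
tool_output = f"{FENCE}tool_output\nMeeting moved to Friday.\n{FENCE}"

followup = extend_conversation(history, model_reply, tool_output)
print([message["role"] for message in followup])  # prints ['user', 'assistant', 'user']
```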

In [24]:
final_prompt = processor.apply_chat_template(
            prompt,
            tools=tools,
            add_generation_prompt=True,
            return_dict=True,
            tokenize=True,
            return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

In [25]:
response = model.generate(**final_prompt, max_new_tokens=512)

In [26]:
print(processor.decode(response[0], skip_special_tokens=True))

user
Results from the file read:```tool_output
Adapt AI :A personalized solution to sense your stress, fix your mess, and , boost productivity.
Maintaining productivity while managing stress is a significant challenge in todays fast paced and multi tasking work environment.

Conventional tools are often static in nature, and offer generic solutions that fail to consider the nuances of individual differences in behavior, and stress triggers.

Therefore in this Late breaking work we introduce Adapt AI: a groundbreaking, personalized AI solution designed to boost productivity and enhance well-being through tailored, personalized, interventions.

Adapt AI employs a sophisticated multimodal sensing system, integrating egocentric vision, audio, and physiological data such as heart and motion activities of the individuals.

This allows Adapt AI to understand and respond to the unique context of each user with the help of Large language models.

By monitoring real-time data, Adapt AI detects s