# Gemma 3n - Function Calling: Local File Reader


Gemma 3n is Google's compact generative AI model, optimized for everyday devices such as phones and laptops. It accepts multimodal input (images and audio) and supports many languages. Despite its small size, Gemma 3n can perform function calling to produce structured responses. In this notebook, we use Gemma 3n through Hugging Face Transformers to read and summarize local text files with function calling.

## Setup

Before starting this tutorial, complete the following steps:

* Get access to Gemma by logging into [Hugging Face](https://huggingface.co/google/gemma-3n-E4b-it) and selecting **Acknowledge license** for a Gemma model.
* Select a Colab runtime with sufficient resources for
  the Gemma model size you want to run. [Learn more](https://ai.google.dev/gemma/docs/core#sizes).
* Generate a Hugging Face [Access Token](https://huggingface.co/docs/hub/en/security-tokens#how-to-manage-user-access-token) and use it to log in from Colab.

This notebook runs on an NVIDIA T4 GPU with Gemma 3n E2B.\
To use Gemma 3n E4B instead, select an L4 or A100 runtime.


In [1]:
# Log in to the Hugging Face Hub
from huggingface_hub import notebook_login
notebook_login()


### Install Python packages

Install the Hugging Face libraries required for running the Gemma model and making requests.

In [None]:
# Install a transformers version that supports Gemma 3n (>= 4.53)
!pip install "transformers>=4.53.0" "timm>=1.0.16"
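After installation, it can help to confirm that the installed versions meet the minimums requested above. The following is a rough sketch using only the standard library; `meets_minimum` is a hypothetical helper that compares just the first three numeric components, so a dedicated parser such as `packaging.version` is more robust for pre-release tags.

```python
import re
from importlib import metadata

def meets_minimum(installed: str, minimum: str) -> bool:
    """Hypothetical helper: compare the first three numeric parts of two versions."""
    def parse(version):
        numbers = re.findall(r"\d+", version)[:3]
        # Pad short versions such as "4.53" to three components
        numbers += ["0"] * (3 - len(numbers))
        return tuple(int(n) for n in numbers)
    return parse(installed) >= parse(minimum)

# Minimum versions taken from the pip command above
for package, minimum in {"transformers": "4.53.0", "timm": "1.0.16"}.items():
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        print(f"{package}: not installed")
        continue
    status = "ok" if meets_minimum(installed, minimum) else f"needs >= {minimum}"
    print(f"{package} {installed}: {status}")
```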

## Load Model

In [3]:
from transformers import AutoModelForImageTextToText, AutoProcessor
import torch

GEMMA_PATH = "google/gemma-3n-E2B-it" #@param ["google/gemma-3n-E2B-it", "google/gemma-3n-E4B-it"]

processor = AutoProcessor.from_pretrained(GEMMA_PATH)
model = AutoModelForImageTextToText.from_pretrained(GEMMA_PATH, torch_dtype="auto", device_map="auto")

print(f"Device: {model.device}")
print(f"DType: {model.dtype}")


Device: cuda:0
DType: torch.bfloat16


In [4]:
torch.cuda.empty_cache()

## Function Calling with Gemma 3n
Function calling lets language models interact with tools and systems through structured commands. In this example, we use it to have Gemma 3n read and summarize a local file (e.g., notes.txt) uploaded to Colab.

In [5]:
def read_file(filename: str) -> str:
    """
    Reads the contents of a local text file and returns it as a string.

    Args:
        filename: The name of the file to read (e.g., 'notes.txt')
    Returns:
        The contents of the file as a string, or an error message if the file is not found.
    """
    try:
        with open(filename, 'r', encoding='utf-8') as file:
            return file.read()
    except FileNotFoundError:
        return f"Error: File '{filename}' not found."
    except Exception as e:
        return f"Error: {str(e)}"

In [6]:
# List of available tools
tools = [read_file]
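When a chat template supports tools, Transformers builds each tool description from the function's name, type hints, and docstring, which is why `read_file` is annotated and documented. The sketch below illustrates the kind of metadata involved using only the standard library. `describe_tool` is a hypothetical helper, and its output is not the schema format the library actually produces.

```python
import inspect

def read_file(filename: str) -> str:
    """
    Reads the contents of a local text file and returns it as a string.

    Args:
        filename: The name of the file to read (e.g., 'notes.txt')
    """
    with open(filename, "r", encoding="utf-8") as file:
        return file.read()

def describe_tool(func):
    """Hypothetical helper: collect the metadata a tool description needs."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        # First line of the cleaned docstring serves as a short description
        "description": inspect.getdoc(func).splitlines()[0],
        "parameters": {
            name: param.annotation.__name__
            for name, param in signature.parameters.items()
        },
        "returns": signature.return_annotation.__name__,
    }

tool_description = describe_tool(read_file)
print(tool_description)
```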

### Code Extraction and Execution

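When a model emits code inside its text response, extracting and running that code connects the model to real tools.
The function below extracts the Python code from a `tool_code` block in the model output and executes it. Because `exec` runs whatever the model generates, use it only in an isolated environment such as a Colab session.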


In [7]:
import re
import io
from contextlib import redirect_stdout

def extract_tool_call(text):
    """
    Extracts and executes tool_code from the model-generated text
    """
    pattern = r"```tool_code\s*(.*?)\s*```"
    match = re.search(pattern, text, re.DOTALL)

    if match:
        code = match.group(1).strip()
        print(f"📋 Extracted code:\n{code}")

        # Capture stdout while running the code with only the tool in scope
        namespace = {"read_file": read_file}
        f = io.StringIO()
        with redirect_stdout(f):
            try:
                # A single expression such as read_file("notes.txt") yields a value
                value = eval(code, namespace)
            except SyntaxError:
                # Statements (assignments, multiple lines) must go through exec
                exec(code, namespace)
                value = None

        output = f.getvalue().strip()

        # Prefer printed output; otherwise fall back to the expression's value
        if output:
            result = f'```tool_output\n{output}\n```'
        elif value is not None:
            result = f'```tool_output\n{value}\n```'
        else:
            result = '```tool_output\nTool executed successfully\n```'

        print(f"✅ Tool result:\n{result}")
        return result

    print("❌ No tool_code found in response")
    return None
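Where running arbitrary generated code is too permissive, one alternative is to parse the snippet and allow only a direct call to a known tool with literal arguments. The following is a minimal sketch of that idea, not the approach this notebook uses. `safe_tool_call` and the `echo` stand-in tool are hypothetical names introduced for illustration.

```python
import ast

def safe_tool_call(code, allowed_tools):
    """Hypothetical helper: run one tool(...) or print(tool(...)) call with literal args."""
    tree = ast.parse(code.strip(), mode="eval")
    call = tree.body

    # Unwrap print(...) so print(tool("x")) and tool("x") behave the same
    if (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            and call.func.id == "print" and len(call.args) == 1
            and not call.keywords):
        call = call.args[0]

    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
        raise ValueError("Only a direct call to a tool is allowed")
    if call.func.id not in allowed_tools:
        raise ValueError(f"Unknown tool: {call.func.id}")
    if any(kw.arg is None for kw in call.keywords):
        raise ValueError("**kwargs unpacking is not allowed")

    # literal_eval rejects anything that is not a plain literal
    args = [ast.literal_eval(arg) for arg in call.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return allowed_tools[call.func.id](*args, **kwargs)

def echo(text: str) -> str:
    """Stand-in tool used only for this demonstration."""
    return text.upper()

result = safe_tool_call('print(echo("hello"))', {"echo": echo})
print(result)
```

With this design, a snippet such as `__import__("os").getcwd()` is rejected before anything runs, at the cost of supporting only single-call snippets.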

### ChatState Class  
The `ChatState` class manages the conversation history, sends messages to the model, and processes responses, including tool outputs.

In [8]:
class ChatState():
    def __init__(self, model, processor):
        self.model = model
        self.processor = processor
        self.history = []

    def send_message(self, message, max_tokens=512):
        """
        Sends message and gets response with possible tool call
        """
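        # Accept either a single message dict or a list of messages,
        # so the history stays a flat list of role/content dicts
        if isinstance(message, list):
            self.history.extend(message)
        else:
            self.history.append(message)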

        input_ids = self.processor.apply_chat_template(
            self.history,
            tools=tools,
            add_generation_prompt=True,
            tokenize=True,
            return_dict=True,
            return_tensors="pt",
        )

        input_len = input_ids["input_ids"].shape[-1]
        input_ids = input_ids.to(self.model.device)

        outputs = self.model.generate(
            **input_ids,
            max_new_tokens=max_tokens,
            disable_compile=True,
            do_sample=True,
            temperature=0.7,
            pad_token_id=self.processor.tokenizer.eos_token_id
        )

        response_text = self.processor.batch_decode(
            outputs[:, input_len:],
            skip_special_tokens=True,
            clean_up_tokenization_spaces=True
        )[0]

        # Add response to history
        self.history.append({
            "role": "assistant",
            "content": [{"type": "text", "text": response_text}]
        })

        return response_text

    def process_tool_result_and_respond(self, tool_result, original_query, max_tokens=512):
        """
        Processes tool result and generates final response
        """

        # Create new prompt with tool result
        prompt = [{
            "role": "user",
            "content": [
                {"type": "text", "text": f"Results from the file read:{tool_result} User_query: {original_query}"}
            ]
        }]

        final_prompt = self.processor.apply_chat_template(
            prompt,
            tools=tools,
            add_generation_prompt=True,
            return_dict=True,
            tokenize=True,
            return_tensors="pt",
        ).to(self.model.device, dtype=torch.bfloat16)

        response = self.model.generate(
            **final_prompt,
            max_new_tokens=max_tokens,
            disable_compile=True,
            do_sample=True,
            temperature=0.7,
            pad_token_id=self.processor.tokenizer.eos_token_id
        )

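        # Decode only the newly generated tokens, not the echoed prompt
        input_len = final_prompt["input_ids"].shape[-1]
        final_response = self.processor.decode(response[0][input_len:], skip_special_tokens=True)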
        return final_response
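The class above passes messages around as dicts with a `role` and a list of typed content parts. As a small illustration of that shape, the sketch below builds a history by hand. `text_message` is a hypothetical helper, and the assistant text is a placeholder rather than real model output.

```python
def text_message(role, text):
    """Hypothetical helper: build one chat message in the content-list format."""
    return {"role": role, "content": [{"type": "text", "text": text}]}

# A flat list of messages, in the order they occurred
history = [
    text_message("system", "You are an expert file assistant."),
    text_message("user", "Summarize the contents of notes.txt"),
]

# After generation, the reply is appended in the same format
history.append(text_message("assistant", "(placeholder model reply)"))

roles = [message["role"] for message in history]
print(roles)
```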

### Define prompt for function calling
`process_file_query` orchestrates a two-step flow: it first prompts the model to generate a code snippet that uses the available tools, then executes that snippet and builds a final response from its output.

In [9]:
def process_file_query(chat_state, query):
    """
    Processes a file query using the 2-step flow:
    1. Generates tool_code
    2. Executes the code and generates final response
    """

    # Create initial conversation with system instructions

    conversation = [
        {
            "role": "system",
            "content": [{"type": "text", "text": """
            You are an expert file assistant. Use `read_file` to access and process local text files.
            At each turn, if you decide to invoke any of the function(s), it should be wrapped with ```tool_code```. The Python methods described below are imported and available,
            you can only use defined methods. The generated code should be readable and efficient. The response to a method will be wrapped in ```tool_output```; use it to call more tools or generate a helpful, friendly response.
            When using a ```tool_code```, think step by step why and how it should be used.

            The following Python methods are available:
            ```python
            def read_file(filename: str) -> str:
              "
              Reads the contents of a local text file and returns it as a string.

              Args:
                  filename: The name of the file to read (e.g., 'notes.txt')
              Returns:
                  The contents of the file as a string, or an error message if the file is not found.
              "
            ```
            """}]
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": query}
            ]
        }
    ]

    # Step 1: Generate tool_code
    print("=== 🔄 STEP 1: Generating tool_code ===")
    response = chat_state.send_message(conversation)  # System instructions + user query
    print("📝 Initial response:")
    print(response)

    # Step 2: Extract and execute tool_code
    print("\n=== ⚙️ STEP 2: Executing tool_code ===")
    tool_result = extract_tool_call(response)

    if tool_result:
        print("✅ Tool executed successfully")

        # Step 3: Generate final response
        print("\n=== 🎯 STEP 3: Generating final response ===")
        final_response = chat_state.process_tool_result_and_respond(tool_result, query)
        print("📋 Final response:")
        print(final_response)

        return final_response
    else:
        print("❌ No tool_code found in response")
        return response

### Upload your file  
Before interacting with the model, you can upload any `.txt` file to the environment. In this example, we’ll use this [sample file](https://huggingface.co/datasets/daqc/demo-data/resolve/main/gemma-3n/notes.txt).


In [10]:
# Upload file to Google Colab
from google.colab import files

print("📁 Upload your notes.txt file (or any .txt file):")
uploaded = files.upload()

# Verify the file was uploaded correctly
import os
print("📋 Available files:")
for filename in os.listdir('.'):
    if filename.endswith('.txt'):
        print(f"  ✅ {filename}")

📁 Upload your notes.txt file (or any .txt file):


Saving notes.txt to notes.txt
📋 Available files:
  ✅ notes.txt
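Outside Colab, `google.colab.files` is not available. One way to try the rest of the notebook anyway is to write a small text file yourself, as in the sketch below. The file name `sample_notes.txt` and its contents are placeholders, and the file goes into a temporary directory to avoid overwriting anything, so pass the full path to `read_file`.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for an uploaded file; the text is placeholder content
work_dir = Path(tempfile.mkdtemp())
sample_path = work_dir / "sample_notes.txt"
sample_path.write_text(
    "First line of placeholder notes.\nSecond line.\n", encoding="utf-8"
)

# List the .txt files, mirroring the check in the upload cell
txt_files = sorted(path.name for path in work_dir.glob("*.txt"))
print(txt_files)
print(f"Full path to pass to the tool: {sample_path}")
```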


## Test your function calling

The function calling process works like this:

1. The application sends a prompt and function definitions, including the user query, to the LLM.
2. The LLM decides whether to respond directly or generate a structured function call with arguments.
3. The application extracts and executes the function call.
4. The execution results are sent back to the LLM.
5. The LLM creates a final response using the function output and the original query.
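The five steps above can be sketched without loading a model. In this illustration, `stub_model` stands in for the LLM and `get_note` for the file-reading tool. Both are hypothetical, return canned text, and exist only to make the control flow visible.

```python
import re

FENCE = "`" * 3
TOOL_CODE = re.compile(FENCE + r"tool_code\s*(.*?)\s*" + FENCE, re.DOTALL)

def get_note(name: str) -> str:
    """Stand-in tool: returns canned text instead of reading a real file."""
    notes = {"notes.txt": "Buy milk. Call Ana."}
    return notes.get(name, f"Error: '{name}' not found.")

def stub_model(prompt: str) -> str:
    """Stand-in for the LLM: requests the tool first, then answers from its output."""
    if "tool_output" in prompt:
        return "The notes list two tasks."
    return f'{FENCE}tool_code\nget_note("notes.txt")\n{FENCE}'

def run_query(query: str) -> str:
    # Steps 1-2: send the query; the model answers directly or emits a tool call
    response = stub_model(query)
    match = TOOL_CODE.search(response)
    if match is None:
        return response

    # Step 3: execute the call (hard-coded dispatch keeps the sketch free of exec)
    call = re.fullmatch(r'get_note\("([^"]*)"\)', match.group(1))
    if call is None:
        return response
    tool_result = get_note(call.group(1))

    # Steps 4-5: send the result back and let the model write the final answer
    follow_up = f"{FENCE}tool_output\n{tool_result}\n{FENCE}\nUser query: {query}"
    return stub_model(follow_up)

answer = run_query("Summarize the contents of notes.txt")
print(answer)
```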




In [11]:
# Create chat instance
chat = ChatState(model, processor)

In [12]:
# Define the query
query = "Summarize the contents of notes.txt"

print(f"🔍 Executing query: '{query}'")
print("=" * 50)

# Process the query
result = process_file_query(chat, query)

🔍 Executing query: 'Summarize the contents of notes.txt'
=== 🔄 STEP 1: Generating tool_code ===
📝 Initial response:
I need to read the contents of the file "notes.txt" to summarize it. I will use the `read_file` function for this purpose.
```tool_code
print(read_file("notes.txt"))
```

=== ⚙️ STEP 2: Executing tool_code ===
📋 Extracted code:
print(read_file("notes.txt"))
✅ Tool result:
```tool_output
The Way of Code – A Taoist Manifesto for Developers

The Way of Code is a minimalist, poetic guide to programming, created as a collaboration between music producer Rick Rubin and Claude, the AI developed by Anthropic. Presented as a digital meditation, it reimagines software development through the lens of Taoist philosophy, echoing the spirit of the Tao Te Ching. It introduces the idea of “vibe coding”—a gentle, ego-free way of creating software that prioritizes intuition, presence, and flow over rigid structure or performance metrics. Each short passage encourages stillness, simplicity,

## Next steps

Build and explore more with Gemma 3n models:

### Inference

* [Multimodal inference using Gemma 3n via pipeline](https://colab.research.google.com/github/huggingface/huggingface-gemma-recipes/blob/main/notebooks/gemma3n_inference_via_pipeline.ipynb)

### Fine Tuning

* [Fine tuning Gemma 3n 2B on free Colab T4](https://colab.research.google.com/github/huggingface/huggingface-gemma-recipes/blob/main/notebooks/fine_tune_gemma3n_on_t4.ipynb)

* [Fine tuning Gemma 3n 4B with Unsloth on free Colab T4](https://colab.research.google.com/github/huggingface/huggingface-gemma-recipes/blob/main/notebooks/Gemma3N_(4B)-Conversational.ipynb)

