# Tool Use in Large Language Models

## Introduction

Welcome to this hands-on tutorial on **Tool Use** (also known as **Function Calling**) in Large Language Models (LLMs)!

### What is Tool Use?

Tool Use is a powerful feature that allows LLMs to interact with external tools and functions. Instead of trying to perform complex calculations or access real-time information directly, the model can:

1. **Recognize** when it needs to use a tool
2. **Request** a function call with appropriate parameters
3. **Process** the function's output
4. **Generate** a natural language response based on the results

### Why is Tool Use Important?

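On their own, LLMs have well-known limitations: their knowledge stops at a training cutoff, they can make mistakes on exact arithmetic, and they cannot act on external systems. Tool use bridges these gaps by enabling LLMs to: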
-  Delegate tasks to specialized tools
-  Access up-to-date information
-  Perform reliable computations
-  Integrate with existing software systems

### How Does It Work?

The typical function calling workflow consists of:

1. **Tool Definition**: Define functions with clear signatures and docstrings
2. **Schema Generation**: Convert functions to JSON schemas
3. **Model Prompting**: Pass the schemas to the model along with user queries
4. **Tool Call Detection**: Parse the model's output for tool call requests
5. **Execution**: Execute the requested functions with provided arguments
6. **Result Integration**: Feed results back to the model
7. **Final Response**: Generate a user-friendly answer
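
The seven steps above can be sketched end to end without loading a real model. In this minimal sketch, `fake_model` is a hypothetical stand-in for an LLM (it returns a hard-coded tool call first, then a plain answer once a tool result is in the conversation), and the schema is written by hand rather than generated. The rest of the tutorial replaces each piece with the real thing.

```python
import json
import re

# Step 1: tool definition.
def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

# Step 2: a hand-written schema (generated automatically later in the tutorial).
tools = [{
    "type": "function",
    "function": {
        "name": "add",
        "description": "Add two numbers.",
        "parameters": {
            "type": "object",
            "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
            "required": ["a", "b"],
        },
    },
}]

def fake_model(messages, tools):
    """Hypothetical stand-in for an LLM: request a tool first, then answer."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return '<tool_call>{"name": "add", "arguments": {"a": 2, "b": 3}}</tool_call>'
    value = json.loads(tool_results[-1]["content"])["result"]
    return f"The answer is {value}."

messages = [{"role": "user", "content": "What is 2 + 3?"}]

# Step 3: prompt the (fake) model with the conversation and the schemas.
reply = fake_model(messages, tools)

# Step 4: detect the tool call in the raw text.
match = re.search(r"<tool_call>(.*?)</tool_call>", reply, re.DOTALL)
call = json.loads(match.group(1))

# Step 5: execute the requested function with the provided arguments.
result = {"add": add}[call["name"]](**call["arguments"])

# Step 6: feed the call and its result back into the conversation.
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": call}]})
messages.append({"role": "tool", "name": call["name"], "content": json.dumps({"result": result})})

# Step 7: ask for the final, user-facing answer.
final_answer = fake_model(messages, tools)
print(final_answer)
```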

In this tutorial, we'll build a **calculator assistant** that can solve complex mathematical expressions by calling appropriate calculator functions.

<br>
<br>

## Step 1: Import Required Libraries

Let's start by importing the necessary libraries:
- `json`: For handling JSON data
- `re`: For parsing model outputs with regular expressions
- `get_json_schema`: A utility from transformers to convert Python functions to JSON schemas

**Note:** No installation is required if `transformers` is already available in your environment (for example, on Google Colab).

In [1]:
from typing import Any, Dict, List
import json
import re
from transformers.utils import get_json_schema

## Step 2: Define Calculator Functions

### Writing Tool-Ready Functions

For a function to work with LLM tool calling, it must follow specific conventions:

#### ✅ Required Format:
1. **Descriptive function name**: Clear and self-explanatory
2. **Type hints**: All arguments must have type annotations
3. **Google-style docstring**: Including description and `Args:` section
4. **No types in Args section**: Types go in the function signature only

#### Example Format:
```python
def function_name(arg1: type, arg2: type) -> return_type:
    """Brief description of what the function does.
    
    Args:
        arg1: Description of first argument
        arg2: Description of second argument
    
    Returns:
        Description of return value
    """
    # Function implementation
```
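
To make these conventions concrete, the sketch below checks a function against them using only the standard library. `check_tool_ready`, `cube`, and `bad` are hypothetical names introduced for illustration, and the docstring check is a crude substring test, not the parser that `get_json_schema` uses.

```python
import inspect
from typing import List

def check_tool_ready(func) -> List[str]:
    """Return a list of convention problems; an empty list means none were found."""
    problems = []
    signature = inspect.signature(func)
    doc = inspect.getdoc(func) or ""
    if not doc:
        problems.append("missing docstring")
    if signature.parameters and "Args:" not in doc:
        problems.append("docstring has no 'Args:' section")
    for name, param in signature.parameters.items():
        # Every argument needs a type annotation in the signature.
        if param.annotation is inspect.Parameter.empty:
            problems.append(f"argument '{name}' has no type hint")
        # Crude check: the argument name followed by a colon appears in the docstring.
        if f"{name}:" not in doc:
            problems.append(f"argument '{name}' is not described in the docstring")
    return problems

def cube(x: float) -> float:
    """Raise a number to the third power.

    Args:
        x: The number to cube.

    Returns:
        The cube of x.
    """
    return x ** 3

def bad(a, b: float):
    """Does something."""
    return a

print(check_tool_ready(cube))
for problem in check_tool_ready(bad):
    print("-", problem)
```

A check like this can run once at import time, so a missing type hint is caught before any schema is generated.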

### Our Calculator Functions

Below we define 7 calculator functions that our LLM can call:
- **add**: Addition of two numbers
- **subtract**: Subtraction
- **multiply**: Multiplication
- **divide**: Division (with zero-check)
- **power**: Exponentiation
- **square_root**: Square root calculation (with negative-check)
- **sin**: Sine of an angle in radians

Each function returns a dictionary with:
- `operation`: The operation performed
- `operands`: The input values
- `result`: The computed result
- `expression`: A human-readable expression
- `error`: Error message (if applicable)

In [2]:
# Calculator

def add(a: float, b: float) -> Dict[str, Any]:
    """Add two numbers together.

    Args:
        a: The first number to add.
        b: The second number to add.

    Returns:
        A dictionary containing the operation name, operands, result, and
        expression string.
    """
    result = a + b
    return {
        "operation": "add",
        "operands": [a, b],
        "result": result,
        "expression": f"{a} + {b} = {result}",
    }


def subtract(a: float, b: float) -> Dict[str, Any]:
    """Subtract the second number from the first number.

    Args:
        a: The minuend (number being subtracted from).
        b: The subtrahend (number to subtract).

    Returns:
        A dictionary containing the operation name, operands, result, and
        expression string.
    """
    result = a - b
    return {
        "operation": "subtract",
        "operands": [a, b],
        "result": result,
        "expression": f"{a} - {b} = {result}",
    }


def multiply(a: float, b: float) -> Dict[str, Any]:
    """Multiply two numbers together.

    Args:
        a: The first factor.
        b: The second factor.

    Returns:
        A dictionary containing the operation name, operands, result, and
        expression string.
    """
    result = a * b
    return {
        "operation": "multiply",
        "operands": [a, b],
        "result": result,
        "expression": f"{a} * {b} = {result}",
    }


def divide(a: float, b: float) -> Dict[str, Any]:
    """Divide the first number by the second number.

    Args:
        a: The dividend (numerator).
        b: The divisor (denominator). Must not be zero.

    Returns:
        A dictionary containing the operation name, operands, result,
        expression string, and an error field if division by zero occurs.
    """
    if b == 0:
        return {
            "operation": "divide",
            "operands": [a, b],
            "result": None,
            "error": "Division by zero is undefined",
            "expression": f"{a} / {b} = ERROR",
        }
    result = a / b
    return {
        "operation": "divide",
        "operands": [a, b],
        "result": result,
        "expression": f"{a} / {b} = {result}",
    }


def power(base: float, exponent: float) -> Dict[str, Any]:
    """Raise a base number to the power of an exponent.

    Args:
        base: The base number to raise.
        exponent: The power to raise the base to.

    Returns:
        A dictionary containing the operation name, operands, result,
        expression string, and an error field if the operation fails.
    """
    try:
        result = base**exponent
    except (ValueError, OverflowError, ZeroDivisionError) as e:
        # e.g. a float result too large to represent, or zero raised to a negative power
        return {
            "operation": "power",
            "operands": [base, exponent],
            "result": None,
            "error": str(e),
            "expression": f"{base} ^ {exponent} = ERROR",
        }
    if isinstance(result, complex):
        # A negative base with a fractional exponent yields a complex number,
        # which json.dumps() cannot serialize.
        return {
            "operation": "power",
            "operands": [base, exponent],
            "result": None,
            "error": "Result is not a real number",
            "expression": f"{base} ^ {exponent} = ERROR",
        }
    return {
        "operation": "power",
        "operands": [base, exponent],
        "result": result,
        "expression": f"{base} ^ {exponent} = {result}",
    }


def square_root(x: float) -> Dict[str, Any]:
    """Calculate the square root of a number.

    Args:
        x: The number to calculate the square root for. Must be non-negative.

    Returns:
        A dictionary containing the operation name, operands, result,
        expression string, and an error field if the input is negative.
    """
    if x < 0:
        return {
            "operation": "square_root",
            "operands": [x],
            "result": None,
            "error": "Cannot compute square root of negative number",
            "expression": f"sqrt({x}) = ERROR",
        }
    result = x**0.5
    return {
        "operation": "square_root",
        "operands": [x],
        "result": result,
        "expression": f"sqrt({x}) = {result}",
    }


def sin(x: float) -> Dict[str, Any]:
    """Calculate the sine of a number (in radians).

    Args:
        x: The angle in radians.

    Returns:
        A dictionary containing the operation name, operands, result, and
        expression string.
    """
    import math
    result = math.sin(x)
    return {
        "operation": "sin",
        "operands": [x],
        "result": result,
        "expression": f"sin({x}) = {result}",
    }

## Step 3: Generate JSON Schemas

### Understanding JSON Schemas

LLMs don't see your Python function code directly. Instead, they work with **JSON schemas** that describe:
- Function name
- Function description
- Parameter names and types
- Required parameters

The `get_json_schema()` function automatically converts Python functions into the standard JSON schema format.

#### Example Schema Structure:
```json
{
  "type": "function",
  "function": {
    "name": "add",
    "description": "Add two numbers together.",
    "parameters": {
      "type": "object",
      "properties": {
        "a": {"type": "number", "description": "The first number to add."},
        "b": {"type": "number", "description": "The second number to add."}
      },
      "required": ["a", "b"]
    }
  }
}
```

### Creating the Tools List

We convert each calculator function to a JSON schema and store them in a `tools` list. This list will be passed to the model during inference.

We also create a `function_map` dictionary to easily execute functions by name when the model requests them.
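
To build intuition for what schema generation does, here is a deliberately simplified sketch that derives a schema from a function's signature with the standard library. `simple_schema` and the `PY_TO_JSON` mapping are assumptions made for illustration; this is not how `get_json_schema()` is implemented, and it ignores per-argument descriptions, nested types, and return values.

```python
import inspect
import json
from typing import Any, Dict

# Assumed mapping from Python annotations to JSON Schema type names.
PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}

def simple_schema(func) -> Dict[str, Any]:
    """Build a tool schema from a function's signature and first docstring line."""
    doc = inspect.getdoc(func) or ""
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": PY_TO_JSON.get(param.annotation, "string")}
        # Arguments without a default value are required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": doc.split("\n")[0],
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }

def add(a: float, b: float) -> float:
    """Add two numbers together."""
    return a + b

schema = simple_schema(add)
print(json.dumps(schema, indent=2))
```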

In [3]:
from collections.abc import Callable


tools: List[Dict[str, Dict]] = [
    get_json_schema(add),
    get_json_schema(subtract),
    get_json_schema(multiply),
    get_json_schema(divide),
    get_json_schema(power),
    get_json_schema(square_root),
    get_json_schema(sin),
]


function_map: Dict[str, Callable] = {
    "add": add,
    "subtract": subtract,
    "multiply": multiply,
    "divide": divide,
    "power": power,
    "square_root": square_root,
    "sin": sin,
}

print("Available Tools:\n")
print(json.dumps(tools, indent=2))

print(function_map["multiply"](2, 3)) # Test call to multiply function

Available Tools:

[
  {
    "type": "function",
    "function": {
      "name": "add",
      "description": "Add two numbers together.",
      "parameters": {
        "type": "object",
        "properties": {
          "a": {
            "type": "number",
            "description": "The first number to add."
          },
          "b": {
            "type": "number",
            "description": "The second number to add."
          }
        },
        "required": [
          "a",
          "b"
        ]
      },
      "return": {
        "type": "object",
        "additionalProperties": {},
        "description": "A dictionary containing the operation name, operands, result, and\n    expression string."
      }
    }
  },
  {
    "type": "function",
    "function": {
      "name": "subtract",
      "description": "Subtract the second number from the first number.",
      "parameters": {
        "type": "object",
        "properties": {
          "a": {
            "type": "number",
   

**Output**: The `tools` list now contains 7 JSON schemas, and `function_map` allows us to execute functions by name.

## Step 4: Parse Tool Calls from Model Output

### Why Parsing is Necessary

Different models output tool calls in different formats:
- Some use XML-like tags: `<tool_call>{...}</tool_call>`
- Others use special markers: `[TOOL_CALLS]...[/TOOL_CALLS]`
- Some output raw JSON objects

### Our Parsing Function

The `parse_tool_calls()` function handles multiple formats:

1. **XML Format**: `<tool_call>{"name": "add", "arguments": {"a": 5, "b": 3}}</tool_call>`
2. **Raw JSON**: `{"name": "multiply", "arguments": {"a": 2, "b": 4}}`
3. **Array Format**: `[TOOL_CALLS][{...}, {...}][/TOOL_CALLS]`

The function tries these formats in order and stops at the first one that yields at least one tool call. Anything that is not valid JSON is skipped silently rather than raising an exception.
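
As a small, self-contained illustration of the XML-tag case, the sketch below applies the same `<tool_call>` pattern to a hand-written sample string. `sample_output` is made up for this example; its second block is deliberately malformed to show that invalid JSON is skipped, not raised.

```python
import json
import re

# Hand-written sample; the second block is missing a closing brace on purpose.
sample_output = (
    "<tool_call>\n"
    '{"name": "add", "arguments": {"a": 5, "b": 3}}\n'
    "</tool_call>\n"
    "<tool_call>\n"
    '{"name": "sin", "arguments": {"x": 3}\n'
    "</tool_call>"
)

# Capture everything from the first "{" to the last "}" before each closing tag.
pattern = r"<tool_call>\s*(\{[\s\S]*?\})\s*</tool_call>"

parsed = []
skipped = []
for raw in re.findall(pattern, sample_output):
    try:
        parsed.append(json.loads(raw))
    except json.JSONDecodeError:
        skipped.append(raw)

print("parsed:", parsed)
print("skipped:", len(skipped))
```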

In [4]:
def parse_tool_calls(response_text: str) -> List[Dict[str, Any]]:
    """
    Parse tool calls from model response.
    Supports multiple formats:
    - <tool_call>{...}</tool_call>
    - Multiple individual JSON objects on separate lines
    - [TOOL_CALLS] JSON_ARRAY [/TOOL_CALLS]
    - Generic JSON objects with "name" and "arguments"
    """
    tool_calls = []

    # Pattern 1: <tool_call> XML format (handles multiline JSON)
    try:
        # Regex: <tool_call>\s*(\{[\s\S]*?\})\s*</tool_call>
        # Matches: <tool_call> + optional whitespace + capture group ({ to }) + optional whitespace + </tool_call>
        # [\s\S]*? matches any character including newlines (non-greedy)
        # This captures JSON objects wrapped in XML-like tags
        tool_call_pattern = r"<tool_call>\s*(\{[\s\S]*?\})\s*</tool_call>"
        tool_call_matches = re.findall(tool_call_pattern, response_text)

        if tool_call_matches:
            for match in tool_call_matches:
                try:
                    # Clean up the JSON string
                    json_str = match.strip()
                    tool_call = json.loads(json_str)
                    tool_calls.append(tool_call)
                except json.JSONDecodeError:
                    # Skip blocks whose contents are not valid JSON
                    pass
    except Exception:
        pass

    # Pattern 2: Standalone JSON objects anywhere in the text
    if not tool_calls:
        # A brace-free regex such as \{[^{}]*\} cannot match a tool call, because
        # "arguments" is itself a nested object. Instead, try to decode a JSON
        # value at every "{" and keep the dicts that have "name" and "arguments".
        decoder = json.JSONDecoder()
        pos = response_text.find("{")
        while pos != -1:
            try:
                candidate, end = decoder.raw_decode(response_text, pos)
            except json.JSONDecodeError:
                # Not a valid JSON object here; move on to the next "{"
                pos = response_text.find("{", pos + 1)
                continue
            if isinstance(candidate, dict) and "name" in candidate and "arguments" in candidate:
                tool_calls.append(candidate)
            # Resume scanning after the object we just decoded
            pos = response_text.find("{", end)

    # Pattern 3: [TOOL_CALLS] format
    if not tool_calls:
        try:
            # Regex: \[TOOL_CALLS\](.*?)(\[/TOOL_CALLS\]|$)
            # Matches: [TOOL_CALLS] + capture group (any chars, non-greedy) + either [/TOOL_CALLS] or end of string
            # re.DOTALL makes . match newlines too
            # (.*?) captures everything between the opening and closing tags (or end of text)
            # This handles array format: [TOOL_CALLS][{...}, {...}][/TOOL_CALLS]
            match = re.search(
                r"\[TOOL_CALLS\](.*?)(\[/TOOL_CALLS\]|$)",
                response_text,
                re.DOTALL
            )
            if match:
                json_str = match.group(1).strip()
                if json_str.startswith("["):
                    tool_calls = json.loads(json_str)
                else:
                    tool_calls = [json.loads(json_str)]
        except Exception as e:
            pass

    return tool_calls




## Step 5: Execute Tool Calls

### The Execution Function

Once we've parsed the tool call requests from the model, we need to actually execute them. The `execute_tool_call()` function:

1. **Validates** the function name exists in our `function_map`
2. **Calls** the appropriate function with the provided arguments
3. **Handles errors** gracefully with a `try`/`except` block
4. **Returns** the result in a structured format
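
One possible refinement, sketched below under assumptions, is to check the model's arguments against the function signature before calling it, so that a missing or misspelled argument gives a clearer error message. `safe_execute` and `demo_function_map` are hypothetical names used only here; the tutorial's own `execute_tool_call()` follows.

```python
import inspect
from typing import Any, Callable, Dict

def multiply(a: float, b: float) -> Dict[str, Any]:
    """Toy tool used only for this sketch."""
    return {"operation": "multiply", "operands": [a, b], "result": a * b}

demo_function_map: Dict[str, Callable] = {"multiply": multiply}

def safe_execute(func_name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Validate the name and arguments, then run the function."""
    if func_name not in demo_function_map:
        return {"error": f"Unknown function: {func_name}"}
    func = demo_function_map[func_name]
    try:
        # bind() raises TypeError for missing or unexpected arguments
        # without actually calling the function.
        inspect.signature(func).bind(**args)
    except TypeError as exc:
        return {"error": f"Bad arguments for {func_name}: {exc}"}
    try:
        return func(**args)
    except Exception as exc:
        return {"error": f"Error executing {func_name}: {exc}"}

print(safe_execute("multiply", {"a": 2, "b": 3}))
print(safe_execute("multiply", {"a": 2}))
print(safe_execute("divide", {"a": 1, "b": 2}))
```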

In [5]:
def execute_tool_call(func_name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Execute a single tool call."""
    if func_name not in function_map:
        return {
            "error": f"Unknown function: {func_name}",
            "available_functions": list(function_map.keys())
        }

    try:
        return function_map[func_name](**args)
    except Exception as e:
        return {
            "error": f"Error executing {func_name}: {str(e)}",
            "function": func_name,
            "arguments": args
        }


## Step 6: Load a Tool-Use-Compatible Model

### Model Selection

We use **SmolLM3-3B**, a compact language model that supports function calling. Key considerations when choosing a model:

- ✅ **Tool-use capability**: The model must be trained to understand and generate tool calls
- ✅ **Size vs Performance**: Larger models (e.g., Llama-3-70B, Command-R) offer better performance but require more resources
- ✅ **Format compatibility**: Different models use different tool call formats, e.g., the `<tool_call>` XML format, standalone JSON objects, or the `[TOOL_CALLS]` format

### Popular Tool-Use Models

- **SmolLM3** (3B parameters): Lightweight, good for experimentation
- **Hermes-2-Pro-Llama-3-8B**: Strong tool-use performance for its size
- **Command-R**: Enterprise-grade performance with RAG support
- **Mixtral-8x22B**: High performance, supports multiple simultaneous tool calls
- **GPT-4 / Claude**: Commercial APIs with robust tool support

In [6]:
# Load model
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


# https://huggingface.co/HuggingFaceTB/SmolLM3-3B
MODEL_NAME = "HuggingFaceTB/SmolLM3-3B"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME,
    use_fast=True,
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto",
    dtype=torch.bfloat16,
    trust_remote_code=True,
)
model.eval()
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

print(f"Model loaded successfully on device: {model.device}")

Model loaded successfully on device: cuda:0


## Step 7: First Tool Call - Initial Model Response

### The Process Flow

1. **Create conversation**: Build a messages list with system prompt and user query
2. **Apply chat template**: Use `apply_chat_template()` with `tools` parameter
3. **Generate response**: Let the model decide which tools to call
4. **Decode output**: Convert model tokens to text

### Key Parameters

- `tools=tools`: Passes our function schemas to the model
- `add_generation_prompt=True`: Adds the assistant's turn marker
- `return_dict=True`: Returns input_ids and attention_mask
- `enable_thinking=False`: Disables chain-of-thought (optional feature)

### What to Expect

The model will analyze the query: *"What is the result of 3.8^2.2 * sin(3)?"*

It should recognize that it needs to:
1. Raise 3.8 to the power of 2.2
2. Compute the sine of 3 (in radians)
3. Multiply the two results

**Output**: **SmolLM3-3B** emits tool call requests in XML-style format by default. You should see `<tool_call>...</tool_call>` tags, each wrapping a JSON object that names the function to call and its arguments.

In [7]:
# Define the user's mathematical query
user_query = "What is the result of 3.8^2.2 * sin(3)?"

# Build the conversation with system prompt and user query
# This creates the initial context for the model
conversation_messages = [
        {
            "role": "system",
            "content": "You are a helpful calculator assistant. Use the provided calculator functions to solve math problems accurately. CRITICAL: You must NEVER compute any (even the simplist) mathematical results yourself. ALWAYS call the appropriate functions and WAIT for their computed results, excluding nested calls as we are computing iteratively. Do not provide any numerical answers until you have received the tool's output.",
        },
        {"role": "user", "content": user_query},
    ]

# Apply chat template to format the conversation for the model
# tools=tools: Include function schemas so model knows what functions are available
# add_generation_prompt=True: Add the assistant's turn marker
# return_dict=True: Return both input_ids and attention_mask
# return_tensors="pt": Return PyTorch tensors
# enable_thinking=False: Disable chain-of-thought reasoning
inputs = tokenizer.apply_chat_template(
    conversation_messages,
    tools=tools,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
    enable_thinking=False,
).to(model.device)

# Generate the model's response
# The model will analyze the query and decide which tools to call
# max_new_tokens=1024: Generate up to 1024 new tokens
# temperature=0.2: Controls randomness (lower = more deterministic)
# do_sample=True: Enable sampling for more varied responses
# top_p=0.95: Nucleus sampling - sample from the smallest set of tokens whose cumulative probability reaches 95%
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.2,
    do_sample=True,
    top_p=0.95,
)

# outputs[0][inputs["input_ids"].shape[1]:] extracts just the new tokens
# skip_special_tokens=False: Keep special tokens like <tool_call> tags for parsing
out_text = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1] :], skip_special_tokens=False
)

print("Initial model output:\n", out_text)

Initial model output:
 <tool_call>
{"name": "power", "arguments": {"base": 3.8, "exponent": 2.2}}
</tool_call>
<tool_call>
{"name": "sin", "arguments": {"x": 3}}
</tool_call>

<tool_call>
{"name": "multiply", "arguments": {"a": {"$result": {"name": "power", "arguments": {"base": 3.8, "exponent": 2.2}}, "b": {"$result": {"name": "sin", "arguments": {"x": 3}}}}, "result": {"$result": {"name": "power", "arguments": {"base": 3.8, "exponent": 2.2}}, "b": {"$result": {"name": "sin", "arguments": {"x": 3}}}}}
</tool_call><|im_end|>


## Step 8: Parse Model Output's Tool Call Requests

Now we extract the structured tool call requests from the model's output using the `parse_tool_calls()` function.

### Expected Output Format

Each parsed tool call should contain:
```python
{
    "name": "function_name",
    "arguments": {"arg1": value1, "arg2": value2},
    "id": "call_id"  # Optional, used by some models
}
```

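For our example query, we expect to see calls to `power` and `sin`. The third `<tool_call>` block in the raw output above nests `$result` placeholders and has unbalanced braces, so it is not valid JSON and the parser skips it.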

In [8]:
tool_calls_parsed = parse_tool_calls(out_text)
if tool_calls_parsed:
    print("First tool calls found:", tool_calls_parsed)
else:
    print("No tool calls parsed from the model output.")

First tool calls found: [{'name': 'power', 'arguments': {'base': 3.8, 'exponent': 2.2}}, {'name': 'sin', 'arguments': {'x': 3}}]


## Step 9: Add Tool Calls to Conversation

### Conversation Format

We need to append the assistant's tool call requests to the conversation history. The standard format is:

```python
{
    "role": "assistant",
    "tool_calls": [
        {
            "type": "function",
            "function": {
                "name": "function_name",
                "arguments": {...}
            }
        }
    ]
}
```

### Why This Matters

This standardized format:
- Maintains conversation continuity
- Allows the model to track its own tool calls
- Enables proper context for interpreting tool results
- Follows the OpenAI/HuggingFace API conventions (with dict arguments, not JSON strings)
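
A minimal sketch of this wrapping step as a reusable helper is shown below. `to_assistant_message` is a hypothetical name, and carrying an optional `id` at the top level of each entry is an assumption based on the OpenAI-style layout; check what your model's chat template expects.

```python
from typing import Any, Dict, List

def to_assistant_message(parsed_calls: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap parsed tool calls in an assistant message with dict arguments."""
    tool_calls = []
    for call in parsed_calls:
        entry = {
            "type": "function",
            "function": {
                "name": call["name"],
                # Arguments stay a dict here, not a JSON string.
                "arguments": call.get("arguments", {}),
            },
        }
        if "id" in call:
            # Assumption: keep the call id alongside "type" and "function".
            entry["id"] = call["id"]
        tool_calls.append(entry)
    return {"role": "assistant", "tool_calls": tool_calls}

parsed_example = [
    {"name": "power", "arguments": {"base": 3.8, "exponent": 2.2}},
    {"name": "sin", "arguments": {"x": 3}, "id": "call_1"},
]
assistant_message = to_assistant_message(parsed_example)
print(assistant_message)
```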

In [9]:
tool_calls = [{"type": "function", "function": f} for f in tool_calls_parsed]

conversation_messages.append({"role": "assistant", "tool_calls": tool_calls})


print("Conversation messages with tool calls:", conversation_messages)

Conversation messages with tool calls: [{'role': 'system', 'content': "You are a helpful calculator assistant. Use the provided calculator functions to solve math problems accurately. CRITICAL: You must NEVER compute any (even the simplist) mathematical results yourself. ALWAYS call the appropriate functions and WAIT for their computed results, excluding nested calls as we are computing iteratively. Do not provide any numerical answers until you have received the tool's output."}, {'role': 'user', 'content': 'What is the result of 3.8^2.2 * sin(3)?'}, {'role': 'assistant', 'tool_calls': [{'type': 'function', 'function': {'name': 'power', 'arguments': {'base': 3.8, 'exponent': 2.2}}}, {'type': 'function', 'function': {'name': 'sin', 'arguments': {'x': 3}}}]}]


**Output**: The conversation now includes an assistant message with properly formatted tool calls.

## Step 10: Execute Functions and Add Results

### The Execution Loop

For each tool call requested by the model:
1. Extract the function name and arguments
2. Call `execute_tool_call()` to run the actual function
3. Convert the result to JSON string
4. Append a tool message to the conversation

### Tool Message Format

```python
{
    "role": "tool",
    "name": "function_name",
    "content": "json_result_string",
    "tool_call_id": "call_id"  # Required by some models like Mistral/Mixtral
}
```

### Important Notes

- The `content` must be a **string**, not a dict (convert results with `json.dumps()`)
- Some models require a `tool_call_id` to match calls with responses
- The `role: "tool"` indicates this message contains function output

This step is where the actual computation happens - the LLM's requests are translated into real function executions!
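
The message format and notes above can be captured in a small helper. This is a sketch under assumptions: `to_tool_message` is a hypothetical name, and `tool_call_id` is only added when an id is supplied.

```python
import json
from typing import Any, Dict, Optional

def to_tool_message(
    name: str, result: Dict[str, Any], call_id: Optional[str] = None
) -> Dict[str, Any]:
    """Build a tool message whose content is a JSON string, not a dict."""
    message: Dict[str, Any] = {
        "role": "tool",
        "name": name,
        "content": json.dumps(result),
    }
    if call_id is not None:
        message["tool_call_id"] = call_id
    return message

example_result = {"operation": "sin", "operands": [3], "result": 0.1411200080598672}
tool_message = to_tool_message("sin", example_result, call_id="call_1")
print(tool_message)
```

Because `content` is a string, the receiving side can recover the original dictionary with `json.loads(tool_message["content"])`.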

In [10]:


for i, tool_call in enumerate(tool_calls_parsed):
    func_name = tool_call.get("name")
    args = tool_call.get("arguments", {})
    call_id = tool_call.get("id", f"call_{i}")


    print(f"\n  [{i+1}] Executing: {func_name}({args})")

    result = execute_tool_call(func_name, args)

    if "error" in result:
        print(f"{result}, skipping appending it to the conversation history.")
        continue

    print(f"Result: {result}")

    conversation_messages.append(
        {
            "role": "tool",
            "name": func_name,
            "content": json.dumps(result),
            "tool_call_id": call_id,
        }
    )



  [1] Executing: power({'base': 3.8, 'exponent': 2.2})
Result: {'operation': 'power', 'operands': [3.8, 2.2], 'result': 18.85922806856392, 'expression': '3.8 ^ 2.2 = 18.85922806856392'}

  [2] Executing: sin({'x': 3})
Result: {'operation': 'sin', 'operands': [3], 'result': 0.1411200080598672, 'expression': 'sin(3) = 0.1411200080598672'}


**Output**: Each tool call is executed and you'll see the function being called with its arguments, followed by the result (e.g., `power(3.8, 2.2)` → `{"operation": "power", ..., "result": 18.859, ...}`).

## Step 11: Review the Complete Conversation

At this point, our conversation history contains:
1. **System message**: Instructions for the assistant
2. **User message**: The original query
3. **Assistant message**: Tool call requests
4. **Tool message(s)**: Function execution results

This complete context will be passed to the model in the next generation step, allowing it to formulate a final answer based on the tool outputs.

In [11]:

print("Conversation messages after tool execution:", conversation_messages)

Conversation messages after tool execution: [{'role': 'system', 'content': "You are a helpful calculator assistant. Use the provided calculator functions to solve math problems accurately. CRITICAL: You must NEVER compute any (even the simplist) mathematical results yourself. ALWAYS call the appropriate functions and WAIT for their computed results, excluding nested calls as we are computing iteratively. Do not provide any numerical answers until you have received the tool's output."}, {'role': 'user', 'content': 'What is the result of 3.8^2.2 * sin(3)?'}, {'role': 'assistant', 'tool_calls': [{'type': 'function', 'function': {'name': 'power', 'arguments': {'base': 3.8, 'exponent': 2.2}}}, {'type': 'function', 'function': {'name': 'sin', 'arguments': {'x': 3}}}]}, {'role': 'tool', 'name': 'power', 'content': '{"operation": "power", "operands": [3.8, 2.2], "result": 18.85922806856392, "expression": "3.8 ^ 2.2 = 18.85922806856392"}', 'tool_call_id': 'call_0'}, {'role': 'tool', 'name': 'si

**Output**: The complete conversation history showing all messages: system, user, assistant (with tool calls), and tool (with results).

## Step 12: Generate Second-Round Response

Now we pass the **entire conversation** (including tool results) back to the model so it can produce its next response.

### What Happens Here

1. The model reads its own tool call requests
2. It processes the tool execution results
3. It either answers the user directly or, if the computation is not finished, requests the next tool call.

### Expected Output

For the query *"What is the result of 3.8^2.2 * sin(3)?"*, the model should pick up the computed results 3.8^2.2 ≈ 18.859 and sin(3) ≈ 0.141, then request one more tool call to multiply them.

In [12]:
inputs = tokenizer.apply_chat_template(
    conversation_messages,
    tools=tools,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
    enable_thinking=False,
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,  # greedy decoding; temperature and top_p only apply when sampling
)

model_response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1] :], skip_special_tokens=True
)



print("User query:\n", user_query)
print("\nModel response:\n", model_response)

The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


User query:
 What is the result of 3.8^2.2 * sin(3)?

Model response:
 <tool_call>
{"name": "multiply", "arguments": {"a": 18.85922806856392, "b": 0.1411200080598672}}
</tool_call>


**Output**: Ideally, another tool call that multiplies the result of 3.8^2.2 by the result of sin(3).

## Step 13: Full Multi-Turn Tool Use Pipeline

### Building a Robust Pipeline

The previous steps walked through one full tool-use round and the start of a second. In practice, complex queries may require **multiple rounds** of tool calls:

Example: *"What is ((5 + 3) * 2) ^ 2?"*
- **Round 1**: Call `add(5, 3)` → Get result: 8
- **Round 2**: Call `multiply(8, 2)` → Get result: 16
- **Round 3**: Call `power(16, 2)` → Get result: 256
- **Final**: Generate answer: "256"

### Pipeline Features

The `pipeline()` function handles:

1. **Multi-turn conversations**: Automatically loops until no more tool calls
2. **Tool call limits**: Prevents infinite loops with `max_tool_calls`
3. **Automatic result injection**: Executes functions and adds results to conversation
4. **Thinking mode**: Optional chain-of-thought reasoning
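
Before the real implementation, the loop and its round limit can be sketched with scripted replies in place of a model. Everything here is an assumption made for illustration: `SCRIPTED_REPLIES` hard-codes what a model might say for *((5 + 3) * 2) ^ 2*, `FUNCTIONS` is a toy function map, and `run_rounds` is a hypothetical helper, not the `pipeline()` defined below.

```python
import json
import re
from typing import Any, Dict, List, Tuple

FUNCTIONS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
    "power": lambda base, exponent: base ** exponent,
}

# Scripted replies standing in for a real model, one per round.
SCRIPTED_REPLIES = [
    '<tool_call>{"name": "add", "arguments": {"a": 5, "b": 3}}</tool_call>',
    '<tool_call>{"name": "multiply", "arguments": {"a": 8, "b": 2}}</tool_call>',
    '<tool_call>{"name": "power", "arguments": {"base": 16, "exponent": 2}}</tool_call>',
    "The result is 256.",
]

def run_rounds(
    replies: List[str], max_rounds: int = 5
) -> Tuple[str, List[Dict[str, Any]]]:
    """Loop until a reply has no tool calls or the round limit is hit."""
    messages: List[Dict[str, Any]] = []
    for reply in replies[:max_rounds]:
        raw_calls = re.findall(r"<tool_call>(.*?)</tool_call>", reply, re.DOTALL)
        if not raw_calls:
            # No tool calls left: this reply is the final answer.
            return reply, messages
        calls = [json.loads(raw) for raw in raw_calls]
        messages.append(
            {"role": "assistant",
             "tool_calls": [{"type": "function", "function": c} for c in calls]}
        )
        for call in calls:
            value = FUNCTIONS[call["name"]](**call["arguments"])
            messages.append(
                {"role": "tool", "name": call["name"],
                 "content": json.dumps({"result": value})}
            )
    return "Stopped: round limit reached before a final answer.", messages

answer, history = run_rounds(SCRIPTED_REPLIES)
tool_results = [json.loads(m["content"])["result"] for m in history if m["role"] == "tool"]
print(answer)
print(tool_results)

limited_answer, _ = run_rounds(SCRIPTED_REPLIES, max_rounds=2)
print(limited_answer)
```

The round limit matters because a model that keeps emitting tool calls would otherwise never terminate the loop.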

In [13]:
user_query = "What is the result of 3.8^2.2 * sin(3)?"
print("User query:\n", user_query)

def pipeline(
    user_query: str,
    thinking: bool = False,
    max_new_tokens: int = 512,
    max_tool_calls: int = 5,
) -> str:

    messages = [
        {
            "role": "system",
            "content": "You are a helpful calculator assistant. Use the provided calculator functions to solve math problems accurately. CRITICAL: You must NEVER compute any (even the simplist) mathematical results yourself. ALWAYS call the appropriate functions and WAIT for their computed results, excluding nested calls as we are computing iteratively. Do not provide any numerical answers until you have received the tool's output.",
        },
        {"role": "user", "content": user_query},
    ]

    round_idx = 0
    while round_idx < max_tool_calls:

        print(f"\n{'='*60}")
        print(f"ROUND {round_idx + 1}")
        print(f"{'='*60}")

        inputs = tokenizer.apply_chat_template(
            messages,
            tools=tools,
            add_generation_prompt=True,
            return_dict=True,
            return_tensors="pt",
            enable_thinking=thinking,
        ).to(model.device)

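        # Greedy decoding: with do_sample=False, sampling parameters such as
        # temperature and top_p have no effect, so they are omitted here.
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
        )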

        out_text = tokenizer.decode(
            outputs[0][inputs["input_ids"].shape[1] :], skip_special_tokens=False
        )

        tool_calls_parsed = parse_tool_calls(out_text)

        if not tool_calls_parsed:
            print("\n✓ No more tool calls detected - generating final answer...")
            print(f"\nFinal Response:\n{out_text}")
            return out_text

        if round_idx == max_tool_calls - 1:
            print(f"\n⚠ Maximum tool call rounds ({max_tool_calls}) reached.")
            return out_text

        print(f"\n📋 Tool Calls Requested: {len(tool_calls_parsed)}")

        tool_calls = [{"type": "function", "function": f} for f in tool_calls_parsed]
        messages.append({"role": "assistant", "tool_calls": tool_calls})

        for i, tool_call in enumerate(tool_calls_parsed):
            func_name = tool_call.get("name")
            args = tool_call.get("arguments", {})

            print(f"\n  🔧 [{i+1}] Calling: {func_name}({', '.join(f'{k}={v}' for k, v in args.items())})")

            result = execute_tool_call(func_name, args)

            if "error" in result:
                print(f"     ❌ Error: {result['error']}")
                # Skip adding errored results to conversation
                continue
            elif "result" in result:
                print(f"     ✓ Result: {result['result']}")
            else:
                print(f"     ✓ Result: {result}")

            tool_message = {
                "role": "tool",
                "name": func_name,
                "content": json.dumps(result),
            }

            if "id" in tool_call:
                tool_message["tool_call_id"] = tool_call["id"]

            messages.append(tool_message)

        round_idx += 1


model_response = pipeline(user_query, max_new_tokens=512)
print("\nModel response:\n", model_response)

User query:
 What is the result of 3.8^2.2 * sin(3)?

ROUND 1

📋 Tool Calls Requested: 3

  🔧 [1] Calling: power(base=3.8, exponent=2.2)
     ✓ Result: 18.85922806856392

  🔧 [2] Calling: sin(x=3)
     ✓ Result: 0.1411200080598672

  🔧 [3] Calling: multiply(a={'$result': {'name': 'power', 'arguments': {'base': 3.8, 'exponent': 2.2}}, 'b': {'$result': {'name': 'sin', 'arguments': {'x': 3}}}})
     ❌ Error: Error executing multiply: multiply() missing 1 required positional argument: 'b'

ROUND 2

📋 Tool Calls Requested: 1

  🔧 [1] Calling: multiply(a=18.85922806856392, b=0.1411200080598672)
     ✓ Result: 2.661414417038614

ROUND 3

✓ No more tool calls detected - generating final answer...

Final Response:
</think>
{"operation": "multiply", "operands": [18.85922806856392, 0.1411200080598672], "result": 2.661414417038614, "expression": "18.85922806856392 * 0.1411200080598672 = 2.661414417038614"}<|im_end|>

Model response:
 </think>
{"operation": "multiply", "

**Output**: You'll see round-by-round progress (e.g., "ROUND 1", "ROUND 2") as the model makes tool calls, receives results, and eventually generates the final answer.
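
Because the pipeline decodes with `skip_special_tokens=False`, the final response in the run above still carries a stray `</think>` tag and a trailing `<|im_end|>` token. One possible way to tidy it before showing it to a user is sketched below; `clean_final_response` is a hypothetical helper, not part of the tutorial's code, and the `raw` string is an abbreviated version of the response shown above.

```python
import re


def clean_final_response(text: str) -> str:
    """Strip reasoning tags and chat-template special tokens from model output."""
    # Remove a complete <think>...</think> block if one is present.
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Remove a stray closing tag with no matching opening tag.
    text = text.replace("</think>", "")
    # Remove special tokens of the form <|...|>, such as <|im_end|>.
    text = re.sub(r"<\|[^|>]+\|>", "", text)
    return text.strip()


raw = '</think>\n{"operation": "multiply", "result": 2.661414417038614}<|im_end|>'
cleaned = clean_final_response(raw)
print(cleaned)
```

The token pattern here assumes special tokens are written as `<|name|>`; other chat templates may use different markers and would need a different pattern.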

---

## Summary and Key Takeaways

### What We Learned

1. **Function Calling Basics**
   - LLMs can request external function calls rather than computing everything internally
   - Functions must follow specific conventions (type hints, docstrings)
   - JSON schemas describe functions to the model

2. **The Tool-Use Workflow**
   ```
   User Query → Model Analysis → Tool Call Request →
   Function Execution → Result Injection → Final Response
   ```

3. **Critical Components**
   - **Tool definitions**: Well-documented Python functions
   - **Schema generation**: Converting functions to JSON
   - **Parsing**: Extracting tool calls from model output
   - **Execution**: Running the requested functions
   - **Result formatting**: Feeding results back in the correct format

4. **Multi-Turn Interactions**
   - Complex queries may require multiple rounds
   - Each round builds on previous results
   - Proper conversation management is essential
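
As a rough illustration of the conversation management involved, the sketch below appends one tool-use round to a message list in the same shape the pipeline uses: an assistant message carrying `tool_calls`, followed by one `tool` message per result. `append_tool_round` is a hypothetical helper name, and the example calls and results are made up for demonstration.

```python
import json
from typing import Any, Dict, List


def append_tool_round(
    messages: List[Dict[str, Any]],
    tool_calls_parsed: List[Dict[str, Any]],
    results: List[Dict[str, Any]],
) -> None:
    """Append one assistant tool-call turn and its tool results, in place."""
    messages.append(
        {
            "role": "assistant",
            "tool_calls": [
                {"type": "function", "function": call} for call in tool_calls_parsed
            ],
        }
    )
    for call, result in zip(tool_calls_parsed, results):
        messages.append(
            {"role": "tool", "name": call["name"], "content": json.dumps(result)}
        )


messages = [{"role": "user", "content": "What is 5 + 3?"}]
append_tool_round(
    messages,
    [{"name": "add", "arguments": {"a": 5, "b": 3}}],
    [{"result": 8}],
)
for message in messages:
    print(message["role"], "->", message.get("content", message.get("tool_calls")))
```

Keeping the assistant's request and the tool's reply as separate messages lets the chat template render each in the format the model was trained on.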


### Real-World Applications

Function calling enables:
- **Weather bots**: Access real-time weather data
- **Data analysis**: Query databases and visualize results
- **Calculators**: Perform precise mathematical operations
- **Search agents**: Retrieve information from external sources
- **E-commerce**: Check inventory, place orders
- **Scheduling**: Book appointments, check availability
- **API integration**: Interact with any REST API

## References and Further Reading

### Official Documentation

**Hugging Face Transformers - Tool Use**
- [Expanding Chat Templates with Tools and Documents](https://huggingface.co/docs/transformers/v4.49.0/en/chat_template_tools_and_documents)
- [Tool Use Guide (v5.0)](https://huggingface.co/docs/transformers/v5.0.0rc2/chat_extras)

### Recommended Models for Tool Use

**Open-Source Models**
- [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) - Excellent tool-use performance
- [SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) - Lightweight, good for learning
- [Command-R](https://huggingface.co/CohereForAI/c4ai-command-r-v01) - Enterprise-grade with RAG support
- [Mixtral-8x22B](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1) - High performance, parallel tool calls
- [Llama-3.1 series](https://huggingface.co/meta-llama) - Strong tool-use capabilities

### Additional Resources

- [JSON Schema Documentation](https://json-schema.org/learn/getting-started-step-by-step)
- [Google Python Style Guide - Docstrings](https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings)
- [OpenAI Function Calling Guide](https://platform.openai.com/docs/guides/function-calling)