# Integrating tools with the kluster.ai API

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kluster-ai/klusterai-cookbook/blob/main/examples/integrating-tools.ipynb)

Tools let you give an LLM safe, schema-defined superpowers. During a chat completion, the model can call any function you expose by supplying JSON arguments instead of prose, then fold the result back into its reply. Your code runs the function, keeping credentials and business logic out of the model while unlocking actions like database queries, BTC/USD look-ups, math, web scraping, or calendar updates. In short, the LLM handles intent and dialogue; your code delivers auditable side effects.

This notebook shows how to use tool calling with the kluster.ai chat completions endpoint in Python. We’ll cover:

1. Setting up the environment  
2. Calling a single tool  
3. Trying multiple tools (calculator, web search, etc.)  
4. Handling tool outputs and streaming responses

## Prerequisites

Before getting started, ensure you have the following:

- **A kluster.ai account** - sign up on the <a href="https://platform.kluster.ai/signup" target="_blank">kluster.ai platform</a> if you don't have one
- **A kluster.ai API key** - after signing in, go to the <a href="https://platform.kluster.ai/apikeys" target="_blank">**API Keys**</a> section and create a new key. For detailed instructions, check out the <a href="/get-started/get-api-key/" target="_blank">Get an API key</a> guide

## Setup

In this notebook, we'll use Python's `getpass` module to enter the key without echoing it to the screen. When you run the cell, paste your kluster.ai API key at the prompt (make sure it contains no spaces).

In [1]:
from getpass import getpass

api_key = getpass("Enter your kluster.ai API key: ")

Enter your kluster.ai API key:  ········


Install the OpenAI Python client library:

In [2]:
%pip install -q openai

Note: you may need to restart the kernel to use updated packages.


With the OpenAI Python library installed, import the dependencies for this tutorial:

In [3]:
import os
from openai import OpenAI
import json
from IPython.display import display, Markdown, HTML

Finally, create the client pointing to the kluster.ai endpoint with your API key:

In [4]:
# Set up the client
client = OpenAI(
    base_url="https://api.kluster.ai/v1",
    api_key=api_key,
)

## Define the model

This example selects the `klusterai/Meta-Llama-3.1-8B-Instruct-Turbo` model. To use a different model, change the value of the `model` variable. Remember to use the full-length model name to avoid errors.

Please refer to the [Supported models](https://docs.kluster.ai/get-started/models/) section for a list of the models we support.

In [5]:
# Choose the LLM to use throughout this tutorial
model = "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo"

## Prepare the prompt

We’ll store the baseline prompt in a variable so we can reuse it when we invoke the model. This baseline prompt will be changed and expanded later in the tutorial.

In [6]:
baseline_prompt = "What is 1337 multiplied by 42?"

## Basic tool calling

kluster.ai supports tool calling similar to OpenAI's function calling. Let's start with a simple example using a calculator tool. 

kluster.ai treats a tool as a capability you expose to the model: by including the tool's JSON Schema in the `tools` array, you tell the LLM, “if the user asks for arithmetic, call this function instead of guessing the answer.” When we send the prompt “What is 1337 multiplied by 42?” with `tool_choice="auto"`, the model recognizes that the calculator is the best way to satisfy the request and answers not with prose but with a `tool_calls` block that contains the function name and a JSON-encoded argument string (`{"expression": "1337 * 42"}`).

In [7]:
def run_with_tools(prompt, tools, model=model):
    messages = [
        {"role": "user", "content": prompt}
    ]
    
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )
    
    return response

# Define a calculator tool
calculator_tools = [
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Perform arithmetic calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The mathematical expression to evaluate"
                    }
                },
                "required": ["expression"]
            }
        }
    }
]

# Test with a math problem
calculator_response = run_with_tools(
    baseline_prompt, 
    calculator_tools
)

print(json.dumps(calculator_response.model_dump(), indent=2))

{
  "id": "chatcmpl-a8059d75-cff4-444c-81f9-c8ffffb368ca",
  "choices": [
    {
      "finish_reason": "tool_calls",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": null,
        "refusal": null,
        "role": "assistant",
        "audio": null,
        "function_call": null,
        "tool_calls": [
          {
            "id": "chatcmpl-tool-1415b5225e9c4526b29c9df125a3b11e",
            "function": {
              "arguments": "{\"expression\": \"1337 * 42\"}",
              "name": "calculator"
            },
            "type": "function"
          }
        ]
      },
      "stop_reason": 128008
    }
  ],
  "created": 1747264145,
  "model": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 21,
    "prompt_tokens": 252,
    "total_tokens": 273,
    "completion_tokens_details": null,
    "prompt_tokens_details": null
  },
  "promp

### Interpreting the tool-call response

Let's take a closer look at the response above. The assistant’s reply isn’t prose; rather, it’s a structured tool call:

1. **`finish_reason: "tool_calls"`** – signals the model has paused, waiting for us to run one or more tools
2. **`message.tool_calls[0]`** – an array item that describes what to run:
   * `id` – a unique identifier we must echo back
   * `function.name` – here it’s `calculator`
   * `function.arguments` – JSON-encoded string with the expression `"1337 * 42"`
3. **`content: null`** – no human-readable answer yet; that will come after we execute the tool and return the result

In short, the model has delegated the arithmetic. Our job is to run `execute_calculator("1337 * 42")`, package the numeric result in a `{role:"tool"}` message (preserving the `tool_call_id`), and feed it back to the chat endpoint.
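
As a rough sketch of that hand-off, the snippet below takes a tool call shaped like the one in the response above (written here as a plain dictionary rather than the SDK object) and builds the matching `role: "tool"` message. The `build_tool_message` helper and the precomputed result are illustrative assumptions, not part of the kluster.ai API.

```python
import json

# A tool call copied from the response above, as a plain dict for illustration
tool_call = {
    "id": "chatcmpl-tool-1415b5225e9c4526b29c9df125a3b11e",
    "type": "function",
    "function": {
        "name": "calculator",
        "arguments": '{"expression": "1337 * 42"}',
    },
}

def build_tool_message(tool_call, result):
    """Hypothetical helper: wrap a tool result in a role="tool" chat message."""
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],  # echo the id so the result can be matched to the call
        "content": json.dumps(result),    # the payload is sent as a JSON string
    }

# function.arguments is a JSON-encoded string, so decode it before use
arguments = json.loads(tool_call["function"]["arguments"])
print(arguments["expression"])

# The result is computed inline here; the notebook uses execute_calculator() instead
tool_message = build_tool_message(tool_call, {"result": 1337 * 42})
print(tool_message)
```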

The next section will walk through that hand-off step by step.

### Tool-response processing

To turn an LLM tool call into a human-friendly answer, we’ll take the following steps:

1. **Parse the tool call** – inspect `response.choices[0].message.tool_calls`, grab the function name, and JSON-decode its arguments.  
2. **Run the side-effect safely** – hand the expression to `execute_calculator()`, which allowlists characters and evaluates it (placeholder logic; swap in a real math parser for production).  
3. **Return the result to the model** – craft a new chat turn with `role:"tool"`, preserve the original `tool_call_id`, and embed a JSON payload such as `{ "result": 56154 }`.  
4. **Let the model finish the thought** – call `chat.completions.create()` again so the LLM can weave the raw number into friendly prose (e.g., “The result of multiplying 1337 by 42 is 56,154”).  

Run the cells below to see this two-step dance **model → tool → model** in action.


In [8]:
import math
import re

def execute_calculator(expression):
    # Simple calculator using eval() (note: never use this in production without proper validation)
    # In production, use a safer method for evaluation
    try:
        # Basic sanitization
        if not re.match(r'^[0-9+\-*/().%\s]+$', expression):
            return {"error": "Invalid expression. Only basic arithmetic operations are allowed."}
        
        result = eval(expression)
        return {"result": result}
    except Exception as e:
        return {"error": str(e)}

def process_tool_calls(response):
    message = response.choices[0].message
    
    # If there are no tool calls, return the message content
    if not message.tool_calls:
        return message.content
    
    # Process each tool call
    tool_results = []
    for tool_call in message.tool_calls:
        function_name = tool_call.function.name
        arguments = json.loads(tool_call.function.arguments)
        
        # Execute the appropriate function based on the tool call
        if function_name == "calculator":
            result = execute_calculator(arguments["expression"])
            tool_results.append({
                "tool_call_id": tool_call.id,
                "function_name": function_name,
                "result": result
            })
    
    # Create a new message with the tool results
    messages = [
        {"role": "user", "content": baseline_prompt},
        message.model_dump(),
    ]
    
    # Add the tool results
    for result in tool_results:
        messages.append({
            "role": "tool",
            "tool_call_id": result["tool_call_id"],
            "content": json.dumps(result["result"])
        })
    
    # Get the final response
    final_response = client.chat.completions.create(
        model=model,
        messages=messages
    )
    
    return final_response.choices[0].message.content

# Process the calculator response
final_answer = process_tool_calls(calculator_response)
print(final_answer)

The result of multiplying 1337 by 42 is 56,154.
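
The character allowlist in `execute_calculator()` still hands the expression to `eval()`, and because `*` is permitted it also admits `**`, so an input such as `9**9**9**9` could tie up the kernel. One possible replacement, sketched here with the standard-library `ast` module, walks the parsed expression and accepts only an explicit set of operators. The name `execute_calculator_safe` is hypothetical, and this is a minimal sketch rather than a hardened parser.

```python
import ast
import operator

# Allowlisted operators; anything else (including **) is rejected
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def _evaluate(node):
    """Recursively evaluate a parsed expression made of numbers and allowed operators."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("Unsupported expression")

def execute_calculator_safe(expression):
    """Hypothetical stand-in for execute_calculator() that avoids eval()."""
    try:
        tree = ast.parse(expression, mode="eval")
        return {"result": _evaluate(tree)}
    except (SyntaxError, ValueError, ZeroDivisionError) as e:
        return {"error": str(e)}

print(execute_calculator_safe("1337 * 42"))
print(execute_calculator_safe("2 ** 3"))
```

Because it returns the same `{"result": ...}` or `{"error": ...}` shape, it could be swapped into `process_tool_calls()` without other changes.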


## Advanced tool-calling example: live web search

The calculator example kept all logic local, but real-world apps often need fresh data. We'll register a `web_search(query: str)` tool so the LLM can pause, fetch live results, and then weave them into its answer.

In [9]:
# 1. Describe the tool in JSON-schema form
web_search_tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

Why a stub? In production, you'd call Bing, Google, or an internal search API. For this demo, we return deterministic mock data so you can run the notebook offline.

In [10]:
def execute_web_search(query: str):
    """Return mock search results."""
    if "climate" in query.lower():
        return {
            "results": [
                {
                    "title": "Climate Change Effects – Latest Research",
                    "snippet": "New studies show increasing impacts of climate change on global ecosystems.",
                    "url": "https://example.com/climate-research"
                },
                {
                    "title": "Renewable Energy Solutions for Climate Change",
                    "snippet": "Advancements in renewable energy technologies show promise in addressing climate challenges.",
                    "url": "https://example.com/renewable-climate"
                }
            ]
        }
    return {
        "results": [
            {
                "title": f"Search results for: {query}",
                "snippet": "Sample search result for demonstration purposes",
                "url": "https://example.com/search"
            }
        ]
    }

When the model emits a tool call, we run the tool and post the result back as a `role="tool"` message.  
Finally, we ask the model to finish the answer.

In [11]:
import json

def process_web_search(response, original_query):
    """Handle any web_search tool calls and get the model’s final synthesis."""
    msg = response.choices[0].message
    if not msg.tool_calls:
        return msg.content  # nothing to do

    tool_msgs = []
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        if call.function.name == "web_search":
            result = execute_web_search(args["query"])
            tool_msgs.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result)
            })

    follow_up_messages = [
        {"role": "user", "content": original_query},
        msg.model_dump(),
        *tool_msgs,
    ]

    final = client.chat.completions.create(
        model=model,
        messages=follow_up_messages
    )
    return final.choices[0].message.content

### Run the demo

In [12]:
search_query = "What are the latest findings on climate change?"
search_response = run_with_tools(search_query, web_search_tools)
print(process_web_search(search_response, search_query))

The latest findings on climate change include:

1. **Increasing impacts of climate change on global ecosystems**: New studies have shown the effects of climate change on various ecosystems, including melting glaciers, rising sea levels, and changes in weather patterns.
2. **Advancements in renewable energy technologies**: Research has been focusing on improving renewable energy sources such as solar and wind power, which have shown promise in reducing carbon emissions and addressing climate challenges.
3. **Rising global temperatures**: Climate models predict that the planet will continue to warm, with an increase in average global temperatures by 2-4°C by the end of the century if greenhouse gas emissions continue to rise.
4. **Arctic ice sheet melting**: The Arctic ice sheet has been melting at an unprecedented rate, leading to changes in ocean currents and sea levels.
5. **Sea-level rise**: Coastal areas and low-lying islands are at risk of flooding due to rising sea levels, which i

## Multi-tool example

Real-world questions often need more than one capability. For instance, a user might ask:

> “Look up Bitcoin’s market cap **and** convert it to euros.”

By registering several tools in a single `tools` array, we give the LLM a menu of options. We’ll demonstrate with the question:

> “If Earth’s temperature rises by 2 °C, what percentage increase is that from the current average of 15 °C?”

To answer, the model only needs arithmetic, but if the prompt also required live data—e.g., “*…and cite a recent study on global warming*”—it could call **two** tools in one turn: first `web_search`, then `calculator`. Here’s how we orchestrate that multi-tool workflow:

1. **Describe each tool** – supply JSON-schema specs for `calculator` and `web_search`
2. **Let the LLM plan** – pass both specs in `multi_tools`; the LLM can issue one or many `tool_calls`
3. **Dispatch and execute** – `process_multi_tool_calls()` loops over each call, runs the matching helper, and returns results as `{role:"tool"}` messages
4. **Finish in plain English** – a follow-up `chat.completions.create()` lets the model weave everything into a readable answer

In [13]:
# Describe multiple tools
multi_tools = [
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Perform arithmetic calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The mathematical expression to evaluate"
                    }
                },
                "required": ["expression"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

The code below runs each requested tool, then feeds the results back to the model.

In [14]:
import json

def process_multi_tool_calls(response, original_query):
    msg = response.choices[0].message
    if not msg.tool_calls:
        return msg.content

    tool_msgs = []
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        if call.function.name == "calculator":
            result = execute_calculator(args["expression"])
        elif call.function.name == "web_search":
            result = execute_web_search(args["query"])
        else:
            result = {"error": f"Unknown tool: {call.function.name}"}

        tool_msgs.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result)
        })

    follow_up = [
        {"role": "user", "content": original_query},
        msg.model_dump(),
        *tool_msgs,
    ]

    final = client.chat.completions.create(
        model=model,
        messages=follow_up
    )
    return final.choices[0].message.content
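
As the number of tools grows, the `if`/`elif` chain can become unwieldy. One alternative, sketched below, is a registry that maps tool names to handler functions. `TOOL_REGISTRY`, `dispatch_tool`, and the two placeholder handlers are hypothetical names for this sketch; in the notebook, the handlers would be `execute_calculator` and `execute_web_search`.

```python
import json

def placeholder_calculator(expression):
    """Placeholder handler; the notebook's execute_calculator() would go here."""
    return {"result": f"would evaluate: {expression}"}

def placeholder_web_search(query):
    """Placeholder handler; the notebook's execute_web_search() would go here."""
    return {"results": [{"title": f"Search results for: {query}"}]}

# Map each tool name to the function that implements it
TOOL_REGISTRY = {
    "calculator": placeholder_calculator,
    "web_search": placeholder_web_search,
}

def dispatch_tool(name, arguments_json):
    """Look up a handler by name and call it with the decoded arguments."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return {"error": f"Unknown tool: {name}"}
    try:
        arguments = json.loads(arguments_json)
        return handler(**arguments)
    except (json.JSONDecodeError, TypeError) as e:
        # Malformed JSON, or arguments that don't match the handler's signature
        return {"error": str(e)}

print(dispatch_tool("web_search", '{"query": "climate"}'))
print(dispatch_tool("weather", "{}"))
```

Adding a tool then means registering one more entry, and malformed arguments come back to the model as an error payload instead of raising inside the loop.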

### Try out the multi-tool demo

In [15]:
multi_query = (
    "If the Earth's temperature rises by 2 degrees, what percentage increase "
    "is that from the current average global temperature of 15 degrees Celsius?"
)
multi_response = run_with_tools(multi_query, multi_tools)
print(process_multi_tool_calls(multi_response, multi_query))

The percentage increase in the Earth's temperature would be approximately 13.33%.
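
As a quick sanity check on that figure, the same arithmetic can be reproduced locally (variable names are illustrative):

```python
# Percentage increase = change / baseline * 100
baseline_celsius = 15
increase_celsius = 2

percentage_increase = increase_celsius / baseline_celsius * 100
print(round(percentage_increase, 2))  # 13.33
```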


## Real-world use case: document analysis with tools

We’ll let the model read a report and, when it spots numbers, invoke the calculator:

1. **Wrap the context** – embed the document and the user’s question in a single prompt  
2. **Let the LLM decide** – pass the prompt along with `multi_tools`; the model can choose to read or calculate  
3. **Process tool calls** – if the model pauses with a `tool_calls` block, we run the appropriate tools and feed the results back  
4. **Return prose** – the model weaves the numeric answer into a natural-language response

This pattern scales to meeting minutes, legal contracts, or server log files—anywhere the model must blend language understanding with deterministic math.


In [16]:
def document_analysis_with_tools(document, question):
    # Prepare the prompt
    prompt = f"""
Document: 
{document}

Question about the document: {question}

Please answer the question based on the document. If calculations are needed, use the calculator tool.
"""
    
    # Use the multi-tools from before
    response = run_with_tools(prompt, multi_tools)
    final_answer = process_multi_tool_calls(response, prompt)
    
    return final_answer

# Sample document and question
sample_document = """
Kluster.ai Performance Report 2024

In Q1 2024, our platform processed 2.5 million requests, a 25% increase from Q4 2023 (2 million requests). 
The average response time was reduced from 350ms to 280ms, representing a 20% improvement.
Our customer base grew from 500 to 800 companies, and revenue increased from $1.2M to $1.8M.
"""

sample_question = "What was the percentage increase in revenue according to the report?"

document_analysis_result = document_analysis_with_tools(sample_document, sample_question)
print(document_analysis_result)

The error message indicates that the calculator does not support the multiplication and division operators in the expression.

To calculate the percentage increase in revenue, we can rewrite the expression as:

(($1.8M - $1.2M) / $1.2M) * 100

First, let's calculate the numerator:

$1.8M - $1.2M = $0.6M

Now, divide the result by $1.2M:

$0.6M / $1.2M = 0.5

Multiply the result by 100 to get the percentage increase:

0.5 * 100 = 50%

Therefore, the percentage increase in revenue according to the report is 50%.
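
The reply above mentions a calculator error. The raw tool call isn't shown, so the cause is an inference: the model most likely passed an expression containing characters such as `$` or `M`, which the allowlist in `execute_calculator()` rejects. The sketch below reuses that same pattern to show which inputs pass; `is_allowed` is a hypothetical helper name.

```python
import re

# Same allowlist pattern used by execute_calculator() earlier in this notebook
ALLOWED = re.compile(r'^[0-9+\-*/().%\s]+$')

def is_allowed(expression):
    """Return True if the expression contains only allowlisted characters."""
    return bool(ALLOWED.match(expression))

print(is_allowed("((1.8 - 1.2) / 1.2) * 100"))        # True
print(is_allowed("(($1.8M - $1.2M) / $1.2M) * 100"))  # False
```

One mitigation, untested here, would be to state in the tool's `description` that the expression must contain only digits and arithmetic operators, with no currency symbols or unit suffixes.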


## Streaming with tool calls

When you set `stream=True`, kluster.ai pushes delta chunks to your client as soon as they’re ready. That means you can render tokens to the user in real time and see the moment the model decides to invoke a tool.

In the helper below, we listen to the stream, print regular text as it arrives, and intercept any `tool_calls` deltas: the moment we see a function name such as `calculator`, we log “Calling tool: calculator” and keep appending argument fragments until the stream closes. The helper then prints the assembled arguments; executing the tool and sending the result back for a final completion works the same way as in the earlier examples. Streaming makes interactions feel more responsive, lets you show spinners or live-update UIs, and still preserves the deterministic tool-calling workflow you’ve seen so far.


In [18]:
def stream_with_tools(prompt, tools, model):
    messages = [
        {"role": "user", "content": prompt}
    ]
    
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        tools=tools,
        tool_choice="auto",
        stream=True
    )
    
    # Track current tool call and accumulate arguments
    current_tool_calls = {}
    
    # In a notebook, we'd display this differently than in a script
    for chunk in stream:
        if not chunk.choices or len(chunk.choices) == 0:
            continue
            
        delta = chunk.choices[0].delta
        
        # Handle regular content
        if hasattr(delta, 'content') and delta.content:
            print(delta.content, end="")
            
        # Handle tool calls
        elif hasattr(delta, 'tool_calls') and delta.tool_calls:
            for tool_call in delta.tool_calls:
                # Skip if no function data
                if not tool_call.function:
                    continue
                    
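                # Get or create entry for this tool call. Key on the index:
                # in the OpenAI streaming format, later fragments of the same
                # call may arrive without an id
                tool_id = tool_call.index
                if tool_id not in current_tool_calls: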
                    current_tool_calls[tool_id] = {
                        "name": "",
                        "arguments": ""
                    }
                
                # Update tool name if present
                if hasattr(tool_call.function, 'name') and tool_call.function.name:
                    if not current_tool_calls[tool_id]["name"]:
                        print(f"\nCalling tool: {tool_call.function.name}")
                    current_tool_calls[tool_id]["name"] = tool_call.function.name
                
                # Accumulate arguments if present
                if hasattr(tool_call.function, 'arguments') and tool_call.function.arguments:
                    current_tool_calls[tool_id]["arguments"] += tool_call.function.arguments
    
    # Print the final, complete arguments for each tool call
    for tool_id, tool_data in current_tool_calls.items():
        if tool_data["arguments"]:
            print(f"Arguments: {tool_data['arguments']}")
    
    print("\n")

# Test streaming with a simple query
stream_with_tools("Calculate: 17 * 43 + 125", calculator_tools, model)


Calling tool: calculator
Arguments: {"expression": "17 * 43 + 125"}
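
To look at the accumulation step in isolation, the sketch below merges simulated tool-call fragments without touching the network. It assumes the OpenAI-style streaming shape, where each fragment carries an `index` and only the first fragment of a call includes the `id` and function name. Whether kluster.ai splits a call across several chunks is not confirmed here, and `fragment` and `accumulate_tool_calls` are hypothetical helpers.

```python
import json
from types import SimpleNamespace

def fragment(index, call_id=None, name=None, arguments=None):
    """Build a stand-in for one streamed tool-call delta (simulated data)."""
    return SimpleNamespace(
        index=index,
        id=call_id,
        function=SimpleNamespace(name=name, arguments=arguments),
    )

# Simulated deltas: only the first fragment carries the id and function name
simulated_deltas = [
    fragment(0, call_id="call-0", name="calculator", arguments=""),
    fragment(0, arguments='{"expression": '),
    fragment(0, arguments='"17 * 43 + 125"}'),
]

def accumulate_tool_calls(deltas):
    """Merge streamed fragments into complete tool calls, keyed by index."""
    calls = {}
    for delta in deltas:
        entry = calls.setdefault(delta.index, {"id": None, "name": "", "arguments": ""})
        if delta.id:
            entry["id"] = delta.id
        if delta.function.name:
            entry["name"] = delta.function.name
        if delta.function.arguments:
            entry["arguments"] += delta.function.arguments
    return calls

calls = accumulate_tool_calls(simulated_deltas)
for call in calls.values():
    # The argument string is only valid JSON once every fragment has arrived
    print(call["name"], json.loads(call["arguments"]))
```

Once the arguments are complete, each entry holds the `id`, name, and arguments needed to run the tool and build the `role: "tool"` follow-up message.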




## Conclusion

You’ve now seen kluster.ai’s tool-calling API end-to-end, from authentication through multi-tool orchestration and streaming. This notebook covered:

1. Basic setup and authentication
2. Single tool calling (calculator)
3. Web search tool usage
4. Multiple tool combinations
5. Real-world document analysis use case
6. Streaming tool calls

You can extend this pattern to other tools by defining their schemas and implementing the corresponding execution functions. Because the kluster.ai API is OpenAI-compatible, it's straightforward to integrate with existing codebases.

For production use, remember to:
- Store API keys securely
- Implement proper error handling
- Replace the `eval()`-based calculator and the mock search with vetted, sandboxed tool implementations
- Consider rate limits and costs
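
As one way to approach the first point, the sketch below reads the key from an environment variable and falls back to an interactive prompt only when the variable is missing. The variable name `KLUSTER_API_KEY` and the `load_api_key` helper are assumptions made for this example, not an official convention.

```python
import os
from getpass import getpass

def load_api_key(env_var="KLUSTER_API_KEY"):
    """Read the key from an environment variable, prompting only as a fallback.

    The variable name KLUSTER_API_KEY is an assumption for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # No key in the environment: ask for it without echoing to the screen
        key = getpass("Enter your kluster.ai API key: ").strip()
    if not key:
        raise ValueError("No API key provided")
    return key
```

In the setup cell, `api_key = load_api_key()` could then replace the direct `getpass` call, keeping the key out of notebook source and saved outputs.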