# Integrating Tools with the kluster.ai API

## Introduction

A **tool** is a server-side capability you describe to the large language model (LLM) with a JSON Schema. During a chat completion, the model can emit a `tool_calls` block that names the function and supplies schema-valid arguments instead of plain-text prose. Your application (or one of kluster.ai’s built-ins) runs that function, returns the result in a follow-up message, and the model weaves the data into its final answer.

This pattern lets natural-language prompts trigger real-world effects such as querying a database, fetching the BTC/USD price, crunching numbers, scraping docs, or posting calendar events, without exposing credentials or business logic to the model. In short, tools turn a smart chat agent into a full-stack teammate: the LLM handles intent and dialogue, while your code performs deterministic, auditable side effects.

This notebook shows how to use tool calling with the kluster.ai API in Python. We’ll cover:

1. Setting up the environment  
2. Calling a single tool  
3. Trying multiple tools (calculator, web search, etc.)  
4. Handling tool outputs and streaming responses

## Setup

First, let's install the required packages:

In [1]:
%pip install openai


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


Now, let's set up our client:

In [2]:
import os
from openai import OpenAI
import json
from IPython.display import display, Markdown, HTML

You'll need to set your kluster API key. For security, use environment variables or a secrets manager in a production environment.

In [3]:
# Set your API key
# For demo purposes, we'll ask for it here - in production, use environment variables
from getpass import getpass

kluster_api_key = getpass("Enter your kluster API key: ")

# Initialize the client
client = OpenAI(
    base_url="https://api.kluster.ai/v1",  # kluster API endpoint
    api_key=kluster_api_key
)

Enter your kluster API key:  ········
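
For non-interactive runs, the key can be read from the environment instead of being typed in. The helper below is a minimal sketch: the variable name `KLUSTER_API_KEY` and the function `load_api_key` are assumptions made for illustration, not names defined by kluster.ai.

```python
import os
from typing import Callable, Optional


def load_api_key(
    env_var: str = "KLUSTER_API_KEY",  # hypothetical variable name
    fallback: Optional[Callable[[], str]] = None,
) -> str:
    """Return the API key from the environment, or from `fallback` if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    if fallback is not None:
        # For example, an interactive prompt when running in a notebook
        return fallback()
    raise RuntimeError(f"Set the {env_var} environment variable before running.")
```

In a notebook, `load_api_key(fallback=lambda: getpass("Enter your kluster API key: "))` would keep the interactive prompt as a last resort.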


## Basic Tool Calling

Kluster supports tool calling similar to OpenAI's function calling. Let's start with a simple example using a calculator tool. 

Kluster treats the calculator as a first-class capability you expose to the model: by including its JSON Schema in the `tools` array, you tell the LLM, “if the user asks for arithmetic, call this function instead of guessing the answer.” When we send the prompt “What is 1337 multiplied by 42?” with `tool_choice="auto"`, the model recognises that the calculator is the best way to satisfy the request and answers not with prose but with a `tool_calls` block that contains the function name and a JSON-encoded argument string (`{"expression": "1337 * 42"}`).

In [4]:
def run_with_tools(prompt, tools, model="klusterai/Meta-Llama-3.1-8B-Instruct-Turbo"):
    messages = [
        {"role": "user", "content": prompt}
    ]
    
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )
    
    return response

# Define a calculator tool
calculator_tools = [
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Perform arithmetic calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The mathematical expression to evaluate"
                    }
                },
                "required": ["expression"]
            }
        }
    }
]

# Test with a math problem
calculator_response = run_with_tools(
    "What is 1337 multiplied by 42?", 
    calculator_tools
)

print(json.dumps(calculator_response.model_dump(), indent=2))

{
  "id": "chatcmpl-8dd4c1cc-4ecc-4c79-aed6-d408bbad5081",
  "choices": [
    {
      "finish_reason": "tool_calls",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": null,
        "refusal": null,
        "role": "assistant",
        "audio": null,
        "function_call": null,
        "tool_calls": [
          {
            "id": "chatcmpl-tool-fa51f55519ff412bb1798262c3a5a797",
            "function": {
              "arguments": "{\"expression\": \"1337 * 42\"}",
              "name": "calculator"
            },
            "type": "function"
          }
        ]
      },
      "stop_reason": 128008
    }
  ],
  "created": 1746741970,
  "model": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 20,
    "prompt_tokens": 252,
    "total_tokens": 272,
    "completion_tokens_details": null,
    "prompt_tokens_details": null
  },
  "promp

## Tool Response Processing

Once the model pauses with its `tool_calls` block we, not the LLM, take the wheel.
The helper below does four things in one sweep:

1. **Parse the call**: It inspects `response.choices[0].message.tool_calls`, pulls out the function name, and JSON-decodes the arguments

2. **Run the side-effect safely**: Here we hand the expression to `execute_calculator()`, which first whitelists characters and then evaluates it. This is a placeholder for demonstration purposes only - use a proper math parser in production

3. **Hand the result back to the model**: We craft a new chat turn with `role:"tool"`, include the original `tool_call_id`, and embed the JSON `{ "result": 56154 }`

4. **Let the model finish the thought**: A second `chat.completions.create()` lets the LLM convert that raw number into friendly prose, e.g. “The result of multiplying 1337 by 42 is 56,154”

This two-step dance (model → tool → model) keeps business logic in your codebase while giving users a seamless conversational experience.
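
Steps 3 and 4 come down to building one message list. The sketch below assembles it offline from plain dictionaries so the shape is easy to inspect. The function name `build_followup_messages` and the id `call-1` are placeholders, and the assistant turn is hand-written to imitate the response shown earlier.

```python
import json


def build_followup_messages(user_prompt, assistant_message, tool_results):
    """Assemble the message list for the second completion request.

    assistant_message: the assistant turn that contained `tool_calls`, as a dict.
    tool_results: mapping of tool_call_id -> JSON-serialisable result.
    """
    messages = [
        {"role": "user", "content": user_prompt},
        assistant_message,
    ]
    # One `tool` message per call, linked back through tool_call_id
    for call in assistant_message.get("tool_calls") or []:
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(tool_results[call["id"]]),
        })
    return messages


# Hand-written stand-in for the assistant turn (placeholder id)
assistant_turn = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call-1",
        "type": "function",
        "function": {"name": "calculator", "arguments": "{\"expression\": \"1337 * 42\"}"},
    }],
}

messages = build_followup_messages(
    "What is 1337 multiplied by 42?",
    assistant_turn,
    {"call-1": {"result": 1337 * 42}},
)
print(json.dumps(messages[-1], indent=2))
```

The last message printed is the `tool` turn; sending the full list to `chat.completions.create()` is what lets the model phrase the final answer.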


In [5]:
import math
import re

def execute_calculator(expression):
    # Simple calculator using eval() (note: never use this in production without proper validation)
    # In production, use a safer method for evaluation
    try:
        # Basic sanitization
        if not re.match(r'^[0-9+\-*/().%\s]+$', expression):
            return {"error": "Invalid expression. Only basic arithmetic operations are allowed."}
        
        result = eval(expression)
        return {"result": result}
    except Exception as e:
        return {"error": str(e)}

def process_tool_calls(response):
    message = response.choices[0].message
    
    # If there are no tool calls, return the message content
    if not message.tool_calls:
        return message.content
    
    # Process each tool call
    tool_results = []
    for tool_call in message.tool_calls:
        function_name = tool_call.function.name
        arguments = json.loads(tool_call.function.arguments)
        
        # Execute the appropriate function based on the tool call
        if function_name == "calculator":
            result = execute_calculator(arguments["expression"])
            tool_results.append({
                "tool_call_id": tool_call.id,
                "function_name": function_name,
                "result": result
            })
    
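    # Rebuild the conversation for the follow-up request: the user prompt
    # (hard-coded here because this helper only serves the calculator demo)
    # followed by the assistant turn that contains the tool calls
    messages = [
        {"role": "user", "content": "What is 1337 multiplied by 42?"},
        message.model_dump(),
    ]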
    
    # Add the tool results
    for result in tool_results:
        messages.append({
            "role": "tool",
            "tool_call_id": result["tool_call_id"],
            "content": json.dumps(result["result"])
        })
    
    # Get the final response
    final_response = client.chat.completions.create(
        model="klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
        messages=messages
    )
    
    return final_response.choices[0].message.content

# Process the calculator response
final_answer = process_tool_calls(calculator_response)
print(final_answer)

The result of 1337 multiplied by 42 is 56,154.
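
The character whitelist in `execute_calculator()` still admits inputs such as `9**9**9`, because `*` is an allowed character, so the expression would reach `eval()`. One possible alternative, sketched here under the assumption that only `+`, `-`, `*`, `/`, `%` and parentheses are needed, walks the parsed syntax tree and applies a fixed set of operators. The name `safe_calculator` is hypothetical; this is an illustration, not a full math parser.

```python
import ast
import operator

# Operators the evaluator is willing to apply; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node):
    # Plain numbers only (bool is excluded by the exact type check)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("Unsupported expression")


def safe_calculator(expression):
    """Evaluate basic arithmetic without calling eval()."""
    try:
        tree = ast.parse(expression.strip(), mode="eval")
        return {"result": _evaluate(tree.body)}
    except (SyntaxError, ValueError, ZeroDivisionError) as exc:
        return {"error": str(exc)}
```

It returns the same `{"result": ...}` or `{"error": ...}` shape as `execute_calculator()`, so it could be swapped in without touching the rest of the flow.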


## Advanced Tool Calling Example - Web Search

The calculator example kept all the logic on our machine, but many real-world tasks need fresh external data. Here we register a `web_search(query: str)` tool that the model can invoke whenever the user’s request can’t be answered from its training set alone. When we ask “What are the latest findings on climate change?” the LLM recognises it needs up-to-date information, pauses with a `tool_calls` block that contains the search term, and lets our app take over.

`execute_web_search()` (stubbed here with mock results) returns a list of title/snippet/URL triples. We wrap that payload in a `{role:"tool"}` message, preserving the original `tool_call_id`, and hand it back to the model. The LLM then synthesises a readable summary including bullet points of ecosystem impacts, renewable energy advances, tipping point research, and so on.

This pattern (LLM for intent, tool for retrieval, LLM for synthesis) is the backbone of production-grade assistants that need live data, such as news digests, domain-specific knowledge chatbots, or anything where “I don’t know” isn’t acceptable. Although the results here are fabricated for demo purposes, the flow shows exactly how you’d slot in a real search API or internal knowledge service in production.

In [6]:
# Define the web search tool
web_search_tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

# Sample function to simulate web search results
def execute_web_search(query):
    # In a real application, this would call an actual search API
    # For this demo, we'll return mock results
    if "climate" in query.lower():
        return {
            "results": [
                {
                    "title": "Climate Change Effects - Latest Research",
                    "snippet": "New studies show increasing impacts of climate change on global ecosystems.",
                    "url": "https://example.com/climate-research"
                },
                {
                    "title": "Renewable Energy Solutions for Climate Change",
                    "snippet": "Advancements in renewable energy technologies show promise in addressing climate challenges.",
                    "url": "https://example.com/renewable-climate"
                }
            ]
        }
    else:
        return {
            "results": [
                {
                    "title": "Search results for: " + query,
                    "snippet": "Sample search result for demonstration purposes",
                    "url": "https://example.com/search"
                }
            ]
        }

# Function to process web search tool calls
def process_web_search(response, original_query):
    message = response.choices[0].message
    
    # If there are no tool calls, return the message content
    if not message.tool_calls:
        return message.content
    
    # Process each tool call
    tool_results = []
    for tool_call in message.tool_calls:
        function_name = tool_call.function.name
        arguments = json.loads(tool_call.function.arguments)
        
        # Execute the appropriate function based on the tool call
        if function_name == "web_search":
            result = execute_web_search(arguments["query"])
            tool_results.append({
                "tool_call_id": tool_call.id,
                "function_name": function_name,
                "result": result
            })
    
    # Create a new message with the tool results
    messages = [
        {"role": "user", "content": original_query},
        message.model_dump(),
    ]
    
    # Add the tool results
    for result in tool_results:
        messages.append({
            "role": "tool",
            "tool_call_id": result["tool_call_id"],
            "content": json.dumps(result["result"])
        })
    
    # Get the final response
    final_response = client.chat.completions.create(
        model="klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
        messages=messages
    )
    
    return final_response.choices[0].message.content

# Test with a query that would benefit from web search
search_query = "What are the latest findings on climate change?"
search_response = run_with_tools(search_query, web_search_tools)
final_search_result = process_web_search(search_response, search_query)

print(final_search_result)

The latest findings on climate change indicate that its impacts are becoming more pronounced and widespread. Some of the key developments include:

1. **Accelerating Sea-Level Rise**: New research suggests that sea levels are rising at an accelerating rate, with some studies predicting increases of up to 26 inches by 2050 and 82 inches by 2100.
2. **Large-Scale Ice Sheet Collapse**: Scientists have discovered that the West Antarctic Ice Sheet is undergoing a sudden and irreversible collapse, which could lead to a catastrophic sea-level rise of up to 10 feet.
3. **Rate of Global Warming**: The rate of global warming has been accelerating, with 2020 and 2021 ranking as the two hottest years on record globally.
4. **Impacts on Global Ecosystems**: Climate change is having devastating impacts on global ecosystems, including coral bleaching, species extinctions, and loss of biodiversity.
5. **Climate Change and Human Migration**: New studies suggest that climate change is driving human migr

## Multi-Tool Example

Real-world questions often need more than one capability. For example, a user might ask “look up Bitcoin’s market cap and convert it to euros.” By registering several tools in the same `tools` array we give the LLM a menu of options. In the prompt below we include both `web_search` and `calculator`, then ask:

“If Earth’s temperature rises by 2 °C, what percentage increase is that from the current average of 15 °C?”

The model inspects the schemas and decides this is purely arithmetic, so it emits a single `tool_calls` entry for the calculator. Our `process_multi_tool_calls()` helper loops over each requested tool, dispatches to the matching execution function, bundles every result into `{role:"tool"}` messages, and lets the model wrap things up in plain English.

Had the question also required fresh data, e.g. “…and cite a recent study on global warming”, the LLM could have issued two calls in one turn: first `web_search`, then `calculator`. That illustrates the power of multi-tool orchestration: the model can plan a mini-workflow, while your code executes each deterministic step and feeds the results back for seamless narration.
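
As the number of tools grows, an `if`/`elif` chain can be replaced by a lookup table. The following is a minimal sketch of that idea; `TOOL_REGISTRY`, `dispatch_tool_call`, and the two `fake_*` functions are hypothetical stand-ins, not part of the notebook or the kluster.ai API.

```python
import json


# Hypothetical stand-ins for real execution functions
def fake_calculator(expression):
    return {"result": f"evaluated {expression}"}


def fake_web_search(query):
    return {"results": [{"title": f"Search results for: {query}"}]}


# Maps the tool name the model emits to the function that runs it
TOOL_REGISTRY = {
    "calculator": fake_calculator,
    "web_search": fake_web_search,
}


def dispatch_tool_call(name, raw_arguments, registry=TOOL_REGISTRY):
    """Look up the handler for `name` and call it with the decoded arguments."""
    handler = registry.get(name)
    if handler is None:
        return {"error": f"Unknown tool: {name}"}
    try:
        arguments = json.loads(raw_arguments)
        return handler(**arguments)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, a non-object payload, or unexpected argument names
        return {"error": f"Bad arguments for {name}: {exc}"}
```

Adding a tool then means registering one more entry alongside its schema, and errors are returned as data so the model can still be told what went wrong.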

In [7]:
# Define multiple tools
multi_tools = [
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Perform arithmetic calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The mathematical expression to evaluate"
                    }
                },
                "required": ["expression"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

# Enhanced function to process multi-tool calls
def process_multi_tool_calls(response, original_query):
    message = response.choices[0].message
    
    # If there are no tool calls, return the message content
    if not message.tool_calls:
        return message.content
    
    # Process each tool call
    tool_results = []
    for tool_call in message.tool_calls:
        function_name = tool_call.function.name
        arguments = json.loads(tool_call.function.arguments)
        
        # Execute the appropriate function based on the tool call
        if function_name == "calculator":
            result = execute_calculator(arguments["expression"])
        elif function_name == "web_search":
            result = execute_web_search(arguments["query"])
        else:
            result = {"error": f"Unknown tool: {function_name}"}
            
        tool_results.append({
            "tool_call_id": tool_call.id,
            "function_name": function_name,
            "result": result
        })
    
    # Create a new message with the tool results
    messages = [
        {"role": "user", "content": original_query},
        message.model_dump(),
    ]
    
    # Add the tool results
    for result in tool_results:
        messages.append({
            "role": "tool",
            "tool_call_id": result["tool_call_id"],
            "content": json.dumps(result["result"])
        })
    
    # Get the final response
    final_response = client.chat.completions.create(
        model="klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
        messages=messages
    )
    
    return final_response.choices[0].message.content

# Test with a complex query that might use multiple tools
multi_query = "If the Earth's temperature rises by 2 degrees, what percentage increase is that from the current average global temperature of 15 degrees Celsius?"
multi_response = run_with_tools(multi_query, multi_tools)
final_multi_result = process_multi_tool_calls(multi_response, multi_query)

print(final_multi_result)

If the Earth's temperature rises by 2 degrees at a current average global temperature of 15 degrees Celsius, the percentage increase would be approximately 13.33%.


## Real-world Use Case: Document Analysis with Tools

Here we treat an entire report as context and give the model two powers: read the text and, if it spots numbers that need crunching, invoke the calculator. The helper wraps the document and question into a single prompt, passes in our `multi_tools` array, and lets the model decide the workflow. In the revenue example the LLM extracts $1.2 M → $1.8 M, calls the calculator to compute the delta, then replies in prose: “The percentage increase in revenue was 50 %.” This pattern scales to meeting minutes, legal contracts, or log files: any place the model must combine natural language comprehension with deterministic math.

In [8]:
def document_analysis_with_tools(document, question):
    # Prepare the prompt
    prompt = f"""
Document: 
{document}

Question about the document: {question}

Please answer the question based on the document. If calculations are needed, use the calculator tool.
"""
    
    # Use the multi-tools from before
    response = run_with_tools(prompt, multi_tools)
    final_answer = process_multi_tool_calls(response, prompt)
    
    return final_answer

# Sample document and question
sample_document = """
Kluster.ai Performance Report 2024

In Q1 2024, our platform processed 2.5 million requests, a 25% increase from Q4 2023 (2 million requests). 
The average response time was reduced from 350ms to 280ms, representing a 20% improvement.
Our customer base grew from 500 to 800 companies, and revenue increased from $1.2M to $1.8M.
"""

sample_question = "What was the percentage increase in revenue according to the report?"

document_analysis_result = document_analysis_with_tools(sample_document, sample_question)
print(document_analysis_result)

According to the report, the percentage increase in revenue was 50%.
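
The figures in the sample report can be checked independently of the model. This short sketch uses a helper named `percent_change` (introduced here for illustration) that applies the usual formula, (new − old) / old × 100, to the numbers quoted in the document.

```python
def percent_change(old, new):
    """Relative change from `old` to `new`, expressed as a percentage."""
    if old == 0:
        raise ValueError("Percentage change is undefined when the starting value is 0.")
    return (new - old) * 100 / old


# Figures taken from the sample report
print(percent_change(1_200_000, 1_800_000))  # revenue: 50.0
print(percent_change(2_000_000, 2_500_000))  # requests: 25.0
print(percent_change(350, 280))              # response time in ms: -20.0
print(percent_change(500, 800))              # customers: 60.0
```

The revenue result matches the model's answer, and the 25% and 20% values agree with the percentages stated in the report itself.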


## Streaming with Tool Calls

When you set `stream=True`, kluster.ai pushes delta chunks to your client as soon as they’re ready. That means you can render tokens to the user in real time and watch the model decide mid-sentence to invoke a tool. In the helper below we listen to the stream, print regular text as it arrives, and intercept any `tool_calls` deltas: the moment we see `"function": {"name": "calculator" …}`, we log “Calling tool: calculator” and keep appending argument fragments until the model finishes the call. The helper stops at printing the completed call; in a full application you would execute the tool only after the stream closes and send the result back for a final completion. Streaming makes interactions feel instant, lets you show spinners or live-update UIs, and still preserves the deterministic tool-calling workflow you’ve seen in the earlier examples.


In [9]:
def stream_with_tools(prompt, tools, model="klusterai/Meta-Llama-3.1-8B-Instruct-Turbo"):
    messages = [
        {"role": "user", "content": prompt}
    ]
    
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        tools=tools,
        tool_choice="auto",
        stream=True
    )
    
    # Track current tool call and accumulate arguments
    current_tool_calls = {}
    
    # In a notebook, we'd display this differently than in a script
    for chunk in stream:
        if not chunk.choices or len(chunk.choices) == 0:
            continue
            
        delta = chunk.choices[0].delta
        
        # Handle regular content
        if hasattr(delta, 'content') and delta.content:
            print(delta.content, end="")
            
        # Handle tool calls (checked independently, in case a chunk carries both)
        if hasattr(delta, 'tool_calls') and delta.tool_calls:
            for tool_call in delta.tool_calls:
                # Skip if no function data
                if not tool_call.function:
                    continue
                    
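                # Get or create entry for this tool call, keyed by its position:
                # in OpenAI-style streams the id is usually sent only with the
                # first fragment of a call, while every fragment has an index
                tool_id = tool_call.index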
                if tool_id not in current_tool_calls:
                    current_tool_calls[tool_id] = {
                        "name": "",
                        "arguments": ""
                    }
                
                # Update tool name if present
                if hasattr(tool_call.function, 'name') and tool_call.function.name:
                    if not current_tool_calls[tool_id]["name"]:
                        print(f"\nCalling tool: {tool_call.function.name}")
                    current_tool_calls[tool_id]["name"] = tool_call.function.name
                
                # Accumulate arguments if present
                if hasattr(tool_call.function, 'arguments') and tool_call.function.arguments:
                    current_tool_calls[tool_id]["arguments"] += tool_call.function.arguments
    
    # Print the final, complete arguments for each tool call
    for tool_id, tool_data in current_tool_calls.items():
        if tool_data["arguments"]:
            print(f"Arguments: {tool_data['arguments']}")
    
    print("\n")

# Test streaming with a simple query
stream_with_tools("Calculate: 17 * 43 + 125", calculator_tools)


Calling tool: calculator
Arguments: {"expression": "17 * 43 + 125"}
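
In the OpenAI streaming format that this client follows, each tool-call delta carries an `index`, while the `id` and function name normally appear only in the first fragment of a call. Whether kluster.ai splits a call across several chunks in the same way is an assumption here. The sketch below merges hand-written fragments by index and parses the arguments once the stream is complete; `accumulate_tool_calls` is a hypothetical helper.

```python
import json


def accumulate_tool_calls(fragments):
    """Merge streamed tool-call fragments into complete calls, keyed by index."""
    calls = {}
    for fragment in fragments:
        entry = calls.setdefault(
            fragment["index"], {"id": None, "name": "", "arguments": ""}
        )
        if fragment.get("id"):
            entry["id"] = fragment["id"]
        function = fragment.get("function") or {}
        if function.get("name"):
            entry["name"] = function["name"]
        if function.get("arguments"):
            # Argument text arrives in pieces and is only valid JSON at the end
            entry["arguments"] += function["arguments"]
    return [calls[index] for index in sorted(calls)]


# Hand-written fragments imitating one call split across three chunks
simulated_fragments = [
    {"index": 0, "id": "call-1", "function": {"name": "calculator", "arguments": ""}},
    {"index": 0, "function": {"arguments": "{\"expression\": "}},
    {"index": 0, "function": {"arguments": "\"17 * 43 + 125\"}"}},
]

for call in accumulate_tool_calls(simulated_fragments):
    arguments = json.loads(call["arguments"])  # parse only after the stream ends
    print(call["name"], arguments)
```

Once the arguments are parsed, the call can be executed and returned to the model in a `tool` message, exactly as in the non-streaming examples.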




## Conclusion

You’ve now seen kluster.ai’s tool-calling API end to end, from authentication through multi-tool orchestration and streaming. This notebook covered:

1. Basic setup and authentication
2. Single tool calling (calculator)
3. Web search tool usage
4. Multiple tool combinations
5. Real-world document analysis use case
6. Streaming tool calls

You can extend this pattern to use other tools by defining their schemas and implementing the corresponding execution functions. Kluster's OpenAI-compatible API makes it straightforward to integrate with existing codebases.

For production use, remember to:
- Store API keys securely
- Implement proper error handling
- Use more sophisticated tool execution methods
- Consider rate limits and costs
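
As one example of the error handling mentioned above, the argument string in a tool call can be validated before anything is executed. This is a minimal sketch with a hypothetical helper, `parse_tool_arguments`; it checks only that the payload is a JSON object containing the required keys, not that the values satisfy the full schema.

```python
import json


def parse_tool_arguments(raw_arguments, required):
    """Decode a tool call's argument string and check that required keys exist.

    Returns (arguments, None) on success or (None, error_dict) on failure.
    """
    try:
        arguments = json.loads(raw_arguments)
    except (json.JSONDecodeError, TypeError) as exc:
        return None, {"error": f"Arguments are not valid JSON: {exc}"}
    if not isinstance(arguments, dict):
        return None, {"error": "Arguments must be a JSON object."}
    missing = [key for key in required if key not in arguments]
    if missing:
        return None, {"error": f"Missing required arguments: {', '.join(missing)}"}
    return arguments, None


arguments, error = parse_tool_arguments('{"expression": "1337 * 42"}', ["expression"])
print(arguments, error)
```

Returning the error as a dictionary means it can be sent back in the `tool` message, giving the model a chance to correct its call.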