# LLM Tool Calling

An example of an LLM using tool calling: the model picks a function and infers the arguments to call it with.

See this [DeepLearning.AI](https://learn.deeplearning.ai/courses/building-agentic-rag-with-llamaindex/lesson/sgmbf/tool-calling) course for more details.

In [1]:
from dotenv import load_dotenv
load_dotenv()
import nest_asyncio
nest_asyncio.apply()
import llama_index.core
llama_index.core.__version__

'0.12.36'

The LlamaIndex `FunctionTool` class wraps the Python functions below so that they can be called by the LLM. The type annotations and docstrings are very important because they are used in the prompt that tells the LLM how to call each tool.

In [2]:
from llama_index.core.tools import FunctionTool

def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y

def mystery(x: int, y: int) -> int: 
    """Mystery function that operates on top of two numbers."""
    return (x + y) * (x + y)


add_tool = FunctionTool.from_defaults(fn=add)
mystery_tool = FunctionTool.from_defaults(fn=mystery)
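
To see why the annotations and docstrings matter, the following sketch uses only the standard library's `inspect` module to build the kind of name, parameter, and description summary that a tool wrapper could hand to an LLM. It illustrates the idea and is not LlamaIndex's actual implementation; `describe_tool` is a hypothetical helper.

```python
import inspect


def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y


def describe_tool(fn) -> dict:
    """Hypothetical helper: summarize a function the way a tool wrapper might."""
    signature = inspect.signature(fn)
    return {
        "name": fn.__name__,
        # Parameter names mapped to their annotated type names.
        "parameters": {
            name: param.annotation.__name__
            for name, param in signature.parameters.items()
        },
        # The docstring is the only natural-language hint about behavior.
        "description": inspect.getdoc(fn),
    }


description = describe_tool(add)
print(description)
```

With a vague docstring such as the one on `mystery`, the model has little more than the function name and parameter types to go on, so descriptive docstrings should make tool selection more reliable.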

Below, the LLM decides which tool to call and synthesizes a call to that tool from the query. Like a LlamaIndex router, it picks the tool, but it also decides which arguments to pass to it.

In [5]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="o4-mini")
response = llm.predict_and_call(
    [add_tool, mystery_tool], 
    "Tell me the output of the mystery function on 2 and 9", 
    verbose=True
)
print(str(response))

=== Calling Function ===
Calling function: mystery with args: {"x": 2, "y": 9}
=== Function Output ===
121
121
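
Conceptually, the step after the model responds is a dispatch: the model returns a tool name and a set of arguments, and the caller looks up the function and invokes it. The sketch below mimics that step in plain Python with a hard-coded tool call taken from the verbose output above. The `tool_call` dictionary, the `TOOLS` registry, and the `dispatch` helper are assumptions made for illustration, not the structures LlamaIndex uses internally.

```python
import json


def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y


def mystery(x: int, y: int) -> int:
    """Mystery function that operates on top of two numbers."""
    return (x + y) * (x + y)


# Registry of callable tools, keyed by name.
TOOLS = {"add": add, "mystery": mystery}


def dispatch(tool_call: dict) -> int:
    """Hypothetical helper: run the tool named in a model-produced call."""
    fn = TOOLS[tool_call["name"]]
    # Arguments arrive as a JSON string, so decode them before calling.
    kwargs = json.loads(tool_call["arguments"])
    return fn(**kwargs)


# Stand-in for what the model chose for the query above.
tool_call = {"name": "mystery", "arguments": '{"x": 2, "y": 9}'}
result = dispatch(tool_call)
print(result)  # (2 + 9) * (2 + 9) = 121
```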


The following section introduces metadata tags, which an agent can filter on to return a more precise result.

The code below assumes that the PDFs it references are already stored in the `data` directory, which was done in the `router_example.ipynb` router example.

In [7]:
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

# load documents
documents = SimpleDirectoryReader(input_files=["./data/file0.pdf"]).load_data()

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

Let's look at a node, with its metadata listed at the top.

In [8]:
print(nodes[0].get_content(metadata_mode="all"))

page_label: 1
file_name: file0.pdf
file_path: data\file0.pdf
file_type: application/pdf
file_size: 328049
creation_date: 2025-05-19
last_modified_date: 2025-05-19

arXiv:2505.10543v1  [cs.AI]  15 May 2025
Towards a Deeper Understanding of Reasoning
Capabilities in Large Language Models
Annie Wong ,*, Thomas Bäck, Aske Plaat, Niki van Stein and Anna V . Kononova
Leiden Institute of Advanced Computer Science
Abstract. While large language models demonstrate impressive
performance on static benchmarks, the true potential of large lan-
guage models as self-learning and reasoning agents in dynamic envi-
ronments remains unclear. This study systematically evaluates the ef-
ficacy of self-reflection, heuristic mutation, and planning as prompt-
ing techniques to test the adaptive capabilities of agents. We con-
duct experiments with various open-source language models in dy-
namic environments and find that larger models generally outperform
smaller ones, but that strategic prompting can close

Next, a vector index and a query engine (the RAG pipeline) are created over the nodes.

In [9]:
from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex(nodes)
query_engine = vector_index.as_query_engine(similarity_top_k=2)

Below is an example of using metadata filters when making requests to the query engine.

In [11]:
import textwrap 
from llama_index.core.vector_stores import MetadataFilters

query_engine = vector_index.as_query_engine(
    similarity_top_k=2,
    filters=MetadataFilters.from_dicts(
        [
            {"key": "page_label", "value": "7"}
        ]
    )
)

response = query_engine.query(
    "What is the conclusion of the paper?", 
)

wrapped_text = textwrap.fill(str(response), width=150, replace_whitespace=False)
print(wrapped_text)

The conclusion of the paper highlights the potential and limitations of advanced prompting strategies, such as self-reflection, heuristic mutation,
and planning. It emphasizes that excessive reasoning can hinder the performance of smaller models on simple tasks by causing distractions and leading
to overthinking, which can result in the model overlooking simpler and more effective solutions.


Let's check the nodes used as context for the LLM to synthesize the response.

In [12]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '7', 'file_name': 'file0.pdf', 'file_path': 'data\\file0.pdf', 'file_type': 'application/pdf', 'file_size': 328049, 'creation_date': '2025-05-19', 'last_modified_date': '2025-05-19'}
{'page_label': '7', 'file_name': 'file0.pdf', 'file_path': 'data\\file0.pdf', 'file_type': 'application/pdf', 'file_size': 328049, 'creation_date': '2025-05-19', 'last_modified_date': '2025-05-19'}
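
The effect of the filter can be pictured as an exact-match check on each node's metadata combined with the similarity ranking. The sketch below is a simplified stand-in, not the vector store's real code: the `nodes` list, its `score` values, and the `filter_then_rank` helper are all made up for illustration. Note that `page_label` is stored as a string, so the filter value is `"7"` rather than `7`.

```python
# Toy nodes: metadata plus a made-up similarity score for some query.
nodes = [
    {"metadata": {"page_label": "1"}, "score": 0.91},
    {"metadata": {"page_label": "7"}, "score": 0.62},
    {"metadata": {"page_label": "7"}, "score": 0.78},
    {"metadata": {"page_label": "3"}, "score": 0.85},
]


def filter_then_rank(nodes, key, value, top_k=2):
    """Hypothetical helper: keep exact metadata matches, then take the top_k by score."""
    matches = [n for n in nodes if n["metadata"].get(key) == value]
    return sorted(matches, key=lambda n: n["score"], reverse=True)[:top_k]


selected = filter_then_rank(nodes, "page_label", "7")
print([(n["metadata"]["page_label"], n["score"]) for n in selected])

# An integer 7 never equals the string "7", so nothing matches in this sketch.
print(filter_then_rank(nodes, "page_label", 7))
```

Even though the page 1 and page 3 nodes score higher in this toy data, they are excluded, which is why both source nodes above come from page 7.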


Below is an example of wrapping the filter above in a tool for general use.

In [15]:
from typing import List
from llama_index.core.vector_stores import FilterCondition


def vector_query(
    query: str, 
    page_numbers: List[str]
) -> str:
    """Perform a vector search over an index.
    
    query (str): the string query to be embedded.
    page_numbers (List[str]): Filter by set of pages. Leave BLANK if we want to perform a vector search
        over all pages. Otherwise, filter by the set of specified pages.
    
    """

    metadata_dicts = [
        {"key": "page_label", "value": p} for p in page_numbers
    ]
    
    query_engine = vector_index.as_query_engine(
        similarity_top_k=2,
        filters=MetadataFilters.from_dicts(
            metadata_dicts,
            condition=FilterCondition.OR
        )
    )
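    response = query_engine.query(query)
    # Return the response object itself (not just its text) so that the
    # source nodes remain available to the caller.
    return response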
    

vector_query_tool = FunctionTool.from_defaults(
    name="vector_tool",
    fn=vector_query
)
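
The `FilterCondition.OR` setting matters once more than one page is requested: a node has a single `page_label`, so requiring every filter to match would return nothing. The sketch below shows the difference in plain Python. The `matches` helper and the sample metadata are hypothetical, and treating an empty filter list as "match everything" is an assumption of this sketch that mirrors the intent stated in the docstring.

```python
def matches(metadata: dict, filters: list, condition: str = "or") -> bool:
    """Hypothetical helper: test metadata against exact-match filters."""
    if not filters:
        # Assumption of this sketch: no filters means every node is eligible.
        return True
    results = [metadata.get(f["key"]) == f["value"] for f in filters]
    return any(results) if condition == "or" else all(results)


page_numbers = ["1", "2"]
filters = [{"key": "page_label", "value": p} for p in page_numbers]

page_2_node = {"page_label": "2", "file_name": "file0.pdf"}
page_7_node = {"page_label": "7", "file_name": "file0.pdf"}

print(matches(page_2_node, filters, "or"))   # True: page 2 is one of the requested pages
print(matches(page_7_node, filters, "or"))   # False: page 7 was not requested
print(matches(page_2_node, filters, "and"))  # False: a node cannot be on page 1 and page 2
print(matches(page_7_node, [], "or"))        # True: an empty filter list excludes nothing
```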

In [20]:
llm = OpenAI(model="o4-mini", temperature=0)
response = llm.predict_and_call(
    [vector_query_tool], 
    "Provide a summary of pages 1 and 2.", 
    verbose=False
)
wrapped_text = textwrap.fill(str(response), width=150, replace_whitespace=False)
print(wrapped_text)

The study on pages 1 and 2 focuses on evaluating the capabilities of large language models (LLMs) in dynamic environments through various prompting
strategies. It compares different open-source LLMs on decision-making tasks using SMART PLAY as a benchmark. The study finds that carefully designed
prompts can help smaller models match or exceed the performance of larger models. Advanced reasoning techniques can improve performance but may also
introduce instability. Transforming sparse rewards into dense, task-aligned rewards enhances learning effectiveness. The study also notes limitations
in self-learning and emergent reasoning in tasks requiring planning and coordination.


Let's confirm that only pages 1 and 2 were used as context for the synthesis.

In [21]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '2', 'file_name': 'file0.pdf', 'file_path': 'data\\file0.pdf', 'file_type': 'application/pdf', 'file_size': 328049, 'creation_date': '2025-05-19', 'last_modified_date': '2025-05-19'}
{'page_label': '1', 'file_name': 'file0.pdf', 'file_path': 'data\\file0.pdf', 'file_type': 'application/pdf', 'file_size': 328049, 'creation_date': '2025-05-19', 'last_modified_date': '2025-05-19'}
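
The same check can be automated rather than read off by eye. The sketch below collects the page labels from a list of metadata dictionaries shaped like the output above and asserts that they fall within the requested pages. `pages_used` is a hypothetical helper, and with a live response the list would come from `[n.metadata for n in response.source_nodes]`.

```python
def pages_used(metadata_list):
    """Hypothetical helper: return the set of page labels found in node metadata."""
    return {m["page_label"] for m in metadata_list}


# Trimmed copies of the metadata printed above.
source_metadata = [
    {"page_label": "2", "file_name": "file0.pdf"},
    {"page_label": "1", "file_name": "file0.pdf"},
]

requested = {"1", "2"}
used = pages_used(source_metadata)

# Fails loudly if retrieval pulled in a page that was not requested.
assert used <= requested, f"unexpected pages: {used - requested}"
print(sorted(used))
```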
