# ToolRAG Agent with LangChain, Granite and watsonx.ai

ToolRAG is a method that helps AI systems become more powerful and useful by combining two key abilities:

- Retrieval-augmented generation ([RAG](https://research.ibm.com/blog/retrieval-augmented-generation-RAG)): The AI can look up relevant information from a database or document store before answering.
- Tool use: The AI can choose and use tools (like calculators, search engines, APIs, or code execution) to solve problems.

This approach is part of a larger series on [Agent Architectures](https://github.com/ibm-granite-community/granite-agent-cookbook/blob/main/building_agents.md), which explores how to take agents from prototype to production. ToolRAG is an architecture designed for agents that need to use a large set of tools.

Instead of just guessing or generating text, the AI can look things up and take action. Before the LLM reasons about a query, ToolRAG uses a vector database to semantically retrieve the most relevant tools, so only that filtered subset is passed to the model. This avoids overloading the LLM's context window with hundreds of tool definitions, which improves efficiency and performance for tool-using agents.
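The pre-filtering idea can be sketched without any external services. The example below is a toy illustration only: the `TOOL_DESCRIPTIONS` catalog, the word-count `embed` function, and `retrieve_tool_names` are hypothetical stand-ins for a real embedding model and vector database, which this notebook sets up in later steps.

```python
import math
import re
from collections import Counter

# Hypothetical tool catalog (name -> description), used only for illustration.
TOOL_DESCRIPTIONS = {
    "get_weather_forecast": "Provides the current weather forecast for a specified city.",
    "get_stock_price": "Fetches the current stock price for a given ticker symbol.",
    "convert_currency": "Converts a monetary amount between two currencies.",
    "check_developer_skill": "Checks the availability of a developer with a specific skill.",
}


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real system uses a neural model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_tool_names(query: str, k: int = 2) -> list[str]:
    """Rank every tool description against the query and keep the top k names."""
    query_vec = embed(query)
    ranked = sorted(
        TOOL_DESCRIPTIONS,
        key=lambda name: cosine(query_vec, embed(TOOL_DESCRIPTIONS[name])),
        reverse=True,
    )
    return ranked[:k]


selected = retrieve_tool_names("What is the weather forecast for Boston?", k=1)
print(selected)
```

Only the names returned by the retrieval step would be handed to the LLM, so the prompt size depends on `k` rather than on the size of the catalog.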

This notebook shows how to build a Retrieval-Augmented Generation (RAG) agent with a large, semantically searchable toolset, powered by LangChain’s agent framework, IBM’s Granite LLM, and watsonx embeddings. It covers credential setup, semantic indexing of tools, and agent orchestration for research and engineering workflows.

## Prerequisites
- Python 3.10+ environment (e.g., Jupyter, Colab, or watsonx.ai).
- IBM watsonx.ai credentials (API key, project ID) for Granite model access.

Let's build a scalable ToolRAG Agent!

# Steps

## Step 1. Set up your environment

While you can choose from several tools, this recipe is best suited for a Jupyter Notebook. Jupyter Notebooks are widely used within data science to combine code with various data sources such as text, images and data visualizations. 

You can run this notebook in [Colab](https://colab.research.google.com/), or download it to your system and [run the notebook locally](https://github.com/ibm-granite-community/granite-kitchen/blob/main/recipes/Getting_Started_with_Jupyter_Locally/Getting_Started_with_Jupyter_Locally.md). 

To avoid Python package dependency conflicts, we recommend setting up a [virtual environment](https://docs.python.org/3/library/venv.html).

Note: this notebook is compatible with Python 3.12 as well as Python 3.11, the default in Colab at the time of publishing this recipe. To check your Python version, run the `!python --version` command in a code cell.
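The same check can also be done from Python itself. This is a minimal sketch: `MIN_VERSION` and `python_is_supported` are illustrative names, and the minimum is an assumption taken from the Python 3.10+ prerequisite listed above.

```python
import sys

# Assumption: mirrors the "Python 3.10+" prerequisite stated in this recipe.
MIN_VERSION = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True when the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


print(f"Python {sys.version_info.major}.{sys.version_info.minor} supported: {python_is_supported()}")
```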


## Step 2. Set up a watsonx.ai instance

See [Getting Started with IBM watsonx](https://github.com/ibm-granite-community/granite-kitchen/blob/main/recipes/Getting_Started/Getting_Started_with_WatsonX.ipynb) for information on getting ready to use watsonx.ai. 

You will need three credentials from the watsonx.ai setup to add to your environment: `WATSONX_URL`, `WATSONX_APIKEY`, and `WATSONX_PROJECT_ID`.
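Before moving on, it can help to confirm that all three values are visible to the notebook process. The sketch below uses only the standard library; `REQUIRED_VARS` and `missing_credentials` are hypothetical helper names, not part of any watsonx.ai package.

```python
import os

REQUIRED_VARS = ("WATSONX_URL", "WATSONX_APIKEY", "WATSONX_PROJECT_ID")


def missing_credentials(env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing watsonx.ai settings:", ", ".join(missing))
else:
    print("All watsonx.ai settings are present.")
```

The helper only reports variable names and never prints their values, so it is safe to leave in a shared notebook.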

## Step 3. Install relevant libraries and set up credentials and the Granite model

We'll need a few libraries for this recipe: LangChain and LangGraph to drive Granite on watsonx.ai, and Chroma as the vector store for the tool index.

In [None]:
# Install core libraries
%pip install -qU langchain langchain-ibm langgraph langgraph-bigtool toolregistry

# Install RAG components (Vector Store and utilities)
%pip install -q chromadb langchain-chroma

# Install IBM specific utility for easy credentials load
%pip install -q ibm-watsonx-ai "git+https://github.com/ibm-granite-community/utils.git"

## Step 4. Authentication and model initialization

Next, we initialize the watsonx LLM (used for the agent's reasoning) and the watsonx embeddings model (used for Tool-RAG semantic search).

**Note:** Ensure your environment variables (`WATSONX_URL`, `WATSONX_APIKEY`, `WATSONX_PROJECT_ID`) are set before running this cell.

In [None]:
import os
import uuid
from getpass import getpass
from typing import List, Dict, Any
from langchain_ibm import WatsonxLLM, WatsonxEmbeddings
from langchain_core.tools import tool
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.messages import HumanMessage
from langchain_core.documents import Document
from langchain_core.tools import BaseTool

from langchain_chroma import Chroma
from toolregistry import ToolRegistry

from ibm_granite_community.notebook_utils import get_env_var
from langchain_core.utils.utils import convert_to_secret_str
from langchain.chat_models import init_chat_model
from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames

# --- Configuration ---
model = "ibm/granite-3-3-8b-instruct"

llm_params = {
    "temperature": 0,
    "max_completion_tokens": 200,
    "repetition_penalty": 1.05,
}

# --- 1. LLM Initialization (Agent's Brain) ---
llm = init_chat_model(
    model=model,
    model_provider="ibm",
    url=convert_to_secret_str(get_env_var("WATSONX_URL")),
    apikey=convert_to_secret_str(get_env_var("WATSONX_APIKEY")),
    project_id=get_env_var("WATSONX_PROJECT_ID"),
    params=llm_params,
)
print(f"LLM initialized: {model}")


# --- 2. Embeddings Initialization (Tool-RAG Indexer) ---
watsonx_embedding = WatsonxEmbeddings(
    model_id="ibm/granite-embedding-107m-multilingual",
    url=get_env_var("WATSONX_URL"),
    apikey=get_env_var("WATSONX_APIKEY"),
    project_id=get_env_var("WATSONX_PROJECT_ID"),
    params={
        # Truncate over-long inputs at 512 tokens so whole tool descriptions are embedded
        EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: 512
    }
)
print("Embeddings initialized: ibm/granite-embedding-107m-multilingual")
print("Setup Complete.")

We choose `ibm/granite-embedding-107m-multilingual` because it is an IBM Granite family model, ensuring seamless integration with other watsonx components.

Its multilingual capability is a benefit for real-world scenarios, and the '107m' size provides a good balance between speed and embedding quality for semantic retrieval of tool descriptions. We use ChromaDB as the underlying vector store for tool metadata.

## Step 5. Define small tools and data structure

The ToolRAG concept is specifically designed to address the scalability problem that arises when an agent has hundreds or thousands of tools. In classic "tool-calling" architectures, the definitions of all tools must be inserted directly into the LLM's prompt, which quickly hits the context window limit and degrades performance.

For this Proof of Concept (PoC), we use a small set of 5 tools for the following reasons:
- Isolation of ToolRAG mechanics: By keeping the total tool count small, we can easily verify that the core ToolRAG retrieval mechanism is working correctly. When a query is given, we can visually confirm that the vector store is correctly selecting the top K=3 most relevant tool definitions (e.g., finance tools for a financial query), even though the LLM could technically handle all 5 tools without RAG.
- Focus on selection logic, and avoid context overflow: This setup allows us to focus on the agent's ability to select a filtered subset of tools (`get_weather_forecast`, `get_stock_price`, `convert_currency`) from the small indexed pool, and then correctly chain their execution, without introducing the complexity of context window management or large-scale tool embedding.
- Simulating a scalable environment: In a production setting, this small set of 5 tools would be replaced by hundreds of tools (e.g., 50 financial, 50 IT, and 50 HR tools). The retrieval mechanism shown here is what scales to those environments.
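To make the context-window argument concrete, the sketch below compares the rough prompt cost of sending every tool definition with the cost of sending only the top 3. All numbers are assumptions for illustration: the 4-characters-per-token rule of thumb is not a real tokenizer, and the 150-tool catalog with 400-character definitions is hypothetical.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; a rule of thumb, not a tokenizer."""
    return max(1, round(len(text) / chars_per_token))


def tool_prompt_tokens(descriptions: list[str]) -> int:
    """Total estimated tokens needed to place these definitions in a prompt."""
    return sum(estimate_tokens(description) for description in descriptions)


# Hypothetical catalog: 150 tools whose definitions are about 400 characters each.
catalog = ["x" * 400 for _ in range(150)]

all_tools_cost = tool_prompt_tokens(catalog)    # every definition in the prompt
top_k_cost = tool_prompt_tokens(catalog[:3])    # only the 3 retrieved definitions

print(f"All tools: ~{all_tools_cost} tokens, top 3 only: ~{top_k_cost} tokens")
```

Under these assumptions the retrieved subset costs a fixed amount per query, while the all-tools prompt grows linearly with the catalog.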

In [None]:
@tool
def calculate_future_value(principal: float, rate: float, years: int) -> str:
    """Calculates the future value of an investment using compound interest."""
    future_value = principal * ((1 + rate) ** years)
    return f"The future value is ${future_value:.2f}."

@tool
def get_stock_price(ticker: str) -> str:
    """Fetches the current or historical stock price for a given ticker symbol (e.g., IBM)."""
    # Placeholder for a real API call
    if ticker == "IBM":
        return "The current price for IBM is $185.50."
    return f"Stock price for {ticker} not found in this mock-up."

@tool
def check_developer_skill(skill: str) -> str:
    """Checks the availability of an AI developer with a specific programming or ML skill."""
    if 'python' in skill.lower() or 'langchain' in skill.lower():
        return "Yes, we have multiple experienced developers with that skill set."
    return "Developer with that specific skill is currently limited."

@tool
def convert_currency(amount: float, from_currency: str, to_currency: str) -> str:
    """Converts a monetary amount between two specified currencies (e.g., USD to EUR)."""
    # Placeholder for a real currency conversion
    if from_currency == "USD" and to_currency == "EUR":
        converted = amount * 0.92
        return f"{amount} USD is approximately {converted:.2f} EUR."
    return f"Conversion from {from_currency} to {to_currency} is not supported."

@tool
def get_weather_forecast(city: str) -> str:
    """Provides the current weather forecast for a specified city."""
    if 'boston' in city.lower():
        return "Boston's current weather is partly cloudy with a temperature of 15°C."
    return f"Weather data for {city} is not available in this mock-up."

tools = [
    calculate_future_value, 
    get_stock_price, 
    check_developer_skill, 
    convert_currency, 
    get_weather_forecast
]
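The arithmetic inside `calculate_future_value` can be checked in isolation. The sketch below repeats the same compound-interest formula in a plain function (the name `future_value` is illustrative) and assumes, as the tool body implies, that `rate` is a decimal fraction per year (0.05 for 5%) compounded annually.

```python
def future_value(principal: float, rate: float, years: int) -> float:
    """Compound interest with annual compounding: principal * (1 + rate) ** years."""
    return principal * (1 + rate) ** years


# 1,000 invested at 5% per year for 10 years.
value = future_value(1000.0, 0.05, 10)
print(f"The future value is ${value:.2f}.")  # prints: The future value is $1628.89.
```

Because the tool's docstring is the only hint the LLM receives, stating the expected unit of `rate` there may reduce the chance of the model passing `5` instead of `0.05`.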

## Step 6. Create tool registry and index tools

We perform the following steps here:

- Build one document per tool from its description, with the tool name stored as metadata. Only these documents are indexed; the tool objects themselves stay in the `tools` list, so the registry that holds the tools is decoupled from the storage.
- Add the tool documents to the Chroma vector store.
- Create the retriever for the BigTool agent.

In [None]:
# Get the documents/metadata from the tools
tool_docs = []
for tool_obj in tools:
    if isinstance(tool_obj, BaseTool):
        tool_docs.append(
            Document(
                page_content=tool_obj.description,
                metadata={"tool_name": tool_obj.name}
            )
        )

# Add the tool documents to the vector store
vectorstore = Chroma.from_documents(
    documents=tool_docs,
    embedding=watsonx_embedding
)

# Create the retriever for Tool-RAG (used in graph node)
tool_retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3}  # Retrieve top 3 relevant tools
)


# ------------------------------------------

print(f"Total tools available: {len(tools)}")
print(f"Tools indexed in Chroma vectorstore: {len(tool_docs)}")
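The retriever above requests `search_type="mmr"` (maximal marginal relevance), which trades off relevance to the query against redundancy among the results already chosen. The sketch below shows the standard greedy formulation on hand-made 2-D vectors. It is an illustration of the idea, not the code that LangChain or Chroma runs; `mmr_select`, the example vectors, and the `lambda_mult=0.5` weighting are assumptions for this example.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5) -> list[int]:
    """Greedy MMR: repeatedly pick the document with the best
    relevance-minus-redundancy score. Returns indices into doc_vecs."""
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))

    def score(i: int) -> float:
        relevance = cosine(query_vec, doc_vecs[i])
        # Redundancy is the highest similarity to anything already selected.
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [
    [1.0, 0.10],   # 0: very relevant
    [1.0, 0.12],   # 1: near-duplicate of document 0
    [0.8, -0.60],  # 2: less relevant, but different from document 0
]

by_similarity = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)[:2]
by_mmr = mmr_select(query, docs, k=2)
print("similarity only:", by_similarity)
print("MMR:", by_mmr)
```

For a tool index this means that two tools with nearly identical descriptions are less likely to occupy two of the `k` slots, which leaves room for a different tool that the query may also need.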


## Step 7. Agent capabilities: tool selection and execution

In [None]:
def custom_retrieve_tools(query: str, limit: int = 3) -> list[str]:
    """Retrieves the names of the top N tools most semantically similar to the query.

    Returns tool IDs (tool names), which the agent looks up in the tool
    registry passed to create_agent before binding them to the LLM.
    """
    # Query the standalone vector store (no registry dependency)
    results = vectorstore.similarity_search_with_score(query, k=limit)

    # Keep only names that correspond to a tool in the global 'tools' list
    known_names = {t.name for t in tools}
    retrieved_names = [
        doc.metadata["tool_name"]
        for doc, _score in results
        if doc.metadata["tool_name"] in known_names
    ]

    print(f"\n[ToolRAG Retrieval]: Found {len(retrieved_names)} relevant tools based on query: {retrieved_names}")
    return retrieved_names


### Create the ToolRAG Agent

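The `create_agent` function from `langgraph_bigtool` builds the LangGraph state machine and handles the RAG-tool fusion: the agent starts with only a tool-retrieval tool, and the tools it retrieves are then bound to the model for the following steps.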

In [None]:
from langgraph_bigtool import create_agent
from langgraph.checkpoint.memory import InMemorySaver

# Registry mapping each tool name (its ID) to the tool object
tool_map = {t.name: t for t in tools}

# In-memory checkpointer that keeps per-thread conversation state
checkpointer = InMemorySaver()

agent_builder = create_agent(
    llm=llm,
    tool_registry=tool_map,
    retrieve_tools_function=custom_retrieve_tools
)

agent = agent_builder.compile(checkpointer=checkpointer)
print("ToolRAG Agent compiled successfully.")


### Testing the agent with a query requiring 3 tools or a subset of tools

In [None]:
import uuid
from langchain_core.messages import HumanMessage

thread_id = str(uuid.uuid4())
config = {
    "configurable": {
        "thread_id": thread_id
    }
}

user_query = "What's the weather like in Boston, and can you check the stock price for IBM, then convert 100 USD to EUR?"

print(f"--- Running Agent with Query: {user_query} ---")

# Run the agent, passing the new config
final_result = agent.invoke(
    {"messages": [HumanMessage(content=user_query)]},
    config=config
)

# Extract and print the final response
final_message = final_result["messages"][-1]

print("\n--- Final Agent Response ---")
print(final_message.content)
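To see which tools the agent actually called, and in what order, you can scan the returned messages for tool calls. The helper below is a sketch: `summarize_tool_calls` is a hypothetical name, and it assumes each AI message exposes a `tool_calls` list of dictionaries with a `"name"` key, as LangChain's `AIMessage` does. The demo messages are stand-ins so the snippet runs on its own; in the notebook you would pass `final_result["messages"]` instead.

```python
from types import SimpleNamespace


def summarize_tool_calls(messages) -> list[str]:
    """Return the names of all tool calls found in the messages, in order."""
    names = []
    for message in messages:
        # Messages without tool calls (human, tool, plain AI replies) are skipped.
        for call in getattr(message, "tool_calls", None) or []:
            names.append(call["name"])
    return names


# Stand-in messages for illustration only (hypothetical contents).
demo_messages = [
    SimpleNamespace(content="What's the weather like in Boston?"),
    SimpleNamespace(content="", tool_calls=[{"name": "retrieve_tools", "args": {"query": "weather"}, "id": "1"}]),
    SimpleNamespace(content="", tool_calls=[{"name": "get_weather_forecast", "args": {"city": "Boston"}, "id": "2"}]),
    SimpleNamespace(content="It is partly cloudy in Boston.", tool_calls=[]),
]

print(summarize_tool_calls(demo_messages))
```

A retrieval call followed by calls to the retrieved tools is the pattern to look for when verifying that the ToolRAG step ran before tool execution.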