# Build an Agent with Tool-Calling Superpowers using Smolagents

In this notebook, we use [`smolagents`](https://huggingface.co/docs/smolagents/index) to build AI agents: systems powered by an LLM that, through careful prompting and output parsing, let the LLM use specific *tools* to solve problems.

These *tools* are functions that handle tasks the LLM cannot perform well on its own, for example a calculator or a web search.
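To make the idea concrete, here is a minimal sketch of tool calling in plain Python. It is not how `smolagents` works internally: the `TOOLS` registry, the `dispatch` helper, and the JSON call format are all hypothetical, and a hand-written string stands in for the LLM's output.

```python
import json
from typing import Callable, Dict


def multiply(a: float, b: float) -> float:
    """Toy tool: exact arithmetic, which LLMs often get wrong on their own."""
    return a * b


def word_count(text: str) -> int:
    """Toy tool: count whitespace-separated words."""
    return len(text.split())


# Hypothetical registry mapping tool names to plain Python functions.
TOOLS: Dict[str, Callable] = {"multiply": multiply, "word_count": word_count}


def dispatch(llm_output: str):
    """Parse a JSON tool call and run the matching function."""
    call = json.loads(llm_output)
    tool = TOOLS[call["tool"]]
    return tool(**call["arguments"])


# Stand-in for what a model might emit after being prompted with the tool list.
fake_llm_output = '{"tool": "multiply", "arguments": {"a": 6, "b": 7}}'
print(dispatch(fake_llm_output))
```

An agent framework wraps this loop: it describes the tools in the prompt, parses the model's reply, runs the tool, and feeds the result back to the model.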

## Setup

In [None]:
!pip install -qU smolagents datasets langchain sentence-transformers faiss-cpu duckduckgo-search openai langchain-community

## Multimodal + Web-browsing assistant

For this use case, we will build an agent that browses the web and is able to generate images, so we need two tools:
- image generation: we will load a tool from the Hub that uses the HF Inference API (serverless) to generate images using Stable Diffusion.
- web search: we will use a built-in tool.

In [1]:
from smolagents import load_tool, CodeAgent, HfApiModel, DuckDuckGoSearchTool

# Import tool from Hub
image_generation_tool = load_tool(
    'm-ric/text-to-image',
    trust_remote_code=True
)
search_tool = DuckDuckGoSearchTool()

In [2]:
# Initialize a model
model = HfApiModel('Qwen/Qwen2.5-72B-Instruct')
# Initialize the agent with both tools
agent = CodeAgent(
    model=model,
    tools=[image_generation_tool, search_tool]
)

In [None]:
# test it
result = agent.run(
    'Generate me a photo of the car that James Bond drove in the latest movie.'
)
result

## RAG with iterative query refinement & source selection

RAG (retrieval-augmented generation) has several advantages over using a vanilla or fine-tuned LLM: it grounds answers in retrieved facts and reduces hallucinations, it provides the LLM with domain-specific knowledge, and it allows fine-grained control over access to information in the knowledge base.

Depending on the user query, we may want to restrict the search to specific subsets of the knowledge base or adjust the number of documents retrieved. **How can we control and adjust the scope of the retrieval?**

A frequent failure case of RAG is when retrieval based on the user query does not return any relevant supporting documents. **How can we retry by calling the retriever again with a modified query when the previous results were not relevant?**

These are the cases where we need an agent to control the retriever's parameters.
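The retry behaviour described above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: `toy_retrieve` is a hypothetical keyword-overlap retriever over a three-item `DOCS` list, and the candidate queries are hard-coded, whereas in the agent setup the LLM would write each reformulation itself.

```python
from typing import List, Optional, Tuple

# Hypothetical miniature knowledge base.
DOCS = [
    "LoRA fine-tuning script using the peft library",
    "How to create an inference endpoint",
    "Gradio blocks tutorial",
]


def toy_retrieve(query: str, k: int = 2) -> List[str]:
    """Rank documents by shared lowercase words; drop documents with no overlap."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in DOCS]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def retrieve_with_refinement(
    queries: List[str], k: int = 2
) -> Tuple[Optional[str], List[str]]:
    """Try each candidate query in order until one returns documents."""
    for query in queries:
        docs = toy_retrieve(query, k)
        if docs:
            return query, docs
    return None, []


used_query, docs = retrieve_with_refinement(
    ["parameter-efficient adaptation", "lora fine-tuning script"]
)
print(used_query, docs)
```

The first query shares no words with any document, so the loop falls through to the second one, which matches the LoRA entry.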

To do this, we first need to load a knowledge base on which we want to perform RAG:

In [None]:
import datasets

knowledge_base = datasets.load_dataset('m-ric/huggingface_doc', split='train')

In [5]:
knowledge_base

Dataset({
    features: ['text', 'source'],
    num_rows: 2647
})

In [6]:
knowledge_base[0]

{'text': ' Create an Endpoint\n\nAfter your first login, you will be directed to the [Endpoint creation page](https://ui.endpoints.huggingface.co/new). As an example, this guide will go through the steps to deploy [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) for text classification. \n\n## 1. Enter the Hugging Face Repository ID and your desired endpoint name:\n\n<img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/main/assets/1_repository.png" alt="select repository" />\n\n## 2. Select your Cloud Provider and region. Initially, only AWS will be available as a Cloud Provider with the `us-east-1` and `eu-west-1` regions. We will add Azure soon, and if you need to test Endpoints with other Cloud Providers or regions, please let us know.\n\n<img src="https://raw.githubusercontent.com/huggingface/hf-endpoints-documentation/main/assets/1_region.png" alt="select region" />\n\n## 3. Defi

Now we prepare the knowledge base by processing the dataset and storing it into a vector database to be used by the retriever.

In [8]:
from langchain_core.documents import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

source_docs = [
    Document(page_content=doc['text'], metadata={'source': doc['source'].split('/')})
    for doc in knowledge_base
]

docs_processed = RecursiveCharacterTextSplitter(chunk_size=500).split_documents(source_docs)[:1000]
docs_processed[0]

Document(metadata={'source': ['huggingface', 'hf-endpoints-documentation', 'blob', 'main', 'docs', 'source', 'guides', 'create_endpoint.mdx']}, page_content='Create an Endpoint\n\nAfter your first login, you will be directed to the [Endpoint creation page](https://ui.endpoints.huggingface.co/new). As an example, this guide will go through the steps to deploy [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) for text classification. \n\n## 1. Enter the Hugging Face Repository ID and your desired endpoint name:')

In [10]:
# Initialize an embedding model
embeddings = HuggingFaceEmbeddings(model_name='thenlper/gte-small')
# Initialize a vector database
vectordb = FAISS.from_documents(
    documents=docs_processed,
    embedding=embeddings
)
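For intuition about what the vector database does at query time, here is a minimal pure-Python sketch of nearest-neighbour search over embeddings. The vectors are made-up 2D values and `cosine_similarity` and `top_k` are hypothetical helpers. The real FAISS index works on the `gte-small` embeddings and may use a different metric, such as Euclidean distance.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], doc_vecs: List[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Made-up embeddings: documents 0 and 2 point roughly the same way as the query.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(top_k([1.0, 0.0], doc_vecs, k=2))
```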

Now we can build a RAG system that answers user queries based on it. We want our system to select only from the most relevant sources of information, depending on the query.

In [23]:
all_sources = []
for doc in docs_processed:
    all_sources.extend(doc.metadata["source"])
all_sources = list(set(all_sources))
print(all_sources)

['q-learning-recap.mdx', 'se-resnet.mdx', 'vision-text-dual-encoder.md', 'huggingface', 'units', 'source', 'run.ipynb', 'raw', 'ip_adapter.md', 'research_projects', 'metrics', 'peft', 'api', 'flair.md', 'examples', 'hf-endpoints-documentation', 'diffusers', 'evaluate', 'create_endpoint.mdx', 'unit6', 'gradio-app', 'quantization.mdx', 'tasks', 'training', 'examples_component', 'transformers', 'onnxruntime', 'hub', 'pytorch-image-models', 'chapter2', 'zero_shot_object_detection.md', 'datasets', 'mape', 'about_mapstyle_vs_iterable.mdx', 'worker', 'perf_train_tpu_tf.md', 'security-git-ssh.md', '01_getting-started', 'model_doc', 'audioldm.md', 'accordion', 'gradio', 'loaders', 'chapter5', 'config.md', 'additional-readings.mdx', 'poseval', 'map_airbnb', 'optimum', 'res2net.mdx', 'kandinsky3.md', 'layoutlmv3', '03_building-with-blocks', 'guides', 'unit4', 'main', '04c_character-based-tokenizers.md', 'README.md', 'pipelines', 'deep-rl-class', 'js', 'quiz.mdx', 'demo', 'cn', 'blog', 'README_ru.

Now we will build a `RetrieverTool` that our agent can leverage to retrieve information from the knowledge base. Since we need to add a `vectordb` as an attribute of the tool, we need to define it as a class.

In [38]:
import json
from smolagents import Tool
from langchain_core.vectorstores import VectorStore


class RetrieverTool(Tool):
    name = 'retriever'
    description = "Retrieves some documents from the knowledge base that have the closest embeddings to the input query."
    inputs = {
        'query': {
            'type': 'string',
            'description': "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
        },
        'source': {
            'type': 'string',
            'description': ""
        },
        'number_of_documents': {
            'type': 'string',
            'description': "The number of documents to retrieve. Stay under 10 to avoid drowning in docs",
        }
    }
    output_type = 'string'

    def __init__(self, vectordb: VectorStore, all_sources: list, **kwargs):
        super().__init__(**kwargs)
        self.vectordb = vectordb
        self.inputs['source']['description'] = (
            f"The source of the documents to search, as a str representation of a list. Possible values in the list are: {all_sources}. If this argument is not provided, all sources will be searched.".replace(
                "'", "`"
            )
        )

    def forward(self, query: str, source: str, number_of_documents: str) -> str:
        assert isinstance(query, str), "Your search query must be a string"
        number_of_documents = int(number_of_documents)

        if source:
            if isinstance(source, str) and '[' not in str(source):
                # if the source is not representing a list
                source = [source]
            source = json.loads(str(source).replace("'", '"'))

        docs = self.vectordb.similarity_search(
            query,
            filter=({'source': source} if source else None),
            k=number_of_documents
        )

        if len(docs) == 0:
            return "No documents found with this filtering. Try removing the source filter."

        return "Retrieved documents:\n\n" + "\n===Document===\n".join([doc.page_content for doc in docs])
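The `source` handling in `forward` is easy to misread, so here is the same logic isolated as a standalone function for experimentation. The name `normalize_source` is hypothetical; the body mirrors the branch in the class above, which turns either a bare source name or the string form of a list into a Python list.

```python
import json


def normalize_source(source):
    """Mirror of the source-handling branch in RetrieverTool.forward."""
    if not source:
        # Empty or missing source: no filter is applied.
        return None
    if isinstance(source, str) and "[" not in source:
        # A bare name such as "transformers" becomes a one-element list.
        source = [source]
    # Swap single quotes for double quotes so the string parses as JSON.
    return json.loads(str(source).replace("'", '"'))


print(normalize_source("transformers"))
print(normalize_source("['blog', 'peft']"))
print(normalize_source(""))
```

Because the conversion relies on replacing every single quote, a source name that itself contains an apostrophe would not parse correctly with this approach.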

Optionally, we can share our retriever tool to the Hub. To do so, the `RetrieverTool` class must first be saved in a standalone `retriever.py` file, which the cell below imports:

In [None]:
share_to_hub = True
if share_to_hub:
    from huggingface_hub import login
    from retriever import RetrieverTool

    login()

    tool = RetrieverTool(vectordb, all_sources)
    tool.push_to_hub(repo_id='m-ric/retriever-tool')

    # load the tool
    from smolagents import load_tool
    retriever_tool = load_tool('m-ric/retriever-tool', vectordb=vectordb, all_sources=all_sources)

### Run the agent

In [40]:
from smolagents import HfApiModel, ToolCallingAgent

model = HfApiModel('Qwen/Qwen2.5-72B-Instruct')

retriever_tool = RetrieverTool(vectordb=vectordb, all_sources=all_sources)

agent = ToolCallingAgent(
    model=model,
    tools=[retriever_tool],
)

In [None]:
agent_output = agent.run("Please show me a LoRA finetuning script")

print("Final output:")
print(agent_output)

## Debug Python code

Since the `CodeAgent` has a built-in Python code interpreter, we can use it to debug a faulty Python script.

In [None]:
from smolagents import CodeAgent

agent = CodeAgent(
    model=model,
    tools=[]
)

In [None]:
code = """
numbers=[0, 1, 2]

for i in range(4):
    print(numbers(i))
"""

final_answer = agent.run(
    "I have some code that creates a bug: please debug it, then run it to make sure it works and return the final code",
    additional_args=dict(code=code),
)
print(final_answer)
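For reference, the snippet contains two bugs: `numbers(i)` tries to call the list, which raises `TypeError: 'list' object is not callable`, and `range(4)` would run past the end of a three-element list. The agent's exact answer may differ from run to run, but one plausible corrected version looks like this:

```python
numbers = [0, 1, 2]

# Iterate over the list's real length instead of a hard-coded 4,
# and index with square brackets rather than calling the list.
for i in range(len(numbers)):
    print(numbers[i])
```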