# Multi-Agent RAG System

In this example, we will create a **multi-agent RAG system**, a system where multiple agents work together to retrieve and generate information, combining the strengths of **retrieval-based systems** and **generative models**.

## Multi-agent RAG system

A **Multi-agent Retrieval-Augmented Generation (RAG)** system consists of multiple agents that collaborate to perform complex tasks. The retrieval agent retrieves documents or information, while the generative agent synthesizes that information to generate meaningful outputs. A manager agent orchestrates the system and selects the most appropriate agent for the task based on the user input.

## Setup

In [None]:
!pip install -qU smolagents markdownify duckduckgo-search spaces gradio-tools langchain langchain-community langchain-huggingface faiss-cpu datasets

## Create multi-agent RAG system

In our RAG system, we will have 3 agents managed by a central agent:
- **Web search agent** - will include the `DuckDuckGoSearchTool` and the `VisitWebpageTool`
- **Retriever agent** - will include two tools for retrieving information from two different knowledge bases
- **Image generation agent** - will include a prompt generator tool in addition to the image generation tool.

In addition to these agents, the **central/orchestrator agent** will also have access to the **code interpreter tool** to execute code.

We will use [`Qwen/Qwen2.5-72B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) as the LLM for each component, which will be accessed via the Inference API.

In [None]:
from smolagents import HfApiModel

model_id = 'Qwen/Qwen2.5-72B-Instruct'
model = HfApiModel(model_id)

### Web search agent

The **web search agent** will use the `DuckDuckGoSearchTool` to search the web and gather relevant information. This tool acts as a search engine, querying for results based on the specified keywords.

To make the search results actionable, we also need the agent to access the web pages retrieved by DuckDuckGo, which can be achieved by using the built-in `VisitWebpageTool`.

Since this agent combines several tools to perform more complex tasks, we will use the `ToolCallingAgent`. The `ToolCallingAgent` is well-suited for web search tasks because its JSON action formulation requires only simple arguments and works seamlessly in sequential chains of single actions. In contrast, the `CodeAgent` action formulation is better suited for scenarios involving numerous or parallel tool calls.
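
To make the distinction concrete, here is a minimal, library-free sketch of the two action styles. It does not use the smolagents API: the `TOOLS` registry, the `search` and `visit` stand-in functions, and `run_json_action` are hypothetical names chosen purely for illustration.

```python
import json

# Hypothetical stand-ins for real tools (these are NOT smolagents classes).
def search(query: str) -> str:
    return f"results for {query!r}"

def visit(url: str) -> str:
    return f"content of {url}"

TOOLS = {"search": search, "visit": visit}

def run_json_action(action_json: str) -> str:
    """Execute one JSON action: a single tool name plus simple arguments."""
    action = json.loads(action_json)
    tool = TOOLS[action["tool"]]
    return tool(**action["arguments"])

# JSON-style actions: one tool call per step, chained sequentially.
step_1 = run_json_action('{"tool": "search", "arguments": {"query": "smolagents"}}')
step_2 = run_json_action('{"tool": "visit", "arguments": {"url": "https://example.com"}}')

# Code-style action: a single snippet can make several calls and combine results.
# A real agent would run this in a restricted interpreter; plain exec() is used
# here only to keep the sketch short.
code_action = "pages = [visit(u) for u in ['https://a.example', 'https://b.example']]"
namespace = dict(TOOLS)
exec(code_action, namespace)
pages = namespace["pages"]
```

In this toy setup, each JSON action maps to exactly one call, whereas the code action fetches two pages in a single step.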

In [None]:
from smolagents import CodeAgent, ToolCallingAgent, ManagedAgent, DuckDuckGoSearchTool, VisitWebpageTool

web_agent = ToolCallingAgent(
    tools=[DuckDuckGoSearchTool(), VisitWebpageTool()],
    model=model
)

Once we have `web_agent` ready, we will wrap it as a `ManagedAgent` so the central agent can use it:

In [None]:
managed_web_agent = ManagedAgent(
    agent=web_agent,
    name='search_agent',
    description='Runs web searches for you. Give it your query as an argument.'
)

### Retriever agent

The **retriever agent** is responsible for gathering relevant information from different sources. It will utilize two tools that retrieve data from two separate knowledge bases.

#### HF docs retriever tool

For this retriever, we will use a dataset that contains a compilation of documentation pages for various `huggingface` packages, all stored as markdown files.

We will:
- Download the dataset.
- Embed the data: convert the documentation into embeddings and store them in a **FAISS vector store** for efficient similarity search.
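
As a rough intuition for what similarity search over embeddings does, the sketch below ranks a few documents by cosine similarity to a query vector. It is a brute-force toy in pure Python, not how FAISS is implemented; the three-dimensional vectors, `toy_index`, and `top_k` are made up for illustration, whereas a real embedding model produces much longer vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical values, not produced by a real model).
toy_index = {
    "how to push a model to the Hub": [0.9, 0.1, 0.0],
    "fine-tuning with LoRA adapters": [0.1, 0.9, 0.1],
    "building a Gradio demo": [0.0, 0.2, 0.9],
}

def top_k(query_vector, index, k=2):
    """Return the k texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:k]]

query_vector = [0.8, 0.2, 0.1]  # pretend this is the embedded user query
nearest = top_k(query_vector, toy_index, k=2)
```

A vector store serves the same purpose at scale: it keeps the document vectors in an index so that the nearest ones can be found without comparing the query against every entry by hand.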

In [9]:
import datasets

knowledge_base = datasets.load_dataset('m-ric/huggingface_doc', split='train')

In [10]:
knowledge_base

Dataset({
    features: ['text', 'source'],
    num_rows: 2647
})

In [None]:
from tqdm import tqdm
from transformers import AutoTokenizer
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores.utils import DistanceStrategy


source_docs = [
    Document(page_content=doc['text'], metadata={'source': doc['source'].split('/')[1]})
    for doc in knowledge_base
]

text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
    AutoTokenizer.from_pretrained('thenlper/gte-small'),
    chunk_size=200,
    chunk_overlap=20,
    add_start_index=True,
    strip_whitespace=True,
    separators=['\n\n', '\n', '.', ' ', '']
)
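
To illustrate what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window sketch. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which also splits recursively on the given separators; `chunk_tokens` is a hypothetical helper, and whitespace-separated words stand in for tokenizer tokens.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Slide a window of chunk_size tokens, stepping by chunk_size - chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks

tokens = "a b c d e f g h i j".split()
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=1)
# Each chunk shares its first token with the end of the previous chunk.
```

The overlap means that a sentence cut at a chunk boundary still appears with some of its context in the neighbouring chunk.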

In [None]:
# Split docs and keep only unique ones
print("Splitting documents...")
docs_processed = []
unique_texts = {}

for doc in tqdm(source_docs):
    new_docs = text_splitter.split_documents([doc])
    for new_doc in new_docs:
        if new_doc.page_content not in unique_texts:
            unique_texts[new_doc.page_content] = True
            docs_processed.append(new_doc)

print("Embedding documents...")
embedding_model = HuggingFaceEmbeddings(model_name='thenlper/gte-small')
huggingface_doc_vector_db = FAISS.from_documents(
    documents=docs_processed,
    embedding=embedding_model,
    distance_strategy=DistanceStrategy.COSINE
)

Now that we have the documentation embedded in FAISS, we can create the `RetrieverTool`. This tool will query the FAISS vector store to retrieve the most relevant documents based on the user query. This will allow the retriever agent to access and provide relevant documentation when queried.

In [14]:
from smolagents import Tool
from langchain_core.vectorstores import VectorStore

class RetrieverTool(Tool):
    name = 'retriever'
    description = "Using semantic similarity, retrieves some documents from the knowledge base that have the closest embeddings to the input query."
    inputs = {
        'query': {
            'type': 'string',
            'description': "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
        }
    }
    output_type = 'string'

    def __init__(self, vectordb: VectorStore, **kwargs):
        super().__init__(**kwargs)
        self.vectordb = vectordb

    def forward(self, query: str) -> str:
        assert isinstance(query, str), "Your search query must be a string"

        docs = self.vectordb.similarity_search(
            query,
            k=7
        )

        # Concatenate a header and the content for each retrieved document
        return "\nRetrieved documents:\n" + "".join(
            f"===== Document {i} =====\n" + doc.page_content
            for i, doc in enumerate(docs)
        )

In [None]:
huggingface_doc_retriever_tool = RetrieverTool(vectordb=huggingface_doc_vector_db)

#### PEFT issue retriever tool

For the second retriever, we will use the PEFT GitHub issues as the data source.

In [None]:
from google.colab import userdata

GITHUB_ACCESS_TOKEN = userdata.get('GITHUB_ACCESS_TOKEN')

In [None]:
from langchain_community.document_loaders import GitHubIssuesLoader

loader = GitHubIssuesLoader(
    repo='huggingface/peft',
    access_token=GITHUB_ACCESS_TOKEN,
    include_prs=False,
    state='all'
)
docs = loader.load()

In [None]:
splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=30
)
chunks_docs = splitter.split_documents(docs)

peft_issues_vector_db = FAISS.from_documents(
    chunks_docs,
    embedding=embedding_model
)

Now we create the second retriever tool using the same `RetrieverTool` class:

In [None]:
peft_issues_retriever_tool = RetrieverTool(vectordb=peft_issues_vector_db)

#### Build the retriever agent

Once we have created the two retriever tools, we can build the **retriever agent**. This agent will manage both tools and retrieve relevant information based on the user query. We will then wrap it as a `ManagedAgent` so that it can be passed to the central agent for coordination.

In [None]:
retriever_agent = ToolCallingAgent(
    tools=[huggingface_doc_retriever_tool, peft_issues_retriever_tool],
    model=model,
    max_steps=4,
    verbosity_level=2
)

In [None]:
managed_retriever_agent = ManagedAgent(
    agent=retriever_agent,
    name='retriever_agent',
    description="Retrieves documents from the knowledge base for you that are close to the input query. Give it your query as an argument. The knowledge base includes Hugging Face documentation and PEFT issues.",
)

### Image generation agent

The **image generation agent** will have two tools: one for refining the user query and another for generating the image based on the refined query. In this case, we will use the `CodeAgent` instead of a `ToolCallingAgent`, since the whole set of actions can be executed in one shot.

In [None]:
from smolagents import load_tool, CodeAgent

prompt_generator_tool = Tool.from_space(
    'sergiopaniego/Promptist',
    name='generator_tool',
    description="Optimizes user input into model-preferred prompts"
)

In [None]:
image_generation_tool = load_tool('m-ric/text-to-image', trust_remote_code=True)

image_generation_agent = CodeAgent(
    tools=[prompt_generator_tool, image_generation_tool],
    model=model
)

Again, we will use `ManagedAgent` to tell the central agent that it can manage this agent. In addition, we will include an `additional_prompting` parameter to ensure the agent returns the generated image instead of just a text description:

In [None]:
managed_image_generation_agent = ManagedAgent(
    agent=image_generation_agent,
    name='image_generation_agent',
    description="Generates images from text prompts. Give it your prompt as an argument.",
    additional_prompting="\n\nYour final answer MUST BE only the generated image location."
)

### Central agent manager

The **central agent manager** will coordinate tasks between the agents. It will
- **Receive user input** and decide which agent (Web search, Retriever, Image generation) handles it.
- **Delegate tasks** to the appropriate agent based on the user's query.
- **Collect and synthesize** results from the agents.
- **Return the final output** to the user.
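
The delegation loop described above can be sketched in a few lines of plain Python. This is only a conceptual toy: in the real system the LLM decides which agent to call, while here a hypothetical keyword router (`choose_agent`) and stub functions (`toy_search_agent`, `toy_retriever_agent`, `toy_image_generation_agent`) stand in for the model and the managed agents.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stubs standing in for the managed agents.
def toy_search_agent(request: str) -> str:
    return f"[web] {request}"

def toy_retriever_agent(request: str) -> str:
    return f"[docs] {request}"

def toy_image_generation_agent(request: str) -> str:
    return f"[image] {request}"

MANAGED_AGENTS: Dict[str, Callable[[str], str]] = {
    "search_agent": toy_search_agent,
    "retriever_agent": toy_retriever_agent,
    "image_generation_agent": toy_image_generation_agent,
}

def choose_agent(task: str) -> str:
    """Toy keyword router; the real manager lets the LLM make this decision."""
    lowered = task.lower()
    if "image" in lowered:
        return "image_generation_agent"
    if "peft" in lowered or "hub" in lowered:
        return "retriever_agent"
    return "search_agent"

def run_manager(task: str) -> Tuple[str, str]:
    """Receive the task, delegate it to one agent, and return its result."""
    name = choose_agent(task)
    result = MANAGED_AGENTS[name](task)
    return name, result

chosen, answer = run_manager("How do you combine multiple adapters in peft?")
```

Each managed agent is addressed by its name, which is why the `name` and `description` given to every `ManagedAgent` matter: they are what the manager has to go on when choosing where to send a task.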

We will include all the agents we have developed as `managed_agents` and add any necessary imports for the code executor under `additional_authorized_imports`.
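
The idea behind an import allow-list can be illustrated with the standard `ast` module. The sketch below is an assumption about the general technique, not a description of how smolagents implements its code executor; `find_unauthorized_imports` is a hypothetical helper.

```python
import ast

def find_unauthorized_imports(source: str, authorized: set) -> list:
    """Return the imported modules in `source` whose top-level package is not allowed."""
    blocked = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]  # relative imports have no module name
        else:
            continue
        for module in modules:
            if module.split(".")[0] not in authorized:
                blocked.append(module)
    return blocked

authorized = {"time", "datetime", "PIL"}
snippet = "import time\nfrom PIL import Image\nimport os\n"
blocked = find_unauthorized_imports(snippet, authorized)
```

With this allow-list, generated code may import `time`, `datetime`, and `PIL`, while an attempt to import `os` would be flagged before anything runs.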

In [None]:
manager_agent = CodeAgent(
    tools=[],
    model=model,
    managed_agents=[
        managed_web_agent,
        managed_retriever_agent,
        managed_image_generation_agent
    ],
    additional_authorized_imports=['time', 'datetime', 'PIL']
)

Now that everything is set up, we can test the performance of the multi-agent RAG system.

#### Example trying to trigger the search agent

In [None]:
manager_agent.run("How many years ago was Stripe founded?")

#### Example trying to trigger the image generator agent

In [None]:
result = manager_agent.run(
    "Improve this prompt, then generate an image of it.",
    # Extra variables are passed to the agent through additional_args
    additional_args={'prompt': "A rabbit wearing a space suit"}
)

In [None]:
from IPython.display import Image, display

display(Image(filename=result))

#### Example trying to trigger the retriever agent for the HF docs knowledge base

In [None]:
manager_agent.run("How can I push a model to the Hub?")

#### Example trying to trigger the retriever agent for the PEFT issues knowledge base

In [None]:
manager_agent.run("How do you combine multiple adapters in peft?")