# RAG Agent

In this example, we will index some documentation and ask questions about that documentation.

The tool we use is the memory tool. Given a list of memory banks, the tool helps the agent query them and retrieve relevant chunks.
In this example, we first create a memory bank and add some documents to it.
Then we configure the agent to use the memory tool.

Unlike the websearch example, here we pass the memory bank to the tool as an argument.

A toolgroup can be provided to the agent either as a plain name or as a dict containing the name and the arguments the toolgroup needs. The agent injects these args into every tool call made for that toolgroup.
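
To make the two toolgroup forms concrete, here is a minimal sketch in plain Python. The `normalize_toolgroup` helper and the `"my-vector-db"` id are hypothetical, introduced only for illustration, and `"builtin::websearch"` is assumed as the plain-name form from the websearch example. This is not how the agent resolves toolgroups internally; it only shows the shape of each form.

```python
from typing import Any, Dict, Tuple, Union

# A toolgroup spec is either a plain name or a dict with "name" and "args".
ToolgroupSpec = Union[str, Dict[str, Any]]


def normalize_toolgroup(spec: ToolgroupSpec) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: return (name, args) for either toolgroup form."""
    if isinstance(spec, str):
        # Plain-name form: no extra arguments.
        return spec, {}
    # Dict form: the args accompany every call made for this toolgroup.
    return spec["name"], dict(spec.get("args", {}))


tools = [
    # Plain name (assumed toolgroup id from the websearch example).
    "builtin::websearch",
    # Dict form, as used for the RAG tool in this notebook;
    # "my-vector-db" is a placeholder id.
    {
        "name": "builtin::rag/knowledge_search",
        "args": {"vector_db_ids": ["my-vector-db"]},
    },
]

for spec in tools:
    name, args = normalize_toolgroup(spec)
    print(f"{name}: {args}")
```

Either form can appear in the same `tools` list, so an agent could mix tools that need no configuration with tools that do.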


## Setup

In [1]:
# Imports
import os

from llama_stack_client import LlamaStackClient

# Select model
model = "sambanova/Meta-Llama-3.3-70B-Instruct"

In [2]:
# Create HTTP client
client = LlamaStackClient(base_url=f"http://localhost:{os.environ['LLAMA_STACK_PORT']}")

In [3]:
import uuid
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from termcolor import cprint
from llama_stack_client.types import Document

urls = ["chat.rst", "llama3.rst", "memory_optimizations.rst", "lora_finetune.rst"]
documents = [
    Document(
        document_id=f"num-{i}",
        content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}",
        mime_type="text/plain",
        metadata={},
    )
    for i, url in enumerate(urls)
]

vector_db_id = f"test-vector-db-{uuid.uuid4().hex}"
client.vector_dbs.register(
    vector_db_id=vector_db_id,
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
)

client.tool_runtime.rag_tool.insert(
    documents=documents,
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=512,
)

rag_agent = Agent(
    client,
    model=model,
    instructions="You are a helpful assistant",
    tools=[
        {
            "name": "builtin::rag/knowledge_search",
            # These args are injected into every call to this tool.
            "args": {
                "vector_db_ids": [vector_db_id],
            },
        }
    ],
)

session_id = rag_agent.create_session("test-session")

user_prompts = [
    "What are the top 5 topics that were explained? Only list succinct bullet points.",
]
for prompt in user_prompts:
    cprint(f"User> {prompt}", "green")
    response = rag_agent.create_turn(
        messages=[{"role": "user", "content": prompt}],
        session_id=session_id,
    )
    for log in EventLogger().log(response):
        log.print()

User> What are the top 5 topics that were explained? Only list succinct bullet points.
inference> {"query":"top 5 explained topics"})
tool_execution> Tool:knowledge_search Args:{'query': 'top 5 explained topics'}
inference> * Tokenizing prompt templates & special tokens
* Fine-Tuning Llama3 with Chat Data
* Model Precision
* Evaluating fine-tuned Llama3-8B models with EleutherAI's Eval Harness
* Fine-tuning on a custom chat dataset