# Level 3: Agentic RAG

This tutorial walks through executing queries with agentic RAG in Llama Stack. It shows how to initialize an agent with the RAG tool provided by Llama Stack and how to invoke it so that the agent retrieves from a vector DB only when a query calls for it. The tutorial also covers document ingestion using the RAG tool.
For a foundational (non-agentic) RAG tutorial, please refer to [Level1_foundational_RAG.ipynb](demos/rag_agentic/notebooks/Level1_foundational_RAG.ipynb).

## Overview

This tutorial covers the following steps:
1. Connecting to a Llama Stack server.
2. Indexing a collection of documents in a vector DB for later retrieval.
3. Initializing an agent capable of retrieving content from the vector DB via tool use.
4. Launching the agent and using it to answer user queries during the inference step.


## Prerequisites

Before starting, ensure you have a running instance of the Llama Stack server (local or remote) with at least one preconfigured vector DB. For more information, please refer to the corresponding [Llama Stack tutorials](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html).

## 1. Setting Up the Environment
- Import the necessary libraries.
- Define the settings for the RAG pipeline, including the Llama Stack server URL and the inference and document ingestion parameters.
- Initialize the connection to the server.

```python
import os
import uuid

from llama_stack_client import Agent, AgentEventLogger, RAGDocument, LlamaStackClient

# the server endpoint
LLAMA_STACK_SERVER_URL = "http://localhost:8321"

# inference settings
MODEL_ID = "ibm-granite/granite-3.2-8b-instruct"
SYSTEM_PROMPT = "You are a helpful assistant. "
TEMPERATURE = 0.0
TOP_P = 0.95

# RAG settings
VECTOR_DB_EMBEDDING_MODEL = "all-MiniLM-L6-v2"
VECTOR_DB_EMBEDDING_DIMENSION = 384
VECTOR_DB_CHUNK_SIZE = 512

# For this demo, we are using Milvus Lite, which is our preferred solution. Any other Vector DB supported by Llama Stack can be used.
VECTOR_DB_PROVIDER_ID = 'milvus'

# initialize the inference strategy
if TEMPERATURE > 0.0:
    strategy = {"type": "top_p", "temperature": TEMPERATURE, "top_p": TOP_P}
else:
    strategy = {"type": "greedy"}

# initialize the document collection to be used for RAG
vector_db_id = f"test_vector_db_{uuid.uuid4()}"

# initialize the server connection
client = LlamaStackClient(base_url=os.environ.get("LLAMA_STACK_ENDPOINT", LLAMA_STACK_SERVER_URL))
```
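
The strategy selection in the cell above can be wrapped in a small helper if it is needed in more than one place. The following is a minimal sketch: `build_sampling_strategy` is a hypothetical name introduced here for illustration, not part of the Llama Stack client, and it only reproduces the greedy versus top-p branching already shown.

```python
def build_sampling_strategy(temperature: float, top_p: float = 0.95) -> dict:
    """Return a sampling-strategy dict shaped like the one built in the setup cell.

    Hypothetical helper for illustration; not part of llama_stack_client.
    """
    if temperature < 0.0:
        raise ValueError("temperature must be non-negative")
    if temperature > 0.0:
        # sample from the top-p nucleus at the given temperature
        return {"type": "top_p", "temperature": temperature, "top_p": top_p}
    # temperature of zero: always pick the most likely token
    return {"type": "greedy"}


greedy = build_sampling_strategy(0.0)
sampled = build_sampling_strategy(0.7, top_p=0.9)
print(greedy)
print(sampled)
```

The returned dict could then be passed as the `"strategy"` entry of `sampling_params` in place of the module-level `strategy` variable.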

## 2. Indexing the Documents
- Initialize a new document collection in the target vector DB. All parameters related to the vector DB, such as the embedding model and dimension, must be specified here.
- Provide a list of document URLs to the RAG tool. Llama Stack will handle fetching, conversion and chunking of the documents' content.

```python
# define and register the document collection to be used
client.vector_dbs.register(
    vector_db_id=vector_db_id,
    embedding_model=VECTOR_DB_EMBEDDING_MODEL,
    embedding_dimension=VECTOR_DB_EMBEDDING_DIMENSION,
    provider_id=VECTOR_DB_PROVIDER_ID,
)

# ingest the documents into the newly created document collection
urls = [
    ("https://www.openshift.guide/openshift-guide-screen.pdf", "application/pdf"),
    ("https://www.cdflaborlaw.com/_images/content/2023_OCBJ_GC_Awards_Article.pdf", "application/pdf"),
]
documents = [
    RAGDocument(
        document_id=f"num-{i}",
        content=url,
        mime_type=url_type,
        metadata={},
    )
    for i, (url, url_type) in enumerate(urls)
]
client.tool_runtime.rag_tool.insert(
    documents=documents,
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=VECTOR_DB_CHUNK_SIZE,
)
```
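
When the URL list grows, spelling out each MIME type by hand becomes tedious. The sketch below infers it from the file extension using Python's standard `mimetypes` module and builds plain dicts carrying the same four fields passed to `RAGDocument` above. `describe_documents` is a hypothetical helper, and the `text/plain` fallback for unknown extensions is an assumption made for this example, not Llama Stack behavior.

```python
import mimetypes
from urllib.parse import urlparse


def describe_documents(urls: list) -> list:
    """Build one descriptor dict per URL (hypothetical helper for illustration)."""
    docs = []
    for i, url in enumerate(urls):
        # guess the MIME type from the extension of the URL path
        mime_type, _ = mimetypes.guess_type(urlparse(url).path)
        docs.append(
            {
                "document_id": f"num-{i}",
                "content": url,
                # assumption: fall back to plain text when the extension is unknown
                "mime_type": mime_type or "text/plain",
                "metadata": {},
            }
        )
    return docs


for doc in describe_documents(["https://www.openshift.guide/openshift-guide-screen.pdf"]):
    print(doc["document_id"], doc["mime_type"])
```

Extension-based guessing does not inspect the file itself, so explicitly listing the type, as the tutorial does, remains the safer choice when a URL has no meaningful extension.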

## 3. Executing Queries via the RAG-Aware Agent
- Initialize an agent with a list of tools including the built-in RAG tool. The RAG tool specification must include a list of document collection IDs to retrieve from.
- For each prompt, initialize a new agent session, execute a turn during which a retrieval call may be requested, and output the reply received from the agent.
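
The tool specification described in the first bullet is a plain dict, so it can be produced by a small function when several document collections are involved. This is a sketch under assumptions: `make_rag_tool` is a hypothetical helper, the collection IDs in the usage line are placeholders, and the dict shape simply mirrors the one used in this tutorial's agent definition.

```python
def make_rag_tool(vector_db_ids: list) -> dict:
    """Return a RAG tool spec in the shape used by this tutorial (hypothetical helper)."""
    if not vector_db_ids:
        # an empty list would leave the tool with nothing to retrieve from
        raise ValueError("at least one vector DB ID is required")
    return {
        "name": "builtin::rag/knowledge_search",
        "args": {"vector_db_ids": list(vector_db_ids)},
    }


# placeholder collection IDs, for illustration only
tool = make_rag_tool(["docs_a", "docs_b"])
print(tool)
```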

```python
queries = [
    "How to install OpenShift?",
    "Are employees based in California eligible for remote work?",
]

# initializing the agent
agent = Agent(
    client,
    model=MODEL_ID,
    instructions=SYSTEM_PROMPT,
    sampling_params={
        "strategy": strategy,
    },
    # we make our agent aware of the RAG tool by including builtin::rag/knowledge_search in the list of tools
    tools=[
        dict(
            name="builtin::rag/knowledge_search",
            args={
                "vector_db_ids": [vector_db_id],  # list of IDs of document collections to consider during retrieval
            },
        )
    ],
)

for prompt in queries:
    print(f"User> {prompt}")

    # create a new turn with a new session ID for each prompt
    response = agent.create_turn(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        session_id=agent.create_session(f"rag-session_{uuid.uuid4()}"),
    )

    # print the response, including tool calls output
    for log in AgentEventLogger().log(response):
        print(log.content, end='')
```

```text
User> How to install OpenShift?
{"type": "function", "name": "knowledge_search", "parameters": {"query": "installing OpenShift"}}Tool:knowledge_search Args:{'query': 'installing OpenShift'}Tool:knowledge_search Response:[TextContentItem(text='knowledge_search tool found 5 chunks:\nBEGIN of knowledge_search tool results.\n', type='text'), TextContentItem(text='Result 1:\nDocument_id:num-0\nContent:  We\nrecommend you to check the official Red Hat OpenShift Local documentation for an updated list of\nrequirements at the official documentation website.\n\uf05a\nRegarding Linux, even if Red Hat does not officially support them, OpenShift Local\ncan run on other distributions, such as Ubuntu or Debian, with minor caveats.\nRunning OpenShift Local on any Linux distribution requires a few additional\nsoftware packages to be installed through your default package manager. The\n15\ndocumentation at crc.dev/crc has more information about this subject.\n7.2. Hardware Requirements\nIn terms of har
```
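
In the run shown above, the model's tool call appears as a JSON object at the start of the logged text, immediately followed by the tool's arguments and response. The sketch below shows one way to pull that leading JSON object out of such a string with the standard `json` module. `extract_tool_call` is a hypothetical helper, and the sample string is copied from the beginning of the output above; the exact log format may differ across models and Llama Stack versions, so treat this as illustrative only.

```python
import json
from typing import Optional


def extract_tool_call(log_text: str) -> Optional[dict]:
    """Parse the first JSON object found in log_text, ignoring any trailing text."""
    start = log_text.find("{")
    if start == -1:
        return None
    try:
        # raw_decode stops at the end of the first complete JSON value
        call, _ = json.JSONDecoder().raw_decode(log_text, start)
    except json.JSONDecodeError:
        return None
    return call if isinstance(call, dict) else None


sample = (
    '{"type": "function", "name": "knowledge_search", '
    '"parameters": {"query": "installing OpenShift"}}'
    "Tool:knowledge_search Args:{'query': 'installing OpenShift'}"
)
call = extract_tool_call(sample)
print(call["name"], call["parameters"]["query"])
```

Inspecting the `query` parameter this way makes it easy to see how the agent rephrased the user's prompt before retrieval.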

## Key Takeaways
This tutorial demonstrates how to implement agentic RAG with Llama Stack: we initialize an agent with access to the RAG tool, then invoke the agent on each of the specified queries. Please check out our [complementary tutorial](demos/rag_agentic/notebooks/Level1_foundational_RAG.ipynb) for a non-agentic RAG example.