# Building a RAG Agent with CrewAI

We will build a RAG agent with `CrewAI` by integrating `RagTool` from `crewai_tools` with a `CrewAI` agent. `RagTool` creates and queries knowledge bases from various data sources, which gives the agent access to specialized context. We will give the RAG tool a PDF file containing details about the insurance coverage offered by a private health insurer, and build an insurer agent specialized in answering queries about health benefits. At the end, we will wrap this agent in an ACP server and have it interact with other ACP agents.

## 3.1. Import Libraries

In [2]:
# Import the CrewAI building blocks (crew, task, agent, LLM)
# and the RAG tool the agent will use
from crewai import Crew, Task, Agent, LLM
from crewai_tools import RagTool

In [3]:
# filter unnecessary warnings
import warnings
warnings.filterwarnings('ignore')

## 3.2. Define the Agent's Large Language Model

We'll now define the large language model that our CrewAI agent will use. The `max_tokens` parameter sets the maximum number of tokens the model can generate in a single response.

**Note**: To run this locally, we need to define the API key in a **.env** file as follows:
```
# Required
OPENAI_API_KEY=sk-...

# Optional
OPENAI_API_BASE=<custom-base-url>
OPENAI_ORGANIZATION=<your-org-id>
```
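Before constructing the `LLM`, it can help to confirm that the key is actually visible to the Python process. The sketch below uses only the standard library. The helper name `missing_env_vars` is hypothetical and not part of CrewAI, and it assumes the variables from the **.env** file have already been loaded into the environment (for example by the notebook environment or a tool such as `python-dotenv`).

```python
import os

# Variable names taken from the .env example; only the key is required.
REQUIRED_VARS = ("OPENAI_API_KEY",)


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Usage with an explicit mapping, so the result does not depend on the machine.
print(missing_env_vars(env={"OPENAI_API_KEY": "sk-test"}))
```

Calling `missing_env_vars()` with no arguments checks the real process environment. A non-empty result means the model call would most likely fail with an authentication error.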

In [4]:
# any other supported model can be used in place of OpenAI's GPT-4
llm = LLM(model="openai/gpt-4", max_tokens=1024)

## 3.3. Define the RAG Tool

For the RAG tool, we can define the model provider and the embedding model in a Python configuration dictionary. We can also define the details of our vector database. If we don't specify one, `RagTool` uses Chroma (ChromaDB) as the default vector database in local/in-memory mode.

In [5]:
config = {
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4",
        }
    },
    "embedding_model": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-ada-002"
        }
    }
}
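The `config` above leaves the vector database unspecified, so the default applies. As a hedged sketch of what an explicit entry might look like: the `vectordb` key and its `collection_name` and `dir` fields are assumed from the Embedchain-style configuration format, and the values `insurance-docs` and `./chroma_db` are hypothetical. Check the documentation for the installed `crewai_tools` version before relying on these key names. The helper `with_vectordb` is illustrative only and not part of any library.

```python
import copy

# Same settings as the notebook's config, repeated so this sketch is self-contained.
base_config = {
    "llm": {"provider": "openai", "config": {"model": "gpt-4"}},
    "embedding_model": {
        "provider": "openai",
        "config": {"model": "text-embedding-ada-002"},
    },
}


def with_vectordb(config, collection_name, directory):
    """Return a copy of config with an explicit Chroma section (assumed key names)."""
    new_config = copy.deepcopy(config)  # leave the original dictionary untouched
    new_config["vectordb"] = {
        "provider": "chroma",
        "config": {"collection_name": collection_name, "dir": directory},
    }
    return new_config


persistent_config = with_vectordb(base_config, "insurance-docs", "./chroma_db")
print(sorted(persistent_config))
```

Building the extended dictionary from a copy keeps the original `config` usable as the in-memory default.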

We can then pass the `config` to the `RagTool` and specify the data source from which the knowledge base will be built. When embedding our data, the `RagTool` splits the document into chunks and creates an embedding vector for each chunk. We can specify the chunk size (`chunk_size`, in characters) and the number of characters that overlap between consecutive chunks (`chunk_overlap`), or rely on the default behavior.

In [6]:
rag_tool = RagTool(config=config,
                   chunk_size=1200,
                   chunk_overlap=200,  # overlap so text at chunk boundaries is not cut off
                  )
# the document source can be swapped for another file
rag_tool.add("../data/gold-hospital-and-premium-extras.pdf", data_type="pdf_file")

Inserting batches in chromadb: 100%|██████████| 1/1 [00:00<00:00, 12.92it/s]
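To make `chunk_size` and `chunk_overlap` concrete, here is a simplified illustration of character-based chunking with overlap. It is not `RagTool`'s actual splitter, which may also respect separators such as paragraphs or sentences. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=1200, chunk_overlap=200):
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# A 3000-character sample made of repeating digits.
sample = "".join(str(i % 10) for i in range(3000))
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [1200, 1200, 1000]
print(chunks[0][-200:] == chunks[1][:200])  # True: consecutive chunks share 200 characters
```

With these settings each window advances by 1000 characters, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.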


In [7]:
# Inspect the RAG tool that the agent will use
rag_tool

RagTool(name='Knowledge base', description="Tool Name: Knowledge base\nTool Arguments: {'query': {'description': None, 'type': 'str'}}\nTool Description: A knowledge base that can be used to answer questions.", args_schema=<class 'abc.RagToolSchema'>, description_updated=False, cache_function=<function BaseTool.<lambda> at 0x7f61f8736de0>, result_as_answer=False, summarize=False, adapter=EmbedchainAdapter(embedchain_app=<embedchain.app.App object at 0x7f61e78cf050>, summarize=False), config={'llm': {'provider': 'openai', 'config': {'model': 'gpt-4'}}, 'embedding_model': {'provider': 'openai', 'config': {'model': 'text-embedding-ada-002'}}})

## 3.4. Define the Insurance Agent

Now that `rag_tool` is defined, we can define the CrewAI agent that assists with insurance coverage queries.

In [8]:
# the role can be changed to suit other use cases
insurance_agent = Agent(
    role="Senior Insurance Coverage Assistant",
    goal="Determine whether something is covered or not",
    backstory="You are an expert insurance agent designed to assist with coverage queries",
    verbose=True,  # print the agent's progress (thoughts, tool calls) while it runs
    allow_delegation=False,  # this agent does not hand tasks off to other agents
    llm=llm,
    tools=[rag_tool],  # give the agent access to the RAG tool
    max_retry_limit=5  # retry up to 5 times on errors before the run fails
)

## 3.5. Define the Agent Task

Let's now test the insurance agent. For that, we need to define the agent task and pass to it the query and the agent.

In [9]:
# we can change the description
task1 = Task(
        description='What is the waiting period for rehabilitation?',
        expected_output = "A comprehensive response as to the users question",
        agent=insurance_agent
)

## 3.6. Run the Insurance Agent

To run the agent, we pass the agent and the task to a `Crew` object, which we run using the `kickoff` method.

In [10]:
crew = Crew(agents=[insurance_agent], tasks=[task1], verbose=True)
task_output = crew.kickoff()  # starts the run, much like pressing Enter in ChatGPT
print(task_output)

# Agent: Senior Insurance Coverage Assistant
## Task: What is the waiting period for rehabilitation?

# Agent: Senior Insurance Coverage Assistant
## Thought: In order to respond to this query, I need to consult the knowledge base for details on the waiting period for rehabilitation under the client's specific insurance policy.
## Using tool: Knowledge base
## Tool Input:
"{\"query\": \"waiting period for rehabilitation\"}"
## Tool Output:
Relevant Content:
CLINICAL CATEGORIES WAITING PERIOD GOLD Rehabilitation 2 months 4 Hospital psychiatric services 2 months 4 Palliative care 2 months 4 Brain and nervous system 2 months 4 Eye (not cataracts) 2 months 4 Ear, nose and throat 2 months 4 Tonsils, adenoids and grommets 2 months 4 Bone, joint and muscle 2 months 4 Joint reconstructions 2 months 4 Kidney and bladder 2 months 4 Male reproductive system 2 months 4 Digestive system 2 months 4 Hernia and appendix 2 months 4 Gastrointestinal endoscopy 2 months 4 Gynaecology 2 months 4 Misca

# Agent: Senior Insurance Coverage Assistant
## Final Answer:
The waiting period for rehabilitation services under most clinical categories is 2 months of continuous cover. However, for any pre-existing conditions, the waiting period is extended to 12 months. It is worth noting that these durations apply provided that referrals from treating doctors are obtained where necessary. Please note that waiting periods may also apply to certain health programs that aim to help you recover in the comfort of your own home.

The waiting period for rehabilitation services under most clinical categories is 2 months of continuous cover. However, for any pre-existing conditions, the waiting period is extended to 12 months. It is worth noting that these durations apply provided that referrals from treating doctors are obtained where necessary. Please note that waiting periods may also apply to certain health programs that aim to help you recover in the comfort of your own home.

## 3.7. Resources

- [CrewAI Agents](https://docs.crewai.com/concepts/agents)
- [CrewAI Tasks](https://docs.crewai.com/concepts/tasks)
- [CrewAI RagTool](https://docs.crewai.com/tools/ai-ml/ragtool)
- [Short course on Multi Agents with CrewAI](https://www.deeplearning.ai/short-courses/multi-ai-agent-systems-with-crewai/)