## Set up

In [122]:
from google.colab import drive
#drive.flush_and_unmount()
drive.mount('/content/drive/')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


Installing the necessary libraries

In [123]:
!pip install -qqq crewai_tools
!pip install -qqq crewai
!pip install -qqq openai

Loading the API keys

In [124]:
import os
from google.colab import userdata

# os.environ["OPENAI_API_KEY"] = userdata.get('openai-key')
os.environ["GROQ_API_KEY"] = userdata.get('groq-key')
# os.environ["PINECONE_API_KEY"] = userdata.get('pinecone-key')
# Agent will automatically use the model defined in the environment variable
# os.environ["OPENAI_MODEL_NAME"] = 'gpt-4'
os.environ["GROQ_MODEL_NAME"] = 'llama3-70b-8192'

In [125]:
from crewai import Agent, Crew, Process, Task
from crewai_tools import PDFSearchTool
from dotenv import load_dotenv

load_dotenv()

False

## Creating Tools
I used the PDFSearchTool provided by crewAI, which implements RAG: it embeds the PDF with OpenAI's embeddings and stores the vectors in ChromaDB. crewAI handles all of this internally.

Passing the file path of each PDF is all that is needed here.
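
To make the retrieval idea concrete, here is a toy sketch of the "retrieve" step in RAG. It is not what PDFSearchTool does internally: word-count vectors stand in for OpenAI embeddings, a plain Python list stands in for ChromaDB, and the chunk texts and the names `embed`, `cosine` and `retrieve` are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, top_k=1):
    """Return the top_k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]


# Hypothetical chunks, invented for this example (not taken from the PDFs).
chunks = [
    "The storm caused flooding and wind damage along the coast.",
    "The first modern games were organised as an international athletics event.",
    "Greenhouse gases trap heat in the atmosphere.",
]

print(retrieve("What damage did the storm cause?", chunks))
```

In a real RAG pipeline the retrieved chunk is then placed in the LLM prompt as context; the tool takes care of chunking, embedding and storage so that only the file path has to be supplied.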

In [126]:
import os

print(os.listdir('/content'))


['.config', '.ipynb_checkpoints', 'Post-Event-Report-Beryl-July-2024.pdf', 'drive', 'db', 'fastfacts-what-is-climate-change.pdf', 'history_of_olympics.pdf', 'sample_data']


In [127]:
pdf_search_tool1 = PDFSearchTool(
    pdf='/content/history_of_olympics.pdf',
)

pdf_search_tool2 = PDFSearchTool(
    pdf='/content/Post-Event-Report-Beryl-July-2024.pdf',
)

pdf_search_tool3 = PDFSearchTool(
    pdf='/content/fastfacts-what-is-climate-change.pdf',
)


## User Query
The query is defined once here so that the same text can be interpolated into the manager's backstory and each task description below.

In [128]:
# user_query = """
#         When was the first Modern Olympics Games held?
#         """


user_query = """
        What was the total economic damage of Hurricane Beryl?
        """

# user_query = """
#         Is climate change a serious problem? Provide 3 bullet points to tackle it.
#         ONLY provide the information from the pdfs provided!
#         """

# user_query ="What's the purpose of life? "

print(user_query)


        What was the total economic damage of Hurricane Beryl?
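
The printed query keeps the leading newline and indentation of the triple-quoted string, and that whitespace ends up inside every prompt the query is interpolated into. A minimal sketch of one way to tidy it first, assuming a helper named `normalize_query` (a hypothetical name, not part of the notebook or of crewAI):

```python
import textwrap


def normalize_query(raw_query):
    """Remove the common indentation and the surrounding blank lines of a triple-quoted query."""
    return textwrap.dedent(raw_query).strip()


raw_query = """
        What was the total economic damage of Hurricane Beryl?
        """

print(repr(normalize_query(raw_query)))
```

Multi-line queries keep their internal line breaks; only the shared indentation and the outer blank lines are dropped.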
        


## Creating Agents
There are 3 agents here, each of which reads one of the 3 PDFs and queries its vector embeddings through the 'PDFSearchTool'. The tool isn't passed to the agents here; it is passed when defining each task, which resulted in better responses.

In [129]:
pdf_reading_Agent1 = Agent(
    role="pdf_reading_Agent1",
    goal="Search through the PDF to find relevant answers to the query",
    verbose=True,
    memory = True,
    backstory=(
        """
        You are adept at searching and extracting data ONLY from the first pdf
        document provided, ensuring accurate and prompt responses.
        """
    ),
    # tools=[pdf_search_tool1],
    allow_delegation=False
)

pdf_reading_Agent2 = Agent(
    role="pdf_reading_Agent2",
    goal="Search through the PDF to find relevant answers to the query",
    verbose=True,
    memory = True,
    backstory=(
        """
        You are adept at searching and extracting data ONLY from the second pdf
        document provided, ensuring accurate and prompt responses.
        """
    ),
    # tools=[pdf_search_tool2],
    allow_delegation=False
)

pdf_reading_Agent3 = Agent(
    role="pdf_reading_Agent3",
    goal="Search through the PDF to find relevant answers to the query",
    verbose=True,
    memory = True,
    backstory=(
        """
        You are adept at searching and extracting data ONLY from the third pdf
        document provided, ensuring accurate and prompt responses.
        """
    ),
    # tools=[pdf_search_tool3],
    allow_delegation=False
)
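
Since the three agent definitions differ only in the agent number, one option is to generate their arguments in a loop, which keeps each role and backstory pointing at its own document. The sketch below only builds plain dictionaries so it runs without crewAI installed; `build_agent_kwargs` and `ORDINALS` are hypothetical names, and the mapping of agent 1 to the "first" PDF (and so on) is an assumption.

```python
ORDINALS = ["first", "second", "third"]


def build_agent_kwargs(index):
    """Return keyword arguments for the index-th (0-based) PDF-reading agent."""
    ordinal = ORDINALS[index]
    return {
        "role": f"pdf_reading_Agent{index + 1}",
        "goal": "Search through the PDF to find relevant answers to the query",
        "verbose": True,
        "memory": True,
        "backstory": (
            f"You are adept at searching and extracting data ONLY from the {ordinal} pdf "
            "document provided, ensuring accurate and prompt responses."
        ),
        "allow_delegation": False,
    }


agent_kwargs = [build_agent_kwargs(i) for i in range(3)]
for kwargs in agent_kwargs:
    print(kwargs["role"], "->", kwargs["backstory"])

# In the notebook, each dict could then be unpacked into the constructor:
# agents = [Agent(**kwargs) for kwargs in agent_kwargs]
```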

Create a manager agent for strategic task assignments

In [130]:
manager = Agent(
    role ="Project Manager",
    goal = "strategically & efficiently manage the tasks among the crew members & ensure high quality task completion",
    backstory=(
        f"""
        You are an experienced project manager skilled in overseeing complex projects
        and guiding teams to success.
        Your goal is to coordinate efforts of the crew members, to extract ONLY
        the relevant information from one of the 3 pdf documents provided
        to each of the other agents in response to the user's query.

        If you cannot find the information that the user is seeking
        in any of the three pdfs, you respond saying that the query isn't relevant.

        The User's query is:
        {user_query}
        """
    ),
    allow_delegation = True
)

## Creating Tasks
Each of these 3 tasks uses the 'PDFSearchTool' defined above to create embeddings of its PDF, which can then be queried to produce a response.

In [131]:
response_task1 = Task(
    description=(
        f"""
        Provide a response to user's query based on content of the provided PDF.
        The agent1 will search through the PDF to find the relevant answers.
        The final answer MUST be clear & accurate, based ONLY on the content of the PDF.

        The User's query is:
        {user_query}
        """
    ),
    expected_output="""
        Provide clear and accurate answers to the user's queries based on
        the content of the PDF.
        """,
    tools=[pdf_search_tool1],
    agent=pdf_reading_Agent1
)

response_task2 = Task(
    description=(
        f"""
        Provide a response to user's query based on content of the provided PDF.
        The agent2 will search through the PDF to find the relevant answers.
        The final answer MUST be clear & accurate, based ONLY on the content of the PDF.

        The User's query is:
        {user_query}
        """
    ),
    expected_output="""
        Provide clear and accurate answers to the user's queries based on
        the content of the PDF.
        """,
    tools=[pdf_search_tool2],
    agent=pdf_reading_Agent2
)

response_task3 = Task(
    description=(
        f"""
        Provide a response to user's query based on content of the provided PDF.
        The agent3 will search through the PDF to find the relevant answers.
        The final answer MUST be clear & accurate, based ONLY on the content of the PDF.

        The User's query is:
        {user_query}
        """
    ),
    expected_output="""
        Provide clear and accurate answers to the user's queries based on
        the content of the PDF.
        """,
    tools=[pdf_search_tool3],
    agent=pdf_reading_Agent3
)

## Creating Crew with a Custom Manager

When kicking off the crew, I got an error saying "Manager agent should not have tools", although I hadn't assigned any tools to the 'manager' agent.

Therefore, I have manually set the 'tools' attribute to an empty list here.

In [132]:
manager.tools=[]

In [133]:
# from langchain_openai import ChatOpenAI

# manager_llm = ChatOpenAI(model_name="gpt-3.5-turbo")

crew = Crew(
    agents=[pdf_reading_Agent1, pdf_reading_Agent2, pdf_reading_Agent3],
    tasks=[response_task1, response_task2, response_task3],
    manager_agent=manager,
    # manager_llm=manager_llm,
    verbose = True,
    process = Process.hierarchical
)



Let the work Beginnnn!!!

In [134]:
result = crew.kickoff()
print(result)

[1m[95m [2024-07-29 02:39:50][DEBUG]: == Working Agent: Project Manager[00m
[1m[95m [2024-07-29 02:39:50][INFO]: == Starting Task: 
        Provide a response to user's query based on content of the provided PDF.
        The agent1 will search through the PDF to find the relevant answers.
        The final answer MUST be clear & accurate, based ONLY on the content of the PDF.

        The User's query is:
        
        What was the total economic damage of Hurricane Beryl?
        
        [00m


[1m> Entering new CrewAgentExecutor chain...[0m
[32;1m[1;3m[0m[32;1m[1;3m[0m[32;1m[1;3m[0m[32;1m[1;3m[0m[32;1m[1;3m[0m[32;1m[1;3mIf you don't need to use any more tools, you must give your best complete final answer, make sure it satisfy the expect criteria, use the EXACT format below:

Thought: I now can give a great answer
Final Answer: my best complete final answer to the task.[0m

[1m> Finished chain.[0m
[95m 

my best complete final answer to the task.
[00

KeyboardInterrupt: 
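
In the log above, the text after "Final Answer:" is the literal template from the prompt, not an answer drawn from the PDF. A small guard like the sketch below could flag such output before it is displayed. `is_placeholder_answer` and `PLACEHOLDER_ANSWER` are hypothetical names, and the string is copied from the log above, not taken from any crewAI API.

```python
PLACEHOLDER_ANSWER = "my best complete final answer to the task."


def is_placeholder_answer(result):
    """Return True if the result is empty or just echoes the prompt's format template."""
    text = str(result).strip().lower()
    return not text or text == PLACEHOLDER_ANSWER


print(is_placeholder_answer("my best complete final answer to the task."))
print(is_placeholder_answer("The report gives the damage estimate in its summary section."))
```

Calling `str()` first means the check works whether `crew.kickoff()` returns a plain string or some other object with a string representation.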

Display the final result in **Bold** Text



In [None]:
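import html

from IPython.display import display, HTML

def display_result_in_bold(text):
    # Convert to str and escape it so characters such as < or & in the answer render literally
    safe_text = html.escape(str(text))
    display(HTML(f"<h1 style='font-size:24px; font-weight:bold;'>{safe_text}</h1>"))

# Example usage
# result = "This is the result you want to display in bold with a bigger font"
display_result_in_bold(result)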


# print(result)

In [None]:
# from IPython.display import Markdown
# Markdown(result)

## Observations

1. I used the PDFSearchTool provided by crewAI to create embeddings of the 3 PDFs and answer the query. It uses RAG underneath, along with OpenAI embeddings and ChromaDB.
This was a really simple way of doing the RAG search, although it took me quite a long time to figure it out, after watching some YouTube videos.

For most of the time, I was under the impression that we had to create the embeddings ourselves. I tried that, and even tried creating a custom crewAI tool, but it didn't work.

2. I stopped running the crew, as you can see above, because the balance on my OpenAI account was exhausted. Running this crew was extremely costly: I spent almost $15 without realizing it, at almost a dollar per kickoff.

This could be because crewAI creates the embeddings using OpenAI, which is not cheap.

3. I ran several iterations, comparing a manager agent with a manager LLM, tweaking the prompt and backstory, and asking queries about each of the PDFs. They all yielded great results, but each new run of the crew wiped out the previous output. Then I ran out of funds, so the results may not be exactly what you wanted.

However, you can see one of the results displayed in bold above.
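
Because each new run overwrote the previous output (observation 3), one way to keep earlier answers is to append every result to a file right after `crew.kickoff()` returns. The sketch below is a minimal version using only the standard library; `save_result`, `load_results` and the file name `crew_results.jsonl` are hypothetical, and the result strings in the demo are placeholders, not real crew output.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_result(query, result, path):
    """Append one query/result record to a JSON Lines file and return the record."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query.strip(),
        "result": str(result),
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def load_results(path):
    """Read back every saved record."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Demo with a temporary file; in Colab the path could point into the mounted Drive instead.
demo_path = Path(tempfile.mkdtemp()) / "crew_results.jsonl"
save_result("What was the total economic damage of Hurricane Beryl?", "placeholder result", demo_path)
save_result("When was the first Modern Olympics Games held?", "another placeholder", demo_path)
print(len(load_results(demo_path)))
```

Appending rather than overwriting means a failed or interrupted run leaves the earlier records intact, which also avoids paying to regenerate answers that were already obtained.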