### Run in Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1G4qV9OMj7E1XHc8F1qezJ6E0ird3K_7j#scrollTo=957LeMHjAdNY)

### Set up your API key

Before you can use the OpenAI API, you need an API key. If you don't have one yet, create one from your OpenAI account; see https://openai.com/blog/openai-api

In Colab, open the secrets manager (the "🔑" icon in the left panel), add your key under the name `OPENAI_API_KEY`, and turn on notebook access for it.
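If you also run this notebook outside Colab, a small helper can read the key from Colab secrets when they are available and fall back to the `OPENAI_API_KEY` environment variable otherwise. This is a sketch; `load_openai_key` is a hypothetical name, not part of Colab or the OpenAI SDK.

```python
import os


def load_openai_key(name: str = "OPENAI_API_KEY") -> str:
    """Hypothetical helper: return the key from Colab secrets if possible, else from the environment."""
    key = None
    try:
        from google.colab import userdata  # importable only inside Colab
        key = userdata.get(name)
    except Exception:
        # Not running in Colab, or the secret is missing / notebook access is not granted
        key = None
    if not key:
        key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} not found in Colab secrets or environment variables")
    return key
```

With this helper, `api_key = load_openai_key()` could stand in for the `userdata.get` call in the next cell.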


### Mount Google Drive in this Colab notebook
To keep your work after the Colab session ends, mount your Google Drive in this notebook. The cell below first reads your API key from the secrets manager and then mounts Drive; Colab will ask you to authorize access to your Google Drive account. Once authorized, Drive appears under `/content/drive` and you can save files directly to it.

In [None]:
# Read the API key from Colab's secrets manager (keeps it out of the notebook itself)
from google.colab import userdata
api_key = userdata.get('OPENAI_API_KEY')

# Mount Google Drive in the Colab virtual environment
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
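If the mount step is skipped, `/content/drive/MyDrive` does not exist, and creating folders under that path would produce an ordinary local directory whose files disappear when the session ends. A minimal sketch, assuming a local fallback folder is acceptable; `choose_output_folder` and its defaults are hypothetical:

```python
import os


def choose_output_folder(drive_root: str = "/content/drive/MyDrive",
                         subdir: str = "LLM_crewai_RAG",
                         local_root: str = ".") -> str:
    """Hypothetical helper: use Google Drive when it is mounted, otherwise a local folder."""
    root = drive_root if os.path.isdir(drive_root) else local_root
    folder = os.path.join(root, subdir)
    os.makedirs(folder, exist_ok=True)  # create the folder (and parents) if needed
    return folder
```

Calling `output_folder = choose_output_folder()` returns the folder path without a trailing slash, so build file paths with `os.path.join(output_folder, "EngineeringReference.pdf")`.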


### Environment Setup

In [None]:
!pip install crewai
!pip install openai

Collecting crewai
  Downloading crewai-0.86.0-py3-none-any.whl.metadata (19 kB)
Collecting appdirs>=1.4.4 (from crewai)
  Downloading appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Collecting auth0-python>=4.7.1 (from crewai)
  Downloading auth0_python-4.7.2-py3-none-any.whl.metadata (8.9 kB)
Collecting chromadb>=0.5.18 (from crewai)
  Downloading chromadb-0.5.23-py3-none-any.whl.metadata (6.8 kB)
Collecting crewai-tools>=0.17.0 (from crewai)
  Downloading crewai_tools-0.17.0-py3-none-any.whl.metadata (4.9 kB)
Collecting instructor>=1.3.3 (from crewai)
  Downloading instructor-1.7.0-py3-none-any.whl.metadata (17 kB)
Collecting json-repair>=0.25.2 (from crewai)
  Downloading json_repair-0.31.0-py3-none-any.whl.metadata (11 kB)
Collecting jsonref>=1.1.0 (from crewai)
  Downloading jsonref-1.1.0-py3-none-any.whl.metadata (2.7 kB)
Collecting litellm>=1.44.22 (from crewai)
  Downloading litellm-1.55.2-py3-none-any.whl.metadata (34 kB)
Collecting opentelemetry-exporter-otlp-proto-http>
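The log shows which releases were resolved in this run (for example `crewai-0.86.0` and `crewai_tools-0.17.0`). Because agent and tool behavior can change between releases, it may help to print the installed versions and, if a newer release breaks the notebook, pin them (e.g. `!pip install crewai==0.86.0 crewai-tools==0.17.0`). A minimal check using the standard library's `importlib.metadata`; `installed_versions` is a hypothetical helper name:

```python
from importlib import metadata
from typing import Dict, List, Optional


def installed_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is not installed."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions(["crewai", "crewai-tools", "openai"]))
```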

### Download Essential Data

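The cell below downloads the reference PDF into your Google Drive. If this is the first time you run the notebook, make sure `download_data` is set to `True`; on later runs the file is already in Drive, so you can set it to `False`.

In [None]:
import os

# Set to True on the first run; set to False once the PDF is in your Drive
download_data = True

# Folder in Google Drive where the downloaded PDF will be saved
output_folder = "/content/drive/MyDrive/LLM_crewai_RAG/"

# Create the output folder (and any missing parents) if it does not exist yet
os.makedirs(output_folder, exist_ok=True)

# Download the file from Google Drive with gdown (recent gdown versions take the file ID directly, without --id)
if download_data:
    !gdown 1-UaVrPof8mhpZFtt958Ns7zRu-01-E-E -O "/content/drive/MyDrive/LLM_crewai_RAG/EngineeringReference.pdf"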

Downloading...
From: https://drive.google.com/uc?id=1-UaVrPof8mhpZFtt958Ns7zRu-01-E-E
To: /content/drive/MyDrive/LLM_crewai_RAG/EngineeringReference_1.pdf
100% 23.8M/23.8M [00:00<00:00, 103MB/s] 
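Instead of toggling a flag by hand, the notebook could decide for itself whether the PDF still needs to be fetched. The sketch below uses a hypothetical `needs_download` helper that treats the file as missing when it does not exist, is empty, or does not start with the `%PDF-` signature (which could happen if an error page were saved instead). Note that the log above reports the file being written to `EngineeringReference_1.pdf`, while the tool later loads `EngineeringReference.pdf`; checking the exact path helps avoid indexing the wrong file, or none at all.

```python
import os


def needs_download(path: str) -> bool:
    """Hypothetical helper: True when `path` is missing, empty, or does not look like a PDF."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return True
    with open(path, "rb") as f:
        # PDF files normally begin with the bytes "%PDF-"
        return f.read(5) != b"%PDF-"


pdf_path = "/content/drive/MyDrive/LLM_crewai_RAG/EngineeringReference.pdf"
download_data = needs_download(pdf_path)
print("download_data =", download_data)
```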


In [None]:
import os
import warnings

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from IPython.display import Image

import openai
from crewai import Agent, Crew, Process, Task
from crewai_tools import PDFSearchTool, ScrapeWebsiteTool, SerperDevTool
# from dotenv import load_dotenv  # alternative outside Colab: load the key from a .env file
# load_dotenv()

# CrewAI and its tools read the key from this environment variable
os.environ["OPENAI_API_KEY"] = api_key

# OpenAI client for direct API calls
client = openai.OpenAI(api_key=api_key)

# Warning control: hide library warnings to keep the output readable
warnings.filterwarnings('ignore')

# Web search and scraping tools (SerperDevTool expects a SERPER_API_KEY environment variable)
search_tool = SerperDevTool()
scrape_tool = ScrapeWebsiteTool()
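The next cell's `PDFSearchTool` is the retrieval half of this RAG setup: it splits the PDF into chunks, scores them against the agent's query, and hands the best matches to the LLM. As far as we understand, it scores chunks with an embedding model (an OpenAI one by default), which is one reason the key is exported to `OPENAI_API_KEY`. The toy sketch below swaps embeddings for plain word-count vectors so it runs without any API; it illustrates the chunk-and-rank idea only and is not `PDFSearchTool`'s actual implementation. All function names and the chunk size and overlap values are assumptions.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def vectorize(text: str) -> Counter:
    """Bag-of-words vector: lowercase word -> count (a stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_chunks(query: str, text: str, k: int = 3) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the k best (score, chunk) pairs."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(chunk)), chunk) for chunk in chunk_text(text)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

With a real document you would pass the extracted PDF text as `text`; an embedding-based tool replaces `vectorize` and `cosine` with model embeddings and a vector store, but the overall flow is the same.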

In [None]:
# --- Tools ---
# RAG tool over the EngineeringReference.pdf downloaded to Google Drive in the step above
pdf_search_tool = PDFSearchTool(
    pdf="/content/drive/MyDrive/LLM_crewai_RAG/EngineeringReference.pdf"
)

# --- Agents ---
research_agent = Agent(
    role="Research Agent",
    goal="Search through the PDF to find relevant answers",
    allow_delegation=False,
    verbose=True,
    backstory=(
        """
        The research agent is adept at searching and
        extracting data from documents, ensuring accurate and prompt responses.
        """
    ),
    tools=[pdf_search_tool],

)

professional_writer_agent = Agent(
    role="Professional Re-Writer",
    goal="Rewrite the research agent's findings as a clear, professional answer that a reader who is not a building expert can understand",
    allow_delegation=False,
    verbose=True,
    backstory=(
        """
        The professional writer agent has excellent writing skills and crafts
        clear, concise responses based on the information provided.
        """
    ),
    tools=[],
)


# --- Tasks ---
answer_people_question_task = Task(
    description=(
        """
        Answer the people's questions based on the content in PDF.
        The research agent will search through the PDF to find the relevant answers.
        Your final answer MUST be clear and accurate, based on the content of the PDF.

        Here is the people's question:
        {people_question}
        """
    ),
    expected_output="""
        A clear and accurate answer to the question, based on
        the content of the PDF.
        """,
    tools=[pdf_search_tool],
    agent=research_agent,
)

professional_writer_task = Task(
    description=(
        """
        Rewrite the research agent's findings as an improved, professional answer
        that a reader who is not a building expert can understand, explaining any technical terms.
        """
    ),
    expected_output="""
        A clearly written, accurate answer based on the research agent's findings,
        understandable by a reader who is not a building expert.
        """,
    tools=[],
    agent=professional_writer_agent,
)

# --- Crew ---
crew = Crew(
    agents=[research_agent, professional_writer_agent],
    tasks=[answer_people_question_task, professional_writer_task],
    process=Process.sequential,
)

people_question = "What is Baffle Heat Balance?\n"

result = crew.kickoff(inputs={"people_question": people_question})
print(result)



[1m[95m# Agent:[00m [1m[92mResearch Agent[00m
[95m## Task:[00m [92m
        Answer the people's questions based on the content in PDF.
        The research agent will search through the PDF to find the relevant answers.
        Your final answer MUST be clear and accurate, based on the content of the PDF.

        Here is the people's question:
        What is Baffle Heat Balance?


I encountered an error while trying to use the tool. This was the error: Arguments validation failed: 1 validation error for FixedPDFSearchToolSchema
query
  Input should be a valid string [type=string_type, input_value={'description': 'Baffle H...Balance', 'type': 'str'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/string_type.
 Tool Search a PDF's content accepts these inputs: Tool Name: Search a PDF's content
Tool Arguments: {'query': {'description': "Mandatory query you want to use to search the PDF's content", 'type': 'str'}}
Tool 
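The error above comes from the LLM's tool call, not from the PDF: the tool's `query` argument must be a string, but the agent sent a dict shaped like the argument's schema entry (`{'description': ..., 'type': 'str'}`). CrewAI appears to return this message to the agent so it can retry with a corrected call; rerunning the cell or choosing a model with more reliable function calling may also help. The hypothetical helper below only illustrates the mismatch by showing what a tolerant parser would have to accept; it is not a CrewAI API and is not wired into `PDFSearchTool`.

```python
from typing import Any


def coerce_query(value: Any) -> str:
    """Hypothetical helper: extract a search string from a well- or badly-formed `query` argument."""
    if isinstance(value, str):
        return value  # the expected case: a plain search string
    if isinstance(value, dict):
        # A malformed call may pass the schema entry itself, e.g. {"description": "<text>", "type": "str"}
        for key in ("query", "description"):
            if isinstance(value.get(key), str):
                return value[key]
    raise TypeError(f"cannot interpret {value!r} as a query string")
```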