### Building a simple RAG Application with CrewAI
---

CrewAI is a framework for orchestrating autonomous AI agents. It lets you build teams of agents, each with a specific role, tools, and goals, that work together to accomplish complex tasks.

**CrewAI has the following main components:**

1. `Crew`: The top-level organization that manages a team of AI agents. It breaks complex work down into tasks, delegates those tasks, and ensures that the agents collaborate until the work is complete.

1. `AI Agents`: Specialized entities such as a writer or a researcher. Each agent can make autonomous decisions and perform the tasks assigned to it.

1. `Process`: The workflow that defines how agents collaborate. It controls task assignment, agent interactions, and task execution.

1. `Tasks`: Well-defined, small units of work that an agent executes to produce an actionable result.

In [None]:
# Install CrewAI. For installation steps, follow the instructions here: https://docs.crewai.com/installation
!pip install 'crewai[tools]'

In [None]:
# Verify the CrewAI installation
!pip freeze | grep crewai

In [None]:
!uv add docling
!uv add crewai
!uv add langchain
!uv add requests

In [None]:
# Import the required libraries
import os
import json
import time
import uuid
import boto3
import typing
import logging
import requests
from crewai import LLM
from crewai import Task
from crewai import Agent
from crewai_tools import SerperDevTool
from typing import List, Dict, Optional, Any

In [None]:
# set a logger
logging.basicConfig(format='[%(asctime)s] p%(process)s {%(filename)s:%(lineno)d} %(levelname)s - %(message)s', level=logging.INFO)
logger = logging.getLogger(__name__)

In [None]:
session = boto3.session.Session()
sts_client = boto3.client('sts')
account_id = sts_client.get_caller_identity()["Account"]
region = session.region_name

logger.info(f"current region: {region}")

In [None]:
# define global variables that will be used across this notebook
BEDROCK_NOVA_LITE_MODEL: str = 'us.amazon.nova-lite-v1:0'
BEDROCK_CLAUDE_3_5_SONNET_V1_MODEL: str = 'anthropic.claude-3-5-sonnet-20240620-v1:0'
BEDROCK_CLAUDE_3_HAIKU: str = "us.anthropic.claude-3-haiku-20240307-v1:0"
BEDROCK_LLAMA3_1_70B_MODEL: str = "us.meta.llama3-1-70b-instruct-v1:0"
TITAN_TEXT_EMBED_V2: str = 'amazon.titan-embed-text-v2:0'
DATA_DIR: str = "data"
AWS_SERVICES_PDF_URL: str = "https://docs.aws.amazon.com/pdfs/whitepapers/latest/aws-overview/aws-overview.pdf"
PDF_FILE_NAME_LOCAL: str = "aws_overview.pdf"

### Store the knowledge for the agent
---

In this portion of the notebook, we will store the AWS overview PDF as a `PDFKnowledgeSource`. CrewAI knowledge sources support text content (PDFs, raw strings, text files) and structured data (CSV, JSON, Excel).

In this example, we will create a custom knowledge base to store information from the AWS service PDF file.
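RAG pipelines generally split a document into overlapping chunks before embedding them, so that a retrieved passage is small enough to fit in a prompt but still carries some surrounding context. The sketch below illustrates that idea in plain Python. The function name `chunk_text` and its default sizes are assumptions made for illustration; this is not how CrewAI implements chunking internally.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# Hypothetical sample text standing in for the extracted PDF content
sample = "AWS offers compute, storage, database, and networking services. " * 10
chunks = chunk_text(sample, chunk_size=200, overlap=50)
print(f"{len(chunks)} chunks; first chunk has {len(chunks[0])} characters")
```

A larger overlap reduces the chance that a relevant sentence is cut in half at a chunk boundary, at the cost of storing and embedding more text.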

In [None]:
# initialize an embedder
embedder = {
    "provider": "bedrock",
    "config": {
        "model": TITAN_TEXT_EMBED_V2
    },
}

In [None]:
import shutil
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource

# CrewAI resolves knowledge file paths relative to a local 'knowledge' directory
KNOWLEDGE_DIR = 'knowledge'
os.makedirs(DATA_DIR, exist_ok=True)
os.makedirs(KNOWLEDGE_DIR, exist_ok=True)
try:
    # Download the PDF; the timeout keeps the cell from hanging on a stalled connection
    response = requests.get(AWS_SERVICES_PDF_URL, timeout=60)
    response.raise_for_status()
    # Save to the current working directory
    with open(PDF_FILE_NAME_LOCAL, 'wb') as f:
        f.write(response.content)
    print(f"PDF successfully downloaded as {PDF_FILE_NAME_LOCAL}")

    # Copy the PDF into the knowledge directory
    shutil.copy(PDF_FILE_NAME_LOCAL, os.path.join(KNOWLEDGE_DIR, PDF_FILE_NAME_LOCAL))
    print("PDF copied to knowledge directory")
    pdf_source = PDFKnowledgeSource(file_paths=[PDF_FILE_NAME_LOCAL], embedder=embedder)
except Exception as e:
    print(f"An error occurred while downloading or loading the PDF: {e}")

### Create a CrewAI Agent
---

In this portion of the notebook, we will create agents using the `Agent` class. An agent can be defined in a [YAML file](https://docs.crewai.com/concepts/agents) or directly in [code](https://docs.crewai.com/concepts/agents). For this RAG example, we will define the agents in Python code with the `Agent` class.

In the example below, we will create two agents: a RAG agent that acts as an AWS solutions architect, and a code generation agent that writes code based on user requests. The RAG agent has access to the stored AWS data and can answer questions about that data and perform other simple tasks.
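To make the "retrieve, then answer" idea behind a RAG agent concrete, here is a toy retrieval step in plain Python. It ranks text chunks by cosine similarity over simple word counts and places the best match into a prompt. Word counts are a deliberate stand-in for the Titan embeddings used in this notebook, and the names `bag_of_words`, `cosine_similarity`, `retrieve`, and `toy_chunks` are hypothetical; treat this as a sketch of the concept, not of CrewAI's internals.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    query = bag_of_words(question)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query, bag_of_words(c)), reverse=True)
    return ranked[:top_k]


# Hypothetical chunks standing in for passages extracted from the PDF
toy_chunks = [
    "Amazon S3 is an object storage service.",
    "Amazon EC2 provides resizable compute capacity.",
    "Amazon RDS is a managed relational database service.",
]
question = "Which service offers object storage?"
context = retrieve(question, toy_chunks)

# The retrieved passage is placed in the prompt so the model answers from it
prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: {question}"
print(prompt)
```

A real embedding model captures meaning rather than exact word overlap, so it can match a question to a passage even when they share few words.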

In [None]:
# Basic configuration
llm = LLM(
    model=f"bedrock/{BEDROCK_CLAUDE_3_HAIKU}", 
    temperature=0.1,        # Higher for more creative outputs
    timeout=120,           # Seconds to wait for response
    max_tokens=256,       # Maximum length of response
    top_p=0.9,            # Nucleus sampling parameter
)

In [None]:
# First, we will create an agent that is an AWS solutions architect and assists users with questions about
# their journey on AWS cloud

# Create an agent with all available parameters
aws_agent = Agent(
    role="AWS Solutions Architect. All requests are routed to this agent about AWS",
    goal="Analyze the customer question and best assist them by answering and providing accurate answers about the AWS cloud. All requests are routed to this agent about AWS",
    backstory="With over 10 years of experience in solutions architecture and AWS cloud, "
              "you excel at supporting customers in their journeys on the cloud. You are highly technical and can "
              "answer customer questions with ease. If you don't know the answer to a question, you never guess; "
              "you always answer truthfully and accurately.",
    llm=llm,  
    function_calling_llm=None,  # Optional: Separate LLM for tool calling
    memory=True,  # Default: True
    verbose=False,  # Default: False
    allow_delegation=False,  # Default: False
    max_iter=20,  # Default: 20 iterations
    max_rpm=None,  # Optional: Rate limit for API calls
    max_execution_time=None,  # Optional: Maximum execution time in seconds
    max_retry_limit=2,  # Default: 2 retries on error
    respect_context_window=True,  # Default: True
    use_system_prompt=True,  # Default: True
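    # Pass the PDF knowledge source and the Bedrock embedder to the AWS agent
    # (parameter names as in recent CrewAI releases; check your installed version)
    knowledge_sources=[pdf_source],
    embedder=embedder,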
)


In [None]:
# Next, we will create a code generation agent that will generate code based on the user's input. 
dev_agent = Agent(
    role="Senior Python Developer. All coding related questions are routed to this agent",
    goal="Write and debug Python code",
    backstory="Expert Python developer with 10 years of experience",
    llm=llm,
    allow_code_execution=True,
    code_execution_mode="safe",  # Uses Docker for safety
    max_execution_time=300,  # 5-minute timeout
    max_retry_limit=3  # More retries for complex code tasks
)

### Create agentic tasks
---

Tasks provide all necessary details for execution, such as a description, the agent responsible, required tools, and more, which allows them to cover a wide range of complexity. In this portion of the notebook, we will create one task for each agent. Tasks within CrewAI can also be collaborative, requiring multiple agents to work together. This is managed through the task properties and orchestrated by the Crew’s process.

Tasks can be executed with either a `sequential` process (tasks run one after another in the order they are defined, and the output of one task serves as context for the next) or a `hierarchical` process (a manager assigns tasks to agents based on their roles and expertise).
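The difference between the two process types can be sketched with a toy model in plain Python. This is not the CrewAI API: each "agent" is just a function from text to text, and `run_sequential`, `run_hierarchical`, and `keyword_manager` are hypothetical names. In CrewAI the manager is an LLM, not a keyword check.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an agent: a function from text to text
ToyAgent = Callable[[str], str]


def run_sequential(agents: List[ToyAgent], request: str) -> str:
    """Run agents in list order; each one receives the previous agent's output."""
    result = request
    for agent in agents:
        result = agent(result)
    return result


def run_hierarchical(agents: Dict[str, ToyAgent], request: str,
                     route: Callable[[str], str]) -> str:
    """Let a manager (the route function) pick the single agent that handles the request."""
    chosen = route(request)
    return agents[chosen](request)


def researcher(text: str) -> str:
    return f"notes({text})"


def writer(text: str) -> str:
    return f"report({text})"


def keyword_manager(request: str) -> str:
    # Placeholder routing rule; a real manager would reason about roles and expertise
    return "writer" if "write" in request.lower() else "researcher"


print(run_sequential([researcher, writer], "S3 pricing"))
# prints: report(notes(S3 pricing))
print(run_hierarchical({"researcher": researcher, "writer": writer},
                       "Write a summary", keyword_manager))
# prints: report(Write a summary)
```

In the sequential case every agent contributes in a fixed order; in the hierarchical case the routing decision determines which agent does the work.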

In [None]:
aws_agent_task = Task(
    description="""
        Research the user's question about AWS: {user_question}
        Fetch all the relevant information from the data provided and write a
        comprehensive report.
    """,
    expected_output="""
        A detailed report based on the user question about AWS. 
        The report should include all the relevant information fetched from data provided.
    """,
    agent=aws_agent
)

development_task = Task(
    description="""
        The user's request is: {user_question}
        If the request is a coding problem, generate simple, executable
        Python code that solves it. Do not generate code otherwise.
    """,
    expected_output="""
        Python code that can be executed directly. The output should not
        contain any filler words, only code.
    """,
    agent=dev_agent,
    output_file=f"{DATA_DIR}/code.py"
)


In [None]:
print(f"Defined the AWS task: {aws_agent_task}")
print(f"Defined the development task: {development_task}")

In [None]:
from crewai import Agent, Crew, Process, Task

# Define the manager model that routes each request to the best-suited agent
manager_llm = LLM(model=f"bedrock/{BEDROCK_CLAUDE_3_5_SONNET_V1_MODEL}", 
                    temperature=0.1,        # Higher for more creative outputs
                    timeout=120,           # Seconds to wait for response
                    max_tokens=256,       # Maximum length of response
                    top_p=0.9,            # Nucleus sampling parameter
                )

embedder_config = {
    "provider": "bedrock",
    "config": {
        "model": TITAN_TEXT_EMBED_V2,
    },
}

# Create the crew (it is run in the next cell with kickoff)
crew = Crew(
    manager_llm=manager_llm,
    agents=[aws_agent, dev_agent],
    tasks=[aws_agent_task, development_task],
    verbose=True,
    process=Process.hierarchical, 
    # planning=True, # Assemble your crew with planning capabilities,
)

In [None]:
# Example usage
result = crew.kickoff(
    inputs={"user_question": "What are the main categories of AWS services?"}
)
print(result)
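The values passed through `inputs` reach the agents only where a task description (or another interpolated field) contains a matching `{placeholder}`, such as `{user_question}`. The plain-Python sketch below mimics that substitution with `str.format` and fails loudly when a placeholder has no matching input. The helper `interpolate` is hypothetical and is not CrewAI's implementation.

```python
import string


def interpolate(template: str, inputs: dict) -> str:
    """Fill {name} placeholders in template from inputs; raise if any are missing."""
    # Collect every placeholder name that appears in the template
    fields = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    missing = fields - set(inputs)
    if missing:
        raise KeyError(f"missing inputs: {sorted(missing)}")
    return template.format(**inputs)


description = "Answer the user's question about AWS: {user_question}"
print(interpolate(description, {"user_question": "What are the main categories of AWS services?"}))
```

Checking for missing placeholders up front makes it easier to spot a mismatch between the keys in `inputs` and the names used in the task descriptions.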