# AgenticRag Crew

Source: Lorenze Jay, **Practical Multi Agent RAG using CrewAI, Weaviate, Groq and ExaTool.** https://lorenzejay.dev/articles/practical-agentic-rag

Notebook:
https://github.com/lorenzejay/agentic-rag-practical-example

In [None]:
from IPython.display import Image
Image("figures/practical_agentic_rag.png", width=400, height=200)

**Goal:**
Given a query, find the relevant documents and generate a complete report that includes the business's financial data along with graph visualizations.

**Steps:**
- Fetching relevant documents based on a query
- Researching competitors
- Generating a written report
- Generating graphs for data analysis

**Tech Stack:**
- CrewAI: The leading multi-agent platform
- Weaviate: The AI-native database for a new generation of software
- Groq: Fast AI Inference
- ExaSearchTool: The search engine for AI
- WebsiteSearchTool: Built-in web RAG tool from crewai_tools

#### agents.yaml:

document_rag_agent:
  role: >
    Document RAG Agent
  goal: >
    Answer questions about the documents in the Weaviate database.
    The question is {query}
  backstory: >
    You are a document retrieval agent that can answer questions about the documents in the Weaviate database.
    The documents are internal company documents.
    You have tools that allow you to search the information in the Weaviate database.

web_agent:
  role: >
    Web Agent
  goal: >
    Answer questions using web search tools such as the EXASearchTool.
    The question is {query}
  backstory: >
    You're a web search agent that can answer questions using the web.
    You're known for your ability to turn complex data into clear and concise reports,
    making it easy for others to understand and act on the information you provide.
    You have tools that allow you to search for information on the web.

code_execution_agent:
  role: >
    Code Execution Agent for data visualization
  goal: >
    You are a senior Python developer who can execute code to generate the output.
    Most of your tasks will be to generate Python code to visualize data passed to you.
    Execute the code and return the output.
    The output file should be valid Python code only.
  backstory: >
    You are a senior Python developer who can execute code to generate the output.
    You have tools that allow you to execute Python code.
    Execute the code and return the output.
  allow_code_execution: true
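When `kickoff(inputs=...)` runs, the `{query}` placeholders in the agent and task text are filled from the `inputs` dictionary, and the YAML `>` (folded) style joins the wrapped lines with spaces. To preview the final prompt text, you can imitate that substitution in plain Python. The sketch below is a simplified stand-in, not CrewAI's own implementation, and `interpolate` is a hypothetical helper.

```python
import re

# Hypothetical helper: a simplified imitation of how {placeholders} in the
# YAML agent/task text get filled from kickoff(inputs=...).
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def interpolate(template: str, inputs: dict) -> str:
    """Replace each {name} with inputs[name]; leave unknown placeholders untouched."""
    return PLACEHOLDER.sub(lambda m: str(inputs.get(m.group(1), m.group(0))), template)


# The document_rag_agent goal after YAML folding (lines joined with spaces).
goal = "Answer questions about the documents in the Weaviate database. The question is {query}"
print(interpolate(goal, {"query": "What was the year with the highest total expenses?"}))
```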


#### tasks.yaml:

fetch_tax_docs_task:
  description: >
    Find the tax documents relevant to the question: {query}
    Use the WeaviateTool to search for the relevant documents.
    Turn the question into an appropriate search query and pass only that query to the WeaviateTool.
  expected_output: >
    The tax documents relevant to the question.

answer_question_task:
  description: >
    Find our competitors and their financial data.
    Use the WebsiteSearchTool and EXASearchTool to find the relevant information.
  expected_output: >
    The answer to the question.
    The answer should be in markdown format.

business_trends_task:
  description: >
    Generate a report on the latest trends in our business. Compare our tax data year over year for 2020, 2021, 2022, and 2023.
    You can use the WeaviateTool to find the relevant tax documents for each year.
  expected_output: >
    The business trend data for all years.
    The report should describe the trends in the data.
  output_file: "outputs/business_trends.md"

graph_visualization_task:
  description: >
    Generate graphs from the trend data produced by the business_trends_task.
    Use matplotlib to generate the graphs.
    You can use the code execution agent to execute the code and generate the output.
    The output file should be a Python file, not markdown.
  expected_output: >
    Python code that uses matplotlib to generate the graphs and save them as PNG files.
    The output should be valid Python code only, not markdown.
  output_file: "outputs/visualize.py"
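The `business_trends_task` compares the tax figures year over year. To sanity-check the percentages in the agent's report, you can compute the same comparison directly. The sketch below is a hypothetical helper (`yoy_change`), not part of the original project. The figures you pass in should come from the actual tax documents.

```python
from typing import Dict, Optional


def yoy_change(values_by_year: Dict[int, float]) -> Dict[int, Optional[float]]:
    """Percentage change of each year versus the previous year present in the data.

    The earliest year has no predecessor, so it maps to None. A previous value of
    zero also yields None, because the percentage change is undefined.
    """
    years = sorted(values_by_year)
    changes: Dict[int, Optional[float]] = {years[0]: None} if years else {}
    for prev, curr in zip(years, years[1:]):
        before, after = values_by_year[prev], values_by_year[curr]
        # Dividing by abs(before) keeps the sign meaningful for a negative base.
        changes[curr] = None if before == 0 else (after - before) / abs(before) * 100
    return changes
```

Dividing by the absolute value of the earlier figure keeps the sign meaningful when the base is negative, such as the 2020 net loss of $3,000.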


#### main.py:

In [3]:
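import os

from agentic_rag.crew import AgenticRagCrew  # Assumes crew.py lives in the agentic_rag package


def run():
    """
    Run the crew and save any generated plotting code to outputs/visualize.py.
    """
    inputs = {"query": "What was the year with the highest total expenses?"}
    result = AgenticRagCrew().crew().kickoff(inputs=inputs)

    # kickoff() returns a CrewOutput object; its .raw attribute holds the final text.
    raw = getattr(result, "raw", str(result)).strip()

    if raw.startswith("```python"):
        code = raw[len("```python"):].strip()
        if code.endswith("```"):
            code = code[:-3].strip()

        os.makedirs("outputs", exist_ok=True)
        with open("outputs/visualize.py", "w") as f:
            f.write(code)


if __name__ == "__main__":
    run()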
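`run()` only recognizes output that begins with a Markdown fence tagged `python`. Models sometimes put a sentence before the fence, tag it `py`, or leave it untagged, so a slightly more tolerant extractor can help. The regex-based helper below is a hypothetical alternative, not part of the original project. It returns the body of the first fenced block tagged `python` or `py`, or with no tag at all, and returns `None` if there is none.

```python
import re
from typing import Optional

# Matches a fenced block tagged python/py (or untagged) and captures its body.
FENCE = re.compile(r"```(?:python|py)?[ \t]*\n(.*?)```", re.DOTALL)


def extract_python_code(text: str) -> Optional[str]:
    """Return the stripped body of the first matching fenced code block, or None."""
    match = FENCE.search(text)
    return match.group(1).strip() if match else None
```

With this helper, the `startswith` check in `run()` could become a call to `extract_python_code` on the result text, writing the file only when it returns a string.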

In [None]:
import json
from typing import Optional, Type

import weaviate
from crewai.tools import BaseTool  # Newer CrewAI releases expose BaseTool here
from pydantic import BaseModel, Field
from weaviate.classes.config import Configure

COLLECTION_NAME = "tax_docs"
# host.docker.internal lets the Weaviate container reach the Ollama instance on the host.
OLLAMA_ENDPOINT = "http://host.docker.internal:11434"


class WeaviateToolSchema(BaseModel):
    """Input for WeaviateTool."""

    query: str = Field(
        ...,
        description="The search query used to retrieve relevant information from the Weaviate database. Pass only the query, not the full question.",
    )


class WeaviateTool(BaseTool):
    """Tool to search the Weaviate database."""

    name: str = "WeaviateTool"
    description: str = "A tool to search the Weaviate database for relevant information on internal documents"
    args_schema: Type[BaseModel] = WeaviateToolSchema
    query: Optional[str] = None

    def _run(self, query: str) -> str:
        """Search the Weaviate database.

        Args:
            query (str): The search query used to retrieve relevant information from
                the Weaviate database. Pass only the query string, not the full question.

        Returns:
            str: The properties of the top 3 matching objects as a JSON array.
        """
        client = weaviate.connect_to_local()
        try:
            # collections.get() does not check whether the collection exists, so test explicitly.
            if client.collections.exists(COLLECTION_NAME):
                internal_docs = client.collections.get(COLLECTION_NAME)
            else:
                internal_docs = client.collections.create(
                    name=COLLECTION_NAME,
                    vectorizer_config=Configure.Vectorizer.text2vec_ollama(  # Ollama embedding integration
                        api_endpoint=OLLAMA_ENDPOINT,
                        model="nomic-embed-text",  # The embedding model to use
                    ),
                    generative_config=Configure.Generative.ollama(
                        model="llama3.2:1b",
                        api_endpoint=OLLAMA_ENDPOINT,
                    ),
                )

            response = internal_docs.query.near_text(query=query, limit=3)
            # Return one valid JSON array rather than concatenated JSON objects.
            return json.dumps(
                [obj.properties for obj in response.objects], indent=2, default=str
            )
        finally:
            client.close()


#### docker-compose.yml:

services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.27.2
    ports:
    - 8080:8080
    - 50051:50051
    volumes:
    - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_API_BASED_MODULES: 'true'
      ENABLE_MODULES: 'text2vec-ollama,generative-ollama'
      CLUSTER_HOSTNAME: 'node1'
volumes:
  weaviate_data:
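After `docker compose up -d`, the Weaviate container may need a few seconds before it accepts requests. Weaviate exposes a readiness endpoint, `GET /v1/.well-known/ready`, on its HTTP port (8080 in the compose file above). The stdlib-only sketch below polls that endpoint before ingestion. `wait_for_weaviate` is a hypothetical helper, and its default URL assumes the port mapping shown above.

```python
import time
import urllib.request


def wait_for_weaviate(base_url: str = "http://localhost:8080", timeout: float = 60.0) -> bool:
    """Poll Weaviate's readiness endpoint until it returns HTTP 200 or the timeout expires."""
    url = f"{base_url}/v1/.well-known/ready"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError are OSError subclasses: refused, reset, or 503 while starting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(1.0)
```

For example, call `wait_for_weaviate()` at the top of the ingestion script and stop if it returns `False`.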


In [None]:
import os

import weaviate
from weaviate.classes.config import Configure

OLLAMA_ENDPOINT = "http://host.docker.internal:11434"  # Ollama running on the Docker host

client = weaviate.connect_to_local()
print(client.is_ready())

try:
    # collections.get() does not check whether the collection exists, so test explicitly.
    if client.collections.exists("tax_docs"):
        internal_docs = client.collections.get("tax_docs")
    else:
        internal_docs = client.collections.create(
            name="tax_docs",
            vectorizer_config=Configure.Vectorizer.text2vec_ollama(  # Configure the Ollama embedding integration
                api_endpoint=OLLAMA_ENDPOINT,
                model="nomic-embed-text",  # The embedding model to use
            ),
            generative_config=Configure.Generative.ollama(
                model="llama3.2:1b",
                api_endpoint=OLLAMA_ENDPOINT,
            ),
        )

    # internal_docs/ is located by walking up from this file's directory,
    # so run this as a script (__file__ is undefined inside a notebook cell).
    docs_dir = os.path.join(
        os.path.dirname(os.path.dirname(os.path.dirname(__file__))), "internal_docs"
    )
    markdown_files = [f for f in os.listdir(docs_dir) if f.endswith(".md")]

    # Dynamic batching sends the objects in automatically sized batches.
    with internal_docs.batch.dynamic() as batch:
        for filename in markdown_files:
            with open(os.path.join(docs_dir, filename), "r") as f:
                batch.add_object(
                    {
                        "content": f.read(),
                        "class_name": filename.split(".")[0],
                    }
                )

    failed = internal_docs.batch.failed_objects
    print(f"Imported {len(markdown_files) - len(failed)} of {len(markdown_files)} documents")
finally:
    client.close()
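The same markdown files can answer the query in main.py ("What was the year with the highest total expenses?") without an LLM. That deterministic answer is a useful cross-check on the agents' output. The sketch below assumes that every document contains `Year: YYYY` and `Total Expenses: $N` lines in the format of the sample 2020 document. `parse_year_and_total`, `total_expenses_by_year`, and `year_with_highest_expenses` are hypothetical helpers, not part of the original project.

```python
import os
import re
from typing import Dict, Optional, Tuple

# Assumed line formats, taken from the sample 2020 document:
#   "Year: 2020" and "Total Expenses: $95,000"
YEAR_RE = re.compile(r"Year:\s*(\d{4})")
TOTAL_EXPENSES_RE = re.compile(r"Total Expenses:\s*\$([\d,]+)")


def parse_year_and_total(text: str) -> Optional[Tuple[int, int]]:
    """Extract (year, total_expenses) from one document, or None if either is missing."""
    year = YEAR_RE.search(text)
    expenses = TOTAL_EXPENSES_RE.search(text)
    if not (year and expenses):
        return None
    return int(year.group(1)), int(expenses.group(1).replace(",", ""))


def total_expenses_by_year(docs_dir: str) -> Dict[int, int]:
    """Map each year to its Total Expenses figure across the .md files in docs_dir."""
    results: Dict[int, int] = {}
    for filename in sorted(os.listdir(docs_dir)):
        if not filename.endswith(".md"):
            continue
        with open(os.path.join(docs_dir, filename), "r") as f:
            parsed = parse_year_and_total(f.read())
        if parsed:
            year, total = parsed
            results[year] = total
    return results


def year_with_highest_expenses(docs_dir: str) -> Optional[Tuple[int, int]]:
    """Return (year, total_expenses) for the largest total, or None if nothing parsed."""
    totals = total_expenses_by_year(docs_dir)
    if not totals:
        return None
    year = max(totals, key=lambda y: totals[y])
    return year, totals[year]
```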

#### Text document:

Year: 2020

Income Statement

Gross Sales: $150,000
Returns & Allowances: $3,000
Net Sales: $147,000

Cost of Goods Sold

Beginning Inventory: $10,000
Purchases: $60,000
Ending Inventory: $15,000
COGS: $55,000

Expenses

Rent: $30,000
Utilities: $6,000
Salaries & Wages: $50,000
Marketing & Advertising: $4,000
Miscellaneous (Supplies, Insurance): $5,000
Total Expenses: $95,000

Net Income (Loss)

Gross Profit: $92,000
Net Income (before taxes): $(3,000) (Loss)

Balance Sheet (End of Year)

Assets

Cash: $5,000
Inventory: $15,000
Equipment: $25,000

Liabilities

Accounts Payable: $8,000

Owner's Equity

Owner's Equity: $37,000
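The figures in this 2020 document are internally consistent, so the document works well as a fixture for checking what the agents report. The script below recomputes each derived line from the raw inputs using the standard identities: Net Sales = Gross Sales - Returns; COGS = Beginning Inventory + Purchases - Ending Inventory; Gross Profit = Net Sales - COGS; Net Income = Gross Profit - Total Expenses; and Owner's Equity = Assets - Liabilities.

```python
# Raw inputs copied from the 2020 sample document (USD).
gross_sales, returns = 150_000, 3_000
beginning_inventory, purchases, ending_inventory = 10_000, 60_000, 15_000
expenses = {
    "Rent": 30_000,
    "Utilities": 6_000,
    "Salaries & Wages": 50_000,
    "Marketing & Advertising": 4_000,
    "Miscellaneous (Supplies, Insurance)": 5_000,
}
cash, inventory, equipment = 5_000, 15_000, 25_000
accounts_payable = 8_000

# Income statement lines.
net_sales = gross_sales - returns                            # 147,000
cogs = beginning_inventory + purchases - ending_inventory    # 55,000
total_expenses = sum(expenses.values())                      # 95,000
gross_profit = net_sales - cogs                              # 92,000
net_income = gross_profit - total_expenses                   # -3,000 (loss)

# Balance sheet: Assets = Liabilities + Owner's Equity.
total_assets = cash + inventory + equipment                  # 45,000
owners_equity = total_assets - accounts_payable              # 37,000

print(f"Net Sales: ${net_sales:,}")
print(f"COGS: ${cogs:,}")
print(f"Total Expenses: ${total_expenses:,}")
print(f"Gross Profit: ${gross_profit:,}")
print(f"Net Income (before taxes): ${net_income:,}")
print(f"Owner's Equity: ${owners_equity:,}")
```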


#### crew.py:

In [None]:
import os

from crewai import Agent, Crew, Process, Task, LLM
from crewai.project import CrewBase, agent, crew, task
from crewai_tools import EXASearchTool, WebsiteSearchTool
from dotenv import load_dotenv

from agentic_rag.tools.weaviate_tool import WeaviateTool

load_dotenv()  # Load API keys (e.g. GROQ_API_KEY) from a .env file


@CrewBase
class AgenticRagCrew:
    """AgenticRag crew"""

    # Groq retires model IDs over time; check Groq's model list if this one is unavailable.
    llm = LLM(model="groq/llama-3.1-70b-versatile", api_key=os.getenv("GROQ_API_KEY"))

    @agent
    def document_rag_agent(self) -> Agent:
        return Agent(
            config=self.agents_config["document_rag_agent"],
            tools=[WeaviateTool()],
            verbose=True,
            llm=self.llm,
        )

    @agent
    def web_agent(self) -> Agent:
        return Agent(
            config=self.agents_config["web_agent"],
            # answer_question_task refers to both search tools, so attach both.
            tools=[EXASearchTool(), WebsiteSearchTool()],
            verbose=True,
            llm=self.llm,
        )

    @agent
    def code_execution_agent(self) -> Agent:
        # allow_code_execution is set in this agent's YAML config.
        return Agent(
            config=self.agents_config["code_execution_agent"],
            verbose=True,
            llm=self.llm,
        )

    @task
    def fetch_tax_docs_task(self) -> Task:
        return Task(config=self.tasks_config["fetch_tax_docs_task"])

    @task
    def answer_question_task(self) -> Task:
        return Task(
            config=self.tasks_config["answer_question_task"], output_file="report.md"
        )

    @task
    def business_trends_task(self) -> Task:
        return Task(config=self.tasks_config["business_trends_task"])

    @task
    def graph_visualization_task(self) -> Task:
        return Task(config=self.tasks_config["graph_visualization_task"])

    @crew
    def crew(self) -> Crew:
        """Creates the AgenticRag crew"""
        return Crew(
            agents=self.agents,  # Automatically created by the @agent decorator
            tasks=self.tasks,  # Automatically created by the @task decorator
            process=Process.hierarchical,  # A manager LLM delegates each task to an agent
            verbose=True,
            manager_llm="openai/gpt-4o",  # Requires an OpenAI API key
        )
