<a href="https://colab.research.google.com/github/sutoa/learn-crewai/blob/main/crewAI_agents-dependency_tree.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Compare client ownership structure between the ADV form and cRDS using crewAI agents

In [1]:
!pip install crewai==0.28.8 crewai_tools==0.1.6 langchain_community==0.0.29 networkx==2.8.8

Collecting crewai==0.28.8
  Downloading crewai-0.28.8-py3-none-any.whl.metadata (13 kB)
Collecting crewai_tools==0.1.6
  Downloading crewai_tools-0.1.6-py3-none-any.whl.metadata (4.6 kB)
Collecting langchain_community==0.0.29
  Downloading langchain_community-0.0.29-py3-none-any.whl.metadata (8.3 kB)
Collecting networkx==2.8.8
  Downloading networkx-2.8.8-py3-none-any.whl.metadata (5.1 kB)
Collecting appdirs<2.0.0,>=1.4.4 (from crewai==0.28.8)
  Downloading appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Collecting embedchain<0.2.0,>=0.1.98 (from crewai==0.28.8)
  Downloading embedchain-0.1.127-py3-none-any.whl.metadata (9.3 kB)
Collecting instructor<0.6.0,>=0.5.2 (from crewai==0.28.8)
  Downloading instructor-0.5.2-py3-none-any.whl.metadata (10 kB)
Collecting langchain<0.2.0,>=0.1.10 (from crewai==0.28.8)
  Downloading langchain-0.1.20-py3-none-any.whl.metadata (13 kB)
Collecting opentelemetry-exporter-otlp-proto-http<2.0.0,>=1.22.0 (from crewai==0.28.8)
  Downloading opentelemet

In [25]:
import os
# from utils import get_openai_api_key
from google.colab import userdata

os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ["OPENAI_MODEL_NAME"] = 'gpt-4o-mini'
os.environ["SERPER_API_KEY"] = userdata.get('SERPER_API_KEY')
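
The cell above only works inside Colab, because `google.colab.userdata` is not available elsewhere. As a minimal sketch, assuming the same secret names are exported as ordinary environment variables when running locally, a small helper could try Colab first and fall back to the environment. The name `load_secret` is hypothetical and not part of Colab or crewAI.

```python
import os


def load_secret(name: str) -> str:
    """Return a secret from Colab userdata if available, else from the environment."""
    try:
        # google.colab can only be imported inside a Colab runtime.
        from google.colab import userdata
        value = userdata.get(name)
    except Exception:
        # Assumed fallback: outside Colab (or when the secret is not shared
        # with this notebook), read a pre-set environment variable instead.
        value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} is not set")
    return value
```

With this in place, a line such as `os.environ["OPENAI_API_KEY"] = load_secret("OPENAI_API_KEY")` would behave the same in Colab and fail with a clear message elsewhere if the key is missing.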

In [22]:
import requests
from crewai_tools import BaseTool
from typing import Literal

class FileDownloaderTool(BaseTool):
    # BaseTool is a pydantic model, so overridden fields need type annotations.
    name: Literal["FileDownloaderTool"] = "FileDownloaderTool"
    description: str = "Downloads a file from a URL and saves it to a local path."

    def _run(self, file_url: str, save_path: str) -> str:
        """Download the file at file_url and save it to save_path."""
        try:
            # A timeout keeps the agent from hanging on an unresponsive server.
            with requests.get(file_url, stream=True, timeout=30) as response:
                response.raise_for_status()  # Raise on 4xx/5xx responses
                with open(save_path, "wb") as file:
                    for chunk in response.iter_content(chunk_size=8192):
                        file.write(chunk)
            return f"File downloaded successfully: {save_path}"
        except Exception as e:
            # Return the error as text so the agent can read it and react.
            return f"Error downloading file: {e}"
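
The agent defined later in this notebook is told that ADV links follow a fixed pattern, `.../reports/ADV/<ID>/PDF/<ID>.pdf`, with Goldman Sachs (ID 361) as the example. Because the pattern is fixed, the URL-construction step could be done deterministically instead of being left to the LLM. The helper below is only a sketch: `build_adv_url` and `ADV_URL_TEMPLATE` are hypothetical names. The host is an assumption worth double-checking; the sketch uses `reports.adviserinfo.sec.gov`, the spelling used by the SEC's Investment Adviser Public Disclosure site.

```python
# Assumed URL pattern, taken from the Goldman Sachs example in this notebook.
ADV_URL_TEMPLATE = "https://reports.adviserinfo.sec.gov/reports/ADV/{crd}/PDF/{crd}.pdf"


def build_adv_url(crds_id) -> str:
    """Build the ADV PDF URL for a numeric firm ID."""
    crd = str(crds_id).strip()
    # Reject anything that is not a plain ASCII number, e.g. a firm name.
    if not (crd.isascii() and crd.isdigit()):
        raise ValueError(f"Expected a numeric ID, got {crds_id!r}")
    return ADV_URL_TEMPLATE.format(crd=crd)
```

The result could then be passed to the tool above, for example `FileDownloaderTool()._run(build_adv_url(361), "361.pdf")`, so the agent only has to find the ID.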

In [29]:
import networkx as nx
import matplotlib.pyplot as plt
from crewai import Task, Agent, Crew
from crewai_tools import SerperDevTool

# Define agents
researcher = Agent(
    role="KYC Researcher",
    goal="Download the ADV form for {client} from the government website adviserinfo.sec.gov",
    backstory="""As a KYC researcher supporting the client onboarding process at a major European bank,
        your responsibility is to collect the most up-to-date information about the client, {client}, from internal and external data sources, to be analyzed by the KYC analyst on the case.
        For client ownership information, you need to download the ADV form for {client} to a local folder. The ADV form is usually available at a link like this:
        https://reports.adviserinfo.sec.gov/reports/ADV/crdsID/PDF/crdsID.pdf. The crdsID is defined by the sec.gov website. For instance, the download link for Goldman Sachs is
        https://reports.adviserinfo.sec.gov/reports/ADV/361/PDF/361.pdf because its crdsID is 361.
        To download the ADV form, do the following:
        - find the crdsID for {client}
        - construct the URL and then download the file to a local folder.
    """,
    verbose=True,
    tools=[FileDownloaderTool(), SerperDevTool()]
    )
# analyst = Agent(
#     role="KYC Analyst",
#     goal="Highlight the differences between the KYC ownership structure data collected from different sources for client {client}",
#     backstory="Ownership structure data in cRDS could be out of sync with the data from public sources such as ADV. It's critical for the analyst to list all the material differences accurately and completely",)
# reviewer = Agent(
#     role="KYC Reviewer",
#     goal="To ensure that differences in ownership structure data from various sources for the same client are reported accurately and thoroughly, listed in bullets with values that are different",
#     backstory="The reviewer's job is to ensure the quality of the analyst's report, making sure it is accurate, complete and easy to read.")

# Define tasks
research_task = Task(description="Gather ownership data for {client}",
                     agent=researcher,
                     expected_output="The downloaded copy of the ADV file for client {client} on the local drive and its complete file path")
# analyze_task = Task(description="Report delta in ownership data for {client} gathered by researcher", agent=analyst, expected_output="A report highlighting the differences in ownership data in bullet points")
# review_task = Task(description="Review the delta report by the analyst. If unsatisfied, ask the analyst to improve it with detailed feedback.", agent=reviewer, expected_output="Feedback on the delta report")

# Create a task graph
# task_graph = nx.DiGraph()
# task_graph.add_edge(research_task, analyze_task)
# task_graph.add_edge(analyze_task, review_task)

# Visualize the task graph
# nx.draw(task_graph, with_labels=True, node_size=2000, node_color="lightblue", font_size=7)
# plt.show()

# Create crew
crew = Crew(
    agents=[researcher, ],
    tasks=[research_task, ],
    verbose=2
)

# Execute crew
inputs = {
    "client": "Longtail Alpha",
    "inquiry": "Please download the ADV form for the client"
}
result = crew.kickoff(inputs)
print(result)



[1m[95m [DEBUG]: == Working Agent: KYC Researcher[00m
[1m[95m [INFO]: == Starting Task: Gather ownership data for Longtail Alpha[00m


[1m> Entering new CrewAgentExecutor chain...[0m
[32;1m[1;3mI need to find the crdsID for Longtail Alpha in order to construct the URL to download the ADV form. 

Action: Search the internet  
Action Input: {"search_query": "Longtail Alpha crdsID site:sec.gov"}  [0m[95m 


Search results: 

[00m
[32;1m[1;3mThought: The search did not yield the crdsID directly, so I need to refine my search to look for more specific information about Longtail Alpha, potentially on the SEC's advisor info or related websites.

Action: Search the internet  
Action Input: {"search_query": "Longtail Alpha site:advisorinfo.sec.gov"}  [0m[95m 


Search results: 

[00m
[32;1m[1;3mThought: The search still did not yield the crdsID directly. I need to try a broader search to see if I can find any references or documents related to Longtail Alpha that might inclu