<a href="https://colab.research.google.com/github/sutoa/learn-crewai/blob/main/crewAI_agents-dependency_tree.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Comparing client ownership structure between the ADV document and crds using crewAI agents

In [38]:
!pip install crewai==0.28.8 crewai_tools==0.1.6 langchain_community==0.0.29 networkx==2.8.8



In [39]:
import os
# from utils import get_openai_api_key
from google.colab import userdata

os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ["OPENAI_MODEL_NAME"] = 'gpt-4o-mini'
os.environ["SERPER_API_KEY"] = userdata.get('SERPER_API_KEY')

In [40]:
import requests
from crewai_tools import BaseTool


class FileDownloaderTool(BaseTool):
    # BaseTool is a pydantic model, so name and description need type annotations.
    name: str = "FileDownloaderTool"
    description: str = "Downloads a file from a URL and saves it to a local path."

    def _run(self, file_url: str, save_path: str) -> str:
        """Downloads a file from the given URL and saves it locally."""
        try:
            # A timeout keeps the agent from hanging on an unresponsive server.
            response = requests.get(file_url, stream=True, timeout=30)
            response.raise_for_status()  # Raise on 4xx/5xx responses
            with open(save_path, "wb") as file:
                for chunk in response.iter_content(chunk_size=8192):
                    file.write(chunk)
            return f"File downloaded successfully: {save_path}"
        except Exception as e:
            # Return the error as text so the agent can read it and retry.
            return f"Error downloading file: {e}"
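
FileDownloaderTool expects a ready-made URL, and the researcher agent defined in the next cell is told to build that URL from a crdsID. Those two steps (find the ID, fill it into the URL pattern) can also be sketched deterministically. This is only an illustration: `extract_crd_id`, `build_adv_url` and `ADV_HOST` are hypothetical names, and the host defaults to the "adviserinfo" spelling that appears in the search result in this notebook's output, which is an assumption to verify before use.

```python
import re

# Assumed host: the search result in this notebook's output spells the
# domain "adviserinfo", so that spelling is used as the default here.
ADV_HOST = "reports.adviserinfo.sec.gov"


def extract_crd_id(firm_url: str) -> str:
    """Pull the numeric ID out of a link like .../firm/summary/282615."""
    match = re.search(r"/firm/summary/(\d+)", firm_url)
    if match is None:
        raise ValueError(f"No firm ID found in {firm_url!r}")
    return match.group(1)


def build_adv_url(crd_id: str, host: str = ADV_HOST) -> str:
    """Fill the crdsID into the URL pattern quoted in the researcher backstory."""
    return f"https://{host}/reports/ADV/{crd_id}/PDF/{crd_id}.pdf"


# Link taken from the search result shown in the output of this notebook.
crd_id = extract_crd_id("https://adviserinfo.sec.gov/firm/summary/282615")
adv_url = build_adv_url(crd_id)
print(adv_url)
```

The resulting string could be passed to the tool as `file_url`. Leaving the lookup to the agent is more flexible, while a helper like this makes the URL step reproducible once the ID is known.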

In [44]:
import networkx as nx
import matplotlib.pyplot as plt
from crewai import Task, Agent, Crew
from crewai_tools import SerperDevTool, PDFSearchTool

# Define agents
researcher = Agent(
    role="KYC Researcher",
    goal="Download the ADV form for {client} from the government website adviserinfo.sec.gov",
    backstory="""As a KYC researcher supporting client onboarding at a major European bank,
        your responsibility is to collect the most up-to-date information about the client, {client}, from internal and external data sources so that the KYC analyst on the case can analyze it.
        For client ownership information, you need to download the ADV form for {client} to a local folder. The ADV form is usually available at a link like this:
        https://reports.adviserinfo.sec.gov/reports/ADV/crdsID/PDF/crdsID.pdf. The crdsID is defined by the sec.gov website. For instance, the download link for Goldman Sachs is
        https://reports.adviserinfo.sec.gov/reports/ADV/361/PDF/361.pdf because its crdsID is 361.
        To download the ADV form, do the following:
        - find the crdsID for {client}
        - construct the URL and then download the file to a local folder.
    """,
    verbose=True,
    tools=[FileDownloaderTool(), SerperDevTool()]
    )

analyst = Agent(
     role="KYC Analyst",
     goal="Extract the ownership structure information from the documents or data collected by the researcher for client {client}",
     backstory="""As a KYC analyst for the bank, one of your jobs is to collect the list of direct owners for client {client} from its ADV document.
     The information can be found in the Schedule A section of the form. The result should meet the following criteria:
     - do NOT include entities in the list. Only include real individuals.
     - include the name, title and ownership code for each person. Note that for the ownership code, only the description is needed. For instance, if the code is 'E', then return '75% or more'
     """,
     verbose=True,
     tools=[PDFSearchTool()]
)
# reviewer = Agent(
#     role="KYC Reviewer",
#     goal="To ensure that differences in ownership structure data from various sources for the same client are reported accurately and thoroughly, listed in bullets with values that are different",
#     backstory="The reviewer's job is to ensure the quality of the analyst's report, making sure it is accurate, complete and easy to read.")

# Define tasks
research_task = Task(description="Gather ownership data for {client}",
                     agent=researcher,
                     expected_output="The downloaded copy of the ADV file for client {client} on the local drive and its complete file path")
analyze_task = Task(description="Retrieve the list of direct owners for {client} from its ADV document",
                    agent=analyst,
                    expected_output="A JSON file with the list of direct owners of {client}")
# review_task = Task(description="Review the delta report by the analyst. If unsatisfied, ask the analyst to improve it with detailed feedback.", agent=reviewer, expected_output="Feedback on the delta report")

# Create a task graph
# task_graph = nx.DiGraph()
# task_graph.add_edge(research_task, analyze_task)
# task_graph.add_edge(analyze_task, review_task)

# Visualize the task graph
# nx.draw(task_graph, with_labels=True, node_size=2000, node_color="lightblue", font_size=7)
# plt.show()

# Create crew
crew = Crew(
    agents=[researcher, analyst],
    tasks=[research_task, analyze_task],
    verbose=2
)

# Execute crew
inputs = {
    "client": "Longtail Alpha",
    "inquiry": "Please download the ADV form for the client"
}
result = crew.kickoff(inputs=inputs)
print(result)
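
The analyst's criteria (individuals only; name, title, and the ownership code's description) can be expressed as a small post-processing step. The sketch below is an illustration, not part of the crew. The row layout and the `type` field are hypothetical, the sample rows are made up, and the code table follows the Schedule A instructions as best understood here; only the 'E' entry is confirmed by this notebook, so the rest should be checked against the form.

```python
import json

# Ownership codes as listed in the Schedule A instructions of Form ADV.
# Assumption to verify against the form; the notebook only confirms 'E'.
OWNERSHIP_CODES = {
    "NA": "less than 5%",
    "A": "5% but less than 10%",
    "B": "10% but less than 25%",
    "C": "25% but less than 50%",
    "D": "50% but less than 75%",
    "E": "75% or more",
}


def direct_individual_owners(rows):
    """Keep only individuals and replace the ownership code with its description."""
    owners = []
    for row in rows:
        # Hypothetical field: "I" marks an individual, anything else an entity.
        if row["type"] != "I":
            continue
        owners.append({
            "name": row["name"],
            "title": row["title"],
            "ownership": OWNERSHIP_CODES.get(row["code"], "unknown code"),
        })
    return owners


# Made-up rows for illustration only; not taken from any real filing.
sample_rows = [
    {"name": "EXAMPLE HOLDINGS LLC", "type": "DE", "title": "MEMBER", "code": "E"},
    {"name": "DOE, JANE", "type": "I", "title": "CHIEF COMPLIANCE OFFICER", "code": "NA"},
]
print(json.dumps(direct_individual_owners(sample_rows), indent=2))
```

A deterministic filter like this could serve as a check on the analyst agent's JSON output, since the language model may occasionally keep an entity or leave a code untranslated.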



[1m[95m [DEBUG]: == Working Agent: KYC Researcher[00m
[1m[95m [INFO]: == Starting Task: Gather ownership data for Longtail Alpha[00m


[1m> Entering new CrewAgentExecutor chain...[0m
[32;1m[1;3mI need to find the crdsID for Longtail Alpha to construct the URL needed to download the ADV form. I'll start by searching the internet for the crdsID associated with Longtail Alpha.

Action: Search the internet  
Action Input: {"search_query": "Longtail Alpha crdsID site:sec.gov"}  [0m[95m 


Search results: 

[00m
[32;1m[1;3mThought: The search did not yield any results. I need to refine my search to find the crdsID for Longtail Alpha. I will broaden my search to include more general terms related to Longtail Alpha and its regulatory filings.

Action: Search the internet  
Action Input: {"search_query": "Longtail Alpha ADV form site:sec.gov"}  [0m[95m 


Search results: Title: LONGTAIL ALPHA, LLC - Investment Adviser Firm
Link: https://adviserinfo.sec.gov/firm/summary/282615
S

Inserting batches in chromadb: 100%|██████████| 3/3 [00:04<00:00,  1.35s/it]



Relevant Content:

Partner, Manager, Trustee, or Director LONGTAIL ALPHA, LLC (b)If filing an umbrella registration , identify the filing adviser and/or relying adviser(s) that sponsor(s) or manage(s) this private fund . No Information Filed 4.The private fund (check all that apply; you must check at least one): (1) qualifies for the exclusion from the definition of investment company under section 3(c)(1) of the Investment Company Act of 1940 (2) qualifies for the exclusion from the definition of investment company under section 3(c)(7) of the Investment Company Act of 1940 5.List the name and country, in English, of each foreign financial regulatory authority with which the private fund is registered. No Information Filed Yes No 6.(a)Is this a "master fund" in a master-feeder arrangement?

ONETAIL HEDGEHOG FUND II LP (b)Private fund identification number: (include the "805-" prefix also) 805-3865333878 2.Under the laws of what state or country is the private fund organized: S

Inserting batches in chromadb: 100%|██████████| 3/3 [00:03<00:00,  1.12s/it]



Relevant Content:
If you are submitting an initial application or report, you must complete Schedule A and Schedule B. Schedule A asks for information about your direct owners and executive officers. Schedule B asks for information about your indirect owners. If this is an amendment and you are updating information you reported on either Schedule A or Schedule B (or both) that you filed with your initial application or report, you must complete Schedule C. Yes No A. Does any person not named in Item 1.A. or Schedules A, B, or C, directly or indirectly, control your management or policies? If yes, complete Section 10.A. of Schedule D. B.If any person named in Schedules A, B, or C or in Section 10.A. of Schedule D is a public reporting company under Sections 12 or 15(d) of the Securities Exchange Act of 1934, please complete Section 10.B. of Schedule D . SECTION 10.A. Control Persons No Information Filed SECTION 10.B. Control Person Public Reporting Companies B. If any person nam