<a href="https://colab.research.google.com/github/sutoa/learn-crewai/blob/main/crewAI_agents-dependency_tree.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Compare client ownership structure between the ADV document and crds using crewAI agents

In [10]:
!pip install crewai==0.28.8 crewai_tools==0.1.6 langchain_community==0.0.29 chromadb==0.4.24 "numpy<2.0" networkx==2.8.8



In [3]:
import os
# from utils import get_openai_api_key
from google.colab import userdata

os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ["OPENAI_MODEL_NAME"] = 'gpt-4o-mini'
os.environ["SERPER_API_KEY"] = userdata.get('SERPER_API_KEY')

In [13]:
from crewai_tools import tool
import requests
import os

@tool("DownloadFileTool")
def download_file(url: str, destination_folder: str = ".") -> str:
    """
    Downloads a file from the given URL and saves it locally.

    Args:
        url (str): The URL of the file to download.
        destination_folder (str, optional): Where to save the file. Defaults to current directory.

    Returns:
        str: Full path to the downloaded file or an error message.
    """
    try:
        os.makedirs(destination_folder, exist_ok=True)

        # Extract base filename from URL
        original_filename = url.split("/")[-1]
        base_name, ext = os.path.splitext(original_filename)
        destination_path = os.path.join(destination_folder, original_filename)

        # Handle filename conflict by appending a number
        counter = 1
        while os.path.exists(destination_path):
            destination_path = os.path.join(destination_folder, f"{base_name}_{counter}{ext}")
            counter += 1

        # Download the file
        response = requests.get(url, timeout=30)  # time out instead of hanging on an unresponsive server
        response.raise_for_status()
        with open(destination_path, 'wb') as f:
            f.write(response.content)

        return f"File downloaded successfully: {destination_path}"
    except Exception as e:
        return f"Failed to download file from {url}. Error: {str(e)}"
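The `url` handed to `download_file` is expected to follow the ADV report pattern described in the researcher's backstory later in this notebook, and the run output shows both a firm summary link and a report link that carry the same numeric ID. Below is a minimal sketch of that step in plain Python, assuming the ID is always the numeric path segment seen in those two link shapes. The helper names `extract_crd_id` and `build_adv_url` are hypothetical and are not part of crewAI.

```python
import re
from typing import Optional

# Report URL pattern as it appears in the search results of the run output.
ADV_REPORT_URL = "https://reports.adviserinfo.sec.gov/reports/ADV/{crd}/PDF/{crd}.pdf"

# Assumption: the ID is the numeric segment after /firm/summary/ or /reports/ADV/.
_CRD_PATTERN = re.compile(r"adviserinfo\.sec\.gov/(?:firm/summary|reports/ADV)/(\d+)")


def extract_crd_id(link: str) -> Optional[str]:
    """Return the numeric ID found in an adviserinfo link, or None if absent."""
    match = _CRD_PATTERN.search(link)
    return match.group(1) if match else None


def build_adv_url(crd_id: str) -> str:
    """Build the ADV PDF link for a numeric ID."""
    if not crd_id.isdigit():
        raise ValueError(f"Expected a numeric ID, got {crd_id!r}")
    return ADV_REPORT_URL.format(crd=crd_id)


if __name__ == "__main__":
    crd = extract_crd_id("https://adviserinfo.sec.gov/firm/summary/282615")
    if crd is not None:
        print(build_adv_url(crd))
```

Doing this deterministically, rather than leaving the string assembly to the LLM, would avoid a misspelled host name ending up in the download URL.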


In [4]:
import requests
from crewai_tools import BaseTool
from typing import Literal

class FileDownloaderTool(BaseTool):
    name: Literal["FileDownloaderTool"] = "FileDownloaderTool"  # Add name attribute with type annotation
    description: str = "Downloads a file from a URL and saves it to a local path."  # Add description attribute with type annotation

    def _run(self, file_url: str, save_path: str):
        """Downloads a file from the given URL and saves it locally."""
        try:
            response = requests.get(file_url, stream=True, timeout=30)
            response.raise_for_status()  # Check for errors
            with open(save_path, "wb") as file:
                for chunk in response.iter_content(chunk_size=8192):
                    file.write(chunk)
            return f"File downloaded successfully: {save_path}"
        except Exception as e:
            return f"Error downloading file: {str(e)}"
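The analyst agent in the next cell is asked to return ownership descriptions instead of Schedule A ownership codes, and to keep only individuals. The sketch below shows how that mapping and filter could be done deterministically once the rows are extracted. The code table is copied from the Form ADV text that appears in the run output; the row dictionary layout (`name`, `type`, `title`, `code`) and the function names are assumptions made for illustration.

```python
# Ownership codes as listed in the Schedule A instructions shown in the run output.
OWNERSHIP_CODES = {
    "NA": "less than 5%",
    "A": "5% but less than 10%",
    "B": "10% but less than 25%",
    "C": "25% but less than 50%",
    "D": "50% but less than 75%",
    "E": "75% or more",
}


def describe_ownership(code: str) -> str:
    """Translate an ownership code into its description; fail loudly on unknown codes."""
    key = code.strip().upper()
    if key not in OWNERSHIP_CODES:
        raise KeyError(f"Unknown ownership code: {code!r}")
    return OWNERSHIP_CODES[key]


def individual_owners(rows):
    """Keep rows of type 'I' (individual) and replace the code with its description."""
    return [
        {
            "name": row["name"],
            "title": row["title"],
            "ownership": describe_ownership(row["code"]),
        }
        for row in rows
        if row["type"] == "I"
    ]


if __name__ == "__main__":
    # Two rows transcribed from the Schedule A excerpt in the run output (hypothetical dict layout).
    rows = [
        {"name": "BHANSALI, VINEER, NMN", "type": "I",
         "title": "MANAGING MEMBER, CHIEF INVESTMENT OFFICER", "code": "E"},
        {"name": "JANUS HENDERSON INVESTORS US LLC", "type": "DE",
         "title": "MEMBER", "code": "B"},
    ]
    print(individual_owners(rows))
```

Raising on an unknown code mirrors the instruction in the analyst's backstory not to guess an ownership description.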

In [18]:
import networkx as nx
import matplotlib.pyplot as plt
from crewai import Task, Agent, Crew
from crewai_tools import SerperDevTool, PDFSearchTool

# Define agents
researcher = Agent(
    role="KYC Researcher",
    goal="Download the ADV form for {client} from the government website adviserinfo.sec.gov",
    backstory="""As a KYC researcher supporting the client onboarding process at a major European bank,
        your responsibility is to collect the most up-to-date information about the client, {client}, from internal and external data sources so that the KYC analyst on the case can analyze it.
        For client ownership information, you need to download the ADV form for {client} to a local folder. The ADV form is usually available at a link like this:
        https://reports.adviserinfo.sec.gov/reports/ADV/crdsID/PDF/crdsID.pdf. The crdsID is defined by the sec.gov website. For instance, the download link for Goldman Sachs is
        https://reports.adviserinfo.sec.gov/reports/ADV/361/PDF/361.pdf because its crdsID is 361.
        To download the ADV form, do the following:
        - find the crdsID for {client}
        - construct the URL and then download the file to a local folder.
    """,
    verbose=True,
    allow_delegation=False,
    tools=[download_file, SerperDevTool()]
    )

analyst = Agent(
     role="KYC Analyst",
     goal="Extract the ownership structure information from the documents or data collected by the researcher for client {client}",
     backstory="""As a KYC analyst for the bank, one of your jobs is to collect the list of DIRECT owners for client {client} from its ADV document.
     The information can be found in the Schedule A section of the form. The result should meet the following criteria:
     - do NOT include entities in the list. Only include real individuals.
     - include the name, title and ownership for each person.
     - Note that for the ownership information of each person, please return the description instead of the ownership code. The description for each code can also be found in 'Schedule A'. For instance, for ownership code 'B', the description is '10% but less than 25%'. For 'C', the description is '25% but less than 50%'. Do NOT guess or give your own ownership description.
     """,
     verbose=True,
     allow_delegation=False,
     tools=[PDFSearchTool()]
)
# reviewer = Agent(
#     role="KYC Reviewer",
#     goal="To ensure that differences in ownership structure data from various sources for the same client are reported accurately and thoroughly, listed in bullets with values that are different",
#     backstory="The reviewer's job is to ensure the quality of the analyst's report, making sure it is accurate, complete and easy to read.")

# Define tasks
research_task = Task(description="Gather ownership data for {client}",
                     agent=researcher,
                     expected_output="The downloaded copy of the ADV file for client {client} on the local drive and its complete file path")
analyze_task = Task(description="Retrieve the list of direct owners for {client} from its ADV document",
                    agent=analyst,
                    expected_output="A json file with the list of direct owners of {client}")
# review_task = Task(description="Review the delta report by the analyst. If unsatisfied, ask the analyst to improve it with detailed feedback.", agent=reviewer, expected_output="feedback on the delta report")

# Create a task graph
# task_graph = nx.DiGraph()
# task_graph.add_edge(research_task, analyze_task)
# task_graph.add_edge(analyze_task, review_task)

# Visualize the task graph
# nx.draw(task_graph, with_labels=True, node_size=2000, node_color="lightblue", font_size=7)
# plt.show()

# Create crew
crew = Crew(
    agents=[researcher, analyst],
    tasks=[research_task, analyze_task],
    verbose=2
)

# Execute crew
inputs = {
    "client": "Longtail Alpha",
    "inquiry": "Please download the ADV form for the client"
}
result = crew.kickoff(inputs)
print(result)



[1m[95m [DEBUG]: == Working Agent: KYC Researcher[00m
[1m[95m [INFO]: == Starting Task: Gather ownership data for Longtail Alpha[00m


[1m> Entering new CrewAgentExecutor chain...[0m
[32;1m[1;3mI need to find the crdsID for Longtail Alpha in order to construct the URL for the ADV form. I will start by searching for the crdsID using the company name.

Action: Search the internet  
Action Input: {"search_query": "Longtail Alpha crdsID site:sec.gov"}  [0m



Search results: 

Thought: The search did not return the crdsID for Longtail Alpha. I will refine my search to directly look for the ADV form or any relevant ownership information for the company.

Action: Search the internet
Action Input: {"search_query": "Longtail Alpha ADV form site:sec.gov"}





Search results: Title: LONGTAIL ALPHA, LLC - Investment Adviser Firm
Link: https://adviserinfo.sec.gov/firm/summary/282615
Snippet: An ERA is required to file a report using Form ADV, but does not complete all items contained in Form ADV that a registered adviser must complete. Other ...
---
Title: FORM ADV
Link: https://reports.adviserinfo.sec.gov/reports/ADV/282615/PDF/282615.pdf
Snippet: A. LONGTAIL ALPHA, LLC. List on Section 1.B. of Schedule D any additional names under which you conduct your advisory business.
---
Title: Mapping Information
Link: https://www.sec.gov/foia/docs/adv/part2/adv_brochures_2020_may-map.csv
Snippet: ... FORM ADV PART 2A (MAY 29 2020)",335123,1,"05/29/2020 ... FORM 2A",59243,20,"05/29/2020","158381_59243_20_20200529.pdf" "BRIGHT ROCK CAPITAL MANAGEMENT ...
---
Title: Amendment No. 132
Link: https://www.sec.gov/Archives/edgar/data/1810747/000182912624008450/simplifyetf_485a.htm
Snippet: The Fund is designed investors with debt portfolios who may w



File downloaded successfully: ./282615_3.pdf




[32;1m[1;3mThought: I have successfully downloaded the ADV form for Longtail Alpha. The crdsID was found to be 282615, and I constructed the download URL accordingly. Now, I will provide the final answer including the complete file path of the downloaded ADV form.

Final Answer: The downloaded ADV file for client Longtail Alpha is located at: ./282615_3.pdf[0m

[1m> Finished chain.[0m
[1m[92m [DEBUG]: == [KYC Researcher] Task output: The downloaded ADV file for client Longtail Alpha is located at: ./282615_3.pdf

[00m
[1m[95m [DEBUG]: == Working Agent: KYC Analyst[00m
[1m[95m [INFO]: == Starting Task: Retrieve the list of direct owners for Longtail Alpha from its ADV document[00m


[1m> Entering new CrewAgentExecutor chain...[0m
[32;1m[1;3mI need to extract the list of direct owners for Longtail Alpha from its ADV document, specifically from the Schedule A section. I will start by searching the PDF for relevant content.

Action: Search a PDF's content  
Action Input: 

Inserting batches in chromadb: 100%|██████████| 3/3 [00:04<00:00,  1.44s/it]



Relevant Content:
If you are submitting an initial application or report, you must complete Schedule A and Schedule B. Schedule A asks for information about your direct owners and executive officers. Schedule B asks for information about your indirect owners. If this is an amendment and you are updating information you reported on either Schedule A or Schedule B (or both) that you filed with your initial application or report, you must complete Schedule C. Yes No A. Does any person not named in Item 1.A. or Schedules A, B, or C, directly or indirectly, control your management or policies? If yes, complete Section 10.A. of Schedule D. B.If any person named in Schedules A, B, or C or in Section 10.A. of Schedule D is a public reporting company under Sections 12 or 15(d) of the Securities Exchange Act of 1934, please complete Section 10.B. of Schedule D . SECTION 10.A. Control Persons No Information Filed SECTION 10.B. Control Person Public Reporting Companies B. If any person nam




Relevant Content:
2.Indirect Owners. With respect to each owner listed on Schedule A (except individual owners), list below: (a)in the case of an owner that is a corporation, each of its shareholders that beneficially owns, has the right to vote, or has the power to sell or direct the sale of, 25% or more of a class of a voting security of that corporation; For purposes of this Schedule, a person beneficially owns any securities: (i) owned by his/her child, stepchild, grandchild, parent, stepparent, grandparent, spouse, sibling, mother-in-law, father-in-law, son-in-law, daughter-in-law, brother-in-law, or sister-in-law, sharing the same residence; or (ii) that he/she has the right to acquire, within 60 days, through the exercise of any option, warrant, or right to purchase the security. (b)in the case of an owner that is a partnership, all general partners and those limited and special partners that have the right to receive upon dissolution, or have contributed, 25% or more of




Relevant Content:
MM/YYYYOwnership CodeControl PersonPRCRD No. If None: S.S. No. and Date of Birth, IRS Tax No. or Employer ID No. BHANSALI, VINEER, NMN I MANAGING MEMBER, CHIEF INVESTMENT OFFICER12/2015 E Y N3027768 JANUS HENDERSON INVESTORS US LLCDE MEMBER 06/2016 B N N104652 Chang, Linda I TRADE EXECUTION LEADER AND DIRECTOR OF BUSINESS INFRASTRUCTURE11/2021 NA N N7523025 PHANSALKAR, MOHAN, VIJAY I CHIEF COMPLIANCE OFFICER & CHIEF LEGAL OFFICER08/2022 NA Y N4453079 Schedule B Indirect Owners 1.Complete Schedule B only if you are submitting an initial application or report. Schedule B asks for information about your indirect owners; you must first complete Schedule A, which asks for information about your direct owners. Use Schedule C to amend this information. 2.Indirect Owners. With respect to each owner listed on Schedule A (except individual owners), list below:

MM/YYYYOwnership CodeControl PersonPRCRD No. If None: S.S. No. and Date of Birth, IRS Tax No. or Employer ID N




Relevant Content:
required if you are registered or applying for registration and cannot be more than one individual), director, and any other individuals with similar status or functions; (b)if you are organized as a corporation, each shareholder that is a direct owner of 5% or more of a class of your voting securities, unless you are a public reporting company (a company subject to Section 12 or 15(d) of the Exchange Act); Direct owners include any person that owns, beneficially owns, has the right to vote, or has the power to sell or direct the sale of, 5% or more of a class of your voting securities. For purposes of this Schedule, a person beneficially owns any securities: (i) owned by his/her child, stepchild, grandchild, parent, stepparent, grandparent, spouse, sibling, mother-in-law, father-in-law, son-in-law, daughter-in-law, brother-in-law, or sister-in- law, sharing the same residence; or (ii) that he/she has the right to acquire, within 60 days, through the exercise 




Relevant Content:
2.Indirect Owners. With respect to each owner listed on Schedule A (except individual owners), list below: (a)in the case of an owner that is a corporation, each of its shareholders that beneficially owns, has the right to vote, or has the power to sell or direct the sale of, 25% or more of a class of a voting security of that corporation; For purposes of this Schedule, a person beneficially owns any securities: (i) owned by his/her child, stepchild, grandchild, parent, stepparent, grandparent, spouse, sibling, mother-in-law, father-in-law, son-in-law, daughter-in-law, brother-in-law, or sister-in-law, sharing the same residence; or (ii) that he/she has the right to acquire, within 60 days, through the exercise of any option, warrant, or right to purchase the security. (b)in the case of an owner that is a partnership, all general partners and those limited and special partners that have the right to receive upon dissolution, or have contributed, 25% or more of



[32;1m[1;3mThought: The previous searches have not yielded specific details about the direct owners listed in Schedule A. I will perform another search focused on the specific owner names and their titles as well as ownership descriptions.

Action: Search a PDF's content  
Action Input: {"query": "Schedule A", "pdf": "./282615_3.pdf"}  
[0m[95m 

Relevant Content:
If you are submitting an initial application or report, you must complete Schedule A and Schedule B. Schedule A asks for information about your direct owners and executive officers. Schedule B asks for information about your indirect owners. If this is an amendment and you are updating information you reported on either Schedule A or Schedule B (or both) that you filed with your initial application or report, you must complete Schedule C. Yes No A. Does any person not named in Item 1.A. or Schedules A, B, or C, directly or indirectly, control your management or policies? If yes, complete Section 10.A. of Schedule D. B.If 




Relevant Content:
2.Indirect Owners. With respect to each owner listed on Schedule A (except individual owners), list below: (a)in the case of an owner that is a corporation, each of its shareholders that beneficially owns, has the right to vote, or has the power to sell or direct the sale of, 25% or more of a class of a voting security of that corporation; For purposes of this Schedule, a person beneficially owns any securities: (i) owned by his/her child, stepchild, grandchild, parent, stepparent, grandparent, spouse, sibling, mother-in-law, father-in-law, son-in-law, daughter-in-law, brother-in-law, or sister-in-law, sharing the same residence; or (ii) that he/she has the right to acquire, within 60 days, through the exercise of any option, warrant, or right to purchase the security. (b)in the case of an owner that is a partnership, all general partners and those limited and special partners that have the right to receive upon dissolution, or have contributed, 25% or more of




Relevant Content:
MM/YYYYOwnership CodeControl PersonPRCRD No. If None: S.S. No. and Date of Birth, IRS Tax No. or Employer ID No. BHANSALI, VINEER, NMN I MANAGING MEMBER, CHIEF INVESTMENT OFFICER12/2015 E Y N3027768 JANUS HENDERSON INVESTORS US LLCDE MEMBER 06/2016 B N N104652 Chang, Linda I TRADE EXECUTION LEADER AND DIRECTOR OF BUSINESS INFRASTRUCTURE11/2021 NA N N7523025 PHANSALKAR, MOHAN, VIJAY I CHIEF COMPLIANCE OFFICER & CHIEF LEGAL OFFICER08/2022 NA Y N4453079 Schedule B Indirect Owners 1.Complete Schedule B only if you are submitting an initial application or report. Schedule B asks for information about your indirect owners; you must first complete Schedule A, which asks for information about your direct owners. Use Schedule C to amend this information. 2.Indirect Owners. With respect to each owner listed on Schedule A (except individual owners), list below:

MM/YYYYOwnership CodeControl PersonPRCRD No. If None: S.S. No. and Date of Birth, IRS Tax No. or Employer ID N




Relevant Content:
the Title or Status column by entering board/management titles; status as partner, trustee, sole proprietor, elected manager, shareholder, or member; and for shareholders or members, the class of securities owned (if more than one is issued). 6.Ownership codes are: NA - less than 5% B - 10% but less than 25% D - 50% but less than 75% A - 5% but less than 10% C - 25% but less than 50% E - 75% or more 7.(a)In the Control Person column, enter "Yes" if the person has control as defined in the Glossary of Terms to Form ADV, and enter "No" if the person does not have control . Note that under this definition, most executive officers and all 25% owners, general partners, elected managers, and trustees are control persons . (b)In the PR column, enter "PR" if the owner is a public reporting company under Sections 12 or 15(d) of the Exchange Act. (c)Complete each column. FULL LEGAL NAME (Individuals: Last Name, First Name, Middle Name)DE/FE/I Title or Status Date Title 