# Company Profile Risk Assessment AI Solution
This Jupyter notebook outlines an AI solution for assessing company profile risk using SEC filings and online search tools. It aims to automate the risk assessment process by analyzing financial documents and online information to produce a comprehensive risk profile for a company.

## Environment Preparation

In [1]:
import os
from dotenv import load_dotenv
from langchain.chat_models import AzureChatOpenAI
load_dotenv()

True

In [14]:
# Setup LLM with Azure OpenAI and langchain
openai_model = os.getenv("AZURE_OPENAI_MODEL")
openai_api_key = os.getenv("AZURE_OPENAI_API_KEY")
openai_api_version = os.getenv("OPENAI_API_VERSION")
openai_api_base = os.getenv("AZURE_OPENAI_API_BASE")
openai_api_type = os.getenv("OPENAI_API_TYPE")
openai_api_endpoint = os.getenv("AZURE_OPENAI_API_ENDPOINT")
deployment_name = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL")

llm = AzureChatOpenAI(model="gpt-35-turbo",
                    azure_deployment=openai_model,
                    base_url = openai_api_base,
                    azure_endpoint=openai_api_endpoint,
                    api_version=openai_api_version,
                    api_key=openai_api_key,
                    openai_api_type=openai_api_type,
                    temperature=0,
                    )
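
Because every setting above is read with `os.getenv`, a missing entry in the `.env` file silently becomes `None` and only fails later, inside the client. A minimal sketch of an early check follows; the helper name `missing_env_vars` and the choice of which variables count as required are assumptions for illustration, not part of the original notebook.

```python
import os
from typing import List, Mapping, Sequence

# Assumption: these four names (taken from the setup cell) are treated as required.
REQUIRED_VARS = (
    "AZURE_OPENAI_MODEL",
    "AZURE_OPENAI_API_KEY",
    "OPENAI_API_VERSION",
    "AZURE_OPENAI_API_ENDPOINT",
)


def missing_env_vars(names: Sequence[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or empty in `env`."""
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    # Report rather than raise, so the check is safe to run in any environment.
    print("Missing environment variables: " + ", ".join(missing))
```

Running this right after `load_dotenv()` surfaces configuration gaps before any API call is made.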

## Langchain Tool 1: Bing Search

In [15]:
from langchain.utilities import BingSearchAPIWrapper
from langchain.tools.bing_search.tool import BingSearchRun
from langchain.agents import AgentType, initialize_agent, Tool

os.environ['BING_SUBSCRIPTION_KEY']=os.getenv("BING_SUBSCRIPTION_KEY")
os.environ["BING_SEARCH_URL"] = "https://api.bing.microsoft.com/v7.0/search"

searchWrapper = BingSearchAPIWrapper()
tool_bingSearch = BingSearchRun(api_wrapper=searchWrapper)

# Example query to test Bing search API call
# query = "What is the capital of France?"
# result = tool_bingSearch(query)

# print(result)

## Langchain Tool 2: Retrieval Augmented Generation (RAG)

### Start with SEC files - text embedding

Ideally, the documents would be embedded and indexed with Azure Cognitive Search vector search, and exposed to the Azure OpenAI model as a data source through function calling. For this proof of concept, local PDF files are loaded with the LangChain PDF loaders and embedded with an OpenAI embedding model.

In [4]:
# load PDF files from local computer 
from langchain.document_loaders import PyPDFDirectoryLoader
filepath = ('./data_SEC')
loader = PyPDFDirectoryLoader(filepath)

pages = loader.load()
# document = [page.page_content for page in pages]

### Check total number of tokens for embedding

In [25]:
import tiktoken 

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

document = ' '.join([page.page_content for page in pages])
n_token = num_tokens_from_string(document, "cl100k_base")
print ('Total number of pages: ' + str(len(pages)))
print ('Total number of tokens: ' + str(n_token))
print ('Total cost of embeddings with Ada2 model: USD$' + str(n_token / 1000 * 0.0001))

Total number of pages: 249
Total number of tokens: 213344
Total cost of embeddings with Ada2 model: USD$0.0213344
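
The cost figure above is simple arithmetic on the token count. As a small worked check, the sketch below repeats it with the numbers printed by the cell and the per-1K-token price used in the code (0.0001 USD). The function names are hypothetical, and the per-page average is an extra illustration that matters later because the retrieval tool passes whole pages to the model.

```python
def embedding_cost_usd(n_tokens: int, usd_per_1k_tokens: float = 0.0001) -> float:
    """Estimated embedding cost: tokens / 1000 * price per 1K tokens."""
    return n_tokens / 1000 * usd_per_1k_tokens


def average_tokens_per_page(n_tokens: int, n_pages: int) -> float:
    """Mean number of tokens per loaded PDF page."""
    return n_tokens / n_pages


# Values copied from the cell output above.
n_tokens, n_pages = 213344, 249
print(f"Estimated cost: USD${embedding_cost_usd(n_tokens):.4f}")
print(f"Average tokens per page: {average_tokens_per_page(n_tokens, n_pages):.0f}")
```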


### Document embeddings

In [16]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
embedding_model = "text-embedding-ada-002"

# Initialize the embedding model. Azure currently supports at most 16 inputs per embedding request, hence chunk_size=16. Increase your request rate limit if needed.
embeddings = OpenAIEmbeddings(deployment=deployment_name,
                              model=embedding_model,
                              openai_api_version=openai_api_version,
                              openai_api_base=openai_api_base,
                              openai_api_type=openai_api_type,
                              openai_api_key=openai_api_key,
                              chunk_size=16)
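
To make the effect of `chunk_size=16` concrete, the sketch below splits a list of texts into request-sized batches. It only illustrates the batching idea under the assumption that one batch corresponds to one embedding request; it is not LangChain's internal implementation, and `batched` is a hypothetical helper.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int = 16) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 249 placeholder texts, matching the page count reported earlier.
texts = [f"page {i}" for i in range(249)]
batches = list(batched(texts))
print(len(batches), len(batches[-1]))  # 16 batches; the last one holds 9 texts
```

With 249 pages this gives 16 requests, which is why the request rate limit matters more than the token cost for this step.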

In [27]:
# Use FAISS to store the embedding vectors and save them to a local index.
# Run this only if the documents have not been embedded yet.
"""
db = FAISS.from_documents(pages, embeddings)
db.save_local("faiss_index_SEC")
"""

### Load the saved local embeddings to vectorstore and define the RAG as a tool

In [17]:

vectorstore = FAISS.load_local("faiss_index_SEC", embeddings)

from langchain.tools.vectorstore.tool import VectorStoreQAWithSourcesTool

# by default the tool retrieves 4 chunks (pages in this case) as context
tool_RAG = VectorStoreQAWithSourcesTool(
    name = "VectorQA",
    # description = "useful for when you need to answer questions from local or own knowledge base",
    description="",
    vectorstore=vectorstore, 
    llm=llm,
    verbose=True
    )

# Test...
question = "What is the company PLTR revenue in the last year?"
answer = tool_RAG(question)
print(answer)

{"answer": "The company PLTR revenue in the last year was $2.1 billion.\n", "sources": "data_SEC\\PLTR-8K.pdf, data_SEC\\PLTR-10K.pdf"}
{"answer": "The company PLTR revenue in the last year was $2.1 billion.\n", "sources": "data_SEC\\PLTR-8K.pdf, data_SEC\\PLTR-10K.pdf"}


## Langchain Tool 3: The general LLMChain model to handle questions without additional sources

In [18]:
# from langchain.prompts.chat import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from typing import Optional, Type
# from pydantic import BaseModel, Field
from langchain.tools import BaseTool, Tool #StructuredTool, tool
from langchain.prompts import PromptTemplate
from langchain.callbacks.manager import (
    AsyncCallbackManagerForToolRun,
    CallbackManagerForToolRun,
)

# from langchain.chains import LLMCheckerChain, LLMChain

class CustomLanguageTool(BaseTool):
    name = "Custom Naive Search"
    description = "useful for when you need to answer questions without any additional information"
    def _run(
        self, query: str, run_manager: Optional[CallbackManagerForToolRun] = None
    ) -> str:
        prompt = PromptTemplate.from_template('''
        You're a knowledgeable teacher. Provide your response based on your knowledge.
        If you don't know, just answer 'I do not know'. Never make up answers.
        Always provide your recommendations to improve.

        {question}
        '''
        )
        chain = prompt | llm | StrOutputParser()
        return chain.invoke({"question": query})
    
    async def _arun(
        self, query: str, run_manager: Optional[AsyncCallbackManagerForToolRun] = None
    ) -> str:
        """Use the tool asynchronously."""
        raise NotImplementedError("custom_search does not support async")

## Integrate three tools for agent to use

In [19]:
# specify request conditions
def get_merged_tools(option):

    # option 1: only use the general LLM tool
    if option == 1:
        tools = [CustomLanguageTool()]
    
    # option 2: use general llm and bing search 
    elif option == 2:
        tools = [CustomLanguageTool(),
        Tool.from_function(
            name="Intermediate Answer",
            func=tool_bingSearch.run,
            description='useful for when you need to answer questions about current events or the latest information with web search.'
        )]

    # option 3: use llm and RAG tools
    elif option == 3:
        tools = [CustomLanguageTool(),
                Tool.from_function(
            func=tool_RAG.run,
            name="VectorQA",
            description = "useful for when you need to answer questions based on the provided documents, local or own knowledge base.",
            # args_schema = LanuageInput,
            )]
        
    # option 4: use bing search and RAG
    elif option == 4:
        tools = [
            Tool.from_function(
                name="Intermediate Answer",
                func=tool_bingSearch.run,
                description='useful for when you need to answer questions about current events or the latest information with web search.'
            ),
            Tool.from_function(
                func=tool_RAG.run,
                name="VectorQA",
                description = "useful for when you need to answer questions based on the provided documents, local or own knowledge base.",
                # args_schema = LanuageInput,
            )
        ]  

    # option 5: use all 3 tools
    elif option == 5:
        tools = [
            CustomLanguageTool(),
            Tool.from_function(
                name="Intermediate Answer",
                func=tool_bingSearch.run,
                description='useful for when you need to answer questions about current events or the latest information with web search.'
            ),
            Tool.from_function(
                func=tool_RAG.run,
                name="VectorQA",
                description = "useful for when you need to answer questions based on the provided documents, local or own knowledge base.",
                # args_schema = LanuageInput,
            )
        ]
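    else:
        # Fail loudly on an unknown option instead of falling through to an undefined `tools`
        raise ValueError(f"unknown option {option!r}; expected an integer from 1 to 5")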
    
    return tools
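
The five branches above repeat the same tool definitions in different combinations. One possible way to express the same selection is a lookup table from option number to tool labels, sketched below. The labels `"llm"`, `"bing"`, and `"rag"` and the helper `tool_names_for` are hypothetical; in the notebook each label would map to the corresponding tool object.

```python
from typing import Dict, List, Tuple

# Same five combinations as the if/elif chain, written as data.
TOOL_COMBINATIONS: Dict[int, Tuple[str, ...]] = {
    1: ("llm",),
    2: ("llm", "bing"),
    3: ("llm", "rag"),
    4: ("bing", "rag"),
    5: ("llm", "bing", "rag"),
}


def tool_names_for(option: int) -> List[str]:
    """Return the tool labels for an option, or raise ValueError if it is unknown."""
    try:
        return list(TOOL_COMBINATIONS[option])
    except KeyError:
        valid = sorted(TOOL_COMBINATIONS)
        raise ValueError(f"unknown option {option!r}; expected one of {valid}") from None


print(tool_names_for(4))
```

With a table like this, each tool is defined once and adding a new combination is a one-line change.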


## Construct the agent and ask questions

In [20]:
# option here defines what combination of tools to use at a higher level
option = 4 
tools = get_merged_tools(option)

### Agent without memory - Single Q&A

In [None]:

agent_without_memory = initialize_agent(
        tools, 
        llm, 
        agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, 
        verbose=True,
        handle_parsing_errors=True,
    )

In [None]:
query = """Based on the company PLTR's recent financial performance, as well as potential legal issues, 
        generate a short report on the company profile risk assessment within 300 words."""

agent_without_memory.run(query)

### Agent with memory

In [27]:
from langchain.memory import ConversationBufferMemory
from langchain.agents import AgentExecutor, ZeroShotAgent
prefix = """You are a knowledgeable assistant. Answer the following question as best you can. 
        You have access to the following tools:"""
suffix = """Begin!

{chat_history}
Question: {input}
{agent_scratchpad}"""

prompt = ZeroShotAgent.create_prompt(
    tools,
    prefix=prefix,
    suffix=suffix,
    input_variables=["input", "chat_history", "agent_scratchpad"],
)

memory = ConversationBufferMemory(memory_key="chat_history")

In [28]:
# construct the LLMChain, with the Memory object, and then create the agent.
from langchain.chains import LLMChain

llm_chain = LLMChain(llm=llm, prompt=prompt)
agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)
agent_with_memory = AgentExecutor.from_agent_and_tools(
    agent=agent, tools=tools, verbose=True, memory=memory, handle_parsing_errors=True
)

In [64]:
# test the agent with memory
agent_with_memory.run(input="How many people live in canada?")
agent_with_memory.run(input="what is their national anthem called?")

> Entering new AgentExecutor chain...
Thought: I should find the most recent population data for Canada.
Action: Intermediate Answer
Action Input: "current population of Canada"
Observation: 2023-09-27 This data visualization product provides information on the pace of <b>population</b> renewal in <b>Canada</b>. The web page shows a real-time model of <b>population</b> growth in <b>Canada</b>. The components of <b>population</b> growth are modelled in order to adjust the <b>population</b> of the country, provinces and territories. The <b>current population</b> of <b>Canada</b> is 38,913,717 as of Monday, November 27, 2023, based on <b>Worldometer</b> elaboration of the latest United Nations data 1. <b>Canada</b> 2023 <b>population</b> is estimated at 38,781,291 people at mid year. <b>Canada population</b> is equivalent to 0.48% of the total world <b>population</b>. <b>Canada</b> at a Glance, 2022 <b>Population</b> Table of contents On this page

'The national anthem of Canada is called "O Canada"'

## Answer the company's risk profile assessment questions

In [29]:
from openpyxl import Workbook
import time
# Load questions from a text file
with open('all questions.txt', 'r') as file:
    questions = file.readlines()

# Initialize a workbook
wb = Workbook()
ws = wb.active

# Process each question
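for question in questions:
    # Drop the trailing newline so it does not end up in the spreadsheet
    question = question.strip()

    # Run the agent
    response = agent_with_memory.run(question)

    # Save the question and answer to the Excel file
    ws.append([question, response])

    # Pause between questions to ease pressure on the API rate limit
    time.sleep(4)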

# Save the workbook
wb.save('question_answer_summary.xlsx')
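
The run log below shows the client library already retrying after `RateLimitError`. If those built-in retries are ever exhausted, the loop above would stop partway through the question list. A minimal sketch of an outer retry with exponential backoff follows; `call_with_backoff` and its parameters are hypothetical, and catching the broad `Exception` is a simplification that would normally be narrowed to the rate-limit error class.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying on failure with delays of base_delay * 2**k seconds."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:  # assumption: narrow this to the rate-limit error in practice
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Possible use inside the question loop (names from the cell above):
# response = call_with_backoff(lambda: agent_with_memory.run(question.strip()))
```

The `sleep` parameter exists so the delay can be replaced in tests or tuned without touching the retry logic.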




> Entering new AgentExecutor chain...
Thought: I should use the Intermediate Answer tool to search for the latest financial information about Palantir Technologies Inc.
Action: Intermediate Answer
Action Input: "Palantir Technologies Inc. current financial condition"
Observation: Get the detailed quarterly/annual income statement for <b>Palantir Technologies Inc. (PLTR</b>). Find out the revenue, expenses and profit or loss over the last fiscal year. Share $19.20 After Hours: $19.20 (0.00%) 0.00 Closed: Nov 24, 4:59:59 PM GMT-5 · USD · NYSE · Disclaimer search Compare to Tesla <b>Inc</b> $235.45 TSLA0.53% Apple <b>Inc</b> $189.97 AAPL0.70% Amazon.com... Compare to Open 20.26 Prior Close 19.80 (11/21/23) 1 Day PLTR -0.56% DJIA 0.53% S&amp;P 500 0.41% Technology 0.16% Overview Earnings &amp; Estimates <b>Palantir Technologies Inc</b>. Per Share Data Palantir... DENVER--(BUSINESS WIRE)-- <b>Palantir</b> <b>Technologies</b> <b>Inc</b>. (NYSE:PLTR) to

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 2 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit..


Thought: I should use the VectorQA tool to answer this question based on the provided documents and knowledge base.
Action: VectorQA
Action Input: Palantir Technologies Inc.(PLTR)’s exposure to potential geopolitical risks
{"answer": "Palantir Technologies Inc. (PLTR) has not mentioned any specific exposure to potential geopolitical risks in the extracted parts of the document.\n", "sources": "data_SEC\\PLTR-10K.pdf"}
Observation: {"answer": "Palantir Technologies Inc. (PLTR) has not mentioned any specific exposure to potential geopolitical risks in the extracted parts of the document.\n", "sources": "data_SEC\\PLTR-10K.pdf"}
Thought: I now know the final answer
Final Answer: Palantir Technologies Inc.(PLTR) has not mentioned any specific exposure to potential geopolitical risks in the extracted parts of the document.

> Finished chain.


> Entering new AgentExecutor chain...
Thought: I shoul

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 3 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit..


{"answer": "Palantir Technologies Inc. is exposed to current and future regulatory risks, including legal matters and claims that may have a material adverse impact on its consolidated financial statements. The company has faced putative securities class action complaints and stockholder derivative actions, and is unable to estimate the reasonably possible loss or range of loss that may result from these matters. Additionally, the company is required to maintain irrevocable standby letters of credit and guarantees, and provides warranties and indemnification to its customers. The company has not recorded warranty expense or related accruals as of December 31, 2022 and 2021. Furthermore, the company has disclosed forward-looking statements in its Annual Report on Form 10-K, cautioning that the outcome of the events described in these statements is subject to risks, uncertainties, and other factors. The company has also stated that it cannot assure that the results, events, 