## AI Agent: Financial Advisor 

A RAG application that provides financial advice for investors based on an analysis of company SEC filings. 

### Sample Questions

Some sample questions for testing the application (generated by deepseek-r1:1.5b):

1. **Profitability Over Time**: How do the company's revenue and net income compare year over year? This helps assess the company's financial performance and stability.

2. **Company Structure and Operations**: What insights does the annual report provide about the company's business structure, products, or services? This answers questions about the company's operational landscape.

3. **Cash Flow Management**: What does the cash flow section reveal about the company's ability to manage its finances, particularly in terms of investing and financing activities?


### Running the Notebook  

(1) Download Ollama:
https://ollama.com/download

(2) Select mistral:7b (used in this notebook), or any other preferred model:  
https://ollama.com/library/mistral:7b  

(3) Download and run the model using command line:

```
ollama pull mistral:7b
ollama pull bge-m3
ollama run mistral:7b
```
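Before running the cells below, it can help to confirm that the models the notebook uses are available locally. The following is a minimal sketch, assuming Ollama is serving on its default address `http://localhost:11434` and exposing its `/api/tags` endpoint, which lists local models. The helper names `missing_models` and `list_local_models` are illustrative and not part of the notebook.

```python
import json
from urllib.request import urlopen

REQUIRED_MODELS = ("mistral:7b", "bge-m3")  # chat model and embedding model used in this notebook


def missing_models(tags_payload, required=REQUIRED_MODELS):
    """Return the required models that are absent from an /api/tags payload."""
    local = {m.get("name", "") for m in tags_payload.get("models", [])}
    missing = []
    for name in required:
        # A model pulled without a tag is reported as "<name>:latest"
        candidates = {name} if ":" in name else {name, f"{name}:latest"}
        if not candidates & local:
            missing.append(name)
    return missing


def list_local_models(base_url="http://localhost:11434"):
    """Fetch the list of locally available models from a running Ollama server."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return json.load(response)


# Example with a hand-written payload (no server needed):
sample_payload = {"models": [{"name": "mistral:7b"}]}
print(missing_models(sample_payload))  # ['bge-m3']
```

With the server running, `missing_models(list_local_models())` should return an empty list once both models have been pulled.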


### Architecture 

* __Language Model:__ Mistral 7B https://ollama.com/library/mistral:7b    
* __Embeddings Model:__ BGE-M3: https://ollama.com/library/bge-m3  
* __Vector Database:__ Faiss: https://github.com/facebookresearch/faiss    



In [2]:
import pandas as pd
#%pip install faiss-cpu
import faiss

from langchain_ollama import OllamaEmbeddings
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS
#%pip install -qU pypdf
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

from langchain.memory import ConversationBufferMemory
from langchain.schema.runnable import RunnableLambda
from langchain.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama
from langchain.schema.runnable import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

### Document Preparation 

We will use LangChain to load PDFs.  
Tutorial: https://python.langchain.com/docs/how_to/document_loader_pdf/  

Each SEC filing begins with a Table of Contents that defines the structure of the document. Ideally, the document would be split so that chunks follow this structure; as a first pass, this notebook splits the text into fixed-size, overlapping chunks.
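One way to follow the filing's structure is to split the text at its "Item N." headings. The sketch below is an illustration only, not what the notebook does: the regular expression, the function name `split_by_item`, and the sample text are all assumptions. Real PDF extractions may repeat the item names in the Table of Contents or garble the headings, so the pattern would likely need tuning.

```python
import re

# Matches headings such as "Item 1." or "Item 1A." at the start of a line.
ITEM_HEADING = re.compile(r"^Item\s+(\d+[A-Z]?)\.", re.MULTILINE)


def split_by_item(text):
    """Split filing text into (item_label, section_text) pairs at "Item N." headings.

    Any text before the first heading is dropped.
    """
    matches = list(ITEM_HEADING.finditer(text))
    sections = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((f"Item {match.group(1)}", text[match.start():end].strip()))
    return sections


# Hand-written sample text, not taken from a real filing
sample_filing = (
    "Item 1. Financial Statements\nRevenue table goes here\n"
    "Item 1A. Risk Factors\nRisk text goes here\n"
    "Item 6. Exhibits\nExhibit list goes here"
)
for label, section in split_by_item(sample_filing):
    print(label, "->", len(section), "characters")
```

Each section could then be passed to a text splitter separately, so that no chunk spans two items.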

In [3]:
file_path = (
    "../data/amazon-10-q-q3-2024.pdf",
    "../data/goog-10-q-q3-2024.pdf"
)

In [4]:
docs = []
for file in file_path:
    print(f'Loading {file}.')
    loader = PyPDFLoader(file)
    
    # Top-level `async for` runs in Jupyter; in a plain script, iterate loader.lazy_load() instead
    async for doc in loader.alazy_load():
        docs.append(doc)

Loading ../data/amazon-10-q-q3-2024.pdf.
Loading ../data/goog-10-q-q3-2024.pdf.


In [5]:
print(len(docs))

203


In [6]:
# Breaking the documents into chunks 

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,  # Adjust size based on use case
    chunk_overlap=50,  # Ensures continuity between chunks
    length_function=len  # Uses character length
)

# Split documents into chunks
chunks = text_splitter.split_documents(docs)


In [7]:
print(len(chunks))

1784
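To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window chunker. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break at separators such as paragraph breaks, newlines, and spaces); the function name `sliding_window_chunks` is hypothetical and only illustrates how consecutive chunks share characters.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=50):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return windows


demo_chunks = sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(demo_chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The last three characters of each window reappear at the start of the next one, which is what keeps a sentence that straddles a boundary retrievable from at least one chunk.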


### Embedding and Vector Store Setup

* Ollama embeddings: https://python.langchain.com/docs/integrations/text_embedding/ollama/   
* Faiss: https://github.com/facebookresearch/faiss

In [8]:
def setup_vector_store(docs):
    """
    Create vector store
    """
    embeddings = OllamaEmbeddings(model='bge-m3', 
                                  base_url="http://localhost:11434")
    single_vector = embeddings.embed_query("this is some text data")
    index = faiss.IndexFlatL2(len(single_vector))
    
    vector_store = FAISS(
        embedding_function=embeddings,
        index=index,
        docstore=InMemoryDocstore(),
        index_to_docstore_id={}
    )
    vector_store.add_documents(documents=docs)
    return vector_store

In [9]:
vector_store = setup_vector_store(docs=chunks)
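`faiss.IndexFlatL2` performs an exact, exhaustive search by squared Euclidean distance. The pure-Python sketch below mirrors that idea on made-up 2-D vectors; the function names are illustrative, and FAISS itself is far faster because it works on contiguous arrays with optimized kernels.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(query, vectors, k=2):
    """Return the k nearest (index, squared_distance) pairs by exhaustive comparison."""
    scored = [(i, squared_l2(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


toy_vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]  # made-up 2-D "embeddings"
print(flat_l2_search([1.0, 1.0], toy_vectors, k=2))  # [(1, 1.0), (0, 2.0)]
```

Because every stored vector is compared against the query, results are exact, and the cost grows linearly with the number of chunks.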

### Document Retrieval

In [10]:
retriever = vector_store.as_retriever(search_type='mmr', search_kwargs={'k':8})
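The `mmr` search type stands for maximal marginal relevance: results are picked one at a time, rewarding similarity to the query and penalizing similarity to results already picked. The sketch below shows the selection rule on toy vectors. It is a simplified illustration, not LangChain's implementation; the use of cosine similarity and the helper names are assumptions, while `lambda_mult` borrows the name of the retriever's trade-off parameter.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query, candidates, k=2, lambda_mult=0.5):
    """Greedily pick k candidate indices, trading off relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            # Highest similarity to anything already selected (0.0 for the first pick)
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


toy_query = [1.0, 0.0]
toy_candidates = [[1.0, 0.1], [1.0, 0.11], [0.8, -0.6]]  # the first two are near-duplicates
print(mmr_select(toy_query, toy_candidates, k=2))                   # [0, 2]
print(mmr_select(toy_query, toy_candidates, k=2, lambda_mult=1.0))  # [0, 1]
```

With `lambda_mult=1.0` the rule reduces to plain similarity ranking and returns both near-duplicates; with the balanced setting, the second pick skips the duplicate in favor of a less similar but more distinct vector.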

In [11]:
# Retrieve the most similar text
retrieved_documents = retriever.invoke("Who are the board members of Amazon?")

In [12]:
retrieved_documents

[Document(id='9138868f-0c9d-444e-8b1b-654bfdf2739c', metadata={'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'creator': 'EDGAR Filing HTML Converter', 'creationdate': '2024-11-01T06:03:12-04:00', 'title': '0001018724-24-000161', 'author': 'EDGAR® Online LLC, a subsidiary of OTC Markets Group', 'subject': 'Form 10-Q filed on 2024-11-01 for the period ending 2024-09-30', 'keywords': '0001018724-24-000161; ; 10-Q', 'moddate': '2024-11-01T06:03:27-04:00', 'source': '../data/amazon-10-q-q3-2024.pdf', 'total_pages': 150, 'page': 133, 'page_label': '134'}, page_content='SIGNATURE PAGE TOTHE 364-DAY REVOLVING CREDIT AGREEMENTOF AMAZON.COM, INC.\nBNP Paribas,\nBy: /s/ Theodore Olson\nName: Theodore Olson\nTitle: Managing Director\nBy: /s/ George Ko\nName: George KoTitle: Director'),
 Document(id='287b0e19-7901-4b91-b046-8ad397251c09', metadata={'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'creator': 'EDGAR Filing HTML Converter', 'creationdate': '2024-11-01T06:03:12-04:00', 'title': 

In [13]:
for doc in retrieved_documents:
    print(f"page {doc.metadata['page']}: {doc.page_content} \n")

page 133: SIGNATURE PAGE TOTHE 364-DAY REVOLVING CREDIT AGREEMENTOF AMAZON.COM, INC.
BNP Paribas,
By: /s/ Theodore Olson
Name: Theodore Olson
Title: Managing Director
By: /s/ George Ko
Name: George KoTitle: Director 

page 46: Table of Contents
Item 6. Exhibits
ExhibitNumber Description
3.1 Amended and Restated Certificate of Incorporation of Amazon.com, Inc. (incorporated by reference to the Company’s Current Report onForm 8-K, filed May 27, 2022).
3.2 Amended and Restated Bylaws of Amazon.com, Inc. (incorporated by reference to the Company’s Current Report on Form 8-K, filed May 3,2024). 

page 46: 31.2 Certification of Brian T. Olsavsky, Senior Vice President and Chief Financial Officer of Amazon.com, Inc., pursuant to Rule 13a-14(a) underthe Securities Exchange Act of 1934.
32.1 Certification of Andrew R. Jassy, President and Chief Executive Officer of Amazon.com, Inc., pursuant to 18 U.S.C. Section 1350.
32.2 Certification of Brian T. Olsavsky, Senior Vice President and Chief Fina

In [14]:
# Retrieve the most similar text
retrieved_documents = retriever.invoke("Does Amazon have more revenue than Google?")

In [15]:
for doc in retrieved_documents:
    print(f"page {doc.metadata['page']}: {doc.page_content} \n")

page 11: (1) Regions represent Europe, the Middle East, and Africa (EMEA); Asia-Pacific (APAC); and Canada and Latin America 
("Other Americas").
Revenue Backlog
As of September 30, 2024 , we had $86.8 billion  of remaining performance obligations (“revenue backlog”), 
primarily related to Google Cloud. Our revenue backlog represents commitments in customer contracts for future 
services that have not yet been recognized as revenue. The estimated revenue backlog and timing of revenue 

page 13: by Amazon through another e-commerce retailer. The complaints seek billions of dollars of alleged damages, treble damages, punitive damages, injunctive relief, civil penalties, attorneys’ fees,and costs. The Federal Trade Commission and a number of state Attorneys General filed a similar lawsuit in September 2023 in the W.D. Wash. alleging violationsof federal antitrust and state antitrust and consumer protection laws. That complaint alleges, among other things, that Amazon has a monopoly in 

p

### RAG Chain

In [16]:
# Function to format retrieved documents
def format_docs(docs):
    return "\n\n".join([doc.page_content for doc in docs])

# Initialize memory
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Function to inject memory into the input
def inject_memory(inputs):
    history = memory.load_memory_variables({})
    inputs["chat_history"] = history.get("chat_history", [])
    return inputs

# Create RAG Chain with memory
def create_rag_chain(retriever):
    prompt = """
        You are an assistant for question-answering tasks. Use only the retrieved context to answer the question.
        If you don't know the answer, just say that you don't know. Do not pull from information not mentioned in the context.

        Keep responses concise and structured in bullet points.
       
        ### Previous Conversation: {chat_history}

        ### Question: {question} 
        
        ### Context: {context} 
        
        ### Answer:
    """
    model = ChatOllama(model="mistral:7b", base_url="http://localhost:11434")
    prompt_template = ChatPromptTemplate.from_template(prompt)

    chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | RunnableLambda(inject_memory)  # Inject memory before prompt
        | prompt_template
        | model
        | StrOutputParser()
    )

    return chain



In [17]:
rag_chain = create_rag_chain(retriever)
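In the cells shown here, `inject_memory` reads from `memory`, but nothing writes to it, so `{chat_history}` stays empty on every call. One possible way to close that loop is a small wrapper that streams the answer and then stores the exchange with `save_context`, the method `ConversationBufferMemory` provides for recording a turn. The wrapper name `ask_with_memory` and the `"input"`/`"output"` keys are assumptions of this sketch.

```python
def ask_with_memory(chain, memory, question):
    """Stream an answer to the console, then store the question/answer pair in memory."""
    parts = []
    for chunk in chain.stream(question):
        print(chunk, end="", flush=True)
        parts.append(chunk)
    answer = "".join(parts)
    # save_context takes one dict of inputs and one dict of outputs for the turn
    memory.save_context({"input": question}, {"output": answer})
    return answer
```

In the notebook this would be called as `ask_with_memory(rag_chain, memory, question)` in place of the bare streaming loop, so that follow-up questions see the earlier turns.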



### Testing RAG Chain

In [18]:
question = "What is Google's revenue in 2024?"

print(f"Question: {question}")
for chunk in rag_chain.stream(question):
    print(chunk, end="", flush=True)
print("\n" + "-" * 50 + "\n")

Question: What is Google's revenue in 2024?
 * The context does not provide information about Google's revenue for the year 2024.
     * However, it shows that Google's total revenues for the three months ended September 30, 2024 were $88,268 million and for the nine months ended September 30, 2024 were $253,549 million.
     * These figures represent a sum of all segments, including Google Services, Google Cloud, Other Bets, and Google Network. To find the revenue for 2024 specifically for Google Services (the part most likely related to Search) you would need additional data or context.
--------------------------------------------------



In [19]:
question = "Summarize the 2023 income statements for both companies."

print(f"Question: {question}")
for chunk in rag_chain.stream(question):
    print(chunk, end="", flush=True)
print("\n" + "-" * 50 + "\n")

Question: Summarize the 2023 income statements for both companies.
 * Amazon's consolidated revenues for Q3 2023 were $76,693 billion, while for Q3 2024 they were $88,268 billion, showing a 15% increase.
   * Amazon's net income for Q3 2023 was $9,879 million, and for Q3 2024 it was $15,328 million, showing an increase of approximately 55%.
   * Alphabet's general and administrative expenses for Q3 2023 were $3,979 million, for Q3 2024 they were $3,599 million, showing a decrease of approximately 8%.
   * Alphabet's general and administrative expenses as a percentage of revenues for Q3 2023 was 5%, and for Q3 2024 it was 4%.
--------------------------------------------------



In [20]:
question = "How has Amazon's performance changed over time?"

print(f"Question: {question}")
for chunk in rag_chain.stream(question):
    print(chunk, end="", flush=True)
print("\n" + "-" * 50 + "\n")

Question: How has Amazon's performance changed over time?
1. Amazon's Q3 2024 and nine months ended September 30, 2024 performance compared to the prior year periods:
    - AWS operating income increased primarily due to increased sales, decreased payroll and related expenses, and a reduction in depreciation and amortization expense.
    - Changes in foreign exchange rates reduced North America sales growth despite a 9% increase in Q3 2024 and 10% for the nine months ended September 30, 2024.
    - The sales growth primarily reflects increased unit sales, including sales by third-party sellers, advertising sales, and subscription services.
    - The increase in unit sales was largely driven by Amazon's focus on price, selection, and convenience for customers, including fast shipping offers.
    - No information was provided about Amazon's overall consolidated performance or international sales growth beyond the percentages given.
--------------------------------------------------



In [21]:
question = "Which company is better to invest in, Google or Amazon?"

print(f"Question: {question}")
for chunk in rag_chain.stream(question):
    print(chunk, end="", flush=True)
print("\n" + "-" * 50 + "\n")

Question: Which company is better to invest in, Google or Amazon?
1. The context does not provide specific information to determine which company, Google or Amazon, is better to invest in based on financial performance.
2. Google's financial results show growth primarily driven by Google Cloud Platform and subscriptions, platforms, and devices revenues.
3. Amazon's financial results are affected by regulations and licensing requirements in the People's Republic of China and India.
4. A patent infringement lawsuit was filed against Amazon by Dialect, LLC in May 2023.
--------------------------------------------------



In [27]:
question = "Summarize SuperAwesomeCompany's overall strategy for growth."

print(f"Question: {question}")
for chunk in rag_chain.stream(question):
    print(chunk, end="", flush=True)
print("\n" + "-" * 50 + "\n")

Question: Summarize SuperAwesomeCompany's overall strategy for growth.
 * SuperAwesomeCompany is focusing on growth in multiple areas: North America, International, and AWS (Amazon Web Services)
   * The company aims to invest efficiently in technology and infrastructure for customer experience enhancement and process efficiency.
   * They anticipate increased spending on various product and service offerings due to geographic expansion and cross-functionality of their systems and operations.
   * SuperAwesomeCompany has an ongoing expansion strategy into new products, services, technologies, and geographic regions, which introduces additional risks.
   * The company has VIEs (Variable Interest Entities) with a combined value of $4.4 billion, with Waymo, a fully autonomous driving technology company and a consolidated VIE, receiving significant funding during the three months ended September 30, 2024.
--------------------------------------------------



In [28]:
question = "Does SuperAwesomeCompany exist?"

print(f"Question: {question}")
for chunk in rag_chain.stream(question):
    print(chunk, end="", flush=True)
print("\n" + "-" * 50 + "\n")

Question: Does SuperAwesomeCompany exist?


 - The context does not provide information about a company named SuperAwesomeCompany.
     - Therefore, it cannot be determined whether SuperAwesomeCompany exists based on the provided context.
--------------------------------------------------



In [29]:
question = "Which companies do you have information about?"

print(f"Question: {question}")
for chunk in rag_chain.stream(question):
    print(chunk, end="", flush=True)
print("\n" + "-" * 50 + "\n")

Question: Which companies do you have information about?
 - Amazon.com, Inc. and its consolidated entities
   - Alphabet Inc. and its subsidiaries (including Google)
--------------------------------------------------



In [33]:
question = "For the companies Apple and Google, Who are the key executives and board members, and what do they earn?"

print(f"Question: {question}")
for chunk in rag_chain.stream(question):
    print(chunk, end="", flush=True)
print("\n" + "-" * 50 + "\n")


Question: For the companies Apple and Google, Who are the key executives and board members, and what do they earn?
 * Key Executives for Apple: Not mentioned in context
     * Key Executives for Google: Not mentioned in context
     * Board Members for Apple: Not mentioned in context
     * Board Members for Google: Not mentioned in context
     * Earnings of key executives and board members for Apple: Not mentioned in context
     * Earnings of key executives and board members for Google: Not mentioned in context
--------------------------------------------------



In [34]:
question = "What risks does Amazon face??"

print(f"Question: {question}")
for chunk in rag_chain.stream(question):
    print(chunk, end="", flush=True)
print("\n" + "-" * 50 + "\n")


Question: What risks does Amazon face??
1. Legal Risks:
    - Allegations of violating federal antitrust and state antitrust laws, as well as consumer protection laws (Federal Trade Commission and State Attorneys General lawsuits)

2. Regulatory Risks:
    - Changes in regulatory, licensing, or other requirements in the PRC and India, affecting Amazon's structures and activities

3. Competition Risks:
    - Intense competition in the e-commerce market

4. Operational Risks:
    - Fluctuations in global economic and geopolitical conditions and customer demand
    - Risks related to selling online and delivering products, including fraud and costs associated with enhanced authentication processes
    - Dependence on third parties for certain payment methods and processing services, including credit card and debit card processing
    - Potential increases in interchange and other fees for certain payment methods, raising operating costs and lowering profitability
-------------------------

In [31]:
question = "What else can you help me with?"

print(f"Question: {question}")
for chunk in rag_chain.stream(question):
    print(chunk, end="", flush=True)
print("\n" + "-" * 50 + "\n")


Question: What else can you help me with?


1. Support of artificial intelligence (AI) products and services
2. Investments in new businesses, products, services, and technologies
3. Acquisitions and strategic investments
4. Hiring plans and competitive compensation programs
5. Cost of revenues, research and development (R&D) expenses, sales and marketing
6. Regulations and compliance requirements related to various payment options offered to customers
7. Additional tax liabilities and collection obligations associated with these regulations and offerings.
--------------------------------------------------

