## AI Financial Advisor  

A retrieval-augmented generation (RAG) application that provides financial advice for investors based on an analysis of company SEC filings. 

Some sample questions for testing the application (generated by Mistral 7B): 

* What is the company's financial health? (e.g., revenue, net income, cash flow)  
* Who are the key executives and board members, and what do they earn?  
* How does the company manage risks, such as legal issues or environmental concerns?  
* Are there any significant business developments or transactions that have occurred recently?  
* What is the company's overall strategy for growth, and how will it achieve its goals?  


In [1]:
import sys
import os
import copy

from typing import List, Dict
import numpy as np
import pandas as pd
from tqdm import tqdm
from IPython.display import display, HTML

import yfinance as yf

#%pip install faiss-cpu
import faiss

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import ConfigurableField, RunnablePassthrough
from langchain_core.tools import tool
from langchain_core.vectorstores import InMemoryVectorStore
from langchain.agents import create_tool_calling_agent, AgentExecutor, create_react_agent

from langchain_community.llms import Ollama
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_text_splitters import MarkdownHeaderTextSplitter

#from langchain.agents import Tool
#from docling.document_converter import DocumentConverter

### Document Preparation 

We will use LangChain to load PDFs. 
Tutorial: https://python.langchain.com/docs/how_to/document_loader_pdf/  

Each SEC filing begins with a Table of Contents that defines the structure of the document. Our goal is to split the document such that it follows this structure.
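
As a rough illustration of structure-aware splitting, the sketch below cuts plain text at lines that look like 10-Q item headings (for example `ITEM 2.`). The regular expression and the `split_by_item` helper are assumptions made for this example and are not part of the notebook; real filings will likely need a more forgiving pattern, and the Table of Contents itself will also match.

```python
import re

# Hypothetical pattern: a line starting with "ITEM <number>[letter]." in any letter case.
ITEM_HEADING = re.compile(r"^ITEM\s+\d+[A-Z]?\.", re.IGNORECASE | re.MULTILINE)

def split_by_item(text: str) -> list[dict]:
    """Split filing text into sections, one per 'ITEM n.' heading."""
    starts = [m.start() for m in ITEM_HEADING.finditer(text)]
    sections = []
    # Text before the first heading (cover page, contents) is kept as a preamble.
    if not starts or starts[0] > 0:
        end = starts[0] if starts else len(text)
        preamble = text[:end].strip()
        if preamble:
            sections.append({"heading": "PREAMBLE", "body": preamble})
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(text)
        chunk = text[start:end].strip()
        # The first line of each chunk is treated as the heading.
        heading, _, body = chunk.partition("\n")
        sections.append({"heading": heading.strip(), "body": body.strip()})
    return sections

sample = (
    "Cover page\n"
    "ITEM 1. FINANCIAL STATEMENTS\nBalance sheet text\n"
    "ITEM 1A. RISK FACTORS\nRisk text\n"
)
for section in split_by_item(sample):
    print(section["heading"])
```

Each resulting section could then be wrapped in a document object with the heading stored as metadata, so retrieved chunks carry the part of the filing they came from.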

In [2]:
file_path = (
    "../data/amazon-10-q-q3-2024.pdf",
    "../data/goog-10-q-q3-2024.pdf"
)

In [3]:
#%pip install -qU pypdf
from langchain_community.document_loaders import PyPDFLoader

docs = []
for file in file_path:
    print(f'Loading {file}.')
    loader = PyPDFLoader(file)
    
    async for doc in loader.alazy_load():
        docs.append(doc)

Loading ../data/amazon-10-q-q3-2024.pdf.
Loading ../data/goog-10-q-q3-2024.pdf.


In [4]:
print(len(docs))

203


### Embedding and Vector Store Setup

* Ollama embeddings: https://python.langchain.com/docs/integrations/text_embedding/ollama/   
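* FAISS (the vector store used below): https://python.langchain.com/docs/integrations/vectorstores/faiss/
* Qdrant (an alternative vector store): https://qdrant.tech/documentation/quickstart/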

In [5]:
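def setup_vector_store(docs):
    """
    Embed the documents with Ollama and index them in an in-memory FAISS store.
    """
    embeddings = OllamaEmbeddings(model='nomic-embed-text',
                                  base_url="http://localhost:11434")
    # Embed a throwaway string to discover the embedding dimension
    single_vector = embeddings.embed_query("this is some text data")
    # Flat index: exact (brute-force) search by L2 distance
    index = faiss.IndexFlatL2(len(single_vector))
    
    # Start with an empty store, then embed and add all documents
    vector_store = FAISS(
        embedding_function=embeddings,
        index=index,
        docstore=InMemoryDocstore(),
        index_to_docstore_id={}
    )
    vector_store.add_documents(documents=docs)
    return vector_store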
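
`faiss.IndexFlatL2` performs exact, exhaustive search and ranks stored vectors by squared Euclidean distance to the query. The pure-Python sketch below mimics that behaviour on toy vectors to show what the index computes. The `flat_l2_search` helper and the 3-dimensional vectors are made up for illustration; the real index works on the full embedding dimension and is far faster.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_l2_search(vectors, query, k):
    """Return (index, squared distance) for the k stored vectors closest to query."""
    scored = [(i, squared_l2(v, query)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]

# Toy "embeddings" (assumed values, for illustration only)
stored = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
print(flat_l2_search(stored, query=[0.9, 0.1, 0.0], k=2))
```

Because every stored vector is compared against the query, search cost grows linearly with the number of chunks, which is acceptable for the few hundred pages loaded here.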

In [6]:
vector_store = setup_vector_store(docs=docs)

### Document Retrieval

In [7]:
retriever = vector_store.as_retriever(search_type='mmr', search_kwargs={'k':3})
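
With `search_type='mmr'`, the retriever applies maximal marginal relevance: it fetches a larger candidate pool (controlled by `fetch_k` in LangChain), then picks `k` results that balance similarity to the query against similarity to results already chosen. The sketch below is a simplified pure-Python illustration of that selection rule, not LangChain's implementation; the `mmr_select` helper, the toy 2-D vectors, and the `lambda_mult=0.5` weighting are assumptions for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr_select(query, candidates, k, lambda_mult=0.5):
    """Pick up to k candidate indices, trading relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            # Redundancy: highest similarity to anything already selected
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

query = [1.0, 0.1]
candidates = [[1.0, 0.0], [1.0, 0.05], [0.8, 0.6]]  # first two are near-duplicates
print(mmr_select(query, candidates, k=2))                   # diversity-aware
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # pure relevance
```

In this toy case, pure relevance returns the two near-duplicate vectors, while the balanced setting replaces the second pick with the more distinct one.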

In [8]:
# Retrieve the most similar text
retrieved_documents = retriever.invoke("Who are the key executives and board members of Google?")

In [9]:
retrieved_documents

[Document(id='e04b34bd-dfc3-4c6d-a58b-843b2fbfec09', metadata={'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'creator': 'EDGAR Filing HTML Converter', 'creationdate': '2024-11-01T06:03:12-04:00', 'title': '0001018724-24-000161', 'author': 'EDGAR® Online LLC, a subsidiary of OTC Markets Group', 'subject': 'Form 10-Q filed on 2024-11-01 for the period ending 2024-09-30', 'keywords': '0001018724-24-000161; ; 10-Q', 'moddate': '2024-11-01T06:03:27-04:00', 'source': '../data/amazon-10-q-q3-2024.pdf', 'total_pages': 150, 'page': 126, 'page_label': '127'}, page_content='CITIBANK, N.A.,\nindividually and as Administrative Agent\nBy: /s/ Daniel Boselli\nName: Daniel Boselli\nTitle: Vice President\n[Signature Page to Amazon.com, Inc. 364-Day Revolving Credit Agreement]'),
 Document(id='cfe58765-5928-4ce9-9742-4a1bd6d694d9', metadata={'producer': 'Wdesk Fidelity Content Translations Version 010.004.246', 'creator': 'Workiva', 'creationdate': '2024-10-30T12:01:00+00:00', 'moddate': '2024-10-3

In [10]:
for doc in retrieved_documents:
    print(f"page {doc.metadata['page']}: {doc.page_content} \n")

page 126: CITIBANK, N.A.,
individually and as Administrative Agent
By: /s/ Daniel Boselli
Name: Daniel Boselli
Title: Vice President
[Signature Page to Amazon.com, Inc. 364-Day Revolving Credit Agreement] 

page 37: Google Network
Google Network revenues decreased $121 million from the three months ended September 30, 2023  to the 
three months ended September 30, 2024 , primarily driven by the unfavorable effect of foreign currency exchange 
rates, partially offset by an increase in AdSense revenues.
Google Network revenues decreased $610 million  from the nine months ended September 30, 2023  to the 
nine months ended September 30, 2024  primarily driven by the unfavorable effect of foreign currency exchange 
rates as well as a decrease in AdMob revenues.
Monetization Metrics
The following table presents changes in monetization metrics for Google Search & other revenues (paid clicks 
and cost-per-click) and Google Network revenues (impressions and cost-per-impression), expressed as a

In [11]:
# Retrieve the most similar text
retrieved_documents = retriever.invoke("Does Amazon have more revenue than Google?")

In [12]:
for doc in retrieved_documents:
    print(f"page {doc.metadata['page']}: {doc.page_content} \n")

page 32: ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIAL CONDITION AND RESULTS 
OF OPERATIONS
Please read the following discussion and analysis of our financial condition and results of operations together 
with "Note About Forward-Looking Statements" and our consolidated financial statements and related notes 
included under Item 1 of this Quarterly Report on Form 10-Q as well as our Annual Report on Form 10-K for the 
fiscal year ended December 31, 2023, including Part I, Item 1A "Risk Factors," as updated in our Quarterly Report 
on Form 10-Q for the quarter ended June 30, 2024 and in this Quarterly Report on Form 10-Q.
Understanding Alphabet’s Financial Results
Alphabet is a collection of businesses — the largest of which is Google. We report Google in two segments, 
Google Services and Google Cloud; we also report all non-Google businesses collectively as Other Bets. For further 
details on our segments, see Note 14 of the Notes to Consolidated Financial Statements inclu

### RAG Chain

In [13]:
# Join the page contents of the given documents into one context string
def format_docs(docs):
    return "\n\n".join([doc.page_content for doc in docs])

content = format_docs(docs)

# Setting up the RAG chain
def create_rag_chain(retriever):
    prompt = """
        You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question.
        If you don't know the answer, just say that you don't know.
        Answer in bullet points. Make sure your answer is relevant to the question and it is answered from the context only.
        ### Question: {question} 
        
        ### Context: {context} 
        
        ### Answer:
    """
    model = ChatOllama(model="mistral:7b", base_url="http://localhost:11434")
    prompt_template = ChatPromptTemplate.from_template(prompt)

    chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt_template
        | model
        | StrOutputParser()
    )
    return chain
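
Because the vector store mixes pages from two filings, the model cannot tell from `page_content` alone which company a chunk belongs to. One possible variation, sketched below, prefixes each chunk with its `source` and `page` metadata before joining. The `Doc` dataclass is a stand-in for LangChain's `Document` so the example runs on its own, and `format_docs_with_sources` is a hypothetical helper, not part of the notebook.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (assumption for this sketch)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs_with_sources(docs) -> str:
    """Join chunks, labelling each with the file and page it came from."""
    parts = []
    for doc in docs:
        source = doc.metadata.get("source", "unknown source")
        page = doc.metadata.get("page", "?")
        parts.append(f"[{source}, page {page}]\n{doc.page_content}")
    return "\n\n".join(parts)

# Placeholder chunks; the file names match the paths loaded earlier
docs_sample = [
    Doc("First example chunk.", {"source": "../data/goog-10-q-q3-2024.pdf", "page": 1}),
    Doc("Second example chunk.", {"source": "../data/amazon-10-q-q3-2024.pdf", "page": 2}),
]
print(format_docs_with_sources(docs_sample))
```

Swapping this in for `format_docs` inside the chain would be one way to test whether labelled context helps with comparison questions across the two companies.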

In [14]:
rag_chain = create_rag_chain(retriever)
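
The dict at the head of the chain fans the incoming question out to two branches: one runs the retriever and `format_docs` to build `context`, the other passes the question through unchanged. The plain-Python sketch below imitates that data flow with stub functions. `fake_retriever`, `fake_model`, and `run_chain` are placeholders invented for illustration and do not call LangChain or Ollama.

```python
def fake_retriever(question: str) -> list[str]:
    # Stub: a real retriever would return Document objects from the vector store.
    return [f"chunk about: {question}", "another chunk"]

def format_chunks(chunks: list[str]) -> str:
    # Plays the role of format_docs for plain strings.
    return "\n\n".join(chunks)

def fill_prompt(inputs: dict) -> str:
    # Plays the role of the prompt template: substitutes both inputs.
    return (
        f"### Question: {inputs['question']}\n"
        f"### Context: {inputs['context']}\n"
        "### Answer:"
    )

def fake_model(prompt: str) -> str:
    # Stub: reports the prompt size instead of generating text.
    return f"(model saw {len(prompt)} characters)"

def run_chain(question: str) -> str:
    # Step 1: build both prompt inputs from the single question string.
    inputs = {
        "context": format_chunks(fake_retriever(question)),  # retriever | format_docs
        "question": question,                                 # RunnablePassthrough()
    }
    # Step 2: fill the prompt, then call the model (prompt_template | model).
    return fake_model(fill_prompt(inputs))

print(run_chain("What is Google's revenue?"))
```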

In [15]:
question = "What is Google's revenue?"

print(f"Question: {question}")
for chunk in rag_chain.stream(question):
    print(chunk, end="", flush=True)
print("\n" + "-" * 50 + "\n")

Question: What is Google's revenue?
1. The consolidated revenues for the company are $88.268 billion for the year ended September 30, 2024. This represents an increase of $11.575 billion or 15% compared to the previous year.

2. The primary drivers of this revenue growth are an increase in Google Services revenues by $8.5 billion (13%) and an increase in Google Cloud revenues by $2.9 billion (35%).

3. Cost of revenues for the period was $36.474 billion, an increase of 10% compared to the previous year. This growth is primarily due to increases in TAC, content acquisition costs, depreciation expense, and devices costs due to Pixel family product launch timing.

4. Operating expenses for the period were $23.273 billion, an increase of 5% compared to the previous year. This growth is primarily driven by charges related to office space optimization efforts and increases in depreciation expense and compensation.

5. The operating income for the period was $28.521 billion, representing a 34