# How to Perform RAG with the New Open-Source Library 'aisuite' for Financial Analysis?

In this notebook, I explore how to streamline financial analysis by combining the new open-source library "aisuite", LlamaParse, vector-based retrieval, and LLMs such as GPT-4o and Claude 3.5 Sonnet. Here's a breakdown:




✅ The Workflow:



1- Parsing Financial PDFs with LlamaParse:

 I used LlamaParse to convert Amazon’s 2023 10-K report into a structured format for analysis.



2- Chunking for Retrieval:

 Using a SentenceSplitter, I divided the report into chunks optimized for embedding and retrieval.



3- Building a Vector Index:

 I created a vector store using the local embedding model BAAI/bge-small-en-v1.5, so embeddings are computed inside the notebook rather than by a hosted API.



4- Querying Financial Data with RAG:

 Questions like "What was the total lease cost for 2023?" and "What was the net income in 2023?" were answered by combining context retrieval and language model synthesis.



5- Comparing LLM Outputs:

 Using aisuite, I ran the queries through both GPT-4o and Claude for robust, multi-perspective answers.

Claude/GPT-4o: Total lease cost: $18.918 billion.

Claude/GPT-4o: Net income: $30.437 billion.


✅ Why It Matters:


With RAG workflows powered by "aisuite", we can automate the tedious process of sifting through dense financial reports and compare the answers of different LLMs side by side.

This approach accelerates analysis, and agreement between models offers a quick consistency check, which frees analysts to focus on decision-making.


[Hanane D](https://www.linkedin.com/in/hanane-d-algo-trader)

In [None]:
!pip install "aisuite[all]" -q

In [6]:
from google.colab import userdata
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
CLAUDE_API_KEY = userdata.get('CLAUDE_API_KEY')
LLAMAPARSE_API_KEY = userdata.get('LLAMACLOUD_API_KEY')

In [4]:
import os
# from getpass import getpass
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY
os.environ['ANTHROPIC_API_KEY'] = CLAUDE_API_KEY
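
Outside Colab, `google.colab.userdata` is not available. The following is a minimal sketch of one alternative, assuming the keys are either already exported as environment variables or typed in interactively; `load_api_key` is a hypothetical helper, not part of aisuite or Colab.

```python
import os
from getpass import getpass


def load_api_key(name: str, prompt: bool = True) -> str:
    """Return an API key from the environment, optionally prompting for it.

    Hypothetical helper for running this notebook outside Colab.
    """
    value = os.environ.get(name)
    if not value and prompt:
        # getpass hides the key while it is typed.
        value = getpass(f"Enter {name}: ")
        os.environ[name] = value
    if not value:
        raise RuntimeError(f"{name} is not set")
    return value
```

Calling `load_api_key("OPENAI_API_KEY")` and `load_api_key("ANTHROPIC_API_KEY")` before creating the aisuite client would leave both variables set in `os.environ`, which is where the cell above puts them.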

In [None]:
!pip install llama-index llama-index-core llama-parse llama-index-embeddings-huggingface -q
# !pip install llama-index-llms-anthropic -q

### Load Amazon's 10-K financial report

In [None]:
!wget "https://d18rn0p25nwr6d.cloudfront.net/CIK-0001018724/c7c14359-36fa-40c3-b3ca-5bf7f3fa0b96.pdf" -O amzn_2023_10k.pdf

# LlamaParse

Parse Amazon's 10-K report using LlamaParse

In [None]:
from llama_parse import LlamaParse
import nest_asyncio
nest_asyncio.apply()

pdf_name = "amzn_2023_10k.pdf"
parser = LlamaParse(api_key=LLAMAPARSE_API_KEY, result_type="markdown", gpt4o_mode = True)
documents = parser.load_data(pdf_name)

# Embedding + VectorStore

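Embeddings come from BAAI/bge-small-en-v1.5, downloaded from Hugging Face and run locally. The model is public, so an HF_TOKEN should only be needed if Hugging Face asks you to authenticate.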

In [None]:
from llama_index.core.node_parser import SentenceSplitter

######## SentenceSplitter ########
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

######## Vector Index ########
from llama_index.core import VectorStoreIndex

embed_model = "local:BAAI/bge-small-en-v1.5" #https://huggingface.co/collections/BAAI/bge-66797a74476eb1f085c7446d
vector_index = VectorStoreIndex(nodes, embed_model = embed_model)
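
To make the chunking step more concrete, here is a simplified sketch of sentence-aware chunking in plain Python. It is not `SentenceSplitter`'s actual algorithm: `SentenceSplitter` budgets by tokens (`chunk_size=1024` above), whereas this illustration counts words, and `split_into_chunks` with its parameters is a hypothetical name chosen for the example.

```python
import re


def split_into_chunks(text: str, max_words: int = 50, overlap_sentences: int = 1) -> list[str]:
    """Group whole sentences into chunks of at most ~max_words words.

    The last `overlap_sentences` sentences of a chunk are repeated at the
    start of the next one so that context is not cut abruptly.
    """
    # Naive sentence boundary: split after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate_words = sum(len(s.split()) for s in current + [sentence])
        if current and candidate_words > max_words:
            # Budget exceeded: close the current chunk and carry the overlap forward.
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "Net sales grew. Costs rose too. Leases were significant."
print(split_into_chunks(sample, max_words=6, overlap_sentences=1))
```

Keeping sentences intact matters for financial text, where a figure and the line item it belongs to usually sit in the same sentence or table row.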

# Retriever

Retrieve the most relevant context for a user query from the embeddings stored in the vector index:

In [19]:
from llama_index.core.indices.vector_store.retrievers import VectorIndexRetriever
from llama_index.core.query_engine.retriever_query_engine import RetrieverQueryEngine
from llama_index.core import get_response_synthesizer

retriever = VectorIndexRetriever(
    index=vector_index,
    similarity_top_k=3,
)
# build query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever
)

response = query_engine.query("what was the net income in 2023?")
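
Conceptually, `similarity_top_k=3` means "score every chunk embedding against the query embedding and keep the three best". The sketch below illustrates that idea with cosine similarity on toy 2-dimensional vectors; it is an assumption-laden illustration, not LlamaIndex's implementation, and the names `cosine_similarity`, `top_k`, and the toy vectors are made up for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 3):
    """Return the k (chunk_id, score) pairs most similar to the query, best first."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for real bge-small-en-v1.5 vectors.
toy_chunks = {"income": [1.0, 0.0], "leases": [0.0, 1.0], "taxes": [0.7, 0.7]}
print(top_k([1.0, 0.1], toy_chunks, k=2))
```

With real embeddings the vectors have hundreds of dimensions, but the ranking logic is the same.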

## Final context

Concatenate the text of the 3 retrieved nodes into a single context string:

In [None]:
context = ""
for i, node in enumerate(response.source_nodes):
  print(f"node {i}")
  # print(node.text)
  context += node.text
  context += "\n" + "="*50 + "\n"  # Separator between retrieved nodes

In [46]:
context



# aisuite

Define the list of LLMs you want to use and compare:

In [47]:
import aisuite as ai
import time

client = ai.Client()

llms = [
        "anthropic:claude-3-5-sonnet-20241022",
        "openai:gpt-4o",
       ]
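
The model identifiers above follow a `provider:model` pattern. As a small sanity check before sending requests, a sketch like the following can catch a missing provider prefix early; `split_model_id` is a hypothetical helper written for this notebook, not an aisuite function.

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split 'provider:model' into its two parts, raising on malformed input."""
    provider, sep, model = model_id.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"Expected 'provider:model', got {model_id!r}")
    return provider, model


for model_id in ["anthropic:claude-3-5-sonnet-20241022", "openai:gpt-4o"]:
    provider, model = split_model_id(model_id)
    print(f"{provider:>10} -> {model}")
```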

## Chat

Include the context and the query in the message to be sent to the LLM:

In [None]:
messages = [
    {"role": "system", "content": "You are an expert in financial analysis."},
]

query = "What was the net income in 2023?"
user_content = f"""Based on the following context, answer the question: \n\nContext:{context}\n\nQuestion:{query}\n\nAnswer:"""
messages.append( {"role": "user", "content": user_content} )
messages

## Ask the LLMs to answer

In [53]:
for llm_aisuite in llms:
  print(f"LLM: {llm_aisuite}")
  response = client.chat.completions.create(model=llm_aisuite, messages=messages)
  print(response.choices[0].message.content)
  print("\n\n")

LLM: anthropic:claude-3-5-sonnet-20241022
Based on the provided context, I can calculate the net income for 2023:

Income before income taxes: $37,557 million
Minus Provision for income taxes: $7,120 million
= Net income of $30,437 million (or approximately $30.4 billion) for 2023.

This calculation is derived by taking the income before income taxes and subtracting the provision for income taxes for that year.



LLM: openai:gpt-4o
To calculate the net income for 2023, we use the formula:

\[ \text{Net Income} = \text{Income Before Income Taxes} - \text{Provision for Income Taxes, Net} \]

From the provided data:
- Income (loss) before income taxes for 2023: $37,557 million
- Provision for income taxes, net for 2023: $7,120 million

Substituting these values into the formula gives:

\[ \text{Net Income} = 37,557 - 7,120 = 30,437 \]

Therefore, the net income for 2023 was $30,437 million.
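
Both models arrive at their answer by subtracting two figures taken from the retrieved context, so the subtraction itself is easy to verify. The sketch below checks only the arithmetic on the numbers quoted in the answers above; it does not confirm those numbers against the filing.

```python
# Figures (in millions of USD) as quoted in the model answers above.
income_before_taxes_musd = 37_557
provision_for_taxes_musd = 7_120

net_income_musd = income_before_taxes_musd - provision_for_taxes_musd
net_income_busd = net_income_musd / 1_000  # convert millions to billions

print(f"Net income: ${net_income_musd:,} million (${net_income_busd:.3f} billion)")
```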





# All Together

In [76]:
!wget "https://d18rn0p25nwr6d.cloudfront.net/CIK-0001018724/c7c14359-36fa-40c3-b3ca-5bf7f3fa0b96.pdf" -O amzn_2023_10k.pdf

from llama_parse import LlamaParse
import nest_asyncio
nest_asyncio.apply()

######## Parse the pdf with LlamaParse ########
pdf_name = "amzn_2023_10k.pdf"
parser = LlamaParse(api_key=LLAMAPARSE_API_KEY, result_type="markdown", gpt4o_mode = True)
documents = parser.load_data(pdf_name)

from llama_index.core.node_parser import SentenceSplitter

######## SentenceSplitter ########
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

######## Vector Index ########
from llama_index.core import VectorStoreIndex

embed_model = "local:BAAI/bge-small-en-v1.5" #https://huggingface.co/collections/BAAI/bge-66797a74476eb1f085c7446d
vector_index = VectorStoreIndex(nodes, embed_model = embed_model)

######## Retriever ########
from llama_index.core.indices.vector_store.retrievers import VectorIndexRetriever
from llama_index.core.query_engine.retriever_query_engine import RetrieverQueryEngine
from llama_index.core import get_response_synthesizer

retriever = VectorIndexRetriever(
    index=vector_index,
    similarity_top_k=3,
)
# build query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever
)

# response = query_engine.query("what was the net income in 2023?")

In [77]:
def get_context(query_engine,user_query):
  # query_engine = RetrieverQueryEngine(
  #     retriever=retriever, response_synthesizer=get_response_synthesizer()
  # )
  response = query_engine.query(user_query)

  context=""
  for node in response.source_nodes:
    context += node.text
    context += "\n" + "="*100 + "\n"

  return context

In [80]:
def create_messages(query_engine, user_query, system_content = "You are an expert in financial analysis."):

  context = get_context(query_engine,user_query)

  messages = [
      {"role": "system", "content": system_content},
  ]

  user_content = f"""Based on the following context, answer the question: \n\nContext:{context}\n\nQuestion:{user_query}\n\nAnswer:"""
  messages.append( {"role": "user", "content": user_content} )
  return messages

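def call_llm_aisuite(llms, user_query):
  # Relies on the notebook-level `query_engine` and `client` defined in other cells.
  messages = create_messages(query_engine, user_query)

  # Send the same messages to each model and print its answer.
  for llm_aisuite in llms:
    print(f"LLM: {llm_aisuite}")
    response = client.chat.completions.create(model=llm_aisuite, messages=messages)
    print(response.choices[0].message.content)
    print("\n\n")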
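
Printing works for a quick look, but returning the answers makes them easier to compare afterwards. Here is a sketch of that variation, assuming the same `client.chat.completions.create(...)` call and response shape used above. `collect_answers` is a hypothetical helper, and `FakeClient` is a stand-in used only so the example runs without API keys.

```python
from types import SimpleNamespace


class FakeClient:
    """Hypothetical stand-in mimicking the response shape used in this notebook."""

    def __init__(self, canned_answers: dict[str, str]):
        self._canned = canned_answers
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, model: str, messages: list[dict]):
        message = SimpleNamespace(content=self._canned[model])
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


def collect_answers(client, llms: list[str], messages: list[dict]) -> dict[str, str]:
    """Query each model with the same messages and return {model: answer}."""
    answers = {}
    for model in llms:
        try:
            response = client.chat.completions.create(model=model, messages=messages)
            answers[model] = response.choices[0].message.content
        except Exception as exc:  # one failing provider should not stop the others
            answers[model] = f"ERROR: {exc}"
    return answers


demo_client = FakeClient({"openai:gpt-4o": "The total was $18,918 million."})
demo_messages = [{"role": "user", "content": "What was the total lease cost?"}]
print(collect_answers(demo_client, ["openai:gpt-4o", "unknown:model"], demo_messages))
```

Swapping `demo_client` for the real `ai.Client()` should require no other change, provided the response shape matches the one used in the cells above.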

In [81]:
user_query= "What was the total lease cost for the year ended December 31, 2023?"

client = ai.Client()

llms = [
        "anthropic:claude-3-5-sonnet-20241022",
        "openai:gpt-4o",
       ]

call_llm_aisuite(llms, user_query)

LLM: anthropic:claude-3-5-sonnet-20241022
According to the lease cost summary table in the context, the total lease cost for the year ended December 31, 2023 was $18,918 million. This total comprises:
- Operating lease cost: $10,550 million
- Finance lease cost: $6,203 million (made up of $5,899 million in amortization of lease assets and $304 million in interest on lease liabilities)
- Variable lease cost: $2,165 million



LLM: openai:gpt-4o
The total lease cost for the year ended December 31, 2023, was $18,918 million.
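
Since Claude also lists the components, the two answers can be cross-checked programmatically. The sketch below pulls the dollar amounts out of the answer text with a regular expression, then verifies that the components add up and that both models report the same total. The answer strings are abridged from the outputs above, and `dollar_amounts` is a hypothetical helper; the check covers internal consistency only, not agreement with the filing.

```python
import re

# Abridged from the model outputs above.
claude_answer = (
    "the total lease cost for the year ended December 31, 2023 was $18,918 million. "
    "Operating lease cost: $10,550 million. Finance lease cost: $6,203 million "
    "(made up of $5,899 million in amortization of lease assets and $304 million "
    "in interest on lease liabilities). Variable lease cost: $2,165 million."
)
gpt_answer = "The total lease cost for the year ended December 31, 2023, was $18,918 million."


def dollar_amounts(text: str) -> list[int]:
    """Return every '$X million' amount in the text as an integer, in order."""
    return [int(m.replace(",", "")) for m in re.findall(r"\$([\d,]+)\s+million", text)]


total, operating, finance, amortization, interest, variable = dollar_amounts(claude_answer)

components_add_up = operating + finance + variable == total
finance_adds_up = amortization + interest == finance
models_agree = dollar_amounts(gpt_answer)[0] == total

print(components_add_up, finance_adds_up, models_agree)
```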





In [82]:
user_query= "What was the net income in 2023?"

client = ai.Client()

llms = [
        "anthropic:claude-3-5-sonnet-20241022",
        "openai:gpt-4o",
       ]

call_llm_aisuite(llms, user_query)

LLM: anthropic:claude-3-5-sonnet-20241022
To calculate the net income for 2023, I'll subtract the provision for income taxes from the income before income taxes:

Income before income taxes (2023): $37,557 million
Less: Provision for income taxes (2023): $7,120 million
Net Income (2023): $37,557 - $7,120 = $30,437 million (or $30.437 billion)

The company showed strong profitability in 2023, with a significant improvement from 2022 when they had a loss before income taxes of $5,936 million.



LLM: openai:gpt-4o
To determine the net income for the year 2023, we need to use the formula:

\[ \text{Net Income} = \text{Income before Income Taxes} - \text{Provision for Income Taxes, Net} \]

From the provided data for the year ending December 31, 2023:

- **Income before Income Taxes**: $37,557 million
- **Provision for Income Taxes, Net**: $7,120 million

Plugging these values into our formula, we get:

\[ \text{Net Income} = 37,557 - 7,120 = 30,437 \]

Therefore, the net income for the ye