## LlamaCloud vs Naive RAG 

In this notebook, we will demonstrate how LlamaCloud shines compared to Naive RAG using `Microsoft 2023 10K SEC Filings`

Here are the parameters we used to ensure consistency in the comparison.

`LlamaCloud:`
1. LlamaParse (Accurate mode)
2. chunk_size = 1024
3. chunk_overlap = 200
4. embedding-model - text-embedding-ada-002.
5. llm - gpt-3.5-turbo.

`Naive RAG:`
1. PyPDF
2. chunk_size = 1024
3. chunk_overlap = 200
4. OpenAI text-embedding-ada-002
5. llm - gpt-3.5-turbo.

### Setup

In [None]:
# %pip install llama-index-indices-managed-llama-cloud
# %pip install llama-index

In [217]:
import os
from IPython.display import Markdown, display

os.environ['OPENAI_API_KEY'] = 'sk-...'

### Connect to LlamaCloud Index

In [31]:
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

llamacloud_index = LlamaCloudIndex(
  name="MSFT_2023", 
  project_name="Default",
  organization_id="<ORG ID>",
  api_key="llx-..."
)

### Build Naive RAG

In [21]:
from llama_index.core import SimpleDirectoryReader
from llama_index.core import Settings, VectorStoreIndex

Settings.chunk_size = 1024
Settings.chunk_overlap = 200

documents = SimpleDirectoryReader(input_files=["MSFT_2023_10K_SEC.pdf"]).load_data()
raw_index = VectorStoreIndex.from_documents(documents)

### Create Query Engines

In [129]:
## LlamaCloud

llamacloud_query_engine = llamacloud_index.as_query_engine(
  dense_similarity_top_k=5,
  sparse_similarity_top_k=5,
  alpha=0.5,
  enable_reranking=True, 
  rerank_top_n=2,
)

## Naive RAG
raw_query_engine = raw_index.as_query_engine(similarity_top_k=2)

### Queries over text chunks

1. Pointed query.
2. Comparison queries.

#### Query-1

In [241]:
query = """
How many hectares of land did Microsoft protect in Belize?
"""

# Page-6

llamacloud_response = llamacloud_query_engine.query(query)
raw_response = raw_query_engine.query(query)

print("\n-----QUERY------")
display(Markdown(f"{query}"))

print("\n-----LlamaCloud------")
display(Markdown(f"{llamacloud_response}"))

print("\n-----Naive RAG------")
display(Markdown(f"{raw_response}"))


-----QUERY------



How many hectares of land did Microsoft protect in Belize?



-----LlamaCloud------


Microsoft protected 12,270 acres of land in Belize.


-----Naive RAG------


Microsoft protected 4,963 hectares of land in Belize.

In [242]:
query = """
How many hectares of land did Microsoft protect in Belize, and how does this compare to the total land they use globally?
"""

llamacloud_response = llamacloud_query_engine.query(query)
raw_response = raw_query_engine.query(query)

print("\n-----QUERY------")
display(Markdown(f"{query}"))

print("\n-----LlamaCloud------")
display(Markdown(f"{llamacloud_response}"))

print("\n-----Naive RAG------")
display(Markdown(f"{raw_response}"))


-----QUERY------



How many hectares of land did Microsoft protect in Belize, and how does this compare to the total land they use globally?



-----LlamaCloud------


Microsoft protected 12,270 acres of land in Belize, which is more than the 11,206 acres of land that they use around the world.


-----Naive RAG------


Microsoft protected 4,963 hectares of land in Belize, which is more than the 4,530 hectares of land they use around the world.

#### Query-2:

In [243]:
query = """
What is the long-term accounts receivable, net of allowance for doubtful accounts, as of June 30, 2023?
"""

llamacloud_response = llamacloud_query_engine.query(query)
raw_response = raw_query_engine.query(query)
print("\n-----LlamaCloud------\n")
display(Markdown(f"{llamacloud_response}"))
print("\n-----Naive RAG------\n")
display(Markdown(f"{raw_response}"))


-----LlamaCloud------



$4.5 billion


-----Naive RAG------



The long-term accounts receivable, net of allowance for doubtful accounts, as of June 30, 2023, is $66 million.

In [244]:
query = """
What is the long-term accounts receivable, net of allowance for doubtful accounts, as of June 30, 2023 and compare it with 2022?
"""

print("\n-----QUERY------")
display(Markdown(f"{query}"))

print("\n-----LlamaCloud------")
display(Markdown(f"{llamacloud_response}"))

print("\n-----Naive RAG------")
display(Markdown(f"{raw_response}"))


-----QUERY------



What is the long-term accounts receivable, net of allowance for doubtful accounts, as of June 30, 2023 and compare it with 2022?



-----LlamaCloud------


$4.5 billion


-----Naive RAG------


The long-term accounts receivable, net of allowance for doubtful accounts, as of June 30, 2023, is $66 million.

#### Query-3:

In [245]:
query = """
How many shares of Microsoft common stock were authorized for future grant under their stock plans as of 2023?
"""

#page-92

llamacloud_response = llamacloud_query_engine.query(query)
raw_response = raw_query_engine.query(query)

print("\n-----QUERY------")
display(Markdown(f"{query}"))

print("\n-----LlamaCloud------")
display(Markdown(f"{llamacloud_response}"))

print("\n-----Naive RAG------")
display(Markdown(f"{raw_response}"))


-----QUERY------



How many shares of Microsoft common stock were authorized for future grant under their stock plans as of 2023?



-----LlamaCloud------


As of 2023, an aggregate of 164 million shares of Microsoft common stock were authorized for future grant under their stock plans.


-----Naive RAG------


As of 2023, Microsoft authorized 1.5 billion shares of common stock for future grant under their stock plans.

### Table queries

1. Pointed queries.
2. Compairson queries.

In [246]:
query = """
How much revenue did Microsoft generate from Server products and cloud services in fiscal year 2023 ?
"""
# 96

llamacloud_response = llamacloud_query_engine.query(query)
raw_response = raw_query_engine.query(query)

print("\n-----QUERY------")
display(Markdown(f"{query}"))

print("\n-----LlamaCloud------")
display(Markdown(f"{llamacloud_response}"))

print("\n-----Naive RAG------")
display(Markdown(f"{raw_response}"))

# Note: The answer might look same but correct on is in millions unliKe billions with Naive RAG.


-----QUERY------



How much revenue did Microsoft generate from Server products and cloud services in fiscal year 2023 ?



-----LlamaCloud------


$79,970 million


-----Naive RAG------


Microsoft generated $79.970 billion in revenue from Server products and cloud services in fiscal year 2023.

In [247]:
query = """
How much revenue did Microsoft generate from Server products and cloud services in fiscal year 2023 and compare it with 2022, 2021?
"""
# 96

llamacloud_response = llamacloud_query_engine.query(query)
raw_response = raw_query_engine.query(query)

print("\n-----QUERY------")
display(Markdown(f"{query}"))

print("\n-----LlamaCloud------")
display(Markdown(f"{llamacloud_response}"))

print("\n-----Naive RAG------")
display(Markdown(f"{raw_response}"))


-----QUERY------



How much revenue did Microsoft generate from Server products and cloud services in fiscal year 2023 and compare it with 2022, 2021?



-----LlamaCloud------


Microsoft generated $79,970 million in revenue from Server products and cloud services in fiscal year 2023. Comparatively, in 2022, the revenue from Server products and cloud services was $67,350 million, and in 2021, it was $52,589 million.


-----Naive RAG------


Microsoft generated $79.970 billion in revenue from Server products and cloud services in fiscal year 2023. This revenue increased from $67.350 billion in 2022 and $52.589 billion in 2021.

In [248]:
query = """
What is the Common stock repurchased retained earnings by the end of fiscal year 2023?
"""

llamacloud_response = llamacloud_query_engine.query(query)
raw_response = raw_query_engine.query(query)

print("\n-----QUERY------")
display(Markdown(f"{query}"))

print("\n-----LlamaCloud------")
display(Markdown(f"{llamacloud_response}"))

print("\n-----Naive RAG------")
display(Markdown(f"{raw_response}"))


-----QUERY------



What is the Common stock repurchased retained earnings by the end of fiscal year 2023?



-----LlamaCloud------


$17,568


-----Naive RAG------


The Common stock repurchased retained earnings by the end of fiscal year 2023 was $22.3 billion.

In [249]:
query="""
What is the Common stock repurchased in retained earnings by the end of fiscal year 2023. Compare it with 2022 and 2021 ?
"""

# Page - 62

llamacloud_response = llamacloud_query_engine.query(query)
raw_response = raw_query_engine.query(query)

print("\n-----QUERY------")
display(Markdown(f"{query}"))

print("\n-----LlamaCloud------")
display(Markdown(f"{llamacloud_response}"))

print("\n-----Naive RAG------")
display(Markdown(f"{raw_response}"))


-----QUERY------



What is the Common stock repurchased in retained earnings by the end of fiscal year 2023. Compare it with 2022 and 2021 ?



-----LlamaCloud------


In fiscal year 2023, the Common stock repurchased in retained earnings by the end of the year was $17,568. In comparison, in 2022 it was $26,960, and in 2021 it was $21,879.


-----Naive RAG------


The common stock repurchased in retained earnings by the end of fiscal year 2023 was 7,432 million shares. This is lower than the number of shares repurchased in retained earnings by the end of fiscal year 2022, which was 7,464 million shares, and by the end of fiscal year 2021, which was 7,519 million shares.