# Llama Parser <> LlamaIndex

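This notebook walks through using `LlamaParser` to build RAG applications with `LlamaIndex`. It compares a recursive retriever over element-parsed nodes against a baseline index built from the raw parsed documents.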

In [None]:
!pip install llama-index llama-parser sentence-transformers nest_asyncio

In [3]:
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_march_2022.pdf' -O './uber_10q_march_2022.pdf'

--2024-02-01 14:35:40--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_march_2022.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1260185 (1.2M) [application/octet-stream]
Saving to: ‘./uber_10q_march_2022.pdf’


2024-02-01 14:35:40 (18.6 MB/s) - ‘./uber_10q_march_2022.pdf’ saved [1260185/1260185]



In [1]:
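# llama-parser is async-first; running its sync API inside a notebook, which already
# runs an event loop, requires nest_asyncio
import nest_asyncio

nest_asyncio.apply()

import os

# Replace the placeholders with your own keys:
# LLAMA_CLOUD_API_KEY is read by LlamaParser, OPENAI_API_KEY by the OpenAI LLM and embeddings
os.environ["LLAMA_CLOUD_API_KEY"] = "llx-..."
os.environ["OPENAI_API_KEY"] = "sk-..."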
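Why `nest_asyncio` is needed can be shown with the standard library alone. This is a sketch, not LlamaParser's code: the hypothetical `aload_data`/`load_data` pair stands in for an async-first loader and its sync facade, under the assumption that the sync method is a thin wrapper around `asyncio.run()`. That wrapper works in a plain script but raises `RuntimeError` when an event loop is already running, as it is inside a Jupyter kernel; `nest_asyncio.apply()` patches asyncio so that nested call is allowed.

```python
import asyncio


async def aload_data(path: str) -> list[str]:
    """Hypothetical async loader standing in for an async-first parser."""
    await asyncio.sleep(0)  # placeholder for network I/O (upload, poll, download)
    return [f"parsed:{path}"]


def load_data(path: str) -> list[str]:
    """Hypothetical sync facade that drives the coroutine with asyncio.run()."""
    return asyncio.run(aload_data(path))


async def simulated_notebook_cell(path: str) -> str:
    """Mimics a notebook cell: an event loop is already running here."""
    coro = aload_data(path)
    try:
        asyncio.run(coro)  # refuses to start a loop inside a running one
    except RuntimeError as exc:
        coro.close()  # discard the never-started coroutine cleanly
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(load_data("uber_10q_march_2022.pdf"))  # fine in a plain script
    print(asyncio.run(simulated_notebook_cell("uber_10q_march_2022.pdf")))  # error message
```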

In [2]:
from llama_parser import LlamaParser

# Parse the PDF with LlamaParser, requesting markdown output
documents = LlamaParser(result_type="markdown").load_data('./uber_10q_march_2022.pdf')

In [3]:
print(documents[0].text[:1000] + '...')

# Document

# UNITED STATES SECURITIES AND EXCHANGE COMMISSION

Washington, D.C. 20549

## FORM 10-Q

(Mark One)

☒ QUARTERLY REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 For the quarterly period ended March 31, 2022 OR ☐ TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 For the transition period from_____ to _____ Commission File Number: 001-38902

### UBER TECHNOLOGIES, INC.

(Exact name of registrant as specified in its charter) Not Applicable (Former name, former address and former fiscal year, if changed since last report)

Delaware 45-2647441 (State or other jurisdiction of incorporation or organization) (I.R.S. Employer Identification No.)

1515 3rd Street San Francisco, California 94158 (Address of principal executive offices, including zip code) (415) 612-8582 (Registrant’s telephone number, including area code)

### Securities registered pursuant to Section 12(b) of the Act:

|Title of each class|Trading 

In [4]:
from llama_index.node_parser import MarkdownElementNodeParser
from llama_index.llms import OpenAI

# The LLM is used to summarize the tables found in the parsed markdown
node_parser = MarkdownElementNodeParser(llm=OpenAI(model="gpt-3.5-turbo"))

In [5]:
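# Split the markdown into text and table nodes
nodes = node_parser.get_nodes_from_documents(documents)
# base_nodes get embedded; node_mapping maps ids to the full nodes that index nodes reference
base_nodes, node_mapping = node_parser.get_base_nodes_and_mappings(nodes)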

Embeddings have been explicitly disabled. Using MockEmbedding.


100%|██████████| 60/60 [02:47<00:00,  2.79s/it]
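The element node parser in the previous cell separates the markdown produced by LlamaParser into text elements and table elements, so that tables can be summarized and indexed on their own. The function below is a simplified illustration of that first splitting step, not the library's actual parsing logic: `Element` and `split_markdown_elements` are hypothetical names, and the sketch only groups consecutive pipe-table lines.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One chunk of a markdown document: prose ("text") or a pipe table ("table")."""
    kind: str
    content: str


def split_markdown_elements(markdown: str) -> list[Element]:
    """Group consecutive pipe-table lines into table elements; everything else is text.

    Simplified illustration only: it ignores code fences, heading boundaries and
    malformed tables, which a real parser would have to handle.
    """
    elements: list[Element] = []
    buffer: list[str] = []
    current_kind = None

    def flush() -> None:
        # Emit the buffered lines as one element, skipping whitespace-only chunks.
        if buffer and current_kind is not None:
            content = "\n".join(buffer).strip()
            if content:
                elements.append(Element(current_kind, content))
        buffer.clear()

    for line in markdown.splitlines():
        kind = "table" if line.lstrip().startswith("|") else "text"
        if kind != current_kind:
            flush()  # close the previous run before switching kind
            current_kind = kind
        buffer.append(line)
    flush()
    return elements


if __name__ == "__main__":
    toy = "# Title\nIntro text.\n|col a|col b|\n|---|---|\n|1|2|\nMore text."  # toy input
    for element in split_markdown_elements(toy):
        print(element.kind, repr(element.content))
```

In LlamaIndex, roughly speaking, each table element is then summarized by the LLM and the summary is what gets embedded, while the full table stays reachable through `node_mapping`.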


In [7]:
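from llama_index import VectorStoreIndex, ServiceContext
from llama_index.embeddings import OpenAIEmbedding
from llama_index.llms import OpenAI

ctx = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo"),
    embed_model=OpenAIEmbedding(model="text-embedding-3-small"),
)

# Index over the element-parsed nodes (text chunks plus table summaries)
index = VectorStoreIndex(nodes=base_nodes, service_context=ctx)
# Baseline for comparison: index the raw parsed documents with default chunking
base_index = VectorStoreIndex.from_documents(documents, service_context=ctx)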

In [26]:
from llama_index.retrievers import RecursiveRetriever

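retriever = RecursiveRetriever(
    "vector",  # id of the root retriever in retriever_dict
    retriever_dict={
        "vector": index.as_retriever(similarity_top_k=10)
    },
    # Retrieved index nodes whose ids appear here are swapped for the full node
    node_dict=node_mapping,
)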
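`RecursiveRetriever` starts from the `"vector"` retriever and, when a retrieved node is a reference whose id appears in `node_dict`, returns the referenced node instead, here typically the full table behind a table summary. The sketch below mimics that control flow with toy data. Everything in it is hypothetical: the `Node` class, the helper names, the node ids, and the hand-made three-dimensional vectors, where cosine similarity stands in for the real `text-embedding-3-small` embeddings.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    node_id: str
    text: str
    embedding: list[float]
    ref_id: Optional[str] = None  # set on "index nodes" that point at another node


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_retrieve(query_emb: list[float], base_nodes: list[Node], top_k: int) -> list[Node]:
    """Dense retrieval: rank base nodes by cosine similarity and keep the top_k."""
    ranked = sorted(base_nodes, key=lambda n: cosine(query_emb, n.embedding), reverse=True)
    return ranked[:top_k]


def recursive_retrieve(query_emb: list[float], base_nodes: list[Node],
                       node_mapping: dict[str, Node], top_k: int = 10) -> list[Node]:
    """Retrieve from the base nodes, then replace each reference with its target node."""
    results = []
    for node in vector_retrieve(query_emb, base_nodes, top_k):
        # Follow references until reaching a concrete node (toy data has no cycles).
        while node.ref_id is not None:
            node = node_mapping[node.ref_id]
        results.append(node)
    return results


if __name__ == "__main__":
    # Toy data: a table summary that references a full table, plus a prose chunk.
    table = Node("table-1", "<full markdown table>", [0.0, 0.0, 1.0])
    summary = Node("summary-1", "Summary of a metrics table", [0.9, 0.1, 0.0], ref_id="table-1")
    prose = Node("text-1", "Narrative text about the pandemic", [0.1, 0.9, 0.0])
    hits = recursive_retrieve([1.0, 0.0, 0.0], [summary, prose], {"table-1": table}, top_k=2)
    print([n.node_id for n in hits])  # the summary hit is replaced by the full table
```

The design point is that the small summary is what gets matched against the query, while the full table is what reaches the LLM.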

In [27]:
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.postprocessor import SentenceTransformerRerank

# Cross-encoder reranker that keeps the 2 best of the retrieved candidates
reranker = SentenceTransformerRerank(top_n=2, model="BAAI/bge-reranker-large")

query_engine = RetrieverQueryEngine.from_args(retriever, node_postprocessors=[reranker], service_context=ctx)

# Baseline engine over the naively chunked documents, with the same reranker
base_query_engine = base_index.as_query_engine(similarity_top_k=10, node_postprocessors=[reranker], service_context=ctx)
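Both engines retrieve 10 candidates and then let `SentenceTransformerRerank` keep the `top_n=2` best. That reranker is a cross-encoder (`BAAI/bge-reranker-large`) which scores each (query, passage) pair jointly. The stdlib-only sketch below shows the two-stage shape of that step; `overlap_score` is a hypothetical stand-in for the cross-encoder, chosen only so the example runs without downloading a model, and `rerank` is likewise an illustrative name.

```python
import re
from typing import Callable


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for real tokenization."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, passage: str) -> float:
    """Hypothetical relevance score: fraction of query words found in the passage.

    A real cross-encoder reads query and passage together through a transformer;
    this only mimics its interface (a pair in, a score out).
    """
    q = tokenize(query)
    return len(q & tokenize(passage)) / len(q) if q else 0.0


def rerank(query: str, candidates: list[str], top_n: int,
           score: Callable[[str, str], float] = overlap_score) -> list[str]:
    """Second stage: rescore every first-stage candidate against the query, keep top_n."""
    ranked = sorted(candidates, key=lambda p: score(query, p), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    candidates = [  # toy first-stage results
        "revenue grew in one regional market",
        "pandemic impact on driver supply",
        "monthly active platform consumers declined",
    ]
    print(rerank("monthly active platform consumers", candidates, top_n=2))
```

Keeping the first stage wide and the second narrow is the usual trade-off: the cheap vector search favors recall, and the more expensive pairwise scorer only has to run on 10 pairs to decide what the LLM sees.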

### Table Query

In [28]:
response = base_query_engine.query("What was the change in monthly active platform consumers?")
print(str(response))

The change in monthly active platform consumers was a decline of 3 million, or 3%, quarter-over-quarter, but a growth of 17% compared to the same period in 2021.


That was not helpful.

In [29]:
response = query_engine.query("What was the change in monthly active platform consumers?")
print(str(response))

The change in monthly active platform consumers from Q1 2021 to Q1 2022 was a decrease of 3 million, or 3%.


Correct!

In [30]:
response = base_query_engine.query("Which market was the primary driver of revenue growth?")
print(str(response))

Based on the given context information, it is not possible to determine which market was the primary driver of revenue growth. The context does not provide specific information about revenue growth in different markets.


That's not helpful either.

In [31]:
response = query_engine.query("Which market was the primary driver of revenue growth?")
print(str(response))

The primary driver of revenue growth was the United States and Canada ("US&CAN") market.


Correct!

### General Query

In [32]:
response = base_query_engine.query("What is the impact of the COVID-19 pandemic on business?")
print(str(response))

The COVID-19 pandemic has had an adverse impact on the business, financial condition, and results of operations. The pandemic has resulted in travel restrictions, business restrictions, school closures, and limitations on social gatherings, which have reduced the demand for the company's Mobility offerings globally. Additionally, there have been driver supply constraints, with consumer demand for Mobility recovering faster than driver availability. The company has temporarily suspended its shared rides offering in many regions to support social distancing. The pandemic has also required significant actions, such as workforce reductions and changes to pricing models, to mitigate its impact on the company's financial results. The ultimate impact of the pandemic on the business and financial results is uncertain and depends on future developments, including the duration of the outbreak, the administration and efficacy of vaccines, and global economic conditions.


In [34]:
response = query_engine.query("What is the impact of the COVID-19 pandemic on business?")
print(str(response))

The COVID-19 pandemic has adversely affected the business's near-term financial results and may continue to impact its long-term financial results. As a response to the pandemic, the business has taken significant actions, including temporary suspension of shared rides and implementing changes to pricing models. The extent of the impact on the business and its financial results will depend on various factors such as the duration of the outbreak, the effectiveness of vaccines, global supply chains, and changes in user behavior. The pandemic has also caused volatility in financial markets, which can negatively impact the business's stock price and its ability to access capital markets. Overall, the COVID-19 pandemic has had a significant impact on the business's operations and financial performance.
