In [1]:
#!pip install llama-index llama-index-packs-raptor llama-index-vector-stores-chroma chromadb llama-parse python-dotenv nest_asyncio

In [19]:
from llama_index.packs.raptor import RaptorPack

# optionally download the pack to inspect/modify it yourself!
# from llama_index.core.llama_pack import download_llama_pack
# RaptorPack = download_llama_pack("RaptorPack", "./raptor_pack")

In [20]:
# Load API keys from a .env file into environment variables
import os
from dotenv import load_dotenv
load_dotenv()

openai_api_key = os.getenv('OPENAI_API_KEY')
llama_cloud_api_key = os.getenv('LLAMA_CLOUD_API_KEY')
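If a key is missing, `os.getenv` silently returns `None` and the problem only surfaces later, inside an OpenAI or LlamaParse call. A small helper (hypothetical, not part of any library used here) can fail early with a readable message instead:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or shell environment")
    return value
```

Usage in the cell above would be `openai_api_key = require_env("OPENAI_API_KEY")`, called after `load_dotenv()`.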

In [21]:
import nest_asyncio

nest_asyncio.apply()
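Jupyter already runs an asyncio event loop, and code that tries to start a new loop from inside it (for example via `asyncio.run()`) fails. The stdlib-only sketch below reproduces that failure; `fetch_value` and `notebook_cell` are hypothetical stand-ins for an async library call and a notebook cell. `nest_asyncio.apply()` patches asyncio so such nested calls are allowed.

```python
import asyncio


async def fetch_value() -> int:
    # Hypothetical stand-in for an async library call (e.g. a parsing or embedding request)
    await asyncio.sleep(0)
    return 42


async def notebook_cell() -> str:
    # Inside a running loop, like code executing in a Jupyter cell
    coro = fetch_value()
    try:
        asyncio.run(coro)  # starting a second loop here is rejected
        return "ok"
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)


message = asyncio.run(notebook_cell())
print(message)
```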

In [22]:
from llama_parse import LlamaParse
from pathlib import Path

In [23]:
# This constructs a Path object for the "data" directory.
data_dir = Path('data')

# This constructs the full path to the document within the "data" directory.
file_path = data_dir / 'uber_10q_march_2022.pdf'

In [15]:
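# Parse the PDF with LlamaParse into markdown (needs LLAMA_CLOUD_API_KEY).
# Note: the RAPTOR pack below is built from the SimpleDirectoryReader documents, not these.
documents = LlamaParse(result_type="markdown").load_data(file_path)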

Started parsing the file under job_id f596fc38-4c27-4624-807a-678732f82433


In [24]:
from llama_index.core import SimpleDirectoryReader

documents_simple = SimpleDirectoryReader(input_files=[file_path]).load_data()

In [25]:
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

client = chromadb.PersistentClient(path="./data/uber_db")
collection = client.get_or_create_collection("uber")

vector_store = ChromaVectorStore(chroma_collection=collection)

raptor_pack = RaptorPack(
    documents_simple,
    embed_model=OpenAIEmbedding(
        model="text-embedding-3-small"
    ),  # used for embedding clusters
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),  # used for generating summaries
    vector_store=vector_store,  # used for storage
    similarity_top_k=2,  # top k for each layer, or overall top-k for collapsed
    mode="collapsed",  # sets default mode
    transformations=[
        SentenceSplitter(chunk_size=400, chunk_overlap=50)
    ],  # transformations applied for ingestion
)

Generating embeddings for level 0.
Performing clustering for level 0.
Generating summaries for level 0 with 62 clusters.
Level 0 created summaries/clusters: 62
Generating embeddings for level 1.
Performing clustering for level 1.
Generating summaries for level 1 with 11 clusters.
Level 1 created summaries/clusters: 11
Generating embeddings for level 2.
Performing clustering for level 2.
Generating summaries for level 2 with 1 clusters.
Level 2 created summaries/clusters: 1
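The log shows the RAPTOR tree being built bottom-up: embed the chunks, cluster them, summarize each cluster with the LLM, then repeat on the summaries until one cluster remains (62, then 11, then 1). The sketch below illustrates that loop in plain Python; it is not the pack's implementation. The RAPTOR paper uses soft Gaussian-mixture clustering over UMAP-reduced embeddings, whereas this toy swaps in a tiny hard k-means, uses made-up 2-D embeddings, replaces the LLM with string concatenation (`summarize` is hypothetical), and embeds each summary by averaging its members' vectors.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    embedding: list
    level: int
    children: list = field(default_factory=list)


def kmeans(vectors, k, iters=10, seed=0):
    """Tiny k-means returning one cluster label per vector (stand-in for GMM/UMAP)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # Move each centroid to the mean of its members (empty clusters keep their centroid)
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def summarize(texts):
    # Hypothetical stand-in for the LLM summarization call
    return "SUMMARY(" + "; ".join(texts) + ")"


def build_tree(leaves, branching=3):
    """Cluster and summarize level by level until a single root node remains."""
    levels = [leaves]
    current = leaves
    while len(current) > 1:
        k = math.ceil(len(current) / branching)  # always < len(current) when len >= 2
        labels = kmeans([n.embedding for n in current], k)
        parents = []
        for c in sorted(set(labels)):
            members = [n for n, lab in zip(current, labels) if lab == c]
            # Stand-in for embedding the summary: average the members' embeddings
            emb = [sum(xs) / len(members) for xs in zip(*(m.embedding for m in members))]
            parents.append(Node(summarize([m.text for m in members]), emb, current[0].level + 1, members))
        levels.append(parents)
        current = parents
    return levels


leaves = [Node(f"chunk {i}", [float(i % 4), float(i // 4)], 0) for i in range(12)]
levels = build_tree(leaves)
print([len(level) for level in levels])
```

Every node lands in exactly one cluster here, so each level partitions the one below it; RAPTOR's soft clustering allows a chunk to appear under more than one summary.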


In [26]:
nodes = raptor_pack.run("What were revenue and earnings for Uber in the current quarter of 2022 and the same quarter in the prior year 2021?", mode="collapsed")
print(len(nodes))
print(nodes[0].text)

2
Uber Technologies, Inc.'s condensed consolidated financial statements for the three months ended March 31, 2021 and 2022 show a significant increase in revenue from $2.9 billion to $6.9 billion. However, costs and expenses also rose substantially from $4.4 billion to $7.3 billion, resulting in a net loss attributable to Uber Technologies, Inc. of $108 million in 2021 and $5.93 billion in 2022. The company experienced a loss from operations of $1.52 billion in 2021, which improved to $482 million in 2022. Other income (expense) also varied greatly, from $1.71 billion in income in 2021 to a loss of $5.56 billion in 2022. The comprehensive income (loss) attributable to Uber Technologies, Inc. was $1.08 billion in 2021 and a loss of $5.91 billion in 2022. The company's free cash flow was -$682 million in 2021 and -$47 million in 2022. Operating activities saw a shift from net cash used of $611 million in 2021 to net cash provided of $15 million in 2022. Investing activities involved net 
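In `collapsed` mode the tree is flattened: leaf chunks and summaries from every level compete in one similarity search, which is why the first node printed above reads like a generated summary rather than a raw chunk. A minimal sketch, assuming cosine similarity and using made-up nodes and 2-D embeddings:

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def collapsed_retrieve(query_emb, levels, top_k=2):
    """Rank every node from every level together and keep the overall top_k."""
    pool = [node for level in levels for node in level]
    return sorted(pool, key=lambda n: cosine(query_emb, n["embedding"]), reverse=True)[:top_k]


# Hypothetical tree: level 0 holds chunks, level 1 holds a summary
levels = [
    [
        {"text": "Q1 revenue table", "embedding": [1.0, 0.0]},
        {"text": "Driver incentives", "embedding": [0.0, 1.0]},
        {"text": "Cash flow table", "embedding": [0.8, 0.6]},
    ],
    [{"text": "Summary: financial results", "embedding": [0.9, 0.1]}],
]
hits = collapsed_retrieve([1.0, 0.0], levels, top_k=2)
print([h["text"] for h in hits])  # → ['Q1 revenue table', 'Summary: financial results']
```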

In [27]:
nodes = raptor_pack.run(
    "What were revenue and earnings for Uber in the current quarter of 2022 and the same quarter in the prior year 2021?", mode="tree_traversal"
)
print(len(nodes))
print(nodes[0].text)

Retrieved parent IDs from level 2: ['aadc9272-5a92-4310-890d-374b80319327']
Retrieved 2 from parents at level 2.
Retrieved parent IDs from level 1: ['bc515f30-3b36-4428-86b4-863e40005dd7', '65b2191d-edde-4bcd-9de3-0b304e77a73d']
Retrieved 4 from parents at level 1.
Retrieved parent IDs from level 0: ['153ce4b7-bde7-4e86-a655-3ff537c13867', '8e7eafb6-c5d5-458d-8659-0aa5490c0d3b']
Retrieved 4 from parents at level 0.
4
$ (108)$ (5,930)
The following table sets forth the components of our condensed consolidated statements of operations for each of the periods presented as a percentage of
revenue :
Three Months Ended March 31,
2021 2022
Revenue 100 % 100 %
Costs and expenses
Cost of revenue, exclusive of depreciation and amortization shown separately below 59 % 59 %
Operations and support 15 % 8 %
Sales and marketing 38 % 18 %
Research and development 18 % 9 %
General and administrative 16 % 9 %
Depreciation and amortization 7 % 4 %
Total costs and expenses 152 % 107 %
Loss from operations
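`tree_traversal` instead starts at the top level and only descends into the children of the nodes it selects, so the final hits are leaf chunks such as the percentage-of-revenue table above. The sketch below follows the RAPTOR paper's description (pick the `top_k` most similar nodes among the current candidates, then move to their children). The pack's exact bookkeeping differs (its log shows 4 nodes retrieved from 2 parents at each level), and the tree and embeddings here are made up.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tree_traversal(query_emb, roots, top_k=2):
    """Descend from the roots, keeping the top_k most similar nodes at each level."""
    frontier = roots
    selected = []
    while frontier:
        selected = sorted(frontier, key=lambda n: cosine(query_emb, n["embedding"]), reverse=True)[:top_k]
        # Only children of the selected nodes are considered at the next level down
        frontier = [child for node in selected for child in node.get("children", [])]
    return selected  # selections at the deepest level reached


# Hypothetical three-level tree
leaf_a = {"text": "Revenue table", "embedding": [1.0, 0.0]}
leaf_b = {"text": "Cash flow table", "embedding": [0.7, 0.7]}
leaf_c = {"text": "Legal proceedings", "embedding": [0.0, 1.0]}
leaf_d = {"text": "Risk factors", "embedding": [0.1, 0.9]}
fin = {"text": "Summary: financials", "embedding": [0.85, 0.35], "children": [leaf_a, leaf_b]}
legal = {"text": "Summary: legal and risk", "embedding": [0.05, 0.95], "children": [leaf_c, leaf_d]}
root = {"text": "Summary: Q1 2022 10-Q", "embedding": [0.5, 0.5], "children": [fin, legal]}

hits = tree_traversal([1.0, 0.0], [root], top_k=1)
print([h["text"] for h in hits])  # → ['Revenue table']
```

Unlike the collapsed search, the legal branch is pruned at the second level, so its leaves are never scored.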

In [29]:
# Reloading: the tree was persisted to the Chroma vector store, so a retriever can be rebuilt from it without re-ingesting any documents
from llama_index.packs.raptor import RaptorRetriever

retriever = RaptorRetriever(
    [],
    embed_model=OpenAIEmbedding(
        model="text-embedding-3-small"
    ),  # used for embedding clusters
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),  # used for generating summaries
    vector_store=vector_store,  # used for storage
    similarity_top_k=2,  # top k for each layer, or overall top-k for collapsed
    mode="tree_traversal",  # sets default mode
)

In [30]:
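# Query engine: wraps the pack's retriever (collapsed mode) with an LLM that synthesizes the answer
# (the reloaded `retriever` above could be passed here instead)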
from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine.from_args(
    raptor_pack.retriever, llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1)
)
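Conceptually, the query engine retrieves nodes with the RAPTOR retriever, inserts their text into a prompt, and asks the LLM to answer from that context. The function below is a hypothetical, simplified version of that "stuff the context" step; the prompt wording is an assumption and not LlamaIndex's actual template.

```python
def build_qa_prompt(question: str, contexts: list) -> str:
    """Hypothetical prompt: concatenate retrieved texts, then ask the question."""
    context_block = "\n---\n".join(contexts)
    return (
        "Context:\n"
        f"{context_block}\n"
        "Using only the context above, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_qa_prompt(
    "What was Uber's revenue in the three months ended March 31, 2021 and 2022?",
    ["Revenue increased from $2.9 billion to $6.9 billion.", "Total costs and expenses 152 % 107 %"],
)
print(prompt)
```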

In [31]:
response = query_engine.query("What was Uber's revenue in the three months ended March 31, 2021 and 2022?")

In [32]:
print(str(response))

Uber's revenue in the three months ended March 31, 2021 was $2.9 billion, and in the three months ended March 31, 2022, it was $6.9 billion.
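The answer can be sanity-checked against other outputs in this notebook: total costs and expenses were 152% and 107% of revenue (tree-traversal output) and $4.4 billion and $7.3 billion (collapsed-mode summary). The arithmetic below uses only those rounded figures, so agreement is expected only to within rounding.

```python
revenue = {2021: 2.9, 2022: 6.9}         # $ billions, from the response above
cost_pct = {2021: 1.52, 2022: 1.07}      # total costs and expenses as a share of revenue
reported_costs = {2021: 4.4, 2022: 7.3}  # $ billions, from the collapsed-mode summary

growth = (revenue[2022] - revenue[2021]) / revenue[2021]
print(f"YoY revenue growth: {growth:.0%}")  # → YoY revenue growth: 138%

implied_costs = {year: revenue[year] * cost_pct[year] for year in revenue}
for year in revenue:
    print(year, round(implied_costs[year], 2), reported_costs[year])
```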


In [39]:
response = query_engine.query("What are Uber's assets?")

In [40]:
print(str(response))

Uber's assets include total assets of $32,812 million, with cash and cash equivalents amounting to $4,184 million.
