<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/usecases/10k_sub_question.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 10K Analysis
In this demo, we explore answering complex queries by decomposing them into simpler sub-queries.

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [24]:
%pip install -q llama-index-llms-openai

In [2]:
!pip install -q llama-index

In [3]:
import nest_asyncio

nest_asyncio.apply()

In [4]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI

from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine

In [5]:
from google.colab import userdata
my_secret_key = userdata.get('OPENAI_API_KEY')

## Configure LLM service

In [6]:
import os

os.environ["OPENAI_API_KEY"] = my_secret_key

In [7]:
from llama_index.core import Settings

Settings.llm = OpenAI(temperature=0.2, model="gpt-3.5-turbo")

## Download Data

In [8]:
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'


## Load data

In [9]:
lyft_docs = SimpleDirectoryReader(
    input_files=["./data/10k/lyft_2021.pdf"]
).load_data()
uber_docs = SimpleDirectoryReader(
    input_files=["./data/10k/uber_2021.pdf"]
).load_data()

In [11]:
len(lyft_docs)

238

## Build indices

In [12]:
lyft_index = VectorStoreIndex.from_documents(lyft_docs)

In [13]:
uber_index = VectorStoreIndex.from_documents(uber_docs)



## Build query engines

In [14]:
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)

In [15]:
uber_engine = uber_index.as_query_engine(similarity_top_k=3)

In [16]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name="lyft_10k",
            description=(
                "Provides information about Lyft financials for year 2021"
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(
            name="uber_10k",
            description=(
                "Provides information about Uber financials for year 2021"
            ),
        ),
    ),
]

s_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools
)
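Conceptually, an engine like the one built above breaks a query into sub-questions, routes each one to a named tool, and then combines the intermediate answers. The toy sketch below illustrates that flow in plain Python. It is not LlamaIndex's implementation: `SubQuestion`, `run_sub_questions`, `combine`, and the stub tools are hypothetical names introduced only for illustration, and the decomposition is hand-written here, whereas the real engine asks the LLM to generate it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class SubQuestion:
    """A sub-question paired with the name of the tool that should answer it."""

    tool_name: str
    text: str


def run_sub_questions(
    sub_questions: List[SubQuestion],
    tools: Dict[str, Callable[[str], str]],
) -> List[Tuple[SubQuestion, str]]:
    """Route each sub-question to its tool and collect (question, answer) pairs."""
    return [(sq, tools[sq.tool_name](sq.text)) for sq in sub_questions]


def combine(pairs: List[Tuple[SubQuestion, str]]) -> str:
    """Stand-in for the final synthesis step: join the intermediate answers."""
    return "\n".join(f"[{sq.tool_name}] {answer}" for sq, answer in pairs)


# Stub tools that only echo the question; real tools would query an index.
toy_tools = {
    "lyft_10k": lambda q: f"(Lyft filing) {q}",
    "uber_10k": lambda q: f"(Uber filing) {q}",
}

# Hand-written decomposition (hypothetical); an LLM would normally produce this.
toy_sub_questions = [
    SubQuestion("lyft_10k", "What was Lyft's revenue in 2021?"),
    SubQuestion("uber_10k", "What was Uber's revenue in 2021?"),
]

toy_pairs = run_sub_questions(toy_sub_questions, toy_tools)
print(combine(toy_pairs))
```

The tool names mirror the `ToolMetadata` names used above so the mapping between a sub-question and its data source is easy to follow.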

## Run queries

In [20]:
response = s_engine.query(
    "Compare and contrast the customer segments and geographies that grew the"
    " fastest"
)

Generated 4 sub questions.
[lyft_10k] Q: What were the customer segments that grew the fastest for Lyft in 2021?
[lyft_10k] Q: What were the geographies that grew the fastest for Lyft in 2021?
[uber_10k] Q: What were the customer segments that grew the fastest for Uber in 2021?
[uber_10k] Q: What were the geographies that grew the fastest for Uber in 2021?
[uber_10k] A: Chicago, Miami, New York City in the United States, Sao Paulo in Brazil, and London in the United Kingdom were the geographies that grew the fastest for Uber in 2021.
[lyft_10k] A: The customer segments that grew the fastest for Lyft in 2021 were likely those related to their network of Light Vehicles, as well as their bike and scooter sharing services.
[uber_10k] A: The customer segments that grew the fastest for Uber in 2021 were the membership programs, specifically Uber One, Uber Pass, Eats Pass, and Rides Pass.
[lyft_10k] A: The geographies that grew the fastest for Lyft in 2021 were the communities that fully reopened as vaccines were more widely distributed, resulting in a 36% increase in revenue compared to the prior year and a 49.2% increase in the number of Active Riders in the fourth quarter of 2021 compared to the fourth quarter of 2020.

In [21]:
print(response)

The customer segments that experienced the fastest growth for Lyft in 2021 were likely related to their network of Light Vehicles, bike, and scooter sharing services. In contrast, Uber saw the fastest growth in customer segments through membership programs such as Uber One, Uber Pass, Eats Pass, and Rides Pass.

Regarding the geographies that grew the fastest, Lyft experienced growth in communities that fully reopened as vaccines became more widely distributed, resulting in increased revenue and active riders. On the other hand, Uber's fastest-growing geographies in 2021 were Chicago, Miami, New York City in the United States, Sao Paulo in Brazil, and London in the United Kingdom.


In [22]:
response = s_engine.query(
    "Compare revenue growth of Uber and Lyft from 2020 to 2021"
)

Generated 4 sub questions.
[uber_10k] Q: What was the revenue of Uber in 2020?
[uber_10k] Q: What was the revenue of Uber in 2021?
[lyft_10k] Q: What was the revenue of Lyft in 2020?
[lyft_10k] Q: What was the revenue of Lyft in 2021?
[uber_10k] A: The revenue of Uber in 2021 was $17,455 million.
[lyft_10k] A: $3,208,323
[lyft_10k] A: Lyft's revenue in 2020 was $2,364,681.
[uber_10k] A: $11,139

In [23]:
print(response)

Uber's revenue grew by $6,316 million from 2020 to 2021, while Lyft's revenue increased by $843,642 from 2020 to 2021.
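
The sub-question answers above do not share a unit: Uber's 2021 figure is stated in millions, while the Lyft answers give no unit at all, so the absolute differences in the response are hard to compare directly. Percentage growth does not depend on the unit, so a quick check like the sketch below may be a more useful comparison. The helper name `growth` is hypothetical, and the figures are copied verbatim from the sub-question answers rather than verified against the filings.

```python
from typing import Tuple


def growth(previous: float, current: float) -> Tuple[float, float]:
    """Return (absolute change, percentage change) between two periods."""
    change = current - previous
    return change, 100.0 * change / previous


# Figures as reported in the sub-question answers, each in its own unit.
uber_change, uber_pct = growth(11_139, 17_455)
lyft_change, lyft_pct = growth(2_364_681, 3_208_323)

print(f"Uber: +{uber_change:,.0f} ({uber_pct:.1f}%)")
print(f"Lyft: +{lyft_change:,.0f} ({lyft_pct:.1f}%)")
```

On these inputs the percentages come out to roughly 56.7% for Uber and 35.7% for Lyft; the latter is in line with the 36% revenue increase quoted in the earlier Lyft answer.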
