# Sub Question Query Engine

In this tutorial, we showcase how to use a **sub question query engine** to answer a complex query that draws on multiple data sources.
The engine first breaks the complex query down into sub questions for each relevant data source, then gathers all the intermediate responses and synthesizes a final response.
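
The decompose, answer, synthesize flow described above can be sketched in plain Python. This is only an illustration of the idea, not how `SubQuestionQueryEngine` is implemented: `SubQuestion`, `answer_with_sub_questions`, and the `toy_*` helpers are hypothetical names, and the toy functions stand in for what would be LLM and query engine calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubQuestion:
    tool_name: str      # name of the data source that should answer
    sub_question: str   # the narrower question routed to that source


def answer_with_sub_questions(
    query: str,
    decompose: Callable[[str], List[SubQuestion]],
    tools: Dict[str, Callable[[str], str]],
    synthesize: Callable[[str, List[str]], str],
) -> str:
    # 1. Break the complex query into sub questions, one per relevant source.
    sub_questions = decompose(query)
    # 2. Route each sub question to its tool and gather the intermediate answers.
    answers = [tools[sq.tool_name](sq.sub_question) for sq in sub_questions]
    # 3. Combine the intermediate answers into one final response.
    return synthesize(query, answers)


# Toy stand-ins (hypothetical); a real engine would call an LLM here.
def toy_decompose(query: str) -> List[SubQuestion]:
    return [
        SubQuestion("pg_essay", f"{query} (before YC)"),
        SubQuestion("pg_essay", f"{query} (after YC)"),
    ]


def toy_tool(question: str) -> str:
    return f"answer to: {question}"


def toy_synthesize(query: str, answers: List[str]) -> str:
    return " | ".join(answers)


final = answer_with_sub_questions(
    "What changed?", toy_decompose, {"pg_essay": toy_tool}, toy_synthesize
)
print(final)
```

Passing the three stages in as callables keeps the sketch independent of any particular LLM or index; the rest of this notebook relies on the library to supply all three.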

# Preparation
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [1]:
import os

from dotenv import load_dotenv

# Load environment variables (e.g. OPENAI_API_KEY) from a local .env file
load_dotenv()

In [2]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.core import Settings

In [3]:
import nest_asyncio

nest_asyncio.apply()


In [4]:
# Using the LlamaDebugHandler to print the trace of the sub questions
# captured by the SUB_QUESTION callback event type
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])

Settings.callback_manager = callback_manager


In [5]:

from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

In [None]:
os.makedirs('data/paul_graham/', exist_ok=True)


In [None]:
!curl -o "data/paul_graham/paul_graham_essay.txt" "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt"

In [None]:
# load data
pg_essay = SimpleDirectoryReader(input_dir="./data/paul_graham/").load_data()

pg_vector_index = VectorStoreIndex.from_documents(
    pg_essay,
    use_async=True,
    show_progress=True,
)


In [None]:
pg_vector_index.storage_context.persist(persist_dir="./pg_vector_index")

# Load the Index (from storage)

When you need to use the index again, you can load it from the persisted storage with `load_index_from_storage` instead of re-indexing.

In [7]:
from llama_index.core import StorageContext, load_index_from_storage

# Rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="pg_vector_index")

# Load the index
pg_vector_index = load_index_from_storage(storage_context)

**********
Trace: index_construction
**********
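
The persist-then-reload steps above are often combined into a single "load if present, otherwise build" helper. The sketch below shows that pattern with the standard library only; `load_or_build` and its `load` and `build` callables are hypothetical names, and the lambdas are placeholders for the real loading and indexing calls.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(
    persist_dir: str,
    load: Callable[[str], T],
    build: Callable[[str], T],
) -> T:
    """Call load(persist_dir) if the directory holds files, else build(persist_dir)."""
    if os.path.isdir(persist_dir) and os.listdir(persist_dir):
        return load(persist_dir)
    return build(persist_dir)


# Demo with placeholder callables (assumptions, not real index objects).
with tempfile.TemporaryDirectory() as tmp:
    # The directory is empty, so the build path is taken.
    first = load_or_build(tmp, load=lambda d: "loaded", build=lambda d: "built")
    # Simulate persisting something into the directory.
    open(os.path.join(tmp, "index.json"), "w").close()
    # The directory now has content, so the load path is taken.
    second = load_or_build(tmp, load=lambda d: "loaded", build=lambda d: "built")

print(first, second)
```

In this notebook, `load` would wrap the `StorageContext.from_defaults(persist_dir=...)` and `load_index_from_storage(...)` calls, and `build` would wrap `VectorStoreIndex.from_documents(...)` followed by `storage_context.persist(...)`.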


In [8]:
pg_vector_index

<llama_index.core.indices.vector_store.base.VectorStoreIndex at 0x1ebf12997b0>

In [9]:
# build a query engine on top of the loaded index
vector_query_engine = pg_vector_index.as_query_engine()

# Set up the sub question query engine


In [15]:
# setup base query engine as tool
query_engine_tools = [
    QueryEngineTool(
        query_engine=vector_query_engine,
        metadata=ToolMetadata(
            name="pg_essay",
            description="Paul Graham essay on What I Worked On",
        ),
    ),
]

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    use_async=True,
    verbose=True,
)

In [17]:
response = query_engine.query(
    "How was Paul Graham's life different before, during, and after YC?"
)

Generated 5 sub questions.
[pg_essay] Q: What were the key events in Paul Graham's life before he founded Y Combinator?
[pg_essay] Q: What projects or essays did Paul Graham work on during his time at Y Combinator?
[pg_essay] Q: How did Paul Graham's perspective on startups change during his time at Y Combinator?
[pg_essay] Q: What impact did Y Combinator have on Paul Graham's career after its founding?
[pg_essay] Q: What are some notable essays Paul Graham wrote after his time at Y Combinator?
[pg_essay] A: The provided information does not specify any notable essays written by Paul Graham after his time at Y Combinator. It primarily discusses his experiences and insights during the establishment and operation of Y Combinator, as well as some reflections on venture capital and startup culture.
[pg_essay] A: 

In [18]:
print(response)

Paul Graham's life underwent significant changes before, during, and after Y Combinator. 

Before founding Y Combinator, he was primarily focused on his role as a founder of Viaweb, where he navigated the challenges of starting a company and gained valuable insights into the startup process. His experiences included interactions with mentors and a strong background in programming and writing, which shaped his understanding of the startup ecosystem.

During his time at Y Combinator, Graham transitioned into a more active role in the startup community. He shifted from being a software developer and essayist to becoming a prominent figure in venture capital, emphasizing the importance of providing hands-on support to early-stage founders. His work included developing projects like Hacker News and refining his ideas about startup funding and mentorship.

After Y Combinator's founding, Graham's career evolved further as he became an influential angel investor and a key player in shaping the

In [14]:
# iterate through sub_question items captured in SUB_QUESTION event
from llama_index.core.callbacks import CBEventType, EventPayload

for i, (start_event, end_event) in enumerate(
    llama_debug.get_event_pairs(CBEventType.SUB_QUESTION)
):
    qa_pair = end_event.payload[EventPayload.SUB_QUESTION]
    print(f"Sub Question {i}: {qa_pair.sub_q.sub_question.strip()}")
    print(f"Answer: {qa_pair.answer.strip()}")
    print("====================================")

Sub Question 0: What were the key events in Paul Graham's life before he founded Y Combinator?
Answer: Key events in Paul Graham's life before founding Y Combinator include his experiences as a startup founder, particularly with Viaweb, which he co-founded and later sold to Yahoo. During this time, he faced challenges such as the complexities of incorporating a company and navigating the startup landscape, which made him aware of the difficulties founders encounter. His interactions with Julian, who helped him with the incorporation process, inspired him to assist other startups in a similar manner. Additionally, his realization of the need for support for early-stage startups led to the development of the Summer Founders Program, where he invited undergraduates to create startups, ultimately resulting in the first batch that included notable companies like Reddit and Twitch. These experiences shaped his understanding of the startup ecosystem and laid the groundwork for the establishme