<div style="background-color:#000;"><img src="pqn.png"></img></div>

## Load essential libraries and prepare the environment

We start by importing necessary libraries and loading environment variables. This sets up our working environment for using language models and document processing.

In [None]:
from langchain_openai import ChatOpenAI

In [None]:
from llama_index.core import (
    StorageContext,
    VectorStoreIndex,
    SimpleDirectoryReader,
    load_index_from_storage,
)

In [None]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
from dotenv import load_dotenv

In [None]:
load_dotenv()

These imports cover two libraries: `langchain_openai` for the OpenAI chat model and `llama_index` for reading, indexing, and querying documents. `load_dotenv()` reads key-value pairs from a `.env` file into the process environment, which is where credentials such as an OpenAI API key are typically kept.
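If a credential is missing, the failure usually surfaces later, at the first API call. A minimal sketch of an early check is shown here. It assumes the key is stored under `OPENAI_API_KEY`, and the helper `missing_env_vars` is hypothetical, not part of the original notebook.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Assumption: the OpenAI key lives in OPENAI_API_KEY. Adjust the list
# to match whatever your own .env file defines.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set.")
```

Run this right after `load_dotenv()` so a missing key is reported before any model or index is created.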

## Initialize the language model and load the document

Next, we initialize the language model and load the NVDA 10-K document for processing.

In [None]:
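# max_tokens=None leaves the completion length uncapped; -1 is not a valid token limit for the chat API
llm = ChatOpenAI(temperature=0, model_name="gpt-4o", max_tokens=None)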

In [None]:
doc = SimpleDirectoryReader(input_files=["nvda.pdf"]).load_data()
print(f"Loaded NVDA 10-K with {len(doc)} pages")

Setting `temperature=0` keeps the model's output as repeatable as possible, and `model_name` selects GPT-4o. `SimpleDirectoryReader` reads the PDF and returns a list of document objects, which the print statement reports as pages to confirm the file loaded. Note that `llm` is not passed to any of the LlamaIndex calls that follow, so the query engines below run with LlamaIndex's own default model settings.
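A page count alone does not show whether text was actually extracted, since scanned pages can come back blank. The sketch below flags blank pages. It assumes each loaded object exposes `.text` and a `.metadata` dict, and that PDF pages carry a `page_label` entry. The helper `empty_pages` and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace


def empty_pages(documents):
    """Return page labels (or list positions) of documents with blank text."""
    blanks = []
    for position, document in enumerate(documents, start=1):
        if not (document.text or "").strip():
            # Assumption: PDFs carry a "page_label" metadata entry;
            # fall back to the list position when it is absent.
            blanks.append(document.metadata.get("page_label", str(position)))
    return blanks


# Stand-in objects so the sketch runs without the PDF. In the notebook,
# pass the `doc` list returned by SimpleDirectoryReader instead.
sample = [
    SimpleNamespace(text="Item 1. Business", metadata={"page_label": "1"}),
    SimpleNamespace(text="   ", metadata={"page_label": "2"}),
    SimpleNamespace(text="", metadata={}),
]
print(empty_pages(sample))  # ['2', '3']
```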

## Create the document index and query engine

We create an index from the loaded document and set up a query engine to interact with the document.

In [None]:
index = VectorStoreIndex.from_documents(doc)

In [None]:
engine = index.as_query_engine(similarity_top_k=3)

In [None]:
response = await engine.aquery("What is the revenue of NVIDIA in the last period reported? Answer in millions with page reference. Include the period.")
print(response)

In [None]:
response = await engine.aquery("What is the beginning and end date of NVIDIA's fiscal period?")
print(response)

`VectorStoreIndex.from_documents` splits the document into chunks and stores an embedding for each one, so relevant passages can be found by similarity rather than by keyword match. The query engine is configured with `similarity_top_k=3`, so each query retrieves the three most similar chunks and passes them to the language model to compose an answer. The two queries above pull the latest reported revenue and the fiscal period dates from the filing.
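To make the effect of `similarity_top_k=3` concrete, here is a toy version of the retrieval step in plain Python. It assumes cosine similarity as the scoring function and uses made-up three-dimensional vectors in place of real embeddings. It illustrates the idea only and is not LlamaIndex's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, chunk_vectors, k=3):
    """Return the indices of the k chunks most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), index)
        for index, vector in enumerate(chunk_vectors)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in scored[:k]]


# Toy "embeddings": each row stands in for one chunk of the document.
chunks = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.0, 0.0]
print(top_k(query, chunks, k=3))  # [0, 1, 3]
```

Only the chunks selected this way reach the language model, so a `k` that is too small can leave out the passage that holds the answer.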

## Configure query engine tools for sub-questions

We configure tools to handle more complex queries by breaking them down into sub-questions.

In [None]:
query_engine_tool = [
    QueryEngineTool(
        query_engine=engine, 
        metadata=ToolMetadata(name='nvda_10k', description='Provides information about NVDA financials for year 2024')
    )
]

In [None]:
s_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tool)

`QueryEngineTool` wraps the query engine, and `ToolMetadata` gives it a name and a description that tell the sub-question engine what the tool can answer. `SubQuestionQueryEngine.from_defaults` then builds an engine that splits a complex question into simpler sub-questions, sends each one to a tool, and combines the answers.
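The split, answer, and combine flow can be sketched with stand-in functions. In the real engine a language model performs the decomposition and the final synthesis. Here `toy_decompose`, `toy_tool`, and `toy_combine` are hypothetical stubs that return placeholders, so the sketch shows only the control flow.

```python
from dataclasses import dataclass


@dataclass
class SubQuestion:
    tool_name: str  # which tool should answer this sub-question
    question: str


def answer_with_sub_questions(question, decompose, tools, combine):
    """Split a question, answer each part with its tool, then combine."""
    answers = []
    for sub_question in decompose(question):
        tool = tools[sub_question.tool_name]
        answers.append((sub_question.question, tool(sub_question.question)))
    return combine(question, answers)


def toy_decompose(question):
    # Stub: a real engine would ask a language model for these.
    return [
        SubQuestion("nvda_10k", "Which customer segments grew the fastest?"),
        SubQuestion("nvda_10k", "Which geographies grew the fastest?"),
    ]


def toy_tool(sub_question):
    # Stub: a real tool would query the index and return an answer.
    return f"[answer retrieved for: {sub_question}]"


def toy_combine(question, answers):
    # Stub: a real engine would synthesize one response with a model.
    lines = [f"Q: {question}"]
    lines += [f"- {asked} {answer}" for asked, answer in answers]
    return "\n".join(lines)


result = answer_with_sub_questions(
    "Compare the fastest-growing segments and geographies",
    toy_decompose,
    {"nvda_10k": toy_tool},
    toy_combine,
)
print(result)
```

With a single tool registered, every sub-question goes to `nvda_10k`. Registering one tool per document is what lets the same pattern compare several filings.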

## Use the sub-question query engine for detailed queries

Finally, we use the sub-question query engine to ask detailed questions about customer segments, geographies, and risks.

In [None]:
response = await s_engine.aquery("Compare and contrast the customer segments and geographies that grew the fastest")
print(response)

In [None]:
response = await s_engine.aquery("What risks to NVIDIA's business are highlighted in the document?")
print(response)

In [None]:
response = await s_engine.aquery("How does NVIDIA see the risks highlighted in the document impacting financial performance?")
print(response)

Each query is broken into sub-questions that the `nvda_10k` tool answers against the filing, and the partial answers are then combined into a single response. This suits questions like the comparison above, where segments and geographies have to be looked up separately before they can be compared.

## Your next steps

Try changing the document or the type of questions you ask. Experiment with different query parameters to see how the answers change. This will help you get comfortable with using language models for document analysis.

<a href="https://pyquantnews.com/">PyQuant News</a> is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. Looking to get started? Check out the fastest growing, top-selling course to <a href="https://gettingstartedwithpythonforquantfinance.com/">get started with Python for quant finance</a>. For educational purposes. Not investment advice. Use at your own risk.