<div style="background-color:#000;"><img src="pqn.png"></img></div>

In [None]:
```python
!pip install langchain-openai llama-index pypdf python-dotenv

In [None]:
from langchain_openai import ChatOpenAI

In [None]:
from llama_index.core import (
    StorageContext,
    VectorStoreIndex,
    SimpleDirectoryReader,
    load_index_from_storage,
)

In [None]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
from dotenv import load_dotenv

In [None]:
load_dotenv()

### Configure the language model and load the document

First, we configure the language model with specific parameters and load the document.

In [None]:
llm = ChatOpenAI(temperature=0, model_name="gpt-4o", max_tokens=None)  # None: no explicit cap on output tokens

In [None]:
doc = SimpleDirectoryReader(input_files=["nvda.pdf"]).load_data()
print(f"Loaded NVDA 10-K with {len(doc)} pages")

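We configure the language model to use gpt-4o with a temperature of 0, which makes responses more repeatable (though not strictly deterministic). The max_tokens setting is meant to leave the response length uncapped, subject to the model's own output limit. Note that llm is not passed to the LlamaIndex engines in this notebook, so they use LlamaIndex's default LLM settings. We then load the NVDA 10-K from a PDF file and print the number of pages loaded. The reader returns one document per PDF page, so len(doc) is the page count.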

### Create an index to enable querying of the document

Next, we create an index from the loaded document to facilitate efficient querying.

In [None]:
index = VectorStoreIndex.from_documents(doc)

In [None]:
engine = index.as_query_engine(similarity_top_k=3)

We create a VectorStoreIndex from the loaded document, which embeds the text so we can run similarity searches against it. We then set up a query engine with similarity_top_k=3, so the three most similar chunks are retrieved and used as context for each query.

### Query specific financial information from the document

Now, we can use the query engine to extract specific financial information from the document.

In [None]:
response = await engine.aquery("What is the revenue of NVIDIA in the last period reported? Answer in millions with page reference. Include the period.")
print(response)

In [None]:
response = await engine.aquery("What is the beginning and end date of NVIDIA's fiscal period?")
print(response)

We use the query engine to asynchronously ask questions about NVIDIA's financial report. The first query asks for the revenue in the last reported period, including the page reference. The second query asks for the beginning and end dates of NVIDIA's fiscal period. The responses are printed to the console.

### Set up a tool for sub-question querying

We will now set up a tool to handle more complex queries by breaking them down into sub-questions.

In [None]:
query_engine_tool = [
    QueryEngineTool(
        query_engine=engine,
        metadata=ToolMetadata(name='nvda_10k', description='Provides information about NVDA financials for year 2024')
    )
]

In [None]:
s_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tool)

We create a list containing a single QueryEngineTool that wraps the query engine, with metadata giving the tool a name and a description of what it covers. We then initialize a SubQuestionQueryEngine with that list. This engine breaks a complex query into smaller sub-questions and routes each one to a tool based on its description.

### Perform complex queries on customer segments and risks

Finally, we perform more complex queries on the document to extract detailed information about customer segments and business risks.

In [None]:
response = await s_engine.aquery("Compare and contrast the customer segments and geographies that grew the fastest")
print(response)

In [None]:
response = await s_engine.aquery("What risks to NVIDIA's business are highlighted in the document?")
print(response)

In [None]:
response = await s_engine.aquery("How does NVIDIA see the risks highlighted in the document impacting financial performance?")
print(response)

We use the sub-question query engine to ask complex questions about NVIDIA's customer segments and geographies and the business risks highlighted in the document. The engine breaks these questions into smaller sub-questions, processes them, and compiles the responses. Each response is then printed to the console.

### Your next steps

In [None]:
# Try changing the queries to extract different types of financial information from the document. Experiment with different parameters for the language model to see how it affects the responses. Customize the metadata for the query tools to better match your specific use case.
```
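
The `similarity_top_k=3` setting on the query engine controls how many chunks are retrieved per query. The sketch below illustrates that idea with a toy example. It is not how LlamaIndex works internally: the bag-of-words `embed` function stands in for a real embedding model, the helper names `embed`, `cosine`, and `top_k` are hypothetical, and the chunks are invented for illustration rather than taken from the 10-K.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (an assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query, best match first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Hypothetical chunks, invented for illustration only.
chunks = [
    "total revenue for the fiscal year",
    "risk factors related to supply chain",
    "revenue by segment and geography",
    "executive compensation summary",
    "fiscal year end date",
]

results = top_k("revenue for the fiscal year", chunks, k=3)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

Raising `k` gives the language model more context to draw on at the cost of a longer prompt, while lowering it keeps the prompt short but risks missing the relevant passage.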

<a href="https://pyquantnews.com/">PyQuant News</a> is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. Looking to get started? Check out the fastest growing, top-selling course to <a href="https://gettingstartedwithpythonforquantfinance.com/">get started with Python for quant finance</a>. For educational purposes. Not investment advice. Use at your own risk.