## RAG with LlamaParse & GPT-4o

GPT-4o is a fully multimodal model by OpenAI released in May 2024. It matches GPT-4 Turbo performance in text and code, and has significantly improved vision and audio capabilities.

The expanded vision capabilities mean that it can be used for document parsing: each page is treated as an image, and the model extracts its content. LlamaParse supports GPT-4o natively for this kind of parsing. The notebook below walks through an example of using GPT-4o over Tesla's Q3 2023 update report (`TSLA-Q3-2023-Update-3.pdf`).

## Installing necessary packages

In [1]:
!pip install llama-index llama-parse python-dotenv nest_asyncio



## Loading required modules

In [2]:
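import os
import nest_asyncio
from dotenv import load_dotenv
from llama_parse import LlamaParse
from llama_index.core import VectorStoreIndex

# Allow nested event loops, which the parser needs inside a notebook
nest_asyncio.apply()
# Load LLAMA_CLOUD_API_KEY and OPENAI_API_KEY from a local .env file
load_dotenv()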

True
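
The parser cell further down reads `LLAMA_CLOUD_API_KEY` and `OPENAI_API_KEY` with `os.environ[...]`, which raises a `KeyError` if either is missing. A minimal sketch of an up-front check follows; `missing_env_vars` is a hypothetical helper written for this illustration, not part of any library.

```python
import os

# The two keys the notebook reads from the environment.
REQUIRED_KEYS = ("LLAMA_CLOUD_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(required=REQUIRED_KEYS, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this right after `load_dotenv()` surfaces a misconfigured `.env` file before any parsing job is submitted.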

## Using LlamaParse and the GPT-4o Multimodal LLM to Parse a PDF

#### Initializing the parser object

In [3]:
parser_gpt4o = LlamaParse(
    result_type="markdown",
    api_key=os.environ["LLAMA_CLOUD_API_KEY"],
    gpt4o_mode=True,
    split_by_page=True,
    gpt4o_api_key=os.environ["OPENAI_API_KEY"]
)

#### Parsing the PDF document

In [4]:
documents_gpt4o = parser_gpt4o.load_data("./TSLA-Q3-2023-Update-3.pdf")

Started parsing the file under job_id cac11eca-8e1c-4474-aa39-f5e39fc983d3


#### Sample page content extracted by LlamaParse using GPT-4o

In [5]:
print(documents_gpt4o[7].get_content())

# CORE TECHNOLOGY

|Artificial Intelligence Software and Hardware|600|
|---|---|
|Software that safely performs tasks in the real world is the key focus of our AI development efforts. We have commissioned one of the world's largest supercomputers to accelerate the pace of our AI development, with compute capacity more than doubling compared to Q2. Our large installed base of vehicles continues to generate anonymized video and other data used to develop our FSD Capability features.|500|
|Vehicle and Other Software| |
|All Tesla rentals through Hertz in the U.S. and Canada now allow Tesla app access, allowing renters to use keyless lock/unlock via phone key, remotely precondition the cabin, track charge status and more. Customers who already have a Tesla Profile will have their settings and preferences seamlessly applied, making the rental car feel like their own. The in-app service experience was also redesigned to allow customers to schedule service, access their loaner, track service 

#### Filtering nodes with content

In [7]:
nodes_with_content = [node for node in documents_gpt4o if node.get_content()]
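
The truthiness test above drops pages whose content is an empty string, but a page containing only spaces or newlines would still pass. A slightly stricter variant is sketched below, assuming only that each item exposes `get_content()`; `FakeDoc` is a hypothetical stand-in so the snippet runs without a parsing job.

```python
def has_text(doc):
    """True if the document's content contains any non-whitespace characters."""
    return bool(doc.get_content().strip())


class FakeDoc:
    """Hypothetical stand-in exposing the same get_content() method as a parsed page."""

    def __init__(self, text):
        self._text = text

    def get_content(self):
        return self._text


pages = [FakeDoc("# CORE TECHNOLOGY"), FakeDoc(""), FakeDoc("  \n")]
# Keep only pages that carry real text.
kept = [page for page in pages if has_text(page)]
print(len(kept))
```

With the real data, the same predicate could be applied as `[doc for doc in documents_gpt4o if has_text(doc)]`.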

#### Indexing the extracted content from PDF

By default, LlamaIndex stores embeddings in a simple in-memory vector store and embeds the documents with OpenAI's text-embedding-ada-002 model.

In [8]:
vector_index = VectorStoreIndex(nodes_with_content)

#### Initializing the query engine

In [10]:
query_engine = vector_index.as_query_engine(similarity_top_k=5)
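
`similarity_top_k=5` asks the retriever to hand the five chunks most similar to the query to the LLM. The idea can be sketched in plain Python as below. This is a conceptual illustration with made-up two-dimensional vectors, not LlamaIndex's actual implementation; `cosine_similarity` and `top_k` are names chosen for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k):
    """Return the indices of the k chunks most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy embeddings standing in for four document chunks.
chunks = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
print(top_k([1.0, 0.1], chunks, k=2))
```

A larger `k` gives the LLM more context to draw on (useful when a table spans several pages) at the cost of a longer prompt.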

#### Querying the PDF doc

In [11]:
response = query_engine.query("what is the total revenue in statement of operations table for different quarters from 2022 to 2023")
print(str(response))

Total revenue in the statement of operations table for different quarters from 2022 to 2023 are as follows:
- Q3-2022: $21,454 million
- Q4-2022: $24,318 million
- Q1-2023: $23,329 million
- Q2-2023: $24,927 million
- Q3-2023: $23,350 million


In [12]:
response = query_engine.query("explain the operating margin graph")
print(str(response))

The operating margin graph shows a consistent decline over the trailing 12 months period. It started at 18% in Q3-2019 and decreased steadily to 0% in Q4-2021. The trend continued into negative territory with -2% in Q1-2022 and -4% in Q2-2022. This downward trend indicates a decrease in profitability relative to revenue over time.


In [13]:
response = query_engine.query("Give me the statement of operations table")
print(str(response))

The statement of operations table is as follows:

| |Q3-2022|Q4-2022|Q1-2023|Q2-2023|Q3-2023|
|---|---|---|---|---|---|
|REVENUES| | | | | |
|Automotive sales|17,785|20,241|18,878|20,419|18,582|
|Automotive regulatory credits|286|467|521|282|554|
|Automotive leasing|621|599|564|567|489|
|Total automotive revenues|18,692|21,307|19,963|21,268|19,625|
|Energy generation and storage|1,117|1,310|1,529|1,509|1,559|
|Services and other|1,645|1,701|1,837|2,150|2,166|
|Total revenues|21,454|24,318|23,329|24,927|23,350|
|COST OF REVENUES| | | | | |
|Automotive sales|13,099|15,433|15,422|16,841|15,656|
|Automotive leasing|381|352|333|338|301|
|Total automotive cost of revenues|13,480|15,785|15,755|17,179|15,957|
|Energy generation and storage|1,013|1,151|1,361|1,231|1,178|
|Services and other|1,579|1,605|1,702|1,984|2,037|
|Total cost of revenues|16,072|18,541|18,818|20,394|19,172|
|Gross profit|5,382|5,777|4,511|4,533|4,178|
|OPERATING EXPENSES| | | | | |
|Research and development|733|810|771|94

In [15]:
response = query_engine.query("explain the graph of market share of tesla vehicles by region")
print(str(response))

The graph of market share of Tesla vehicles by region shows the distribution of Tesla's vehicle market share across different regions. It indicates the percentage of Tesla vehicles in each region compared to the total market share. The graph provides insights into the regional popularity and presence of Tesla vehicles in areas such as California, Nevada, Texas, Shanghai, Europe, and China.


In [16]:
response = query_engine.query("give me the operational summary table")
print(str(response))

| |Q3-2022|Q4-2022|Q1-2023|Q2-2023|Q3-2023|
|---|---|---|---|---|---|
|Total automotive revenues|18,692|21,307|19,963|21,268|19,625|
|Energy generation and storage revenue|1,117|1,310|1,529|1,509|1,559|
|Services and other revenue|1,645|1,701|1,837|2,150|2,166|
|Total revenues|21,454|24,318|23,329|24,927|23,350|
|Total gross profit|5,382|5,777|4,511|4,533|4,178|
|Operating expenses|1,694|1,876|1,847|2,134|2,414|
|Income from operations|3,688|3,901|2,664|2,399|1,764|
|Adjusted EBITDA|4,968|5,404|4,267|4,653|3,758|
|Net income attributable to common stockholders (GAAP)|3,292|3,687|2,513|2,703|1,853|
|Net cash provided by operating activities|5,100|3,278|2,513|3,065|3,308|
|Free cash flow|3,297|1,420|441|1,005|848|
|Cash, cash equivalents and investments|21,107|22,185|22,402|23,075|26,077|
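
Because the table is produced by a model, it is worth checking that it is internally consistent. The sketch below copies a few rows from the output above and tests two relationships that are assumed to hold for this report: total revenues equal the sum of the three revenue lines, and income from operations equals gross profit minus operating expenses. `parse_rows` is a hypothetical helper written for this example.

```python
# Rows copied from the operational summary table above (USD millions).
TABLE = """
|Total automotive revenues|18,692|21,307|19,963|21,268|19,625|
|Energy generation and storage revenue|1,117|1,310|1,529|1,509|1,559|
|Services and other revenue|1,645|1,701|1,837|2,150|2,166|
|Total revenues|21,454|24,318|23,329|24,927|23,350|
|Total gross profit|5,382|5,777|4,511|4,533|4,178|
|Operating expenses|1,694|1,876|1,847|2,134|2,414|
|Income from operations|3,688|3,901|2,664|2,399|1,764|
"""


def parse_rows(markdown):
    """Map each row label to its list of integer values."""
    rows = {}
    for line in markdown.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        rows[cells[0]] = [int(cell.replace(",", "")) for cell in cells[1:]]
    return rows


rows = parse_rows(TABLE)

# Check 1: the three revenue lines add up to total revenues in every quarter.
revenue_parts = zip(
    rows["Total automotive revenues"],
    rows["Energy generation and storage revenue"],
    rows["Services and other revenue"],
)
revenue_ok = [sum(parts) == total for parts, total in zip(revenue_parts, rows["Total revenues"])]

# Check 2: gross profit minus operating expenses gives income from operations.
income_ok = [
    gross - opex == income
    for gross, opex, income in zip(
        rows["Total gross profit"], rows["Operating expenses"], rows["Income from operations"]
    )
]

print(all(revenue_ok), all(income_ok))
```

A failed check would point to a misread digit on a specific quarter rather than a problem with retrieval.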


In [19]:
response = query_engine.query("What is the total automotive revenue in Q3-2023")
print(str(response))

The total automotive revenue in Q3-2023 is $19,625 million.


In [20]:
response = query_engine.query("What is the YoY for income from operations")
print(str(response))

The Year-over-Year (YoY) change for income from operations is a decrease of 52%.
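
The 52% figure can be reproduced from the operational summary table, assuming the comparison is Q3-2023 against Q3-2022. A minimal sketch, with `yoy_change_percent` as a hypothetical helper name:

```python
def yoy_change_percent(prior, current):
    """Percentage change from the year-ago value to the current value."""
    return (current - prior) / prior * 100


# Income from operations (USD millions) taken from the operational summary table.
q3_2022 = 3688
q3_2023 = 1764
print(f"{yoy_change_percent(q3_2022, q3_2023):.0f}%")  # prints -52%
```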


In [21]:
response = query_engine.query("what are the operating expenses from 2022 to 2024?")
print(str(response))

The operating expenses from 2022 to 2024 are as follows:
- 2022: $1,694 million
- 2023: $1,876 million
- 2024: $2,134 million


In [22]:
response = query_engine.query("What is the cost of goods sold per vehicle from 2022 to 2023?")
print(str(response))

The cost of goods sold per vehicle from 2022 to 2023 ranged from $41,330 to $47,000.


In [23]:
response = query_engine.query("What is the annual vehicle capacity for different models in california?")
print(str(response))

The annual vehicle capacity for different models in California is 100,000 for Model S / Model X and 550,000 for Model 3 / Model Y.


In [24]:
response = query_engine.query("What are the trends in vehicle deliveries?")
print(str(response))

The trends in vehicle deliveries show an overall increase quarter over quarter, with fluctuations in the growth rate. Model 3/Y deliveries have been consistently increasing, with a 27% growth from Q3-2022 to Q3-2023. Model S/X deliveries, on the other hand, have shown a decline in deliveries, with a 14% decrease from Q3-2022 to Q3-2023. Total deliveries have been on an upward trend, with a 27% increase from Q3-2022 to Q3-2023.


In [25]:
response = query_engine.query("explain the YoY revenue growth graph")
print(str(response))

The YoY revenue growth graph shows a consistent decline in revenue growth over the trailing 12 months. The revenue growth percentage has been decreasing steadily from 90% in Q3-2019 to -20% in Q2-2022. This indicates a significant slowdown in revenue growth over time, with negative growth rates in the most recent quarters.
