# LlamaParse with GPT-4o


<a href="https://colab.research.google.com/github/run-llama/llama_parse/blob/main/examples/test_tesla_impact_report/test_gpt4o.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

GPT-4o is a [fully multimodal model by OpenAI](https://openai.com/index/hello-gpt-4o/) released in May 2024. It matches GPT-4 Turbo performance in text and code, and has significantly improved vision and audio capabilities.

GPT-4o's vision capabilities make it usable for document parsing: each page is treated as an image, and the model extracts its content. LlamaParse supports GPT-4o natively for this. The notebook below walks through an example of using GPT-4o over the Tesla impact report.

**NOTE**: LlamaParse with GPT-4o is an order of magnitude more expensive than the default LlamaParse pipeline. Currently, every page parsed with GPT-4o counts as 10 pages in the LlamaParse usage tracker.


In [None]:
import nest_asyncio

nest_asyncio.apply()

In [None]:
import os

In [None]:
os.environ["LLAMA_CLOUD_API_KEY"] = "<LLAMA_CLOUD_API_KEY>"
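Leaving the `<LLAMA_CLOUD_API_KEY>` placeholder in place is an easy mistake to make. As a minimal sketch, the guard below fails early when the variable is unset or still holds a placeholder. `require_api_key` is a hypothetical helper written for this notebook, not part of the `llama_parse` package.

```python
import os


def require_api_key(name: str = "LLAMA_CLOUD_API_KEY") -> str:
    """Return the key stored in the environment, or raise if it is unusable.

    Hypothetical helper for illustration; not part of llama_parse.
    """
    value = os.environ.get(name, "").strip()
    # Reject an unset variable as well as an unfilled "<...>" placeholder.
    if not value or (value.startswith("<") and value.endswith(">")):
        raise RuntimeError(f"Set {name} to a real API key before parsing.")
    return value
```

Calling `require_api_key()` before constructing the parser surfaces a missing key immediately instead of partway through a parsing job.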

## Use LlamaParse with `gpt4o_mode=True`

Setting `gpt4o_mode=True` makes LlamaParse parse each page with GPT-4o's multimodal capabilities instead of the default LlamaParse pipeline.

We load a snippet of the [2019 Tesla impact report](https://www.tesla.com/ns_videos/2019-tesla-impact-report.pdf). **NOTE**: The report is 57 pages, but will count for 570 pages in LlamaParse due to GPT-4o usage (which is approximately $1.71 USD).

You can optionally choose to provide a `gpt4o_api_key`. If you do this, then we will use your API key to make GPT-4o calls, and your LlamaParse usage will be counted as if `gpt4o_mode` was not turned on (each page will be counted as a page instead of 10 pages).
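The page accounting described above can be sketched as simple arithmetic. The 10x multiplier comes from the note above. The per-page price is an assumption inferred from the quoted figures ($1.71 for 570 counted pages), not an official rate, and both function names are hypothetical.

```python
PAGE_MULTIPLIER_GPT4O = 10  # stated above: one GPT-4o page counts as 10 pages
# Assumption: per-page price implied by the figures above ($1.71 / 570 pages).
ASSUMED_USD_PER_COUNTED_PAGE = 0.003


def counted_pages(pdf_pages: int, gpt4o_mode: bool, own_gpt4o_key: bool = False) -> int:
    """Pages charged against the LlamaParse usage tracker."""
    if gpt4o_mode and not own_gpt4o_key:
        return pdf_pages * PAGE_MULTIPLIER_GPT4O
    # Default pipeline, or GPT-4o calls made with your own key: 1 page = 1 page.
    return pdf_pages


def estimated_cost_usd(pages: int) -> float:
    """Rough LlamaParse cost under the assumed per-page price."""
    return round(pages * ASSUMED_USD_PER_COUNTED_PAGE, 2)


print(counted_pages(57, gpt4o_mode=True))                      # 570
print(counted_pages(57, gpt4o_mode=True, own_gpt4o_key=True))  # 57
print(estimated_cost_usd(570))                                 # 1.71
```

When you supply your own key, the GPT-4o calls are billed to your OpenAI account separately, so this estimate covers only the LlamaParse side.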

In [None]:
!wget "https://www.dropbox.com/scl/fi/vu6w1dsfo5eddydz13ssm/2019-tesla-impact-report-15.pdf?rlkey=ik8lfqbg2p1ervss4qqt3xose&st=70j04z8j&dl=1" -O "2019-tesla-impact-report-15.pdf"


In [None]:
from llama_parse import LlamaParse

parser_gpt4o = LlamaParse(
    result_type="markdown",
    # api_key=api_key,
    gpt4o_mode=True,
    # gpt4o_api_key="<gpt4o_api_key>"
)
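To switch between LlamaParse-billed GPT-4o and your own key without editing the cell, the constructor arguments can be assembled conditionally. This is a sketch: `gpt4o_parser_kwargs` is a hypothetical helper, and reading the key from an `OPENAI_API_KEY` environment variable is an assumption, not something LlamaParse requires.

```python
import os


def gpt4o_parser_kwargs(env_var: str = "OPENAI_API_KEY") -> dict:
    """Build keyword arguments for LlamaParse in GPT-4o mode.

    Hypothetical helper; the environment variable name is an assumption.
    """
    kwargs = {"result_type": "markdown", "gpt4o_mode": True}
    own_key = os.environ.get(env_var)
    if own_key:
        # With your own key, each page is counted as 1 page instead of 10.
        kwargs["gpt4o_api_key"] = own_key
    return kwargs


# Usage (requires the llama_parse package and LLAMA_CLOUD_API_KEY):
# from llama_parse import LlamaParse
# parser_gpt4o = LlamaParse(**gpt4o_parser_kwargs())
```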

In [None]:
documents_gpt4o = parser_gpt4o.load_data("./2019-tesla-impact-report-15.pdf")

Started parsing the file under job_id 1a934a50-59a9-4bb4-bbb7-ecefff228537


In [None]:
print(documents_gpt4o[0].get_content())

# Impact Report
## 2019

![Earth and Tesla Car in Space](image-url)

TESLA
---
# Introduction 03

# Mission and Tesla Ecosystem 04

# Environmental Impact 06
- Lifecycle Analysis of Tesla Vehicles versus Average ICE
- Battery Recycling
- NOx, Particulates and Other Pollutants
- Water Used per Vehicle Manufactured
- Emissions Credits
- Net Energy Impact of Our Products

# Product Impact 20
- Price Equivalency
- Primary Driver
- Long Distance Travel
- Active Safety
- Passive Safety
- Tesla Safety Awards
- Fire Safety
- Cyber Security
- Disaster Relief
- Resilience of the Grid
- Megapack
- Solar Roof

# Supply Chain 33
- Responsible Material Sourcing
- Cobalt Sourcing

# People and Culture 37
- Our Environmental, Health, and Safety Strategy
- Safety Improvements
- Case Study: Ergonomics and Model Y Design
- Rewarding the Individual
- Culture of Diversity and Inclusion
- Workforce Development
- Community Engagement
- Employee Mobility and Transportation Programs
- Corporate Governance

# A

## Build RAG pipeline over the Parsed Report

We now try building a RAG pipeline over this parsed report. It's not a lot of text, but we split it into chunks and load it into a simple in-memory vector store.

We ask a question over the parsed markdown table and get back the right answer! We also ask a question over the text.

In [None]:
from copy import deepcopy
from llama_index.core.schema import TextNode
from llama_index.core import VectorStoreIndex


def get_nodes(docs):
    """Split docs into nodes, by separator."""
    nodes = []
    for doc in docs:
        doc_chunks = doc.text.split("\n---\n")
        for doc_chunk in doc_chunks:
            node = TextNode(
                text=doc_chunk,
                metadata=deepcopy(doc.metadata),
            )
            nodes.append(node)

    return nodes

In [None]:
# this will split into pages
nodes = get_nodes(documents_gpt4o)
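To see what the `"\n---\n"` split in `get_nodes` does without any LlamaIndex dependency, here is a plain-Python sketch on a short sample modeled on the parsed output above. `split_pages` is a hypothetical helper. Unlike `get_nodes`, it also drops empty chunks and records a 1-based page number.

```python
PAGE_SEPARATOR = "\n---\n"  # same separator that get_nodes splits on


def split_pages(markdown: str) -> list[tuple[int, str]]:
    """Split parsed markdown into (page_number, text) pairs, skipping empty pages."""
    pages = []
    for page_number, chunk in enumerate(markdown.split(PAGE_SEPARATOR), start=1):
        text = chunk.strip()
        if text:
            pages.append((page_number, text))
    return pages


sample = (
    "# Impact Report\n## 2019\n\nTESLA"
    "\n---\n"
    "# Introduction 03\n\n# Mission and Tesla Ecosystem 04"
)
for number, text in split_pages(sample):
    # Show the first line of each page as a quick sanity check.
    print(number, repr(text.splitlines()[0]))
```

One caveat with this approach: a markdown horizontal rule (`---` on its own line) inside a page would also be treated as a page boundary.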

In [None]:
vector_index = VectorStoreIndex(nodes)

In [None]:
query_engine = vector_index.as_query_engine(similarity_top_k=6)
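`similarity_top_k=6` asks the retriever for the six nodes most similar to the query. The toy sketch below illustrates that idea only. It assumes nodes are ranked by cosine similarity and uses made-up 2-D vectors in place of real embeddings. It is a conceptual illustration, not LlamaIndex's actual implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, node_vecs, k):
    """Return the k (node_id, score) pairs with the highest similarity."""
    scored = [
        (cosine_similarity(query_vec, vec), node_id)
        for node_id, vec in node_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(node_id, score) for score, node_id in scored[:k]]


# Toy "embeddings" (illustrative values only).
toy = {"page-1": [1.0, 0.0], "page-2": [0.6, 0.8], "page-3": [0.0, 1.0]}
print(top_k([1.0, 0.0], toy, k=2))
```

A larger `k` gives the LLM more context at the cost of more tokens per query. With a report this short, six pages covers a large share of the document.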

In [None]:
response = query_engine.query(
    "What are the greenhouse emissions for agriculture and transportation?"
)

In [None]:
print(str(response))

The greenhouse emissions for agriculture and transportation are 20% and 16% respectively.


Let's also try asking a question over another piece of the text.

In [None]:
response = query_engine.query(
    "How does the EPA range of Teslas compare with other vehicles? Give details"
)

In [None]:
print(str(response))

The EPA range of Tesla vehicles, such as the Model 3 Standard Range Plus achieving 4.8 miles/kWh and the Model Y all-wheel drive achieving 4.1 miles/kWh, surpasses that of other electric vehicles currently in production. For example, the Hyundai Kona, Chevy Bolt, Model S LR+, and Nissan Leaf have EPA ranges ranging from 3.5 to 4 miles/kWh, while the Jaguar iPace, Mercedes EQC, Ford Mach E AWD, Audi e-tron, and Porsche Taycan have EPA ranges of 3 miles/kWh. This indicates that Tesla vehicles generally have higher energy efficiency and longer EPA ranges compared to other electric vehicles available in the market.


In [None]:
print(response.source_nodes[0].get_content())

# Reducing Carbon Footprint Even Further
## Improving Powertrain Efficiency

Tesla vehicles are known to have the highest energy efficiency of any EV built to date. In the early days of Model S production, we were able to achieve energy efficiency of 3.1 EPA miles / kWh. Today, our most efficient Model 3 Standard Range Plus (SR+) achieves an EPA range of 4.8 miles / kWh, more than any EV in production. Model Y all-wheel drive (AWD) achieves 4.1 EPA miles / kWh, which makes it the most efficient electric SUV produced to date.

The energy efficiency of Tesla vehicles will continue to improve further over time as we continue to improve our technology and powertrain efficiency. It is also reasonable to assume that our high-mileage products, such as our future Tesla Robotaxis, will be designed for maximum energy efficiency as handling, acceleration, and top speed become less relevant. That way, we will minimize cost for our customers as well as reduce the carbon footprint per mile driven.
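Printing only `source_nodes[0]` shows the top hit. To check every retrieved page, a small summary helper can list each node's rank, score, and first line. This is a sketch: `summarize_sources` is a hypothetical helper that relies only on the `.score` attribute and `.get_content()` method of the retrieved nodes.

```python
def summarize_sources(source_nodes, preview_chars=80):
    """Return (rank, score, first_line_preview) for each retrieved node."""
    rows = []
    for rank, item in enumerate(source_nodes, start=1):
        content = item.get_content().strip()
        first_line = content.splitlines()[0] if content else ""
        rows.append((rank, item.score, first_line[:preview_chars]))
    return rows


# Usage with the response above:
# for rank, score, preview in summarize_sources(response.source_nodes):
#     print(rank, score, preview)
```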

