<a href="https://colab.research.google.com/github/run-llama/llama_cloud_services/blob/main/examples/parse/demo_starter_multimodal.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multimodal Parsing using LlamaParse

This cookbook shows how to use LlamaParse to parse documents with multimodal LLMs from Anthropic or OpenAI.

LlamaParse lets you plug in external multimodal model vendors for parsing; we handle error correction, validation, and scalability/reliability for you.

Status:
| Last Executed | Version | State      |
|---------------|---------|------------|
| Aug-19-2025   | 0.6.61  | Maintained |

### Installation

In [None]:
%pip install llama-cloud-services

### Setup

Here we set up `LLAMA_CLOUD_API_KEY`, which `LlamaParse` uses to authenticate with LlamaCloud.

In [None]:
import os

# API access to llama-cloud
os.environ["LLAMA_CLOUD_API_KEY"] = "llx-..."
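Hardcoding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch, assuming you would rather keep the key out of the notebook: the hypothetical helper `ensure_llama_cloud_key` (not part of `llama-cloud-services`) reuses `LLAMA_CLOUD_API_KEY` if it is already exported and otherwise prompts for it with Python's standard `getpass`.

```python
import os
from getpass import getpass


def ensure_llama_cloud_key(var_name: str = "LLAMA_CLOUD_API_KEY") -> str:
    """Hypothetical helper: return the key from the environment, prompting only if unset."""
    key = os.environ.get(var_name)
    if not key:
        # getpass does not echo the typed key into the notebook output
        key = getpass(f"Enter {var_name}: ")
        os.environ[var_name] = key
    return key
```

Calling `ensure_llama_cloud_key()` in place of the hardcoded assignment leaves the environment variable set for `LlamaParse` to pick up.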

## Download Data

For this demonstration, we will use the paper `Evaluation of OpenAI o1: Opportunities and Challenges of AGI` (arXiv:2409.18486).

In [None]:
!wget "https://arxiv.org/pdf/2409.18486" -O "o1.pdf"
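If the download fails (for example, the server returns an error page), `wget -O` can still write a non-PDF file to `o1.pdf`, and the parse step then fails in a less obvious way. A small sanity check, sketched with a hypothetical helper `looks_like_pdf` that only tests for the standard `%PDF-` header bytes:

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Hypothetical helper: True if the file exists and starts with the PDF magic bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        # Every PDF file begins with the header "%PDF-" followed by a version number
        return f.read(5) == b"%PDF-"
```

For example, `assert looks_like_pdf("o1.pdf")` right after the download stops the notebook early if the file is not a PDF. This checks only the header, not that the whole file is valid.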

## Initialize LlamaParse

Initialize LlamaParse in multimodal mode (`parse_mode="parse_page_with_lvm"`) and specify the vendor model with `vendor_multimodal_model_name`.

**NOTE**: you can optionally pass your own Anthropic or OpenAI API key (`vendor_multimodal_api_key`). If you do, LlamaParse charges only 1 credit (0.3 cents) per page.

Using your own API key may incur additional costs from your model provider and could result in failed pages or documents if you do not have sufficient usage limits.
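To make the note concrete, here is the arithmetic as a small sketch. It assumes only the rate stated in the note (1 credit, or 0.3 cents, per page when you supply your own vendor key); `estimate_llamaparse_cost` is a hypothetical helper, it does not include what your model provider bills you, and actual pricing may change.

```python
# Rates taken from the note above (own vendor API key); treat them as assumptions
CREDITS_PER_PAGE_OWN_KEY = 1
CENTS_PER_CREDIT = 0.3


def estimate_llamaparse_cost(num_pages: int) -> tuple[int, float]:
    """Hypothetical helper: (credits, US cents) for parsing num_pages with your own key."""
    credits = num_pages * CREDITS_PER_PAGE_OWN_KEY
    # Round to hide floating-point noise such as 30.000000000000004
    cents = round(credits * CENTS_PER_CREDIT, 4)
    return credits, cents
```

This notebook parses a single target page, so `estimate_llamaparse_cost(1)` returns `(1, 0.3)`; a 100-page document would come to `(100, 30.0)`, i.e. 30 cents in LlamaParse credits.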

### With anthropic-sonnet-4.0

In [None]:
from llama_cloud_services import LlamaParse

parser = LlamaParse(
    # Enable pure multimodal parsing
    parse_mode="parse_page_with_lvm",
    vendor_multimodal_model_name="anthropic-sonnet-4.0",
    # Optionally pass in your own vendor API key
    # vendor_multimodal_api_key="fake",
    target_pages="24",
    high_res_ocr=True,
    adaptive_long_table=True,
    outlined_table_extraction=True,
    output_tables_as_HTML=True,
)
result = await parser.aparse("o1.pdf")
sonnet_nodes = result.get_markdown_nodes(split_by_page=False)

Started parsing the file under job_id fdbe857e-48d0-4024-ba06-bfead78c4a0c
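Because the parser is created with `output_tables_as_HTML=True`, tables come back as HTML embedded in the markdown. A rough way to compare table structure across models is to count the cells in each row. The sketch below uses the standard-library `html.parser`; `TableShapeParser` and `table_shapes` are hypothetical names, nested tables are not handled, `rowspan`/`colspan` are not expanded, and rows cut off before `</tr>` are not counted.

```python
from html.parser import HTMLParser
from typing import Optional


class TableShapeParser(HTMLParser):
    """Hypothetical helper: record the raw cell count of every <tr> in every <table>."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: list[list[int]] = []  # one list of per-row cell counts per table
        self._row_cells: Optional[int] = None  # None when not inside a <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row_cells = 0
        elif tag in ("td", "th") and self._row_cells is not None:
            self._row_cells += 1

    def handle_endtag(self, tag):
        if tag == "tr" and self._row_cells is not None:
            self.tables[-1].append(self._row_cells)
            self._row_cells = None


def table_shapes(markdown: str) -> list[list[int]]:
    """Return per-row cell counts for each HTML table found in the text."""
    parser = TableShapeParser()
    parser.feed(markdown)
    parser.close()
    return parser.tables
```

For example, `table_shapes(sonnet_nodes[0].get_content())` lists the row shapes of the Sonnet output, which you can compare against the same call on the other model's nodes.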


### With GPT-4.1-mini

For comparison, we will also parse the document using GPT-4.1-mini.

In [None]:
from llama_cloud_services import LlamaParse

parser_gpt41_mini = LlamaParse(
    # Enable pure multimodal parsing
    parse_mode="parse_page_with_lvm",
    vendor_multimodal_model_name="openai-gpt-4-1-mini",
    # Optionally pass in your own vendor API key
    # vendor_multimodal_api_key="fake",
    target_pages="24",
    high_res_ocr=True,
    adaptive_long_table=True,
    outlined_table_extraction=True,
    output_tables_as_HTML=True,
)
result = await parser_gpt41_mini.aparse("o1.pdf")
gpt_nodes = result.get_markdown_nodes(split_by_page=False)

Started parsing the file under job_id faab19bf-0810-4437-a1ff-4f6ae36d6ce0
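The two parser cells differ only in `vendor_multimodal_model_name`. One way to keep the comparison fair as you try more models is to build the shared settings in one place. A minimal sketch, assuming you want exactly the keyword arguments used in the two cells; `multimodal_parse_kwargs` is a hypothetical helper, not part of `llama-cloud-services`.

```python
def multimodal_parse_kwargs(model_name: str, target_pages: str = "24") -> dict:
    """Hypothetical helper: LlamaParse settings shared by both cells; only the model varies."""
    return {
        "parse_mode": "parse_page_with_lvm",  # pure multimodal parsing
        "vendor_multimodal_model_name": model_name,
        "target_pages": target_pages,
        "high_res_ocr": True,
        "adaptive_long_table": True,
        "outlined_table_extraction": True,
        "output_tables_as_HTML": True,
    }
```

A parser can then be created with `LlamaParse(**multimodal_parse_kwargs("anthropic-sonnet-4.0"))`; add `vendor_multimodal_api_key=...` to that call if you use your own key.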


### View Results

Let's print each model's output for the parsed page and compare them.
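Reading two long outputs side by side is tedious, so a quick quantitative check can help. The sketch below uses the standard-library `difflib` to compute a similarity ratio and a line-level unified diff; `compare_parses` is a hypothetical helper, and the ratio measures textual overlap only, not which model parsed the page more accurately.

```python
import difflib


def compare_parses(a: str, b: str, a_name: str = "sonnet", b_name: str = "gpt") -> tuple[float, str]:
    """Hypothetical helper: similarity ratio in [0, 1] plus a unified diff of two outputs."""
    ratio = difflib.SequenceMatcher(None, a, b).ratio()
    diff = "\n".join(
        difflib.unified_diff(
            a.splitlines(),
            b.splitlines(),
            fromfile=a_name,
            tofile=b_name,
            lineterm="",  # lines from splitlines() carry no trailing newline
        )
    )
    return ratio, diff
```

For example, `ratio, diff = compare_parses(sonnet_nodes[0].get_content(), gpt_nodes[0].get_content())` followed by `print(diff)` shows only the lines where the two models disagree. `SequenceMatcher` can get slow on very long inputs, which is not a concern for a single page.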

In [None]:
# using Sonnet-4.0
print(sonnet_nodes[0].get_content(metadata_mode="all"))

file_name: o1.pdf



<table>
<thead>
<tr>
<th>Participant_ID</th>
<th>clinical Description Reference</th>
</tr>
</thead>
<tbody>
<tr>
<td>Attribute</td>
<td>Value</td>
<td rowspan="12"><strong>Basic Personal Information:</strong> Subject 098_S_0896 is a 72.0-year-old Female who has completed 15 years of education. The ethnicity is Not Hisp/Latino and race is White. Marital status is Married. Initially diagnosed as AD, as of the date 2007-10-24, the final diagnosis was Dementia.<br><br><strong>Biomarker Measurements:</strong> The subject's genetic profile includes an ApoE4 status of 0.0...<br><br><strong>Cognitive and Neurofunctional Assessments:</strong> The Mini-Mental State Examination score stands at 29.0. The Clinical Dementia Rating, sum of boxes, is 1.0. ADAS 11 and 13 scores are 4.67 and 4.67 respectively, with a score of 1.0 in delayed word recall...<br><br><strong>Volumetric Data:</strong> Under MRI conditions at a field strength of 1.5 Tesla MRI Tesla, using Cross-Sectional F

In [None]:
# using GPT-4.1-mini
print(gpt_nodes[0].get_content(metadata_mode="all"))

file_name: o1.pdf



<table>
<thead>
<tr>
<th colspan="2"><b>Participant_ID</b></th>
<th rowspan="2" style="background-color: #b0b0b0;"><b>clinical Description Reference</b></th>
</tr>
</thead>
<tbody>
<tr>
<td><b>Attribute</b></td>
<td><b>Value</b></td>
<td rowspan="17" style="background-color: #d0d0d0; vertical-align: top;">
<b>Basic Personal Information:</b> Subject 098_S_0896 is a 72.0-year-old Female who has completed 15 years of education. The ethnicity is Not Hisp/Latino and race is White. Marital status is Married. Initially diagnosed as AD, as of the date 2007-10-24, the final diagnosis was Dementia.<br><br>
<b>Biomarker Measurements:</b> The subject's genetic profile includes an ApoE4 status of 0.0…<br><br>
<b>Cognitive and Neurofunctional Assessments:</b> The Mini-Mental State Examination score stands at 29.0. The Clinical Dementia Rating, sum of boxes, is 1.0. ADAS 11 and 13 scores are 4.67 and 4.67 respectively, with a score of 1.0 in delayed word recall…<br><br>
<b>Volume