<a href="https://colab.research.google.com/github/run-llama/llama_parse/blob/main/examples/demo_starter_multimodal.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multimodal Parsing using LlamaParse

This cookbook shows how to use LlamaParse to parse documents with multimodal LLMs from Anthropic or OpenAI.

LlamaParse lets you plug in external multimodal model vendors for parsing; we handle error correction, validation, and scalability/reliability for you.


## Installation

In [None]:
!pip install llama-parse

## Setup

Here we set `LLAMA_CLOUD_API_KEY`, which `LlamaParse` uses to access LlamaCloud.

In [None]:
import nest_asyncio

nest_asyncio.apply()

import os

# API access to llama-cloud
os.environ["LLAMA_CLOUD_API_KEY"] = "<YOUR LLAMACLOUD API KEY>"
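Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, assuming the key was exported in your shell before starting Jupyter, you could read and validate it instead. `require_env` is a hypothetical helper written for this example and is not part of `llama_parse`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value or value.startswith("<"):
        # Either unset, or still a "<YOUR ... KEY>" style placeholder.
        raise RuntimeError(f"{name} is not set; export it before running the notebook")
    return value
```

Calling `require_env("LLAMA_CLOUD_API_KEY")` at the top of the notebook then surfaces a missing key immediately rather than at the first parse request.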

## Download Data

For this demonstration, we will use the recent paper `Evaluation of OpenAI o1: Opportunities and Challenges of AGI`, which evaluates OpenAI's o1 model.

In [None]:
!wget "https://arxiv.org/pdf/2409.18486" -O "o1.pdf"

--2024-12-05 18:54:24--  https://arxiv.org/pdf/2409.18486
Resolving arxiv.org (arxiv.org)... 151.101.67.42, 151.101.131.42, 151.101.3.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.67.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13986265 (13M) [application/pdf]
Saving to: ‘o1.pdf’


2024-12-05 18:54:26 (11.8 MB/s) - ‘o1.pdf’ saved [13986265/13986265]



## Initialize LlamaParse

Initialize LlamaParse in multimodal mode and specify the vendor model.

**NOTE**: You can optionally supply your own Anthropic or OpenAI API key. If you do, LlamaParse charges only 1 credit (0.3c) per page.

Using your own API key may incur additional costs from your model provider, and pages or documents may fail to parse if your usage limits are insufficient.
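To make the pricing in the note concrete, here is a rough estimate of the LlamaParse-side charge when you bring your own vendor key. It assumes "0.3c" means 0.3 US cents per credit; `estimate_llamaparse_cost` is a hypothetical helper, and the vendor's own charges are not included.

```python
def estimate_llamaparse_cost(
    num_pages: int,
    credits_per_page: int = 1,      # rate quoted in the note above
    cents_per_credit: float = 0.3,  # assumption: "0.3c" is 0.3 US cents
) -> float:
    """Return the estimated LlamaParse charge in US dollars (vendor fees excluded)."""
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    total_cents = num_pages * credits_per_page * cents_per_credit
    return total_cents / 100
```

Under these assumptions, a 100-page document costs about $0.30 in LlamaParse credits, on top of whatever Anthropic or OpenAI bills for the model calls.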

In [None]:
from llama_index.core.schema import TextNode
from typing import List


def get_text_nodes(json_list: List[dict]) -> List[TextNode]:
    """Wrap each parsed page's markdown in a TextNode, keeping its page number."""
    return [
        TextNode(text=page["md"], metadata={"page": page["page"]})
        for page in json_list
    ]

### With anthropic-sonnet-3.5

In [None]:
from llama_parse import LlamaParse

parser = LlamaParse(
    result_type="markdown",
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="anthropic-sonnet-3.5",
    target_pages="24",  # zero-based index, i.e. the 25th page of the PDF
    # invalidate_cache=True
)
json_objs = parser.get_json_result("o1.pdf")
json_list = json_objs[0]["pages"]
docs = get_text_nodes(json_list)

Started parsing the file under job_id dd9d5e0f-160e-486a-89a2-6005e5a1c2ac
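The cell above passes `target_pages="24"`, while the parsed result later reports `page: 25`, which suggests that `target_pages` is zero-based and comma-separated. Under that assumption, a small hypothetical helper (not part of `llama_parse`) can build the string from the page numbers shown in a PDF viewer:

```python
from typing import Iterable


def to_target_pages(page_numbers: Iterable[int]) -> str:
    """Convert 1-based page numbers to a 0-based, comma-separated string."""
    # Deduplicate and sort so the result is stable regardless of input order.
    zero_based = sorted({n - 1 for n in page_numbers})
    if zero_based and zero_based[0] < 0:
        raise ValueError("page numbers must be >= 1")
    return ",".join(str(n) for n in zero_based)
```

For example, `to_target_pages([25])` yields the same `"24"` used in this notebook.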


### With GPT-4o

For comparison, we will also parse the document using GPT-4o.

In [None]:
from llama_parse import LlamaParse

parser_gpt4o = LlamaParse(
    result_type="markdown",
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="openai-gpt4o",
    target_pages="24",
    # invalidate_cache=True
)
json_objs_gpt4o = parser_gpt4o.get_json_result("o1.pdf")
json_list_gpt4o = json_objs_gpt4o[0]["pages"]
docs_gpt4o = get_text_nodes(json_list_gpt4o)

Started parsing the file under job_id 6a4dea44-4f90-406b-b290-9e98620b1232


### View Results

Let's print the parsed output from each model for the same page and compare the two.

In [None]:
# using Sonnet-3.5
print(docs[0].get_content(metadata_mode="all"))

page: 25

| Participant_ID | clinical Description Reference |
|-----------------|----------------------------------|
| Attribute | Value | Basic Personal Information: Subject 098_S_0896 is a 72.0-year-old Female who has completed 15 years of education. The ethnicity is Not Hisp/Latino and race is White. Marital status is Married. Initially diagnosed as AD, as of the date 2007-10-24, the final diagnosis was Dementia. |
| Age | 72.0 |
| Sex | Female |
| Education | 15 |
| Race | White | Biomarker Measurements: The subject's genetic profile includes an ApoE4 status of 0.0... |
| DX_bl | AD |
| DX | Dementia |
| ... | ... | Cognitive and Neurofunctional Assessments: The Mini-Mental State Examination score stands at 29.0. The Clinical Dementia Rating, sum of boxes, is 1.0. ADAS 11 and 13 scores are 4.67 and 4.67 respectively, with a score of 1.0 in delayed word recall... |
| APOE4 | 1.0 |
| TAU | 212.5 |
| ... | ... |
| MMSE | 29.0 | Volumetric Data: Under MRI conditions at a field strength

In [None]:
# using GPT-4o
print(docs_gpt4o[0].get_content(metadata_mode="all"))

page: 25


| Participant_ID | clinical Description Reference |
|----------------|--------------------------------|
| **Attribute**  | **Value**                      |
| Age            | 72.0                           |
| Sex            | Female                         |
| Education      | 15                             |
| Race           | White                          |
| DX_bl          | AD                             |
| DX             | Dementia                       |
| ...            | ...                            |
| APOE4          | 1.0                            |
| TAU            | 212.5                          |
| ...            | ...                            |
| MMSE           | 29.0                           |
| CDRSB          | 0.0                            |
| ...            | ...                            |
| FLDSTRENG      | 1.5 Tesla MRI                  |
| Ventricles     | 84599                          |
| Hippocampus    | 5319                           |
|
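Comparing the two outputs by eye gets tedious for larger tables. The following sketch, using only the standard library, reduces each markdown table to a mapping from the first cell to the second cell of every row and reports the keys on which the two parses disagree. `table_to_dict` and `diff_tables` are hypothetical helpers for illustration; they ignore any cells beyond the second, so the prose that Sonnet placed in a third column is dropped.

```python
from typing import Dict, Optional, Tuple


def table_to_dict(markdown: str) -> Dict[str, str]:
    """Map the first cell of each markdown table row to its second cell."""
    rows: Dict[str, str] = {}
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not a table row (e.g. the "page: 25" metadata line)
        # Split into cells and drop bold markers such as **Attribute**.
        cells = [c.strip().strip("*") for c in line.strip("|").split("|")]
        if len(cells) < 2 or set(cells[0]) <= set("-: "):
            continue  # truncated row, empty key, or the |---|---| separator
        if cells[0] == "...":
            continue  # elided rows carry no information
        rows[cells[0]] = cells[1]
    return rows


def diff_tables(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return keys whose values differ, with None marking a missing key."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

With the nodes from this notebook, you could call `diff_tables(table_to_dict(docs[0].get_content()), table_to_dict(docs_gpt4o[0].get_content()))` to list the attributes that only one model extracted or that the two models read differently.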