# Multimodal Parsing using Anthropic Claude 

This cookbook shows how to use LlamaParse to parse documents with the multimodal capabilities of Anthropic's Claude 3.5 Sonnet.

LlamaParse lets you plug in external multimodal model vendors for parsing; we handle error correction, validation, and scalability/reliability for you.


## Setup

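Apply `nest_asyncio` so that LlamaParse's async calls can run inside the notebook's event loop, then download the data.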

In [None]:
import nest_asyncio

nest_asyncio.apply()
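Why `nest_asyncio` is needed: a Jupyter kernel already runs an asyncio event loop, and `asyncio.run()` refuses to start a second loop from inside a running one. Synchronous wrappers around async code, such as LlamaParse's `load_data`, can hit this error; `nest_asyncio.apply()` patches asyncio to allow the re-entrant call. The standard-library sketch below reproduces the error without the patch. `fetch_page` is a hypothetical stand-in for an async request. Run it as a plain Python script: inside an unpatched notebook, the outer `asyncio.run` call fails in the same way.

```python
import asyncio


async def fetch_page() -> str:
    # Hypothetical stand-in for an async network call (e.g. a parse request).
    await asyncio.sleep(0)
    return "parsed"


async def notebook_cell() -> str:
    # We are already inside a running loop here, just as code in a Jupyter cell is.
    coro = fetch_page()
    try:
        # A synchronous wrapper that calls asyncio.run() fails at this point.
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine never started; close it to avoid a warning
        return f"RuntimeError: {exc}"


result = asyncio.run(notebook_cell())
print(result)
# Without nest_asyncio: RuntimeError: asyncio.run() cannot be called from a running event loop
```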

In [None]:
!mkdir -p data
!wget "https://arxiv.org/pdf/2307.09288" -O data/llama2.pdf

--2024-07-11 23:44:38--  https://arxiv.org/pdf/2307.09288
Resolving arxiv.org (arxiv.org)... 151.101.195.42, 151.101.131.42, 151.101.3.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.195.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13661300 (13M) [application/pdf]
Saving to: ‘data/llama2.pdf’


2024-07-11 23:44:38 (69.3 MB/s) - ‘data/llama2.pdf’ saved [13661300/13661300]



## Initialize LlamaParse

Initialize LlamaParse in multimodal mode and specify the vendor model.

**NOTE**: Optionally, you can specify your own Anthropic API key. If you do, you will be charged our base LlamaParse price of 0.3c per page. If you don't, you will be charged 6c per page, as we will make the calls to Claude for you.
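The per-page rates quoted in this notebook (0.3c with your own Anthropic key, 6c without one, and 3c for the GPT-4o baseline) make parsing cost scale linearly with page count. Below is a minimal sketch of that arithmetic, assuming those rates still apply and that every page is billed at the same rate. `estimate_cost_usd` and the mode labels are hypothetical helpers, not part of LlamaParse.

```python
# Per-page rates in US cents, as quoted in this notebook (check current pricing).
RATES_CENTS_PER_PAGE = {
    "sonnet-own-key": 0.3,  # base LlamaParse price when you supply an Anthropic key
    "sonnet-managed": 6.0,  # LlamaParse calls Claude on your behalf
    "gpt4o": 3.0,           # GPT-4o baseline used later in this notebook
}


def estimate_cost_usd(num_pages: int, mode: str) -> float:
    """Estimate the parsing cost in US dollars for a document of num_pages pages."""
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    cents = num_pages * RATES_CENTS_PER_PAGE[mode]
    return round(cents / 100, 2)


for mode in RATES_CENTS_PER_PAGE:
    # Hypothetical 100-page document.
    print(mode, estimate_cost_usd(100, mode))
```

At these rates, supplying your own key cuts the LlamaParse per-page fee by a factor of 20 (0.3c vs 6c), although the Claude calls are then billed to your own Anthropic account.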

In [None]:
from llama_parse import LlamaParse

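parser = LlamaParse(
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="anthropic-sonnet-3.5",
    # Optional: bill the Claude calls to your own Anthropic key (see the note above).
    # vendor_multimodal_api_key="<your Anthropic API key>",
)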
docs = parser.load_data("./data/llama2.pdf")

In [None]:
print(docs[0].get_content())
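Later cells index into `docs` by position (for example `docs[32]`), which assumes one parsed document per page. Instead of hard-coding an index, a small helper can locate the pages that mention a term. This is a sketch: `find_pages` is a hypothetical helper that relies only on each document's `get_content()` method, and `FakeDoc` stands in for parsed pages so the example runs offline.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    # Offline stand-in for a parsed page; LlamaIndex documents also expose get_content().
    text: str

    def get_content(self) -> str:
        return self.text


def find_pages(docs, term: str) -> list[int]:
    """Return the indices of documents whose content contains term (case-insensitive)."""
    needle = term.lower()
    return [i for i, doc in enumerate(docs) if needle in doc.get_content().lower()]


pages = [FakeDoc("Introduction"), FakeDoc("Figure: RLHF reward curves"), FakeDoc("rlhf-v5 results")]
print(find_pages(pages, "RLHF"))
# With the real parse: find_pages(docs, "RLHF")
```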

### Set Up a GPT-4o Baseline

For comparison, we will also parse the document using GPT-4o (3c per page).

In [None]:
from llama_parse import LlamaParse

parser_gpt4o = LlamaParse(
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="openai-gpt4o",
)
# Parse with the GPT-4o parser (not the Sonnet `parser` from above).
docs_gpt4o = parser_gpt4o.load_data("./data/llama2.pdf")

## View Results

Let's visualize the results along with the original document page.

On this page, Claude 3.5 Sonnet extracts complex visual elements such as graphs in considerably more detail than GPT-4o.

In [None]:
# using Sonnet-3.5
print(docs[32].get_content(metadata_mode="all"))

In [None]:
# using GPT-4o
print(docs_gpt4o[32].get_content(metadata_mode="all"))
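To see exactly where the two parses of a page differ, the standard library's `difflib.unified_diff` can compare the text returned by `get_content()` line by line. This is a minimal sketch: `diff_parses` is a hypothetical helper, and the two sample strings are made-up placeholders, not actual parser output.

```python
import difflib


def diff_parses(text_a: str, text_b: str, label_a: str = "sonnet", label_b: str = "gpt4o") -> str:
    """Return a unified diff of two parsed versions of the same page ('' if identical)."""
    lines_a = text_a.splitlines(keepends=True)
    lines_b = text_b.splitlines(keepends=True)
    return "".join(difflib.unified_diff(lines_a, lines_b, fromfile=label_a, tofile=label_b))


# Made-up placeholders; with the real parses use:
# diff_parses(docs[32].get_content(), docs_gpt4o[32].get_content())
sonnet_page = "# Figure 1\n| step | reward |\n| 1 | 0.5 |\n"
gpt4o_page = "# Figure 1\nA chart of reward over steps.\n"
report = diff_parses(sonnet_page, gpt4o_page)
print(report)
```

Lines prefixed with `-` appear only in the Sonnet parse and lines prefixed with `+` only in the GPT-4o parse, which makes extra table rows or missing figure values easy to spot.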

## Set Up a RAG Pipeline

These parsing differences carry over to retrieval-augmented generation (RAG): a query engine can only answer from what the parser extracted. Let's set up a RAG pipeline over both parses, using GPT-4o from OpenAI for the text synthesis step.
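A `SummaryIndex` does no similarity search: by default its retriever passes every node to the response synthesizer, and the query engine's default synthesis mode packs as many chunks as fit into each prompt and then refines the answer batch by batch ("compact and refine"). The plain-Python sketch below illustrates that idea under simplifying assumptions. A character budget stands in for the token limit, and `pack_chunks`, `compact_and_refine`, and `fake_llm` are hypothetical names. This is not LlamaIndex's implementation.

```python
from typing import Callable


def pack_chunks(chunks: list[str], budget: int) -> list[str]:
    """Greedily join chunks into batches of at most `budget` characters.

    A single chunk longer than the budget gets its own batch (no splitting here).
    """
    batches: list[str] = []
    current = ""
    for chunk in chunks:
        candidate = f"{current}\n{chunk}" if current else chunk
        if current and len(candidate) > budget:
            batches.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


def compact_and_refine(query: str, chunks: list[str], llm: Callable[[str], str], budget: int) -> str:
    """Answer from the first batch, then ask the LLM to refine that answer with each later batch."""
    answer = ""
    for i, batch in enumerate(pack_chunks(chunks, budget)):
        if i == 0:
            prompt = f"Context:\n{batch}\nQuestion: {query}\nAnswer:"
        else:
            prompt = (
                f"Existing answer: {answer}\nNew context:\n{batch}\n"
                f"Refine the answer to: {query}"
            )
        answer = llm(prompt)
    return answer


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for GPT-4o: records each prompt and returns a versioned answer.
    calls.append(prompt)
    return f"answer v{len(calls)}"


pages = ["page one text", "page two text", "page three text"]
final = compact_and_refine("What does the RLHF graph show?", pages, fake_llm, budget=30)
print(final, len(calls))
```

With a 30-character budget, the first two pages share one prompt and the third triggers a refine call, so the LLM is called twice.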

In [None]:
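from llama_index.core import SummaryIndex
from llama_index.llms.openai import OpenAI

# GPT-4o synthesizes answers for both pipelines (reads OPENAI_API_KEY from the environment).
llm = OpenAI(model="gpt-4o")

# Index built from the Claude 3.5 Sonnet parse
index = SummaryIndex.from_documents(docs)
query_engine = index.as_query_engine(llm=llm)

# Index built from the GPT-4o parse
index_gpt4o = SummaryIndex.from_documents(docs_gpt4o)
query_engine_gpt4o = index_gpt4o.as_query_engine(llm=llm)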

In [None]:
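query = "Tell me more about all the values for each line in the RLHF graph."

response = query_engine.query(query)
response_gpt4o = query_engine_gpt4o.query(query)

# Compare the answers grounded in each parse
print("Claude 3.5 Sonnet parse:\n", response)
print("\nGPT-4o parse:\n", response_gpt4o)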