# Multimodal Parsing using Anthropic Claude (Sonnet 3.5)

<a href="https://colab.research.google.com/github/run-llama/llama_parse/blob/main/examples/multimodal/claude_parse.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This cookbook shows how to use LlamaParse to parse documents with the multimodal capabilities of Sonnet 3.5.

LlamaParse lets you plug in external multimodal model vendors for parsing, while we handle error correction, validation, and scalability/reliability for you.


## Setup

Apply `nest_asyncio` so that async calls can run inside the notebook's event loop, then download the data.

In [None]:
import nest_asyncio

nest_asyncio.apply()
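As a rough illustration of why `nest_asyncio` is applied: a notebook kernel already runs an event loop, and plain `asyncio` refuses to start a second one inside it. The sketch below reproduces that refusal with the standard library only. It assumes (this is not stated in the cookbook) that the synchronous LlamaParse calls drive async code internally; `fake_parse` and `call_from_running_loop` are hypothetical stand-ins, not LlamaParse functions.

```python
import asyncio


async def fake_parse() -> str:
    # Hypothetical stand-in for an async parsing call.
    return "parsed"


async def call_from_running_loop() -> str:
    # Inside a coroutine an event loop is already running, as in a notebook cell.
    coro = fake_parse()
    try:
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"


message = asyncio.run(call_from_running_loop())
print(message)
```

Calling `nest_asyncio.apply()` patches the running loop so that nested calls of this kind are allowed.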

In [None]:
!mkdir -p data && wget "https://arxiv.org/pdf/2307.09288" -O data/llama2.pdf

--2024-07-11 23:44:38--  https://arxiv.org/pdf/2307.09288
Resolving arxiv.org (arxiv.org)... 151.101.195.42, 151.101.131.42, 151.101.3.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.195.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13661300 (13M) [application/pdf]
Saving to: ‘data/llama2.pdf’


2024-07-11 23:44:38 (69.3 MB/s) - ‘data/llama2.pdf’ saved [13661300/13661300]



## Initialize LlamaParse

Initialize LlamaParse in multimodal mode, and specify the vendor.

**NOTE**: You can optionally specify your own Anthropic API key. If you do, you are charged our base LlamaParse price of 0.3c per page. If you don't, you are charged 6c per page, as we make the calls to Claude for you.
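To make the price difference concrete, here is a small arithmetic sketch using the two per-page prices from the note. The 100-page document length and the helper name `estimate_cost_usd` are made up for illustration.

```python
from decimal import Decimal

# Per-page prices in US cents, taken from the note above.
PRICE_CENTS_PER_PAGE = {
    "own Anthropic key": Decimal("0.3"),
    "LlamaParse-managed calls": Decimal("6"),
}


def estimate_cost_usd(num_pages: int, cents_per_page: Decimal) -> Decimal:
    """Return the parsing cost in US dollars for num_pages pages."""
    return (cents_per_page * num_pages) / Decimal(100)


num_pages = 100  # hypothetical document length
costs = {
    name: estimate_cost_usd(num_pages, price)
    for name, price in PRICE_CENTS_PER_PAGE.items()
}
for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}")
```

`Decimal` is used instead of floats so that fractional cents such as 0.3c do not pick up rounding noise.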

In [None]:
from llama_index.core.schema import TextNode
from typing import List


def get_text_nodes(json_list: List[dict]) -> List[TextNode]:
    """Convert LlamaParse per-page JSON dicts into TextNodes, keeping the page number."""
    return [
        TextNode(text=page["text"], metadata={"page": page["page"]})
        for page in json_list
    ]
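Note that `get_text_nodes` reads the `"text"` field of each page, while the GPT-4o cell later in this cookbook reads the `"md"` field. A minimal plain-Python sketch of picking the markdown when it is present and falling back to the text otherwise is shown below. `page_records` is a hypothetical helper, and the sample pages are made up to mimic the per-page dicts used here.

```python
from typing import Dict, List, Tuple


def page_records(json_list: List[Dict], prefer: str = "md") -> List[Tuple[int, str]]:
    """Return (page_number, content) pairs, preferring the `prefer` field."""
    records = []
    for page in json_list:
        # Fall back to the plain-text field when the preferred one is missing or empty.
        content = page.get(prefer) or page.get("text", "")
        records.append((page["page"], content))
    return records


# Made-up sample shaped like the per-page dicts used in this cookbook.
sample_pages = [
    {"page": 1, "text": "Figure 21 caption", "md": "# Figure 21\ncaption"},
    {"page": 2, "text": "plain only"},
]
records = page_records(sample_pages)
print(records)
```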

In [None]:
from llama_parse import LlamaParse

parser = LlamaParse(
    result_type="markdown",
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="anthropic-sonnet-3.5",
    # parsing_instruction="Output any charts in a well-formatted markdown table such that all the vertical/horizontal bars align nicely in a 2D grid.",
    invalidate_cache=True,
)
# json_objs = parser.get_json_result("./data/llama2-p33.pdf")
# json_list = json_objs[0]["pages"]
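The note above mentions supplying your own Anthropic API key, but the cell does not show how. One way to assemble the constructor arguments is sketched below. The keyword `vendor_multimodal_api_key` and the `ANTHROPIC_API_KEY` environment variable are assumptions to verify against your installed `llama_parse` version, and `multimodal_parser_kwargs` is a hypothetical helper.

```python
import os
from typing import Any, Dict, Optional


def multimodal_parser_kwargs(
    model_name: str, api_key: Optional[str] = None
) -> Dict[str, Any]:
    """Build keyword arguments for a multimodal parser (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        "result_type": "markdown",
        "use_vendor_multimodal_model": True,
        "vendor_multimodal_model_name": model_name,
    }
    if api_key:
        # Assumption: the client accepts the vendor key under this name.
        kwargs["vendor_multimodal_api_key"] = api_key
    return kwargs


# Assumption: the key, if any, is stored in the ANTHROPIC_API_KEY environment variable.
kwargs = multimodal_parser_kwargs(
    "anthropic-sonnet-3.5", os.environ.get("ANTHROPIC_API_KEY")
)
print(sorted(kwargs))
```

Passing the result as `LlamaParse(**kwargs)` would then create the parser. When no key is set, the key argument is simply omitted.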

In [None]:
docs = parser.load_data("./data/llama2-p33.pdf")

Started parsing the file under job_id 6baaaba1-3f58-4fde-92d3-39c0a6ffb254


In [None]:
print(docs[0].get_content(metadata_mode="all"))

The image contains two figures and accompanying text. I'll describe each component in markdown format:

![Graphs showing RLHF adaptation to temperature for factual and creative prompts](image_url_here)

Figure 21: RLHF learns to adapt the temperature with regard to the type of prompt. Lower Self-BLEU corresponds to more diversity: RLHF eliminates diversity in responses to factual prompts but retains more diversity when generating responses to creative prompts. We prompt each model with a diverse set of 10 creative and 10 factual instructions and sample 25 responses. This is repeated for the temperatures T ∈ {k/10 | k ∈ N : 1 ≤ k ≤ 15}. For each of the 25 responses we compute the Self-BLEU metric and report the mean and standard deviation against the temperature.

![Example of time awareness in AI responses](image_url_here)

Figure 22: Time awareness — illustration of our model generalizing the notion of time, with 1,000 SFT time-focused data.

| Date: 01/01/2023 | Year: 2023 | Year: 85

In [None]:
# `json_list` is defined only if the two `get_json_result` lines above are uncommented.
# json_list

### Setup GPT-4o baseline

For comparison, we will also parse the document using GPT-4o (3c per page).

In [None]:
from llama_parse import LlamaParse

parser_gpt4o = LlamaParse(
    result_type="markdown",
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="openai-gpt4o",
    # parsing_instruction="Output markdown tables such that all the vertical/horizontal bars align nicely in a 2D grid.",
    invalidate_cache=True,
)
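json_objs_gpt4o = parser_gpt4o.get_json_result("./data/llama2-p33.pdf")
json_list_gpt4o = json_objs_gpt4o[0]["pages"]

# Wrap each page's markdown in a Document so that later cells can use `docs_gpt4o`.
from llama_index.core import Document

docs_gpt4o = [
    Document(text=page["md"], metadata={"page": page["page"]})
    for page in json_list_gpt4o
]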

Started parsing the file under job_id ab86c158-5b10-4086-81de-4741b7cc4b2e


In [None]:
print(json_list_gpt4o[0]["md"])

# Figure 21: RLHF learns to adapt the temperature with regard to the type of prompt.

Lower Self-BLEU corresponds to more diversity: RLHF eliminates diversity in responses to factual prompts but retains more diversity when generating responses to creative prompts. We prompt each model with a diverse set of 10 creative and 10 factual instructions and sample 25 responses. This is repeated for the temperatures \( T \in \{k/10 \mid k \in \mathbb{N} : 1 \leq k \leq 15\} \). For each of the 25 responses we compute the Self-BLEU metric and report the mean and standard deviation against the temperature.

| Temperature | Factual Prompts (Self-BLEU) | Creative Prompts (Self-BLEU) |
|-------------|------------------------------|------------------------------|
| 0.4         | RLHF v3, RLHF v2, RLHF v1, SFT | RLHF v3, RLHF v2, RLHF v1, SFT |
| 0.6         | RLHF v3, RLHF v2, RLHF v1, SFT | RLHF v3, RLHF v2, RLHF v1, SFT |
| 0.8         | RLHF v3, RLHF v2, RLHF v1, SFT | RLHF v3, RLHF v2, RLHF v1, S

## View Results

Let's visualize the results along with the original document page.

We see that Sonnet extracts complex visual elements such as graphs in considerably more detail.

In [None]:
# using Sonnet-3.5; index 0 matches the page printed earlier (use 32 if you parsed the full paper)
print(docs[0].get_content(metadata_mode="all"))

In [None]:
# using GPT-4o; index 0 matches the page printed earlier (use 32 if you parsed the full paper)
print(docs_gpt4o[0].get_content(metadata_mode="all"))
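To make the comparison less impressionistic, you can count a few structural elements in each parser's markdown. The sketch below uses only regular expressions. `markdown_stats` is a hypothetical helper, the sample string is made up, and the counts are a crude proxy rather than a quality metric.

```python
import re
from typing import Dict


def markdown_stats(md: str) -> Dict[str, int]:
    """Count a few structural elements in a markdown string."""
    lines = md.splitlines()
    return {
        "characters": len(md),
        "headings": sum(1 for line in lines if re.match(r"#{1,6}\s", line)),
        "images": len(re.findall(r"!\[[^\]]*\]\([^)]*\)", md)),
        # Lines starting with "|" are table rows, including the separator row.
        "table_lines": sum(1 for line in lines if line.lstrip().startswith("|")),
    }


# Made-up sample; in the notebook, pass the parsed page content instead.
sample_md = (
    "# Figure 21\n\n"
    "![graph](image_url_here)\n\n"
    "| T | Self-BLEU |\n|---|---|\n| 0.4 | n/a |\n"
)
stats = markdown_stats(sample_md)
print(stats)
```

In the notebook, something like `markdown_stats(docs[0].get_content())` could be run on each parser's output and the two dictionaries compared.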

## Setup RAG Pipeline

These parsing capabilities can improve RAG performance as well. Let's set up a RAG pipeline over this data, using GPT-4o from OpenAI for the text synthesis step.

In [None]:
from llama_index.core import SummaryIndex
from llama_index.llms.openai import OpenAI

# GPT-4o performs the text synthesis step for both query engines.
llm = OpenAI(model="gpt-4o")

index = SummaryIndex.from_documents(docs)
query_engine = index.as_query_engine(llm=llm)

index_gpt4o = SummaryIndex.from_documents(docs_gpt4o)
query_engine_gpt4o = index_gpt4o.as_query_engine(llm=llm)

In [None]:
query = "Tell me more about all the values for each line in the RLHF graph."

response = query_engine.query(query)
response_gpt4o = query_engine_gpt4o.query(query)
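The cell above stores both responses without displaying them. A small sketch for printing them one after the other follows. `format_comparison` is a hypothetical helper, and the placeholder strings stand in for `response` and `response_gpt4o`, which are converted to text with `str()`.

```python
import textwrap
from typing import Dict


def format_comparison(answers: Dict[str, object], width: int = 80) -> str:
    """Render labelled answers as wrapped text blocks, one per parser."""
    blocks = []
    for label, answer in answers.items():
        body = textwrap.fill(str(answer), width=width)
        blocks.append(f"=== {label} ===\n{body}")
    return "\n\n".join(blocks)


# Placeholder strings; in the notebook, pass `response` and `response_gpt4o` instead.
report = format_comparison(
    {"Sonnet 3.5": "placeholder answer A", "GPT-4o": "placeholder answer B"}
)
print(report)
```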