# LlamaParse JSON Mode + Multimodal RAG

<a href="https://colab.research.google.com/github/run-llama/llama_cloud_services/blob/main/examples/parse/demo_json.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook shows you how to use LlamaParse JSON mode with LlamaIndex to build a simple multimodal RAG pipeline.

JSON mode returns a list of JSON dictionaries that contain both text and images. You can then download the images, use a multimodal model to extract information from them, and index the results.
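To make that shape concrete, here is a minimal sketch of walking such a result. The `sample_result` literal and its key names (`pages`, `page`, `text`, `images`, `name`) are assumptions written by hand for illustration, not a guaranteed schema, so check the actual output of your parse job before relying on them. `split_text_and_images` is a hypothetical helper, not part of any library.

```python
# Hypothetical, hand-written stand-in for a JSON-mode result.
# The key names below are assumptions for illustration only.
sample_result = [
    {
        "pages": [
            {
                "page": 1,
                "text": "Form 10-Q cover page",
                "images": [{"name": "page_1.jpg"}],
            },
            {"page": 2, "text": "Table of contents", "images": []},
        ]
    }
]


def split_text_and_images(result):
    """Collect (page, text) pairs and (page, image name) pairs."""
    texts, images = [], []
    for document in result:
        for page in document.get("pages", []):
            texts.append((page["page"], page.get("text", "")))
            for image in page.get("images", []):
                images.append((page["page"], image["name"]))
    return texts, images


texts, images = split_text_and_images(sample_result)
print(texts)
print(images)
```

Keeping the page number next to every text and image entry is what later lets a caption be tied back to the page it came from.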

Status:
| Last Executed | Version | State      |
|---------------|---------|------------|
| Aug-19-2025   | 0.6.61  | Maintained |

## Setup

Define imports, environment variables, and the global LLM and embedding models.

In [None]:
%pip install llama-index
%pip install "llama-index-core>=0.13.2,<0.14.0"
%pip install "llama-index-llms-anthropic>=0.8.4,<0.9.0"
%pip install "llama-index-embeddings-huggingface>=0.6.0,<0.7.0"
%pip install llama-cloud-services

In [None]:
import os

# API access to llama-cloud
os.environ["LLAMA_CLOUD_API_KEY"] = "llx-..."

# Using Anthropic API for LLMs
os.environ["ANTHROPIC_API_KEY"] = "sk-..."
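Hardcoding keys in a notebook makes them easy to leak when the notebook is shared. As an alternative, here is a small sketch that reuses a key already present in the environment and only prompts when it is missing. `ensure_env_key` is a hypothetical helper introduced for illustration; the `prompt` parameter exists only so the prompt can be swapped out in tests.

```python
import os
from getpass import getpass


def ensure_env_key(name, prompt=getpass):
    """Return env var `name`, prompting for it once if it is unset."""
    value = os.environ.get(name)
    if not value:
        # Only ask when the variable is missing or empty.
        value = prompt(f"Enter {name}: ")
        if not value:
            raise ValueError(f"{name} is required")
        os.environ[name] = value
    return value
```

In the notebook this would be called as `ensure_env_key("LLAMA_CLOUD_API_KEY")` and `ensure_env_key("ANTHROPIC_API_KEY")` before any client is created.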

In [None]:
from llama_index.llms.anthropic import Anthropic

llm = Anthropic(model="claude-4-sonnet-20250514")

In [None]:
from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = "local:Qwen/Qwen3-Embedding-0.6B"



## Load Data

Let's download the Uber 10-Q report for the quarter ended March 31, 2022.

In [None]:
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_march_2022.pdf' -O './uber_10q_march_2022.pdf'

## Using LlamaParse in JSON Mode for PDF Reading

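Here we parse the PDF with LlamaParse using the agentic per-page parse mode, with high-resolution OCR enabled and tables emitted as HTML. The call returns a result object from which text and image nodes can be extracted.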

In [None]:
from llama_cloud_services import LlamaParse

parser = LlamaParse(
    parse_mode="parse_page_with_agent",
    model="openai-gpt-4-1-mini",
    high_res_ocr=True,
    adaptive_long_table=True,
    outlined_table_extraction=True,
    output_tables_as_HTML=True,
)

result = await parser.aparse("./uber_10q_march_2022.pdf")

Started parsing the file under job_id 33d93a46-1b43-4619-b4ff-0c272cbca4b3
..

In [None]:
text_nodes = await result.aget_text_nodes(split_by_page=True)
image_nodes = await result.aget_image_nodes(
    include_screenshot_images=True,
    include_object_images=False,
    image_download_dir="./uber_10q_images",
)

## Extract/Index images from image dicts

Here we use a multimodal model to caption each image, then wrap each caption in a text node so it can be indexed alongside the page text.

In [None]:
from llama_index.core.llms import ChatMessage, ImageBlock, TextBlock
from llama_index.core.schema import ImageNode, TextNode
from llama_index.llms.anthropic import Anthropic


async def get_image_text_nodes(image_nodes: list[ImageNode]):
    """Caption each image with a multimodal model and return the captions as text nodes."""
    llm = Anthropic(model="claude-3-5-haiku-20241022", max_tokens=300)
    img_text_nodes = []
    for image_node in image_nodes:
        image_path = image_node.image_path
        message = ChatMessage(
            role="user",
            blocks=[
                TextBlock(text="Describe the images as alt text"),
                ImageBlock(path=image_path),
            ],
        )
        response = await llm.achat([message])
        text_node = TextNode(
            text=str(response.message.content), metadata={"path": image_path}
        )
        img_text_nodes.append(text_node)

    return img_text_nodes
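The loop above captions one image at a time, which can be slow for a long document. One possible variation is to run the calls concurrently with a cap on how many are in flight. The sketch below shows only the concurrency pattern: `fake_caption` is a stand-in for the real `llm.achat` call, and `caption_all` and `max_concurrency` are hypothetical names introduced here.

```python
import asyncio


async def fake_caption(image_path):
    """Stand-in for the multimodal LLM call; returns a dummy caption."""
    await asyncio.sleep(0.01)
    return f"caption for {image_path}"


async def caption_all(image_paths, caption_fn, max_concurrency=4):
    """Caption every path with at most `max_concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def caption_one(path):
        async with semaphore:
            return await caption_fn(path)

    # gather preserves input order, so captions line up with image_paths.
    return await asyncio.gather(*(caption_one(p) for p in image_paths))


captions = asyncio.run(caption_all(["page_1.jpg", "page_2.jpg"], fake_caption))
print(captions)
```

Inside a notebook, where an event loop is already running, use `await caption_all(...)` instead of `asyncio.run(...)`. Keep the concurrency cap low enough to stay within your API rate limits.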

In [None]:
image_text_nodes = await get_image_text_nodes(image_nodes)

In [None]:
image_text_nodes[0].get_content()

'Alt text: United States Securities and Exchange Commission Form 10-Q for Uber Technologies, Inc., dated for the quarterly period ended March 31, 2022. The document shows company details including incorporation state (Delaware), address (1515 3rd Street, San Francisco), and indicates Uber is a large accelerated filer listed on the New York Stock Exchange with the trading symbol UBER.'

## Build Index across image and text nodes

Here we build a single vector index over both the page text nodes and the caption text nodes derived from images.

In [None]:
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex(text_nodes + image_text_nodes)
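Conceptually, a vector index embeds every node and, at query time, returns the nodes whose embeddings are closest to the query embedding. The toy sketch below illustrates that idea with made-up 2-D vectors and plain cosine similarity. It is not how `VectorStoreIndex` is implemented internally, and `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec, nodes, k=1):
    """Rank (text, vector) pairs by cosine similarity to the query vector."""
    ranked = sorted(
        nodes, key=lambda n: cosine_similarity(query_vec, n[1]), reverse=True
    )
    return [text for text, _ in ranked[:k]]


# Made-up vectors standing in for real embeddings.
toy_nodes = [
    ("page text about risk factors", [1.0, 0.1]),
    ("caption of a bar chart", [0.1, 1.0]),
]
best = top_k([0.2, 0.9], toy_nodes, k=1)
print(best)
```

Because image captions are stored as ordinary text nodes, they compete with page text in the same ranking, which is what allows a single query engine to answer questions about both.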

In [None]:
query_engine = index.as_query_engine()

In [None]:
# Ask a question whose answer comes from an image
response = query_engine.query(
    "What does the bar graph titled 'Monthly Active Platform Consumers' show?"
)
print(str(response))

The bar graph titled 'Monthly Active Platform Consumers' shows the growth in platform users measured in millions from Q2 2020 to Q1 2022. The graph demonstrates a steady increase in the number of consumers using the platform, starting at 55 million users in Q2 2020 and rising to 115 million users in Q1 2022. The visualization displays notable growth between quarters, with the vertical axis representing the number of consumers in millions and the horizontal axis showing the quarterly progression over this two-year period.
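Each caption node stores its image path under `metadata["path"]`, so an answer like this one can be traced back to the image that supported it. The sketch below assumes the response exposes `source_nodes`, each with a `.node.metadata` dict and a `.score`, as LlamaIndex query responses typically do. `image_sources` is a hypothetical helper, and `fake_response` is a stand-in object used only to exercise it.

```python
from types import SimpleNamespace


def image_sources(response):
    """Return (path, score) for each retrieved node that came from an image caption."""
    hits = []
    for source in response.source_nodes:
        path = source.node.metadata.get("path")
        if path is not None:
            hits.append((path, source.score))
    return hits


# Stand-in response object; a real one would come from query_engine.query(...).
fake_response = SimpleNamespace(
    source_nodes=[
        SimpleNamespace(
            node=SimpleNamespace(metadata={"path": "img_p10.jpg"}), score=0.8
        ),
        SimpleNamespace(node=SimpleNamespace(metadata={}), score=0.7),
    ]
)
print(image_sources(fake_response))
```

With a real response, calling `image_sources(response)` would list the screenshot files behind the retrieved captions, assuming page text nodes do not also carry a `path` key.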


In [None]:
# Ask a question whose answer comes from the page text
response = query_engine.query("What are the main risk factors for Uber?")
print(str(response))

Based on the financial documents provided, I can identify some key risk factors for Uber, though the context is limited to specific pages:

**Legal and Regulatory Risks:**
- Driver classification issues pose significant business risks, as legal determinations about whether drivers are employees or independent contractors could substantially impact Uber's operations and cost structure.

**Operational Risks:**
- The company continues to report net losses, indicating ongoing profitability challenges across its business segments.

**Business Model Risks:**
- Uber operates across multiple segments (Mobility, Delivery, and Freight), which creates exposure to various market conditions and regulatory environments in different industries.

**Geographic Concentration Risk:**
- The company has operations across different geographic regions, which exposes it to varying regulatory frameworks, economic conditions, and competitive landscapes in different markets.

However, the provided context appear