# LlamaParse JSON Mode + Multimodal RAG

Use LlamaParse JSON mode with LlamaIndex to build a simple multimodal RAG pipeline.

JSON mode returns a list of JSON dictionaries that contain both the text and the images of each page. You can then download the images, use a multimodal model to describe them as text, and index those descriptions alongside the page text.
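
As a rough sketch of that structure, the snippet below walks a hand-written sample rather than a real LlamaParse response. Only `pages`, `page`, and `text` are keys used later in this notebook; the `images` key, its `name` field, and the helper `summarize_pages` are assumptions made for illustration and may not match the real payload exactly.

```python
# Hand-written sample mimicking a JSON-mode result: one dict per parsed file.
# "pages", "page" and "text" are used later in this notebook; the "images" key
# and its "name" field are ASSUMED here for illustration only.
sample_json_objs = [
    {
        "pages": [
            {"page": 1, "text": "FORM 10-Q", "images": []},
            {"page": 2, "text": "Gross Bookings", "images": [{"name": "img_p2_1.png"}]},
        ]
    }
]


def summarize_pages(json_objs):
    """Return (page number, text length, image count) for every page."""
    summary = []
    for file_result in json_objs:
        for page in file_result["pages"]:
            n_images = len(page.get("images", []))
            summary.append((page["page"], len(page["text"]), n_images))
    return summary


print(summarize_pages(sample_json_objs))
```

Running a summary like this on the real result is a quick way to see which pages carry images before paying for any multimodal calls.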

## Setup
Define imports, environment variables, and the global LLM and embedding models.


In [1]:
!pip install llama-index
!pip install llama-index-core
!pip install llama-index-llms-anthropic llama-index-multi-modal-llms-anthropic
!pip install llama-index-embeddings-huggingface
!pip install llama-parse



In [9]:
# llama-parse is async-first; running async code in a notebook requires nest_asyncio
import nest_asyncio
nest_asyncio.apply()

import os
from kaggle_secrets import UserSecretsClient

user_secrets = UserSecretsClient()

# API access to LlamaCloud (keys start with "llx-")
os.environ["LLAMA_CLOUD_API_KEY"] = user_secrets.get_secret("LLamaCloud")

# Anthropic API for the LLM and the multimodal model (keys start with "sk-");
# embeddings come from a local HuggingFace model instead
os.environ["ANTHROPIC_API_KEY"] = user_secrets.get_secret("Anthropic")
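
The cell above only runs on Kaggle, because `kaggle_secrets` is a Kaggle-specific module. A minimal sketch of a fallback for other environments follows; `load_secret` is a hypothetical helper written for this example, not part of any library, and it assumes the keys are otherwise exported as ordinary environment variables.

```python
import os


def load_secret(kaggle_name: str, env_name: str) -> str:
    """Fetch a key from Kaggle secrets if available, else from the environment.

    `load_secret` is a hypothetical helper, not part of any library.
    """
    try:
        # Only importable inside a Kaggle notebook.
        from kaggle_secrets import UserSecretsClient
    except ImportError:
        value = os.environ.get(env_name)
    else:
        value = UserSecretsClient().get_secret(kaggle_name)
    if not value:
        raise RuntimeError(
            f"Missing API key: set {env_name} or the Kaggle secret {kaggle_name!r}"
        )
    return value
```

With this in place, the assignment could read `os.environ["LLAMA_CLOUD_API_KEY"] = load_secret("LLamaCloud", "LLAMA_CLOUD_API_KEY")`, and similarly for the Anthropic key.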

In [10]:
from llama_index.llms.anthropic import Anthropic

llm = Anthropic(model="claude-3-opus-20240229", temperature=0.0)

In [11]:
from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = "local:BAAI/bge-small-en-v1.5"


## Load Data
Let's download Uber's March 2022 10-Q report.

In [12]:
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_march_2022.pdf' -O './uber_10q_march_2022.pdf'




--2024-03-24 23:17:52--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_march_2022.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1260185 (1.2M) [application/octet-stream]
Saving to: './uber_10q_march_2022.pdf'


2024-03-24 23:17:52 (33.1 MB/s) - './uber_10q_march_2022.pdf' saved [1260185/1260185]



## Using LlamaParse in JSON Mode for PDF Reading

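Here we run LlamaParse in JSON mode on the PDF. `get_json_result` returns one result per input file, and each result's `pages` entry is a list with one dictionary per page.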

In [13]:
from llama_parse import LlamaParse

parser = LlamaParse(verbose=True)
json_objs = parser.get_json_result("/kaggle/working/uber_10q_march_2022.pdf")
json_list = json_objs[0]["pages"]

Started parsing the file under job_id 2d487d9e-8fb5-46f1-86d7-94f7152b69b0


In [27]:
len(json_list)

106

In [15]:
from llama_index.core.schema import TextNode
from typing import List


def get_text_nodes(json_list: List[dict]) -> List[TextNode]:
    """Wrap each parsed page in a TextNode, keeping the page number as metadata."""
    text_nodes = []
    for page in json_list:
        text_node = TextNode(
            text=page["text"],
            metadata={"page": page["page"]},
        )
        text_nodes.append(text_node)
    return text_nodes

In [16]:
text_nodes = get_text_nodes(json_list)

In [17]:
text_nodes[0]

TextNode(id_='bbc1f87a-88d0-44bb-9a45-d6b99fcc63b6', embedding=None, metadata={'page': 1}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text="                                                                                         UNITED STATES\n                                                    SECURITIES AND EXCHANGE COMMISSION\n                                                                                         Washington, D.C. 20549\n                                                                        ____________________________________________\n                                                                                               FORM 10-Q\n                                                                        ____________________________________________\n(Mark One)\n☒ QUARTERLY REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934\n                                                                       
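
The node text above keeps the PDF's layout padding, so many characters per page are plain spaces. The sketch below shows one optional way to collapse that padding before building nodes. `normalize_page_text` is a hypothetical helper, and whether to use it is a judgment call: collapsing whitespace also flattens the column alignment of tables, which may matter for financial statements.

```python
import re


def normalize_page_text(text: str) -> str:
    """Collapse runs of spaces/tabs and strip each line; keep line breaks."""
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    # Drop lines that were only padding.
    return "\n".join(line for line in lines if line)


raw = "            UNITED STATES\n      SECURITIES AND EXCHANGE COMMISSION\n\n   FORM 10-Q"
print(normalize_page_text(raw))
```

If adopted, it would be applied as `text=normalize_page_text(page["text"])` when constructing each `TextNode`.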

## Extract/Index images from image dicts
Here we download the images that LlamaParse found, ask a multimodal model to describe each one, and wrap each description in a `TextNode` so it can be indexed like regular text.

In [18]:
# Download the images found by the parser and describe each one with a multimodal model
!mkdir -p llama2_images

from llama_index.core.schema import ImageDocument
from llama_index.multi_modal_llms.anthropic import AnthropicMultiModal


def get_image_text_nodes(json_objs: List[dict]) -> List[TextNode]:
    """Describe each extracted image with a multimodal model; return the descriptions as TextNodes."""
    anthropic_mm_llm = AnthropicMultiModal(max_tokens=300)
    # get_images saves the images under download_path and returns one dict per image
    image_dicts = parser.get_images(json_objs, download_path="llama2_images")
    img_text_nodes = []
    for image_dict in image_dicts:
        image_doc = ImageDocument(image_path=image_dict["path"])
        response = anthropic_mm_llm.complete(
            prompt="Describe the images as an alternative text",
            image_documents=[image_doc],
        )
        text_node = TextNode(
            text=str(response),
            metadata={"path": image_dict["path"]},
        )
        img_text_nodes.append(text_node)
    return img_text_nodes

In [19]:
image_text_nodes = get_image_text_nodes(json_objs)

> Image for page 1: []
> Image for page 2: []
> Image for page 3: []
> Image for page 4: []
> Image for page 5: []
> Image for page 6: []
> Image for page 7: []
> Image for page 8: []
> Image for page 9: []
> Image for page 10: []
> Image for page 11: []
> Image for page 12: []
> Image for page 13: []
> Image for page 14: []
> Image for page 15: []
> Image for page 16: []
> Image for page 17: []
> Image for page 18: []
> Image for page 19: []
> Image for page 20: []
> Image for page 21: []
> Image for page 22: []
> Image for page 23: []
> Image for page 24: []
> Image for page 25: []
> Image for page 26: []
> Image for page 27: []
> Image for page 28: []
> Image for page 29: []
> Image for page 30: []
> Image for page 31: []
> Image for page 32: []
> Image for page 33: []
> Image for page 34: []
> Image for page 35: []
> Image for page 36: []
> Image for page 37: []
> Image for page 38: []
> Image for page 39: []
> Image for page 40: []
> Image for page 41: []
> Image for page 42: []
>

In [21]:
image_text_nodes[1].get_content()

'The image shows a bar graph of Gross Bookings (in millions of dollars) over time from Q2 2020 to Q1 2022, broken down by category: Mobility, Delivery, Freight, and All Other.\n\nThe total Gross Bookings start at $10,224 million in Q2 2020 and generally increase each quarter, reaching $26,449 million by Q1 2022.\n\nMobility bookings declined sharply in 2020 during the pandemic but have steadily recovered. Delivery bookings grew substantially, overtaking Mobility by Q4 2020. Freight and All Other remain small portions of the total.\n\nThe graph uses shades of gray to distinguish the categories and has data labels above each quarterly bar to show the total Gross Bookings amount.'
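
Each description costs one multimodal API call, so re-running the notebook repeats that cost for every image. A minimal sketch of caching descriptions on disk is shown below. `describe_with_cache`, the `describe` callable, and the cache file name are all hypothetical and not part of LlamaIndex or LlamaParse.

```python
import json
from pathlib import Path
from typing import Callable, Dict


def describe_with_cache(
    image_path: str,
    describe: Callable[[str], str],
    cache_file: str = "image_descriptions.json",
) -> str:
    """Return a cached description for image_path, calling describe on a miss."""
    cache_path = Path(cache_file)
    cache: Dict[str, str] = {}
    if cache_path.exists():
        cache = json.loads(cache_path.read_text())
    if image_path not in cache:
        # Cache miss: compute the description once and persist it.
        cache[image_path] = describe(image_path)
        cache_path.write_text(json.dumps(cache, indent=2))
    return cache[image_path]
```

In the loop above, `describe` could be a small function that builds the `ImageDocument`, calls the multimodal model, and returns `str(response)`. The cache is keyed on the file path only, so it would need clearing if the prompt or the images change.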

## Build Index across image and text nodes
Here we build a single vector index over both the page text nodes and the image description nodes.

In [22]:
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex(text_nodes + image_text_nodes)

In [23]:
query_engine = index.as_query_engine()
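
Conceptually, the index stores one embedding per node, and the query engine retrieves the nodes whose embeddings are closest to the query embedding before passing them to the LLM. The toy sketch below illustrates that ranking step with hand-made three-dimensional vectors and cosine similarity. It is not LlamaIndex's implementation, and the node names and vectors are invented for the example.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: List[float], node_vecs: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Rank nodes by similarity to the query and keep the best k."""
    scored = [(node_id, cosine(query_vec, vec)) for node_id, vec in node_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Invented vectors: text pages and an image description share one vector space.
toy_vectors = {
    "page-1-text": [0.9, 0.1, 0.0],
    "page-40-text": [0.1, 0.9, 0.1],
    "chart-description": [0.0, 0.2, 0.9],
}
print(top_k([0.0, 0.1, 1.0], toy_vectors, k=2))
```

Because image descriptions are embedded as ordinary text, a question about a chart can rank a description node above any page text node, which is what makes the single index work for both kinds of content.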

In [25]:
# ask a question that is answered by an image
response = query_engine.query("What does the bar graph titled 'Monthly Active Platform Consumers' show?")
print(str(response))

The bar graph titled 'Monthly Active Platform Consumers (in millions)' shows the number of monthly active consumers on Uber's platform over a two year period from Q2 2020 to Q1 2022. The graph covers 8 quarters in total.

In Q2 2020, there were 55 million monthly active platform consumers. This number steadily increased each quarter over the two year period shown, ultimately reaching 115 million monthly active consumers by Q1 2022.
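
To check whether an answer like this was grounded in an image description or in page text, one option is to inspect `response.source_nodes`, where each item exposes `.node` and `.score`. The sketch below relies on the metadata keys set earlier in this notebook (`path` for image nodes, `page` for text nodes). `label_sources` is a hypothetical helper, and the stand-in objects, file path, and scores are invented so the snippet runs without an index.

```python
from types import SimpleNamespace


def label_sources(source_nodes):
    """Label each retrieved node as 'image' or 'text' from its metadata keys."""
    labels = []
    for item in source_nodes:
        metadata = item.node.metadata
        if "path" in metadata:
            labels.append(("image", metadata["path"], item.score))
        else:
            labels.append(("text", metadata.get("page"), item.score))
    return labels


# Stand-in objects with invented values, so the sketch runs without an index.
fake_sources = [
    SimpleNamespace(
        node=SimpleNamespace(metadata={"path": "llama2_images/example.png"}),
        score=0.71,
    ),
    SimpleNamespace(node=SimpleNamespace(metadata={"page": 33}), score=0.64),
]
print(label_sources(fake_sources))
```

In the notebook itself, the call would be `label_sources(response.source_nodes)`.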


In [26]:
# ask a question that is answered by the text
response = query_engine.query("What are the main risk factors for Uber?")
print(str(response))

Based on the context provided, some of the main risk factors for Uber include:

- A significant percentage of Uber's bookings come from large metropolitan areas, which could be negatively impacted by various economic, social, weather, regulatory and other conditions, including COVID-19.

- Uber may fail to successfully offer autonomous vehicle technologies on its platform or these technologies may not perform as expected. 

- Retaining and attracting high-quality personnel is important for Uber's business and continued attrition could adversely impact the company.

- Security breaches, data privacy issues, cyberattacks and unauthorized access to Uber's proprietary data and systems pose risks.

- Uber is subject to climate change risks, both physical and transitional, that could adversely impact its business if not managed properly. 

- Uber relies on third parties for open marketplaces to distribute its platform and software, and interference from these third parties could harm its bus