# Building a Natively Multimodal RAG Pipeline (over a Slide Deck)

<a href="https://colab.research.google.com/github/run-llama/llama_parse/blob/main/examples/multimodal/multimodal_rag_slide_deck.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this cookbook we show you how to build a multimodal RAG pipeline over a slide deck, with text, tables, images, diagrams, and complex layouts.

A limitation of text-based RAG pipelines is that they rely on purely text-based representations of complex documents. For instance, if a page contains many images and diagrams, a text parser has to rely on raw OCR to extract the text. You can also use a multimodal model (e.g. gpt-4o and up) to do text extraction, but this is inherently a lossy conversion.

Instead, a **native multimodal pipeline** stores both a text and an image representation of each document chunk. Chunks are indexed via embeddings (text or image), and during synthesis both the text and the image are fed directly to the multimodal model.

This can have the following advantages:
- **Robustness**: This solution is more robust than a pure text or even a pure image-based approach. In a pure text RAG approach, the parsing piece can be lossy. In a pure image-based approach, multimodal OCR is not perfect and may lose out against text parsing for text-heavy documents.
- **Cost Optimization**: You may choose to dynamically include text-only, or text + image depending on the content of the page.
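
To make the idea concrete, here is a minimal sketch of a chunk that carries both representations, plus one possible rule for deciding when to send the image along with the text. The names `PageChunk`, `select_modalities`, and the `min_text_chars` threshold are hypothetical and are not part of LlamaIndex or LlamaParse; the "sparse text means visual page" heuristic is an assumption for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageChunk:
    """One document page, stored as both parsed text and a screenshot path."""

    page_number: int
    text: str
    image_path: Optional[str] = None


def select_modalities(chunk: PageChunk, min_text_chars: int = 500) -> dict:
    """Hypothetical heuristic: attach the image only when the parsed text is sparse."""
    payload = {"text": chunk.text}
    text_is_sparse = len(chunk.text.strip()) < min_text_chars
    if chunk.image_path is not None and text_is_sparse:
        payload["image_path"] = chunk.image_path
    return payload


# Illustrative pages with made-up content and paths
dense_page = PageChunk(1, "x" * 1000, "data_images/page-0.jpg")
sparse_page = PageChunk(2, "Chart title", "data_images/page-1.jpg")

print(sorted(select_modalities(dense_page)))   # text only
print(sorted(select_modalities(sparse_page)))  # text + image
```

In practice the decision could also be based on parser metadata (for example, whether a page contains charts), which is likely more reliable than text length alone.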

![mm_rag_diagram](./multimodal_rag_slide_deck_img.png)

## Setup

In [1]:
import nest_asyncio

nest_asyncio.apply()

### Setup Observability

We set up an integration with LlamaTrace (an integration with Arize).

If you haven't already done so, make sure to create an account here: https://llamatrace.com/login. Then create an API key and put it in the `PHOENIX_API_KEY` variable below.

In [None]:
# !pip install -U llama-index-callbacks-arize-phoenix

In [2]:
import os
import openai

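# Option 1: set the OpenAI API key directly
os.environ["OPENAI_API_KEY"] = "your-api-key-here"  # Replace with your actual API key
openai.api_key = os.getenv("OPENAI_API_KEY")

# Option 2: load the key from a .env file instead.
# Note: load_dotenv() does not override variables that are already set,
# so skip Option 1 when using this.
from dotenv import load_dotenv
load_dotenv()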

True

In [3]:
# setup Arize Phoenix for logging/observability
import llama_index.core
import os

PHOENIX_API_KEY = "<PHOENIX_API_KEY>"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
llama_index.core.set_global_handler(
    "arize_phoenix", endpoint="https://llamatrace.com/v1/traces"
)

### Load Data

Here we load the [Conoco Phillips 2023 investor meeting slide deck](https://static.conocophillips.com/files/2023-conocophillips-aim-presentation.pdf).

In [6]:
import os
import subprocess
import requests

def download_file(url, output_path):
    """Download url to output_path, trying curl first and falling back to requests."""
    try:
        # Try curl first; --fail makes curl exit non-zero on HTTP errors
        subprocess.run(["curl", "-L", "--fail", url, "-o", output_path], check=True)
        print(f"File downloaded successfully using curl: {output_path}")
    except (subprocess.CalledProcessError, FileNotFoundError):
        # CalledProcessError: curl ran but failed; FileNotFoundError: curl is not installed
        print("curl failed or is not available. Trying with requests...")
        try:
            # Fall back to requests
            response = requests.get(url)
            response.raise_for_status()  # Raise an HTTPError on an unsuccessful status code
            with open(output_path, 'wb') as f:
                f.write(response.content)
            print(f"File downloaded successfully using requests: {output_path}")
        except Exception as e:
            print(f"Error downloading file: {e}")
            raise

# Create necessary directories and download the file
os.makedirs("data", exist_ok=True)
os.makedirs("data_images", exist_ok=True)
download_file("https://static.conocophillips.com/files/2023-conocophillips-aim-presentation.pdf", "data/conocophillips.pdf")

File downloaded successfully using curl: data_rag/conocophillips.pdf


### Model Setup

Set up the models that will be used for downstream orchestration.

In [7]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(model="text-embedding-3-large")
llm = OpenAI(model="gpt-4o")

Settings.embed_model = embed_model
Settings.llm = llm

## Use LlamaParse to Parse Text and Images

In this example, we use LlamaParse to parse both the text and images from the document.

We parse out the text in two ways:
- in regular `text` mode, using our default text layout algorithm
- in `markdown` mode, using GPT-4o (`gpt4o_mode=True`). This also allows us to capture page screenshots.

In [8]:
from llama_parse import LlamaParse


parser_text = LlamaParse(result_type="text")
parser_gpt4o = LlamaParse(result_type="markdown", gpt4o_mode=True)

In [10]:
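# Plain-text parse: returns one document per page
print("Parsing text...")
docs_text = parser_text.load_data("data/conocophillips.pdf")
# Markdown parse: the JSON result holds per-page markdown and image metadata
print("Parsing PDF file...")
md_json_objs = parser_gpt4o.get_json_result("data/conocophillips.pdf")
md_json_list = md_json_objs[0]["pages"]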

Parsing text...


INFO:httpx:HTTP Request: POST https://api.cloud.llamaindex.ai/api/parsing/upload "HTTP/1.1 200 OK"


Started parsing the file under job_id a140e3fd-ebda-4ca7-9bfe-212a5152a506


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a140e3fd-ebda-4ca7-9bfe-212a5152a506 "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a140e3fd-ebda-4ca7-9bfe-212a5152a506/result/text "HTTP/1.1 200 OK"


Parsing PDF file...


INFO:httpx:HTTP Request: POST https://api.cloud.llamaindex.ai/api/parsing/upload "HTTP/1.1 200 OK"


Started parsing the file under job_id a0af2de7-8064-4372-9094-0c91b54ed38f


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/json "HTTP/1.1 200 OK"


In [65]:
print(docs_text[0].get_content())

ConocoPhillips
                2023 Analyst & Investor Meeting


In [66]:
print(md_json_list[10]["md"])

# Commitment to Disciplined Reinvestment Rate

| | Industry Growth Focus | ConocoPhillips Strategy Reset | Disciplined Reinvestment Rate is the Foundation for Superior Returns on and of Capital, while Driving Durable CFO Growth |
|---|---|---|---|
| | >100% Reinvestment Rate | <60% Reinvestment Rate | ~50% 10-Year Reinvestment Rate <br> ~6% CFO CAGR 2024-2032 <br> at $60/BBL WTI Mid-Cycle Planning Price |

| ConocoPhillips Average Annual Reinvestment Rate (%) | 2012-2016 | 2017-2022 | 2023E | 2024-2028 | 2029-2032 |
|---|---|---|---|---|---|
| 100% | | | | | |
| 75% | | | | | |
| 50% | | | | | |
| 25% | | | | | |
| 0% | | | | | |

| | | | | | |
|---|---|---|---|---|---|
| | ~ $75/BBL WTI Average | ~ $63/BBL WTI Average | at $80/BBL WTI | at $60/BBL WTI <br> at $80/BBL WTI | at $60/BBL WTI <br> at $80/BBL WTI |

*Reinvestment rate and cash from operations (CFO) are non-GAAP measures. Definitions and reconciliations are included in the Appendix.*


In [67]:
print(md_json_list[1].keys())

dict_keys(['page', 'md', 'images', 'items'])
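
Given the keys printed above, a quick way to sanity-check the parse is to summarize every page in one pass. The following is a minimal sketch; `summarize_pages` is a hypothetical helper (not part of LlamaParse), and the sample data only mimics the `page`, `md`, `images`, and `items` keys with made-up values.

```python
def summarize_pages(pages):
    """Return one (page_number, markdown_chars, image_count) tuple per parsed page."""
    summary = []
    for page in pages:
        summary.append(
            (page["page"], len(page.get("md") or ""), len(page.get("images") or []))
        )
    return summary


# Stand-in data with the same keys as the parser output; the values are made up.
sample_pages = [
    {"page": 1, "md": "# Title", "images": [{"name": "page-0.jpg"}], "items": []},
    {"page": 2, "md": "", "images": [], "items": []},
]
print(summarize_pages(sample_pages))  # [(1, 7, 1), (2, 0, 0)]
```

With the real result you would call `summarize_pages(md_json_list)`; pages with empty markdown or no images are good candidates for a closer look.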


In [73]:
image_dicts = parser_gpt4o.get_images(md_json_objs, download_path="data_images")

> Image for page 1: [{'name': 'page-0.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-0.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 1}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-0.jpg "HTTP/1.1 200 OK"


> Image for page 2: [{'name': 'page-1.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-1.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 2}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-1.jpg "HTTP/1.1 200 OK"


> Image for page 3: [{'name': 'page-2.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-2.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 3}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-2.jpg "HTTP/1.1 200 OK"


> Image for page 4: [{'name': 'page-3.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-3.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 4}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-3.jpg "HTTP/1.1 200 OK"


> Image for page 5: [{'name': 'page-4.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-4.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 5}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-4.jpg "HTTP/1.1 200 OK"


> Image for page 6: [{'name': 'page-5.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-5.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 6}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-5.jpg "HTTP/1.1 200 OK"


> Image for page 7: [{'name': 'page-6.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-6.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 7}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-6.jpg "HTTP/1.1 200 OK"


> Image for page 8: [{'name': 'page-7.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-7.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 8}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-7.jpg "HTTP/1.1 200 OK"


> Image for page 9: [{'name': 'page-8.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-8.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 9}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-8.jpg "HTTP/1.1 200 OK"


> Image for page 10: [{'name': 'page-9.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-9.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 10}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-9.jpg "HTTP/1.1 200 OK"


> Image for page 11: [{'name': 'page-10.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-10.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 11}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-10.jpg "HTTP/1.1 200 OK"


> Image for page 12: [{'name': 'page-11.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-11.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 12}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-11.jpg "HTTP/1.1 200 OK"


> Image for page 13: [{'name': 'page-12.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-12.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 13}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-12.jpg "HTTP/1.1 200 OK"


> Image for page 14: [{'name': 'page-13.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-13.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 14}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-13.jpg "HTTP/1.1 200 OK"


> Image for page 15: [{'name': 'page-14.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-14.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 15}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-14.jpg "HTTP/1.1 200 OK"


> Image for page 16: [{'name': 'page-15.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-15.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 16}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-15.jpg "HTTP/1.1 200 OK"


> Image for page 17: [{'name': 'page-16.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-16.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 17}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-16.jpg "HTTP/1.1 200 OK"


> Image for page 18: [{'name': 'page-17.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-17.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 18}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-17.jpg "HTTP/1.1 200 OK"


> Image for page 19: [{'name': 'page-18.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-18.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 19}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-18.jpg "HTTP/1.1 200 OK"


> Image for page 20: [{'name': 'page-19.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-19.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 20}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-19.jpg "HTTP/1.1 200 OK"


> Image for page 21: [{'name': 'page-20.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-20.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 21}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-20.jpg "HTTP/1.1 200 OK"


> Image for page 22: [{'name': 'page-21.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-21.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 22}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-21.jpg "HTTP/1.1 200 OK"


> Image for page 23: [{'name': 'page-22.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-22.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 23}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-22.jpg "HTTP/1.1 200 OK"


> Image for page 24: [{'name': 'page-23.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-23.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 24}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-23.jpg "HTTP/1.1 200 OK"


> Image for page 25: [{'name': 'page-24.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-24.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 25}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-24.jpg "HTTP/1.1 200 OK"


> Image for page 26: [{'name': 'page-25.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-25.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 26}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-25.jpg "HTTP/1.1 200 OK"


> Image for page 27: [{'name': 'page-26.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-26.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 27}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-26.jpg "HTTP/1.1 200 OK"


> Image for page 28: [{'name': 'page-27.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-27.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 28}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-27.jpg "HTTP/1.1 200 OK"


> Image for page 29: [{'name': 'page-28.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-28.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 29}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-28.jpg "HTTP/1.1 200 OK"


> Image for page 30: [{'name': 'page-29.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-29.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 30}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-29.jpg "HTTP/1.1 200 OK"


> Image for page 31: [{'name': 'page-30.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-30.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 31}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-30.jpg "HTTP/1.1 200 OK"


> Image for page 32: [{'name': 'page-31.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-31.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 32}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-31.jpg "HTTP/1.1 200 OK"


> Image for page 33: [{'name': 'page-32.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-32.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 33}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-32.jpg "HTTP/1.1 200 OK"


> Image for page 34: [{'name': 'page-33.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-33.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 34}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-33.jpg "HTTP/1.1 200 OK"


> Image for page 35: [{'name': 'page-34.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-34.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 35}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-34.jpg "HTTP/1.1 200 OK"


> Image for page 36: [{'name': 'page-35.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-35.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 36}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-35.jpg "HTTP/1.1 200 OK"


> Image for page 37: [{'name': 'page-36.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-36.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 37}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-36.jpg "HTTP/1.1 200 OK"


> Image for page 38: [{'name': 'page-37.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-37.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 38}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-37.jpg "HTTP/1.1 200 OK"


> Image for page 39: [{'name': 'page-38.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-38.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 39}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-38.jpg "HTTP/1.1 200 OK"


> Image for page 40: [{'name': 'page-39.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-39.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 40}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-39.jpg "HTTP/1.1 200 OK"


> Image for page 41: [{'name': 'page-40.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-40.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 41}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-40.jpg "HTTP/1.1 200 OK"


> Image for page 42: [{'name': 'page-41.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot', 'path': 'data_images_rag\\a0af2de7-8064-4372-9094-0c91b54ed38f-page-41.jpg', 'job_id': 'a0af2de7-8064-4372-9094-0c91b54ed38f', 'original_pdf_path': 'data_rag/conocophillips.pdf', 'page_number': 42}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-41.jpg "HTTP/1.1 200 OK"


> Image for page 43: [{'name': 'page-42.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-42.jpg "HTTP/1.1 200 OK"


> Image for page 44: [{'name': 'page-43.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-43.jpg "HTTP/1.1 200 OK"


> Image for page 45: [{'name': 'page-44.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-44.jpg "HTTP/1.1 200 OK"


> Image for page 46: [{'name': 'page-45.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-45.jpg "HTTP/1.1 200 OK"


> Image for page 47: [{'name': 'page-46.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-46.jpg "HTTP/1.1 200 OK"


> Image for page 48: [{'name': 'page-47.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-47.jpg "HTTP/1.1 200 OK"


> Image for page 49: [{'name': 'page-48.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-48.jpg "HTTP/1.1 200 OK"


> Image for page 50: [{'name': 'page-49.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-49.jpg "HTTP/1.1 200 OK"


> Image for page 51: [{'name': 'page-50.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-50.jpg "HTTP/1.1 200 OK"


> Image for page 52: [{'name': 'page-51.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-51.jpg "HTTP/1.1 200 OK"


> Image for page 53: [{'name': 'page-52.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-52.jpg "HTTP/1.1 200 OK"


> Image for page 54: [{'name': 'page-53.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-53.jpg "HTTP/1.1 200 OK"


> Image for page 55: [{'name': 'page-54.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-54.jpg "HTTP/1.1 200 OK"


> Image for page 56: [{'name': 'page-55.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-55.jpg "HTTP/1.1 200 OK"


> Image for page 57: [{'name': 'page-56.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-56.jpg "HTTP/1.1 200 OK"


> Image for page 58: [{'name': 'page-57.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-57.jpg "HTTP/1.1 200 OK"


> Image for page 59: [{'name': 'page-58.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-58.jpg "HTTP/1.1 200 OK"


> Image for page 60: [{'name': 'page-59.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-59.jpg "HTTP/1.1 200 OK"


> Image for page 61: [{'name': 'page-60.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-60.jpg "HTTP/1.1 200 OK"


> Image for page 62: [{'name': 'page-61.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0, 'type': 'full_page_screenshot'}]


INFO:httpx:HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/a0af2de7-8064-4372-9094-0c91b54ed38f/result/image/page-61.jpg "HTTP/1.1 200 OK"


## Build Multimodal Index

In this section we build the multimodal index over the parsed deck. 

We do this by creating **text** nodes from the document that contain metadata referencing the original image path.

In this example we index the text nodes for retrieval. Each text node references both the parsed text and the image screenshot of its page.

#### Get Text Nodes

In [74]:
from llama_index.core.schema import TextNode
from typing import Optional

In [79]:
# get pages loaded through llamaparse
import re
from pathlib import Path


def get_page_number(file_name):
    match = re.search(r"-page-(\d+)\.jpg$", str(file_name))
    if match:
        return int(match.group(1))
    return 0


def _get_sorted_image_files(image_dir):
    """Get image files sorted by page."""
    raw_files = [f for f in list(Path(image_dir).iterdir()) if f.is_file()]
    sorted_files = sorted(raw_files, key=get_page_number)
    return sorted_files

In [86]:
import os

image_files = os.listdir("data_images")
print(f"Number of image files: {len(image_files)}")
print("First few image files:")
for file in image_files[:5]:
    print(file)

Number of image files: 62
First few image files:
a0af2de7-8064-4372-9094-0c91b54ed38f-page-0.jpg
a0af2de7-8064-4372-9094-0c91b54ed38f-page-1.jpg
a0af2de7-8064-4372-9094-0c91b54ed38f-page-10.jpg
a0af2de7-8064-4372-9094-0c91b54ed38f-page-11.jpg
a0af2de7-8064-4372-9094-0c91b54ed38f-page-12.jpg
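
The listing above is not in page order (`page-10` appears right after `page-1`), which is why `_get_sorted_image_files` sorts on the parsed page number rather than on the file name. A minimal sketch of the difference, using made-up, shortened file names (the `job-` prefix and the `page_index` helper are illustrative and not part of the notebook):

```python
import re

# Hypothetical file names in the same shape as the listing above
# (job id shortened to "job" for readability).
names = ["job-page-0.jpg", "job-page-1.jpg", "job-page-10.jpg", "job-page-2.jpg"]


def page_index(file_name: str) -> int:
    """Return the zero-based page index encoded in the file name, or 0 if absent."""
    match = re.search(r"-page-(\d+)\.jpg$", file_name)
    return int(match.group(1)) if match else 0


# Plain string sorting compares character by character, so "10" lands before "2".
lexicographic = sorted(names)

# Sorting on the parsed integer restores the true page order.
by_page = sorted(names, key=page_index)

print(lexicographic)
print(by_page)
```

`get_text_nodes` depends on this ordering when it maps a 1-based `page_num` to `image_files[page_num - 1]`.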


In [87]:
from copy import deepcopy
from pathlib import Path


# attach image metadata to the text nodes
def get_text_nodes(docs, image_dir=None, json_dicts=None):
    nodes = []
    image_files = _get_sorted_image_files(image_dir) if image_dir is not None else None
    md_texts = [d["md"] for d in json_dicts] if json_dicts is not None else None

    # Number pages with a running counter so page numbers stay correct
    # even when documents contain different numbers of chunks.
    page_num = 0
    for doc in docs:
        for doc_chunk in doc.text.split("---"):
            page_num += 1
            chunk_metadata = {"page_num": page_num}
            if image_files is not None and page_num <= len(image_files):
                chunk_metadata["image_path"] = str(image_files[page_num - 1])
            if md_texts is not None and page_num <= len(md_texts):
                chunk_metadata["parsed_text_markdown"] = md_texts[page_num - 1]
            chunk_metadata["parsed_text"] = doc_chunk
            node = TextNode(
                text="",
                metadata=chunk_metadata,
            )
            nodes.append(node)

    return nodes

In [88]:
text_nodes = get_text_nodes(docs_text, image_dir="data_images", json_dicts=md_json_list)

print(f"Number of text nodes: {len(text_nodes)}")

if len(text_nodes) > 0:
    last_index = len(text_nodes) - 1
    print(f"\nContent of the last node (index {last_index}):")
    print(text_nodes[last_index].get_content(metadata_mode="all"))
else:
    print("The text_nodes list is empty.")

# Print content of the 10th node (if it exists)
if len(text_nodes) >= 10:
    print("\nContent of the 10th node (index 9):")
    print(text_nodes[9].get_content(metadata_mode="all"))
else:
    print("\nThere are fewer than 10 nodes.")

Number of text nodes: 62

Content of the last node (index 61):
page_num: 62
image_path: data_images_rag\a0af2de7-8064-4372-9094-0c91b54ed38f-page-61.jpg
parsed_text_markdown: # Definitions

## Other Terms

**Cost of Supply** is the WTI equivalent price that generates a 10% after-tax return on a point-forward and fully burdened basis. Fully burdened includes capital infrastructure, foreign exchange, price-related inflation, G&A and carbon tax (if currently assessed). If no carbon tax exists for the asset, carbon pricing aligned with internal energy scenarios are applied. All barrels of resource in the Cost of Supply calculation are discounted at 10%.

**Distributions** is defined as the total of the ordinary dividend, share repurchases and variable return of cash (VROC). Also referred to as return of capital.

**Free cash flow breakeven** is the WTI price at which cash from operations equals capital expenditures and investments. Also referred to as capital breakeven. Cash from operation

In [90]:
print(text_nodes[10].get_content(metadata_mode="all"))

page_num: 11
image_path: data_images_rag\a0af2de7-8064-4372-9094-0c91b54ed38f-page-10.jpg
parsed_text_markdown: # Commitment to Disciplined Reinvestment Rate

| | Industry Growth Focus | ConocoPhillips Strategy Reset | Disciplined Reinvestment Rate is the Foundation for Superior Returns on and of Capital, while Driving Durable CFO Growth |
|---|---|---|---|
| | >100% Reinvestment Rate | <60% Reinvestment Rate | ~50% 10-Year Reinvestment Rate <br> ~6% CFO CAGR 2024-2032 <br> at $60/BBL WTI Mid-Cycle Planning Price |

| ConocoPhillips Average Annual Reinvestment Rate (%) | 2012-2016 | 2017-2022 | 2023E | 2024-2028 | 2029-2032 |
|---|---|---|---|---|---|
| 100% | | | | | |
| 75% | | | | | |
| 50% | | | | | |
| 25% | | | | | |
| 0% | | | | | |

| | | | | | |
|---|---|---|---|---|---|
| | ~ $75/BBL WTI Average | ~ $63/BBL WTI Average | at $80/BBL WTI | at $60/BBL WTI <br> at $80/BBL WTI | at $60/BBL WTI <br> at $80/BBL WTI |

*Reinvestment rate and cash from operations (CFO) are non-GAAP me

#### Build Index

Once the text nodes are ready, we feed them into our vector store index abstraction, which indexes them into a simple in-memory vector store (you should also check out our 40+ vector store integrations!).
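
Conceptually, retrieving from an in-memory vector store amounts to scoring the query embedding against each stored node embedding and keeping the best matches. The sketch below illustrates that idea with toy two-dimensional vectors and cosine similarity. The `cosine` and `top_k` helpers and the vectors are hypothetical; this is not LlamaIndex's implementation, and real embeddings have far more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine(query_vec, vec), node_id) for node_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [node_id for _, node_id in scored[:k]]


# Toy "embeddings" keyed by a made-up node id.
store = {
    "page_1": [1.0, 0.0],
    "page_2": [0.7, 0.7],
    "page_3": [0.0, 1.0],
}

result = top_k([0.9, 0.1], store, k=2)
print(result)
```

The `similarity_top_k` argument passed to `as_retriever` later in this notebook plays the role of `k` here.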

In [91]:
import os
from llama_index.core import (
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

if not os.path.exists("storage_nodes"):
    index = VectorStoreIndex(text_nodes, embed_model=embed_model)
    # save index to disk
    index.set_index_id("vector_index")
    index.storage_context.persist("./storage_nodes")
else:
    # rebuild storage context
    storage_context = StorageContext.from_defaults(persist_dir="storage_nodes")
    # load index
    index = load_index_from_storage(storage_context, index_id="vector_index")

retriever = index.as_retriever()

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


## Build Multimodal Query Engine

We now use LlamaIndex abstractions to build a **custom query engine**. A standard RAG query engine retrieves the text nodes and puts only their text into the prompt (the response synthesis module). This custom query engine also loads the image for each retrieved node and passes both the text and the images to the response synthesis module.

In [109]:
from llama_index.core.query_engine import CustomQueryEngine
from llama_index.core.retrievers import BaseRetriever
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.core.schema import ImageNode, NodeWithScore, MetadataMode
from llama_index.core.prompts import PromptTemplate
from llama_index.core.base.response.schema import Response
from typing import Optional


gpt_4o = OpenAIMultiModal(model="gpt-4o", max_new_tokens=4096)

QA_PROMPT_TMPL = """\
Below we give parsed text from slides in two different formats, as well as the image.

We parse the text in both 'markdown' mode as well as 'raw text' mode. Markdown mode attempts \
to convert relevant diagrams into tables, whereas raw text tries to maintain the rough spatial \
layout of the text.

Use the image information first and foremost. ONLY use the text/markdown information 
if you can't understand the image.

---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query. Explain whether you got the answer
from the parsed markdown or raw text or image, and if there's discrepancies, and your reasoning for the final answer.

Query: {query_str}
Answer: """

QA_PROMPT = PromptTemplate(QA_PROMPT_TMPL)


class MultimodalQueryEngine(CustomQueryEngine):
    """Custom multimodal Query Engine.

    Takes in a retriever to retrieve a set of document nodes.
    Also takes in a prompt template and multimodal model.

    """

    qa_prompt: PromptTemplate
    retriever: BaseRetriever
    multi_modal_llm: OpenAIMultiModal

    def __init__(self, qa_prompt: Optional[PromptTemplate] = None, **kwargs) -> None:
        """Initialize."""
        super().__init__(qa_prompt=qa_prompt or QA_PROMPT, **kwargs)

    def custom_query(self, query_str: str):
        # retrieve text nodes
        nodes = self.retriever.retrieve(query_str)
        # create ImageNode items from text nodes
        image_nodes = [
            NodeWithScore(node=ImageNode(image_path=n.metadata["image_path"]))
            for n in nodes
        ]

        # create context string from text nodes, dump into the prompt
        context_str = "\n\n".join(
            [r.get_content(metadata_mode=MetadataMode.LLM) for r in nodes]
        )
        fmt_prompt = self.qa_prompt.format(context_str=context_str, query_str=query_str)

        # synthesize an answer from formatted text and images
        llm_response = self.multi_modal_llm.complete(
            prompt=fmt_prompt,
            image_documents=[image_node.node for image_node in image_nodes],
        )
        return Response(
            response=str(llm_response),
            source_nodes=nodes,
            metadata={"text_nodes": nodes, "image_nodes": image_nodes},
        )

In [110]:
query_engine = MultimodalQueryEngine(
    retriever=index.as_retriever(similarity_top_k=9), multi_modal_llm=gpt_4o
)
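
`custom_query` reads `n.metadata["image_path"]` for every retrieved node, so a node without that key, or with a path that no longer exists on disk, could fail at query time. One possible guard is sketched here with plain metadata dicts and a hypothetical `usable_image_paths` helper that is not part of LlamaIndex:

```python
import tempfile
from pathlib import Path


def usable_image_paths(metadatas):
    """Return image paths from node metadata dicts, skipping missing keys and files."""
    paths = []
    for metadata in metadatas:
        image_path = metadata.get("image_path")
        if image_path and Path(image_path).is_file():
            paths.append(image_path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    # One real (empty) file standing in for a page screenshot.
    existing = Path(tmp) / "page-0.jpg"
    existing.write_bytes(b"")

    metadatas = [
        {"page_num": 1, "image_path": str(existing)},  # usable
        {"page_num": 2},  # no image_path key
        {"page_num": 3, "image_path": str(Path(tmp) / "missing.jpg")},  # file absent
    ]
    kept = usable_image_paths(metadatas)
    print(len(kept))
```

Filtering like this trades completeness for robustness: pages without a usable image would then be answered from their parsed text alone.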

### Define Baseline

In addition, we define a "baseline" where we rely only on text-based indexing. Here we define an index using only the nodes that are parsed in text-mode from LlamaParse. 

**NOTE**: We don't currently include the markdown-parsed text because it was parsed with GPT-4o, so it already relies on a multimodal model during the text extraction phase.

It is of course a valid experiment to compare RAG where multimodal extraction only happens during indexing, vs. the current multimodal RAG implementation where images are fed during synthesis to the LLM. 

In [111]:
def get_nodes(docs):
    """Split docs into nodes, by separator."""
    nodes = []
    for doc in docs:
        doc_chunks = doc.text.split("\n---\n")
        for doc_chunk in doc_chunks:
            node = TextNode(
                text=doc_chunk,
                metadata=deepcopy(doc.metadata),
            )
            nodes.append(node)

    return nodes
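
Note that `get_nodes` splits on `"\n---\n"` (the separator on its own line) rather than on a bare `"---"`. The difference matters when a page contains a markdown table, whose header rule also contains `---`. A small sketch with made-up page text:

```python
# Made-up three-page text; the middle page holds a markdown table.
text = "Page one\n---\n| a | b |\n|---|---|\n| 1 | 2 |\n---\nPage three"

# Splitting on the separator line keeps the table intact.
by_separator_line = text.split("\n---\n")

# Splitting on the bare token also cuts inside the table's header rule.
by_bare_token = text.split("---")

print(len(by_separator_line))
print(len(by_bare_token))
```

With this input the line-based split yields one chunk per page, while the bare-token split produces extra fragments from the table row.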

In [112]:
base_nodes = get_nodes(docs_text)

In [113]:
print(base_nodes[13].get_content(metadata_mode="all"))

Our Differentiated Portfolio: Deep; Durable and Diverse
                              20 BBOE of Resource                                           Diverse Production Base
                            Under $40/BBL Cost of Supply                              10-Year Plan Cumulative Production (BBOE)
      S50                   S32/BBL                                                Lower 48                           Alaska
                    Average Cost of Supply
  3 $40                                                                                                                       GKA        GWA
                                                                                                                      GPA     WNS
      $30                                                                                                             EMENA
  3                                                                                                                              Norway
 

In [114]:
base_index = VectorStoreIndex(base_nodes, embed_model=embed_model)
base_query_engine = base_index.as_query_engine(llm=llm, similarity_top_k=9)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


## Build a Multimodal Agent

Build an agent around the multimodal query engine. This gives you agent capabilities like query planning/decomposition and memory around a central QA interface.

In [115]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.agent import ReActAgent

vector_tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="vector_tool",
        description="Useful for retrieving specific context from the data. Do NOT select if question asks for a summary of the data."
    )
)

agent = ReActAgent.from_tools([vector_tool], llm=llm, verbose=True)

In [118]:
base_vector_tool = QueryEngineTool(
    query_engine=base_query_engine,
    metadata=ToolMetadata(
        name="vector_tool",
        description="Useful for retrieving specific context from the data. Do NOT select if question asks for a summary of the data."
    )
)

base_agent = ReActAgent.from_tools([base_vector_tool], llm=llm, verbose=True)

## Try out Queries

Let's try out queries against the slide deck and compare the multimodal agent's answers with the baseline agent's.

In [117]:
response = agent.chat("How does the Conoco Phillips capex/EUR in the delaware basin compare against other competitors?")
print(str(response))

if hasattr(response, 'source_nodes') and response.source_nodes:
    print("\nSource nodes found:")
    for i, node in enumerate(response.source_nodes):
        print(f"\nNode {i}:")
        print(node.get_content(metadata_mode="all"))
else:
    print("\nNo source nodes found in the response.")

# Also, let's print the full response to see what we got
print("\nFull response:")
print(response)

> Running step 73d53dd0-1680-4102-9c4a-503bba8b0327. Step input: How does the Conoco Phillips capex/EUR in the delaware basin compare against other competitors?


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Thought: The current language of the user is English. I need to use a tool to help me answer the question.
Action: vector_tool
Action Input: {'input': 'Conoco Phillips capex/EUR in the Delaware Basin compared to competitors'}

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Observation: ConocoPhillips' capex/EUR in the Delaware Basin is $8/BOE, which is lower compared to its competitors. The competitors' capex/EUR ranges from $10/BOE to $24/BOE.

This information was obtained from the image on page 38, which shows a bar chart comparing the capex/EUR of ConocoPhillips and its competitors in the Delaware Basin. The parsed markdown text also confirms this data, listing ConocoPhillips at $8/BOE and competitors ranging from $10/BOE to $24/BOE. There are no discrepancies between the image and the parsed markdown text.
> Running step f551acef-d4d6-494d-8356-e716563189d0. Step input: None


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Thought: I can answer without using any more tools. I'll use the user's language to answer.
Answer: ConocoPhillips' capex/EUR in the Delaware Basin is $8/BOE, which is lower compared to its competitors. The competitors' capex/EUR ranges from $10/BOE to $24/BOE.
ConocoPhillips' capex/EUR in the Delaware Basin is $8/BOE, which is lower compared to its competitors. The competitors' capex/EUR ranges from $10/BOE to $24/BOE.

Source nodes found:

Node 0:
page_num: 38
image_path: data_images_rag\a0af2de7-8064-4372-9094-0c91b54ed38f-page-37.jpg
parsed_text_markdown: # Delaware: Vast Inventory with Proven Track Record of Performance

## Prolific Acreage Spanning Over ~659,000 Net Acres¹

![Map of Delaware Basin](image)

### Total 10-Year Operated Permian Inventory

- Delaware Basin: 65%
- Midland Basin: 35%

### High Single-Digit Production Growth

## 12-Month Cumulative Production³ (BOE/FT)

| Months | 2019 | 2020 | 2021 | 2022 |
|--------|------|------|------|------|
| 1  

In [119]:
# Query the base agent
base_response = base_agent.chat(
    "How does the Conoco Phillips capex/EUR in the delaware basin compare against other competitors?"
)
print(str(base_response))

# Print source nodes if available
if hasattr(base_response, 'source_nodes') and base_response.source_nodes:
    print("\nSource nodes found:")
    for i, node in enumerate(base_response.source_nodes):
        print(f"\nNode {i}:")
        print(node.get_content(metadata_mode="llm"))
else:
    print("\nNo source nodes found in the response.")

# Print the full response
print("\nFull response:")
print(base_response)

> Running step d6f1ee48-be2d-413a-b7ca-2ac39771dff9. Step input: How does the Conoco Phillips capex/EUR in the delaware basin compare against other competitors?


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: vector_tool
Action Input: {'input': 'Conoco Phillips capex/EUR in the Delaware Basin compared to competitors'}

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Observation: ConocoPhillips' capex/EUR in the Delaware Basin is lower compared to its competitors.
> Running step acfde3aa-a6f9-4aa7-8613-9389f8e86646. Step input: None


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Thought: I have the information needed to answer the question.
Answer: ConocoPhillips' capex/EUR in the Delaware Basin is lower compared to its competitors.
ConocoPhillips' capex/EUR in the Delaware Basin is lower compared to its competitors.

Source nodes found:

Node 0:
Delaware: Vast Inventory with Proven Track Record of Performance
        New                       Prolific Acreage Spanning Over                                                        12-Month Cumulative Production? (BOE/FT)
       Mexico                                 659,000 Net Acres'                                             40
                       Texas                                                                                                                                                                       3828
                                                                                                             30                                                           