# LlamaParse `JobResult` Tour

<a href="https://colab.research.google.com/github/run-llama/llama_cloud_services/blob/main/examples/demo_json.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The `JobResult` object is the main object returned by the LlamaParse API. It contains all the information about the job, including the parsed data, metadata, and any errors.

This notebook walks through each component of the `JobResult` object and shows you what it contains.

Status:
| Last Executed | Version | State      |
|---------------|---------|------------|
| Aug-19-2025   | 0.6.61  | Maintained |

## Setup

Let's bring in our imports and set up our API keys.

In [None]:
%pip install llama-cloud-services

In [None]:
import os

# API access to llama-cloud
os.environ["LLAMA_CLOUD_API_KEY"] = "llx-.."

## Load Data

Let's load a large and complex PDF, San Francisco's 2023 proposed budget.

In [None]:
!wget 'https://www.dropbox.com/scl/fi/vip161t63s56vd94neqlt/2023-CSF_Proposed_Budget_Book_June_2023_Master_Web.pdf?rlkey=hemoce3w1jsuf6s2bz87g549i&dl=0' -O './san_francisco_budget_2023.pdf'

## Using LlamaParse for Basic PDF Parsing

Let's parse our document!

In [None]:
from llama_cloud_services import LlamaParse

parser = LlamaParse(
    parse_mode="parse_page_with_agent",
    model="openai-gpt-4-1-mini",
    high_res_ocr=True,
    adaptive_long_table=True,
    outlined_table_extraction=True,
    output_tables_as_HTML=True,
)
result = await parser.aparse("./san_francisco_budget_2023.pdf")

Started parsing the file under job_id d12d419a-52fc-400c-9f88-f61b352d3fb2


Every job will come back with some metadata about the job:

In [None]:
result.job_metadata

JobMetadata(job_credits_usage=0, job_pages=0, job_auto_mode_triggered_pages=0, job_is_cache_hit=True)

Since this was a re-run, I can see that a cache hit occurred. Jobs are cached for 48 hours by default.
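
If you want to act on this metadata in code, a minimal sketch might look like the one below. It uses a `SimpleNamespace` stand-in that mirrors the fields printed above, so it runs without an API call. The helper name `describe_job` is hypothetical; in the notebook you would pass `result.job_metadata` instead of the stand-in.

```python
from types import SimpleNamespace

# Stand-in mirroring the JobMetadata fields printed above (not a real API object).
sample_metadata = SimpleNamespace(
    job_credits_usage=0,
    job_pages=0,
    job_auto_mode_triggered_pages=0,
    job_is_cache_hit=True,
)


def describe_job(meta) -> str:
    """Hypothetical helper: summarize a job's metadata in one line."""
    source = "served from cache" if meta.job_is_cache_hit else "freshly parsed"
    return (
        f"{source}: {meta.job_pages} pages, "
        f"{meta.job_credits_usage} credits, "
        f"{meta.job_auto_mode_triggered_pages} auto-mode pages"
    )


print(describe_job(sample_metadata))
```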

Beyond this, we can explore the parsed data per-page:

In [None]:
print(len(result.pages))

362


In [None]:
print(result.pages[0].model_dump().keys())

dict_keys(['page', 'text', 'md', 'images', 'charts', 'tables', 'layout', 'items', 'status', 'links', 'width', 'height', 'triggeredAutoMode', 'parsingMode', 'structuredData', 'noStructuredContent', 'noTextContent'])


Inside the page object, you can see nearly every detail about the page.

Most of these will depend on the settings you used when parsing. With the settings above, we get the text and markdown for each page, as well as a list of all the elements on the page.

* `page`: this is simply the page number, starting at 1.
* `text`: this is the text of the page, as extracted by the parser.
* `md`: this is the Markdown rendering of the page.
* `images`: this is an array of all the images on the page, including metadata and text OCRed out of the images, as well as a full-page screenshot of the entire page.
* `charts`: this is an array of all the charts on the page, including metadata and text OCRed out of the charts, as well as a screenshot of the entire chart.
* `layout`: this is an array of all the layout elements on the page, if you are using layout mode.
* `items`: this is an array of all the parsed elements on the page, as used to render the markdown, but separated out into their own objects. This is useful if you want to do more processing on the data.
* `links`: this is an array of all the links on the page, if you used `annotate_links=True`.
* `status`: this is the status of the page, which is usually "OK" unless there was an error processing the page.
* `width` and `height`: these are the dimensions of the page in pixels.
* `parsingMode`: this is the specific parsing mode that was used for the page.
* `triggeredAutoMode`: this indicates whether the page triggered auto mode; see [LlamaParse docs](https://docs.cloud.llamaindex.ai/llamaparse/getting_started) for more details.
* `structuredData`/`noStructuredContent`: these are set if you are using structured mode; see [LlamaParse docs](https://docs.cloud.llamaindex.ai/llamaparse/getting_started) for more details.
* `noTextContent`: this is true if the page was empty of text.
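
As a rough illustration of how these fields can be used together, here is a small sketch that flags pages worth a second look. It works on plain dicts shaped like the `model_dump()` output above; the sample values (including the `"ERROR"` status string) are made up for the example, and the helper name `pages_needing_review` is hypothetical.

```python
# Stand-in page dicts using keys from `model_dump()` above; the values are made up.
sample_pages = [
    {"page": 1, "status": "OK", "noTextContent": False},
    {"page": 2, "status": "OK", "noTextContent": True},
    {"page": 3, "status": "ERROR", "noTextContent": False},  # assumed error value
]


def pages_needing_review(pages):
    """Hypothetical helper: page numbers that did not parse OK or had no text."""
    return [
        p["page"]
        for p in pages
        if p["status"] != "OK" or p["noTextContent"]
    ]


print(pages_needing_review(sample_pages))
```

In the notebook, you could try the same helper on `[p.model_dump() for p in result.pages]`.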


In [None]:
print(result.pages[0].text[:1000])

                        CITY & COUNTY OF SAN FRANCISCO, CALIFORNIA
 PROPOSED BUDGET
                                                              FISCAL YEARS 2023-2024 & 2024-2025
                                                                                LONDON  N. BREED
               MAYOR’S OFFICE OF PUBLIC POLICY AND FINANCE
         Anna Duning, Director of Mayor’s                                   Fisher Zhu, Fiscal and Policy Analyst
         Office of Public Policy and Finance                             Anya Shutovska, Fiscal and Policy Analyst
         Sally Ma, Deputy Budget Director
Radhika Mehlotra, Senior Fiscal and Policy Analyst                        Jack English, Fiscal and Policy Analyst
     Damon Daniels, Fiscal and Policy Analyst                           Xang Hang, Junior Fiscal and Policy Analyst
    Matthew Puckett, Fiscal and Policy Analyst                       Tabitha Romero-Bothi, Fiscal and Policy Assistant


In [None]:
print(result.pages[0].md[:1000])

# CITY & COUNTY OF SAN FRANCISCO, CALIFORNIA

# PROPOSED BUDGET

# FISCAL YEARS 2023-2024 & 2024-2025

# LONDON N. BREED

# MAYOR’S OFFICE OF PUBLIC POLICY AND FINANCE

Anna Duning, Director of Mayor’s Office of Public Policy and Finance

Fisher Zhu, Fiscal and Policy Analyst

Anya Shutovska, Fiscal and Policy Analyst

Sally Ma, Deputy Budget Director

Radhika Mehlotra, Senior Fiscal and Policy Analyst

Jack English, Fiscal and Policy Analyst

Damon Daniels, Fiscal and Policy Analyst

Xang Hang, Junior Fiscal and Policy Analyst

Matthew Puckett, Fiscal and Policy Analyst

Tabitha Romero-Bothi, Fiscal and Policy Assistant
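
Since every page carries its own `md`, one simple way to get a single Markdown document is to join the pages yourself. The sketch below uses stand-in page objects so it runs on its own; the second page's content, the separator, and the helper name `join_markdown` are all assumptions for illustration.

```python
from types import SimpleNamespace

# Stand-in pages exposing only the `page` and `md` attributes shown above.
sample_pages = [
    SimpleNamespace(page=1, md="# PROPOSED BUDGET"),
    SimpleNamespace(page=2, md="Some body text."),  # made-up content
    SimpleNamespace(page=3, md=""),  # an empty page is skipped
]


def join_markdown(pages, separator="\n\n---\n\n"):
    """Hypothetical helper: concatenate per-page Markdown, skipping empty pages."""
    return separator.join(page.md for page in pages if page.md)


print(join_markdown(sample_pages))
```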


## Images

By default, any embedded images that can be extracted from the document are included in the result object.

We can also ask the parser to take a screenshot of every page:

In [None]:
parser = LlamaParse(
    parse_mode="parse_page_with_agent",
    model="openai-gpt-4-1-mini",
    high_res_ocr=True,
    adaptive_long_table=True,
    outlined_table_extraction=True,
    output_tables_as_HTML=True,
    # Take screenshot of the page
    take_screenshot=True,
)
result = await parser.aparse("./san_francisco_budget_2023.pdf")

Started parsing the file under job_id e6332422-803b-404d-8d0d-ad510fa56c09
...

In [None]:
print(result.pages[0].images)

[ImageItem(name='page_1.jpg', height=792.0, width=612.0, x=0.0, y=0.0, original_width=1236, original_height=1600, type='full_page_screenshot')]


We can download images (either their bytes or to a local file) using the `JobResult` object as well!

In [None]:
# single image
image_data = await result.aget_image_data(result.pages[0].images[0].name)

# save an image to a file
output_path = await result.asave_image(
    result.pages[0].images[0].name, "./json_tour_screenshots"
)

# save all images
output_paths = await result.asave_all_images("./json_tour_screenshots")
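
If you only want some of the images, you can filter on the `type` field before saving. This is a sketch on stand-in objects that carry just the `name` and `type` attributes seen in the output above; the second entry and the helper name `split_images` are made up for the example.

```python
from types import SimpleNamespace

# Stand-ins for ImageItem objects; only `name` and `type` are used here.
sample_images = [
    SimpleNamespace(name="page_1.jpg", type="full_page_screenshot"),
    SimpleNamespace(name="figure_a.png", type="some_other_type"),  # made-up entry
]


def split_images(images):
    """Hypothetical helper: separate full-page screenshots from other images."""
    screenshots = [img.name for img in images if img.type == "full_page_screenshot"]
    others = [img.name for img in images if img.type != "full_page_screenshot"]
    return screenshots, others


screenshots, others = split_images(sample_images)
print(screenshots, others)
```

The resulting names could then be passed one at a time to `result.asave_image(...)` as in the cell above.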

## Items

This is an array of all the parsed elements on the page, as used to render the markdown, but separated out into their own objects. This is useful if you want to do more processing on the data. Let's take a look:

In [None]:
print(result.pages[0].items[0])

type='heading' lvl=1 value='CITY & COUNTY OF SAN FRANCISCO, CALIFORNIA' md='# CITY & COUNTY OF SAN FRANCISCO, CALIFORNIA' rows=None bBox=BBox(x=176.0, y=52.0, w=277.0, h=12.0)


In [None]:
print(result.pages[0].items[1])

type='heading' lvl=1 value='PROPOSED BUDGET' md='# PROPOSED BUDGET' rows=None bBox=BBox(x=89.0, y=118.0, w=451.0, h=47.0)


As you can see, you get different element types: text, headings, and tables. Each item comes with its own `md` key containing a Markdown representation of that element, so you can easily build a summary from headings only, tables only, and so on.
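
For instance, a headings-only outline can be built by filtering on `type` and joining the `md` values. The sketch below uses stand-in items with the same `type` and `md` attributes as the output above; the third item and the helper name `markdown_for` are illustrative assumptions.

```python
from types import SimpleNamespace

# Stand-in items; the first two mirror the printed output, the third is made up.
sample_items = [
    SimpleNamespace(type="heading", md="# CITY & COUNTY OF SAN FRANCISCO, CALIFORNIA"),
    SimpleNamespace(type="heading", md="# PROPOSED BUDGET"),
    SimpleNamespace(type="text", md="Sally Ma, Deputy Budget Director"),
]


def markdown_for(items, wanted_types):
    """Hypothetical helper: join the Markdown of items whose type is wanted."""
    return "\n\n".join(item.md for item in items if item.type in wanted_types)


outline = markdown_for(sample_items, {"heading"})
print(outline)
```

In the notebook, the same call on `result.pages[0].items` with `{"table"}` should give you just the tables.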

The ability to extract tables from visual data is really powerful. Let's take a look at page 35, which has some bar charts that get automatically converted into tables:

<img src="./json_tour_screenshots/page_35.png" alt="Page 35" width="300"/>


The bar chart has been converted into a table, and even though explicit values are not included, the bar chart has been read and approximate values for each bar on the chart have been included!

In [None]:
print(result.pages[34].items[6])

type='table' lvl=None value=None md="Source: U.S. Census Bureau, 2017-2021 American Community Survey 5-years Estimate.\n|Race|Educational Level|Number of Residents| | | | |\n|---|---|---|---|---|---|---|\n|Age Group| | | | | | |\n|Under 5 Years|5 to 19 Years|20 to 34 Years|35 to 59 Years|60 and Over| | |\n|Graduate or professional degree|Bachelor's degree|Associate's degree|Some college, no degree|High school graduate (includes equivalency)|9th to 12th grade, no diploma|Less than 9th grade|" rows=[[], ['Race', 'Educational Level', 'Number of Residents', '', '', '', ''], ['---', '---', '---', '---', '---', '---', '---'], ['Age Group', '', '', '', '', '', ''], ['Under 5 Years', '5 to 19 Years', '20 to 34 Years', '35 to 59 Years', '60 and Over', '', ''], ['Graduate or professional degree', "Bachelor's degree", "Associate's degree", 'Some college, no degree', 'High school graduate (includes equivalency)', '9th to 12th grade, no diploma', 'Less than 9th grade']] bBox=BBox(x=68.0, y=129.0, w
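
Notice that `rows` includes an empty row, a Markdown separator row, and trailing empty cells. If you want something closer to plain data, a cleanup pass along these lines may help. It is only a sketch: the sample copies the first few rows from the output above, and `clean_rows` is a hypothetical helper, not part of the library.

```python
# First few rows copied from the `rows` output above.
sample_rows = [
    [],
    ["Race", "Educational Level", "Number of Residents", "", "", "", ""],
    ["---", "---", "---", "---", "---", "---", "---"],
    ["Age Group", "", "", "", "", "", ""],
]


def clean_rows(rows):
    """Hypothetical helper: drop empty and separator rows, trim trailing blanks."""
    cleaned = []
    for row in rows:
        # A Markdown separator row consists only of cells like '---'.
        is_separator = bool(row) and all(
            cell and not cell.strip("-") for cell in row
        )
        if is_separator:
            continue
        trimmed = list(row)
        # Remove empty cells from the end of the row.
        while trimmed and trimmed[-1] == "":
            trimmed.pop()
        if trimmed:
            cleaned.append(trimmed)
    return cleaned


for row in clean_rows(sample_rows):
    print(row)
```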

## Links

Our budget PDF doesn't have any links, so let's load a different PDF with links and see what we get.


In [None]:
!wget 'https://www.dropbox.com/scl/fi/hay06lyxc49gkuh91oek6/basic-link-1.pdf?rlkey=uije7yb0lxqgqwk7p7hnqepdx&dl=0' -O './basic-link-1.pdf'

In [None]:
parser = LlamaParse(
    parse_mode="parse_page_with_agent",
    model="openai-gpt-4-1-mini",
    high_res_ocr=True,
    adaptive_long_table=True,
    outlined_table_extraction=True,
    output_tables_as_HTML=True,
    # Annotate links in the document
    annotate_links=True,
)
result = await parser.aparse("./basic-link-1.pdf")

Started parsing the file under job_id 9b2df975-af3c-4868-99e2-520ce0b21f4d


This is a very simple document with some internal and external links:

<img src="./json_tour_screenshots/links_page.png" alt="Page 1" width="300"/>


The parser finds the external links and their labels and includes them in the `links` section:

In [None]:
print(result.pages[0].links)

[{'url': 'https://www.antennahouse.com/', 'text': 'Antenna House, Inc.'}, {'url': 'https://www.antennahouse.com/', 'text': 'Linking to a website (https://www.antennahouse.com/)'}]
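
Here the same URL appears twice with different labels. If you would rather see one entry per URL, you can group the labels yourself. The sketch below works on the list of dicts printed above; `labels_by_url` is a hypothetical helper name.

```python
# Links copied from the output above.
sample_links = [
    {"url": "https://www.antennahouse.com/", "text": "Antenna House, Inc."},
    {
        "url": "https://www.antennahouse.com/",
        "text": "Linking to a website (https://www.antennahouse.com/)",
    },
]


def labels_by_url(links):
    """Hypothetical helper: map each URL to the list of labels that point to it."""
    grouped = {}
    for link in links:
        grouped.setdefault(link["url"], []).append(link["text"])
    return grouped


for url, labels in labels_by_url(sample_links).items():
    print(url, labels)
```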


This concludes our tour! I hope this makes clear the power of the `JobResult` object and the flexibility it gives you over which parts of your documents you use.