# AI Parse Document with Debugger Interface

* To learn more, visit the official `ai_parse_document` [documentation page](https://docs.databricks.com/aws/en/sql/language-manual/functions/ai_parse_document).
* This notebook shows how to use `ai_parse_document` on a sample PDF file, and then how to **use a debugger interface to check the output against the parsed results**.
* Run this notebook natively on Databricks, using the `Serverless` compute option with environment version 3 or later.
* For this example notebook, I'm using [Nvidia's company overview presentation from August 2025](https://s201.q4cdn.com/141608511/files/doc_presentations/2025/08/Q226-NVDA-Company-Overview-Final.pdf).

## Setup

For `page_selection`, here are the supported options:
- "all" or None: Display all pages
- "3": Display specific page (1-indexed)
- "1-5": Display page range (inclusive, 1-indexed)
- "1,3,5": Display list of specific pages (1-indexed)
- "1-3,7,10-12": Mixed ranges and individual pages
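
The `debugger` module's own parsing code is not shown in this notebook, so the following is only a minimal sketch of how a `page_selection` string in the formats above could be expanded into page numbers. The helper name `parse_page_selection` and its `total_pages` argument are hypothetical and are not part of `ai_parse_document` or the debugger.

```python
def parse_page_selection(selection, total_pages):
    """Expand a page_selection string into a sorted list of 1-indexed pages.

    Hypothetical helper for illustration; the debugger may behave differently.
    """
    # "all" or None selects every page.
    if selection is None or selection.strip().lower() == "all":
        return list(range(1, total_pages + 1))

    pages = set()
    for part in selection.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Inclusive range such as "1-5".
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"Invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            # Single page such as "3".
            pages.add(int(part))

    out_of_range = sorted(p for p in pages if p < 1 or p > total_pages)
    if out_of_range:
        raise ValueError(f"Pages out of range 1-{total_pages}: {out_of_range}")
    return sorted(pages)


print(parse_page_selection("1-3,7,10-12", total_pages=12))
```

Returning a sorted, de-duplicated list means overlapping entries such as "1-3,2" select each page only once.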

In [None]:
# Exec Parameters
catalog = "users"
schema = "david_huang"
volume = "ai_parse_doc_examples"
input_file = "Q226-NVDA-Company-Overview-Final.pdf"
page_selection = "all"

In [None]:
# Path configuration
source_files = f"/Volumes/{catalog}/{schema}/{volume}/input/{input_file}"
image_output_path = f"/Volumes/{catalog}/{schema}/{volume}/output/"

## Parse document

In [None]:
# SQL statement with ai_parse_document()
# If no input file is specified, parse every file in the input folder
if not input_file:
    source_files = f"/Volumes/{catalog}/{schema}/{volume}/input/*"
sql = f"""
WITH parsed_documents AS (
  SELECT
    path,
    ai_parse_document(
      content,
      map(
        'version', '2.0',
        'imageOutputPath', '{image_output_path}',
        'descriptionElementTypes', '*'
      )
    ) AS parsed
  FROM
    read_files('{source_files}', format => 'binaryFile')
)
SELECT * FROM parsed_documents
"""

In [None]:
# Run the query and collect each file's parsed output to the driver
parsed_results = [row.parsed for row in spark.sql(sql).collect()]
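
Before opening the debugger, a quick tally of element types can serve as a sanity check on the parse. The sketch below assumes each parsed result has already been converted to a plain Python dict (on Databricks the column is a VARIANT, so a conversion step such as `toPython()` on the value may be needed first) and that it follows a `document` → `elements` → `type` layout. Both the helper name `count_element_types` and the sample data are hypothetical; check the actual output against the official documentation.

```python
from collections import Counter


def count_element_types(parsed):
    """Count elements by type in one parsed document (assumed dict layout)."""
    # Assumed layout: {"document": {"elements": [{"type": ..., ...}, ...]}}
    document = parsed.get("document") or {}
    elements = document.get("elements") or []
    return Counter(element.get("type", "unknown") for element in elements)


# Hand-written sample standing in for one converted parsed result.
sample = {
    "document": {
        "elements": [
            {"type": "text", "content": "Company overview"},
            {"type": "table", "content": "<table>...</table>"},
            {"type": "text", "content": "Footnote"},
        ]
    }
}

for element_type, count in sorted(count_element_types(sample).items()):
    print(f"{element_type}: {count}")
```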

## Run debugger

In [0]:
from debugger import render_ai_parse_output_interactive

In [0]:
render_ai_parse_output_interactive(parsed_results)