<a href="https://colab.research.google.com/github/run-llama/llama_parse/blob/main/examples/parsing_instructions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Parsing Documents with Instructions

Parsing instructions allow you to guide our parsing model in the same way you would instruct an LLM.

These instructions can be useful for improving the parser's performance on complex document layouts, extracting data in a specific format, or transforming the document in other ways.

### Why This Matters
Traditional document parsing can be rigid and error-prone, and it often misses context in complex layouts. Instruction-based parsing lets you:

1. Extract only the specific information you need
2. Handle complex document layouts more reliably
3. Transform unstructured data into structured formats
4. Cut down on manual data entry and verification
5. Reduce errors in document processing workflows

In this demonstration, we showcase how parsing instructions can be used to extract specific information from unstructured documents. Below are the documents we use for testing:

1. McDonald's Receipt - Extracting the price of each order and the final amount to be paid.

2. Expense Report Document - Extracting employee name, employee ID, position, department, date ranges, individual expense items with dates, categories, and amounts.

3. Purchase Order Document - Identifying the PO number, vendor details, shipping terms, and an itemized list of products with quantities and unit prices.

Let's jump into these real-world examples and see how parsing instructions can help us extract specific information.

### Installation

In [None]:
!pip install llama-parse

### Setup API Key

In [None]:
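import os

import nest_asyncio

# LlamaParse's load_data runs asyncio code, but a notebook already has a running
# event loop; nest_asyncio patches the loop so the nested run is allowed.
nest_asyncio.apply()

# Replace with your LlamaCloud API key
os.environ["LLAMA_CLOUD_API_KEY"] = "llx-..."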
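Hardcoding the key in a notebook cell makes it easy to leak when the notebook is shared. As a minimal sketch using only the standard library, the hypothetical helper `ensure_api_key` below keeps a `LLAMA_CLOUD_API_KEY` that is already set in the environment and otherwise prompts for it with `getpass`, so the value never appears in the notebook. It assumes only what the cell above already relies on: that the key is read from that environment variable.

```python
import os
from getpass import getpass


def ensure_api_key(var_name: str = "LLAMA_CLOUD_API_KEY") -> str:
    """Return the API key from the environment, prompting once if it is missing.

    Hypothetical helper: it only reads and writes os.environ, which is where
    the setup cell places the key.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # getpass hides the typed value, so the key is not echoed into the notebook
        key = getpass(f"Enter {var_name}: ").strip()
        os.environ[var_name] = key
    return key
```

Under that assumption, calling `ensure_api_key()` in place of the string assignment should have the same effect on the later `LlamaParse` calls.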

### McDonald's Receipt

Here we extract the price of each order and the final amount to be paid.

<img src="mcdonalds_receipt.png" alt="McDonald's receipt" width="500">

In [None]:
from llama_parse import LlamaParse

# Baseline: parse the receipt without any instructions
vanilaParsing = LlamaParse(result_type="markdown").load_data("./mcdonalds_receipt.png")

Started parsing the file under job_id 66643b81-e2f4-408b-890b-8e116472210b


In [None]:
print(vanilaParsing[0].text)

# Rate us HIGHLY SATISFIED

Purchase any sandwich and receive a FREE ITEM

Go to WWW.mcdvoice.com within 7 days of purchase of equal or lesser value and tell us about your visit.

Validation Code: 31278-01121-21018-20481-00081-0

Valid at participating US McDonald's

Expires 30 days after receipt date

# McDonald's Restaurant #312782378

PINE RD NW

RICE MN 56367-9740

TEL# 320 393 4600

KS# 12/08/2022 08:48 PM

# Order

|Happy Meal 6 Pc|$4.89|
|---|---|
|Creamy Ranch Cup| |
|Extra Kids Fry| |
|Wreck It Ralph 2 Snack| |
|Oreo McFlurry|$2.69|

# Summary

|Subtotal|$7.58|
|---|---|
|Tax|$0.52|
|Take-Out Total|$8.10|
|Cash Tendered|$10.00|
|Change|$1.90|

### Not ACCEPTING APPLICATIONS *++ McDonald's Restaurant Rice

Text to #36453 apply 31278
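With `result_type="markdown"`, the receipt's order lines come back as a pipe table. Note that the parser emitted the first item (`Happy Meal 6 Pc`) as the table header, so downstream code should not drop the first row. A minimal sketch, assuming the simple one-row-per-line table shape shown above; `markdown_rows` is a hypothetical helper, not part of `llama_parse`.

```python
import re


def markdown_rows(text: str) -> list[list[str]]:
    """Return the cells of every pipe-table row in `text`, skipping separator rows.

    Header rows are kept as data, because the parser put a real item there.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # not a table row
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if all(re.fullmatch(r":?-+:?", cell) for cell in cells):
            continue  # header/body separator such as |---|---|
        rows.append(cells)
    return rows


# Copied from the baseline output above
order_table = """|Happy Meal 6 Pc|$4.89|
|---|---|
|Creamy Ranch Cup| |
|Extra Kids Fry| |
|Wreck It Ralph 2 Snack| |
|Oreo McFlurry|$2.69|"""

# Keep only rows that carry a price
priced = [(item, price) for item, price in markdown_rows(order_table) if price]
print(priced)  # [('Happy Meal 6 Pc', '$4.89'), ('Oreo McFlurry', '$2.69')]
```

In the notebook you would pass `vanilaParsing[0].text` instead of the copied snippet; the rows of the summary table would then be returned as well.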


In [None]:
# Same receipt, now with a parsing instruction
parsingInstruction = """The provided document is a McDonald's receipt.
Provide the price of each order and final amount to be paid."""
withInstructionParsing = LlamaParse(
    result_type="markdown", parsing_instruction=parsingInstruction
).load_data("./mcdonalds_receipt.png")

Started parsing the file under job_id 1a04fdbb-5415-4a36-a1bd-26bfb5d618fa


In [None]:
print(withInstructionParsing[0].text)

Here are the prices for each order from the McDonald's receipt:

1. Happy Meal 6 Pc: $4.89
2. Snack Oreo McFlurry: $2.69

**Subtotal:** $7.58
**Tax:** $0.52
**Total Amount to be Paid:** $8.10

The cash tendered was $10.00, and the change given was $1.90.
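The instructed output is free-form prose, so it is worth recomputing its numbers rather than trusting them. Here they are consistent: 4.89 + 2.69 = 7.58, 7.58 + 0.52 = 8.10, and 10.00 - 8.10 = 1.90. The sketch below automates that check with `decimal.Decimal` to avoid floating-point rounding. The helpers are hypothetical, and the labels they search for match this particular run's wording, which may change between runs.

```python
import re
from decimal import Decimal


def amount_after(label: str, text: str) -> Decimal:
    """Return the first dollar amount that follows `label` on the same line."""
    match = re.search(re.escape(label) + r"[^$\n]*\$([\d,]+\.\d{2})", text)
    if match is None:
        raise ValueError(f"no amount found after {label!r}")
    return Decimal(match.group(1).replace(",", ""))


def receipt_is_consistent(text: str, item_labels: list[str]) -> bool:
    """Check the items -> subtotal -> total -> change arithmetic in the text.

    The labels below match the wording of this run's output (an assumption).
    """
    items = [amount_after(label, text) for label in item_labels]
    subtotal = amount_after("Subtotal", text)
    tax = amount_after("Tax", text)
    total = amount_after("Total Amount to be Paid", text)
    tendered = amount_after("cash tendered", text)
    change = amount_after("change given", text)
    return (
        sum(items) == subtotal          # 4.89 + 2.69 == 7.58
        and subtotal + tax == total     # 7.58 + 0.52 == 8.10
        and tendered - total == change  # 10.00 - 8.10 == 1.90
    )


# Copied from the instructed output above; in the notebook use withInstructionParsing[0].text
instructed = """1. Happy Meal 6 Pc: $4.89
2. Snack Oreo McFlurry: $2.69

**Subtotal:** $7.58
**Tax:** $0.52
**Total Amount to be Paid:** $8.10

The cash tendered was $10.00, and the change given was $1.90."""

print(receipt_is_consistent(instructed, ["Happy Meal 6 Pc", "Oreo McFlurry"]))  # True
```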


### Expense Report Document

Here we extract employee name, employee ID, position, department, date ranges, individual expense items with dates, categories, and amounts.

<img src="expense_report_document.png" alt="Expense report document" width="500">

In [None]:
vanilaParsing = LlamaParse(result_type="markdown").load_data(
    "./expense_report_document.pdf"
)

Started parsing the file under job_id b6bcc6e1-7d30-4522-9abd-ace196781a70


In [None]:
print(vanilaParsing[0].text)

# QUANTUM DYNAMICS CORPORATION

# EMPLOYEE EXPENSE REPORT

# FISCAL YEAR 2024

# EMPLOYEE INFORMATION:

Name: Dr. Alexandra Chen-Martinez, PhD

Employee ID: QD-2022-1457

Department: Advanced Research & Development

Cost Center: CC-ARD-NA-003

Project Codes: QD-QUANTUM-2024-01, QD-AI-2024-03

Position: Principal Research Scientist

Reporting Manager: Dr. James Thompson

# TRIP/EXPENSE PERIOD:

Start Date: November 15, 2024

End Date: December 10, 2024

Purpose: International Conference Attendance & Client Meetings

Locations: Tokyo, Japan → Singapore → Sydney, Australia

# CURRENCY CONVERSION RATES APPLIED:

JPY (¥) → USD: 0.0068 (as of 11/15/2024)

SGD (S$) → USD: 0.74 (as of 11/28/2024)

AUD (A$) → USD: 0.65 (as of 12/03/2024)

# ITEMIZED EXPENSES:

|Date|Category|Description|Original|Currency|USD|
|---|---|---|---|---|---|
|11/15/2024|Transportation|JFK → NRT Business Class|4,250.00|USD|4,250.00|
|Booking Ref: QF78956 - Corporate Rate Applied|Booking Ref: QF78956 - Corporate Rate Ap
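The baseline output keeps most header fields as one `Key: Value` pair per line, so even without instructions a short regex can recover them. A minimal sketch, assuming that one-per-line layout; `key_values` is a hypothetical helper, and lines whose key contains parentheses or arrows (such as the currency rates) are deliberately skipped.

```python
import re

# "Key: Value" on one line; keys may contain letters, digits, spaces, #, &, / and -
KEY_VALUE = re.compile(r"\s*([A-Za-z][\w #&/-]*?):\s+(.+?)\s*")


def key_values(text: str) -> dict[str, str]:
    """Collect one-per-line 'Key: Value' fields from the baseline markdown output."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        match = KEY_VALUE.fullmatch(line)
        if match:
            # keep the first occurrence if a key repeats
            fields.setdefault(match.group(1), match.group(2))
    return fields


# Copied from the baseline output above; in the notebook use vanilaParsing[0].text
baseline = """# EMPLOYEE INFORMATION:

Name: Dr. Alexandra Chen-Martinez, PhD

Employee ID: QD-2022-1457

Department: Advanced Research & Development

Position: Principal Research Scientist

JPY (¥) → USD: 0.0068 (as of 11/15/2024)"""

fields = key_values(baseline)
print(fields["Employee ID"], "|", fields["Position"])  # QD-2022-1457 | Principal Research Scientist
```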

In [None]:
parsingInstruction = """You are provided with an expense report. 
Extract employee name, employee id, position, department, date ranges, individual expense items with dates, categories, and amounts."""

withInstructionParsing = LlamaParse(
    result_type="markdown", parsing_instruction=parsingInstruction
).load_data("./expense_report_document.pdf")

Started parsing the file under job_id 7b0d05bb-947b-4475-8d0f-f10386f7446e


In [None]:
print(withInstructionParsing[0].text)

**Employee Information:**
- **Name:** Dr. Alexandra Chen-Martinez, PhD
- **Employee ID:** QD-2022-1457
- **Position:** Principal Research Scientist
- **Department:** Advanced Research & Development

**Trip/Expense Period:**
- **Start Date:** November 15, 2024
- **End Date:** December 10, 2024

**Expense Items:**
1. **Date:** 11/15/2024
- **Category:** Transportation
- **Description:** JFK → NRT Business Class
- **Original Amount:** $4,250.00
- **Currency:** USD
- **USD Amount:** $4,250.00
- **Booking Reference:** QF78956 - Corporate Rate Applied
- **Project Code:** QD-QUANTUM-2024-01

2. **Date:** 11/16/2024
- **Category:** Accommodation
- **Description:** Hilton Tokyo - 5 nights
- **Original Amount:** ¥225,000
- **Currency:** JPY
- **USD Amount:** $1,530.00
- **Confirmation:** HTK-2024-78956

**Locations:**
- Tokyo, Japan
- Singapore
- Sydney, Australia

**Currency Conversion Rates Applied:**
- JPY (¥) → USD: 0.0068 (as of 11/15/2024)
- SGD (S$) → USD: 0.74 (as of 11/28/2024)
- AUD (A
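The converted amounts can be re-derived from the stated rates: 225,000 JPY × 0.0068 = 1,530.00 USD, matching the Accommodation item above. A minimal sketch with `decimal.Decimal`; rounding half-up to cents is an assumption, since the report does not state its rounding rule, and because the output above is truncated only the two visible items can be checked this way.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_usd(amount: str, rate: str) -> Decimal:
    """Convert an original-currency amount to USD, rounded half-up to cents."""
    value = Decimal(amount.replace(",", "")) * Decimal(rate)
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Figures from the Accommodation item and the JPY rate in the output above
usd = to_usd("225,000", "0.0068")
print(usd)                        # 1530.00
print(usd == Decimal("1530.00"))  # True
```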

### Purchase Order Document 

Here we identify the PO number, vendor details, shipping terms, and an itemized list of products with quantities and unit prices.

<img src="purchase_order_document.png" alt="Purchase order document" width="500">
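The instructions in this notebook share a shape: one sentence naming the document type, then one sentence listing the fields to return. When the same kind of extraction runs over many document types, a small helper can keep that wording consistent. A sketch; `build_instruction` is hypothetical and here just reproduces the purchase-order instruction used in the next section as a plain string.

```python
def build_instruction(document: str, verb: str, fields: list[str]) -> str:
    """Compose a two-sentence parsing instruction: document type, then fields.

    Hypothetical helper that mirrors the wording of this notebook's instructions.
    """
    if not fields:
        raise ValueError("at least one field is required")
    if len(fields) == 1:
        field_text = fields[0]
    elif len(fields) == 2:
        field_text = f"{fields[0]} and {fields[1]}"
    else:
        # serial-comma list: "a, b, and c"
        field_text = ", ".join(fields[:-1]) + ", and " + fields[-1]
    return f"You are provided with {document}.\n{verb} {field_text}."


po_instruction = build_instruction(
    "a purchase order",
    "Identify",
    [
        "the PO number",
        "vendor details",
        "shipping terms",
        "itemized list of products with quantities and unit prices",
    ],
)
print(po_instruction)
```

The result is an ordinary string, so it can be passed as `parsing_instruction=po_instruction` to `LlamaParse`, just like the literal strings in the cells.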

In [None]:
vanilaParsing = LlamaParse(result_type="markdown").load_data(
    "./purchase_order_document.pdf"
)

Started parsing the file under job_id b8cb11c3-7dce-4e6a-94bb-1a4e50e45e55


In [None]:
print(vanilaParsing[0].text)

# GLOBAL TECH SOLUTIONS, INC.

# PURCHASE ORDER

Document Reference: PO-2024-GT-9876/REV.2

[Original: PO-2024-GT-9876]

Amendment Date: 12/10/2024

# VENDOR INFORMATION:

Quantum Electronics Manufacturing

DUNS: 78-456-7890

Tax ID: EU8976543210

Hoofdorp, Netherlands

Vendor #: QEM-EU-2024-001

# SHIP TO:

Global Tech Solutions, Inc.

Building 7A, Innovation Park

2100 Technology Drive

Austin, TX 78701

USA

Attn: Sarah Martinez, Receiving Manager

Tel: +1 (512) 555-0123

# PAYMENT TERMS:

Net 45

2% discount if paid within 15 days

# SHIPPING TERMS:

DDP (Delivered Duty Paid) - Incoterms 2020

Insurance Required: Yes

Preferred Carrier: DHL/FedEx

Required Delivery Date: 01/15/2025

# SPECIAL INSTRUCTIONS:

1. All shipments must include Certificate of Conformance
2. ESD-sensitive items must be properly packaged
3. Temperature logging required for items marked with *
4. Partial shipments accepted with prior approval
5. Quote PO number on all correspondence

# ITEM DETAILS:

|Line|Pa
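The baseline markdown keeps the purchase order's section headings (`# VENDOR INFORMATION:`, `# SHIP TO:`), which makes it useful for checking where a value came from when an instructed output regroups fields. A minimal sketch with hypothetical helpers `sections` and `section_of`, assuming headings are markdown `#` lines as in the output above.

```python
import re
from typing import Optional

HEADING = re.compile(r"#{1,6}\s+(.*)")


def sections(markdown: str) -> dict[str, list[str]]:
    """Group non-empty lines under the nearest preceding markdown heading."""
    grouped: dict[str, list[str]] = {}
    current = ""  # bucket for lines before the first heading
    for raw in markdown.splitlines():
        line = raw.strip()
        heading = HEADING.fullmatch(line)
        if heading:
            current = heading.group(1).strip()
            grouped.setdefault(current, [])
        elif line:
            grouped.setdefault(current, []).append(line)
    return grouped


def section_of(markdown: str, needle: str) -> Optional[str]:
    """Return the heading of the first section containing `needle`, or None."""
    for heading, lines in sections(markdown).items():
        if any(needle in line for line in lines):
            return heading
    return None


# Copied from the baseline output above; in the notebook use vanilaParsing[0].text
po_baseline = """# VENDOR INFORMATION:

Quantum Electronics Manufacturing

Vendor #: QEM-EU-2024-001

# SHIP TO:

Global Tech Solutions, Inc.

Attn: Sarah Martinez, Receiving Manager

Tel: +1 (512) 555-0123"""

print(section_of(po_baseline, "Sarah Martinez"))  # SHIP TO:
```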

In [None]:
parsingInstruction = """You are provided with a purchase order. 
Identify the PO number, vendor details, shipping terms, and itemized list of products with quantities and unit prices."""

withInstructionParsing = LlamaParse(
    result_type="markdown", parsing_instruction=parsingInstruction
).load_data("./purchase_order_document.pdf")

Started parsing the file under job_id d2731305-984d-4633-8a52-0493748cf10b


In [None]:
print(withInstructionParsing[0].text)

Here are the details extracted from the purchase order:

**PO Number:** PO-2024-GT-9876/REV.2

**Vendor Details:**
- **Vendor Name:** Quantum Electronics Manufacturing
- **DUNS:** 78-456-7890
- **Tax ID:** EU8976543210
- **Address:** Hoofdorp, Netherlands
- **Vendor Number:** QEM-EU-2024-001
- **Contact Person:** Sarah Martinez, Receiving Manager
- **Phone:** +1 (512) 555-0123

**Shipping Terms:**
- **Terms:** DDP (Delivered Duty Paid) - Incoterms 2020
- **Insurance Required:** Yes
- **Preferred Carrier:** DHL/FedEx
- **Required Delivery Date:** 01/15/2025

**Itemized List of Products:**
1. **Part Number:** QE-MCU-5590
- **Description:** Microcontroller Unit
- **Quantity:** 500 EA
- **Unit Price:** $12.50
- **Total:** $6,250.00

**Payment Terms:**
- Net 45
- 2% discount if paid within 15 days

**Special Instructions:**
1. All shipments must include Certificate of Conformance
2. ESD-sensitive items must be properly packaged
3. Temperature logging required for items marked with *
4. Part
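As with the receipt, derived numbers in the instructed output can be recomputed: 500 EA × 12.50 = 6,250.00, which matches the Total given for QE-MCU-5590, and the 2% early-payment discount from the payment terms would bring that amount to 6,125.00 if paid within 15 days. Field placement deserves the same scrutiny: the output lists Sarah Martinez and the +1 (512) 555-0123 number under Vendor Details, while the baseline parse shows them under SHIP TO, i.e. as the buyer's receiving contact. A minimal sketch of the arithmetic check; the helper names are hypothetical, and applying the discount to a single line rather than the whole invoice is only for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def line_total(quantity: int, unit_price: str) -> Decimal:
    """Quantity times unit price, rounded half-up to cents."""
    return (quantity * Decimal(unit_price)).quantize(CENT, rounding=ROUND_HALF_UP)


def early_payment_amount(amount: Decimal, discount_pct: str) -> Decimal:
    """Amount due after a percentage early-payment discount ("2" means 2%)."""
    factor = 1 - Decimal(discount_pct) / 100
    return (amount * factor).quantize(CENT, rounding=ROUND_HALF_UP)


total = line_total(500, "12.50")
print(total)                             # 6250.00
print(early_payment_amount(total, "2"))  # 6125.00
```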