<a href="https://colab.research.google.com/github/run-llama/llama_parse/blob/main/examples/parsing_instructions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Parsing documents with Instructions

Parsing instructions allow you to guide our parsing model in the same way you would instruct an LLM.

These instructions can be useful for improving the parser's performance on complex document layouts, extracting data in a specific format, or transforming the document in other ways.

In this demonstration, we showcase how parsing instructions can be used to extract specific information from unstructured documents. Below are the documents we use for testing:

1. McDonald's Receipt - Extracting the price of each order and the final amount to be paid.

2. Expense Report Document - Extracting employee name, employee ID, position, department, date ranges, individual expense items with dates, categories, and amounts.

3. Purchase Order Document - Identifying the PO number, vendor details, shipping terms, and an itemized list of products with quantities and unit prices.


### Installation

In [None]:
!pip install llama-parse

### Set Up the API Key

In [None]:
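import os

import nest_asyncio

# Allow nested event loops so LlamaParse's async calls can run inside a notebook.
nest_asyncio.apply()

# Replace the placeholder with your own API key.
os.environ["LLAMA_CLOUD_API_KEY"] = "llx-..."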
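
Hardcoding the key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, assuming you would rather keep the key out of the file, the helper below reuses `LLAMA_CLOUD_API_KEY` if it is already set and otherwise asks for it with `getpass`. The name `ensure_api_key` is hypothetical and not part of `llama-parse`.

```python
import os
from getpass import getpass


def ensure_api_key(env=os.environ, prompt=getpass):
    """Return the API key, prompting only when it is not already set.

    Hypothetical notebook helper; `env` and `prompt` are injectable so the
    function can be exercised without touching the real environment.
    """
    key = env.get("LLAMA_CLOUD_API_KEY", "").strip()
    if not key:
        # Nothing usable in the environment: ask once and remember the answer.
        key = prompt("LLAMA_CLOUD_API_KEY: ").strip()
        env["LLAMA_CLOUD_API_KEY"] = key
    return key
```

Calling `ensure_api_key()` once before constructing the parser would then replace the hardcoded assignment.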

### McDonald's Receipt

Here we extract the price of each order and the final amount to be paid.

In [None]:
from llama_parse import LlamaParse

vanilaParsing = LlamaParse(result_type="markdown").load_data("./mcdonalds_receipt.png")

Started parsing the file under job_id 0ffbb3d4-5148-47e7-a6a0-6ea0c47be0df


In [None]:
print(vanilaParsing[0].text)

# Rate us HIGHLY SATISFIED

Purchase any sandwich and receive a FREE ITEM

Go to WWW.mcdvoice.com within 7 days of purchase of equal or lesser value and tell us about your visit.

Validation Code: 31278-01121-21018-20481-00081-0

Valid at participating US McDonald's

Expires 30 days after receipt date

# McDonald's Restaurant #312782378

PINE RD NW

RICE MN 56367-9740

TEL# 320 393 4600

KS# 12/08/2022 08:48 PM

# Order

|Happy Meal 6 Pc|$4.89|
|---|---|
|Creamy Ranch Cup| |
|Extra Kids Fry| |
|Wreck It Ralph 2 Snack| |
|Oreo McFlurry|$2.69|

# Summary

|Subtotal|$7.58|
|---|---|
|Tax|$0.52|
|Take-Out Total|$8.10|
|Cash Tendered|$10.00|
|Change|$1.90|

### Not ACCEPTING APPLICATIONS *++ McDonald's Restaurant Rice

Text to #36453 apply 31278


In [None]:
parsingInstruction = """The provided document is a McDonald's receipt.
 Provide the price of each order and final amount to be paid."""
withInstructionParsing = LlamaParse(
    result_type="markdown", parsing_instruction=parsingInstruction
).load_data("./mcdonalds_receipt.png")

Started parsing the file under job_id af9c7ef4-e842-47f2-9a22-e99b959e8028


In [None]:
print(withInstructionParsing[0].text)

Here are the prices for each order from the McDonald's receipt:

1. Happy Meal 6 Pc: $4.89
2. Snack Oreo McFlurry: $2.69

**Subtotal:** $7.58
**Tax:** $0.52
**Total Amount to be Paid:** $8.10

The cash tendered was $10.00, and the change given was $1.90.
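
Because the instructed output is free-form text, it can be worth checking the extracted numbers before relying on them. The sketch below is one possible approach, not part of `llama-parse`: it pulls the prices out of the output shown above with regular expressions and confirms that the items add up to the subtotal and that subtotal plus tax equals the total. The patterns assume the exact line shapes seen in this output, which the model may not reproduce on other runs.

```python
import re
from decimal import Decimal

# Instructed output copied from the cell above; in the notebook you would
# use withInstructionParsing[0].text instead.
receipt_text = """Here are the prices for each order from the McDonald's receipt:

1. Happy Meal 6 Pc: $4.89
2. Snack Oreo McFlurry: $2.69

**Subtotal:** $7.58
**Tax:** $0.52
**Total Amount to be Paid:** $8.10

The cash tendered was $10.00, and the change given was $1.90.
"""

# Numbered lines such as "1. Happy Meal 6 Pc: $4.89".
ITEM_RE = re.compile(r"^\d+\.\s+(.+?):\s+\$(\d+\.\d{2})\s*$", re.MULTILINE)
# Bold labels such as "**Subtotal:** $7.58".
LABEL_RE = re.compile(r"^\*\*(.+?):\*\*\s+\$(\d+\.\d{2})\s*$", re.MULTILINE)

# Decimal avoids floating-point rounding when comparing currency amounts.
items = {name: Decimal(price) for name, price in ITEM_RE.findall(receipt_text)}
labels = {name: Decimal(price) for name, price in LABEL_RE.findall(receipt_text)}

subtotal_ok = sum(items.values()) == labels["Subtotal"]
total_ok = labels["Subtotal"] + labels["Tax"] == labels["Total Amount to be Paid"]
print(items, subtotal_ok, total_ok)
```

For this receipt both checks pass: 4.89 + 2.69 = 7.58 and 7.58 + 0.52 = 8.10.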


### Expense Report Document

Here we extract employee name, employee ID, position, department, date ranges, individual expense items with dates, categories, and amounts.

In [None]:
vanilaParsing = LlamaParse(result_type="markdown").load_data(
    "./export_report_document.md"
)

Started parsing the file under job_id 63ff0728-bb93-421d-a093-6560050e6c22
...

In [None]:
print(vanilaParsing[0].text)

# QUANTUM DYNAMICS CORPORATION

# EMPLOYEE EXPENSE REPORT

# FISCAL YEAR 2024

# EMPLOYEE INFORMATION:

Name: Dr. Alexandra Chen-Martinez, PhD

Employee ID: QD-2022-1457

Department: Advanced Research & Development

Cost Center: CC-ARD-NA-003

Project Codes: QD-QUANTUM-2024-01, QD-AI-2024-03

Position: Principal Research Scientist

Reporting Manager: Dr. James Thompson

# TRIP/EXPENSE PERIOD:

Start Date: November 15, 2024

End Date: December 10, 2024

Purpose: International Conference Attendance & Client Meetings

Locations: Tokyo, Japan → Singapore → Sydney, Australia

# CURRENCY CONVERSION RATES APPLIED:

JPY (¥) → USD: 0.0068 (as of 11/15/2024)

SGD (S$) → USD: 0.74 (as of 11/28/2024)

AUD (A$) → USD: 0.65 (as of 12/03/2024)

# ITEMIZED EXPENSES:

|Date|Category|Description|Original Currency|USD|
|---|---|---|---|---|
|11/15/2024|Transportation|JFK → NRT Business Class Booking Ref: QF78956 - Corporate Rate Applied Project Code: QD-QUANTUM-2024-01|4,250.00 USD|4,250.00|
|11/16/2024|

In [None]:
parsingInstruction = """You are provided with an expense report. 
Extract employee name, employee id, position, department, date ranges, individual expense items with dates, categories, and amounts."""

withInstructionParsing = LlamaParse(
    result_type="markdown", parsing_instruction=parsingInstruction
).load_data("./export_report_document.md")

Started parsing the file under job_id 8c953105-3684-4239-bd27-946ddb3a1943


In [None]:
print(withInstructionParsing[0].text)

**Employee Information:**
- **Name:** Dr. Alexandra Chen-Martinez, PhD
- **Employee ID:** QD-2022-1457
- **Position:** Principal Research Scientist
- **Department:** Advanced Research & Development

**Trip/Expense Period:**
- **Start Date:** November 15, 2024
- **End Date:** December 10, 2024

**Itemized Expenses:**

1. **Date:** 11/15/2024
- **Category:** Transportation
- **Description:** JFK → NRT Business Class
- **Original Currency:** USD
- **Amount:** $4,250.00

2. **Date:** 11/16/2024
- **Category:** Accommodation
- **Description:** Hilton Tokyo - 5 nights
- **Original Currency:** JPY
- **Amount:** ¥225,000 (Converted Amount: $1,530.00)

3. **Date:** 11/17/2024
- **Category:** Meals
- **Description:** Client Dinner - Sushi Zen
- **Original Currency:** JPY
- **Amount:** ¥45,600 (Converted Amount: $310.08)

4. **Date:** 11/18/2024 to 11/20/2024
- **Category:** Conference Registration
- **Description:** Quantum Computing Summit
- **Original Currency:** USD
- **Amount:** $2,500.00

5
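
The instructed output uses a regular `- **Key:** value` bullet pattern, which makes it straightforward to load into a dictionary for downstream use. The following is a rough sketch under that assumption; `extract_fields` is a hypothetical helper, and the sample text is an excerpt of the output shown above.

```python
import re

# Excerpt of the instructed output shown above; in the notebook you would
# use withInstructionParsing[0].text instead.
report_text = """**Employee Information:**
- **Name:** Dr. Alexandra Chen-Martinez, PhD
- **Employee ID:** QD-2022-1457
- **Position:** Principal Research Scientist
- **Department:** Advanced Research & Development

**Trip/Expense Period:**
- **Start Date:** November 15, 2024
- **End Date:** December 10, 2024
"""

# Bullet lines of the form "- **Key:** value".
FIELD_RE = re.compile(r"^- \*\*(.+?):\*\*\s*(.+)$", re.MULTILINE)


def extract_fields(markdown_text):
    """Collect "- **Key:** value" bullets into a dict.

    Repeated keys overwrite earlier ones, so this suits the header fields
    but not the itemized expenses, where "Category" and "Amount" repeat.
    """
    return {key: value.strip() for key, value in FIELD_RE.findall(markdown_text)}


fields = extract_fields(report_text)
print(fields["Employee ID"], "|", fields["Start Date"])
```

For the itemized expenses you would need to split the text on the numbered entries first and apply the same pattern to each entry separately.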

### Purchase Order Document

Here we identify the PO number, vendor details, shipping terms, and an itemized list of products with quantities and unit prices.

In [None]:
vanilaParsing = LlamaParse(result_type="markdown").load_data(
    "./purchase_order_document.md"
)

Started parsing the file under job_id e7e389d1-bdaa-4d12-8283-b736e21ffe6b


In [None]:
print(vanilaParsing[0].text)

# GLOBAL TECH SOLUTIONS, INC.

# PURCHASE ORDER

Document Reference: PO-2024-GT-9876/REV.2

[Original: PO-2024-GT-9876]

Amendment Date: 12/10/2024

# VENDOR INFORMATION:

Quantum Electronics Manufacturing

DUNS: 78-456-7890

Tax ID: EU8976543210

Hoofdorp, Netherlands

Vendor #: QEM-EU-2024-001

# SHIP TO:

Global Tech Solutions, Inc.

Building 7A, Innovation Park

2100 Technology Drive

Austin, TX 78701

USA

Attn: Sarah Martinez, Receiving Manager

Tel: +1 (512) 555-0123

# PAYMENT TERMS:

Net 45

2% discount if paid within 15 days

# SHIPPING TERMS:

DDP (Delivered Duty Paid) - Incoterms 2020

Insurance Required: Yes

Preferred Carrier: DHL/FedEx

Required Delivery Date: 01/15/2025

# SPECIAL INSTRUCTIONS:

1. All shipments must include Certificate of Conformance
2. ESD-sensitive items must be properly packaged
3. Temperature logging required for items marked with *
4. Partial shipments accepted with prior approval
5. Quote PO number on all correspondence

# ITEM DETAILS:

|Line|Pa

In [None]:
parsingInstruction = """You are provided with a purchase order. 
Identify the PO number, vendor details, shipping terms, and itemized list of products with quantities and unit prices."""

withInstructionParsing = LlamaParse(
    result_type="markdown", parsing_instruction=parsingInstruction
).load_data("./purchase_order_document.md")

Started parsing the file under job_id c336a4aa-1711-4648-8612-f82e63bf4127


In [None]:
print(withInstructionParsing[0].text)

**Purchase Order Details:**

- **PO Number:** PO-2024-GT-9876/REV.2
- **Vendor Details:**
- **Vendor Name:** Quantum Electronics Manufacturing
- **DUNS:** 78-456-7890
- **Tax ID:** EU8976543210
- **Address:** Hoofdorp, Netherlands
- **Vendor #:** QEM-EU-2024-001

- **Shipping Terms:**
- **Terms:** DDP (Delivered Duty Paid) - Incoterms 2020
- **Insurance Required:** Yes
- **Preferred Carrier:** DHL/FedEx
- **Required Delivery Date:** 01/15/2025

- **Itemized List of Products:**
1. **Item 1:**
- **Part Number:** QE-MCU-5590
- **Description:** Microcontroller Unit, Rev. B, 32-bit, 120MHz, LQFP-144
- **Quantity:** 500 EA
- **Unit Price:** $12.50
- **Total Price:** $6,250.00
- **Notes:** Temp Range: -40°C to +85°C, Lot tracking required

2. **Item 2:**
- **Part Number:** QE-SENS-789
- **Description:** Temperature Sensor Module, v2.1, Digital Output, I2C Interface
- **Quantity:** 750 EA
- **Unit Price:** $8.75
- **Total Price:** $6,562.50
- **Notes:** Calibration certificates required

3. **
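
Since the instructed output reports both unit prices and line totals, a quick arithmetic check can catch extraction mistakes. The sketch below is illustrative only: the values are copied by hand from the two complete items in the output above, and `mismatched_lines` is a hypothetical helper, not part of `llama-parse`.

```python
from decimal import Decimal

# Values copied from the two complete items in the output above.
line_items = [
    {"part": "QE-MCU-5590", "quantity": 500, "unit_price": "12.50", "total": "6250.00"},
    {"part": "QE-SENS-789", "quantity": 750, "unit_price": "8.75", "total": "6562.50"},
]


def mismatched_lines(items):
    """Return part numbers whose quantity x unit price differs from the stated total."""
    return [
        item["part"]
        for item in items
        # Decimal keeps the currency arithmetic exact.
        if item["quantity"] * Decimal(item["unit_price"]) != Decimal(item["total"])
    ]


print(mismatched_lines(line_items))
```

Both items are consistent here (500 × 12.50 = 6,250.00 and 750 × 8.75 = 6,562.50), so the list of mismatches is empty.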