# LlamaParse - Parsing comic books with parsing instructions
Parsing instructions let you instruct our parsing model the same way you would instruct an LLM!

They can be useful to help the parser get better results on complex document layouts, to extract data in a specific format, or to transform the document in other ways.

With parsing instructions, you can get better results from LlamaParse on complicated documents and simplify your application code.

## Installation

Parsing instructions are part of the LlamaParse API. You can use them either by setting the `parsing_instruction` parameter directly in an API request or through the LlamaParse Python module, which is what this tutorial uses.

To install llama-parse, get it from pip:

```python
!pip install llama-parse
```


## API key

Using LlamaParse requires an API key, which you can get here: https://cloud.llamaindex.ai/parse

```python
import os

# Replace the placeholder with your own LlamaCloud API key
os.environ["LLAMA_CLOUD_API_KEY"] = "llx-..."
```
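Hardcoding a key in a notebook makes it easy to leak by accident. A minimal sketch, assuming you would rather export `LLAMA_CLOUD_API_KEY` in your shell before launching Jupyter: check the variable early and fail with a clear message. `require_api_key` is a hypothetical helper of ours, not part of llama-parse.

```python
import os


def require_api_key(var_name: str = "LLAMA_CLOUD_API_KEY") -> str:
    """Return the API key stored in `var_name`, or raise if it is unset.

    Hypothetical helper (not part of llama-parse): it only checks that the value
    is non-empty and is not the "llx-..." placeholder used in this notebook.
    """
    key = os.environ.get(var_name, "").strip()
    if not key or key == "llx-...":
        raise RuntimeError(f"Set {var_name} to your LlamaCloud API key first.")
    return key


# Demo on a separate, made-up variable so a real key is never overwritten
os.environ["DEMO_LLAMA_KEY"] = "llx-not-a-real-key"
print(require_api_key("DEMO_LLAMA_KEY"))
```

Calling `require_api_key()` before creating `LlamaParse` surfaces a missing key before any parsing job is started.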

## Async (Notebook only)
The llama-parse package is async-first, so running the code in a notebook requires `nest_asyncio`:

```python
import nest_asyncio

# Let async code run inside Jupyter's already-running event loop
nest_asyncio.apply()
```
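Why is this needed? A Jupyter kernel already runs an event loop, and `asyncio.run()` refuses to start a new loop from inside a running one. Our understanding is that LlamaParse's synchronous `load_data()` wraps its async implementation in this way, and `nest_asyncio.apply()` works around the restriction by patching asyncio to allow nested calls. The snippet below reproduces the underlying error with plain asyncio; `fake_parse` is a hypothetical stand-in for an async parsing call. Run it as a standalone script: in a notebook where `nest_asyncio` has been applied, the nested call should be allowed and it would print `parsed` instead.

```python
import asyncio


async def fake_parse() -> str:
    # Hypothetical stand-in for an async parsing call
    await asyncio.sleep(0)
    return "parsed"


async def simulated_notebook_cell() -> str:
    # Code running here is already inside an event loop, as in Jupyter
    coro = fake_parse()
    try:
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine never ran; close it to avoid a warning
        return f"RuntimeError: {exc}"


message = asyncio.run(simulated_notebook_cell())
print(message)  # reports that asyncio.run() cannot be called from a running event loop
```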

## Import the package

```python
from llama_parse import LlamaParse
```

## Using LlamaParse to get better results (on manga!)

Sometimes the layout of a page is unusual, and LlamaParse may return the content in a suboptimal reading order. For example, when parsing manga you expect the reading order to be right to left, even if the content is in English!
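To make the reading-order problem concrete, here is a toy illustration of ordering panel text by position: group panels into rows from top to bottom, then read each row right to left as in manga. This is a simplified heuristic of our own, not how LlamaParse works internally, and the `Panel` coordinates are made up.

```python
from dataclasses import dataclass


@dataclass
class Panel:
    text: str
    x: float  # left edge of the panel (made-up units)
    y: float  # top edge of the panel (made-up units)


def reading_order(panels, right_to_left=True, row_tolerance=10.0):
    """Group panels into rows by their top edge, then order each row."""
    rows = []
    for panel in sorted(panels, key=lambda p: p.y):
        # Same row if its top edge is close to the first panel of the current row
        if rows and abs(panel.y - rows[-1][0].y) <= row_tolerance:
            rows[-1].append(panel)
        else:
            rows.append([panel])
    ordered = []
    for row in rows:
        # Manga reads right to left, so sort by x descending
        ordered.extend(sorted(row, key=lambda p: p.x, reverse=right_to_left))
    return [p.text for p in ordered]


panels = [Panel("B", x=0, y=0), Panel("A", x=200, y=2), Panel("C", x=100, y=150)]
print(reading_order(panels))                       # ['A', 'B', 'C']
print(reading_order(panels, right_to_left=False))  # ['B', 'A', 'C']
```

The `row_tolerance` threshold is an arbitrary choice; pages with irregular or overlapping panels need far more than a fixed geometric rule.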

Let's download an excerpt from a great manga, *The Manga Guide to Calculus* by Hiroyuki Kojima (https://www.amazon.com/Manga-Guide-Calculus-Hiroyuki-Kojima/dp/1593271948).



```python
!wget "https://drive.usercontent.google.com/uc?id=1tZJhcpepLRdQFJFCFX50QIqLyLgqzZsY&export=download" -O ./manga.pdf
```

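Download links from Google Drive can occasionally return an HTML page (such as a warning or error page) instead of the file itself, which would only surface later when parsing. A quick sanity check, as a minimal sketch: PDF files normally begin with the `%PDF-` header. `looks_like_pdf` is a hypothetical helper and only a heuristic.

```python
import os
import tempfile


def looks_like_pdf(path: str) -> bool:
    """Heuristic: PDF files normally start with the bytes b"%PDF-"."""
    with open(path, "rb") as fh:
        return fh.read(5) == b"%PDF-"


# Demo on two throwaway files (not the real manga.pdf)
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.pdf")
    bad = os.path.join(tmp, "bad.pdf")
    with open(good, "wb") as fh:
        fh.write(b"%PDF-1.7\n% minimal header only\n")
    with open(bad, "wb") as fh:
        fh.write(b"<!DOCTYPE html><html>...</html>")
    results = (looks_like_pdf(good), looks_like_pdf(bad))

print(results)  # (True, False)
```

In the notebook, `looks_like_pdf("./manga.pdf")` should return `True` after a successful download.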



### Without parsing instructions
For the sake of comparison, let's first parse without any instructions.

```python
vanilaParsing = LlamaParse(result_type="markdown").load_data("./manga.pdf")
```

Started parsing the file under job_id 25bf4202-78d8-4705-88cf-c616ae7c82af


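As you can see below, LlamaParse is not doing a great job here. It interprets the grid of comic panels as a table and tries to fit the dialogue into table cells, and lines from different speech bubbles get interleaved (for example, "You’re looking It’s next for the Sanda-cho door."). The result is very hard to follow.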

```python
print(vanilaParsing[0].text[100:1000])
```


```
The Asagake Times Sanda-Cho Distributor

A newspaper distributor? do I have the wrong map?

You’re looking It’s next for the Sanda-cho door. branch office? Everybody mistakes us for the office because we are larger. What Is a Function? 3
---
## Calculating the Derivative of a Constant, Linear, or Quadratic Function

|1.|Let’s find the derivative of constant function f(x) = α. The differential coefficient of f(x) at x = a is|
|---|---|
| |lim ε→0 (f(a + ε) - f(a)) / ε = lim ε→0 (α - α) = lim ε→0 0 = 0|
| |Thus, the derivative of f(x) is f′(x) = 0. This makes sense, since our function is constant—the rate of change is 0.|

Note: The differential coefficient of f(x) at x = a is often simply called the derivative of f(x) at x = a, or just f′(a).

|2.|Let’s calculate the derivative of linear function f(x) = αx + β. The derivative of f(x) at x = α is|
|---|---|
| |lim ε→0 (f(α + ε) - f(a)) = 
```
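One quick way to quantify the problem is to count the lines of the output that look like Markdown table rows. This is a rough heuristic of our own (any line that starts and ends with `|`), not a LlamaParse feature; `table_rows` is a hypothetical helper.

```python
import re

# A Markdown table row starts and ends with a pipe, e.g. "|1.|Let's find ...|"
TABLE_ROW = re.compile(r"^\s*\|.*\|\s*$")


def table_rows(markdown: str) -> list[str]:
    """Return the lines of `markdown` that look like table rows."""
    return [line for line in markdown.splitlines() if TABLE_ROW.match(line)]


# Short sample modeled on the output above
sample = (
    "Note: The differential coefficient of f(x) at x = a is often called f'(a).\n"
    "|1.|Let's find the derivative of constant function f(x) = a.|\n"
    "|---|---|\n"
    "| |Thus, the derivative of f(x) is f'(x) = 0.|\n"
)
print(len(table_rows(sample)))  # 3
```

Running `len(table_rows(vanilaParsing[0].text))` on the real output, and later on the instructed parse, gives a rough before/after comparison.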


### Using parsing instructions
Let's try to parse the manga with custom instructions:

"The provided document is a manga comic book. Most pages do NOT have title. It does not contain tables. Try to reconstruct the dialogue happening in a cohesive way."

To do so, pass the instruction to LlamaParse through the `parsing_instruction` parameter:

```python
parsingInstructionManga = """The provided document is a manga comic book, most page do NOT have title.
It does not contain table.
Try to reconstruct the dialog happening in a cohesive way."""
withInstructionParsing = LlamaParse(
    result_type="markdown", parsing_instruction=parsingInstructionManga
).load_data("./manga.pdf")
```

Started parsing the file under job_id 88ab273e-b2a7-4f84-8e72-e9367cf6b114
.

Let's see how the two results compare on page 3 of the book! We encourage you to change `target_page` and explore other pages. As you will see, the parsing instruction allowed LlamaParse to make sense of the document!

<img src="https://drive.usercontent.google.com/download?id=1M87rXTIZE8d5v7aHmVZVW6gW3eDGq6ks&authuser=0" />





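```python
target_page = 1  # 0-based index of the page to compare
# Split each parse into pages on the "\n---\n" separator and print the same page
print(vanilaParsing[0].text.split("\n---\n")[target_page])
print("\n\n------------------------------------------------------------\n\n")
print(withInstructionParsing[0].text.split("\n---\n")[target_page])
```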

```
The Asagake Times Sanda-Cho Distributor

A newspaper distributor? do I have the wrong map?

You’re looking It’s next for the Sanda-cho door. branch office? Everybody mistakes us for the office because we are larger. What Is a Function? 3


------------------------------------------------------------


# The Asagake Times

Sanda-Cho Distributor

A newspaper distributor?

Do I have the wrong map?

You're looking for the Sanda-cho branch office?

It's next door.

Everybody mistakes us for the office because we are larger.

What Is a Function? 3
```
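To browse other pages, it can help to wrap the split-and-print logic from the cell above in a small function with a bounds check. A sketch, assuming (as that cell does) that pages in the Markdown output are separated by `\n---\n`; `get_page` and `compare_page` are hypothetical helpers.

```python
PAGE_SEPARATOR = "\n---\n"  # page separator assumed by the split() calls above
RULE = "\n\n" + "-" * 60 + "\n\n"


def get_page(text: str, index: int, sep: str = PAGE_SEPARATOR) -> str:
    """Return page `index` (0-based) of a parsed document."""
    pages = text.split(sep)
    if not 0 <= index < len(pages):
        raise IndexError(f"page {index} out of range: document has {len(pages)} pages")
    return pages[index]


def compare_page(parses: dict[str, str], index: int) -> str:
    """Format the same page from several parses, one labeled block each."""
    return RULE.join(f"[{label}]\n{get_page(text, index)}" for label, text in parses.items())


# Toy inputs; in the notebook, pass vanilaParsing[0].text and withInstructionParsing[0].text
without = "cover\n---\nYou're looking It's next for the office?"
with_instruction = "cover\n---\nYou're looking for the office?\n\nIt's next door."
report = compare_page({"without": without, "with": with_instruction}, 1)
print(report)
```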


### Math: doing more with parsing instructions!

This manga is about math and full of equations, so why not ask the parser to output them in **LaTeX**?

<img src="https://drive.usercontent.google.com/download?id=1tze3xcQ7axVA-vC_iZeAj_GvYcyNuYDa&authuser=0" />
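For reference, here is our own LaTeX transcription (not LlamaParse output) of the first two derivations on the "Calculating the Derivative of a Constant, Linear, or Quadratic Function" page, for the constant function $f(x) = \alpha$ and the linear function $f(x) = \alpha x + \beta$. It gives a rough idea of what to look for in the instructed parse:

$$
f'(a) = \lim_{\varepsilon \to 0} \frac{f(a + \varepsilon) - f(a)}{\varepsilon} = \lim_{\varepsilon \to 0} \frac{\alpha - \alpha}{\varepsilon} = \lim_{\varepsilon \to 0} 0 = 0
$$

$$
f'(a) = \lim_{\varepsilon \to 0} \frac{\alpha(a + \varepsilon) + \beta - (\alpha a + \beta)}{\varepsilon} = \lim_{\varepsilon \to 0} \frac{\alpha \varepsilon}{\varepsilon} = \lim_{\varepsilon \to 0} \alpha = \alpha
$$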

```python
parsingInstructionMangaLatex = """The provided document is a manga comic book, most page do NOT have title.
It does not contain table. Do not output table.
Try to reconstruct the dialog happening in a cohesive way.
Output any math equation in LATEX markdown (between $$)"""
withLatex = LlamaParse(
    result_type="markdown", parsing_instruction=parsingInstructionMangaLatex
).load_data("./manga.pdf")
```

Started parsing the file under job_id 3a055e64-d91e-484e-b9b0-99a2e637c08d
.
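The LaTeX instruction above is the manga instruction plus extra rules. If you iterate on instructions, it can help to build variants from shared rules so they stay in sync. A minimal sketch in plain Python; `MANGA_RULES`, `LATEX_RULES` and `build_instruction` are hypothetical names, and the resulting string is simply what you would pass as `parsing_instruction`.

```python
MANGA_RULES = [
    "The provided document is a manga comic book. Most pages do NOT have a title.",
    "It does not contain tables.",
    "Try to reconstruct the dialogue happening in a cohesive way.",
]
LATEX_RULES = [
    "Do not output tables.",
    "Output any math equation in LaTeX markdown (between $$).",
]


def build_instruction(*rule_groups: list[str]) -> str:
    """Join rule groups into one instruction, one rule per line, skipping duplicates."""
    seen = set()
    lines = []
    for group in rule_groups:
        for rule in group:
            if rule not in seen:
                seen.add(rule)
                lines.append(rule)
    return "\n".join(lines)


manga_instruction = build_instruction(MANGA_RULES)
latex_instruction = build_instruction(MANGA_RULES, LATEX_RULES)
print(latex_instruction)
```

`LlamaParse(result_type="markdown", parsing_instruction=latex_instruction)` would then be used exactly as in the cell above.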

```python
target_page = 2
print(
    "\n\n[Without instruction]------------------------------------------------------------\n\n"
)
print(vanilaParsing[0].text.split("\n---\n")[target_page])
print(
    "\n\n[With instruction to output math in LATEX!]------------------------------------------------------------\n\n"
)
print(withLatex[0].text.split("\n---\n")[target_page])
```



```
[Without instruction]------------------------------------------------------------


## Calculating the Derivative of a Constant, Linear, or Quadratic Function

|1.|Let’s find the derivative of constant function f(x) = α. The differential coefficient of f(x) at x = a is|
|---|---|
| |lim ε→0 (f(a + ε) - f(a)) / ε = lim ε→0 (α - α) = lim ε→0 0 = 0|
| |Thus, the derivative of f(x) is f′(x) = 0. This makes sense, since our function is constant—the rate of change is 0.|

Note: The differential coefficient of f(x) at x = a is often simply called the derivative of f(x) at x = a, or just f′(a).

|2.|Let’s calculate the derivative of linear function f(x) = αx + β. The derivative of f(x) at x = α is|
|---|---|
| |lim ε→0 (f(α + ε) - f(a)) = lim ε→0 (α(a + ε) + β - (αa + β)) = lim ε→0 α = α|
| |Thus, the derivative of f(x) is f′(x) = α, a constant value. This result should also be intuitive—linear functions have a constant rate of change by definition.|

|3.|Let’s find the derivative of f(x) = 
```

And here is the LaTeX output of the instructed parse, as rendered by https://upmath.me/:


<img src="https://drive.usercontent.google.com/download?id=1qGo5bMGYOiIC9MnprcgEByaYjU9YII2Q&authuser=0" />
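Because the instruction asks for equations between `$$`, they are easy to pull out of the Markdown for further processing, for example to render or check them. A sketch of our own (not a LlamaParse feature) that only handles `$$...$$` display math, not inline `$...$`; `extract_display_math` is a hypothetical helper and the sample text is made up.

```python
import re

# Non-greedy match between $$ delimiters; DOTALL lets equations span lines
DISPLAY_MATH = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)


def extract_display_math(markdown: str) -> list[str]:
    """Return the bodies of all $$...$$ blocks, stripped of surrounding whitespace."""
    return [body.strip() for body in DISPLAY_MATH.findall(markdown)]


# Made-up sample in the style requested by the instruction
sample = (
    "Let's find the derivative of the constant function.\n"
    "$$f'(x) = 0$$\n"
    "And for the linear function:\n"
    "$$\n\\lim_{\\varepsilon \\to 0} \\alpha = \\alpha\n$$\n"
)
print(extract_display_math(sample))
```

On the real output you would call `extract_display_math(withLatex[0].text)`.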


In this short notebook, we saw how to use parsing instructions to improve the quality and accuracy of parsing with LlamaParse!