<a href="https://colab.research.google.com/github/jaideep11061982/GenAINotebooks/blob/main/demo_ppt_financial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LlamaParse - Parsing Financial PowerPoints 📊

In this cookbook, we show how to use LlamaParse to parse a financial PowerPoint presentation.

## Installation

Parsing instructions are part of the LlamaParse API. You can use them either by specifying the `parsing_instruction` parameter directly in the API or through the LlamaParse Python module (which we use in this tutorial).

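Install `llama-index` and `llama-parse`, plus the extra packages that LlamaIndex's default PowerPoint reader needs (`torch`, `transformers`, `python-pptx`, `Pillow`), with `pip`: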

In [None]:
%pip install llama-index
%pip install llama-parse
%pip install torch transformers python-pptx Pillow

## API Key

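LlamaParse requires an API key, which you can get here: https://cloud.llamaindex.ai/parse. The query engines built later in this notebook use OpenAI models by default, so you also need an OpenAI API key.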

In [None]:
import os

os.environ["LLAMA_CLOUD_API_KEY"] = "llx-..."
os.environ["OPENAI_API_KEY"] = "sk-..."
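Before running the cells that call either API, it can help to confirm that both keys are set and are no longer the placeholder values above. The helper below is a hypothetical convenience function (not part of LlamaParse or LlamaIndex); it only inspects environment variables and assumes that a value ending in `...` is still a placeholder.

```python
import os
from typing import Mapping


def missing_api_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required keys that are unset or still placeholders.

    Hypothetical helper: an empty value, or one ending in "..." (like the
    "llx-..." and "sk-..." placeholders above), counts as not configured.
    """
    required = ["LLAMA_CLOUD_API_KEY", "OPENAI_API_KEY"]
    missing = []
    for name in required:
        value = env.get(name, "").strip()
        if not value or value.endswith("..."):
            missing.append(name)
    return missing


problems = missing_api_keys()
if problems:
    print("Set real values for:", ", ".join(problems))
```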

**NOTE**: LlamaParse is natively async. Because a notebook already runs an event loop, calling its sync methods from a notebook requires `nest_asyncio`.


In [None]:
import nest_asyncio

nest_asyncio.apply()
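Why is this needed? A notebook cell already runs inside an event loop, and the standard library's `asyncio.run()` refuses to start a new loop from inside a running one. The sketch below reproduces that error with plain `asyncio`; `fake_parse` and `sync_load` are hypothetical stand-ins for an async parsing call and a sync wrapper around it, not LlamaParse's actual internals. `nest_asyncio.apply()` patches asyncio so that such nested calls are allowed.

```python
import asyncio


async def fake_parse() -> str:
    # Hypothetical async job standing in for an async parsing call.
    await asyncio.sleep(0)
    return "parsed"


def sync_load() -> str:
    # Hypothetical sync wrapper that drives the coroutine with asyncio.run().
    coro = fake_parse()
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise


async def notebook_cell() -> str:
    # Simulates a notebook cell: an event loop is already running here.
    try:
        return sync_load()
    except RuntimeError as err:
        return f"RuntimeError: {err}"


print(sync_load())                   # works: no event loop is running yet
print(asyncio.run(notebook_cell()))  # the nested call fails without nest_asyncio
```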

## Importing the package

To import `llama_parse`, simply run:

In [None]:
from llama_parse import LlamaParse

## Using LlamaParse to Parse Presentations

Presentations such as PowerPoint decks are often hard to extract content from for RAG. With LlamaParse, we can parse them and unlock their content for RAG.

Let's download a presentation of the 2022 financial statements of the World Meteorological Organization (WMO).

In [None]:
!mkdir -p data && wget "https://meetings.wmo.int/Cg-19/PublishingImages/SitePages/FINAC-43/7%20-%20EC-77-Doc%205%20Financial%20Statements%20for%202022%20(FINAC).pptx" -O data/presentation.pptx
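The `mkdir`/`wget` cell assumes a Unix shell with `wget` installed, as on Colab. If that is not available, a pure-Python alternative using only the standard library is sketched below; `download_file` is a hypothetical helper name, and the call with the WMO URL is left commented out because it needs network access.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest: str) -> Path:
    """Download `url` to `dest`, creating parent directories as needed."""
    path = Path(dest)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk instead of reading it all into memory.
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path


# Same presentation URL as the wget cell above:
# download_file(
#     "https://meetings.wmo.int/Cg-19/PublishingImages/SitePages/FINAC-43/7%20-%20EC-77-Doc%205%20Financial%20Statements%20for%202022%20(FINAC).pptx",
#     "data/presentation.pptx",
# )
```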

### Parsing the presentation

Now let's parse it with both the default LlamaIndex reader (`SimpleDirectoryReader`) and LlamaParse, which returns Markdown.

#### LlamaIndex default

In [None]:
from llama_index.core import SimpleDirectoryReader

vanilla_documents = SimpleDirectoryReader("./data/").load_data()

#### LlamaParse

In [None]:
llama_parse_documents = LlamaParse(result_type="markdown").load_data(
    "./data/presentation.pptx"
)

Started parsing the file under job_id 56724c0d-e45a-4e30-ae8c-e416173c608a


Let's look at the parsed output for an example slide (the original slide image is shown below).

As we can see, the table is faithfully extracted!

In [None]:
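# Print a 500-character slice of the first parsed document (offsets picked to land on the liabilities slide)
print(llama_parse_documents[0].get_content()[-2800:-2300])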

ation and mitigation
---
|Item|31 Dec 2022|31 Dec 2021|Change|
|---|---|---|---|
|Payables and accruals|4,685|4,066|619|
|Employee benefits|127,215|84,676|42,539|
|Contributions received in advance|6,975|10,192|(3,217)|
|Unearned revenue from exchange transactions|20|651|(631)|
|Deferred Revenue|71,301|55,737|15,564|
|Borrowings|28,229|29,002|(773)|
|Funds held in trust|30,373|29,014|1,359|
|Provisions|1,706|1,910|(204)|
|Total Liabilities|270,504|215,248|55,256|
---
## Liabilities

Employee Ben


Compare this with the original slide image:
![Demo](demo_ppt_financial_1.png)
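Because the table comes back as Markdown, it can also be checked programmatically instead of by eye, and located by content rather than by the hard-coded character slice above. The sketch below uses hypothetical helpers (not part of LlamaParse): `find_table` returns the first pipe table containing a given label, `to_number` reads accounting-style parentheses such as `(3,217)` as negatives, and `check_liabilities` verifies that each *Change* equals *31 Dec 2022* minus *31 Dec 2021* and that the line items sum to *Total Liabilities*. It assumes the four-column layout shown above; `SAMPLE` is copied from that output.

```python
# Markdown copied from the LlamaParse output above (surrounding text trimmed).
SAMPLE = """\
---
|Item|31 Dec 2022|31 Dec 2021|Change|
|---|---|---|---|
|Payables and accruals|4,685|4,066|619|
|Employee benefits|127,215|84,676|42,539|
|Contributions received in advance|6,975|10,192|(3,217)|
|Unearned revenue from exchange transactions|20|651|(631)|
|Deferred Revenue|71,301|55,737|15,564|
|Borrowings|28,229|29,002|(773)|
|Funds held in trust|30,373|29,014|1,359|
|Provisions|1,706|1,910|(204)|
|Total Liabilities|270,504|215,248|55,256|
---
## Liabilities
"""


def to_number(cell: str) -> int:
    """Parse '4,685' -> 4685 and accounting-style '(3,217)' -> -3217."""
    cell = cell.strip()
    negative = cell.startswith("(") and cell.endswith(")")
    value = int(cell.strip("()").replace(",", ""))
    return -value if negative else value


def find_table(markdown: str, label: str) -> list[list[str]]:
    """Return the cell rows of the first pipe table whose text contains `label`."""
    tables, current = [], []
    for line in markdown.splitlines():
        if line.strip().startswith("|"):
            current.append(line)
        elif current:  # a non-table line ends the current table
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    for table in tables:
        if any(label in row for row in table):
            # Split each row into cells, skipping the |---|---| separator row.
            return [
                [c.strip() for c in row.strip().strip("|").split("|")]
                for row in table
                if not set(row.replace("|", "").strip()) <= set("-: ")
            ]
    raise ValueError(f"no table containing {label!r}")


def check_liabilities(rows: list[list[str]]) -> list[str]:
    """Return a list of inconsistencies (empty if the table adds up)."""
    errors = []
    sums = [0, 0]
    for item, y2022, y2021, change in rows[1:]:  # skip the header row
        a, b, c = to_number(y2022), to_number(y2021), to_number(change)
        if a - b != c:
            errors.append(f"{item}: {a} - {b} != {c}")
        if item == "Total Liabilities":
            if sums != [a, b]:
                errors.append(f"items sum to {sums}, table says {[a, b]}")
        else:
            sums[0] += a
            sums[1] += b
    return errors


rows = find_table(SAMPLE, "Total Liabilities")
print(check_liabilities(rows))  # → []
```

In the notebook, you could pass `llama_parse_documents[0].get_content()` instead of `SAMPLE`, assuming the table appears in that document in the same form as above.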

## Comparing the two for RAG

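The main difference between LlamaParse and the `SimpleDirectoryReader` approach above is that LlamaParse extracts the document in a structured format (for example, Markdown tables that keep rows and columns aligned), which gives the RAG pipeline better context to answer from.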

### Query Engine on SimpleDirectoryReader results

In [None]:
from llama_index.core import VectorStoreIndex

vanilla_index = VectorStoreIndex.from_documents(vanilla_documents)
vanilla_query_engine = vanilla_index.as_query_engine()

### Query Engine on LlamaParse Results


In [None]:
llama_parse_index = VectorStoreIndex.from_documents(llama_parse_documents)
llama_parse_query_engine = llama_parse_index.as_query_engine()

### Liability provision
What was the liability provision as of Dec 31 2021?


In [None]:
vanilla_response = vanilla_query_engine.query(
    "What was the liability provision as of Dec 31 2021?"
)
print(vanilla_response)

The liability provision as of December 31, 2021, included Employee Benefit Liabilities, Contributions received in advance (assessed contributions), and Deferred revenue.


In [None]:
llama_parse_response = llama_parse_query_engine.query(
    "What was the liability provision as of Dec 31 2021?"
)
print(llama_parse_response)

The liability provision as of December 31, 2021, was 1,910 CHF.
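To compare the two answers less subjectively, you can check each response for the figure in the source table: 1,910 in the *Provisions* row, *31 Dec 2021* column. The sketch below is a hypothetical, deliberately simple check (not a LlamaIndex evaluation API): it pulls the numbers out of a response and tests whether the expected value is among them. The response strings are copied from the outputs above; in the notebook you would pass `str(vanilla_response)` and `str(llama_parse_response)`.

```python
import re


def mentions_value(response: str, expected: int) -> bool:
    """True if `expected` appears as a number in `response` (commas ignored)."""
    # Match digit runs such as "1,910" or "2021", then drop thousands separators.
    numbers = {int(m.replace(",", "")) for m in re.findall(r"\d[\d,]*", response)}
    return expected in numbers


responses = {
    "SimpleDirectoryReader": (
        "The liability provision as of December 31, 2021, included Employee "
        "Benefit Liabilities, Contributions received in advance (assessed "
        "contributions), and Deferred revenue."
    ),
    "LlamaParse": "The liability provision as of December 31, 2021, was 1,910 CHF.",
}
for name, text in responses.items():
    verdict = "correct" if mentions_value(text, 1910) else "missing 1,910"
    print(f"{name}: {verdict}")
```

A substring check like this only catches the number, not whether it is attributed to the right line item, so it is a quick sanity check rather than a full evaluation.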
