# Testing:
This notebook demos LlamaParse's JSON mode, which returns both page text and embedded images so that the images can be described by a multimodal model and indexed alongside the text.

> Using JSON mode gives you back a list of JSON dictionaries that contain both text and images. You can then download these images and use a multimodal model to extract information and index them.

> https://github.com/run-llama/llama_parse/blob/main/examples/demo_json.ipynb

## Document Corpus:
This notebook runs `llama-parse` in JSON mode on a single local PDF: "Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models", which can be fetched with `wget` from https://arxiv.org/pdf/2307.00619.pdf. For this demo notebook, I've elected to load it from within the notebook's runtime for testing.

# API Key Management:
API keys are read from this notebook's Secrets feature where applicable (found in the left panel of Colab). Previous iterations of this notebook used the `getpass` module.

```
from google.colab import userdata
userdata.get('secretName')
```
Alternatively, with `getpass`:

```
import os
import getpass
#API Access to llama-cloud
os.environ["LLAMA_CLOUD_API_KEY"] = getpass.getpass("Enter LlamaParse API Key:")

# Using OpenAI API for embeddings/LLMs
os.environ['OPENAI_API_KEY'] = getpass.getpass("Enter OpenAI API Key:")
```
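
The two approaches above can be combined into one lookup. The following is a minimal sketch, not part of the original notebook: `resolve_api_key` is a hypothetical helper name, and the lookup order (environment variable, then Colab secret, then interactive prompt) is an assumption about what is convenient rather than a recommendation from LlamaParse or Colab.

```python
import getpass
import os


def resolve_api_key(name: str, prompt: str) -> str:
    """Return the key called `name`: environment first, then Colab secrets, then a prompt."""
    # 1. An already-exported environment variable wins.
    value = os.environ.get(name)
    if value:
        return value
    # 2. Colab secrets, if this happens to be running inside Colab.
    try:
        from google.colab import userdata  # only importable inside Colab
        value = userdata.get(name)
    except Exception:  # not in Colab, or the secret is missing / not shared
        value = None
    # 3. Fall back to an interactive prompt.
    if not value:
        value = getpass.getpass(prompt)
    # Export the key so libraries that read os.environ can find it.
    os.environ[name] = value
    return value
```

Calling `resolve_api_key("LLAMA_CLOUD_API_KEY", "Enter LlamaParse API Key:")` once per key would then work both in Colab and in a local runtime.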

## Setup

I'm going to use OpenAI for the LLM and a BAAI embedding model from the HuggingFace Hub - _no Qdrant in this demo for a persistent vector database._

In [2]:
!pip install llama-index llama-index-core
!pip install llama-index-embeddings-huggingface
!pip install llama-parse


In [5]:
import nest_asyncio
nest_asyncio.apply()

import os

In [6]:
# Both keys must already be set in the environment; fail early if one is missing.
for key in ("LLAMA_CLOUD_API_KEY", "OPENAI_API_KEY"):
    assert os.getenv(key), f"{key} is not set"


In [3]:
# LLM:
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-3.5-turbo")

# Core:
from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = "local:BAAI/bge-small-en-v1.5"



## Load Data

As mentioned above, I'm going to use "Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models":
> - https://drive.google.com/drive/folders/1viTmkHmHupA2qX6ePJBtdc5BUOqjdz7S
> - https://arxiv.org/pdf/2307.00619.pdf

## Using LlamaParse in JSON Mode for PDF Reading

> Following along with this LlamaParse demo [notebook](https://colab.research.google.com/github/run-llama/llama_parse/blob/main/examples/demo_json.ipynb), but also adding parsing instructions for our specific use.

In [12]:
from llama_parse import LlamaParse

parsing_instruction = """The provided document includes both text and math equations.
Output any math equation in LATEX markdown, using $$ at the start and end of the LATEX.
Any table in the document needs to preserve markdown structure."""

parser = LlamaParse(verbose=True,
                    parsing_instruction=parsing_instruction)
json_objs = parser.get_json_result("../corpus/solving-linear-inverse-probs.pdf")
json_list = json_objs[0]["pages"]

Started parsing the file under job_id 3bd197ee-0a14-4bd4-ad71-ffafcc19e9d3
.....................

In [16]:
# Only one file was parsed, so json_objs holds a single entry and index 1 is out of range.
json_objs[1]

IndexError: list index out of range
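
To make the shape of the result concrete, here is a small sketch that runs on mock data rather than the real parser output. The `"pages"`, `"page"` and `"text"` keys are the ones this notebook reads; the `"images"` key and the helper name `iter_pages` are assumptions for illustration, and the mock values are abbreviated.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical stand-in for the list returned by parser.get_json_result(...).
mock_json_objs: List[Dict] = [
    {
        "pages": [
            {"page": 1, "text": "Solving Linear Inverse Problems ...", "images": []},
            {
                "page": 2,
                "text": "...",
                # "images" is an assumed key name, modeled on the image output shown later.
                "images": [
                    {"name": "page-2-0.jpg", "height": 119, "width": 119, "x": 125, "y": 72}
                ],
            },
        ]
    }
]


def iter_pages(json_objs: List[Dict]) -> Iterator[Tuple[int, Dict]]:
    """Yield (document index, page dict) for every page of every parsed file."""
    for doc_idx, doc in enumerate(json_objs):
        for page in doc.get("pages", []):
            yield doc_idx, page


pages = list(iter_pages(mock_json_objs))
print(len(mock_json_objs))  # 1: a single parsed file, so there is no index 1
print(len(pages))           # 2: both mock pages belong to document 0
```

Iterating this way avoids hard-coding `json_objs[0]` if more than one file is parsed later.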

### Create the "get_text_nodes" function
Loop over the pages in `json_list` and build one `TextNode` per page, keeping the page number as metadata.

In [None]:
from llama_index.core.schema import TextNode
from typing import List


def get_text_nodes(json_list: List[dict]) -> List[TextNode]:
    """Build one TextNode per parsed page, keeping the page number as metadata."""
    text_nodes = []
    for page in json_list:
        text_node = TextNode(
            text=page["text"],
            metadata={
                "page": page["page"]
            }
        )
        text_nodes.append(text_node)
    return text_nodes

In [None]:
text_nodes = get_text_nodes(json_list)

In [None]:
print(text_nodes)

[TextNode(id_='ffd79866-215f-429e-8026-72fc9fe0c4a4', embedding=None, metadata={'page': 1}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='        Solving Linear Inverse Problems Provably via\n     Posterior Sampling with Latent Diffusion Models\n                               Litu Rout     Negin Raoof       Giannis Daras\n                 Constantine Caramanis        Alexandros G. Dimakis         Sanjay Shakkottai\n                                    The University of Texas at Austin∗\n                                                 Abstract\n          We present the first framework to solve linear inverse problems leveraging pre-\n          trained latent diffusion models. Previously proposed algorithms (such as DPS and\n          DDRM) only apply to pixel-space diffusion models. We theoretically analyze our\n          algorithm showing provable sample recovery in a linear model setting. The algo-\n          rithmic insight obtained from our 

## Extract/Index images from Image Dict

Here we use a multimodal model to describe each image from the image dictionaries, then wrap the descriptions in text nodes so they can be indexed.

In [None]:
# Call get_images on the parser, then describe each image with a multimodal model
!mkdir -p llama2_images

from llama_index.core.schema import ImageDocument
from llama_index.multi_modal_llms.openai import OpenAIMultiModal


def get_image_text_nodes(json_objs: List[dict]):
    """Extract out text from images using a multimodal model."""
    openai_mm_llm = OpenAIMultiModal(max_tokens=500)
    # Download every image referenced in the JSON result (uses the global `parser`)
    image_dicts = parser.get_images(json_objs, download_path="llama2_images")
    img_text_nodes = []
    for image_dict in image_dicts:
        image_doc = ImageDocument(image_path=image_dict["path"])
        response = openai_mm_llm.complete(
            prompt="Describe the images as an alternative, informative text",
            image_documents=[image_doc],
        )
        # Keep the image path so a description can be traced back to its file
        text_node = TextNode(
            text=str(response),
            metadata={"path": image_dict["path"]}
        )
        img_text_nodes.append(text_node)
    return img_text_nodes

In [None]:
image_text_nodes = get_image_text_nodes(json_objs)

> Image for page 1: []
> Image for page 2: [{'name': 'page-2-4.jpg', 'height': 119, 'width': 119, 'x': 246, 'y': 191}, {'name': 'page-2-0.jpg', 'height': 119, 'width': 119, 'x': 125, 'y': 72}, {'name': 'page-2-1.jpg', 'height': 119, 'width': 119, 'x': 246, 'y': 72}, {'name': 'page-2-8.jpg', 'height': 119, 'width': 119, 'x': 367, 'y': 311}, {'name': 'page-2-6.jpg', 'height': 119, 'width': 119, 'x': 125, 'y': 311}, {'name': 'page-2-3.jpg', 'height': 119, 'width': 119, 'x': 125, 'y': 191}, {'name': 'page-2-2.jpg', 'height': 119, 'width': 119, 'x': 367, 'y': 72}, {'name': 'page-2-9.jpg', 'height': 119, 'width': 119, 'x': 125, 'y': 431}, {'name': 'page-2-7.jpg', 'height': 119, 'width': 119, 'x': 246, 'y': 311}, {'name': 'page-2-5.jpg', 'height': 119, 'width': 119, 'x': 367, 'y': 191}, {'name': 'page-2-11.jpg', 'height': 119, 'width': 119, 'x': 367, 'y': 431}, {'name': 'page-2-10.jpg', 'height': 119, 'width': 119, 'x': 246, 'y': 431}]
> Image for page 3: []
> Image for page 4: []
> Image for
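
Each image costs one multimodal call, so it can be worth filtering the image dictionaries first. The sketch below is an illustration only: it assumes the dictionaries carry the `name`, `height` and `width` keys seen in the output above, and the helper names, the `min_side` threshold and the third sample entry are hypothetical.

```python
import re
from typing import Dict, List, Optional


def page_of(image_name: str) -> Optional[int]:
    """Parse the page number out of names shaped like 'page-2-4.jpg'."""
    match = re.match(r"page-(\d+)-\d+\.", image_name)
    return int(match.group(1)) if match else None


def keep_large_images(image_dicts: List[Dict], min_side: int = 100) -> List[Dict]:
    """Keep only images whose shorter side is at least `min_side` pixels."""
    return [d for d in image_dicts if min(d["height"], d["width"]) >= min_side]


sample_images = [
    {"name": "page-2-4.jpg", "height": 119, "width": 119, "x": 246, "y": 191},
    {"name": "page-2-0.jpg", "height": 119, "width": 119, "x": 125, "y": 72},
    # Hypothetical tiny image (e.g. an icon) that is probably not worth describing.
    {"name": "page-9-1.jpg", "height": 12, "width": 40, "x": 0, "y": 0},
]

kept = keep_large_images(sample_images)
print([(d["name"], page_of(d["name"])) for d in kept])
```

The page number recovered by `page_of` could also be stored in each node's metadata next to the image path, so that image descriptions and page text can be matched up later.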

In [None]:
# Example of an image description
image_text_nodes[0].get_content()

"The image appears to be a digitally altered or photoshopped picture where a person is dressed in a panda costume and is also wearing a superhero costume, specifically resembling Spider-Man's iconic red and blue suit. The individual is squatting on the ground, with one arm extended forward as if mimicking Spider-Man's web-slinging action. The image is whimsical and humorous, combining elements of wildlife and popular culture. There is a blue vertical stripe obscuring part of the image on the left side."

In [None]:
# Example of an embedded graph description
image_text_nodes[36].get_content()

'The image is a line graph with a white background and a grid. The x-axis is labeled "Percentage of dropped pixels" and ranges from 20 to 80, with increments of 20. The y-axis is labeled "SSIM" and ranges from 0.75 to 0.90, with increments of 0.05. There are two lines representing different data sets plotted on the graph:\n\n1. A blue dashed line with diamond-shaped markers represents "DPS." This line starts at approximately 0.88 SSIM at 20% dropped pixels and decreases steadily to about 0.78 SSIM at 80% dropped pixels.\n\n2. An orange dashed line with square markers represents "PSLD." This line also starts at around 0.88 SSIM at 20% dropped pixels but decreases at a slower rate than the DPS line, ending at about 0.82 SSIM at 80% dropped pixels.\n\nThe graph is used to show the relationship between the percentage of dropped pixels and the Structural Similarity Index (SSIM), a measure of the similarity between two images. The two lines suggest that as more pixels are dropped, the SSIM d

In [None]:
# Another example of an image description
image_text_nodes[100].get_content()

'The image is a close-up portrait of a woman with a neutral expression. She has medium-length brown hair, fair skin, and her eyes are looking directly at the camera. The background is a plain, muted green color, providing a contrast that highlights her features. The woman appears to be wearing minimal makeup with a natural look.'

# Building the Index across the Image and Text nodes
Building this here to demo and test queries over this parsed data.

**In production, we'd need to put the content into our vectorDB (Qdrant, VoyageAI).**

> *The questions below are meant to require answers drawn from the 3 nodes referenced above.*

In [None]:
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex(text_nodes + image_text_nodes)

In [None]:
query_engine = index.as_query_engine()
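
For intuition about why text nodes and image-description nodes can share one index, here is a toy sketch of similarity retrieval. It is not how `VectorStoreIndex` is implemented: it swaps the dense embedding model configured in `Settings` for simple word counts, and `embed`, `cosine`, `top_k` and the abbreviated node texts are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts instead of a dense vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, nodes: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Score every node against the query and return the k best (score, text) pairs."""
    q = embed(query)
    scored = [(cosine(q, embed(node)), node) for node in nodes]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# One page-text node and two image-description nodes, abbreviated from the outputs above.
toy_nodes = [
    "We present the first framework to solve linear inverse problems leveraging pre-trained latent diffusion models.",
    "The image is a line graph. The x-axis is labeled Percentage of dropped pixels and the y-axis is labeled SSIM.",
    "The image is a close-up portrait of a woman with a neutral expression.",
]

best = top_k("What do the graphs titled 'Percentage of dropped pixels' show?", toy_nodes, k=1)
print(best[0][1])
```

Because an image description is ordinary text once the multimodal model has written it, it competes with page text on equal terms at retrieval time.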

In [None]:
# ask question over image!
response = query_engine.query("What do the graphs titled 'Percentage of dropped pixels' show?")
print(str(response))

The graphs titled 'Percentage of dropped pixels' show the relationship between the percentage of dropped pixels and specific metrics (IPPS for one graph and SSIM for the other graph). Both graphs illustrate how as the percentage of dropped pixels increases, there is an impact on the corresponding metric being measured (IPPS or SSIM). In both cases, there is a clear trend where as more pixels are dropped, the metric value decreases, indicating a negative correlation between the percentage of dropped pixels and the metric being measured.


In [None]:
# ask question over text!
response = query_engine.query("How would you summarize 3 key findings from this research for a non-technical reader?")
print(str(response))

The research findings indicate that as the percentage of dropped pixels increases, the similarity between images, as measured by SSIM, decreases. The data sets DPS and PSLD both show a decline in SSIM as more pixels are dropped, with DPS experiencing a more significant decrease compared to PSLD. This suggests that the method used to drop pixels impacts the similarity between images.
