# Using llama-parse with AstraDB

In this notebook, we walk through a basic RAG-style example that uses `llama-parse` to parse a PDF document, stores the parsed content in a vector store (Astra DB), and then runs a few basic queries against that store. The notebook is modeled after the quick start notebooks and is meant as a starting point for using `llama-parse` with a vector database.

### Requirements

In [1]:
# First, install the required dependencies
!pip install --quiet llama-index llama-parse llama-index-vector-stores-astra llama-index-llms-openai astrapy

### Configuration

In [2]:
import os
import openai

from getpass import getpass

# Get all required API keys and parameters
llama_cloud_api_key = getpass("Enter your LlamaCloud API Key: ")
api_endpoint = input("Enter your Astra DB API Endpoint: ")
token = getpass("Enter your Astra DB Token: ")
namespace = input("Enter your Astra DB namespace (optional, must exist on Astra): ") or None
openai_api_key = getpass("Enter your OpenAI API Key: ")

os.environ["LLAMA_CLOUD_API_KEY"] = llama_cloud_api_key
openai.api_key = openai_api_key
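
When the notebook is re-run often, typing every key by hand gets tedious. The following is a minimal sketch, not part of the original notebook, of a helper that reads a secret from an environment variable and only prompts when the variable is unset. The name `get_secret` is hypothetical; `LLAMA_CLOUD_API_KEY` is the variable already used above.

```python
import os
from getpass import getpass


def get_secret(env_var: str, prompt: str) -> str:
    """Return the value of env_var if it is set and non-empty;
    otherwise prompt for it without echoing the input."""
    value = os.environ.get(env_var)
    if value:
        return value
    return getpass(prompt)


# Possible usage (commented out so the sketch has no side effects):
# llama_cloud_api_key = get_secret(
#     "LLAMA_CLOUD_API_KEY", "Enter your Llama Index Cloud API Key: "
# )
```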

In [3]:
# llama-parse is async-first; running its sync API in a notebook requires nest_asyncio
import nest_asyncio

nest_asyncio.apply()

### Using llama-parse to parse a PDF

In [4]:
# Grab a PDF from arXiv for indexing
import requests

# The URL of the file you want to download
url = "https://arxiv.org/pdf/1706.03762.pdf"
# The local path where you want to save the file
file_path = "./attention.pdf"

# Perform the HTTP request (the timeout keeps the cell from hanging indefinitely)
response = requests.get(url, timeout=60)

# Check if the request was successful
if response.status_code == 200:
    # Open the file in binary write mode and save the content
    with open(file_path, "wb") as file:
        file.write(response.content)
    print("Download complete.")
else:
    print(f"Error downloading the file (HTTP {response.status_code}).")

Download complete.
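
A 200 status code only says the server answered; it does not guarantee the payload is a PDF (an HTML error page, for instance, would be saved just the same). As a rough sketch, one could check for the `%PDF-` marker that a well-formed PDF starts with before handing the file to the parser. The helper name `looks_like_pdf` is hypothetical.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF header marker."""
    return data.startswith(b"%PDF-")


# Possible usage with the response from the cell above:
# if not looks_like_pdf(response.content):
#     raise ValueError("Downloaded content does not look like a PDF.")
```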


In [5]:
from llama_parse import LlamaParse

documents = LlamaParse(result_type="text").load_data(file_path)

Started parsing the file under job_id ce3909a7-54cf-438b-849a-fe9a903b0c71


In [6]:
# Take a quick look at some of the parsed text from the document:
documents[0].get_content()[10000:11000]

'rmer - model architecture.\nThe Transformer follows this overall architecture using stacked self-attention and point-wise, fully\nconnected layers for both the encoder and decoder, shown in the left and right halves of Figure 1,\nrespectively.\n3.1   Encoder and Decoder Stacks\nEncoder:     The encoder is composed of a stack of N = 6 identical layers. Each layer has two\nsub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, position-\nwise fully connected feed-forward network. We employ a residual connection [11] around each of\nthe two sub-layers, followed by layer normalization [1]. That is, the output of each sub-layer is\nLayerNorm(x + Sublayer(x)), where Sublayer(x) is the function implemented by the sub-layer\nitself. To facilitate these residual connections, all sub-layers in the model, as well as the embedding\nlayers, produce outputs of dimension dmodel = 512.\nDecoder:    The decoder is also composed of a stack of N = 6 identical layers.
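
The slice above relies on a hand-picked character offset. As an alternative sketch (the helper `preview_around` is hypothetical and not part of `llama-parse`), the parsed text can be previewed around the first occurrence of a keyword instead:

```python
def preview_around(text: str, keyword: str, width: int = 200) -> str:
    """Return the first match of keyword with up to `width` characters of
    context on each side, or an empty string if the keyword is absent."""
    pos = text.find(keyword)
    if pos == -1:
        return ""
    start = max(0, pos - width)
    end = min(len(text), pos + len(keyword) + width)
    return text[start:end]


# Possible usage with the parsed document:
# print(preview_around(documents[0].get_content(), "Encoder and Decoder Stacks"))
```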

### Storing into Astra DB

In [7]:
from llama_index.vector_stores.astra import AstraDBVectorStore

astra_db_store = AstraDBVectorStore(
    token=token,
    api_endpoint=api_endpoint,
    namespace=namespace,
    collection_name="astra_v_table_llamaparse",
    embedding_dimension=1536,  # must match the output size of the embedding model
)
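
The `embedding_dimension` passed to the store has to agree with the vectors the embedding model produces; 1536 matches OpenAI's `text-embedding-ada-002`. The sketch below is an illustration only: the lookup table and the `check_dimension` helper are assumptions of this example, not part of `llama-index` or `astrapy`, and the table lists the default output sizes of three OpenAI models.

```python
# Default output dimensions of some OpenAI embedding models.
KNOWN_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_dimension(model_name: str, collection_dimension: int) -> None:
    """Raise ValueError if the collection dimension does not match the
    known default output size of the given embedding model."""
    expected = KNOWN_DIMENSIONS.get(model_name)
    if expected is None:
        raise ValueError(f"Unknown embedding model: {model_name}")
    if expected != collection_dimension:
        raise ValueError(
            f"{model_name} produces {expected}-dimensional vectors, "
            f"but the collection expects {collection_dimension}."
        )


check_dimension("text-embedding-ada-002", 1536)  # passes silently
```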

In [8]:
from llama_index.core.node_parser import SimpleNodeParser

node_parser = SimpleNodeParser()

nodes = node_parser.get_nodes_from_documents(documents)
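
The node parser splits each document into smaller chunks ("nodes") that are embedded and stored individually. To give a feel for the idea, here is a deliberately simplified sketch of fixed-size chunking with overlap. It splits on whitespace and is not the algorithm `SimpleNodeParser` implements; the function name and default sizes are assumptions of this example.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most chunk_size words, where
    consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


print(chunk_words("a b c d e f g", chunk_size=3, overlap=1))
# prints ['a b c', 'c d e', 'e f g']
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.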

In [9]:
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex, StorageContext

storage_context = StorageContext.from_defaults(vector_store=astra_db_store)

index = VectorStoreIndex(
    nodes=nodes,
    storage_context=storage_context,
    embed_model=OpenAIEmbedding(api_key=openai_api_key),
)

### Simple RAG Example

In [10]:
query_engine = index.as_query_engine(similarity_top_k=15)
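
`similarity_top_k=15` asks the retriever for the 15 stored nodes most similar to the query embedding. The actual search runs inside Astra DB; the pure-Python sketch below only illustrates the concept using cosine similarity, and the function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: list[list[float]], k: int) -> list[tuple[int, float]]:
    """Return (index, score) pairs for the k vectors most similar to the query,
    ordered from most to least similar."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print([i for i, _ in top_k([1.0, 0.0], vectors, k=2)])
# prints [0, 2]
```

A larger `k` gives the LLM more context to draw on but also more loosely related text, as the source node inspected later in this notebook suggests.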

In [11]:
query = "What is Multi-Head Attention also known as?"

response_1 = query_engine.query(query)
print("\n***********New LlamaParse+ Basic Query Engine***********")
print(response_1)


***********New LlamaParse+ Basic Query Engine***********
Multi-Head Attention is also known as multi-headed self-attention.


In [12]:
# Take a look at one of the source nodes from the response
response_1.source_nodes[0].get_content()

'We used beam search as described in the previous section, but no\ncheckpoint averaging. We present these results in Table 3.\nIn Table 3 rows (A), we vary the number of attention heads and the attention key and value dimensions,\nkeeping the amount of computation constant, as described in Section 3.2.2. While single-head\nattention is 0.9 BLEU worse than the best setting, quality also drops off with too many heads.\nIn Table 3 rows (B), we observe that reducing the attention key size dk hurts model quality. This\nsuggests that determining compatibility is not easy and that a more sophisticated compatibility\nfunction than dot product may be beneficial. We further observe in rows (C) and (D) that, as expected,\nbigger models are better, and dropout is very helpful in avoiding over-fitting. In row (E) we replace our\nsinusoidal positional encoding with learned positional embeddings [9], and observe nearly identical\nresults to the base model.\n6.3    English Constituency Parsing\nTo eva
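
Looking at a single source node says little about how relevant the retrieved set is overall. As a sketch, the helper below lists every source node with its similarity score and a short snippet. It assumes each item exposes a `score` attribute and a `get_content()` method, as the objects in `response_1.source_nodes` do; the helper name `summarize_sources` is hypothetical.

```python
def summarize_sources(source_nodes, width: int = 80) -> list[str]:
    """Return one line per source node: rank, score, and a whitespace-
    normalized snippet of at most `width` characters."""
    lines = []
    for rank, node in enumerate(source_nodes, start=1):
        snippet = " ".join(node.get_content().split())[:width]
        score_text = "n/a" if node.score is None else f"{node.score:.3f}"
        lines.append(f"{rank}. score={score_text} {snippet}")
    return lines


# Possible usage with the response from the query above:
# for line in summarize_sources(response_1.source_nodes):
#     print(line)
```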