# Using llama-parse with the Elasticsearch vector database


This notebook walks through a basic RAG-style example: it uses `llama-parse` to parse a PDF document, stores the parsed content in the [Elasticsearch vector database](https://www.elastic.co/elasticsearch/vector-database), and runs a few basic queries against it. It is modeled after the quick start notebooks and is meant as a starting point for using `llama-parse` with a vector database.

### Requirements

- Get a Llama Cloud API key from https://cloud.llamaindex.ai
- Get an OpenAI API key from https://openai.com
- Get an Elasticsearch API key and Cloud ID from [Elastic Cloud](https://cloud.elastic.co/serverless-registration?onboarding_token=vectorsearch) with a free trial.

In [None]:
# First, install the required dependencies
%pip install --quiet llama-index llama-parse llama-index-vector-stores-elasticsearch llama-index-llms-openai

### Configuration

In [None]:
import os
import openai

from getpass import getpass

# Get all required API keys and parameters
llama_cloud_api_key = getpass("Enter your Llama Index Cloud API Key: ")
es_cloud_id = input("Enter your Elastic Cloud ID: ")
es_api_key = getpass("Enter your Elasticsearch API Key: ")
openai_api_key = getpass("Enter your OpenAI API Key: ")

os.environ["LLAMA_CLOUD_API_KEY"] = llama_cloud_api_key
openai.api_key = openai_api_key

Enter your Llama Index Cloud API Key: ··········
Enter your Elastic Cloud ID: elasticsearch-serverless:dXMtZWFzdC0xLmF3cy5lbGFzdGljLmNsb3VkJGE5YmM0NmVmZWQ2NzQ4NDNhOTkzOTE3NDljMTYyNTQ3LmVzJGE5YmM0NmVmZWQ2NzQ4NDNhOTkzOTE3NDljMTYyNTQ3Lmti
Enter your Elasticsearch API Key: ··········
Enter your OpenAI API Key: ··········
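
If you rerun the notebook often, typing every key each time gets tedious. The sketch below is one possible alternative, not part of the original notebook: a hypothetical helper `get_secret` that first looks for an environment variable and only falls back to a hidden prompt when the variable is missing. `LLAMA_CLOUD_API_KEY` is the variable name used above; any other variable name you pass in is your own choice.

```python
import os
from getpass import getpass


def get_secret(env_var: str, prompt: str) -> str:
    """Return the value of env_var if it is set and non-empty.

    Otherwise ask for the value interactively without echoing it.
    `get_secret` is a hypothetical helper, not part of llama-parse or llama-index.
    """
    value = os.environ.get(env_var)
    if value:
        return value
    return getpass(prompt)
```

For example, `llama_cloud_api_key = get_secret("LLAMA_CLOUD_API_KEY", "Enter your Llama Cloud API Key: ")` would skip the prompt whenever the variable is already exported in the environment that started the notebook.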


In [None]:
# llama-parse is async-first; calling its sync API from a notebook (which already runs an event loop) requires nest_asyncio
import nest_asyncio

nest_asyncio.apply()
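
To see why this patch is needed, here is a minimal standard-library sketch of the underlying problem. It does not use `llama-parse` itself: `parse_job`, `sync_wrapper`, and `notebook_cell` are hypothetical stand-ins for an async job, a sync wrapper that calls `asyncio.run` internally, and a notebook cell that already runs inside an event loop. Run it as a plain Python script rather than inside a notebook, since the last line starts its own event loop.

```python
import asyncio


async def parse_job() -> str:
    # Stand-in for an async parsing call.
    return "parsed"


def sync_wrapper() -> str:
    # Mimics a sync API that drives async code with asyncio.run().
    coro = parse_job()
    try:
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"


async def notebook_cell() -> str:
    # Mimics a notebook cell: an event loop is already running here.
    return sync_wrapper()


outside_loop = sync_wrapper()                 # no loop running: works
inside_loop = asyncio.run(notebook_cell())    # loop running: asyncio.run() refuses
print(outside_loop)
print(inside_loop)
```

The first call succeeds, while the second reports a `RuntimeError` because `asyncio.run()` cannot be nested inside a running loop. `nest_asyncio.apply()` patches asyncio so that this kind of nested call is allowed.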

### Using llama-parse to parse a PDF

In [None]:
# Grab a PDF from arXiv for indexing
import requests

# The URL of the file you want to download
url = "https://arxiv.org/pdf/1706.03762.pdf"
# The local path where you want to save the file
file_path = "./attention.pdf"

# Perform the HTTP request (the timeout keeps the cell from hanging indefinitely)
response = requests.get(url, timeout=30)

# Check if the request was successful
if response.status_code == 200:
    # Open the file in binary write mode and save the content
    with open(file_path, "wb") as file:
        file.write(response.content)
    print("Download complete.")
else:
    print(f"Error downloading the file (HTTP status {response.status_code}).")

Download complete.
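
A `200` response does not guarantee that the saved file is really a PDF (a server could return an HTML page instead). As an optional sanity check before parsing, the sketch below tests whether the file begins with the `%PDF-` header that PDF files normally start with. `looks_like_pdf` is a hypothetical helper, and the check is deliberately loose: it does not validate the rest of the file.

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and starts with the bytes b'%PDF-'.

    Hypothetical helper for illustration; it only inspects the first 5 bytes.
    """
    file = Path(path)
    if not file.is_file():
        return False
    with file.open("rb") as fh:
        return fh.read(5) == b"%PDF-"
```

With the download above, `looks_like_pdf(file_path)` should return `True` before you hand the file to the parser.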


In [None]:
from llama_parse import LlamaParse

documents = LlamaParse(result_type="text").load_data(file_path)

Started parsing the file under job_id cac11eca-bd8d-4ca9-8889-2f3904363116


In [None]:
# Take a quick look at some of the parsed text from the document:
documents[0].get_content()[10000:11000]

'     Forward\n                                                                       Add & Norm\n                                             Add & Norm                  Masked\n                                              Multi-Head                Multi-Head\n                                               Attention                 Attention\n                                                 Input                   Output\n                                              Embedding                Embedding\n                                 Figure 1: The Transformer - model architecture.\n\n\nThe Transformer follows this overall architecture using stacked self-attention and point-wise, fully\nconnected layers for both the encoder and decoder, shown in the left and right halves of Figure 1,\nrespectively.\n\n\n3.1    Encoder and Decoder Stacks\n\n\nEncoder:        The encoder is composed of a stack of N = 6 identical layers. Each layer has two\nsub-layers. The first is a multi-head self-att
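
As the output shows, the text result keeps much of the page layout, so raw slices contain long runs of spaces. For quick inspection only, a small helper like the one below can collapse that whitespace. `compact_preview` is a hypothetical name, and this is just a display aid: it does not change what gets indexed later.

```python
import re


def compact_preview(text: str, start: int = 0, length: int = 1000) -> str:
    """Return text[start:start + length] with layout whitespace collapsed."""
    window = text[start:start + length]
    window = re.sub(r"[ \t]+", " ", window)      # runs of spaces/tabs -> one space
    window = re.sub(r" *\n *", "\n", window)     # drop spaces around line breaks
    window = re.sub(r"\n{3,}", "\n\n", window)   # at most one blank line in a row
    return window.strip()
```

For instance, `print(compact_preview(documents[0].get_content(), 10000, 1000))` prints the same window as above in a more readable form.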

### Storing into Elasticsearch Vector Database

In [None]:
from llama_index.vector_stores.elasticsearch import ElasticsearchStore

es_store = ElasticsearchStore(
    index_name="llama-parse-docs",
    es_cloud_id=es_cloud_id,  # found within the deployment page
    es_api_key=es_api_key,  # create an API key within Kibana (Security -> API Keys)
)

In [None]:
from llama_index.core.node_parser import SimpleNodeParser

node_parser = SimpleNodeParser()

nodes = node_parser.get_nodes_from_documents(documents)
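
The node parser splits each document into smaller chunks (nodes) so that each one can be embedded and retrieved separately. To illustrate the general idea of overlapping chunks, here is a deliberately simplified, character-based sketch. It is not the algorithm `SimpleNodeParser` uses, and `chunk_text` with its `chunk_size` and `overlap` parameters is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


demo_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(demo_chunks)
```

The overlap means the end of one chunk is repeated at the start of the next, which helps keep sentences that straddle a boundary retrievable from at least one chunk.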

In [None]:
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex, StorageContext

storage_context = StorageContext.from_defaults(vector_store=es_store)

index = VectorStoreIndex(
    nodes=nodes,
    storage_context=storage_context,
    embed_model=OpenAIEmbedding(api_key=openai_api_key),
)

### Simple RAG Example

In [None]:
query_engine = index.as_query_engine(similarity_top_k=15)
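
`similarity_top_k=15` asks the retriever for the 15 nodes whose embeddings are most similar to the query embedding. The toy sketch below shows that idea with brute-force cosine similarity over two-dimensional vectors. It is only an illustration: the vectors, node names, and the `top_k` helper are made up, and Elasticsearch's actual search implementation and similarity settings are not shown here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], node_vecs: dict[str, list[float]], k: int):
    """Return the k (node_id, score) pairs with the highest similarity."""
    scored = [(node_id, cosine_similarity(query_vec, vec))
              for node_id, vec in node_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up embeddings for three hypothetical nodes.
toy_nodes = {
    "intro": [1.0, 0.0],
    "attention": [0.6, 0.8],
    "training": [0.0, 1.0],
}
ranked = top_k([0.0, 1.0], toy_nodes, k=2)
print(ranked)
```

A larger `k` gives the LLM more context to answer from, at the cost of a longer prompt and possibly more loosely related passages.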

In [None]:
query = "What is Multi-Head Attention also known as?"

response_1 = query_engine.query(query)
print("\n***********New LlamaParse+ Basic Query Engine***********")
print(response_1)


***********New LlamaParse+ Basic Query Engine***********
Additive attention.


In [None]:
# Take a look at one of the source nodes from the response
response_1.source_nodes[0].get_content()

'The two most commonly used attention functions are additive attention [ 2], and dot-product (multi-\nplicative) attention. Dot-product attention is identical to our algorithm, except for the scaling factor\n    √1dk. Additive attention computes the compatibility function using a feed-forward network with\nof\na single hidden layer. While the two are similar in theoretical complexity, dot-product attention is\nmuch faster and more space-efficient in practice, since it can be implemented using highly optimized\nmatrix multiplication code.\nWhile for small values of dk the two mechanisms perform similarly, additive attention outperforms\ndot product attention without scaling for larger values of dk [3 ]. We suspect that for large values of\ndk, the dot products grow large in magnitude, pushing the softmax function into regions where it has\nextremely small gradients4. To counteract this effect, we scale the dot products by                            √1dk.\n\n\n\n\n\n\n3.2.2     Multi-Hea