# LlamaParse With MongoDB

<a href="https://colab.research.google.com/github/run-llama/llama_parse/blob/main/examples/demo_mongodb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook is a quick-start example of using LlamaParse with MongoDB Atlas Vector Search.

We parse a PDF document with llama-parse, index the parsed text in a MongoDB Atlas vector store, and then run a basic query against that index.

### Installation

In [None]:
!pip install llama-index llama-parse llama-index-vector-stores-mongodb llama-index-llms-openai

### Set Up API Keys

In [None]:
import os

os.environ["LLAMA_CLOUD_API_KEY"] = ""  # Get it from https://cloud.llamaindex.ai/api-key
os.environ["OPENAI_API_KEY"] = ""  # Get it from https://platform.openai.com/api-keys
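
An empty or missing key usually fails only later, when the first API call is made. The sketch below shows one way to surface the problem early. `require_env` is a hypothetical helper written for this illustration, not part of llama-parse or llama-index, and `EXAMPLE_API_KEY` is a placeholder variable so the snippet runs anywhere.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`; raise if unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Placeholder variable, used only to demonstrate the helper.
os.environ["EXAMPLE_API_KEY"] = "placeholder-value"
print(require_env("EXAMPLE_API_KEY"))
```

The same check could be applied to `LLAMA_CLOUD_API_KEY`, `OPENAI_API_KEY`, and the `MONGO_URI` variable that is read later in the notebook.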

In [None]:
# llama-parse is async-first; running its sync API inside a notebook requires nest_asyncio
import nest_asyncio

nest_asyncio.apply()

import requests
import pymongo

from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch
from llama_parse import LlamaParse
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.core.node_parser import SimpleNodeParser
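
To see why `nest_asyncio.apply()` is needed, recall that a Jupyter kernel already runs an asyncio event loop, and standard asyncio refuses to start a second loop inside a running one. The sketch below reproduces that situation with the standard library only. `parse_stub` and `simulated_notebook_cell` are hypothetical stand-ins, not llama-parse functions. Run it as a plain Python script; inside a notebook the outer `asyncio.run` call would itself hit the same error.

```python
import asyncio


async def parse_stub() -> str:
    """Stand-in for an async parsing call (hypothetical)."""
    return "parsed"


async def simulated_notebook_cell() -> str:
    """Runs inside an event loop, the way code in a Jupyter cell does."""
    coro = parse_stub()
    try:
        # A sync wrapper around async code may do something like this.
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


message = asyncio.run(simulated_notebook_cell())
print(message)
```

`nest_asyncio` patches asyncio so that such nested calls are allowed, which is why it is applied before any llama-parse call in this notebook.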

### Download Document

We will use the paper "Attention Is All You Need".

In [None]:
# The URL of the file you want to download
url = "https://arxiv.org/pdf/1706.03762.pdf"
# The local path where you want to save the file
file_path = "./attention.pdf"

# Perform the HTTP request (the timeout keeps the cell from hanging indefinitely)
response = requests.get(url, timeout=30)

# Check if the request was successful
if response.status_code == 200:
    # Open the file in binary write mode and save the content
    with open(file_path, "wb") as file:
        file.write(response.content)
    print("Download complete.")
else:
    print(f"Error downloading the file (HTTP {response.status_code}).")

Download complete.
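
A 200 status code does not guarantee that the saved file is actually a PDF (a server can return an HTML error page with status 200). As a minimal sanity check, the sketch below tests for the `%PDF-` header that PDF files start with. `looks_like_pdf` is a hypothetical helper, and the temporary files are only there so the snippet runs without a network connection.

```python
import os
import tempfile
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file at `path` starts with the PDF header bytes."""
    file = Path(path)
    if not file.is_file():
        return False
    with file.open("rb") as handle:
        return handle.read(5) == b"%PDF-"


# Demonstrate with two small temporary files instead of a real download.
with tempfile.TemporaryDirectory() as tmp_dir:
    good = os.path.join(tmp_dir, "good.pdf")
    bad = os.path.join(tmp_dir, "bad.pdf")
    Path(good).write_bytes(b"%PDF-1.5\n...")
    Path(bad).write_bytes(b"<html>error page</html>")
    good_result = looks_like_pdf(good)
    bad_result = looks_like_pdf(bad)

print(good_result, bad_result)
```

In the notebook, the equivalent call would be `looks_like_pdf(file_path)` right after the download cell.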


### Parse the Document Using `LlamaParse`

In [None]:
documents = LlamaParse(result_type="text").load_data(file_path)

Started parsing the file under job_id 09a49745-9f21-4190-9de8-27e4e1a4bdf5


In [None]:
# Take a quick look at some of the parsed text from the document:
print(documents[0].get_content()[10000:11000])

rmer - model architecture.
The Transformer follows this overall architecture using stacked self-attention and point-wise, fully
connected layers for both the encoder and decoder, shown in the left and right halves of Figure 1,
respectively.
3.1   Encoder and Decoder Stacks
Encoder:     The encoder is composed of a stack of N = 6 identical layers. Each layer has two
sub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, position-
wise fully connected feed-forward network. We employ a residual connection [11] around each of
the two sub-layers, followed by layer normalization [1]. That is, the output of each sub-layer is
LayerNorm(x + Sublayer(x)), where Sublayer(x) is the function implemented by the sub-layer
itself. To facilitate these residual connections, all sub-layers in the model, as well as the embedding
layers, produce outputs of dimension dmodel = 512.
Decoder:    The decoder is also composed of a stack of N = 6 identical layers. In addition 

### Create `MongoDBAtlasVectorSearch`

In [None]:
# MONGO_URI must hold your MongoDB Atlas connection string; set it before running this cell
mongo_uri = os.environ["MONGO_URI"]

mongodb_client = pymongo.MongoClient(mongo_uri)
mongodb_vector_store = MongoDBAtlasVectorSearch(mongodb_client)
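
Queries against this store only return results once the target collection has an Atlas Vector Search index. The sketch below builds a plausible index definition as a Python dict. It rests on several assumptions that you should verify against your installed versions: that vectors are stored in a field named `embedding`, that the embedding model produces 1536-dimensional vectors (as OpenAI's `text-embedding-ada-002` does), and that cosine similarity is wanted. `vector_index_definition` is a hypothetical helper, not a llama-index or pymongo function.

```python
import json


def vector_index_definition(
    path: str = "embedding",
    dimensions: int = 1536,
    similarity: str = "cosine",
) -> dict:
    """Build an Atlas Vector Search index definition (assumed field name and size)."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": dimensions,
                "similarity": similarity,
            }
        ]
    }


definition = vector_index_definition()
print(json.dumps(definition, indent=2))
```

The printed JSON can be pasted into the Atlas UI when creating a vector search index. The index name you choose there has to match the name the vector store is configured to query.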

### Create Nodes

In [None]:
node_parser = SimpleNodeParser()

nodes = node_parser.get_nodes_from_documents(documents)
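
Conceptually, this step splits each long document into smaller, overlapping chunks so that each one can be embedded and retrieved on its own. The sketch below illustrates the idea with a deliberately simplified word-based splitter. It is not the algorithm `SimpleNodeParser` uses, and `chunk_words` with its parameters is hypothetical.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split `text` into chunks of `chunk_size` words that share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(sample, chunk_size=5, overlap=2):
    print(chunk)
```

The overlap keeps a sentence that straddles a chunk boundary at least partly visible in both neighboring chunks.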

### Create Index and Query Engine

In [None]:
storage_context = StorageContext.from_defaults(vector_store=mongodb_vector_store)

index = VectorStoreIndex(
    nodes=nodes,
    storage_context=storage_context,
    embed_model=OpenAIEmbedding(),
)

In [None]:
query_engine = index.as_query_engine(similarity_top_k=2)
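
`similarity_top_k=2` asks the retriever for the two stored chunks whose embeddings are most similar to the query embedding. The sketch below shows that ranking idea with exact cosine similarity over toy 2-dimensional vectors. It is only an illustration: Atlas performs the search server-side over real embeddings, and `cosine_similarity`, `top_k`, and the toy data are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between vectors `a` and `b` (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the `k` vectors most similar to `query`, best first."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in vectors.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


toy_vectors = {
    "chunk_a": [1.0, 0.0],
    "chunk_b": [0.7, 0.7],
    "chunk_c": [0.0, 1.0],
}
print(top_k([1.0, 0.1], toy_vectors, k=2))
```

The retrieved chunks are then passed to the LLM as context for answering the query, which is why the source nodes of a response can be inspected afterwards.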

### Test Query

In [None]:
query = "What is BLEU score on the WMT 2014 English-to-German translation task?"

response = query_engine.query(query)
print("\n***********New LlamaParse+ Basic Query Engine***********")
print(response)


***********New LlamaParse+ Basic Query Engine***********
The BLEU score on the WMT 2014 English-to-German translation task is 28.4.


In [None]:
# Take a look at one of the source nodes from the response
print(response.source_nodes[0].get_content())

We varied the learning
rate over the course of training, according to the formula:
               lrate = d−0.5                                                                          (3)
                         model · min(step_num−0.5, step_num · warmup_steps−1.5)
This corresponds to increasing the learning rate linearly for the first warmup_steps training steps,
and decreasing it thereafter proportionally to the inverse square root of the step number. We used
warmup_steps = 4000.
5.4   Regularization
We employ three types of regularization during training:
                                                    7
---
Table 2: The Transformer achieves better BLEU scores than previous state-of-the-art models on the
English-to-German and English-to-French newstest2014 tests at a fraction of the training cost.
        Model                                          BLEU               Training Cost (FLOPs)
                                                 EN-DE      EN-FR          EN-DE     