# MongoDB Atlas

In this notebook, we demonstrate how to perform Retrieval-Augmented Generation (RAG) using MongoDB Atlas, OpenAI, and LangChain. We cover similarity search, similarity search with metadata pre-filtering, and question answering.

In [1]:
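# Install dependencies (langchain-community and langchain-openai provide the imports used below)
%pip install -Uq langchain langchain-community langchain-openai pymongo openai tiktoken pypdf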

Note: you may need to restart the kernel to use updated packages.


In [6]:
import os

# Set your OpenAI API key (left empty here on purpose)
os.environ["OPENAI_API_KEY"] = ""

In [1]:
from pymongo import MongoClient

# Initialize the MongoDB Python client
client = MongoClient("mongodb+srv://<username>:<password>@<cluster>.t63mnon.mongodb.net/?retryWrites=true&w=majority")

# The database and collection named here are created on first insert if they don't already exist
DB_NAME = "mongo-rag"
COLLECTION_NAME = "test-user-1"
ATLAS_VECTOR_SEARCH_INDEX_NAME = "vector_index"

MONGODB_COLLECTION = client[DB_NAME][COLLECTION_NAME]
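
Only the database and collection are created on demand. The Atlas Vector Search index named by `ATLAS_VECTOR_SEARCH_INDEX_NAME` has to be defined on the collection in Atlas separately. The snippet below is a sketch of what such an index definition might look like, written as a Python dict so it can be printed and pasted into the Atlas JSON editor. It assumes the embeddings are stored under the `embedding` field, have 1536 dimensions (the size of OpenAI's `text-embedding-ada-002` vectors), and are compared by cosine similarity; adjust these values if your setup differs.

```python
import json

# Assumption: 1536 matches OpenAI's text-embedding-ada-002 vectors.
EMBEDDING_DIMENSIONS = 1536

# Assumption: vectors are stored under the "embedding" field
# and compared with cosine similarity.
vector_index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",
            "numDimensions": EMBEDDING_DIMENSIONS,
            "similarity": "cosine",
        }
    ]
}

# Print the definition as JSON for use in the Atlas index editor
print(json.dumps(vector_index_definition, indent=2))
```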

In [2]:
from langchain_community.document_loaders.pdf import PyPDFDirectoryLoader

# In this example, there's only one document to load from the directory
loader = PyPDFDirectoryLoader("./data")
data = loader.load()

In [3]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
docs = text_splitter.split_documents(data)
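
To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified sliding-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (the real splitter prefers to break on separators such as paragraphs, lines, and words); the helper `sliding_window_chunks` is a hypothetical name used only to illustrate how consecutive chunks share `chunk_overlap` characters.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap (illustrative only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample_text = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample_text, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
# Prints: abcdefghij, hijklmnopq, opqrstuvwx, vwxyz (one per line)
```

Each chunk starts with the last three characters of the previous one, which is the role the 150-character overlap plays in the cell above: text that straddles a chunk boundary still appears intact in at least one chunk.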

In [4]:
print(docs[0])

page_content='KENNY GUNDERMAN\nZERO TO FULL-TIME\nPROGRAMMER IN 5 STEPS' metadata={'source': 'data\\Zero to Full-Time Programmer in 5 Steps.pdf', 'page': 0}


In [7]:
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import MongoDBAtlasVectorSearch

# Insert the documents into MongoDB Atlas along with their embeddings
vector_search = MongoDBAtlasVectorSearch.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(disallowed_special=()),
    collection=MONGODB_COLLECTION,
    index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
)

In [18]:
# Perform a similarity search between the embedding of the query and the embeddings of the documents
query = "Who's Kenny?"
results = vector_search.similarity_search(query)

print(results[0].page_content)

KENNY GUNDERMAN
ZERO TO FULL-TIME
PROGRAMMER IN 5 STEPS
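
Conceptually, a similarity search embeds the query and ranks stored chunks by how close their vectors are to it. Atlas does this server-side with an approximate nearest-neighbor index, so the following is only a local sketch of the idea. The three-dimensional vectors and chunk labels are made up for illustration, and cosine similarity is assumed as the metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings; real OpenAI embeddings have far more dimensions
toy_embeddings = {
    "chunk about the author": [0.9, 0.1, 0.0],
    "chunk about tech stacks": [0.1, 0.8, 0.3],
    "chunk about portfolios": [0.0, 0.3, 0.9],
}
toy_query_embedding = [0.8, 0.2, 0.1]

# Rank chunks from most to least similar to the query
ranked = sorted(
    toy_embeddings.items(),
    key=lambda item: cosine_similarity(toy_query_embedding, item[1]),
    reverse=True,
)
for label, vector in ranked:
    print(f"{cosine_similarity(toy_query_embedding, vector):.3f}  {label}")
```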


In [9]:
query = "How do I become a programmer?"

results = vector_search.similarity_search_with_score(
    query=query,
    k=5,
)

# Display results
for result in results:
    print(result)

(Document(page_content='Before we begin, I just want to say congratulations. Seriously, hats off to you for taking the \nfirst step (or one of the first steps) into the realm of learning how to code. I am thrilled that \nyou’ve downloaded this eBook.\nThere is a reason why you’re here. Maybe that’s because you see programming as a lucra -\ntive career path (it is) . Or maybe it’s because you’re sick of your current life circumstances \nand you’re looking for work that is more rewarding. Or perhaps it’s both.\nWhatever your reasoning may be, I hope this book provides you with valuable insights and \nactionable steps to go from Zero To Full-Time Programmer.\nLearning how to code may be one of the toughest obstacles you tackle in your life. Any -\nthing worth having will require much sacrifice. Nevertheless, If you see this through it will \nbe one of the most rewarding things you ever accomplish. To quote an author from one of \nmy favorite books, The Obstacle Is The Way:', metadata={'_i

# Question Answering

In [10]:
qa_retriever = vector_search.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 25},
)

In [11]:
from langchain.prompts import PromptTemplate

prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
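
To see roughly what the model receives, the template can be filled in by hand. This sketch uses plain `str.format` rather than LangChain, and it assumes retrieved chunks are joined with blank lines before being placed in `{context}`. The first chunk is the title page text from the loaded PDF; the second is a hypothetical placeholder.

```python
prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
"""

retrieved_chunks = [
    "KENNY GUNDERMAN\nZERO TO FULL-TIME\nPROGRAMMER IN 5 STEPS",
    "(hypothetical second chunk returned by the retriever)",
]

# Assumption: chunks are concatenated with a blank line between them
context = "\n\n".join(retrieved_chunks)

filled_prompt = prompt_template.format(context=context, question="Who's Kenny?")
print(filled_prompt)
```

Because every retrieved chunk is pasted into a single prompt, a large `k` on the retriever increases prompt length accordingly.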

In [17]:
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo-1106", temperature=0.7),
    chain_type="stuff",
    retriever=qa_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT},
)

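# Use a separate name for the response so the `docs` list of split documents is not overwritten
response = qa({"query": "how to become programmer. answer with a brief statement and then 5 bullet points."})

print(response["result"])
print(response["source_documents"])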

To become a programmer, you can start by learning how to code and building a strong portfolio. Here are five key steps to becoming a programmer:

1. Choose a starting point and programming language to learn.
2. Focus on building a solid tech stack and understanding computer science principles.
3. Build a portfolio of software applications to showcase your skills.
4. Don't get discouraged if concepts don't click right away; keep practicing and learning.
5. Utilize free resources and online courses to continue learning and improving your skills.
[Document(page_content='KENNY GUNDERMAN\nZERO TO FULL-TIME\nPROGRAMMER IN 5 STEPS', metadata={'_id': ObjectId('65921fc30fa0b0c69566d632'), 'embedding': [0.005376497757617127, 0.0029902795841947258, 0.021110348436349, -0.013162175213938934, -0.000135781301862731, 0.031227489982031738, -0.028825376227711676, -0.014794200681851262, -0.04719448842011744, -0.024021146856426395, 0.01750717687354633, 0.032555718025631106, 0.00029717336338685606, 0.00242