# This notebook shows how to store vector embeddings and create a vector search index using the MongoDB Atlas GUI and LangChain

*   how to store vector embeddings in MongoDB documents
*   how to create a vector search index using the MongoDB Atlas GUI
*   how to perform a k-nearest-neighbors (KNN) search with an Approximate Nearest Neighbors algorithm, which uses Hierarchical Navigable Small World (HNSW) graphs

# It also sheds some light on

1.   Comparing textual and fuzzy search with semantic search
2.   How retrieval architecture helps



# Deploy a MongoDB Atlas cluster (see the [quick start](https://www.mongodb.com/docs/atlas/getting-started/)), then install the required libraries

In [None]:
!pip install pymongo
!pip install --upgrade langchain
!pip install --upgrade openai
!pip install --upgrade tiktoken



## Get the MongoDB Cluster URI

In [None]:
import os
import getpass

MONGODB_ATLAS_CLUSTER_URI = getpass.getpass("MongoDB Atlas Cluster URI:")

MongoDB Atlas Cluster URI:··········


## Get the OpenAI API key from the user (OpenAI is the LLM provider used in this demo)

In [None]:
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")



OpenAI API Key:··········



Let's create a database named LLMDemo and a collection named state_union, then create a vector search index named LangChainDemo in the MongoDB Atlas GUI using the mapping below. The dimensions value must match the length of the embedding vectors (1536 here). See the [knnVector field type documentation](https://www.mongodb.com/docs/atlas/atlas-search/field-types/knn-vector/).
```
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "embedding": {
        "dimensions": 1536,
        "similarity": "cosine",
        "type": "knnVector"
      }
    }
  }
}
```
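
The index definition above ranks documents by cosine similarity. As a rough illustration of what that measure computes, here is a minimal sketch of an exact (brute-force) k-nearest-neighbors search in plain Python. The toy 3-dimensional vectors and the names `cosine_similarity`, `top_k`, and `toy_vectors` are assumptions made up for this example; Atlas works on the 1536-dimensional embeddings and approximates this ranking with an index rather than scanning every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Exact (brute-force) k-nearest neighbors by cosine similarity."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Hypothetical toy vectors standing in for real embeddings.
toy_vectors = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.9, 0.1, 0.0],
    "chunk_c": [0.0, 1.0, 0.0],
}

nearest = top_k([1.0, 0.05, 0.0], toy_vectors, k=2)
print(nearest)
```

The brute-force scan compares the query against every stored vector, which is why approximate methods are preferred once a collection grows large.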



### Import the LangChain packages

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import MongoDBAtlasVectorSearch
from langchain.document_loaders import TextLoader


### Read the data set, split it into documents, and set up the embedding model


In [None]:
loader = TextLoader("state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
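
To give a feel for what splitting with chunk_size=1000 and chunk_overlap=0 does, here is a simplified sketch of character-based chunking. It is not LangChain's implementation: the function `split_text`, the blank-line separator, and the sample text are assumptions chosen for illustration. The idea is to pack paragraphs greedily into chunks that stay within the size limit.

```python
def split_text(text, chunk_size=1000, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut mid-paragraph.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece.strip():
            continue  # skip empty pieces
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate  # piece still fits (or starts a new chunk)
        else:
            chunks.append(current)  # close the full chunk, start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks

# Hypothetical sample text; a small chunk_size makes the behavior visible.
sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
chunks = split_text(sample, chunk_size=40)
print(chunks)
```

With no overlap, each paragraph lands in exactly one chunk, so each stored document gets one embedding that covers a distinct slice of the speech.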

Connect to the database and collection created earlier using MongoClient from pymongo, and specify the name of the vector search index.

In [None]:
from pymongo import MongoClient

# initialize MongoDB python client
client = MongoClient(MONGODB_ATLAS_CLUSTER_URI)

db_name = "LLMDemo"
collection_name = "state_union"
collection = client[db_name][collection_name]
index_name = "LangChainDemo"


In [None]:
prompt = "What did the president say about Covid-19"


### Insert the documents created from the data set along with their vector embeddings (run this only on the first execution)

In [None]:
# # insert the documents in MongoDB Atlas with their embedding
# docsearch = MongoDBAtlasVectorSearch.from_documents(
#     docs, embeddings, collection=collection, index_name=index_name
# )
# # perform a similarity search between the embedding of the query and the embeddings of the documents
# query = "What did the president say about Ketanji Brown Jackson"
# docs = docsearch.similarity_search(query)

# print(docs[0].page_content)

### Let's run a textual search and see the results: it returns too many of them

In [None]:
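import pymongo

db = client.LLMDemo
# create a text index on the "text" field (required before using $text)
db.state_union.create_index([("text", pymongo.TEXT)])
# $text matches documents that contain any of the search terms
res = db.state_union.find({"$text": {"$search": prompt}})
#print(db.state_union.index_information())
#print(res)
for doc in res:
    print(doc["text"])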

And based on the projections, more of the country will reach that point across the next couple of weeks. 

Thanks to the progress we have made this past year, COVID-19 need no longer control our lives.  

I know some are talking about “living with COVID-19”. Tonight – I say that we will never just accept living with COVID-19. 

We will continue to combat the virus as we do other diseases. And because this is a virus that mutates and spreads, we will stay on guard. 

Here are four common sense steps as we move forward safely.  

First, stay protected with vaccines and treatments. We know how incredibly effective vaccines are. If you’re vaccinated and boosted you have the highest degree of protection. 

We will never give up on vaccinating more Americans. Now, I know parents with kids under 5 are eager to see a vaccine authorized for their children. 

The scientists are working hard to get that done and we’ll be ready with plenty of vaccines when they do.
And let’s pass the PRO Act when 
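
One likely reason the text search returns so many chunks is that a `$text` query combines its search terms with a logical OR unless they are quoted as a phrase, so any chunk containing any one of the terms matches. The sketch below imitates that behavior in plain Python. It is a toy stand-in, not MongoDB's tokenizer or stemmer: `keyword_match`, the stop-word set, and the sample chunks are all made up for illustration.

```python
# Hypothetical stop words; MongoDB applies its own language-specific list.
STOP_WORDS = frozenset({"what", "did", "the", "say", "about"})

def keyword_match(query, chunks, stop_words=STOP_WORDS):
    """Return chunks containing ANY non-stop-word query term (case-insensitive)."""
    terms = {word.strip("?.,").lower() for word in query.split()} - stop_words
    return [chunk for chunk in chunks if any(term in chunk.lower() for term in terms)]

# Made-up sample chunks, not taken from the data set.
toy_chunks = [
    "COVID-19 need no longer control our lives.",
    "The president signed the infrastructure law.",
    "Let's pass the PRO Act.",
]

matches = keyword_match("What did the president say about Covid-19", toy_chunks)
print(matches)
```

The second chunk matches only because it mentions the president, even though it has nothing to do with Covid-19. This is the kind of loosely related result that keyword matching tends to produce.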

### Let's create a vector store and perform a vector search using MongoDB Atlas Vector Search, which returns more contextual results!


In [None]:
# initialize vector store
vectorstore = MongoDBAtlasVectorSearch(
    collection, OpenAIEmbeddings(), index_name=index_name
)

# perform a similarity search between a query and the ingested documents
docs = vectorstore.similarity_search(prompt)
print(prompt)
print(docs[0].page_content)

What did the president say about Covid-19
And based on the projections, more of the country will reach that point across the next couple of weeks. 

Thanks to the progress we have made this past year, COVID-19 need no longer control our lives.  

I know some are talking about “living with COVID-19”. Tonight – I say that we will never just accept living with COVID-19. 

We will continue to combat the virus as we do other diseases. And because this is a virus that mutates and spreads, we will stay on guard. 

Here are four common sense steps as we move forward safely.  

First, stay protected with vaccines and treatments. We know how incredibly effective vaccines are. If you’re vaccinated and boosted you have the highest degree of protection. 

We will never give up on vaccinating more Americans. Now, I know parents with kids under 5 are eager to see a vaccine authorized for their children. 

The scientists are working hard to get that done and we’ll be ready with plenty of vaccines when