# Adding Semantic Caching and Memory to your RAG Application using MongoDB and LangChain

In this notebook, we will see how to use MongoDBAtlasSemanticCache and MongoDBChatMessageHistory from the langchain-mongodb package to add semantic caching and conversational memory to your RAG application.


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mongodb-developer/GenAI-Showcase/blob/main/notebooks/rag/mongodb-langchain-cache-memory.ipynb)

## Step 1: Install required libraries

- **datasets**: Python library to get access to datasets available on Hugging Face Hub

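- **langchain**: Python toolkit for LangChain

- **langchain-mongodb**: LangChain integration package for MongoDB (vector store, cache, and chat message history)

- **langchain-openai**: LangChain integration package for OpenAI chat models and embeddings

- **pymongo**: Python toolkit for MongoDB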

In [1]:
! pip install -qU datasets langchain langchain-mongodb langchain-openai pymongo

## Step 2: Set up prerequisites

* Set the MongoDB connection string. Follow the steps [here](https://www.mongodb.com/docs/manual/reference/connection-string/) to get the connection string from the Atlas UI.

* Set the OpenAI API key. Steps to obtain an API key are [here](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key).

In [2]:
import getpass

In [3]:
MONGODB_URI = getpass.getpass("Enter your MongoDB connection string:")

Enter your MongoDB connection string:········


In [4]:
OPENAI_API_KEY = getpass.getpass("Enter your OpenAI API key:")

Enter your OpenAI API key:········
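
The prompts above work well interactively, but they block when the notebook is run unattended. A minimal sketch of one alternative, assuming the secrets may be exported as environment variables beforehand; the helper name `get_secret` and the environment variable names are illustrative choices, not part of the original notebook:

```python
import getpass
import os


def get_secret(env_var: str, prompt: str) -> str:
    """Return the value of env_var if it is set, otherwise prompt for it."""
    value = os.environ.get(env_var)
    if value:
        return value
    # Fall back to an interactive prompt that does not echo the input
    return getpass.getpass(prompt)


# Example usage (hypothetical variable names):
# MONGODB_URI = get_secret("MONGODB_URI", "Enter your MongoDB connection string:")
# OPENAI_API_KEY = get_secret("OPENAI_API_KEY", "Enter your OpenAI API key:")
```

With this approach the prompt only appears when the corresponding environment variable is missing or empty.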


In [7]:
# Optional: enable LangSmith tracing, which is useful for debugging
# import os
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()



## Step 3: Download the dataset

In [5]:
from datasets import load_dataset
import pandas as pd

In [6]:
# Use streaming=True to load the dataset without downloading it fully
data = load_dataset("MongoDB/tech-news-embeddings", split="train", streaming=True)
# Get first 25k records from the dataset
data_head = data.take(25000)

In [7]:
df = pd.DataFrame(data_head)

## Step 4: Data analysis

Verify that the dataset has the expected number of records, preview its contents, and drop rows where the `description` field is null.

In [8]:
# Ensuring length of dataset is what we expect i.e. 25k
len(df)

25000

In [9]:
# Previewing the contents of the data
df.head()

| | _id | companyName | companyUrl | published_at | url | title | main_image | description | embedding |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 65c63ea1f187c085a866f680 | 01Synergy | https://hackernoon.com/company/01synergy | 2023-05-16 02:09:00 | https://www.businesswire.com/news/home/2023051... | onsemi and Sineng Electric Spearhead the Devel... | https://firebasestorage.googleapis.com/v0/b/ha... | (Nasdaq: ON) a leader in intelligent power and... | [0.05243798345327377, -0.10347484797239304, -0... |
| 1 | 65c63ea2f187c085a866f681 | 01Synergy | https://hackernoon.com/company/01synergy | 2023-05-02 00:07:00 | https://elkodaily.com/news/local/adobe-student... | Adobe student receives national Information an... | https://firebasestorage.googleapis.com/v0/b/ha... | ELKO — An eighth grader at Adobe Middle School... | [0.0036485784221440554, -0.05992984399199486, ... |
| 2 | 65c63ea2f187c085a866f682 | 01Synergy | https://hackernoon.com/company/01synergy | 2023-05-01 22:22:00 | https://www.aei.org/technology-and-innovation/... | Modernizing State Services: Harnessing Technol... | https://firebasestorage.googleapis.com/v0/b/ha... | To deliver 21st-century government services Go... | [0.012319465167820454, -0.0807630866765976, 0.... |
| 3 | 65c63ea2f187c085a866f683 | 01Synergy | https://hackernoon.com/company/01synergy | 2023-05-02 13:12:00 | https://www.crn.com/news/managed-services/terr... | Terry Richardson On Why He Left AMD GreenPages... | https://firebasestorage.googleapis.com/v0/b/ha... | In February GreenPages acquired Toronto-based ... | [-0.02363203465938568, 0.021521812304854393, 0... |
| 4 | 65c63ea7f187c085a866f684 | 01Synergy | https://hackernoon.com/company/01synergy | 2023-05-15 20:01:00 | https://www.benzinga.com/pressreleases/23/05/3... | Synex Renewable Energy Corporation (Formerly S... | https://firebasestorage.googleapis.com/v0/b/ha... | The conference will bring together growth orie... | [0.08473014086484909, -0.07019763439893723, 0.... |


In [10]:
# Only keep records where the description field is not null
df = df[df["description"].notna()]

In [11]:
# Create a list of texts to embed
texts = df["description"].tolist()

## Step 5: Create a simple RAG chain using MongoDB as the vector store

In [12]:
from langchain_mongodb.vectorstores import MongoDBAtlasVectorSearch
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

from pymongo import MongoClient

In [13]:
# Initializing the OpenAI model for embeddings
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, model="text-embedding-3-small")

In [14]:
# Initializing the MongoDB client
mongo_client = MongoClient(MONGODB_URI)
# Defining the database name, collection name and vector search index name
db = "langchain_chatbot"
data_collection = mongo_client[db]["data"]
index_name = "vector_index"
# Delete documents from the collection, if any exist
data_collection.delete_many({})

# Create a MongoDB Atlas Vector Search vector store from the first 1,000 texts
vectorstore = MongoDBAtlasVectorSearch.from_texts(texts[:1000], embeddings, collection=data_collection, index_name=index_name)
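
The vector store above refers to an Atlas Vector Search index named `vector_index`, but the notebook does not show its definition. The sketch below shows what such a definition could look like. It assumes the default embedding field name used by `MongoDBAtlasVectorSearch` (`embedding`) and the 1536-dimensional default output of `text-embedding-3-small`. The `cosine` similarity metric is an assumption, not something the notebook states.

```python
# Hypothetical Atlas Vector Search index definition for the "data" collection.
vector_index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",     # assumed: default embedding field name
            "numDimensions": 1536,   # default output size of text-embedding-3-small
            "similarity": "cosine",  # assumed similarity metric
        }
    ]
}
```

A definition along these lines can be entered in the JSON editor of the Atlas UI when creating a vector search index on the collection. Queries return no results until the index exists and has finished building.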

In [15]:
# Use the MongoDB Atlas vector store as a retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})

In [16]:
# Generate context using the retriever, and pass the user question through
retrieve = {"context": retriever, "question": RunnablePassthrough()}
template = """Answer the question based only on the following context: \
{context}

Question: {question}
"""
# Defining the chat prompt
prompt = ChatPromptTemplate.from_template(template)
# Defining the model to be used for chat completion
model = ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY, model_name="gpt-3.5-turbo")
# Parse output as a string
parse_output = StrOutputParser()

# Naive RAG chain 
naive_rag_chain = (
    retrieve
    | prompt
    | model
    | parse_output
)
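
In LangChain Expression Language, a plain dict placed at the start of a chain, such as `retrieve` above, is coerced into a parallel step: each value receives the same input, and the results are collected under the corresponding keys before being passed to the prompt. A plain-Python sketch of that fan-out follows, with `run_parallel`, `fake_retriever`, and `passthrough` as hypothetical stand-ins rather than LangChain APIs:

```python
from typing import Any, Callable, Dict, List


def run_parallel(steps: Dict[str, Callable[[Any], Any]], value: Any) -> Dict[str, Any]:
    """Call every step with the same input and collect the results by key."""
    return {key: step(value) for key, step in steps.items()}


def fake_retriever(question: str) -> List[str]:
    # Hypothetical stand-in for the vector store retriever
    return [f"document related to: {question}"]


def passthrough(value: Any) -> Any:
    # Returns its input unchanged, like RunnablePassthrough
    return value


prompt_inputs = run_parallel(
    {"context": fake_retriever, "question": passthrough},
    "Tell me about Samsung Electronics.",
)
print(prompt_inputs)
```

The resulting dictionary has exactly the two keys, `context` and `question`, that the prompt template expects.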

In [34]:
naive_rag_chain.invoke("Tell me about Fidelity Financial Services.")

'Fidelity National Information Services Inc. is a financial technology solutions provider headquartered in Jacksonville, Florida.'

## Step 6: Create a semantic cache


In [22]:
from langchain_mongodb.cache import MongoDBAtlasSemanticCache
from langchain_core.globals import set_llm_cache

set_llm_cache(MongoDBAtlasSemanticCache(
    connection_string=MONGODB_URI,
    embedding=embeddings,
    collection_name="semantic_cache",
    database_name="langchain_chatbot",
    index_name="vector_index",
    wait_until_ready=True # Optional, waits until the cache is ready to be used
))

In [23]:
naive_rag_chain.invoke("Tell me about Samsung Electronics.")

'Samsung Electronics Co. Ltd. is a world leader in advanced semiconductor technology. They recently unveiled their latest innovations in analog and logic semiconductor technologies and outlined their blueprint for upcoming technological advancements.'

In [24]:
naive_rag_chain.invoke("What do you know about Samsung Electronics?")

'Based on the provided context, Samsung Electronics Co. Ltd. is described as a world leader in advanced semiconductor technology. They have unveiled their latest innovations in analog and logic semiconductor technologies and outlined their blueprint for upcoming technological advancements.'
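
A semantic cache looks up earlier LLM responses by the embedding similarity of the prompt rather than by exact string match, so a reworded prompt can reuse a stored response when it is similar enough. The self-contained sketch below illustrates only the idea; it is not how `MongoDBAtlasSemanticCache` is implemented. The class `ToySemanticCache`, the word-count `toy_embed` function, and the `0.6` threshold are all illustrative assumptions.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def toy_embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().replace("?", "").replace(".", "").split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToySemanticCache:
    def __init__(self, threshold: float = 0.6) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []

    def lookup(self, prompt: str) -> Optional[str]:
        """Return the most similar cached response, or None below the threshold."""
        query = toy_embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def update(self, prompt: str, response: str) -> None:
        self.entries.append((toy_embed(prompt), response))


cache = ToySemanticCache()
cache.update("Tell me about Samsung Electronics.", "cached answer about Samsung")
hit = cache.lookup("Tell me about Samsung Electronics please")
miss = cache.lookup("What is the weather in Paris today?")
```

Here `hit` holds the cached answer because the two prompts share most of their words, while `miss` is `None`. A real embedding model captures meaning beyond shared words, which a word-count vector cannot.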

## Step 7: Create a RAG chain with chat history

In [35]:
from langchain_mongodb.chat_message_histories import MongoDBChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.prompts import MessagesPlaceholder

In [36]:
history_coll_name = "history"

In [44]:
def get_session_history(session_id: str) -> MongoDBChatMessageHistory:
    # Return the MongoDB-backed message history for the given session
    return MongoDBChatMessageHistory(
        MONGODB_URI, session_id, database_name=db, collection_name=history_coll_name
    )

In [45]:
# Given a follow-up question and history, create a standalone question
standalone_system_prompt = """
Given a chat history and a follow-up question, rephrase the follow-up question to be a standalone question. \
Do NOT answer the question, just reformulate it if needed, otherwise return it as is. \
Only return the final standalone question. \
"""
standalone_question_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", standalone_system_prompt),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{question}"),
    ]
)

question_chain = standalone_question_prompt | model | parse_output

In [46]:
# Generate context by passing output of the question_chain i.e. the standalone question to the retriever
retriever_chain = RunnablePassthrough.assign(context=question_chain | retriever)

In [47]:
# Create a prompt that includes the context, history and the follow-up question
rag_system_prompt = """Answer the question based only on the following context. If you don't know the answer, say 'I don't know!': \
{context}
"""
rag_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", rag_system_prompt),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{question}"),
    ]
)

In [48]:
# RAG chain
rag_chain = (
    retriever_chain
    | rag_prompt
    | model
    | parse_output
)

In [49]:
# RAG chain with history
with_message_history = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
)
with_message_history.invoke({"question": "Tell me about Fidelity Financial Services."}, {"configurable": {"session_id": "1"}})

"I don't know!"

In [50]:
with_message_history.invoke({"question": "Are you sure you don't know?"}, {"configurable": {"session_id": "1"}})

"I don't have the information about Fidelity Financial Services in the provided context."

In [51]:
with_message_history.invoke({"question": "How about Fidelity National Information Services?"}, {"configurable": {"session_id": "1"}})

'Fidelity National Information Services Inc. (NYSE:FIS) is a financial technology solutions provider headquartered in Jacksonville, Florida.'
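
Conceptually, `RunnableWithMessageHistory` loads the stored messages for the given `session_id`, passes them to the chain under the `history` key, and then appends the new question and answer to the store. The sketch below simplifies that flow, with an in-memory dictionary standing in for the MongoDB collection. The names `session_store`, `get_history`, `invoke_with_history`, and `echo_chain` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# session_id -> list of (role, content) messages; stands in for the "history" collection
session_store: Dict[str, List[Tuple[str, str]]] = {}


def get_history(session_id: str) -> List[Tuple[str, str]]:
    """Return the message list for a session, creating it on first use."""
    return session_store.setdefault(session_id, [])


def invoke_with_history(chain: Callable[[dict], str], question: str, session_id: str) -> str:
    history = get_history(session_id)
    # Pass a copy so the chain only sees the turns that came before this one
    answer = chain({"question": question, "history": list(history)})
    # Persist the new turn so later calls in the same session can see it
    history.append(("human", question))
    history.append(("ai", answer))
    return answer


def echo_chain(inputs: dict) -> str:
    """Hypothetical stand-in for the RAG chain: reports how much history it saw."""
    return f"answer to {inputs['question']!r} with {len(inputs['history'])} prior messages"


first = invoke_with_history(echo_chain, "Tell me about Fidelity.", "1")
second = invoke_with_history(echo_chain, "Are you sure?", "1")
other = invoke_with_history(echo_chain, "Hello", "2")
```

The second call in session `"1"` sees the two messages from the first turn, while session `"2"` starts with an empty history. This is why follow-up questions only resolve correctly when the same `session_id` is reused.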