## <b><font color='darkblue'>Preface</font></b>
([article source](https://medium.com/@rahulpant.me/langchain-memory-types-in-simple-words-9fc142003567)) <b><font size='3ptx'>LangChain is becoming the secret sauce that eases the path of LLM applications to production. In this article we delve into the different types of memory (remembering power) that LLMs can have when used through LangChain.</font></b>
![concept](images/1.PNG)

<b>We also look at sample code and output to explain these memory types.</b>

We have been dealing with chatbots for many years, and since the early days of fixed questions and robotic answers they have steadily become more ‘human-like’. LangChain’s suite of memory types offers a fascinating glimpse into the future of these interactions.

From the short-term recall of [**ConversationBufferMemory**](https://api.python.langchain.com/en/latest/memory/langchain.memory.buffer.ConversationBufferMemory.html), **which ensures a chatbot remembers your favourite pizza toppings, to the complex web of relationships maintained by Conversation Knowledge Graph Memory**, LangChain is changing the way we think about and interact with AI.

We will look into the following memory types in this article:
* Conversation Buffer Memory
* Conversation Buffer Window Memory
* Conversation Summary Memory
* Conversation Summary Buffer Memory
* Conversation Token Buffer Memory

In [1]:
! pip install -q --upgrade google-generativeai langchain-google-genai chromadb pypdf ipywidgets

In [32]:
import os
import google.generativeai as genai
from IPython.display import display
from IPython.display import Markdown
import textwrap
from dotenv import load_dotenv
import urllib
import warnings
from pathlib import Path as p
from pprint import pprint

import pandas as pd
from langchain import PromptTemplate
from langchain.chains.question_answering import load_qa_chain
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.embeddings import SentenceTransformerEmbeddings
from langchain.memory.buffer import ConversationBufferMemory
from langchain.memory import (
    ConversationBufferWindowMemory, ConversationSummaryMemory, ConversationTokenBufferMemory, ConversationSummaryBufferMemory)
from langchain.chains.conversation.base import ConversationChain

warnings.filterwarnings("ignore")

# loads the .env file
load_dotenv()

GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
genai.configure(api_key=GOOGLE_API_KEY)

## <b><font color='darkblue'>ConversationBufferMemory</font>: Buffer for storing conversation memory</b>
<b><font size='3ptx'>As the name suggests, this keeps in memory the conversation history to help contextualize the answer to the next user question.</font></b>

While this sounds very useful, one <b><font color='darkred'>drawback is that it keeps the entire history</font></b> (<font color='brown'>up to the context limit of the specific LLM</font>) <b><font color='darkred'>and, for every question, passes the whole previous discussion</font></b> (<font color='brown'>as tokens</font>) <b><font color='darkred'>to the LLM API</font></b>. This can have a significant cost impact, since API costs are based on the number of tokens processed, and it also increases latency as the conversation grows.
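To make the cost concern concrete, here is a minimal sketch in plain Python (no LangChain involved). It assumes that a "token" is just a whitespace-separated word, which is only a rough stand-in for a real tokenizer, and the replies are made-up placeholders. It shows how the prompt sent to the model grows on every call when the full history is resent.

```python
# Rough illustration only: "tokens" are approximated by whitespace-separated words.
def approx_tokens(text: str) -> int:
    return len(text.split())


# Hypothetical exchanges (user message, AI reply) used purely for illustration.
turns = [
    ("Hi, my name is John.", "Hello John, nice to meet you."),
    ("I am a software engineer.", "That is a great career."),
    ("What is my name?", "John"),
]

history = []       # full transcript, kept the way a plain buffer would keep it
prompt_sizes = []  # approximate tokens sent to the model on each call

for user_msg, ai_msg in turns:
    # Each prompt contains everything said so far plus the new question.
    prompt = "\n".join(history + [f"Human: {user_msg}", "AI:"])
    prompt_sizes.append(approx_tokens(prompt))
    history.append(f"Human: {user_msg}")
    history.append(f"AI: {ai_msg}")

print(prompt_sizes)  # each entry is larger than the one before it
```

Because every call pays again for all earlier turns, the total number of tokens processed over a conversation grows much faster than the conversation itself.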

In [6]:
llm = ChatGoogleGenerativeAI(
    model="gemini-pro",
    google_api_key=GOOGLE_API_KEY,
    temperature=0,
    convert_system_message_to_human=True)

In [11]:
memory = ConversationBufferMemory()
conversation = ConversationChain(
    memory=memory, llm=llm, verbose=True)

Let's ask our first question:

In [12]:
conversation.predict(input="Hi, my name is John.")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi, my name is John.
AI:

> Finished chain.


"Hello John, my name is Gemini. It's nice to meet you."

For our second question, the previous chat history will be appended:

In [13]:
conversation.predict(input="I am a software engineer. Tell me what's amazing about being a software engineer.")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is John.
AI: Hello John, my name is Gemini. It's nice to meet you.
Human: I am a software engineer. Tell me what's amazing about being a software engineer.
AI:

> Finished chain.


'Being a software engineer is an amazing career for many reasons. First, it is a highly in-demand field, with a growing number of jobs available each year. This means that software engineers are likely to have job security and the opportunity to work on a variety of projects. Second, software engineering is a creative field that allows engineers to use their skills to solve problems and create new products. Third, software engineering is a well-paying field, with salaries that are typically higher than the average for other professions. Finally, software engineering is a rewarding field that allows engineers to make a real difference in the world.'

Next, let's ask about the information we disclosed in the first question:

In [14]:
conversation.predict(input="What is my name?")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is John.
AI: Hello John, my name is Gemini. It's nice to meet you.
Human: I am a software engineer. Tell me what's amazing about being a software engineer.
AI: Being a software engineer is an amazing career for many reasons. First, it is a highly in-demand field, with a growing number of jobs available each year. This means that software engineers are likely to have job security and the opportunity to work on a variety of projects. Second, software engineering is a creative field that allows engineers to use their skills to solve problems and create new products. Third, software engineering is a well-paying field, with salaries t

'John'

The LLM answered the question correctly: it knows my name is `John`!

## <b><font color='darkblue'>ConversationBufferWindowMemory</font>: Buffer for storing conversation memory inside a limited size window.</b>
<b><font size='3ptx'>In our conversations we usually do not need all of the last 5–10 exchanges, but we definitely need the last few. </font></b>

<b>This type of memory lets you define “k”, the number of most recent exchanges it should remember</b>. It simply tells the LLM to remember the last few exchanges and forget all the rest!

In [18]:
memory = ConversationBufferWindowMemory(k=1)

# Add a few conversation turns to memory as an example
memory.save_context({"input": "Hi"},
                    {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What's your plan today?"},
                    {"output": "Have a coffee and enjoy the day."})


conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=False)
memory.load_memory_variables({})

{'history': "Human: What's your plan today?\nAI: Have a coffee and enjoy the day."}

The output shows that, because of `k=1`, only the last input/output pair is kept in the history.
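The windowing idea itself can be sketched in a few lines of plain Python. The class below is a hypothetical toy (the name `WindowMemorySketch` and its methods are made up for illustration and are not the LangChain implementation); it assumes that one "exchange" is one human input plus one AI output.

```python
from collections import deque


class WindowMemorySketch:
    """Toy stand-in for a k-exchange window; not the LangChain class."""

    def __init__(self, k: int):
        # A deque with maxlen=k silently drops the oldest exchange when full.
        self.turns = deque(maxlen=k)

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.turns.append((user_input, ai_output))

    def history(self) -> str:
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)


window_sketch = WindowMemorySketch(k=1)
window_sketch.save_context("Hi", "What's up")
window_sketch.save_context("Not much, just hanging", "Cool")
window_sketch.save_context("What's your plan today?", "Have a coffee and enjoy the day.")

print(window_sketch.history())
```

With `k=1` only the final exchange survives, which mirrors the `history` value shown above. A larger `k` keeps more context at the price of a longer prompt.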

## <b><font color='darkblue'>[ConversationSummaryMemory](https://api.python.langchain.com/en/latest/memory/langchain.memory.summary.ConversationSummaryMemory.html)</font>: Conversation summarizer to chat memory.</b>
<b><font size='3ptx'>Instead of remembering the exact conversation, can we summarize the previous conversation context and hence help the LLM in answering the upcoming question?</font></b>

<b>This is how Summary Memory helps. It keeps summarizing the previous context and maintains that summary for use in the next exchange.</b>

In the example below we create a short conversation history and a scenario. Instead of remembering the exact conversation, the summary memory remembers a summary and answers using that context:

In [20]:
# create a long string
schedule = "There is a meeting at 8am with your product team. \
You will need your powerpoint presentation prepared. \
9am-12pm have time to work on your LangChain \
project which will go quickly because Langchain is such a powerful tool. \
At Noon, lunch at the italian resturant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."

In [21]:
memory= ConversationSummaryMemory(llm=llm, max_token_limit=100)

memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What is on the schedule today?"}, 
                    {"output": f"{schedule}"})

In [22]:
memory.load_memory_variables({})

{'history': "The human says hello to the AI. The AI asks what's up. The human says they're just hanging out. The human asks what is on the schedule today. The AI says there is a meeting at 8am with the product team, time to work on the LangChain project from 9am-12pm, and lunch at an Italian restaurant with a customer at Noon."}

In [23]:
conversation = ConversationChain(
    memory=memory, llm=llm, verbose=True)

In [26]:
output = conversation.predict(input="What time is the meeting with the product team?")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
The human says hello to the AI. The AI asks what's up. The human says they're just hanging out. The human asks what is on the schedule today. The AI says there is a meeting at 8am with the product team, time to work on the LangChain project from 9am-12pm, and lunch at an Italian restaurant with a customer at Noon. The human asks what cool demo in laptop can he show. The AI says it does not have that information in its context.
Human: What time is the meeting with the product team?
AI:

> Finished chain.


In [27]:
output

'8am'
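The running-summary mechanism can be sketched without calling any LLM. In the toy below, `SummaryMemorySketch` and `fake_summarize` are hypothetical names made up for illustration; `fake_summarize` is only a placeholder for the model call (it keeps the first five words of each new line), so the "summary" it produces is far cruder than a real one.

```python
def fake_summarize(previous_summary: str, new_lines: str) -> str:
    # Hypothetical stand-in for the LLM call: keep the first 5 words of each new line.
    short = [" ".join(line.split()[:5]) for line in new_lines.splitlines()]
    return (previous_summary + " " + " | ".join(short)).strip()


class SummaryMemorySketch:
    """Toy running-summary memory; not the LangChain class."""

    def __init__(self, summarize):
        self.summarize = summarize
        self.summary = ""  # only the summary is stored, never the transcript

    def save_context(self, user_input: str, ai_output: str) -> None:
        new_lines = f"Human: {user_input}\nAI: {ai_output}"
        # Fold the new exchange into the existing summary.
        self.summary = self.summarize(self.summary, new_lines)


summary_sketch = SummaryMemorySketch(fake_summarize)
summary_sketch.save_context("Hello", "What's up")
summary_sketch.save_context(
    "What is on the schedule today?",
    "There is a meeting at 8am with your product team.",
)

print(summary_sketch.summary)
```

The important design point is that the stored state stays compact however long the conversation runs, but details can be lost: in this sketch the "8am" is dropped by the crude placeholder, and a real summarizer can drop facts in the same way.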

## <b><font color='darkblue'>[ConversationTokenBufferMemory](https://api.python.langchain.com/en/latest/memory/langchain.memory.token_buffer.ConversationTokenBufferMemory.html): Conversation chat memory with token limit.</font></b>
<b><font size='3ptx'>Instead of remembering the last “k” exchanges as in [ConversationBufferWindowMemory](https://api.python.langchain.com/en/latest/memory/langchain.memory.buffer_window.ConversationBufferWindowMemory.html), in this case we want to remember the most recent part of the discussion that fits within a “max token limit”.</font></b>

As the examples below show, the token limit determines how much of the recent discussion is remembered: a limit of 25 keeps the last two exchanges, while a limit of 50 keeps all three.

In [29]:
memory=ConversationTokenBufferMemory(llm=llm, max_token_limit=25)

memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"}, 
                    {"output": "Charming!"})

memory.load_memory_variables({})

{'history': 'Human: Backpropagation is what?\nAI: Beautiful!\nHuman: Chatbots are what?\nAI: Charming!'}

In [30]:
memory=ConversationTokenBufferMemory(llm=llm, max_token_limit=50)

memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"}, 
                    {"output": "Charming!"})

memory.load_memory_variables({})

{'history': 'Human: AI is what?!\nAI: Amazing!\nHuman: Backpropagation is what?\nAI: Beautiful!\nHuman: Chatbots are what?\nAI: Charming!'}
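The pruning behaviour seen in the two runs above can be sketched in plain Python. This is a toy under stated assumptions: `TokenBufferSketch` is a made-up name, tokens are approximated by whitespace-separated words (so the limits below are word counts, not the model's real token counts), and the oldest messages are dropped one at a time until the buffer fits.

```python
def approx_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word counts as one token.
    return len(text.split())


class TokenBufferSketch:
    """Toy token-limited buffer; not the LangChain class."""

    def __init__(self, max_token_limit: int):
        self.max_token_limit = max_token_limit
        self.messages = []

    def _size(self) -> int:
        return sum(approx_tokens(m) for m in self.messages)

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.messages.append(f"Human: {user_input}")
        self.messages.append(f"AI: {ai_output}")
        # Drop the oldest messages until the buffer fits the budget.
        while self.messages and self._size() > self.max_token_limit:
            self.messages.pop(0)


exchanges = [
    ("AI is what?!", "Amazing!"),
    ("Backpropagation is what?", "Beautiful!"),
    ("Chatbots are what?", "Charming!"),
]

small = TokenBufferSketch(max_token_limit=12)
large = TokenBufferSketch(max_token_limit=18)
for question, answer in exchanges:
    small.save_context(question, answer)
    large.save_context(question, answer)

print(small.messages)
print(large.messages)
```

Compared with a fixed window of `k` exchanges, a token budget adapts to message length: many short messages fit, while a few long ones push older context out sooner.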

## <b><font color='darkblue'>[ConversationSummaryBufferMemory](https://api.python.langchain.com/en/latest/memory/langchain.memory.summary_buffer.ConversationSummaryBufferMemory.html): Buffer with summarizer for storing conversation memory.</font></b>
<b><font size='3ptx'>While a summary is good, we know that the most recent exchanges are highly relevant to the upcoming query, so a mix of a summary of the older conversation with a buffer of the last few exchanges is a good combination.</font></b>

This type of memory does exactly that. You can set a token limit that defines how much of the recent conversation is kept verbatim and how much is summarized. The higher the token limit, the more of the exact conversation history is kept as-is. ([more](https://python.langchain.com/v0.1/docs/modules/memory/types/summary_buffer/))

In [33]:
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=10)

memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})
memory.load_memory_variables({})

{'history': 'System: The human says "hi" and the AI responds with "whats up".\nHuman: not much you\nAI: not much'}

We can also get the history as a list of messages (this is useful if you are using this with a chat model).

In [34]:
memory = ConversationSummaryBufferMemory(
    llm=llm, max_token_limit=10, return_messages=True
)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})

We can also use the `predict_new_summary` method directly.

In [35]:
messages = memory.chat_memory.messages
messages

[HumanMessage(content='not much you'), AIMessage(content='not much')]

In [36]:
previous_summary = ""
memory.predict_new_summary(messages, previous_summary)

'The human and the AI both say they are not doing much.'
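Putting the two ideas together, the summary-plus-buffer behaviour can be sketched as follows. Everything here is a hypothetical illustration rather than the LangChain code: `SummaryBufferSketch` and `fake_summarize` are made-up names, tokens are approximated by words, and `fake_summarize` merely joins the pruned messages where a real setup would ask the LLM for a summary.

```python
def approx_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word counts as one token.
    return len(text.split())


def fake_summarize(previous_summary: str, pruned_messages: list) -> str:
    # Hypothetical stand-in for the LLM call that condenses pruned messages.
    gist = "; ".join(pruned_messages)
    return f"{previous_summary} {gist}".strip()


class SummaryBufferSketch:
    """Toy summary + recent-buffer memory; not the LangChain class."""

    def __init__(self, max_token_limit: int, summarize):
        self.max_token_limit = max_token_limit
        self.summarize = summarize
        self.summary = ""
        self.messages = []

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.messages += [f"Human: {user_input}", f"AI: {ai_output}"]
        pruned = []
        # Move the oldest messages out of the buffer until it fits the budget.
        while self.messages and sum(map(approx_tokens, self.messages)) > self.max_token_limit:
            pruned.append(self.messages.pop(0))
        if pruned:
            # Fold whatever was pruned into the running summary.
            self.summary = self.summarize(self.summary, pruned)

    def history(self) -> str:
        head = [f"System: {self.summary}"] if self.summary else []
        return "\n".join(head + self.messages)


buffer_sketch = SummaryBufferSketch(max_token_limit=7, summarize=fake_summarize)
buffer_sketch.save_context("hi", "whats up")
buffer_sketch.save_context("not much you", "not much")

print(buffer_sketch.history())
```

The recent exchange stays verbatim while the older one survives only inside the `System:` line, which has the same shape as the `history` value produced by the real memory earlier in this section.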

## <b><font color='darkblue'>More</font></b>
<b><font size='3ptx'>Some other memory types also exist that I have yet to explore, namely:</font></b>
* [**ConversationEntityMemory**](https://api.python.langchain.com/en/latest/memory/langchain.memory.entity.ConversationEntityMemory.html): This extracts and stores key entities (like people, places etc.) from the discussion.
* [**Conversation Knowledge Graph Memory**](https://api.python.langchain.com/en/latest/memory/langchain.memory.kg.ConversationKGMemory.html) : Uses a knowledge graph to store information and relationships between entities.
* [**VectorStore-Backed Memory**](https://js.langchain.com/v0.1/docs/modules/memory/types/vectorstore_retriever_memory/): Uses vector embeddings to store and retrieve information based on semantic similarity.

## <b><font color='darkblue'>Supplement</font></b>
* [Pinecone - Conversational Memory for LLMs with Langchain](https://www.pinecone.io/learn/series/langchain/langchain-conversational-memory/)
> Conversational memory is how a chatbot can respond to multiple queries in a chat-like manner. It enables a coherent conversation, and without it, every query would be treated as an entirely independent input without considering past interactions.
* [Langchain doc - Memory](https://python.langchain.com/v0.1/docs/modules/memory/)
> Most LLM applications have a conversational interface. An essential component of a conversation is being able to refer to information introduced earlier in the conversation...
* [DeepLearning.AI - LangChain for LLM Application Development: Memory](https://learn.deeplearning.ai/courses/langchain/lesson/3/memory)