# LangChain: Memory
LLMs are stateless by default: they do not remember earlier turns of a conversation, which is a problem when building chatbots. This notebook shows how to hold a conversation by using LangChain's memory classes.

## Outline
* ConversationBufferMemory
    - Stores the messages and then extracts them into a variable.
* ConversationBufferWindowMemory
    - Keeps a list of the interactions of the conversation over time and uses only the last $k$ interactions.
* ConversationTokenBufferMemory
    - Keeps a buffer of recent interactions in memory and uses token length, rather than the number of interactions, to determine when to flush interactions.
* ConversationSummaryMemory
    - Creates a summary of the conversation over time.

## Additional Memory Types
* Vector data memory
    - Stores text (from conversations or elsewhere) in a vector database and retrieves the most relevant blocks of text
* Entity memory
    - Uses an LLM to remember details about specific entities (for example, people).
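
To make the vector data memory idea concrete, here is a minimal sketch that uses only the standard library. It is not how LangChain or any real vector database works internally: the `embed` function is a toy bag-of-words counter standing in for a real embedding model, and `ToyVectorMemory` is a hypothetical name chosen for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorMemory:
    """Hypothetical in-memory store: keeps (text, vector) pairs and ranks by similarity."""

    def __init__(self):
        self.entries = []

    def add(self, text):
        self.entries.append((text, embed(text)))

    def retrieve(self, query, top_k=1):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


vector_memory = ToyVectorMemory()
vector_memory.add("My favourite food is pizza")
vector_memory.add("I work as a data engineer")
vector_memory.add("My dog is called Rex")

# Only the most relevant stored text is returned, not the whole history.
print(vector_memory.retrieve("what is my favourite food"))
```

The point of the design is that only the retrieved blocks need to be sent to the LLM, so the prompt stays small even when a lot of text has been stored.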
 
You can also use multiple memories at a time, e.g. conversation memory plus entity memory to recall individuals.

You can also store the conversation in a conventional database (such as a key-value store or SQL).
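
As a rough illustration of the conventional-database option, the sketch below stores conversation turns in SQLite through Python's built-in `sqlite3` module. The table layout and the helper names `save_message` and `load_history` are assumptions made for this example and are not part of LangChain.

```python
import sqlite3

# In-memory database for the demo; a file path would make the history persistent.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "session_id TEXT, role TEXT, content TEXT)"
)


def save_message(session_id, role, content):
    """Append one message to the given conversation (hypothetical helper)."""
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()


def load_history(session_id):
    """Return the conversation as text, oldest message first (hypothetical helper)."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    return "\n".join(f"{role}: {content}" for role, content in rows)


save_message("demo", "Human", "Hi")
save_message("demo", "AI", "What's up")
stored_history = load_history("demo")
print(stored_history)
```

The string returned by `load_history` could then be placed in the prompt in the same way as the history kept by the memory classes used in this notebook.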

## ConversationBufferMemory

In [None]:
import os

# load the OpenAI API key
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

import warnings
warnings.filterwarnings('ignore')

Note: LLMs do not always produce the same results. When executing the code in your notebook, you may get slightly different answers than those in the video.

In [None]:
# account for deprecation of LLM model
import datetime
# Get the current date
current_date = datetime.datetime.now().date()

# Define the date after which the model should be set to "gpt-3.5-turbo"
target_date = datetime.date(2024, 6, 12)

# Set the model variable based on the current date
if current_date > target_date:
    llm_model = "gpt-3.5-turbo"
else:
    llm_model = "gpt-3.5-turbo-0301"

In [None]:
# import some other tools from langchain that we will need
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory


In [None]:
# A good example of using memory is to use LangChain to manage a chat or
# a chatbot conversation.

# Set up an LLM as a chat interface with temperature 0
llm = ChatOpenAI(temperature=0.0, model=llm_model)
# Use a conversation buffer memory as the memory
memory = ConversationBufferMemory()

# Build a conversation chain. We will see later what a chain is.
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True  # set to False to hide what LangChain is doing
)

In [None]:
# Start conversation
conversation.predict(input="Hi, my name is Abhishek")

In [None]:
# Ask another question
conversation.predict(input="What is 1+1?")

In [None]:
# Now check whether the LLM can remember my name. With verbose=True you can
# see the conversation history that is stored and sent along with each prompt.
conversation.predict(input="What is my name?")

In [None]:
# At the start we defined the `memory` variable to store the history
print(memory.buffer)

In [None]:
# {} is an empty dictionary (other inputs could be passed here). The result is what LangChain has remembered.
memory.load_memory_variables({})

In [None]:
# LangChain stores the history using `ConversationBufferMemory`; create a fresh one to inspect it directly
memory = ConversationBufferMemory()

In [None]:
# Add an input/output pair to the memory by hand
memory.save_context({"input": "Hi"}, 
                    {"output": "What's up"})

In [None]:
print(memory.buffer)

In [None]:
memory.load_memory_variables({})

In [None]:
memory.save_context({"input": "Not much, just hanging"}, 
                    {"output": "Cool"})

In [None]:
memory.load_memory_variables({})

## ConversationBufferWindowMemory
When using an LLM for conversation:

- The LLM itself is stateless.
    - Each transaction is independent.
    - Chatbots appear to have memory because the full conversation is provided as 'context'.

The memory stores the whole conversation so far, and that history is passed to the LLM as context so that it can generate the next output.

As the stored history grows, sending it to the LLM becomes more expensive, because more tokens are sent on every call. LangChain provides different kinds of memory for storing and accumulating the conversation.
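
The growth in prompt size can be sketched without calling any model. In the example below, `fake_llm` is a hypothetical stand-in for a stateless model call, `build_prompt` is an assumed helper that glues the history in front of the new input, and words are counted as a crude proxy for tokens.

```python
def build_prompt(history, new_input):
    """Join all earlier turns and the new input into one prompt string."""
    lines = [f"Human: {human}\nAI: {ai}" for human, ai in history]
    lines.append(f"Human: {new_input}\nAI:")
    return "\n".join(lines)


def fake_llm(prompt):
    """Hypothetical stand-in for a stateless model call; it keeps nothing between calls."""
    return "ok"


history = []       # (human, ai) pairs kept by the application, not by the model
prompt_sizes = []  # rough size in words of the prompt sent on each turn

for user_input in ["Hi, my name is Abhishek", "What is 1+1?", "What is my name?"]:
    prompt = build_prompt(history, user_input)
    prompt_sizes.append(len(prompt.split()))
    reply = fake_llm(prompt)
    history.append((user_input, reply))

# Each prompt is longer than the previous one because it carries the full history.
print(prompt_sizes)
```

The last prompt still contains the name from the first turn, which is the only reason a stateless model could answer "What is my name?".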

In [None]:
# Let's look at another type of memory. It keeps only a window of the conversation.
from langchain.memory import ConversationBufferWindowMemory

In [None]:
# k=1 means we remember only one conversational exchange.
memory = ConversationBufferWindowMemory(k=1)               

In [None]:
memory.save_context({"input": "Hi"},
                    {"output": "What's up"})

memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})


In [None]:
# It remembers only the most recent exchange.
memory.load_memory_variables({})

In [None]:
# Rerun the conversation that we have
llm = ChatOpenAI(temperature=0.0, model=llm_model)
memory = ConversationBufferWindowMemory(k=1)
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=False  # change this to True to see what the LLM is doing and how the memory changes
)

In [None]:
conversation.predict(input="Hi, my name is Abhishek")

In [None]:
conversation.predict(input="What is 1+1?")

In [None]:
# Because k=1, only the latest exchange is remembered, so the name is no longer available
conversation.predict(input="What is my name?")

Exercise: set `verbose=True` in the chain above and see what the LLM is doing and how the memory changes.
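
The windowing behaviour itself can be sketched in a few lines of plain Python. `ToyWindowMemory` is a hypothetical class written for this illustration, not LangChain's implementation; it only mirrors the `save_context` / `load_memory_variables` shape used in this notebook.

```python
from collections import deque


class ToyWindowMemory:
    """Hypothetical sketch: keep only the last k (input, output) exchanges."""

    def __init__(self, k):
        # A deque with maxlen drops the oldest exchange automatically.
        self.turns = deque(maxlen=k)

    def save_context(self, inputs, outputs):
        self.turns.append((inputs["input"], outputs["output"]))

    def load_memory_variables(self):
        history = "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)
        return {"history": history}


window = ToyWindowMemory(k=1)
window.save_context({"input": "Hi"}, {"output": "What's up"})
window.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})

# With k=1 the first exchange has already been dropped.
print(window.load_memory_variables())
```

Raising `k` trades a longer prompt for a longer memory, which is the same trade-off as with the real class.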

## ConversationTokenBufferMemory
`ConversationTokenBufferMemory` limits the number of tokens saved in memory. Because a lot of LLM pricing is based on tokens, this limit maps more directly to the cost of the LLM calls.

In [None]:
#!pip install tiktoken

In [None]:
from langchain.memory import ConversationTokenBufferMemory
from langchain.llms import OpenAI
llm = ChatOpenAI(temperature=0.0, model=llm_model)

In [None]:
# Limit the tokens to 50 `max_token_limit=50`
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=50)

memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})

memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})

memory.save_context({"input": "Chatbots are what?"}, 
                    {"output": "Charming!"})

In [None]:
memory.load_memory_variables({})

Exercise: Change the token limit and check how much of the conversation is stored. Different LLMs count tokens in different ways.

Change the prompt and see whether the result changes.
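
For intuition about token-based pruning, here is a minimal sketch under strong assumptions: tokens are approximated by whitespace-separated words (a real tokenizer such as the one in `tiktoken` counts differently), and `ToyTokenBufferMemory` is a hypothetical class, not LangChain's code.

```python
class ToyTokenBufferMemory:
    """Hypothetical sketch: drop the oldest messages once a token budget is exceeded."""

    def __init__(self, max_token_limit):
        self.max_token_limit = max_token_limit
        self.messages = []  # flat list of "Human: ..." and "AI: ..." strings

    @staticmethod
    def count_tokens(text):
        # Assumption: one whitespace-separated word counts as one token.
        return len(text.split())

    def save_context(self, inputs, outputs):
        self.messages.append(f"Human: {inputs['input']}")
        self.messages.append(f"AI: {outputs['output']}")
        # Remove messages from the front until the buffer fits the limit.
        while sum(self.count_tokens(m) for m in self.messages) > self.max_token_limit:
            self.messages.pop(0)

    def load_memory_variables(self):
        return {"history": "\n".join(self.messages)}


token_memory = ToyTokenBufferMemory(max_token_limit=8)
token_memory.save_context({"input": "AI is what?!"}, {"output": "Amazing!"})
token_memory.save_context({"input": "Backpropagation is what?"}, {"output": "Beautiful!"})
token_memory.save_context({"input": "Chatbots are what?"}, {"output": "Charming!"})

print(token_memory.load_memory_variables())
```

Because messages are dropped one at a time, the buffer can start in the middle of an exchange, with an AI reply whose question has already been removed.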

## ConversationSummaryMemory
Instead of limiting the memory to a fixed number of tokens or to a fixed number of recent conversational exchanges, we can use an LLM to write a summary of the conversation so far and let that be the memory.

The cells in this section use the `ConversationSummaryBufferMemory` class.

In [None]:
from langchain.memory import ConversationSummaryBufferMemory

In [None]:
# create a long string describing a schedule
schedule = "There is a meeting at 8am with your product team. \
You will need your powerpoint presentation prepared. \
9am-12pm have time to work on your LangChain \
project which will go quickly because Langchain is such a powerful tool. \
At Noon, lunch at the italian resturant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."

In [None]:
# create a conversation summary buffer memory
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=500)

memory.save_context({"input": "Hello"}, {"output": "What's up"})

memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})

memory.save_context({"input": "What is on the schedule today?"}, 
                    {"output": f"{schedule}"})

# With max_token_limit=500 the memory keeps the whole conversation, because it fits within the limit.
# With a smaller limit, older messages are summarized and only the latest ones are kept verbatim.

In [None]:
memory.load_memory_variables({})

In [None]:
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True
)

In [None]:
conversation.predict(input="What would be a good demo to show?")
# If the token limit is small enough for a summary to be generated, it appears as a "System:" line
# in the prompt. This is not the OpenAI API system message.

In [None]:
# Check what happened to the memory.
memory.load_memory_variables({})

# The most recent exchange is kept verbatim, whereas older messages, including
# the human utterances, are incorporated into the summary in the system message.

With the conversation summary buffer memory, messages are stored explicitly up to the token limit you define, and anything beyond that limit is summarized by the LLM. The summary then appears at the start of the history.
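
The combination of a verbatim buffer and a running summary can be sketched as follows. This is a simplified illustration, not LangChain's implementation: `ToySummaryBufferMemory` is a hypothetical class, words stand in for tokens, and `fake_summarize` replaces the LLM call that would write the real summary.

```python
class ToySummaryBufferMemory:
    """Hypothetical sketch: recent messages verbatim, older ones folded into a summary."""

    def __init__(self, summarize, max_token_limit):
        self.summarize = summarize  # callable(old_summary, pruned_messages) -> new summary
        self.max_token_limit = max_token_limit
        self.summary = ""
        self.messages = []

    @staticmethod
    def count_tokens(text):
        # Assumption: one whitespace-separated word counts as one token.
        return len(text.split())

    def save_context(self, inputs, outputs):
        self.messages.append(f"Human: {inputs['input']}")
        self.messages.append(f"AI: {outputs['output']}")
        pruned = []
        # Move the oldest messages out of the buffer until it fits the limit.
        while sum(self.count_tokens(m) for m in self.messages) > self.max_token_limit:
            pruned.append(self.messages.pop(0))
        if pruned:
            self.summary = self.summarize(self.summary, pruned)

    def load_memory_variables(self):
        parts = ([f"System: {self.summary}"] if self.summary else []) + self.messages
        return {"history": "\n".join(parts)}


def fake_summarize(old_summary, pruned):
    """Hypothetical stand-in for the LLM summarization call."""
    prefix = old_summary + " " if old_summary else ""
    return prefix + f"[summary of {len(pruned)} message(s)]"


summary_memory = ToySummaryBufferMemory(fake_summarize, max_token_limit=6)
summary_memory.save_context({"input": "Hello"}, {"output": "What's up"})
summary_memory.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})

print(summary_memory.load_memory_variables())
```

With a generous limit nothing is pruned and no summary line appears, which matches the behaviour described for `max_token_limit=500` in the cells above.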