# LangChain: Memory

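LLMs are stateless: each call is independent, so by default the model does not remember earlier turns of a conversation. LangChain adds memory by storing the conversation and feeding (part of) it back into each prompt, using classes such as: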
- ConversationBufferMemory
- ConversationBufferWindowMemory
- ConversationTokenBufferMemory
- ConversationSummaryBufferMemory

---

## Setup

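```python
import os

import openai
from dotenv import load_dotenv, find_dotenv

# Load the Azure OpenAI settings from a local .env file into os.environ.
_ = load_dotenv(find_dotenv())

# Module-level configuration in the style of the pre-1.0 `openai` SDK.
openai.api_type = os.environ.get("OPENAI_API_TYPE")  # "azure" for Azure OpenAI
openai.api_base = os.environ.get("OPENAI_API_BASE")  # your Azure endpoint URL
openai.api_key = os.environ.get("OPENAI_API_KEY")
openai.api_version = os.environ.get("OPENAI_API_VERSION")
```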
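If one of these variables is missing, the problem usually only shows up later as an authentication or endpoint error on the first LLM call. A small fail-fast check helps. This is a sketch that assumes the four variable names above are the ones your `.env` defines; `missing_env_vars` is a hypothetical helper, not part of `openai` or `python-dotenv`.

```python
import os

# The four Azure OpenAI settings read in the setup cell.
REQUIRED_VARS = (
    "OPENAI_API_TYPE",
    "OPENAI_API_BASE",
    "OPENAI_API_KEY",
    "OPENAI_API_VERSION",
)


def missing_env_vars(names=REQUIRED_VARS, environ=os.environ):
    """Return the names that are unset or empty in `environ` (hypothetical helper)."""
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```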

## ConversationBufferMemory
- Unlimited memory: the entire conversation is stored.
- The entire conversation is sent to the LLM on every call.
- Can hit the model's context (token) limit quickly in long conversations.
- Cost grows with every turn, because each call resends the whole, ever-growing history.
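To see why the cost keeps climbing, assume (hypothetically) that every completed exchange costs 50 tokens, every new question costs 10 tokens, and the prompt template is ignored. Call i (counting from 0) then sends i·50 + 10 tokens, so the per-call size grows linearly and the total over n calls grows quadratically: 50·n(n-1)/2 + 10·n. A quick sketch of that arithmetic:

```python
def tokens_sent(num_turns, exchange_tokens, input_tokens):
    """Prompt tokens sent on each call when the full history is replayed.

    Hypothetical simplification: every completed exchange (human + AI) costs
    `exchange_tokens`, every new input costs `input_tokens`, and the prompt
    template itself is ignored.
    """
    return [turn * exchange_tokens + input_tokens for turn in range(num_turns)]


per_call = tokens_sent(num_turns=5, exchange_tokens=50, input_tokens=10)
print(per_call)       # [10, 60, 110, 160, 210]
print(sum(per_call))  # 550
```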

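```python
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
```

```python
# deployment_name is the name of your model deployment in Azure OpenAI.
chat = AzureChatOpenAI(
    deployment_name="gpt4",
    openai_api_version="2023-03-15-preview",
    temperature=0,  # keeps answers as repeatable as possible
)
```

```python
memory = ConversationBufferMemory()
# verbose=True prints the full prompt, including the stored history, on every call.
conversation = ConversationChain(llm=chat, memory=memory, verbose=True)
```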
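With `verbose=True`, the chain prints the prompt it sends. Conceptually, `ConversationChain` fills a prompt template with the memory's `history` string and the new `input` on every call. The sketch below imitates that step with plain string formatting; the template text approximates LangChain's default conversation prompt and may differ between versions, and `build_prompt` plus the sample history are made up for illustration.

```python
# Approximation of the default ConversationChain prompt (may differ by version).
TEMPLATE = """The following is a friendly conversation between a human and an AI. \
The AI is talkative and provides lots of specific details from its context. \
If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
{history}
Human: {input}
AI:"""


def build_prompt(history: str, user_input: str) -> str:
    """Fill the template with the stored history and the new input (sketch)."""
    return TEMPLATE.format(history=history, input=user_input)


# Sample history, made up for illustration.
sample_history = "Human: Hi, my name is Alice.\nAI: Hello Alice!"
print(build_prompt(sample_history, "What's my name?"))
```

Because the earlier turns are pasted into the prompt text itself, the model can "remember" the name without any state of its own.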

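```python
conversation.predict(input="Hi, my name is Alice.")
```

```python
conversation.predict(input="What's 1+1?")
```

```python
conversation.predict(input="What's my name?")  # answerable only because the first turn is in memory
```

```python
memory.buffer  # the stored history as a single string
```

```python
memory.load_memory_variables({})  # {'history': ...}, the dict the chain injects into the prompt
```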

### Adding to memory manually

```python
# save_context appends one exchange (user input + model output) without calling the LLM.
memory2 = ConversationBufferMemory()
memory2.save_context({"input": "Hi"}, {"output": "What's up?"})
memory2.buffer
```

```python
memory2.save_context({"input": "Nothing much, just hanging."}, {"output": "Cool."})
memory2.buffer
```
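Under the hood, `save_context` appends one exchange and `buffer` renders the stored messages as lines prefixed with `Human:` and `AI:`. The stdlib class below is a minimal sketch of that behaviour, not LangChain's implementation; `ToyBufferMemory` is a hypothetical name.

```python
class ToyBufferMemory:
    """Minimal stand-in for ConversationBufferMemory (hypothetical, for illustration)."""

    def __init__(self, human_prefix="Human", ai_prefix="AI"):
        self.human_prefix = human_prefix
        self.ai_prefix = ai_prefix
        self.messages = []  # (prefix, text) pairs, oldest first

    def save_context(self, inputs, outputs):
        # Store one exchange: the user's input followed by the model's output.
        self.messages.append((self.human_prefix, inputs["input"]))
        self.messages.append((self.ai_prefix, outputs["output"]))

    @property
    def buffer(self):
        # Render the whole history as "Prefix: text" lines.
        return "\n".join(f"{prefix}: {text}" for prefix, text in self.messages)

    def load_memory_variables(self, inputs):
        # Expose the history under the "history" key the prompt expects.
        return {"history": self.buffer}


toy = ToyBufferMemory()
toy.save_context({"input": "Hi"}, {"output": "What's up?"})
toy.save_context({"input": "Nothing much, just hanging."}, {"output": "Cool."})
print(toy.buffer)
# Human: Hi
# AI: What's up?
# Human: Nothing much, just hanging.
# AI: Cool.
```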

## ConversationBufferWindowMemory
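- Keeps only a sliding window of the most recent exchanges.
- `k` sets the window size: the number of human/AI exchanges included in the prompt.
- Keeps the prompt size bounded, which makes hitting the token limit unlikely (though a few very long exchanges can still be large).
- Cheaper than `ConversationBufferMemory`, since the number of tokens sent does not keep growing. The trade-off is that anything older than the last `k` exchanges is forgotten.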
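The windowing rule can be sketched in a few lines: since one exchange is two messages, the memory returns the last `2 * k` messages. This is a simplified, hypothetical `ToyWindowMemory`, not the library class. Note the explicit `k == 0` guard, because `messages[-0:]` in Python would return the whole list.

```python
class ToyWindowMemory:
    """Stores every message but exposes only the last k exchanges (hypothetical sketch)."""

    def __init__(self, k):
        self.k = k
        self.messages = []  # (prefix, text) pairs, oldest first

    def save_context(self, inputs, outputs):
        # One exchange = one human message followed by one AI message.
        self.messages.append(("Human", inputs["input"]))
        self.messages.append(("AI", outputs["output"]))

    def load_memory_variables(self, inputs):
        # Keep the last 2 * k messages; guard k == 0 because messages[-0:] is the whole list.
        window = self.messages[-2 * self.k:] if self.k > 0 else []
        return {"history": "\n".join(f"{p}: {t}" for p, t in window)}


window_memory = ToyWindowMemory(k=2)
window_memory.save_context({"input": "Hi"}, {"output": "What's up?"})
window_memory.save_context({"input": "Nothing much, just hanging."}, {"output": "Cool."})
window_memory.save_context({"input": "What's 1+1?"}, {"output": "2"})
print(window_memory.load_memory_variables({})["history"])
```

With `k=2` and the three exchanges used in the notebook, the first exchange (`Hi` / `What's up?`) drops out of the returned history, although it is still stored.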

```python
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=2)  # keep the last 2 exchanges
memory.save_context({"input": "Hi"}, {"output": "What's up?"})
memory.save_context({"input": "Nothing much, just hanging."}, {"output": "Cool."})
memory.save_context({"input": "What's 1+1?"}, {"output": "2"})
memory.buffer  # stored messages; depending on the LangChain version, all of them or only the window
```

```python
memory.load_memory_variables({})  # only the last k=2 exchanges (4 messages)
```

```python
conversation = ConversationChain(llm=chat, memory=memory, verbose=True)
# This memory never received the name, so the model has no way to recall it.
conversation.predict(input="What's my name?")
```

## ConversationTokenBufferMemory
- Limits memory by the number of tokens rather than the number of exchanges.
- Gives tighter control over cost, since it caps the history tokens sent to the LLM.
- Requires an `llm` in addition to `max_token_limit`, because different models tokenize text differently and the memory uses the model to count tokens.
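A sketch of token-based pruning, assuming a toy tokenizer that counts whitespace-separated words. A real memory asks the `llm` for counts (which is why it is required), so real cut-offs will differ. After each `save_context`, whole messages are dropped from the front until the history fits the limit. `ToyTokenBufferMemory` and `count_tokens` are hypothetical.

```python
def count_tokens(text):
    """Hypothetical stand-in tokenizer: one token per whitespace-separated word."""
    return len(text.split())


class ToyTokenBufferMemory:
    """Drops the oldest messages until the history fits max_token_limit (sketch)."""

    def __init__(self, max_token_limit, count=count_tokens):
        self.max_token_limit = max_token_limit
        self.count = count
        self.messages = []  # (prefix, text) pairs, oldest first

    def _total(self):
        # Tokens of the rendered "Prefix: text" lines.
        return sum(self.count(f"{p}: {t}") for p, t in self.messages)

    def save_context(self, inputs, outputs):
        self.messages.append(("Human", inputs["input"]))
        self.messages.append(("AI", outputs["output"]))
        # Prune whole messages from the front, never partial ones.
        while self.messages and self._total() > self.max_token_limit:
            self.messages.pop(0)

    def load_memory_variables(self, inputs):
        return {"history": "\n".join(f"{p}: {t}" for p, t in self.messages)}


token_memory = ToyTokenBufferMemory(max_token_limit=10)
token_memory.save_context({"input": "Hi"}, {"output": "What's up?"})
token_memory.save_context({"input": "Nothing much, just hanging."}, {"output": "Cool."})
token_memory.save_context({"input": "What's 1+1?"}, {"output": "2"})
print(token_memory.load_memory_variables({})["history"])
```

With a limit of 10 toy tokens, the three exchanges from the notebook leave `AI: Cool.`, `Human: What's 1+1?` and `AI: 2`. Pruning works on whole messages, so the remaining history can start with an AI reply.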

```python
from langchain.memory import ConversationTokenBufferMemory

memory = ConversationTokenBufferMemory(llm=chat, max_token_limit=40)
memory.save_context({"input": "Hi"}, {"output": "What's up?"})
memory.save_context({"input": "Nothing much, just hanging."}, {"output": "Cool."})
memory.save_context({"input": "What's 1+1?"}, {"output": "2"})
memory.buffer  # only the most recent whole messages that fit within max_token_limit
```

```python
memory.load_memory_variables({})  # the same pruned history, under the "history" key
```

## ConversationSummaryBufferMemory
- Combines a running summary of older messages with a buffer of recent ones.
- Requires an `llm` along with `max_token_limit`.
- The `llm` is used both to count tokens and to write the summary.
- When the stored messages exceed `max_token_limit`, the oldest ones are removed from the buffer and folded into the summary by the `llm`. The memory then returns the summary (as a `System` message) followed by the most recent messages, kept verbatim, that fit within `max_token_limit`.
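Extending the same pruning loop with a running summary gives a sketch of how the summary buffer behaves. The LLM summarization call is replaced by `fake_summarize`, a hypothetical placeholder that just concatenates the pruned lines so the flow stays visible; a real memory would ask the `llm` to compress them. Token counting again uses a whitespace-word stand-in, and in this sketch the summary itself is not counted against the limit. `ToySummaryBufferMemory` is likewise a hypothetical name.

```python
def count_tokens(text):
    """Hypothetical tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def fake_summarize(previous_summary, new_lines):
    """Placeholder for the LLM call: a real LLM would compress; this only concatenates."""
    return " / ".join(filter(None, [previous_summary, *new_lines]))


class ToySummaryBufferMemory:
    """Recent messages verbatim plus a running summary of pruned ones (sketch)."""

    def __init__(self, max_token_limit, summarize, count=count_tokens):
        self.max_token_limit = max_token_limit
        self.summarize = summarize
        self.count = count
        self.summary = ""   # running summary of pruned messages
        self.messages = []  # recent (prefix, text) pairs kept verbatim

    @staticmethod
    def _lines(messages):
        return [f"{p}: {t}" for p, t in messages]

    def _total(self):
        return sum(self.count(line) for line in self._lines(self.messages))

    def save_context(self, inputs, outputs):
        self.messages.append(("Human", inputs["input"]))
        self.messages.append(("AI", outputs["output"]))
        pruned = []
        while self.messages and self._total() > self.max_token_limit:
            pruned.append(self.messages.pop(0))
        if pruned:
            # Fold everything pruned in this step into the summary with one call.
            self.summary = self.summarize(self.summary, self._lines(pruned))

    def load_memory_variables(self, inputs):
        lines = self._lines(self.messages)
        if self.summary:
            lines.insert(0, f"System: {self.summary}")
        return {"history": "\n".join(lines)}


summary_memory = ToySummaryBufferMemory(max_token_limit=10, summarize=fake_summarize)
summary_memory.save_context({"input": "Hi"}, {"output": "What's up?"})
summary_memory.save_context({"input": "Nothing much, just hanging."}, {"output": "Cool."})
summary_memory.save_context({"input": "What's 1+1?"}, {"output": "2"})
print(summary_memory.load_memory_variables({})["history"])
```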

```python
from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(llm=chat, max_token_limit=40)
memory.save_context({"input": "Hi"}, {"output": "What's up?"})
memory.save_context({"input": "Nothing much, just hanging."}, {"output": "Cool."})
memory.save_context({"input": "What's 1+1?"}, {"output": "2"})
memory.buffer  # recent, unsummarized messages (in older LangChain versions the summary is not included here)
```

```python
# The summary (as a System line) followed by the unsummarized recent messages.
memory.load_memory_variables({})
```

```python
conversation = ConversationChain(llm=chat, memory=memory, verbose=True)
conversation.predict(input="What is the capital of India?")
```

```python
memory.load_memory_variables({})  # the updated summary plus the latest messages
```

## Additional Memory Types
- **Vector data memory**: stores text in a vector database and retrieves the stored text most similar (relevant) to the current input.
- **Entity memory**: uses an LLM to extract and remember details about specific entities (such as people, places, or organizations).
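Vector data memory can be sketched without a database: embed each stored exchange, embed the new input, and return the most similar entries. The sketch below assumes a bag-of-words vector and cosine similarity as stand-ins for a real embedding model and vector store; `embed`, `cosine` and `ToyVectorMemory` are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text):
    """Hypothetical embedding: lowercase bag-of-words counts (a real setup uses an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorMemory:
    """Stores past exchanges and returns the k most similar to the new input (sketch)."""

    def __init__(self, k=1):
        self.k = k
        self.entries = []  # (text, vector) pairs

    def save_context(self, inputs, outputs):
        text = f"Human: {inputs['input']}\nAI: {outputs['output']}"
        self.entries.append((text, embed(text)))

    def load_memory_variables(self, inputs):
        # Rank by similarity to the current input, not by recency.
        query = embed(inputs["input"])
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[1]), reverse=True)
        return {"history": "\n".join(text for text, _ in ranked[: self.k])}


vector_memory = ToyVectorMemory(k=1)
vector_memory.save_context({"input": "My favourite colour is green"}, {"output": "Noted!"})
vector_memory.save_context({"input": "What's 1+1?"}, {"output": "2"})
vector_memory.save_context({"input": "I live in Paris"}, {"output": "Nice city!"})
print(vector_memory.load_memory_variables({"input": "Which colour is my favourite?"}))
```

Unlike the buffer-based memories, relevance rather than recency decides what reaches the prompt, so an old exchange can resurface when the new input is about the same topic.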

> Note: Multiple memory types can be used together. The conversation can also be persisted in a database, such as a key-value store or a SQL database.
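As a sketch of persistence, the stdlib `sqlite3` module is enough to store each exchange per session and rebuild the history string later. The schema and helper names (`open_store`, `save_exchange`, `load_history`) are hypothetical and unrelated to LangChain's own storage integrations.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a message table; ':memory:' keeps everything in RAM for the demo."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               session_id TEXT NOT NULL,
               role TEXT NOT NULL,
               content TEXT NOT NULL
           )"""
    )
    return conn


def save_exchange(conn, session_id, user_text, ai_text):
    """Persist one human/AI exchange, mirroring save_context."""
    with conn:  # commits the transaction on success
        conn.executemany(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            [(session_id, "Human", user_text), (session_id, "AI", ai_text)],
        )


def load_history(conn, session_id):
    """Rebuild the history string for one session, oldest message first."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    return "\n".join(f"{role}: {content}" for role, content in rows)


store = open_store()
save_exchange(store, "alice", "Hi, my name is Alice.", "Nice to meet you, Alice!")
save_exchange(store, "bob", "Hi", "Hello!")
print(load_history(store, "alice"))
```

Keying rows by a session id keeps separate conversations apart; the rebuilt string can then seed any of the in-memory classes above via `save_context`.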