# LangChain: Memory

## Outline
* ConversationBufferMemory
* ConversationBufferWindowMemory
* ConversationTokenBufferMemory
* ConversationSummaryMemory

## ConversationBufferMemory

In [63]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

import warnings
warnings.filterwarnings('ignore')

Note: LLMs do not always produce the same results. When you run the code in your notebook, you may get slightly different answers than those in the video.

In [64]:
# account for deprecation of LLM model
import datetime
# Get the current date
current_date = datetime.datetime.now().date()

# Define the date after which the model should be set to "gpt-3.5-turbo"
target_date = datetime.date(2024, 6, 12)

# Set the model variable based on the current date
if current_date > target_date:
    llm_model = "gpt-3.5-turbo"
else:
    llm_model = "gpt-3.5-turbo-0301"

1. ChatOpenAI - LangChain's wrapper around OpenAI's chat models.
2. ConversationChain - A chain in LangChain is a sequence of steps (prompt → model → output).
    - ConversationChain is a built-in chain for chat-like interactions.
    - It automatically structures the prompt so the LLM behaves like a conversational partner.
3. ConversationBufferMemory - A memory module for storing past conversation turns.
    - By default, LLMs don’t remember what was said before; each call is independent.
    - ConversationBufferMemory simply stores the raw conversation history (like a running transcript) ↓

In [65]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory


1. create the LLM
2. set up memory
3. create conversation chain ↓

In [66]:
# 1. Create the LLM
llm = ChatOpenAI(temperature=0.0, model=llm_model)

# 2. Set up memory: this object keeps track of the conversation history,
#    storing the dialogue like a running transcript (past inputs + outputs).
#    It acts as the chatbot's short-term memory.
memory = ConversationBufferMemory()

# 3. Create the conversation chain, which manages interactions between you and the LLM
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

This is how you interact with the chatbot, one message at a time.
- Each call to .predict(...) both:
    - Returns the model’s answer.
    - Updates the memory so the bot remembers what was just said ↓


In [67]:
conversation.predict(input="Hi, my name is Andrew")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi, my name is Andrew
AI:

> Finished chain.


"Hello Andrew! It's nice to meet you. How can I assist you today?"

In [68]:
conversation.predict(input="What is 1+1?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Andrew
AI: Hello Andrew! It's nice to meet you. How can I assist you today?
Human: What is 1+1?
AI:

> Finished chain.


'1+1 equals 2. Is there anything else you would like to know?'

In [69]:
conversation.predict(input="What is my name?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Andrew
AI: Hello Andrew! It's nice to meet you. How can I assist you today?
Human: What is 1+1?
AI: 1+1 equals 2. Is there anything else you would like to know?
Human: What is my name?
AI:

> Finished chain.


'Your name is Andrew. Is there anything else you would like to know or discuss?'
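The name is recalled in the third call only because the chain resends the whole transcript each time; the model itself keeps no state between calls. Here is a minimal sketch of that idea in plain Python. `fake_llm` and `toy_predict` are hypothetical stand-ins invented for this illustration (they are not LangChain functions), and the "model" is just a pattern match over the prompt text.

```python
import re

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat model: it only sees the prompt text."""
    question = prompt.rsplit("Human:", 1)[-1]  # text of the latest human message
    if "my name?" in question:
        match = re.search(r"my name is (\w+)", prompt)
        return f"Your name is {match.group(1)}." if match else "I don't know your name."
    return "OK."

history = []  # running transcript, one line per message

def toy_predict(user_input: str, use_memory: bool = True) -> str:
    """Build the prompt (optionally with history), call the model, record the turn."""
    lines = (history if use_memory else []) + [f"Human: {user_input}", "AI:"]
    answer = fake_llm("\n".join(lines))
    history.extend([f"Human: {user_input}", f"AI: {answer}"])
    return answer

toy_predict("Hi, my name is Andrew")
with_memory = toy_predict("What is my name?")
without_memory = toy_predict("What is my name?", use_memory=False)
print(with_memory)     # the transcript was in the prompt
print(without_memory)  # the same question, sent without any history
```

With the transcript in the prompt the toy model can answer; without it, the same question fails, which is the behavior the memory object is there to prevent.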

- memory - stores the entire conversation history in a simple text format
- .buffer - returns the conversation transcript as one long string
- print - displays the transcript in the notebook/console
    - useful for debugging or checking what the AI currently remembers ↓

In [70]:
print(memory.buffer)

Human: Hi, my name is Andrew
AI: Hello Andrew! It's nice to meet you. How can I assist you today?
Human: What is 1+1?
AI: 1+1 equals 2. Is there anything else you would like to know?
Human: What is my name?
AI: Your name is Andrew. Is there anything else you would like to know or discuss?


- .load_memory_variables({}) - This method returns the memory contents in dictionary form, instead of as a plain string like .buffer.
- The {} you pass in is just a placeholder for inputs — the method expects a dictionary argument (often context from a chain), but here you’re passing an empty one because you don’t need to provide anything.
- So instead of just a string (memory.buffer), you get a dict with a key (history) and the conversation transcript as its value ↓

In [71]:
memory.load_memory_variables({})

{'history': "Human: Hi, my name is Andrew\nAI: Hello Andrew! It's nice to meet you. How can I assist you today?\nHuman: What is 1+1?\nAI: 1+1 equals 2. Is there anything else you would like to know?\nHuman: What is my name?\nAI: Your name is Andrew. Is there anything else you would like to know or discuss?"}
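One way to see why the dictionary form is useful: its key (`history`) matches a placeholder in the prompt that the chain fills in. The sketch below imitates that step with plain `str.format`. The shortened `TEMPLATE` and the `build_prompt` helper are assumptions made for illustration, not the chain's actual code.

```python
# Shortened, hypothetical version of the prompt seen in the verbose output:
# {history} is filled from memory, {input} from the new user message.
TEMPLATE = "Current conversation:\n{history}\nHuman: {input}\nAI:"

def build_prompt(memory_variables: dict, user_input: str) -> str:
    """Hypothetical helper: merge the memory dict and the new input into the template."""
    return TEMPLATE.format(**memory_variables, input=user_input)

memory_variables = {"history": "Human: Hi\nAI: What's up"}
prompt = build_prompt(memory_variables, "Not much, just hanging")
print(prompt)
```

Because the memory returns a dict, its keys can be unpacked straight into the template, which a bare string would not allow.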

- create a fresh, empty memory object (the `memory` variable no longer holds the transcript from the conversation above): ↓

In [72]:
memory = ConversationBufferMemory()

- You can also add conversation turns to the memory manually, which pre-fills it.
- Normally, you don’t call save_context(...) directly.
    - It happens automatically when you run a chain, e.g. conversation.predict(input="Hi").
- But you can call it yourself to inject custom turns into memory (e.g., preloading a chatbot with a backstory or a pre-existing chat) ↓

In [73]:
memory.save_context({"input": "Hi"}, 
                    {"output": "What's up"})

In [74]:
print(memory.buffer)

Human: Hi
AI: What's up


In [75]:
memory.load_memory_variables({})

{'history': "Human: Hi\nAI: What's up"}

In [76]:
memory.save_context({"input": "Not much, just hanging"}, 
                    {"output": "Cool"})

In [77]:
memory.load_memory_variables({})

{'history': "Human: Hi\nAI: What's up\nHuman: Not much, just hanging\nAI: Cool"}
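The calls above can be mimicked with a few lines of plain Python, which may make the relationship between `save_context`, `.buffer`, and `load_memory_variables` easier to see. `ToyBufferMemory` is a hypothetical, simplified stand-in written for this note; it borrows the method names but is not how LangChain implements the class.

```python
class ToyBufferMemory:
    """Hypothetical, simplified stand-in for a buffer memory (not LangChain code)."""

    def __init__(self):
        self.messages = []  # list of (speaker, text) tuples

    def save_context(self, inputs: dict, outputs: dict) -> None:
        # One call stores one exchange: the human input and the AI output.
        self.messages.append(("Human", inputs["input"]))
        self.messages.append(("AI", outputs["output"]))

    @property
    def buffer(self) -> str:
        # The transcript as one newline-joined string.
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.messages)

    def load_memory_variables(self, inputs: dict) -> dict:
        # Same transcript, wrapped in a dict under the "history" key.
        return {"history": self.buffer}

toy_memory = ToyBufferMemory()
toy_memory.save_context({"input": "Hi"}, {"output": "What's up"})
toy_memory.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
print(toy_memory.load_memory_variables({}))
```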

## ConversationBufferWindowMemory

- Efficiency: Keeps prompts short → faster and cheaper.
- Control: Prevents the model from being influenced by very old messages.
- Useful for Q&A: If you only care about the latest question and answer.
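To put a rough number on the efficiency point, the sketch below counts how many history tokens get resent over a whole conversation. The figures (20 turns, 50 tokens per exchange) are made-up assumptions, and `history_tokens_sent` is a hypothetical helper, so treat the result as an order-of-magnitude illustration only.

```python
def history_tokens_sent(num_turns: int, tokens_per_exchange: int, k=None) -> int:
    """Total history tokens resent over a conversation.

    k=None models a full buffer; an integer k models a window of k exchanges.
    """
    total = 0
    for turn in range(num_turns):
        # `turn` exchanges already exist before this call is made.
        remembered = turn if k is None else min(turn, k)
        total += remembered * tokens_per_exchange
    return total

full = history_tokens_sent(num_turns=20, tokens_per_exchange=50)
windowed = history_tokens_sent(num_turns=20, tokens_per_exchange=50, k=2)
print(full, windowed)  # 9500 1850
```

With a full buffer the resent history grows with every turn, so the total grows roughly with the square of the conversation length; with a window it stays capped at k exchanges per call.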

In [78]:
from langchain.memory import ConversationBufferWindowMemory

Instead of saving the entire conversation history, it only remembers the last k exchanges (one exchange = one human message plus the AI reply).
- This keeps the context short, which is useful if:
    - Your conversation gets long.
    - You want to avoid sending too much text back to the LLM
- k=1 - the window size (only remember the most recent exchange) ↓

In [79]:
memory = ConversationBufferWindowMemory(k=1)               

In [80]:
memory.save_context({"input": "Hi"},
                    {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})


- retrieve the memory in dictionary form
- returns a dictionary with the key "history"
    - the value is the retained part of the transcript as a single string
    - with k=1, that is only the last saved exchange: "Not much, just hanging" / "Cool" ↓

In [81]:
memory.load_memory_variables({})

{'history': 'Human: Not much, just hanging\nAI: Cool'}

(my example) If I wanted it to remember the last 2 exchanges, I would use k=2

In [84]:
memory = ConversationBufferWindowMemory(k=2)
memory.save_context({"input": "Hey"},
                    {"output": "whazzzup"})
memory.save_context({"input": "all good man"},
                    {"output": "nice"})

memory.load_memory_variables({})

{'history': 'Human: Hey\nAI: whazzzup\nHuman: all good man\nAI: nice'}
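The windowing behavior can be sketched with a `collections.deque` whose `maxlen` is k, so the oldest exchange falls off automatically. `ToyWindowMemory` and `replay` are hypothetical names used only for this illustration; this is not LangChain's implementation.

```python
from collections import deque

class ToyWindowMemory:
    """Hypothetical sketch of a k-exchange window (not LangChain's implementation)."""

    def __init__(self, k: int):
        # A deque with maxlen=k silently discards the oldest exchange when full.
        self.exchanges = deque(maxlen=k)

    def save_context(self, inputs: dict, outputs: dict) -> None:
        self.exchanges.append((inputs["input"], outputs["output"]))

    def load_memory_variables(self, inputs: dict) -> dict:
        lines = []
        for human, ai in self.exchanges:
            lines += [f"Human: {human}", f"AI: {ai}"]
        return {"history": "\n".join(lines)}

def replay(k: int) -> str:
    """Save the same two exchanges and return what a window of size k retains."""
    window = ToyWindowMemory(k=k)
    window.save_context({"input": "Hi"}, {"output": "What's up"})
    window.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
    return window.load_memory_variables({})["history"]

print(repr(replay(k=1)))  # only the last exchange survives
print(repr(replay(k=2)))  # both exchanges fit in the window
```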

In [85]:
llm = ChatOpenAI(temperature=0.0, model=llm_model)
memory = ConversationBufferWindowMemory(k=1)
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=False
)

In [86]:
conversation.predict(input="Hi, my name is Andrew")

"Hello Andrew! It's nice to meet you. How can I assist you today?"

In [87]:
conversation.predict(input="What is 1+1?")

'1+1 equals 2. Is there anything else you would like to know?'

In [88]:
conversation.predict(input="What is my name?")

"I'm sorry, I do not have access to personal information such as your name. Is there anything else you would like to ask?"

## ConversationTokenBufferMemory

- type of memory that manages conversation history using tokens instead of turns
- It’s similar to ConversationBufferMemory (stores the transcript), but with an important twist:
    - Instead of keeping ALL past messages or JUST THE LAST K exchanges (ConversationBufferWindowMemory)
    - It keeps as much history as possible up to a maximum number of tokens.

- LLMs don’t see text as words; they see tokens (chunks of words).
    - Each model has a token limit (e.g., 4096, 8192, 128k tokens).
    - If your conversation gets too long, you can’t fit it all into the model’s context window.
- ConversationTokenBufferMemory solves this by:
    - Tracking how many tokens your conversation uses.
    - Automatically trimming old messages if the token count would exceed your set limit

In [53]:
#!pip install tiktoken

In [89]:
from langchain.memory import ConversationTokenBufferMemory
# The LLM is passed to the memory below so it can count tokens the way this model does
llm = ChatOpenAI(temperature=0.0, model=llm_model)

In [90]:
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=50)
memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"}, 
                    {"output": "Charming!"})

In [91]:
memory.load_memory_variables({})

{'history': 'AI: Amazing!\nHuman: Backpropagation is what?\nAI: Beautiful!\nHuman: Chatbots are what?\nAI: Charming!'}
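Notice that the output starts with "AI: Amazing!": the very first human message was dropped on its own, which suggests trimming happens one message at a time, oldest first. The sketch below imitates that with a crude word-count "tokenizer". `count_tokens`, `ToyTokenBufferMemory`, and the limit of 15 are assumptions chosen so the toy reproduces the same shape; a real tokenizer counts differently.

```python
def count_tokens(line: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(line.split())

class ToyTokenBufferMemory:
    """Hypothetical sketch: keep the newest messages that fit in a token budget."""

    def __init__(self, max_token_limit: int):
        self.max_token_limit = max_token_limit
        self.lines = []

    def save_context(self, inputs: dict, outputs: dict) -> None:
        self.lines.append(f"Human: {inputs['input']}")
        self.lines.append(f"AI: {outputs['output']}")
        # Drop the oldest message, one at a time, until the budget is met.
        while self.lines and sum(map(count_tokens, self.lines)) > self.max_token_limit:
            self.lines.pop(0)

    def load_memory_variables(self, inputs: dict) -> dict:
        return {"history": "\n".join(self.lines)}

toy = ToyTokenBufferMemory(max_token_limit=15)
toy.save_context({"input": "AI is what?!"}, {"output": "Amazing!"})
toy.save_context({"input": "Backpropagation is what?"}, {"output": "Beautiful!"})
toy.save_context({"input": "Chatbots are what?"}, {"output": "Charming!"})
print(toy.load_memory_variables({})["history"])
```

In this toy run the three exchanges total 18 "tokens", so one 4-token message is removed to get back under the limit of 15.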

## ConversationSummaryMemory

- another memory type in LangChain, which handles long conversations more intelligently
- the class used in the code below is ConversationSummaryBufferMemory, which combines two ideas:
    - Conversation Buffer → Stores the most recent turns of dialogue verbatim.
    - Summarization → When the history exceeds max_token_limit, it asks the LLM to summarize the older parts into a shorter version.
- Instead of simply trimming old messages (like ConversationTokenBufferMemory), it compresses them into a summary.

- Lets you keep context over long conversations without overflowing the context window.
- Sends fewer tokens per call than buffer-only memory once the conversation is long, at the cost of extra LLM calls to write the summary.
- Useful for chatbots, customer service, or tutoring systems where you want the AI to remember the big picture but not every single sentence

In [92]:
from langchain.memory import ConversationSummaryBufferMemory

In [93]:
# create a long string
schedule = "There is a meeting at 8am with your product team. \
You will need your PowerPoint presentation prepared. \
9am-12pm have time to work on your LangChain \
project which will go quickly because LangChain is such a powerful tool. \
At Noon, lunch at the Italian restaurant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."

memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)
memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What is on the schedule today?"},
                    {"output": schedule})

In [94]:
memory.load_memory_variables({})

{'history': "System: The human and AI exchange greetings and discuss the schedule for the day. The AI informs the human of a morning meeting with the product team, work on the LangChain project, and a lunch meeting with a customer interested in AI. The AI emphasizes the importance of being prepared for the day's events."}
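In the run above, the whole history came back as a single System summary, which is consistent with the long schedule message alone exceeding the 100-token limit. The sketch below shows the general mechanism under simplifying assumptions: messages that no longer fit are removed from the buffer and folded into a running summary. `ToySummaryBufferMemory`, `count_tokens`, and `fake_summarize` are hypothetical; in particular, a real summarizer would ask the LLM to write prose, while this fake one only counts the messages it condensed.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def fake_summarize(previous_summary: str, pruned_lines: list) -> str:
    """Hypothetical summarizer; a real one would call the LLM to write the summary."""
    already = int(previous_summary.split()[0]) if previous_summary else 0
    return f"{already + len(pruned_lines)} earlier message(s) condensed"

class ToySummaryBufferMemory:
    """Hypothetical sketch: recent messages verbatim, older ones as a running summary."""

    def __init__(self, max_token_limit: int, summarize=fake_summarize):
        self.max_token_limit = max_token_limit
        self.summarize = summarize
        self.summary = ""
        self.lines = []

    def save_context(self, inputs: dict, outputs: dict) -> None:
        self.lines.append(f"Human: {inputs['input']}")
        self.lines.append(f"AI: {outputs['output']}")
        pruned = []
        # Move the oldest messages out of the buffer until it fits the budget.
        while self.lines and sum(map(count_tokens, self.lines)) > self.max_token_limit:
            pruned.append(self.lines.pop(0))
        if pruned:
            # Fold what was removed into the running summary instead of discarding it.
            self.summary = self.summarize(self.summary, pruned)

    def load_memory_variables(self, inputs: dict) -> dict:
        parts = ([f"System: {self.summary}"] if self.summary else []) + self.lines
        return {"history": "\n".join(parts)}

toy = ToySummaryBufferMemory(max_token_limit=8)
toy.save_context({"input": "Hello"}, {"output": "What's up"})
toy.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
print(toy.load_memory_variables({})["history"])
```

The difference from plain token trimming is the single `summarize` call: the removed messages leave the verbatim buffer, but a condensed trace of them stays at the top of the history.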

In [95]:
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True
)

In [96]:
conversation.predict(input="What would be a good demo to show?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
System: The human and AI exchange greetings and discuss the schedule for the day. The AI informs the human of a morning meeting with the product team, work on the LangChain project, and a lunch meeting with a customer interested in AI. The AI emphasizes the importance of being prepared for the day's events.
Human: What would be a good demo to show?
AI:

> Finished chain.


"For the morning meeting with the product team, a demo showcasing the latest features and updates on the LangChain project would be ideal. For the lunch meeting with the customer interested in AI, a demo highlighting the capabilities of our AI technology and how it can benefit their business would be impressive. It's important to tailor the demos to each audience's specific interests and needs."

In [97]:
memory.load_memory_variables({})

{'history': "System: The human and AI exchange greetings and discuss the schedule for the day. The AI informs the human of a morning meeting with the product team, work on the LangChain project, and a lunch meeting with a customer interested in AI. The AI emphasizes the importance of being prepared for the day's events.\nHuman: What would be a good demo to show?\nAI: For the morning meeting with the product team, a demo showcasing the latest features and updates on the LangChain project would be ideal. For the lunch meeting with the customer interested in AI, a demo highlighting the capabilities of our AI technology and how it can benefit their business would be impressive. It's important to tailor the demos to each audience's specific interests and needs."}

↑↑↑

Problem: LLMs have a context length limit (they can’t remember infinite history).
- Basic solutions:
    - ConversationBufferMemory → remembers everything (can overflow).
    - ConversationBufferWindowMemory → remembers only the last k turns.
    - ConversationTokenBufferMemory → remembers as many tokens as fit, drops the oldest.
- Better solution: ConversationSummaryBufferMemory → keeps recent turns verbatim and keeps a running summary of the older ones.

Whiteboard analogy:
- With ConversationBufferMemory → you never erase anything (eventually the board overflows).
- With ConversationBufferWindowMemory(k=2) → you only keep the last 2 lines.
- With ConversationTokenBufferMemory → you keep writing until the board is almost full, then erase the oldest notes to make space.
- With ConversationSummaryBufferMemory → when the board gets full, instead of erasing, you replace old notes with a short summary so the main ideas are still there.