
LLMs Don’t Have Memory — So How Do They Remember?

In [None]:

from langchain_openai import ChatOpenAI 
from dotenv import load_dotenv

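If this import fails with `ModuleNotFoundError: No module named 'langchain_openai'`, install the dependencies first: `pip install langchain-openai python-dotenv`.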

In [None]:

load_dotenv()  # Load environment variables from .env file 

In [None]:
# Initialize the LLM
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)

## Conclusion

LLMs are stateless by nature, but we can add memory through:
1. **Passing conversation history** in each API call
2. **Memory management systems** that handle history automatically
3. **Different strategies** based on use case (buffer, window, summary)

The key is choosing the right memory type based on your needs for cost, context length, and detail retention.

## 7. Comparing Memory Types

| Memory Type | Pros | Cons | Use Case |
|-------------|------|------|----------|
| **Buffer** | Simple, complete history | Expensive for long conversations | Short conversations |
| **Buffer Window** | Controls cost, recent context | Loses older information | Chats where only recent turns matter |
| **Summary** | Efficient for long conversations | May lose details | Long conversations |
| **Summary Buffer** | Recent turns verbatim plus a summary of older ones | More complex | Production applications |
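
To make the cost column concrete, here is a rough sketch of how much history is resent with each new turn under a full buffer versus a window. The helper `prompt_sizes` and the 20-words-per-turn figure are assumptions chosen for illustration, and words stand in for tokens.

```python
def prompt_sizes(turn_lengths, k=None):
    """Return the history size (in words) sent with each new turn.

    k=None models a full buffer; an integer k models a window of the last k turns.
    """
    sizes = []
    for i in range(len(turn_lengths)):
        past = turn_lengths[:i] if k is None else turn_lengths[max(0, i - k):i]
        sizes.append(sum(past))
    return sizes

lengths = [20] * 6  # assumption: six turns of 20 words each
buffer_sizes = prompt_sizes(lengths)
window_sizes = prompt_sizes(lengths, k=2)
print(buffer_sizes)  # [0, 20, 40, 60, 80, 100]
print(window_sizes)  # [0, 20, 40, 40, 40, 40]
```

The buffer grows linearly with the number of turns, while the window levels off once `k` turns have accumulated.
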

In [None]:
# View message history for session
print("\nMessage History:")
for message in store["user_123"].messages:
    print(f"{message.type}: {message.content}")

In [None]:
# Use the chain with session
config = {"configurable": {"session_id": "user_123"}}

response1 = chain_with_history.invoke(
    {"input": "Hi, I'm Emma and I love Python programming"},
    config=config
)
print("Response 1:", response1.content)

response2 = chain_with_history.invoke(
    {"input": "What's my name and what do I love?"},
    config=config
)
print("\nResponse 2:", response2.content)

In [None]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory

# Store for session histories
store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

# Create prompt with message placeholder
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

# Create chain
chain = prompt | llm

# Wrap with message history
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history"
)

## 6. Using Message History with LCEL (LangChain Expression Language)
The newer approach wraps an LCEL chain in `RunnableWithMessageHistory`, which loads and saves messages per session ID.
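
Conceptually, a history wrapper looks up the messages for a session, passes them to the chain, and records the new exchange afterwards. The sketch below imitates that flow in plain Python. It is not LangChain's implementation: `sessions`, `echo_model`, and `invoke_with_history` are hypothetical names used only for illustration.

```python
sessions = {}  # session_id -> list of (role, text); plays the role of `store`

def echo_model(history, user_input):
    """Hypothetical stand-in for the chain: reports how many prior messages it saw."""
    return f"I can see {len(history)} earlier messages."

def invoke_with_history(user_input, session_id):
    # Fetch (or create) the history that belongs to this session only.
    history = sessions.setdefault(session_id, [])
    reply = echo_model(list(history), user_input)
    # Record both sides of the exchange so the next call can see them.
    history.append(("human", user_input))
    history.append(("ai", reply))
    return reply

reply_a1 = invoke_with_history("Hi, I'm Emma", session_id="user_123")
reply_a2 = invoke_with_history("What's my name?", session_id="user_123")
reply_b1 = invoke_with_history("Hello", session_id="user_456")
print(reply_a1, reply_a2, reply_b1, sep="\n")
```

The second call in `user_123` sees two earlier messages, while the first call in `user_456` sees none, which is the isolation that a session ID is meant to provide.
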

In [None]:
from langchain.memory import ConversationSummaryBufferMemory

# Summary buffer with token limit
summary_buffer_memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)
conversation_sb = ConversationChain(llm=llm, memory=summary_buffer_memory, verbose=True)

print(conversation_sb.predict(input="Hello! I'm learning about AI"))
print("\n" + "="*50 + "\n")
print(conversation_sb.predict(input="Specifically, I'm interested in LLM memory mechanisms"))
print("\n" + "="*50 + "\n")
print(conversation_sb.predict(input="Can you summarize what we've discussed?"))

## 5. Conversation Summary Buffer Memory
Combines buffer and summary memory: recent messages are kept verbatim, and older ones are summarized once the buffer exceeds a token limit.
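
The pruning step can be sketched as follows. This is a simplified illustration under stated assumptions: `count_tokens` counts whitespace-separated words instead of real tokens, and the "summary" is plain concatenation where an actual implementation would ask an LLM to rewrite it. Both helper names are hypothetical.

```python
def count_tokens(text):
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())

def prune(buffer, summary, max_token_limit):
    """Move the oldest messages into the summary until the buffer fits."""
    buffer = list(buffer)
    while buffer and sum(count_tokens(m) for m in buffer) > max_token_limit:
        oldest = buffer.pop(0)
        # A real implementation would ask an LLM to rewrite the summary here.
        summary = (summary + " | " + oldest) if summary else oldest
    return buffer, summary

messages = [
    "Hello I am learning about AI",         # 6 words
    "I am interested in LLM memory",        # 6 words
    "Can you summarize what we discussed",  # 6 words
]
recent, summary_text = prune(messages, "", max_token_limit=12)
print(recent)
print(summary_text)
```

With a limit of 12 words, only the oldest message is folded into the summary and the two most recent ones stay verbatim.
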

In [None]:
# View the summary
print("Summary:")
print(summary_memory.load_memory_variables({}))

In [None]:
from langchain.memory import ConversationSummaryMemory

# Create summary memory
summary_memory = ConversationSummaryMemory(llm=llm)
conversation_summary = ConversationChain(llm=llm, memory=summary_memory, verbose=True)

print(conversation_summary.predict(input="Hi, I'm Charlie. I work as a data scientist"))
print("\n" + "="*50 + "\n")
print(conversation_summary.predict(input="I'm currently working on a project about LLM memory"))
print("\n" + "="*50 + "\n")
print(conversation_summary.predict(input="What do you know about me?"))

## 4. Conversation Summary Memory
Progressively summarizes the conversation to save tokens while retaining context.
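
The update rule is that each new exchange is merged into the previous summary, and only the summary is carried forward. The sketch below shows that shape with a hypothetical `toy_summarize` function. It does not compress anything the way an LLM-written summary would, so treat it as an outline of the data flow only.

```python
def toy_summarize(previous_summary, human_text, ai_text):
    """Hypothetical stand-in for the LLM summarization call.

    It folds only the human turn into the running summary and drops the AI
    reply, just to show the shape of the update.
    """
    addition = f"The human said: {human_text}."
    return f"{previous_summary} {addition}".strip()

running_summary = ""
exchanges = [
    ("I'm Charlie, a data scientist", "Nice to meet you, Charlie!"),
    ("I'm working on LLM memory", "That sounds interesting."),
]
for human_text, ai_text in exchanges:
    # Each turn replaces the old summary with an updated one; raw turns are not kept.
    running_summary = toy_summarize(running_summary, human_text, ai_text)

print(running_summary)
```

Because the raw turns are discarded, anything the summarizer leaves out (here, every AI reply) cannot be recovered later.
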

In [None]:
from langchain.memory import ConversationBufferWindowMemory

# Keep only last 2 interactions (k=2)
window_memory = ConversationBufferWindowMemory(k=2)
conversation_window = ConversationChain(llm=llm, memory=window_memory, verbose=True)

print(conversation_window.predict(input="Hi, my name is Bob"))
print("\n" + "="*50 + "\n")
print(conversation_window.predict(input="I love pizza"))
print("\n" + "="*50 + "\n")
print(conversation_window.predict(input="I also like ice cream"))
print("\n" + "="*50 + "\n")
print(conversation_window.predict(input="What's my name?"))  # Name was given 3 turns ago, outside the k=2 window, so it is likely forgotten
print("\n" + "="*50 + "\n")
print(conversation_window.predict(input="What food did I mention first?"))  # Pizza has also dropped out of the window by now

## 3. Conversation Buffer Window Memory
Keeps only the last `k` interactions to limit context size and cost.
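
The sliding-window behaviour can be reproduced with a bounded `collections.deque`. This is a minimal sketch of the idea, not LangChain's internals, and the AI replies in `turns` are made up for illustration.

```python
from collections import deque

# deque(maxlen=k) silently discards the oldest item when a new one is appended.
window = deque(maxlen=2)

turns = [
    ("Hi, my name is Bob", "Hello Bob!"),
    ("I love pizza", "Pizza is great."),
    ("I also like ice cream", "Good choice."),
]
for human_text, ai_text in turns:
    window.append((human_text, ai_text))

visible = [human_text for human_text, _ in window]
print(visible)  # ['I love pizza', 'I also like ice cream']
```

After three turns with a window of two, the turn that introduced the name is no longer part of the context sent to the model.
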

In [None]:
# View the conversation history
print("Chat History:")
print(memory.load_memory_variables({}))

In [None]:
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

# Create memory and chain
memory = ConversationBufferMemory()
conversation = ConversationChain(llm=llm, memory=memory, verbose=True)

# Now the model remembers context
print(conversation.predict(input="Hi, my name is Alice"))
print("\n" + "="*50 + "\n")
print(conversation.predict(input="What's my name?"))
print("\n" + "="*50 + "\n")
print(conversation.predict(input="What did I just tell you?"))

## 2. Conversation Buffer Memory
Stores the entire conversation history. Simple, but token cost grows with every turn because the full transcript is resent on each call.
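
A buffer is essentially a growing transcript that gets prepended to each new prompt. The class below is a hypothetical minimal version written for illustration; `SimpleBufferMemory` and its methods are not part of LangChain, and the AI replies are invented.

```python
class SimpleBufferMemory:
    """Hypothetical minimal buffer: keeps every turn verbatim."""

    def __init__(self):
        self.turns = []  # list of (human_text, ai_text) pairs

    def save(self, human_text, ai_text):
        self.turns.append((human_text, ai_text))

    def render(self):
        # The whole transcript is prepended to every new prompt.
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

memory_sketch = SimpleBufferMemory()
memory_sketch.save("Hi, my name is Alice", "Hello Alice!")
memory_sketch.save("What's my name?", "Your name is Alice.")

prompt_text = memory_sketch.render() + "\nHuman: What did I just tell you?\nAI:"
print(prompt_text)
```

Every earlier turn appears in `prompt_text`, which is why recall works and also why the prompt keeps getting longer.
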

In [None]:
# Without memory - each call is independent
response1 = llm.invoke("Hi, my name is Alice")
print("Response 1:", response1.content)

response2 = llm.invoke("What's my name?")
print("Response 2:", response2.content)

## 1. Basic Chat Without Memory
Without memory, the LLM doesn't remember previous messages in the conversation.
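
The same effect can be shown offline with a toy stand-in for the model. `toy_model` is hypothetical and only pattern-matches on the text it receives, but it has the property that matters here: it sees nothing beyond the messages passed in a single call.

```python
import re

def toy_model(messages):
    """Hypothetical stand-in for an LLM: it sees only the messages in this call."""
    transcript = " ".join(text for _, text in messages)
    match = re.search(r"my name is (\w+)", transcript)
    question = messages[-1][1]
    if "What's my name" in question:
        return f"Your name is {match.group(1)}." if match else "I don't know your name."
    return "Nice to meet you!"

# Two independent calls: the second call carries no trace of the first.
first = toy_model([("human", "Hi, my name is Alice")])
second = toy_model([("human", "What's my name?")])

# Resending the earlier turns in the same call is what makes recall possible.
history = [("human", "Hi, my name is Alice"), ("ai", first)]
third = toy_model(history + [("human", "What's my name?")])

print(second)  # I don't know your name.
print(third)   # Your name is Alice.
```

Every memory strategy in this notebook is a variation on the last call: deciding which earlier messages, or which condensed form of them, to send along with the new input.
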