# Memory-enhanced conversational agent

This notebook builds a conversational agent with a dual-layer memory architecture. We distinguish between short-term memory (the immediate conversation context) and long-term memory (persistent information that outlives individual sessions). This approach lets the agent maintain conversational flow while also remembering key details about a user that improve future interactions.

While basic conversational agents can maintain context within a single conversation session, they face a significant limitation: they cannot retain important information beyond the immediate chat history.

The challenge lies in deciding what information deserves long-term storage and how to integrate both memory layers into response generation. We will implement a practical solution that balances memory efficiency with conversational quality: short-term memory retains the recent message history, and long-term memory keeps important facts across multiple messages or sessions.

In [1]:
from langchain_openai import ChatOpenAI
from langchain_core.runnables.history import RunnableWithMessageHistory
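from langchain_core.chat_history import InMemoryChatMessageHistory as ChatMessageHistory  # in-memory history; avoids the older langchain.memory import path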

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Fail early with a clear message if the OpenAI API key is missing
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your environment or .env file")

### Initialize the language model
The language model serves as the core intelligence of our conversational agent. Here we will configure the AI model with specific parameters that balance response quality, cost, and performance.

In [2]:
# Initialize the language model
llm = ChatOpenAI(
    model="gpt-4o-mini-2024-07-18",
    max_tokens=1000,  # Limit response length for focused answers
    temperature=0  # Set to 0 for deterministic responses
)

This configuration creates our AI model instance. Setting `max_tokens=1000` caps each response at 1,000 tokens, which keeps answers from running long but does not by itself make them concise. Setting `temperature=0` makes sampling as close to deterministic as the API allows, so the agent tends to behave consistently across similar inputs. Outputs can still vary slightly between runs, so treat the results as largely reproducible rather than guaranteed identical.

### Creating memory storage systems
The heart of our enhanced conversational agent lies in its dual-memory architecture. We need two separate stores: one for the short-term chat history of each session, and one for long-term memories that are meant to survive across sessions.

In [3]:
# Memory stores for managing different types of conversational context
chat_store = {}  # Dictionary to store short-term conversation histories by session
long_term_memory = {}  # Dictionary to store persistent memories across sessions

# Function to retrieve or create a chat history for immediate conversation context - It ensures each user session has its own conversation context
def get_chat_history(session_id: str):
    if session_id not in chat_store:
        # Initialize new conversation history for first-time sessions
        chat_store[session_id] = ChatMessageHistory()
    return chat_store[session_id]

# Function to update long-term memory with significant interactions. It uses a simple length heuristic to decide what is worth remembering.
def update_long_term_memory(session_id: str, user_input: str, output: str):
    if session_id not in long_term_memory:
        long_term_memory[session_id] = []
    # Store user inputs that contain substantial information (longer than 20 characters); the model's output is accepted but not stored
    if len(user_input) > 20:
        long_term_memory[session_id].append(f"User said: {user_input}")
    # Maintain memory efficiency by keeping only the most recent 5 memories
    if len(long_term_memory[session_id]) > 5:
        long_term_memory[session_id] = long_term_memory[session_id][-5:]

# Function to retrieve and format long-term memories for inclusion in prompts. It returns a single string joining the stored entries.
def get_long_term_memory(session_id: str):
    # Join all long-term memories into a coherent context string
    return ". ".join(long_term_memory.get(session_id, []))

Here, we create a memory management system with clear separation of concerns. The `chat_store` maintains immediate conversational context using LangChain's `ChatMessageHistory`, ensuring compatibility with the framework's conversation management tools. The `long_term_memory` system implements custom logic for deciding what information deserves persistent storage, using message length as a simple, admittedly crude, proxy for importance.

The memory management strategy prevents unbounded growth by limiting long-term storage to the five most recent entries that pass the length filter. This keeps recent information accessible while bounding how much text is added to every prompt. Note that both stores are plain in-process dictionaries keyed by `session_id`, so their contents last only as long as the Python process; keeping memories across restarts would require writing them to external storage.
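The two rules described above (the 20-character filter and the five-entry cap) can be checked in isolation, without calling a model. The following is a minimal standalone sketch that re-creates the same logic; the names `MIN_LENGTH`, `MAX_MEMORIES`, `remember`, and `demo_store` are illustrative and do not appear in the notebook.

```python
# Standalone re-creation of the notebook's long-term memory rules,
# so the behaviour can be checked without calling a model.
MIN_LENGTH = 20   # assumed name for the notebook's 20-character threshold
MAX_MEMORIES = 5  # assumed name for the notebook's five-entry cap

def remember(store: dict, session_id: str, user_input: str) -> None:
    memories = store.setdefault(session_id, [])
    # Only inputs longer than the threshold are treated as worth keeping
    if len(user_input) > MIN_LENGTH:
        memories.append(f"User said: {user_input}")
    # Keep only the newest MAX_MEMORIES entries
    store[session_id] = memories[-MAX_MEMORIES:]

demo_store = {}
remember(demo_store, "demo", "Hi there")  # 8 characters: skipped
for i in range(1, 8):
    remember(demo_store, "demo", f"This is a long enough message number {i}")

print(len(demo_store["demo"]))  # 5
print(demo_store["demo"][0])    # User said: This is a long enough message number 3
```

Seven long messages go in, but only numbers 3 through 7 remain, and the short greeting never enters the store.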


### Create prompt template for memory integration
The prompt template is where the different types of memory and the current input come together to produce context-aware responses. Its structure determines how the language model sees both the immediate and the historical context, so we build a template with a slot for each memory layer.

In [4]:
# Create a prompt template that incorporates multiple memory layers
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant. Use the information from long-term memory if relevant."),
    ("system", "Long-term memory: {long_term_memory}"),
    MessagesPlaceholder(variable_name="history"),  # Placeholder for conversation history
    ("human", "{input}")  # Current user input
])

This prompt template creates a layered context structure that gives the language model access to different types of information. The system messages establish the AI's role and provide long-term memory context, while the `MessagesPlaceholder` dynamically inserts the current conversation history. This design allows the model to consider both immediate conversational context and persistent memories when generating responses, creating more coherent and personalized interactions.

Placing long-term memory in its own system message keeps persistent information clearly separated from the turn-by-turn conversation history. This makes it easier for the model to tell remembered facts apart from recent messages, although how much weight the model actually gives each part is not something the template can guarantee.
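To make the layering concrete, here is a plain-Python illustration of the message order the template is designed to produce. It only mimics the structure and does not use LangChain; `build_messages` is a hypothetical helper introduced for this sketch.

```python
# Plain-Python illustration of the message order the template produces.
# This mimics the structure only; it does not call LangChain.
def build_messages(long_term_memory: str, history: list, user_input: str) -> list:
    return [
        ("system", "You are a helpful AI assistant. Use the information from long-term memory if relevant."),
        ("system", f"Long-term memory: {long_term_memory}"),
        *history,               # short-term memory, oldest message first
        ("human", user_input),  # the current turn always comes last
    ]

messages = build_messages(
    long_term_memory="User said: Hello! My name is Alice.",
    history=[("human", "Hello! My name is Alice."), ("ai", "Hello, Alice!")],
    user_input="Do you remember my name?",
)
for role, content in messages:
    print(f"{role}: {content}")
```

The two system messages come first, the session history follows, and the new user input closes the list.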

### Building the conversational chain
Now we need to combine our prompt template with the language model and wrap it with history management capabilities. This creates the core processing pipeline that handles user inputs and generates contextually appropriate responses.

In [5]:
# Combine prompt template with language model to create the base conversation chain
chain = prompt | llm

# Wrap the chain with message history management for short-term memory
chain_with_history = RunnableWithMessageHistory(
    chain,  # The base conversation chain
    get_chat_history,  # Function for short-term memory retrieval
    input_messages_key="input",  # Key for user input in the prompt template
    history_messages_key="history"  # Key for conversation history in the prompt template
)

Here, we create our complete conversational pipeline by chaining the prompt template with the language model using LangChain's pipe operator. The `RunnableWithMessageHistory` wrapper then adds automatic conversation history management, ensuring that each interaction builds upon previous messages in the same session. The key parameters tell the system where to find the current input and where to inject the conversation history within our prompt template.
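Conceptually, the history wrapper loads the stored messages for a session, passes them to the chain together with the new input, and then records both sides of the turn. The sketch below imitates that cycle in plain Python under stated assumptions: `histories`, `echo_model`, and `invoke_with_history` are hypothetical stand-ins, and this is not how LangChain implements it internally.

```python
# Conceptual sketch of per-session history handling (not LangChain's actual code).
histories = {}  # hypothetical store: session_id -> list of (role, content)

def echo_model(messages: list) -> str:
    # Stand-in for the LLM: reports how many messages it was given
    return f"I received {len(messages)} message(s)."

def invoke_with_history(session_id: str, user_input: str) -> str:
    history = histories.setdefault(session_id, [])
    reply = echo_model(history + [("human", user_input)])
    # Record both sides of the turn so the next call can see them
    history.append(("human", user_input))
    history.append(("ai", reply))
    return reply

print(invoke_with_history("a", "first"))   # I received 1 message(s).
print(invoke_with_history("a", "second"))  # I received 3 message(s).
print(invoke_with_history("b", "first"))   # I received 1 message(s).
```

Session `"a"` accumulates context from one call to the next, while session `"b"` starts from an empty history.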

### Creating the main chat interface
The chat function is the primary interface for user interactions. It coordinates the flow of information between the two memory systems and the language model, and it updates long-term memory after each response.

In [6]:
# Handle a conversation interaction with dual-memory integration
def chat(input_text: str, session_id: str):
    # Retrieve relevant long-term memories for this session
    long_term_mem = get_long_term_memory(session_id)

    # Process the input through our conversational chain with both memory types
    response = chain_with_history.invoke(
        {"input": input_text, "long_term_memory": long_term_mem},
        config={"configurable": {"session_id": session_id}}
    )

    # Update long-term memory based on this interaction
    update_long_term_memory(session_id, input_text, response.content)

    # Return the AI's response
    return response.content

This function orchestrates the entire conversation flow by first retrieving any relevant long-term memories, then invoking our conversational chain with both the current input and memory context. The session configuration ensures that short-term memory (conversation history) is properly managed, while the explicit call to update long-term memory ensures that significant interactions are preserved for future conversations.
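One consequence of this ordering is that long-term memory is read before the model call and written after it, so a message only shows up in the long-term context from the following turn onward. The offline sketch below illustrates the timing; `fake_chat`, `memory`, and `seen_by_model` are hypothetical names, and no model is called.

```python
# Offline stand-in for chat(): records what long-term memory the "model" saw.
memory = {}         # hypothetical store: session_id -> list of remembered strings
seen_by_model = []  # long-term memory string passed on each call

def fake_chat(text: str, session_id: str) -> str:
    context = ". ".join(memory.get(session_id, []))  # 1. read before the call
    seen_by_model.append(context)                    # 2. what the model would see
    if len(text) > 20:                               # 3. update after the call
        memory.setdefault(session_id, []).append(f"User said: {text}")
    return "ok"

fake_chat("Hello! My name is Alice.", "user_123")
fake_chat("Do you remember my name?", "user_123")

print(repr(seen_by_model[0]))  # ''
print(seen_by_model[1])        # User said: Hello! My name is Alice.
```

On the first turn the long-term context is empty; the introduction becomes visible only on the second turn.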

### Testing the memory-enhanced agent
Let's run a short series of interactions to see how the agent maintains context within a conversation and records information for later use. This example exercises both short-term memory (immediate context) and long-term memory (persistent information).

In [7]:
# Example conversation demonstrating memory capabilities
session_id = "user_123"

print("AI:", chat("Hello! My name is Alice.", session_id))
print("AI:", chat("What's the weather like today?", session_id))
print("AI:", chat("I love sunny days.", session_id))
print("AI:", chat("Do you remember my name?", session_id))

AI: Hello, Alice! How can I assist you today?
AI: I don't have real-time weather data, but you can check a weather website or app for the most accurate and up-to-date information. If you tell me your location, I can suggest how to find the weather for your area!
AI: Sunny days are wonderful! They can really lift your mood and are perfect for outdoor activities. Do you have any favorite things you like to do on sunny days?
AI: Yes, your name is Alice! How can I help you today?


This test sequence shows the agent's memory in action. The first message introduces the user's name; at 24 characters it clears our 20-character threshold, so it is stored in long-term memory. The following messages build conversational context, and the final question tests whether the agent can recall the name. Because all four messages share one session, the name is also present in the short-term history, so this run alone cannot show which memory layer supplied the answer. The message "I love sunny days." is only 18 characters long, so it stays in the conversation history but never reaches long-term memory.

### Review memory
Let's review the conversation history and long-term memory.

In [8]:
# Examine the complete conversation history (short-term memory)
print("Conversation History:")
for message in chat_store[session_id].messages:
    print(f"{message.type}: {message.content}")

# Review what information was deemed important for long-term storage
print("\nLong-term Memory:")
print(get_long_term_memory(session_id))

Conversation History:
human: Hello! My name is Alice.
ai: Hello, Alice! How can I assist you today?
human: What's the weather like today?
ai: I don't have real-time weather data, but you can check a weather website or app for the most accurate and up-to-date information. If you tell me your location, I can suggest how to find the weather for your area!
human: I love sunny days.
ai: Sunny days are wonderful! They can really lift your mood and are perfect for outdoor activities. Do you have any favorite things you like to do on sunny days?
human: Do you remember my name?
ai: Yes, your name is Alice! How can I help you today?

Long-term Memory:
User said: Hello! My name is Alice.. User said: What's the weather like today?. User said: Do you remember my name?


The conversation history shows the complete short-term memory, including both user inputs and AI responses in chronological order. The long-term memory output shows which messages our system kept: three of the four user inputs, two of which are questions that carry no lasting facts about the user. This is a useful reminder that message length is only a rough stand-in for importance. The doubled period after "Alice." appears because entries that already end in punctuation are joined with `". "`.
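To evaluate the selection criterion directly, we can recompute which of the four test messages pass the 20-character filter. This is a small standalone check that reuses only the message texts and the threshold from the notebook; the variable names are illustrative.

```python
# Recompute which of the four test messages pass the 20-character filter.
messages = [
    "Hello! My name is Alice.",
    "What's the weather like today?",
    "I love sunny days.",
    "Do you remember my name?",
]
for text in messages:
    status = "stored" if len(text) > 20 else "skipped"
    print(f"{len(text):>2} chars  {status:<7}  {text}")

# Rebuild the long-term memory string the same way the notebook does
stored = [f"User said: {m}" for m in messages if len(m) > 20]
print(". ".join(stored))
```

The lengths are 24, 30, 18, and 24 characters, so only "I love sunny days." is skipped, and the final line reproduces the long-term memory output shown above.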