# <font color="#418FDE" size="6.5" uppercase>**Conversational Memory**</font>

>Last update: 20260118.
    
By the end of this lecture, you will be able to:
- Explain how LangChain memory mechanisms maintain conversational context for Llama 3. 
- Configure and use different memory types to build a stateful Llama 3 chatbot. 
- Tune memory behavior to balance context completeness, token limits, and privacy concerns. 


## **1. Conversational Memory Basics**

### **1.1. Storing Conversation History**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master LangChain & Llama 3/Module_03/Lecture_B/image_01_01.jpg?v=1768767607" width="250">



>* Memory stores past messages for ongoing context
>* Without history, multi-step conversations break down

>* LangChain stores role-tagged message pairs chronologically
>* Structured history enables recall, summarization, and reuse

>* Different apps need different conversation memory strategies
>* Organized history creates rich, persistent context for Llama



In [None]:
#@title Python Code - Storing Conversation History

# Demonstrate simple conversation history storage conceptually with Python lists.
# Show how each conversational turn gets appended and preserved over time.
# Print final history to illustrate persistent conversational context clearly.

# No installation is needed: this example uses only built-in Python lists and dictionaries.

# Define a function that simulates storing conversation history.
def simulate_conversation_history():
    # Initialize an empty list representing stored conversation history.
    history = []
    
    # Append first user message and assistant reply as a structured pair.
    history.append({"role": "user", "content": "Hi, I need thermostat help."})
    history.append({"role": "assistant", "content": "Sure, what is the current temperature?"})
    
    # Append second user message and assistant reply, continuing the conversation.
    history.append({"role": "user", "content": "House feels cold, around sixty degrees."})
    history.append({"role": "assistant", "content": "Try setting it to seventy-two degrees."})
    
    # Append third user message referencing earlier context implicitly.
    history.append({"role": "user", "content": "Same issue again today, still freezing."})
    history.append({"role": "assistant", "content": "Let me review previous thermostat details."})
    
    # Return the complete stored history list for later inspection.
    return history

# Call the simulation function and capture the stored conversation history.
conversation_history = simulate_conversation_history()

# Print a concise view of the stored conversation history for understanding.
for message in conversation_history:
    print(f"Role: {message['role']}, Content: {message['content']}")
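
Because each turn is a plain dictionary with a role and content, the stored history can be serialized and reloaded for later reuse. The following is a minimal sketch of that idea, assuming JSON as the storage format. The helper names `save_history` and `load_history` are hypothetical and are not part of LangChain.

```python
import json

# Hypothetical helpers (not LangChain APIs) for persisting role-tagged history.
def save_history(history):
    # Serialize the list of {"role", "content"} dictionaries to a JSON string.
    return json.dumps(history, ensure_ascii=False)

def load_history(serialized):
    # Restore the list of message dictionaries from the JSON string.
    return json.loads(serialized)

# Small sample history in the same role-tagged structure used above.
stored_history = [
    {"role": "user", "content": "Hi, I need thermostat help."},
    {"role": "assistant", "content": "Sure, what is the current temperature?"},
]

# Round-trip the history through JSON, as a later session might do.
serialized_history = save_history(stored_history)
restored_history = load_history(serialized_history)

# The restored history keeps the same order, roles, and content.
print("Round trip preserved history:", restored_history == stored_history)
for message in restored_history:
    print(f"{message['role']}: {message['content']}")
```

In a real application the JSON string would be written to a file or database keyed by a session identifier; that storage layer is left out here.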



### **1.2. Prompt Context Injection**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master LangChain & Llama 3/Module_03/Lecture_B/image_01_02.jpg?v=1768767632" width="250">



>* Memory injects past messages into new prompts
>* This lets Llama 3 understand earlier references

>* Memory uses templates to structure past messages
>* Organized context helps Llama 3 continue conversations

>* Memory selects and condenses only relevant history
>* Gives Llama 3 focused context for coherent replies



In [None]:
#@title Python Code - Prompt Context Injection

# Demonstrate how stored conversation history becomes injected prompt context.
# Show simple memory storing user and assistant conversational turns.
# Build the final prompt that would be sent to a simulated Llama 3 model.

# No installation is needed: this example uses only built-in Python features.

# Define a simple conversation history list with dictionaries.
conversation_history = [
    {"role": "user", "content": "Hi, I need exam help."},
    {"role": "assistant", "content": "Sure, which subject exactly?"},
    {"role": "user", "content": "Physics and some basic algebra."}
]

# Define a new user message that references earlier conversation.
new_user_message = "Can you quiz me on the topics we listed earlier?"

# Define a simple system instruction describing assistant behavior.
system_instruction = "You are a helpful exam study assistant focusing on clear short questions."

# Define a function that injects context into a single prompt string.
def build_injected_prompt(system_text, history, new_message):
    prompt_lines = []
    prompt_lines.append(f"System: {system_text}")
    for turn in history:
        role = turn["role"].capitalize()
        content = turn["content"]
        prompt_lines.append(f"{role}: {content}")
    prompt_lines.append(f"User: {new_message}")
    full_prompt = "\n".join(prompt_lines)
    return full_prompt

# Build the final prompt that would be sent to Llama 3.
final_prompt = build_injected_prompt(system_instruction, conversation_history, new_user_message)

# Print the final injected prompt showing context clearly.
print("Injected prompt sent into model:\n")
print(final_prompt)
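
The plain "Role: content" layout above is only one possible template. Llama 3 Instruct models are trained on a specific chat format with special header and end-of-turn tokens, and the sketch below renders the same kind of history in that layout. The token strings follow Meta's published Llama 3 prompt format as best understood here, so verify them against the model card before relying on them. The function names are hypothetical; in practice the tokenizer's chat template or the serving library usually applies this formatting for you.

```python
# Special tokens from the Llama 3 Instruct prompt format (verify against the model card).
BEGIN_OF_TEXT = "<|begin_of_text|>"
START_HEADER = "<|start_header_id|>"
END_HEADER = "<|end_header_id|>"
END_OF_TURN = "<|eot_id|>"

# Hypothetical helper: wrap one message in role header and end-of-turn tokens.
def format_llama3_turn(role, content):
    return f"{START_HEADER}{role}{END_HEADER}\n\n{content}{END_OF_TURN}"

# Hypothetical helper: inject system text, stored history, and the new message.
def build_llama3_prompt(system_text, history, new_message):
    parts = [BEGIN_OF_TEXT, format_llama3_turn("system", system_text)]
    for turn in history:
        parts.append(format_llama3_turn(turn["role"], turn["content"]))
    parts.append(format_llama3_turn("user", new_message))
    # An open assistant header cues the model to write the next reply.
    parts.append(f"{START_HEADER}assistant{END_HEADER}\n\n")
    return "".join(parts)

chat_history = [
    {"role": "user", "content": "Hi, I need exam help."},
    {"role": "assistant", "content": "Sure, which subject exactly?"},
    {"role": "user", "content": "Physics and some basic algebra."},
]

llama3_prompt = build_llama3_prompt(
    "You are a helpful exam study assistant.",
    chat_history,
    "Can you quiz me on the topics we listed earlier?",
)
print(llama3_prompt)
```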



### **1.3. Context Limits in Chat**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master LangChain & Llama 3/Module_03/Lecture_B/image_01_03.jpg?v=1768767659" width="250">



>* Chatbots see only a limited context window
>* Stored history can exceed what Llama 3 actually reads

>* Context limits force LangChain to trim history
>* Poor selection can drop key details, hurting continuity

>* LangChain compresses chat history into key summaries
>* Designers choose what to keep, summarize, discard



In [None]:
#@title Python Code - Context Limits in Chat

# Demonstrate chat context limits using a tiny sliding window simulation.
# Show how only recent conversation fits inside a fixed context window.
# Illustrate why older details disappear when the window becomes full.

# No external libraries are needed for this example.

# Define a simple conversation history as user and assistant message pairs.
conversation_history = [
    ("User", "Hi, I need travel help to New York."),
    ("Assistant", "Sure, what dates are you considering for your trip?"),
    ("User", "I prefer late June, maybe around the twenty fifth."),
    ("Assistant", "Great, do you prefer morning or evening flights?"),
    ("User", "Evening flights only, I hate early mornings."),
    ("Assistant", "Noted, evening flights only for your New York trip."),
    ("User", "My budget is eight hundred dollars maximum for everything."),
    ("Assistant", "Okay, I will search options under eight hundred dollars total."),
]

# Define a tiny context window measured in number of recent messages kept.
context_window_size = 4

# Function that selects only the most recent messages within the window.
def get_visible_context(history, window_size):
    visible_slice = history[-window_size:]
    return visible_slice

# Show full stored history length versus visible context length.
print("Total stored messages in history:", len(conversation_history))
print("Messages visible to model with window:", context_window_size)

# Get the visible context that would be sent inside the prompt.
visible_context = get_visible_context(conversation_history, context_window_size)

# Print the visible context to show which messages remain accessible.
print("\nVisible context before new user question:")
for speaker, message in visible_context:
    print(f"{speaker}: {message}")

# Simulate a new user question that depends on an older detail outside the window.
new_user_message = ("User", "Can you remind me which travel dates I mentioned?")
conversation_history.append(new_user_message)

# Recalculate visible context after adding the new question message.
visible_context_after = get_visible_context(conversation_history, context_window_size)

# Print the new visible context; the late June dates are no longer included.
print("\nVisible context after new question:")
for speaker, message in visible_context_after:
    print(f"{speaker}: {message}")



## **2. Conversational Memory Types**

### **2.1. Buffer Versus Summary Memory**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master LangChain & Llama 3/Module_03/Lecture_B/image_02_01.jpg?v=1768767686" width="250">



>* Buffer memory stores the full conversation transcript
>* Summary memory keeps a compressed narrative of history

>* Buffer memory preserves exact wording and tone
>* Long buffers increase token use and latency

>* Summary memory condenses long chats into evolving notes
>* Combining buffer and summary balances detail and scalability



In [None]:
#@title Python Code - Buffer Versus Summary Memory

# Demonstrate buffer memory versus summary memory with simple conversation examples.
# Show how full transcripts differ from compact evolving summaries for chatbots.
# Help beginners visualize memory tradeoffs without external libraries or real models.

# No installation is needed: this example uses only built-in Python features.

# Define a simple conversation as ordered user and assistant message pairs.
conversation = [
    ("User", "Hi, I need help planning a road trip."),
    ("Assistant", "Great, where are you starting from today?"),
    ("User", "I live in Denver and I drive a small SUV."),
    ("Assistant", "Noted, you are starting from Denver in a small SUV."),
    ("User", "I want to visit national parks within five hundred miles."),
    ("Assistant", "Okay, we will focus on nearby national parks within that driving range."),
]

# Implement buffer memory that keeps the full detailed transcript unchanged.
def build_buffer_memory(conversation_pairs):
    buffer_lines = []
    for speaker, text in conversation_pairs:
        buffer_lines.append(f"{speaker}: {text}")
    return "\n".join(buffer_lines)

# Simulate summary memory with a hand-written summary of key facts and preferences.
# The input is ignored here; a real system would generate the summary with a model.
def build_summary_memory(conversation_pairs):
    summary = (
        "User plans a road trip from Denver in a small SUV. "
        "User prefers national parks within roughly five hundred miles."
    )
    return summary

# Build both memory representations from the same conversation history.
buffer_memory = build_buffer_memory(conversation)
summary_memory = build_summary_memory(conversation)

# Print a clear header and the buffer memory transcript content.
print("BUFFER MEMORY TRANSCRIPT:\n")
print(buffer_memory)

# Print a separator line to visually distinguish the two memory styles.
print("\n" + "-" * 40 + "\n")

# Print a clear header and the summary memory compact description.
print("SUMMARY MEMORY DESCRIPTION:\n")
print(summary_memory)
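
The bullet about combining buffer and summary memory can be sketched as a hybrid: older turns are compressed into a summary line while the most recent turns stay verbatim. This is a minimal illustration, not LangChain's implementation. The names `summarize_older_turns` and `build_hybrid_context` are hypothetical, and the "summarizer" simply joins earlier user statements as a stand-in for an LLM-written summary.

```python
# Stand-in summarizer (assumption): keep only what the user said in older turns.
# A real system would ask a language model to write this summary.
def summarize_older_turns(turns):
    user_statements = [text for speaker, text in turns if speaker == "User"]
    return " ".join(user_statements)

# Hypothetical hybrid memory: summary for older turns, exact text for recent ones.
def build_hybrid_context(conversation_pairs, recent_count):
    split_index = max(len(conversation_pairs) - recent_count, 0)
    older_turns = conversation_pairs[:split_index]
    recent_turns = conversation_pairs[split_index:]
    lines = []
    if older_turns:
        lines.append("Summary of earlier conversation: " + summarize_older_turns(older_turns))
    for speaker, text in recent_turns:
        lines.append(f"{speaker}: {text}")
    return "\n".join(lines)

sample_conversation = [
    ("User", "I live in Denver and drive a small SUV."),
    ("Assistant", "Noted, you are starting from Denver."),
    ("User", "I want national parks within five hundred miles."),
    ("Assistant", "Okay, I will focus on nearby national parks."),
]

# Keep the last two messages verbatim and compress everything before them.
hybrid_context = build_hybrid_context(sample_conversation, recent_count=2)
print(hybrid_context)
```

The `recent_count` value controls the tradeoff: a larger value preserves more exact wording at the cost of more tokens.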



### **2.2. Key Value Memory Patterns**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master LangChain & Llama 3/Module_03/Lecture_B/image_02_02.jpg?v=1768767711" width="250">



>* Store facts as labeled keys for retrieval
>* Use selected values in prompts for stateful chatbots

>* Profile-style keys store and update user preferences
>* Structured memory enables personalized, debuggable chatbot responses

>* Choose scoped, lasting keys for user state
>* Balance helpful memory with privacy and auditability



In [None]:
#@title Python Code - Key Value Memory Patterns

# Demonstrate simple key value memory for chatbot preferences.
# Show how user profile updates across multiple conversation turns.
# Print final profile and personalized response using stored values.
# No installation is needed: this example uses only built-in Python dictionaries.

# Create a user memory profile whose values are not yet known.
user_memory_profile = {"user_name": None, "home_airport": None, "seat_preference": None}

# Define a function that updates memory using extracted key value pairs.
def update_memory_with_facts(memory_dict, new_facts_dict):
    for fact_key, fact_value in new_facts_dict.items():
        memory_dict[fact_key] = fact_value

# Simulate first conversation turn where user shares basic information.
turn_one_facts = {"user_name": "Alex", "home_airport": "JFK", "seat_preference": "aisle"}

# Update memory profile using first turn extracted facts.
update_memory_with_facts(user_memory_profile, turn_one_facts)

# Simulate second conversation turn where user changes seat preference.
turn_two_facts = {"seat_preference": "window"}

# Update memory profile again, overwriting outdated preference value.
update_memory_with_facts(user_memory_profile, turn_two_facts)

# Build a prompt snippet using only relevant memory keys.
prompt_context = f"Traveler {user_memory_profile['user_name']} flies from {user_memory_profile['home_airport']} preferring {user_memory_profile['seat_preference']} seats."

# Print the structured memory profile dictionary for inspection.
print("Current user memory profile:", user_memory_profile)

# Print the prompt context that would be sent to the Llama 3 model.
print("Prompt context for Llama 3:", prompt_context)
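
The points about scoped keys and auditability can be illustrated with a small wrapper around the dictionary. This is a sketch under stated assumptions: the class name `ScopedProfileMemory`, the allow-list contents, and the audit log format are all hypothetical choices made for illustration.

```python
# Hypothetical allow-list of keys this chatbot is permitted to remember.
ALLOWED_KEYS = {"user_name", "home_airport", "seat_preference"}

class ScopedProfileMemory:
    def __init__(self, allowed_keys):
        self.allowed_keys = set(allowed_keys)
        self.values = {}
        # The audit log records actions and key names only, never the values.
        self.audit_log = []

    # Store a value only when its key is inside the allowed scope.
    def remember(self, key, value):
        if key not in self.allowed_keys:
            self.audit_log.append(("rejected", key))
            return False
        self.values[key] = value
        self.audit_log.append(("stored", key))
        return True

    # Delete a stored value, for example after a user deletion request.
    def forget(self, key):
        if key in self.values:
            del self.values[key]
            self.audit_log.append(("deleted", key))

profile_memory = ScopedProfileMemory(ALLOWED_KEYS)
profile_memory.remember("seat_preference", "aisle")
profile_memory.remember("passport_number", "X1234567")  # Out of scope, so rejected.
profile_memory.remember("seat_preference", "window")    # Overwrites the old value.
profile_memory.forget("seat_preference")

print("Stored values:", profile_memory.values)
print("Audit log:", profile_memory.audit_log)
```

Logging key names without values keeps the audit trail useful for debugging while avoiding a second copy of personal data.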



### **2.3. Selecting Memory Approaches**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master LangChain & Llama 3/Module_03/Lecture_B/image_02_03.jpg?v=1768767736" width="250">



>* Match memory design to chatbot interaction needs
>* Separate short-term context from longer-term user details

>* Match memory type to conversation length and structure
>* Choose buffer, summary, or key-value memory based on risks

>* Combine buffer, summary, and key-value memories thoughtfully
>* Tune memory settings using constraints, testing, and privacy



In [None]:
#@title Python Code - Selecting Memory Approaches

# Demonstrate choosing memory approaches for different chatbot interaction needs.
# Compare buffer, summary, and key value memory behaviors in one script.
# Show how mixing memory types changes chatbot responses and stored information.
# No installation is needed: this example uses only the Python standard library.

# Import required standard libraries for simple chatbot simulation.
import textwrap

# Define a simple buffer memory storing recent conversational turns exactly.
class BufferMemory:
    def __init__(self, max_turns):
        self.max_turns = max_turns
        self.turns = []

    # Add a new turn and keep only recent turns within limit.
    def add_turn(self, user, bot):
        self.turns.append((user, bot))
        self.turns = self.turns[-self.max_turns :]

    # Build a context string showing recent conversation history.
    def get_context(self):
        lines = [f"User: {u} | Bot: {b}" for u, b in self.turns]
        return " | ".join(lines)

# Define a simple summary memory storing compressed conversation description.
class SummaryMemory:
    def __init__(self):
        self.summary = ""

    # Update summary using a naive rule based concatenation approach.
    def update(self, user, bot):
        snippet = f"User asked about {user[:20]} and bot replied."
        if not self.summary:
            self.summary = snippet
        else:
            self.summary += " Then " + snippet

    # Return current summary describing earlier conversation parts.
    def get_summary(self):
        return self.summary

# Define a key value memory storing stable user attributes explicitly.
class KeyValueMemory:
    def __init__(self):
        self.store = {}

    # Set a stable attribute like name or preference value.
    def set(self, key, value):
        self.store[key] = value

    # Retrieve a stored attribute with default fallback value.
    def get(self, key, default=None):
        return self.store.get(key, default)

# Define a simple rule based chatbot using different memory types.
class SimpleChatbot:
    def __init__(self, buffer_memory, summary_memory, kv_memory):
        self.buffer = buffer_memory
        self.summary = summary_memory
        self.kv = kv_memory

    # Generate a response using memory choices and simple rules.
    def respond(self, user_message):
        lower = user_message.lower()
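        if "my name is" in lower:
            # Take the text after the phrase and strip surrounding punctuation.
            start = lower.index("my name is") + len("my name is")
            name = user_message[start:].strip(" .!?")
            self.kv.set("name", name)
            bot = f"Nice meeting you {name}, I will remember your name."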
        elif "favorite job" in lower:
            name = self.kv.get("name", "there")
            bot = f"{name}, your favorite job seems related to helping people grow."
        elif "recap" in lower:
            bot = f"Conversation summary so far: {self.summary.get_summary()}"
        else:
            bot = "I am a simple coach, tell me goals or ask career questions."
        self.buffer.add_turn(user_message, bot)
        self.summary.update(user_message, bot)
        return bot

# Helper function printing a titled block with wrapped text lines.
def print_block(title, text):
    print(f"\n{title}:")
    for line in textwrap.wrap(text, width=70):
        print(line)

# Create memory objects representing different time horizons and structures.
buffer_memory = BufferMemory(max_turns=2)
summary_memory = SummaryMemory()
kv_memory = KeyValueMemory()

# Create chatbot instance combining buffer, summary, and key value memories.
chatbot = SimpleChatbot(buffer_memory, summary_memory, kv_memory)

# Simulate several user messages representing a longer coaching interaction.
messages = [
    "My name is Alex.",
    "I want a new job helping people learn skills.",
    "What might my favorite job path be?",
    "Can you briefly recap our conversation so far, please?",
]

# Run conversation, capturing last response and printing key memory views.
last_response = ""
for msg in messages:
    last_response = chatbot.respond(msg)

# Show how buffer memory keeps only recent conversational turns exactly.
print_block("Buffer memory recent turns", buffer_memory.get_context())

# Show how summary memory keeps compressed longer horizon conversational information.
print_block("Summary memory narrative", summary_memory.get_summary())

# Show how key value memory keeps stable user attributes like remembered name.
print_block("Key value memory snapshot", str(kv_memory.store))
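
The selection guidance above can also be written down as an explicit rule of thumb. The sketch below is purely illustrative: the function `choose_memory_types`, its parameters, and the default threshold of ten turns are assumptions made for this example, not recommendations from LangChain or measured values.

```python
# Hypothetical heuristic mapping conversation traits to memory types.
def choose_memory_types(expected_turns, needs_exact_wording,
                        has_stable_user_facts, buffer_turn_limit=10):
    choices = []
    if expected_turns <= buffer_turn_limit:
        # Short chats can usually keep the full transcript.
        choices.append("buffer")
    else:
        # Long chats compress older history into a summary.
        choices.append("summary")
        if needs_exact_wording:
            # Keep a small verbatim window next to the summary.
            choices.append("buffer (recent turns only)")
    if has_stable_user_facts:
        # Lasting attributes such as names or preferences go in key-value memory.
        choices.append("key-value")
    return choices

quick_faq_choice = choose_memory_types(5, False, False)
coaching_choice = choose_memory_types(50, True, True)

print("Quick FAQ bot:", quick_faq_choice)
print("Long coaching bot:", coaching_choice)
```

Any real threshold should come from testing against the model's context window and the application's privacy constraints.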



## **3. Optimizing Chat Memory**

### **3.1. Context Window Tradeoffs**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master LangChain & Llama 3/Module_03/Lecture_B/image_03_01.jpg?v=1768767770" width="250">



>* Choosing context window size shapes conversation quality
>* More history improves coherence but increases token costs

>* Short chats can include nearly all history
>* Long chats need selective history and summaries

>* Prioritize the most useful past details first
>* Balance context depth with tokens, privacy, compliance
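
The tradeoff above can be made concrete by trimming history to a token budget instead of a fixed message count. The following is a minimal sketch, assuming a crude estimate of one token per whitespace-separated word. Real Llama 3 token counts come from the model's tokenizer and will differ. The function names are hypothetical.

```python
# Rough assumption: one token per whitespace-separated word.
def estimate_tokens(text):
    return len(text.split())

# Keep the newest messages that fit inside the token budget.
def trim_to_token_budget(history, max_tokens):
    kept = []
    used = 0
    # Walk backward from the newest message so recent context wins.
    for speaker, message in reversed(history):
        cost = estimate_tokens(f"{speaker}: {message}")
        if used + cost > max_tokens:
            break
        kept.append((speaker, message))
        used += cost
    # Restore chronological order before building a prompt.
    kept.reverse()
    return kept, used

travel_history = [
    ("User", "I need travel help to New York."),
    ("Assistant", "Sure, what dates work for you?"),
    ("User", "Late June, evening flights only."),
    ("Assistant", "Noted, evening flights in late June."),
]

token_budget = 15
kept_messages, used_tokens = trim_to_token_budget(travel_history, token_budget)

print("Messages kept:", len(kept_messages), "of", len(travel_history))
print("Estimated tokens used:", used_tokens, "of", token_budget)
for speaker, message in kept_messages:
    print(f"{speaker}: {message}")
```

Raising `token_budget` keeps more history and improves coherence, at the price of a longer and more expensive prompt.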



### **3.2. Conversation Summarization Strategies**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master LangChain & Llama 3/Module_03/Lecture_B/image_03_02.jpg?v=1768767788" width="250">



>* Summaries replace full transcripts to save context
>* Keep key goals and decisions within token limits

>* Use incremental or hierarchical summaries to track context
>* Give summaries clear focus on key user details

>* Continuously update summaries to reflect latest information
>* Separate stable facts, session details, and discard small talk



In [None]:
#@title Python Code - Conversation Summarization Strategies

# Demonstrate simple conversation summarization strategies with incremental updates.
# Show how summaries stay compact while preserving important conversational information.
# Compare full conversation history versus evolving concise summaries for a chatbot.
# No installation is needed: this example uses only built-in Python features.

# Define a sample conversation with several user and assistant turns.
conversation_turns = [
    ("user", "I want a road trip from Boston to New York this weekend."),
    ("assistant", "Great, do you prefer fast highways or scenic back roads?"),
    ("user", "Scenic back roads please, and I hate driving at night."),
    ("assistant", "Got it, scenic daytime route, avoiding late night driving as requested."),
    ("user", "My budget is around three hundred dollars for gas and motels."),
    ("assistant", "Okay, I will keep costs under three hundred dollars including motels."),
    ("user", "Actually, I can spend more because this is our anniversary trip."),
    ("assistant", "Understood, anniversary trip, comfort prioritized over strict budget constraints now."),
]

# Define a function that creates a compact summary from conversation turns.
def summarize_conversation(turns, previous_summary=None):
    important_preferences = []
    goals = []
    constraints = []

    # Extract simple goals, preferences, and constraints from user messages.
    for role, message in turns:
        if role == "user":
            lower_message = message.lower()
            if "road trip" in lower_message:
                goals.append("Plan pleasant Boston to New York weekend road trip.")
            if "scenic" in lower_message:
                important_preferences.append("Prefers scenic back roads over fastest highways.")
            if "hate driving at night" in lower_message:
                constraints.append("Avoid night driving whenever reasonably possible.")
            if "budget" in lower_message and "three hundred" in lower_message:
                constraints.append("Initial budget around three hundred dollars for trip costs.")
            if "anniversary" in lower_message:
                important_preferences.append("Trip is anniversary celebration, comfort more important than savings.")

    # Remove outdated budget constraint when anniversary preference appears.
    if any("anniversary" in p.lower() for p in important_preferences):
        constraints = [c for c in constraints if "Initial budget" not in c]

    # Build a new concise summary string from extracted elements.
    summary_parts = []
    if goals:
        summary_parts.append("Goal: " + " ".join(sorted(set(goals))))
    if important_preferences:
        summary_parts.append("Preferences: " + " ".join(sorted(set(important_preferences))))
    if constraints:
        summary_parts.append("Constraints: " + " ".join(sorted(set(constraints))))

    new_summary = " | ".join(summary_parts) if summary_parts else "No important details captured yet."

    # The turns passed in are cumulative, so the new summary supersedes the
    # previous one. Appending would duplicate facts and keep outdated ones.
    if previous_summary is not None and previous_summary == new_summary:
        return previous_summary  # Nothing changed, so keep the existing summary.
    return new_summary

# Simulate incremental summarization after several conversation turns.
running_summary = None
for index in range(2, len(conversation_turns) + 1, 2):
    current_slice = conversation_turns[:index]
    running_summary = summarize_conversation(current_slice, running_summary)

# Show final full conversation length versus compact summary length.
full_history_text = " ".join(message for _, message in conversation_turns)
print("Full history characters:", len(full_history_text))
print("Summary characters:", len(running_summary))
print("\nFinal incremental summary:\n", running_summary)
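
The hierarchical strategy mentioned in the bullets can be sketched in two levels: summarize fixed-size chunks of turns, then summarize those chunk summaries. This is only an outline of the structure. The function names are hypothetical, and word truncation stands in for the summarization a language model would perform.

```python
# Stand-in for a model-written summary (assumption): keep the first few words.
def truncate_words(text, max_words):
    return " ".join(text.split()[:max_words])

# First level: summarize the user messages inside one chunk of turns.
def summarize_chunk(turns, max_words):
    user_text = " ".join(message for role, message in turns if role == "user")
    return truncate_words(user_text, max_words)

# Second level: summarize the chunk summaries into one top-level summary.
def hierarchical_summary(turns, chunk_size, chunk_words, top_words):
    chunk_summaries = [
        summarize_chunk(turns[start:start + chunk_size], chunk_words)
        for start in range(0, len(turns), chunk_size)
    ]
    top_level = truncate_words(" ".join(chunk_summaries), top_words)
    return chunk_summaries, top_level

trip_turns = [
    ("user", "I want a road trip from Boston to New York this weekend."),
    ("assistant", "Great, do you prefer fast highways or scenic back roads?"),
    ("user", "Scenic back roads please, and I hate driving at night."),
    ("assistant", "Got it, a scenic daytime route."),
    ("user", "My budget is around three hundred dollars."),
    ("assistant", "Okay, I will keep costs under three hundred dollars."),
    ("user", "Actually, I can spend more because this is our anniversary trip."),
    ("assistant", "Understood, comfort comes first."),
]

chunk_summaries, top_level_summary = hierarchical_summary(
    trip_turns, chunk_size=4, chunk_words=12, top_words=16
)

for number, chunk_text in enumerate(chunk_summaries, start=1):
    print(f"Chunk {number} summary: {chunk_text}")
print("Top-level summary:", top_level_summary)
```

Truncation drops later details, which a real summarizer would not do; the point here is the two-level structure that bounds prompt size as conversations grow.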



### **3.3. Protecting Sensitive Chat Data**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master LangChain & Llama 3/Module_03/Lecture_B/image_03_03.jpg?v=1768767823" width="250">



>* Chat memory quietly stores sensitive personal information
>* Treat memory like regulated, long-term data storage

>* Store only essential context, discard unnecessary details
>* Use summaries and expirations to limit sensitive data

>* Use technical controls to secure stored memories
>* Set clear policies for consent, retention, deletion



In [None]:
#@title Python Code - Protecting Sensitive Chat Data

# Demonstrates simple redaction of sensitive chat fields before storing memory.
# Shows how to keep helpful context while masking private identifiers.
# Illustrates safer conversational memory design using basic Python structures.
# pip install commands are not required because this script uses only standard libraries.

# Define example chat messages containing potentially sensitive information.
messages = [
    "Hi, my name is Alice Johnson and my email is alice@example.com.",
    "My phone number is 555-123-4567 and I live near 10 Oak Street.",
    "Here is my project code name: Phoenix-42, please keep this confidential.",
]

# Define simple redaction rules for names, emails, phones, and project codenames.
redaction_rules = {
    "email": "[EMAIL REDACTED]",
    "phone": "[PHONE REDACTED]",
    "name": "[NAME REDACTED]",
    "codename": "[CODENAME REDACTED]",
}

# Import the standard-library regular expression module used by the helpers below.
import re

# These simple patterns are illustrative only; details such as street addresses are not caught.

# Define helper function that masks email addresses and keeps the rest of the message.
def redact_email(text):
    return re.sub(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", redaction_rules["email"], text)

# Define helper function that masks phone numbers written like 555-123-4567.
def redact_phone(text):
    return re.sub(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b", redaction_rules["phone"], text)

# Define helper function that masks capitalized names following "my name is".
def redact_name(text):
    pattern = r"([Mm]y name is )([A-Z][a-z]+(?: [A-Z][a-z]+)*)"
    return re.sub(pattern, lambda match: match.group(1) + redaction_rules["name"], text)

# Define helper function that masks the token following "code name" or "codename".
def redact_codename(text):
    pattern = r"((?:code name|codename):?\s+)([\w-]+)"
    return re.sub(
        pattern,
        lambda match: match.group(1) + redaction_rules["codename"],
        text,
        flags=re.IGNORECASE,
    )

# Define function that applies all redaction helpers to a single message string.
def redact_message(message):
    step_one = redact_name(message)
    step_two = redact_email(step_one)
    step_three = redact_phone(step_two)
    final_text = redact_codename(step_three)
    return final_text

# Build safe memory list by redacting each message before simulated storage.
safe_memory = []
for msg in messages:
    safe_memory.append(redact_message(msg))

# Print original messages and redacted versions to compare stored memory safety.
for original, redacted in zip(messages, safe_memory):
    print("ORIGINAL:", original)
    print("STORED  :", redacted)
    print("-")



# <font color="#418FDE" size="6.5" uppercase>**Conversational Memory**</font>


In this lecture, you learned to:
- Explain how LangChain memory mechanisms maintain conversational context for Llama 3. 
- Configure and use different memory types to build a stateful Llama 3 chatbot. 
- Tune memory behavior to balance context completeness, token limits, and privacy concerns. 

In the next module (Module 4), we will go over 'Tools And Retrieval'.