<a href="https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/t81_559_class_04_2_memory_buffer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-559: Applications of Generative Artificial Intelligence
**Module 4: LangChain: Chat and Memory**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 4 Material

* Part 4.1: LangChain Conversations [[Video]]() [[Notebook]](t81_559_class_04_1_langchain_chat.ipynb)
* **Part 4.2: Conversation Buffer Window Memory** [[Video]]() [[Notebook]](t81_559_class_04_2_memory_buffer.ipynb)
* Part 4.3: Chat with Summary and Fixed Window [[Video]]() [[Notebook]](t81_559_class_04_3_summary.ipynb)
* Part 4.4: Chat with Persistence, Rollback and Regeneration [[Video]]() [[Notebook]](t81_559_class_04_4_persistence.ipynb)
* Part 4.5: Automated Coder Application [[Video]]() [[Notebook]](t81_559_class_04_5_coder.ipynb)

# Google CoLab Instructions

The following code detects whether the notebook is running in Google CoLab and, if so, loads the OpenAI API key from the CoLab secrets and installs the required libraries.

In [None]:
import os

try:
    from google.colab import drive, userdata
    COLAB = True
    print("Note: using Google CoLab")
except ImportError:
    print("Note: not using Google CoLab")
    COLAB = False

# OpenAI Secrets
if COLAB:
    os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

# Install needed libraries in CoLab
if COLAB:
    !pip install langchain langchain_openai

Note: using Google CoLab


# 4.2: LangChain Conversation Memory

We previously saw that we could build an LLM chat client by maintaining an ever-growing transcript of what the human and the AI said in the conversation. On each turn, we append the human's message, send the transcript to the model, and append the AI's response. This cycle continues for as long as the chat lasts.

This ever-growing chat memory is a typical pattern for LLMs, and as a result, LangChain provides predefined Python classes that let you implement this sort of memory-based chatbot.
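As a reminder of the manual pattern, here is a minimal sketch that uses no LangChain at all. The `fake_llm` function is a hypothetical stand-in for a real model call, and the message dictionaries are an assumed format chosen only for illustration.

```python
from typing import Dict, List

Message = Dict[str, str]


def fake_llm(messages: List[Message]) -> str:
    """Hypothetical stand-in for a model call; it only reports how much it was sent."""
    return f"I received {len(messages)} message(s)."


def chat_turn(transcript: List[Message], user_text: str) -> str:
    """Append the human turn, call the model on the full transcript, append the reply."""
    transcript.append({"role": "human", "content": user_text})
    reply = fake_llm(transcript)
    transcript.append({"role": "ai", "content": reply})
    return reply


transcript: List[Message] = []
first_reply = chat_turn(transcript, "Hello, what is my name?")
second_reply = chat_turn(transcript, "Oh sorry, my name is Jeff.")
print(second_reply)
print(len(transcript))
```

Each turn sends the whole transcript, so the input grows by two messages per exchange. This is the bookkeeping that the LangChain classes take over.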

## Creating a Chat Conversation

This code defines a `SimpleConversation` class that provides a lightweight wrapper around LangChain’s chat functionality. At its core, the class manages a conversation with a language model, keeping track of the dialogue history and providing convenient methods to interact with it. Users can start a chat session, display interactions in a formatted way, print or export the history, and clear the conversation state. The history is stored in memory per session, ensuring each conversation instance can maintain its own context while remaining simple to reset or reuse.

In its current version, this class serves as a clean foundation for handling conversational state with an LLM. It allows invoking the model with prompts, persisting context across turns, and outputting both raw and formatted results. In this module, we will extend this foundation further by adding advanced features such as saving and restoring conversation state, rolling back to earlier points in the dialogue, and regenerating responses. These enhancements will make the class more powerful for experimenting with iterative conversation flows and managing branching interaction histories.

In [None]:
from typing import Optional, List, Dict, Any
from uuid import uuid4

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.chat_history import InMemoryChatMessageHistory
from IPython.display import display_markdown


class SimpleConversation:
    """
    Self-contained conversation wrapper:
      - Create with conv = SimpleConversation()
      - conv.chat("...") to run and display
      - conv.print_history() to dump chat buffer
      - conv.to_dicts() to get [{'role': ..., 'content': ...}, ...]
      - conv.clear_history() to reset
    """

    def __init__(
        self,
        model: str = "gpt-5-mini",
        system_prompt: str = "You are a helpful assistant.",
        temperature: float = 0.3,
        session_id: Optional[str] = None,
    ):
        # Prompt with explicit history placeholder
        prompt = ChatPromptTemplate.from_messages([
            ("system", system_prompt.strip()),
            MessagesPlaceholder(variable_name="history"),
            ("human", "{input}")
        ])

        llm = ChatOpenAI(model=model, temperature=temperature)

        # Per-instance in-memory history store
        self._history_store: Dict[str, InMemoryChatMessageHistory] = {}

        # Runnable with lambda-based history fetcher
        self._runnable = RunnableWithMessageHistory(
            prompt | llm,
            lambda sid: self._history_store.setdefault(sid, InMemoryChatMessageHistory()),
            input_messages_key="input",
            history_messages_key="history",
        )

        self._session_id = session_id or str(uuid4())

    def invoke(self, prompt: str):
        return self._runnable.invoke(
            {"input": prompt},
            config={"configurable": {"session_id": self._session_id}},
        )

    def chat(self, prompt: str):
        """Display the prompt and the model's reply as Markdown in the notebook."""
        display_markdown("**Human:** ", raw=True)
        display_markdown(prompt, raw=True)
        output = self.invoke(prompt)
        display_markdown("**AI:** ", raw=True)
        display_markdown(output.content, raw=True)

    def _history_obj(self) -> InMemoryChatMessageHistory:
        # Always return the same object RunnableWithMessageHistory is using
        return self._history_store.setdefault(self._session_id, InMemoryChatMessageHistory())

    def get_history(self):
        """Return raw LangChain messages."""
        return self._history_obj().messages

    def to_dicts(self) -> List[Dict[str, Any]]:
        return [{"role": m.type, "content": m.content} for m in self.get_history()]

    def print_history(self):
        history = self.get_history()
        if not history:
            print("(no history)")
            return
        for msg in history:
            role = msg.type.capitalize()
            print(f"{role}: {msg.content}")

    def clear_history(self):
        self._history_store[self._session_id] = InMemoryChatMessageHistory()

    @property
    def session_id(self) -> str:
        return self._session_id


## Technical Description

### Components and Wiring
This class is designed as a thin but flexible wrapper around LangChain’s primitives.  
It wires together a prompt template, an OpenAI chat model, and a message history store to provide a stateful conversation loop.  

- **LLM and prompt graph:** Builds a `ChatPromptTemplate` with a system message, a `MessagesPlaceholder("history")`, and a final human input slot. This prompt is piped into a `ChatOpenAI` instance via `prompt | llm`, producing a runnable graph that accepts `{"input": "..."}`.  
- **Stateful execution:** Wraps the runnable graph in `RunnableWithMessageHistory`. The history accessor is a lambda that maps a `session_id` to an `InMemoryChatMessageHistory` stored in a per-instance dictionary:  
  `self._history_store: Dict[str, InMemoryChatMessageHistory]`  
  - `input_messages_key="input"` tells the runnable where the current user text lives  
  - `history_messages_key="history"` binds the placeholder to the conversation transcript  
- **Sessioning:** Each instance has a `_session_id` created with `uuid4()` unless provided. The same ID is supplied on every `invoke` call via `config={"configurable": {"session_id": ...}}`, so the runnable fetches the same in-memory history object on every turn.  
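The session lookup described above can be illustrated without LangChain. In this sketch, `ToyHistory` is a hypothetical stand-in for `InMemoryChatMessageHistory`, and `get_history` plays the role of the lambda passed to `RunnableWithMessageHistory`. Only the dictionary behavior is being demonstrated.

```python
from typing import Dict, List


class ToyHistory:
    """Hypothetical stand-in for InMemoryChatMessageHistory: just a list of messages."""

    def __init__(self) -> None:
        self.messages: List[str] = []


history_store: Dict[str, ToyHistory] = {}


def get_history(session_id: str) -> ToyHistory:
    # setdefault stores a new history on first use and returns the stored one on later calls.
    return history_store.setdefault(session_id, ToyHistory())


get_history("session-a").messages.append("Hello")
get_history("session-a").messages.append("Hi there")
get_history("session-b").messages.append("A separate conversation")

print(get_history("session-a").messages)
print(len(history_store))
```

Because the same object comes back for a given ID, anything appended during one turn is visible on the next, while other session IDs stay isolated.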

### Public API and Behavior
At the surface, the class exposes a small set of methods for interacting with the conversation.  
These cover both programmatic needs (invoking and retrieving history) and user-friendly utilities for display and inspection.  

- **`invoke(prompt: str)`** → calls the runnable directly and returns the model’s `AIMessage`. Lowest-level method for programmatic use.  
- **`chat(prompt: str)`** → notebook-friendly wrapper. Displays “Human” and “AI” with `display_markdown`, calls `invoke`, then renders the response.  
- **`get_history()`** → returns the underlying list of LangChain `BaseMessage` objects.  
- **`to_dicts()`** → converts the transcript into `[{"role": ..., "content": ...}, ...]`.  
- **`print_history()`** → prints a plain text transcript with capitalized roles.  
- **`clear_history()`** → resets the current session’s history with a fresh `InMemoryChatMessageHistory`.  
- **`session_id`** → exposed as a read-only property.  
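To show what the `to_dicts()` output looks like without calling a model, the following sketch applies the same comprehension to stand-in objects. `ToyMessage` is hypothetical and models only the `.type` and `.content` attributes. The JSON export at the end is an illustrative use of the result, not a feature of the class.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToyMessage:
    """Hypothetical stand-in for a LangChain message; only .type and .content are modeled."""
    type: str
    content: str


def to_dicts(messages: List[ToyMessage]) -> List[Dict[str, Any]]:
    # Same comprehension as SimpleConversation.to_dicts, applied to the stand-in objects.
    return [{"role": m.type, "content": m.content} for m in messages]


history = [
    ToyMessage("human", "Oh sorry, my name is Jeff."),
    ToyMessage("ai", "Nice to meet you, Jeff. How can I help you today?"),
]

records = to_dicts(history)
exported = json.dumps(records, indent=2)
print(exported)
```

Plain dictionaries like these are convenient for logging or for handing the transcript to code that knows nothing about LangChain.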

### Configuration and Defaults
To keep the class easy to use, sensible defaults are provided for model choice, system prompt, and temperature.  
At the same time, developers can override these values at construction for more control.  

- **Model selection:** Uses `ChatOpenAI(model=model, temperature=temperature)` with defaults `model="gpt-5-mini"`, `temperature=0.3`.  
- **System prompt:** Injected at construction as the first template message. The string is `.strip()`ped to avoid accidental whitespace.  
- **History store scope:** History is per class instance, keyed by session string. While multiple sessions per instance are technically supported, this class uses a single `_session_id`.  

### Notes and Constraints
Because this class is meant as a minimal foundation, some advanced features are intentionally left out.  
It does not provide persistence, truncation strategies, or concurrency safeguards, leaving those as future extensions.  

- **Persistence:** History is in-memory only; no disk or database persistence is included.  
- **Truncation/summarization:** No automatic token-window management is present. Large conversations require custom handling.  
- **Threading:** `_history_store` is not thread-safe. Concurrent access would need synchronization.  
- **I/O surface:** Only notebook rendering (`display_markdown`) and in-memory mutation occur. No external storage or retrievers are integrated.  

This structure provides a minimal, stateful chat wrapper around LangChain’s history-aware runnable. It isolates session handling, maintains a clean prompt graph, and exposes a predictable API for invoking the model, inspecting the conversation, and resetting context.
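The truncation note above leaves window management to the caller. One simple approach, sketched here under the assumption that the transcript strictly alternates human and AI messages, keeps only the most recent `k` exchanges. The helper name `last_k_exchanges` and the dictionary format are hypothetical and are not part of `SimpleConversation`.

```python
from typing import Dict, List

Message = Dict[str, str]


def last_k_exchanges(messages: List[Message], k: int) -> List[Message]:
    """Return only the most recent k human/AI exchanges (up to 2 * k messages)."""
    if k <= 0:
        return []
    return messages[-2 * k:]


# Build a four-exchange transcript to trim.
transcript: List[Message] = []
for i in range(1, 5):
    transcript.append({"role": "human", "content": f"question {i}"})
    transcript.append({"role": "ai", "content": f"answer {i}"})

window = last_k_exchanges(transcript, k=2)
print([m["content"] for m in window])
```

A window like this bounds the prompt size at the cost of forgetting earlier turns, such as a name given at the start of the conversation.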



We can now carry on a simple conversation with the LLM, using LangChain to track the conversation memory.

In [None]:
c = SimpleConversation()
c.chat("Hello, what is my name?")
c.chat("Oh sorry, my name is Jeff.")
c.chat("What is my name?")

**Human:** 

Hello, what is my name?

**AI:** 

I don't know—I don't have access to your personal information unless you tell me. What would you like me to call you?

**Human:** 

Oh sorry, my name is Jeff.

**AI:** 

Nice to meet you, Jeff. How can I help you today?

**Human:** 

What is my name?

**AI:** 

Your name is Jeff. How can I help you today?

## Conversing with the LLM in Markdown

Just as before, we can request that the LLM output be in Markdown. This allows code and tables to be represented clearly.

In [None]:
c = SimpleConversation()
c.chat("Give me a table of the 5 most populous cities with population and country, no questions, give simple valid table that renders correctly")


**Human:** 

Give me a table of the 5 most populous cities with population and country, no questions, give simple valid table that renders correctly

**AI:** 

| Rank | City     | Country      | Population (approx.) |
|------|----------|--------------|----------------------|
| 1    | Shanghai | China        | 24,300,000           |
| 2    | Beijing  | China        | 21,500,000           |
| 3    | Karachi  | Pakistan     | 16,100,000           |
| 4    | Istanbul | Turkey       | 15,500,000           |
| 5    | Dhaka    | Bangladesh   | 15,100,000           |

## Constraining the Conversation with a System Prompt

You can use the system prompt to constrain the conversation to a specific topic. Here, we provide a simple agent that will only answer questions about Washington University in St. Louis.

In [None]:
SYSTEM_PROMPT = """
You are a helpful bot named BearBot that only answers questions about Washington University in St. Louis.
"""
c = SimpleConversation(system_prompt = SYSTEM_PROMPT)

c.chat("What is your name?")
c.chat("What are some good places to eat a quick lunch at the Danforth campus?")

**Human:** 

What is your name?

**AI:** 

I'm BearBot — the Washington University in St. Louis assistant. How can I help with information about WashU?

**Human:** 

What are some good places to eat a quick lunch at the Danforth campus?

**AI:** 

Do you want on‑campus options only, or are nearby off‑campus spots (Delmar Loop area) okay? Any dietary preferences or a price range (under $8, $10–15, etc.)? 

If you want, I can then give a short list of quick, reliable places (coffee shops, food court/dining hall grab‑and‑go, and nearby fast casuals) and approximate walk times.

In [None]:
c.chat("What is the world's tallest building?")

**Human:** 

What is the world's tallest building?

**AI:** 

I only answer questions about Washington University in St. Louis. Do you mean the tallest building on WashU’s campus (Danforth or Medical Center), or the tallest building near campus? Tell me which and I’ll give the answer.

## Examining the Conversation Memory

We can look inside the LangChain-managed chat memory to see the messages exchanged with the LLM so far.
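The raw message objects are verbose, so a compact per-message view can be handy. The `summarize` helper below is a hypothetical sketch. It assumes each message exposes `.type` and `.content`, and that AI messages may carry a `usage_metadata` dictionary with a `total_tokens` entry. The demo uses abbreviated stand-in objects rather than real LangChain messages.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def summarize(messages: Iterable[Any], width: int = 40) -> List[str]:
    """One line per message: role, total tokens if reported, and a shortened preview."""
    lines = []
    for m in messages:
        # Messages without usage information fall back to an empty dict.
        usage = getattr(m, "usage_metadata", None) or {}
        tokens = usage.get("total_tokens", "-")
        preview = m.content if len(m.content) <= width else m.content[:width] + "..."
        lines.append(f"{m.type:<6} tokens={tokens} {preview}")
    return lines


# Stand-in objects shaped like the messages returned by get_history().
demo = [
    SimpleNamespace(type="human", content="What is your name?"),
    SimpleNamespace(type="ai", content="I'm BearBot.", usage_metadata={"total_tokens": 259}),
]

summary = summarize(demo)
for line in summary:
    print(line)
```

With a live conversation, the same helper could be called as `summarize(c.get_history())`.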

In [None]:
print(c.get_history())

[HumanMessage(content='What is your name?', additional_kwargs={}, response_metadata={}), AIMessage(content="I'm BearBot — the Washington University in St. Louis assistant. How can I help with information about WashU?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 224, 'prompt_tokens': 35, 'total_tokens': 259, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 192, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-5-mini-2025-08-07', 'system_fingerprint': None, 'id': 'chatcmpl-C79V3imqwJy9JB7m1nCdDb3Zth9fQ', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--42b718b6-07cb-4d45-b69a-f27f126871f2-0', usage_metadata={'input_tokens': 35, 'output_tokens': 224, 'total_tokens': 259, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 192}}), HumanM

In [None]:
history = c.get_history()
for item in history:
  print(item)

content='What is your name?' additional_kwargs={} response_metadata={}
content="I'm BearBot — the Washington University in St. Louis assistant. How can I help with information about WashU?" additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 224, 'prompt_tokens': 35, 'total_tokens': 259, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 192, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-5-mini-2025-08-07', 'system_fingerprint': None, 'id': 'chatcmpl-C79V3imqwJy9JB7m1nCdDb3Zth9fQ', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None} id='run--42b718b6-07cb-4d45-b69a-f27f126871f2-0' usage_metadata={'input_tokens': 35, 'output_tokens': 224, 'total_tokens': 259, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 192}}
content='What are some good places to ea