<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/021_Chat_History.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## ✅ **Key Concepts**

### 1️⃣ *Programmatic Prompting*

* How to call an LLM programmatically (basic function)
* Difference between `system` and `user` messages

### 2️⃣ *No Inherent Memory*

* Show how the LLM forgets if you don’t include prior context
* Show how to maintain memory manually with messages

### 3️⃣ *Memory Management for Agents*

* Basic example of building a simple “Agent Loop”
* Highlight the idea that **memory is your responsibility**

### 4️⃣ *GAIL Framework*

* Clearly outline **GAIL**:

  * **Goals:** What does the agent want to achieve?
  * **Actions:** What actions can it take (APIs, functions, tools)?
  * **Information:** What does it know or need to know?
  * **Language:** How does it communicate its reasoning and results?
* Include a mini exercise: design a “GAIL” for a tiny agent (like a to-do list manager).
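
One possible way to approach the mini exercise is to write the four GAIL parts down as plain data and render them into a system prompt. This is only a sketch: the `GailSpec` class, its field names, and the to-do tool names (`add_task`, `complete_task`, `list_tasks`) are hypothetical and do not come from any library.

```python
from dataclasses import dataclass


@dataclass
class GailSpec:
    """Hypothetical container for the four GAIL parts of an agent."""
    goals: list[str]
    actions: list[str]
    information: list[str]
    language: str

    def to_system_prompt(self) -> str:
        """Render the spec as plain text for a system message."""
        lines = ["Goals:"]
        lines += [f"- {goal}" for goal in self.goals]
        lines.append("Actions you may take:")
        lines += [f"- {action}" for action in self.actions]
        lines.append("Information you have:")
        lines += [f"- {fact}" for fact in self.information]
        lines.append(f"Language: {self.language}")
        return "\n".join(lines)


# Made-up spec for a tiny to-do list manager.
todo_agent = GailSpec(
    goals=["Help the user keep an accurate to-do list."],
    actions=["add_task(title)", "complete_task(task_id)", "list_tasks()"],
    information=["The current task list is included in each request."],
    language="Reply with one short sentence describing what you did.",
)

system_message = {"role": "system", "content": todo_agent.to_system_prompt()}
print(system_message["content"])
```

The resulting `system_message` has the same shape as the system messages used in the rest of this notebook, so it could be the first item of a `messages` list.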


In [3]:
%pip install -qU python-dotenv openai pydantic


# Memory Management

In [5]:
# 📚 Notebook 1: Stateless vs. Stateful Prompting

from openai import OpenAI
import os
from dotenv import load_dotenv

# Load your API key securely
load_dotenv("/content/API_KEYS.env")

# ✅ Use the official OpenAI Python SDK
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# ================================================
# 1️⃣ Define a reusable function for LLM calls
# ================================================

def generate_response(messages):
    """Call the LLM and return its response."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # ✅ Cheap and fast!
        messages=messages,
        max_tokens=500
    )
    return response.choices[0].message.content

## Stateless - No Memory

In [6]:
# ================================================
# 2️⃣ Stateless Example: No memory
# ================================================

# 🔹 First prompt: ask for a simple function
messages = [
    {"role": "system", "content": "You are an expert Python engineer."},
    {"role": "user", "content": "Write a function to swap keys and values in a dictionary."}
]

response = generate_response(messages)
print("✅ First response:\n", response)

# 🔹 Now, try to refine it, but WITHOUT giving the previous code!
# The model will not remember its own output.
messages = [
    {"role": "system", "content": "You are an expert Python engineer."},
    {"role": "user", "content": "Add type hints and docstrings to the function from before."}
]

response = generate_response(messages)
print("\n🚫 Second response (stateless):\n", response)


✅ First response:
 You can create a function that swaps keys and values in a dictionary using a simple dictionary comprehension. Here’s how you can do it:

```python
def swap_keys_and_values(original_dict):
    # Ensure the values are unique to be valid as keys in the swapped dictionary
    if len(original_dict) != len(set(original_dict.values())):
        raise ValueError("Cannot swap keys and values: duplicate values found.")

    swapped_dict = {value: key for key, value in original_dict.items()}
    return swapped_dict

# Example usage:
original = {'a': 1, 'b': 2, 'c': 3}
swapped = swap_keys_and_values(original)
print(swapped)  # Output: {1: 'a', 2: 'b', 3: 'c'}
```

### Explanation:
1. **Function Definition**: The function `swap_keys_and_values` takes an `original_dict` as input.
2. **Duplicate Check**: Before swapping, it checks if all the values in the original dictionary are unique. If there are duplicate values, it raises a `ValueError`.
3. **Swapping**: It uses a dictiona

## Stateful - With Memory

In [7]:
# ================================================
# 3️⃣ Stateful Example: Manual memory
# ================================================

# 🔹 Keep the first output in our "working memory"
# This gives the LLM context to build on its own work.
messages = [
    {"role": "system", "content": "You are an expert Python engineer."},
    {"role": "user", "content": "Write a function to swap keys and values in a dictionary."}
]

# Generate the initial function
initial_response = generate_response(messages)
print("\n✅ First response (again):\n", initial_response)

# 🔹 Now, reuse the initial response as "assistant" memory
messages = [
    {"role": "system", "content": "You are an expert Python engineer."},
    {"role": "user", "content": "Write a function to swap keys and values in a dictionary."},
    {"role": "assistant", "content": initial_response},
    {"role": "user", "content": "Add type hints and docstrings to the function from before."}
]

# Generate the refined version
response_with_memory = generate_response(messages)
print("\n✅ Second response (with memory):\n", response_with_memory)


✅ First response (again):
 Certainly! Below is a Python function that swaps the keys and values in a given dictionary. This function creates a new dictionary where the keys become the values and the values become the keys. Note that this will only work correctly if the values in the original dictionary are unique and hashable since they will become the new keys.

Here's how you can implement this:

```python
def swap_dict(d):
    """Swaps keys and values in a dictionary.

    Args:
        d (dict): The dictionary to swap.

    Returns:
        dict: A new dictionary with keys and values swapped.

    Raises:
        ValueError: If the input dictionary has non-unique values.

    Example:
        >>> swap_dict({'a': 1, 'b': 2, 'c': 3})
        {1: 'a', 2: 'b', 3: 'c'}
    """
    if len(d) != len(set(d.values())):
        raise ValueError("Cannot swap keys and values because the values are not unique.")

    return {value: key for key, value in d.items()}

# Example usage:
original_

Let’s unpack this piece step by step.
This block shows how **you create “memory”** by explicitly **feeding the model its own prior output**.

---

## 🔍 **How “manual memory” works**

### ✅ **Key principle**

LLMs are **stateless**: they only see what you give them in the `messages` list, every time.
So you create “memory” by:

1. **Saving the assistant’s past responses**
2. Adding them back into the conversation next time

---

## 📚 **Your example, line by line**

```python
# This list simulates an ongoing conversation with full context:
messages = [
    # System role: sets the model's behavior
    {"role": "system", "content": "You are an expert Python engineer."},

    # User role: first request
    {"role": "user", "content": "Write a function to swap keys and values in a dictionary."},

    # Assistant role: the LLM's previous answer
    {"role": "assistant", "content": initial_response},

    # User role: follow-up request that depends on previous output
    {"role": "user", "content": "Add type hints and docstrings to the function from before."}
]
```

---

### 🧠 **Why does this work?**

When you send this whole `messages` list to the LLM:

* The model **sees the entire conversation history** in one go.
* It knows exactly **what it said previously** (the code it generated).
* So when you say *“Add type hints and docstrings to the function from before”*, “the function from before” is right there!

Without this, the model would have to guess what “the function from before” means. With it, the model has everything the request refers to.

---

### ⚡️ **Analogy**

Think of it like **pasting the whole chat log** each time you talk to the model:

* The system message sets the rules.
* The conversation is the *entire state*.
* You choose exactly what the model “knows” at each step.

---

## ✅ **Key takeaway**

**You control the “working memory” by managing `messages`.**

* Want the model to “forget” something? Leave it out.
* Want it to “remember”? Add it back as a message with role `"assistant"`.

This pattern is the core of:

> 🌀 **The Agent Loop**: *Perceive → Reason → Act → Remember*




✅ This is an important “aha!” moment about **how LLMs handle context** versus how you might expect a software system to manage state in a variable or object.

---

## 🧩 **In normal software**

If you’re building a chat app, you’d probably:

1. Store each message in a `chat_history` list or database.
2. Append each new message to that list.
3. When needed, show the full conversation to the user.

```python
chat_history = []
chat_history.append("User: Write a function to swap keys and values.")
chat_history.append("Assistant: Here's the code...")
chat_history.append("User: Add type hints and docstrings.")
```

Makes sense, right? The **conversation lives in `chat_history`**, and it grows line by line.

---

## 🤖 **With LLMs**

The LLM itself **does not keep state**.
Every time you call `client.chat.completions.create()`, it starts from scratch: it only “knows” what’s inside the `messages` you send *right now*.

So *you* must build the `messages` list each time to include exactly what the model should “remember.”

---

### ✅ **So why don’t we use `.append()` in the example?**

**We *do* append in concept, but we do it outside the LLM!**
In this simple example, you see:

```python
# First interaction: messages has 2 items
messages = [
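    {"role": "system", "content": "You are an expert Python engineer."},
    {"role": "user", "content": "Write a function to swap keys and values in a dictionary."}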
]

initial_response = generate_response(messages)
```

---

Then for the second interaction, we *manually* build the new list:

```python
# Second interaction: build a new messages list that includes:
# 1) The original system and user messages
# 2) The model's previous response, now as role=assistant
# 3) The new user request
messages = [
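    {"role": "system", "content": "You are an expert Python engineer."},
    {"role": "user", "content": "Write a function to swap keys and values in a dictionary."},
    {"role": "assistant", "content": initial_response},
    {"role": "user", "content": "Add type hints and docstrings to the function from before."}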
]
```

**Same idea as `.append()`: we’re just showing you the fully built list for clarity!**

---

## ✅ **How you’d do it in real code**

In production, you’d maintain a `conversation_history` list:

```python
conversation_history = [
    {"role": "system", "content": "You are an expert engineer."}
]

# User says something
conversation_history.append({"role": "user", "content": "Write a function..."})

# Call LLM
assistant_reply = generate_response(conversation_history)

# Save the assistant's reply
conversation_history.append({"role": "assistant", "content": assistant_reply})

# Next user message
conversation_history.append({"role": "user", "content": "Add type hints..."})

# Call again with full context
assistant_reply = generate_response(conversation_history)

# Save again...
conversation_history.append({"role": "assistant", "content": assistant_reply})
```

💡 **This is the same as “manual memory”!**
You build your `messages` list step by step, just like your mental model of `+=` or `.append()`.

---

## 🔑 **Key insight**

✔️ The `generate_response()` function has *no memory*.
✔️ The `client.chat.completions.create()` call has *no memory*.
✔️ **Your `messages` list *is* the memory!**

So:

> The “state” lives in the **list of messages you maintain** in your Python process, not inside the model.
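
To see this without spending API credits, here is a small offline sketch. `fake_llm` is a made-up stand-in for the model (it is not part of the OpenAI SDK); like the real call, it can only use what is inside the `messages` list it receives.

```python
def fake_llm(messages):
    """Stand-in for a chat model: it only sees the `messages` passed in."""
    earlier_replies = [m["content"] for m in messages if m["role"] == "assistant"]
    if earlier_replies:
        return f"I can see my earlier reply: {earlier_replies[-1]!r}"
    return "I have no earlier reply to look at."


# Follow-up WITHOUT history: nothing from any previous call is available.
stateless = fake_llm([
    {"role": "user", "content": "Add type hints to the function from before."},
])

# Follow-up WITH history: the caller passes the earlier reply back in.
stateful = fake_llm([
    {"role": "user", "content": "Write a function."},
    {"role": "assistant", "content": "def swap(d): ..."},
    {"role": "user", "content": "Add type hints to the function from before."},
])

print(stateless)
print(stateful)
```

The function itself keeps nothing between calls. The only difference between the two results is the list the caller chose to send.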

---

## ✅ **When to use manual build vs `.append()`**

* **Teaching/demo:** We often show the whole `messages` list for clarity.
* **Real agent loop:** You maintain one list and `.append()` to it as the conversation grows.



✅ Let’s take your example and rewrite it as it would look in **real production**, using a **persistent `conversation_history`** that you *append* to as the conversation grows.

---

## 🔄 **Production-style memory pattern**

Here’s the same **“stateless vs. stateful”** example, but written as a **turn-by-turn conversation** that makes it clear where memory lives and how you append to it.

---

### 🧩 **Full working example: Production-style conversation**

```python
# ==========================================
# ✅ Setup: Same generate_response function
# ==========================================
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv("/content/API_KEYS.env")
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def generate_response(messages):
    """Call the LLM and return its response."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        max_tokens=500
    )
    return response.choices[0].message.content


# ==========================================
# ✅ 1️⃣ Start a conversation history
# ==========================================
conversation_history = [
    {"role": "system", "content": "You are an expert Python engineer."}
]

# ==========================================
# ✅ 2️⃣ User asks first question
# ==========================================
user_message_1 = "Write a function to swap keys and values in a dictionary."
conversation_history.append({"role": "user", "content": user_message_1})

# Call the LLM with full history so far:
assistant_reply_1 = generate_response(conversation_history)
print("✅ First response:\n", assistant_reply_1)

# Append the assistant's reply to the history:
conversation_history.append({"role": "assistant", "content": assistant_reply_1})


# ==========================================
# ✅ 3️⃣ User asks a follow-up question
# ==========================================
user_message_2 = "Add type hints and docstrings to the function from before."
conversation_history.append({"role": "user", "content": user_message_2})

# Call the LLM again with *entire* conversation:
assistant_reply_2 = generate_response(conversation_history)
print("\n✅ Second response (with full memory):\n", assistant_reply_2)

# Append the assistant's reply again
conversation_history.append({"role": "assistant", "content": assistant_reply_2})

# ==========================================
# ✅ 4️⃣ Keep going as needed!
# ==========================================
# Next user message: just append, call, append, repeat.
```

---

## 🔍 **What’s different vs. your static example?**

| Static Example                                      | Production Pattern                                                      |
| --------------------------------------------------- | ----------------------------------------------------------------------- |
| You build a new `messages` list manually each time. | You maintain **one `conversation_history`** that grows over time.       |
| Good for demos, not reusable.                       | Reusable in real apps, chatbots, agents.                                |
| No `.append()`: you just overwrite `messages`.      | Uses `.append()` for each message: user → assistant → user → assistant. |
| No clear “source of truth”.                         | `conversation_history` is your single source of truth for memory.       |

---

## ✅ **Key takeaway**

**Your job as the developer is:**

* Keep a single `conversation_history` list.
* Always pass *all relevant* messages to the LLM on every call.
* Append the model’s responses to the list after each turn.

The **LLM has no memory**: your `conversation_history` is the memory.

---

## ⚡️ **Bonus tip: Use classes or files later**

When your agent gets bigger:

* You’ll wrap this in a class: `AgentMemory`
* You might store the `conversation_history` to disk or a DB to persist state.
* You can prune or summarize old messages to control token costs.
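
As a rough illustration of the "store to disk" idea, the sketch below saves the message list as JSON and loads it back. The helper names `save_history` and `load_history` and the file name `history.json` are assumptions made for this example, not part of any library.

```python
import json
import tempfile
from pathlib import Path


def save_history(history, path):
    """Write the list of message dicts to `path` as JSON."""
    Path(path).write_text(
        json.dumps(history, ensure_ascii=False, indent=2), encoding="utf-8"
    )


def load_history(path, system_prompt):
    """Load a saved history, or start a new one if no file exists yet."""
    file = Path(path)
    if not file.exists():
        return [{"role": "system", "content": system_prompt}]
    return json.loads(file.read_text(encoding="utf-8"))


# Demo in a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "history.json"
    history = load_history(path, "You are an expert Python engineer.")
    history.append({"role": "user", "content": "Write a function to swap keys and values."})
    save_history(history, path)
    restored = load_history(path, "You are an expert Python engineer.")

print(restored == history)
```

Because each message is a plain dict of strings, JSON is enough here. A database would follow the same save and load steps.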



✅ In a *real-world implementation*, you wouldn’t create a separately numbered variable for every turn (`assistant_reply_1`, `assistant_reply_2`, etc.). You’d run it in a **loop** or inside a function that handles each turn dynamically.

Let’s break down how it looks in practice:

---

## 🗂️ **Key idea**

In production, you’d have:

* A **single `conversation_history`** list that persists across turns.
* A **loop or handler** that:

  1. Takes the latest user input.
  2. Appends it to the conversation.
  3. Calls the LLM with the entire history.
  4. Gets the assistant’s reply.
  5. Appends the reply back to the conversation.
  6. Repeats.

---

## 🔄 **Production-style pattern**

Here’s a **realistic reusable loop** you could run in a Colab notebook or a local script. A Flask API or a chatbot server would follow the same steps with a different input source:

---

```python
from openai import OpenAI
from dotenv import load_dotenv
import os

# Load API key
load_dotenv("/content/API_KEYS.env")
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def generate_response(messages):
    """Call the LLM and return its response."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        max_tokens=500
    )
    return response.choices[0].message.content

# ================================================
# ✅ 1️⃣ Initialize the conversation
# ================================================
conversation_history = [
    {"role": "system", "content": "You are an expert Python engineer."}
]

# ================================================
# ✅ 2️⃣ Agent loop: keep chatting!
# ================================================

while True:
    # --- 1. Get new user input ---
    user_input = input("\n🧑‍💻 You: ")

    # Exit condition
    if user_input.lower() in ["exit", "quit"]:
        print("👋 Ending the conversation.")
        break

    # --- 2. Add user input to history ---
    conversation_history.append({"role": "user", "content": user_input})

    # --- 3. Call LLM with full conversation ---
    assistant_reply = generate_response(conversation_history)

    # --- 4. Show reply ---
    print("\n🤖 Assistant:\n", assistant_reply)

    # --- 5. Append reply to history ---
    conversation_history.append({"role": "assistant", "content": assistant_reply})
```

---

## ✅ **Key points**

✔️ **No numbered reply variables:**
Each new user input triggers a fresh LLM call with the *entire* conversation.

✔️ **One `conversation_history` list:**
Grows automatically as turns happen.

✔️ **Flexible stop:**
You can type `exit` or `quit` to end the loop.

✔️ **Same pattern everywhere:**
This is exactly what your agent loop in a real backend looks like. Only the input source changes:

* Local script → use `input()`
* Chatbot → messages from a frontend
* Slack bot → messages from Slack’s API
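
One way to make the input source swappable is to put a single turn into a function that receives the text and the model call as arguments. This is a sketch, not the notebook's own code: `handle_turn` and `echo_llm` are hypothetical names, and `echo_llm` only exists so the example runs offline.

```python
from typing import Callable

Message = dict[str, str]


def handle_turn(history: list[Message], user_input: str,
                llm: Callable[[list[Message]], str]) -> str:
    """Run one turn: append the user message, call the model, append the reply."""
    history.append({"role": "user", "content": user_input})
    reply = llm(history)
    history.append({"role": "assistant", "content": reply})
    return reply


def echo_llm(messages: list[Message]) -> str:
    """Offline stand-in for the model."""
    return f"(reply to: {messages[-1]['content']})"


history = [{"role": "system", "content": "You are an expert Python engineer."}]

# The texts could come from input(), a web request, or a Slack event.
for text in ["Write a function.", "Add type hints."]:
    print(handle_turn(history, text, echo_llm))
```

In the notebook you would pass `generate_response` where this sketch passes `echo_llm`.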

---

## 🔑 **Why this is the same idea**

* In your earlier example, `assistant_reply_1` and `assistant_reply_2` were separate variables, written out for illustration.
* Here, `assistant_reply` is just a *variable that changes every turn*.
* The **conversation memory** is your growing `conversation_history` list.

---

## ⚡️ **Next-level tip**

When you build real agents:

* Use a `while` loop, `for` loop, or event handler.
* Manage token usage: prune or summarize old history if it gets huge.
* Save `conversation_history` to disk or DB if you want persistence across sessions.

---

**And that’s it!**
The same simple pattern, **append → call → append → repeat**, is the *core* of your agent’s memory loop. 🔁





## 📚 **Your conversation history = your agent’s memory**

But every LLM (like GPT-4o-mini) has a **maximum context window**: the total number of tokens it can process in a single request.

---

## ⚡️ **How this works in practice**

* Every time you call `client.chat.completions.create()`, the entire `messages` list is tokenized.
* That means **system + all user messages + all assistant replies + your new prompt** must fit inside the model’s context window, along with the tokens reserved for the reply.
* For example:

  * GPT-4o-mini has a context window of 128K tokens.
  * Each “turn” in a conversation can add a few dozen to a few thousand tokens (code, reasoning, explanations).

If your `conversation_history` gets too big, you hit the limit, and the API rejects the request with an error instead of quietly dropping old messages.
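
A quick way to notice the problem early is to estimate the prompt size before each call. The sketch below uses the common rule of thumb of roughly 4 characters per token for English text, which is only an approximation. For exact counts you would use a real tokenizer such as the `tiktoken` package. The function names and constants are assumptions made for this example.

```python
CONTEXT_WINDOW = 128_000   # approximate GPT-4o-mini context window, in tokens
REPLY_BUDGET = 500         # matches max_tokens in generate_response
CHARS_PER_TOKEN = 4        # rough rule of thumb for English text (assumption)


def estimate_tokens(messages):
    """Very rough token estimate based on character counts."""
    total_chars = sum(len(m["content"]) for m in messages)
    return total_chars // CHARS_PER_TOKEN + 1


def fits_in_context(messages, window=CONTEXT_WINDOW, reply_budget=REPLY_BUDGET):
    """True if the estimated prompt size still leaves room for the reply."""
    return estimate_tokens(messages) + reply_budget <= window


small = [{"role": "user", "content": "x" * 400}]
huge = [{"role": "user", "content": "x" * 600_000}]

print(estimate_tokens(small), fits_in_context(small))
print(estimate_tokens(huge), fits_in_context(huge))
```

When `fits_in_context` returns `False`, that is the moment to summarize or trim before calling the model.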

---

## 🔍 **This is why real agents need memory management**

When your agent runs for days or has long multi-step tasks, you’ll need strategies to **stay within the context window**:

### 🧩 **Practical memory management strategies**

1️⃣ **Summarization**

* Have the LLM *summarize older conversation parts* into a shorter chunk.
* Replace multiple old messages with a concise “memory” message.

Example:

```python
# Replace:
# [Many old user/assistant messages]
# With:
{"role": "system", "content": "Summary of prior conversation: User is building a calendar agent, they discussed how to call the Google API and manage errors."}
```

---

2️⃣ **Trimming**

* Drop very old messages if they’re no longer relevant.
* Or keep only the system message + the last few turns.

---

3️⃣ **External long-term memory**

* Save structured facts or intermediate states to a database or file.
* Feed those facts back in when needed.

Example:

> Your agent builds a task list → stores tasks in a DB → only pulls the relevant tasks for each context.
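
Here is a minimal sketch of that idea using Python's built-in `sqlite3` module. The table layout, the helper names, and the sample projects are all invented for illustration. A real agent would choose its own schema.

```python
import sqlite3


def open_task_store(path=":memory:"):
    """Open (or create) a small task table. ':memory:' keeps it in RAM."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY, project TEXT NOT NULL, "
        "title TEXT NOT NULL, done INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def add_task(conn, project, title):
    """Store one task outside the conversation history."""
    conn.execute("INSERT INTO tasks (project, title) VALUES (?, ?)", (project, title))
    conn.commit()


def open_tasks_message(conn, project):
    """Build one compact message holding only the tasks relevant to `project`."""
    rows = conn.execute(
        "SELECT title FROM tasks WHERE project = ? AND done = 0 ORDER BY id",
        (project,),
    ).fetchall()
    titles = "; ".join(title for (title,) in rows) or "none"
    return {"role": "system", "content": f"Open tasks for {project}: {titles}"}


conn = open_task_store()
add_task(conn, "calendar", "Handle API errors")
add_task(conn, "calendar", "Add OAuth login")
add_task(conn, "website", "Fix footer")

memory_message = open_tasks_message(conn, "calendar")
print(memory_message["content"])
```

Only the short `memory_message` goes into `messages`. The full task table stays in the database.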

---

4️⃣ **Chunking or windowing**

* Use a **sliding window** approach:
  Always send only the *most recent X turns* plus any critical context.
* Good for conversations where only the recent turns matter.
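
A sliding window can be sketched in a few lines. In this example a "turn" is counted as one user message plus whatever follows it, and every system message is treated as critical context that is always kept. Both choices are assumptions for illustration, and `sliding_window` is a hypothetical helper name.

```python
def sliding_window(history, max_turns=3):
    """Return system messages plus the last `max_turns` user/assistant turns."""
    pinned = [m for m in history if m["role"] == "system"]
    dialogue = [m for m in history if m["role"] != "system"]
    if max_turns <= 0:
        return pinned

    # Walk backwards until `max_turns` user messages have been found,
    # so the window always starts on a user message.
    users_seen = 0
    start = len(dialogue)
    for i in range(len(dialogue) - 1, -1, -1):
        if dialogue[i]["role"] == "user":
            users_seen += 1
            start = i
            if users_seen == max_turns:
                break
    return pinned + dialogue[start:]


history = [{"role": "system", "content": "You are an expert Python engineer."}]
for n in range(1, 5):
    history.append({"role": "user", "content": f"question {n}"})
    history.append({"role": "assistant", "content": f"answer {n}"})

window = sliding_window(history, max_turns=2)
print([m["content"] for m in window])
```

The full `history` list is left untouched, so you can still summarize or store the older turns elsewhere.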

---

## ✅ **Key takeaway**

> **Your `conversation_history` is your “working memory”**, but you, the developer, decide how much context to keep in the LLM window vs. store externally.

---

## 🚀 **Agent Loop pattern**

1. **Perceive** (user input + tools + environment)
2. **Reason** (call LLM with the needed context)
3. **Act** (take action)
4. **Remember**

   * Append new state to your working memory
   * Summarize or store to long-term memory if needed
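
The four steps can be sketched as one function per loop iteration. Everything here is a toy: `stub_llm` stands in for the model, and the `ADD_TASK:` action format is invented for this example so that the code runs offline.

```python
def stub_llm(messages):
    """Stand-in for the model: turns the latest user text into an action string."""
    return "ADD_TASK:" + messages[-1]["content"]


def run_step(user_text, memory, tasks, llm=stub_llm):
    # Perceive: turn raw input into a message.
    memory.append({"role": "user", "content": user_text.strip()})
    # Reason: ask the model what to do, given the current memory.
    decision = llm(memory)
    # Act: carry out the one action this toy agent understands.
    if decision.startswith("ADD_TASK:"):
        tasks.append(decision.removeprefix("ADD_TASK:"))
    # Remember: store the decision so the next step can see it.
    memory.append({"role": "assistant", "content": decision})
    return decision


memory = [{"role": "system", "content": "You manage a to-do list."}]
tasks = []
for text in ["buy milk ", "call Sam"]:
    run_step(text, memory, tasks)

print(tasks)
```

In a real agent the Act step would call actual tools, and the Remember step is where summarizing or storing to long-term memory would happen.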




Let’s level up your agent loop with a **practical `AgentMemory` helper class** that does the following:

✅ Append messages (user & assistant)
✅ Keep the full working memory (`history`)
✅ **Summarize** older messages to manage context window size
✅ Simple `.prune_oldest()` method for dropping old messages

---

## 🧩 **Why use a helper class?**

In production you don’t want to keep repeating:

```python
conversation_history.append({...})
```

or forgetting which messages you’ve added.
A simple `AgentMemory` class makes this consistent and extendable.

---


## 🗂️ **How it works**

✅ **`AgentMemory`** holds your entire conversation state.
✅ You call `.add_user_message()` and `.add_assistant_message()` each turn.
✅ `.summarize_old_messages()` uses the LLM itself to compress the old chat, so you stay under your token budget.
✅ `.prune_oldest()` is a fallback to just drop old messages if you want.

---

## 🔑 **Why it’s practical**

* Memory management lives in one place, so you’re less likely to blow up the context window by accident.
* You can swap in more advanced logic later: vector store, DB, semantic retrieval.
* It’s modular and testable.
* It‚Äôs modular and testable.




In [8]:

# ============================================
# ✅ AgentMemory helper class
# ============================================

class AgentMemory:
    def __init__(self, system_prompt: str):
        self.system_message = {"role": "system", "content": system_prompt}
        self.history = [self.system_message]

    def add_user_message(self, content: str):
        self.history.append({"role": "user", "content": content})

    def add_assistant_message(self, content: str):
        self.history.append({"role": "assistant", "content": content})

    def get_history(self):
        return self.history

    def summarize_old_messages(self):
        """
        Summarize older messages into a compact chunk
        and keep the last few messages verbatim.
        """
        if len(self.history) <= 6:
            print("ℹ️ Nothing to summarize yet.")
            return

        # Keep the last 4 messages verbatim
        recent_messages = self.history[-4:]

        # Summarize the older part: everything after the system prompt,
        # including any earlier summary message
        old_messages = self.history[1:-4]

        summary_prompt = [
            {"role": "system", "content": "You are a summarization assistant. Summarize the following conversation."},
            {"role": "user", "content": "\n\n".join([f"{m['role']}: {m['content']}" for m in old_messages])}
        ]

        summary = generate_response(summary_prompt)

        # Replace the old messages with a single summary message
        summarized_memory = {"role": "system", "content": f"Conversation summary: {summary}"}

        # Keep the original system prompt first, then the summary, then the recent messages
        self.history = [self.system_message, summarized_memory] + recent_messages
        print("✅ Summarized old messages!")

    def prune_oldest(self, keep_last_n: int = 6):
        """
        Drop the oldest messages, keep system prompt + last N messages.
        """
        if len(self.history) <= keep_last_n + 1:
            print("ℹ️ Nothing to prune.")
            return

        self.history = [self.system_message] + self.history[-keep_last_n:]
        print(f"✅ Pruned to keep last {keep_last_n} messages.")

# ============================================
# ✅ Example usage loop
# ============================================

agent_memory = AgentMemory(system_prompt="You are an expert Python engineer.")

while True:
    user_input = input("\n🧑‍💻 You: ")
    if user_input.lower() in ["exit", "quit"]:
        print("👋 Ending the conversation.")
        break

    agent_memory.add_user_message(user_input)

    assistant_reply = generate_response(agent_memory.get_history())
    print("\n🤖 Assistant:\n", assistant_reply)

    agent_memory.add_assistant_message(assistant_reply)

    # Example: once the history grows past 10 messages, summarize (or prune)
    if len(agent_memory.get_history()) > 10:
        agent_memory.summarize_old_messages()



🧑‍💻 You: What is the difference between class and method in python programming?

🤖 Assistant:
 In Python programming, classes and methods are both fundamental concepts associated with object-oriented programming (OOP), but they serve different purposes. Here’s a breakdown of the differences:

### Class

1. **Definition**: A class is a blueprint for creating objects. It defines the properties (attributes) and behaviors (methods) that the objects created from the class will have.

2. **Purpose**: Classes are used to bundle data and functionality together. They help in organizing code by grouping related properties and methods.

3. **Instantiation**: You create an instance (or object) of a class to use its functionality. Each object can have its own state (values of attributes).

4. **Example**:
    ```python
    class Dog:
        def __init__(self, name):
            self.name = name
        
        def bark(self):
            return "Woof!"
    
    my_dog = Dog("Rex")
 