<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/088_Programmatic_Prompting%2BMemory_Management.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



# 🛠️ Sending Prompts & Managing Memory

## 🚀 From Manual Prompts to Automated Agents

**Programmatically sending prompts** is how we move from:

> *“A human types a prompt and reacts to the LLM’s response”*

…to:

> *“An agent automatically sends prompts and acts on the results.”*

To start building agents, we need two **core capabilities**:

### 1️⃣ Programmatic Prompting

Automating the **prompt → response** cycle that humans do manually.
This forms the foundation of the **Agent Loop**.

### 2️⃣ Memory Management

Controlling what information *persists* between iterations (e.g., API calls + results) so the agent maintains **context** while making decisions.

---

## 💬 Sending Prompts Programmatically & Managing Memory

### 🏷 System Message — *The Most Important Part*

* Tells the model **how to behave**.
* Sets the **ground rules** for the conversation.
* Models are typically trained to **give it priority** over user messages.
* You “**program**” the agent through this message.

### 👤 User Message — *The Question*

* Asks the model for a specific answer or action.

**Example Prompt Structure**:

```json
[
  {"role": "system", "content": "You are an expert software engineer..."},
  {"role": "user", "content": "Write a function to swap keys and values in a dict."}
]
```
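The same structure can be built in Python before it is sent. This is a minimal sketch that assumes only the three roles used in this lecture; `make_prompt` and `validate_messages` are hypothetical helper names, not part of any library.

```python
import json

# Assumption: only the roles covered in this lecture are allowed.
ALLOWED_ROLES = {"system", "user", "assistant"}

def make_prompt(system_text, user_text):
    """Build a two-message prompt: system rules first, then the user's request."""
    return [
        {"role": "system", "content": system_text},
        {"role": "user", "content": user_text},
    ]

def validate_messages(messages):
    """Raise ValueError if a message has an unknown role or non-string content."""
    for i, m in enumerate(messages):
        if m.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"message {i}: unexpected role {m.get('role')!r}")
        if not isinstance(m.get("content"), str):
            raise ValueError(f"message {i}: content must be a string")
    return messages

prompt = validate_messages(make_prompt(
    "You are an expert software engineer...",
    "Write a function to swap keys and values in a dict.",
))
print(json.dumps(prompt, indent=2))  # same shape as the JSON above
```

Checking the list before sending it is optional, but it catches typos in role names early.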

---

## 📌 Why This Matters

LLMs are **stateless**. Each API call is independent, so the model forgets everything between turns unless you send it again.

To make **multi-turn conversations** work, you must *manually* manage context.

---

## 🧠 Key Takeaways

> ### ❌ No Inherent Memory

The LLM **has no knowledge** of past interactions unless you **include** them in the current prompt.

> ### 📜 Provide Full Context

To simulate continuity, include **all relevant messages** (both `user` and `assistant`) in your `messages` parameter.

> ### 🤝 Role of Assistant Messages

Adding **previous responses** as `assistant` messages lets the model:

* Maintain a **coherent conversation**
* Build on earlier exchanges
* “Remember” what actions (like API calls) it took in the past

> ### 🎯 Memory Management

You control what the model *remembers* by deciding which messages to include in the conversation.
Sometimes **forgetting** is powerful — for example:

* Breaking patterns of poor responses
* Resetting an agent’s thinking mid-loop
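One way to forget on purpose is to rebuild the prompt from only the parts you still trust. A minimal sketch, assuming the list-of-dicts format above; `reset_context` is a hypothetical helper and the sample history is made up.

```python
def reset_context(messages):
    """Keep system messages and the most recent user message; drop the rest.

    Returns a new list, so the original history is left untouched.
    """
    kept = [m for m in messages if m["role"] == "system"]
    for m in reversed(messages):
        if m["role"] == "user":
            kept.append(m)
            break
    return kept

history = [
    {"role": "system", "content": "You are an expert software engineer."},
    {"role": "user", "content": "Write a function to swap keys and values."},
    {"role": "assistant", "content": "(a poor answer)"},
    {"role": "user", "content": "Write a documented function to swap keys and values in a dict."},
]
fresh = reset_context(history)
print([m["role"] for m in fresh])  # ['system', 'user']
```

This only works well when the latest user message is self-contained, because the earlier turns it might refer to are gone.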





# 1) Programmatic prompting (vs. hand-typing)

**Concept:** Agents call the model in code with a `messages` list (role + content). That’s the foundation of an agent loop.
**In your code:**

* `messages = [{"role": "system", ...}, {"role": "user", ...}]` is the programmatic prompt you’d send.
* `step(...)` is the “loop step”: choose visible context → get reply → append it.

Why it matters: this is how you go from “chatting” to **automating** the chat.
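The loop step can be repeated until some stop condition holds. Here is a minimal sketch with a stand-in model function so it runs offline; `fake_llm`, `run_loop`, and the `DONE` stop marker are all hypothetical and only illustrate the control flow.

```python
def fake_llm(visible_messages):
    """Stand-in for a real model call: replies based on how many assistant turns it sees."""
    seen = sum(1 for m in visible_messages if m["role"] == "assistant")
    return "DONE" if seen >= 2 else f"working on step {seen + 1}"

def run_loop(llm, messages, max_steps=5):
    """Call llm repeatedly, appending each reply, until it says DONE or steps run out."""
    for _ in range(max_steps):
        reply = llm(messages)
        messages.append({"role": "assistant", "content": reply})
        if reply == "DONE":
            break
    return messages

convo = [
    {"role": "system", "content": "You are a helpful agent."},
    {"role": "user", "content": "Complete the task in steps."},
]
run_loop(fake_llm, convo)
print([m["content"] for m in convo if m["role"] == "assistant"])
# ['working on step 1', 'working on step 2', 'DONE']
```

The `max_steps` cap is a safety net: a loop driven by model output should always have a hard limit.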

# 2) Roles (system, user, assistant)

**Concept:** The **system** message sets behavior; **user** asks; **assistant** replies. Models weight the system message heavily.
**In your code:**

* `system_instructions = "..."; messages = [{"role":"system", "content": system_instructions}]`
* You append user messages via `add_message("user", ...)`.
* Assistant outputs are appended inside `step(...)` with `add_message("assistant", reply)`.

Takeaway: You “program” the agent through the **system** message.

# 3) LLMs are stateless

**Concept:** The model doesn’t remember anything between calls unless you include it again in `messages`.
**In your code:**

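* A window that excludes the earlier assistant turn (in this demo, `step(memory_size=1)`) makes the mock reply “I need more context.” With `memory_size=2` the window is still `[assistant, user]`, so the prior code stays visible.
* That’s the “missing context” demo from your notes.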

# 4) Memory = what you include in `messages`

**Concept:** “Memory” is **explicit**: you keep past turns (including prior assistant outputs) in the next request. That’s how continuity and multi-step reasoning work.
**In your code:**

* `window(messages, max_messages)` is your memory policy.
* A window too small to include the earlier assistant code (here, `memory_size=1`) ⇒ can’t “update the function.”
* Larger window (e.g., `memory_size=10`) ⇒ includes prior assistant code ⇒ the mock can add docs.

Key takeaways match your notes: no inherent memory; provide full context; assistant messages are crucial for continuity.

# 5) Memory management (what to keep vs. forget)

**Concept:** You control recall by choosing which turns to pass along. Sometimes **forgetting** is useful to break bad patterns.
**In your code:**

* `window(...)` is a sliding-window policy by **count**.
* Later you can swap this for token-budgeting or summarization without changing the rest.
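As a sketch of what a token-budget policy might look like, the function below keeps the most recent messages that fit under a budget. The word-count estimate is an assumption made so the example runs without extra libraries; a real tokenizer would give different counts. `estimate_tokens` and `window_by_budget` are hypothetical names.

```python
def estimate_tokens(text):
    """Very rough estimate (assumption): one token per whitespace-separated word."""
    return len(text.split())

def window_by_budget(msgs, max_tokens, count=estimate_tokens):
    """Return the most recent messages whose combined estimate fits in max_tokens."""
    kept, used = [], 0
    for m in reversed(msgs):          # walk from newest to oldest
        cost = count(m["content"])
        if used + cost > max_tokens:  # stop at the first message that does not fit
            break
        kept.append(m)
        used += cost
    kept.reverse()                    # restore chronological order
    return kept

history = [
    {"role": "user", "content": "one two three four"},   # 4 words
    {"role": "assistant", "content": "five six"},         # 2 words
    {"role": "user", "content": "seven eight nine"},      # 3 words
]
print(len(window_by_budget(history, max_tokens=5)))  # 2
```

It returns a list of messages just like `window(...)`, so it could be dropped into `step(...)` in its place.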

# 6) GAIL framing (preview for later)

**Concept:** A practical way to think about agents: **G**oals, **A**ctions (tools), **I**nformation (memory/context), **L**anguage (prompts/instructions). Your Lecture 1 focuses on **I** and **L**; we’ll layer **G** and **A** as you add tools/APIs.

---

## Quick “concept → code” checklist

* **Programmatic prompt:** your `messages` list.
* **System grounding:** the first message, `messages[0]`.
* **Statelessness:** second query fails when earlier context isn’t included.
* **Memory via assistant turns:** include previous assistant output to enable edits.
* **Memory policy:** your `window(...)` function.



In [2]:
# --- Setup: roles and a starting system message ---
system_instructions = "You are an expert software engineer who explains simply."
messages = [{"role": "system", "content": system_instructions}]

def add_message(role, content):
    messages.append({"role": role, "content": content})

def window(msgs, max_messages=None):
    """Return the last N messages (or all if None)."""
    if max_messages is None:
        return msgs
    return msgs[-max_messages:]

# --- Mock LLM (no network): reacts to what's in the visible window ---
import textwrap

def mock_llm(visible_messages):
    # find latest user request
    last_user = ""
    for m in reversed(visible_messages):
        if m["role"] == "user":
            last_user = m["content"]
            break
    # find the most recent assistant message, if any
    prev_assistant = ""
    for m in reversed(visible_messages):
        if m["role"] == "assistant":
            prev_assistant = m["content"]
            break

    # very naive branching
    if "Update the function" in last_user and "def swap_keys_values" in prev_assistant:
        return textwrap.dedent("""\
            \"\"\"Swap keys and values in a dictionary.

            Args:
                d (dict): Input dictionary with hashable keys and values.
            Returns:
                dict: A new dictionary with keys/values swapped.
            \"\"\"
            def swap_keys_values(d):
                return {v: k for k, v in d.items()}
        """)
    if "swap the keys and values" in last_user:
        return "def swap_keys_values(d):\n    return {v: k for k, v in d.items()}"
    return "I need more context."

def step(memory_size=None):
    """Build visible context -> get reply -> store it -> return it."""
    visible = window(messages, memory_size)
    reply = mock_llm(visible)
    add_message("assistant", reply)
    return reply




## What each piece does (plain-English)

* `messages`: the whole conversation as a list of `{role, content}` dicts.
* `add_message(role, content)`: push a new turn.
* `window(msgs, max_messages)`: picks the *last N* messages. This is your **memory policy**.
* `mock_llm(visible_messages)`: a dumb stand-in for an LLM that:

  * looks at the **latest user** request,
  * optionally looks at the **previous assistant** code,
  * returns either the base function, the documented version, or “I need more context.”
* `step(memory_size)`: the “agent loop”—choose visible context ➜ get a reply ➜ store it.

This mirrors the lecture’s core ideas: LLMs are stateless, so **whatever’s in `visible_messages` is what the model “remembers.”** Small `memory_size` can make the follow-up fail; a bigger one makes it succeed.
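To see exactly what a given `memory_size` exposes, it helps to print the roles in the window. This small sketch uses the same slicing rule as `window(...)`; the sample history is made up for illustration.

```python
def window(msgs, max_messages=None):
    """Same slicing rule as above: the last N messages, or all if None."""
    return msgs if max_messages is None else msgs[-max_messages:]

history = [
    {"role": "system", "content": "You are an expert software engineer."},
    {"role": "user", "content": "Write a function to swap the keys and values in a dictionary."},
    {"role": "assistant", "content": "def swap_keys_values(d): ..."},
    {"role": "user", "content": "Update the function to include documentation."},
]

for n in (1, 2, None):
    print(n, [m["role"] for m in window(history, n)])
# 1 ['user']
# 2 ['assistant', 'user']
# None ['system', 'user', 'assistant', 'user']
```

With this four-message history, `memory_size=2` still includes the previous assistant turn; only `memory_size=1` hides it.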



In [4]:
# Reset conversation
messages[:] = [{"role": "system", "content": system_instructions}]

# 1) Ask for the function
add_message("user", "Write a function to swap the keys and values in a dictionary.")
print("Initial response (small memory):\n", step(memory_size=2), "\n")

Initial response (small memory):
 def swap_keys_values(d):
    return {v: k for k, v in d.items()} 



In [5]:
# 2) Ask to update with a small window. With memory_size=2 the window is
#    [assistant, user], so the prior code is still visible; memory_size=1 would drop it.
add_message("user", "Update the function to include documentation.")
print("Update with small memory:\n", step(memory_size=2), "\n")

Update with small memory:
 """Swap keys and values in a dictionary.

Args:
    d (dict): Input dictionary with hashable keys and values.
Returns:
    dict: A new dictionary with keys/values swapped.
"""
def swap_keys_values(d):
    return {v: k for k, v in d.items()}
 



In [6]:
# 3) Try again with larger memory (keeps prior assistant output visible)
add_message("user", "Update the function to include documentation.")
print("Update with larger memory:\n", step(memory_size=10))

Update with larger memory:
 """Swap keys and values in a dictionary.

Args:
    d (dict): Input dictionary with hashable keys and values.
Returns:
    dict: A new dictionary with keys/values swapped.
"""
def swap_keys_values(d):
    return {v: k for k, v in d.items()}



# Chat History & Memory

## 1. **Chat history**

That’s your full record of *everything* that’s happened in the conversation so far.
In our code, it’s the `messages` list:

```python
messages = [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."},
    ...
]
```

Every time you call:

```python
add_message(role, content)
```

…you’re **adding to the chat history**.
The history grows forever unless you clear or reset it.

---

## 2. **Memory** (in the LLM sense)

LLMs are *stateless*, so they don’t actually “remember” the history you’ve stored.
The **only “memory” they have** is the **messages you include in the next API call**.

That’s why we have:

```python
def window(msgs, max_messages=None):
    if max_messages is None:
        return msgs  # send full history
    return msgs[-max_messages:]  # send only the last N messages
```

* `max_messages=None` ⇒ you send the *full* chat history (perfect recall, as long as it fits in the model’s context window).
* Small `max_messages` ⇒ you send only a **slice** of the chat history — that’s your *memory policy*.
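One side effect of plain slicing is that the system message falls out of the window as soon as the history is longer than N. A possible variant, sketched below, pins system messages and slices only the rest; `window_keep_system` is a hypothetical name, not something from the lecture code.

```python
def window_keep_system(msgs, max_messages=None):
    """Like window(), but system messages are always kept ahead of the recent slice."""
    if max_messages is None:
        return list(msgs)
    system = [m for m in msgs if m["role"] == "system"]
    rest = [m for m in msgs if m["role"] != "system"]
    # Guard against rest[-0:], which would return everything instead of nothing.
    recent = rest[-max_messages:] if max_messages > 0 else []
    return system + recent

history = [
    {"role": "system", "content": "You are an expert software engineer."},
    {"role": "user", "content": "Write a function to swap keys and values."},
    {"role": "assistant", "content": "def swap_keys_values(d): ..."},
    {"role": "user", "content": "Update the function to include documentation."},
]
print([m["role"] for m in window_keep_system(history, 1)])  # ['system', 'user']
```

Here `max_messages` counts only non-system messages, which is a design choice rather than a rule.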

---

## 3. **Key difference**

* **Chat history** = everything that has happened (local to your program).
* **Memory** = what you *choose* to give the LLM this time.

If you drop parts of the history from the `window(...)` result, the LLM has no awareness of them — it’s as if they never happened.
That’s how you can simulate forgetting or limit context for performance/relevance reasons.

---

A nice mental model:

```
chat history   = full notebook of notes
memory         = what you flip open and show the model right now
```

---

Think of it like this:

* **Chat history** = your *entire transcript* stored in your program (perfect recall from your side).
* **Memory** = the *subset of that transcript* you hand over to the model right now.

If you pass the **entire chat history** to the LLM, then *memory = history*.
If you pass only the **last N turns** (or a summary), then *memory ⊂ history*.

This distinction matters because:

* It’s the **memory** that shapes the model’s current response.
* You can keep **history** for logging, analysis, or reloading later, even if you don’t send it every time.
* Controlling **memory** lets you decide what the model “knows” at any given turn — which is key for performance, cost, and avoiding drift.
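Keeping history for logging or reloading can be as simple as writing the list to disk. A minimal sketch using JSON and a temporary directory; `save_history`, `load_history`, and the file name are hypothetical.

```python
import json
import os
import tempfile

def save_history(messages, path):
    """Write the full chat history to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(messages, f, ensure_ascii=False, indent=2)

def load_history(path):
    """Read a chat history back from a JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

history = [
    {"role": "system", "content": "You are an expert software engineer."},
    {"role": "user", "content": "Write a function to swap keys and values."},
    {"role": "assistant", "content": "def swap_keys_values(d): ..."},
]
path = os.path.join(tempfile.mkdtemp(), "history.json")
save_history(history, path)
restored = load_history(path)
print(restored == history)  # True
```

The saved file holds the full history; what you send to the model on the next call is still a separate decision.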

---






In [None]:
# --- CHAT HISTORY ---
# This list holds the *entire* conversation so far.
messages = [{"role": "system", "content": system_instructions}]

def add_message(role, content):
    # Adds a message to the *full* chat history.
    messages.append({"role": role, "content": content})
# --- END CHAT HISTORY ---


# --- MEMORY ---
def window(msgs, max_messages=None):
    """Return the last N messages (or all if None)."""
    # Creates the *memory window* — the slice of history
    # you actually give to the model right now.
    if max_messages is None:
        return msgs  # full recall (memory == history)
    return msgs[-max_messages:]  # limited recall (memory ⊂ history)
# --- END MEMORY ---


# Full Agent Code

In [8]:
# --- 1. Install and import ---
!pip install --quiet python-dotenv openai

import os
from dotenv import load_dotenv
from openai import OpenAI

# --- 2. Load API key ---
load_dotenv('/content/API_KEYS.env')
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# --- 3. Start with a system message (chat history) ---
system_instructions = "You are an expert software engineer who explains simply."
messages = [{"role": "system", "content": system_instructions}]

def add_message(role, content):
    """Append a message to the full chat history."""
    messages.append({"role": role, "content": content})

def window(msgs, max_messages=None):
    """Pick which messages the model sees (memory)."""
    if max_messages is None:
        return msgs
    return msgs[-max_messages:]

def step(memory_size=None):
    """Send memory window to model, get reply, add to history."""
    visible = window(messages, memory_size)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=visible
    )
    reply = response.choices[0].message.content
    add_message("assistant", reply)
    return reply

# --- 4. Demo: small vs. large memory window ---

# Reset chat history
messages[:] = [{"role": "system", "content": system_instructions}]

# Step 1: Ask for the base function
add_message("user", "Write a Python function to swap the keys and values in a dictionary.")
print("\n--- Initial response (small memory) ---")
print(step(memory_size=2))


--- Initial response (small memory) ---
Certainly! You can create a function that swaps the keys and values in a dictionary using a simple dictionary comprehension. Here's a function to do that:

```python
def swap_dict(d):
    # Create a new dictionary by swapping keys and values
    swapped = {value: key for key, value in d.items()}
    return swapped

# Example usage:
original_dict = {'a': 1, 'b': 2, 'c': 3}
swapped_dict = swap_dict(original_dict)
print(swapped_dict)  # Output: {1: 'a', 2: 'b', 3: 'c'}
```

### Explanation:
- The function `swap_dict` takes a dictionary `d` as input.
- It uses a dictionary comprehension to create a new dictionary where each key is now the value from the original dictionary, and each value is now the key.
- Finally, it returns the new dictionary with swapped keys and values.

### Note:
- If there are duplicate values in the original dictionary, this method will lose some data because dictionary keys must be unique. The last value for each original ke

In [9]:
# Step 2: Ask to update with a small window. memory_size=2 sends only [assistant, user]:
# the prior code is still visible, but the system message is not.
add_message("user", "Update the function to include a clear docstring.")
print("\n--- Update with small memory ---")
print(step(memory_size=2))


--- Update with small memory ---
Certainly! Here is the updated function with a clear docstring that describes its purpose, parameters, return value, and any important notes:

```python
def swap_dict(d):
    """
    Swap the keys and values of a given dictionary.

    This function takes a dictionary as input and returns a new dictionary 
    where the keys and values are swapped. If there are duplicate values in 
    the original dictionary, the last corresponding key will overwrite 
    previous keys in the swapped dictionary.

    Parameters:
    d (dict): A dictionary with keys and values to be swapped.

    Returns:
    dict: A new dictionary with the keys and values swapped.

    Example:
    >>> original_dict = {'a': 1, 'b': 2, 'c': 3}
    >>> swapped_dict = swap_dict(original_dict)
    >>> print(swapped_dict)
    {1: 'a', 2: 'b', 3: 'c'}
    """
    # Create a new dictionary by swapping keys and values
    swapped = {value: key for key, value in d.items()}
    return swapped
```



In [10]:
# Step 3: Try again with larger memory (includes prior assistant output)
add_message("user", "Update the function to include a clear docstring.")
print("\n--- Update with larger memory ---")
print(step(memory_size=10))


--- Update with larger memory ---
It seems that I misunderstood your request initially, but I've already provided a clear docstring in the previous response. However, I'll go ahead and reaffirm that by structuring the code properly with additional clarity. Here’s the updated version with a refined docstring:

```python
def swap_dict(d):
    """
    Swap the keys and values of a given dictionary.

    This function takes a dictionary as input and returns a new dictionary 
    where the keys and values are swapped. If there are duplicate values in 
    the original dictionary, the last corresponding key will overwrite 
    earlier keys in the swapped dictionary.

    Parameters:
    d (dict): A dictionary with keys and values to be swapped.

    Returns:
    dict: A new dictionary with the keys and values swapped.

    Example:
    >>> original_dict = {'a': 1, 'b': 2, 'c': 3}
    >>> swapped_dict = swap_dict(original_dict)
    >>> print(swapped_dict)
    {1: 'a', 2: 'b', 3: 'c'}
    """
```

# Add External Data to Memory

This is where *memory* isn’t just the running chat history, but also **external knowledge** you decide to include in the next prompt.

---

## 🧩 How it fits the “history vs. memory” idea

Right now we have:

```
Chat history → memory window → model
```

We can expand it to:

```
Chat history
+ External sources (PDF text, URL data, DB results, etc.)
→ merged into memory window → model
```

---

## 🛠 How to do it in our Lecture-1 style

We’ll keep our **full chat history** the same, but when we build the `visible` messages list, we can also inject extra “assistant” or “system” messages containing external data.

Here’s a minimal example:

```python
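def add_external_memory(visible, content, role="system"):
    """
    Return a copy of the memory window with external info appended.
    This could be PDF content, URL scrape, database query, etc.
    Copying matters: window(...) returns the history list itself when
    max_messages is None, and appending to it would pollute the history.
    """
    return list(visible) + [{"role": role, "content": content}]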

# Example: simulate loading from PDF or URL
external_text = """
This is content from an external document:
The swap_keys_values function should also handle empty dictionaries.
"""

# Step with extra memory
def step_with_external(memory_size=None, external=None):
    visible = window(messages, memory_size)
    if external:
        visible = add_external_memory(visible, external, role="system")
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=visible
    )
    reply = response.choices[0].message.content
    add_message("assistant", reply)
    return reply

# Usage:
messages[:] = [{"role": "system", "content": system_instructions}]
add_message("user", "Write a function to swap keys and values in a dictionary.")
print(step_with_external(memory_size=10))

add_message("user", "Update the function to handle empty dictionaries.")
print(step_with_external(memory_size=10, external=external_text))
```

---

## 💡 Key points

* **Chat history**: still tracks the conversation.
* **Memory window**: now includes both the history slice *and* extra external data.
* External info can come from:

  * PDF extraction (`PyPDF2`, `pdfplumber`, etc.)
  * Web scraping or API calls
  * Database lookups
  * Pre-computed summaries
* You can label it as a `"system"` message (instructions) or `"assistant"` message (past outputs) depending on how you want the model to treat it.
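External sources are usually bigger than what you want in a prompt, so some selection step is needed. The sketch below picks snippets by crude keyword overlap and adds them as one system message. The scoring rule, the `Reference material:` label, and the names `score`, `pick_snippets`, and `with_external` are all assumptions for illustration; real retrieval would be more careful (punctuation stays attached to words here, for instance).

```python
def score(query, snippet):
    """Count distinct query words that also appear in the snippet (case-insensitive)."""
    return len(set(query.lower().split()) & set(snippet.lower().split()))

def pick_snippets(query, snippets, top_k=1):
    """Return up to top_k snippets with at least one overlapping word, best first."""
    ranked = sorted(snippets, key=lambda s: score(query, s), reverse=True)
    return [s for s in ranked[:top_k] if score(query, s) > 0]

def with_external(visible, snippets):
    """Return a new list: the visible messages plus one system message with snippets."""
    if not snippets:
        return list(visible)
    note = "Reference material:\n" + "\n".join(f"- {s}" for s in snippets)
    return list(visible) + [{"role": "system", "content": note}]

docs = [
    "The swap function should handle empty dictionaries.",
    "Deployment happens every Friday afternoon.",
]
visible = [{"role": "user", "content": "Update the function to handle empty dictionaries."}]
chosen = pick_snippets(visible[-1]["content"], docs)
memory = with_external(visible, chosen)
print(memory[-1]["content"])
```

Only the first document shares words with the request, so it is the only one injected; the unrelated one never reaches the model.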

---




## 🔑 Wrap-up Points for Lecture 1

1. **Memory = control**

   * *Chat history* is just storage in your program.
   * *Memory* is what you choose to show the model.
   * This choice directly controls model behavior.

2. **Memory can come from anywhere**

   * History of the conversation.
   * External docs (PDFs, URLs, databases).
   * Summaries or structured facts you prepare ahead of time.
   * The model has no clue *where* the info came from — it just sees it in `messages`.

3. **Assistant messages are gold**

   * If your follow-up depends on what the model previously generated (e.g., code), you must include that output in the memory window.
   * Dropping them makes the model “forget” its own work.

4. **Forgetting is sometimes good**

   * You can intentionally remove context to break bad patterns or reduce irrelevant info.
   * Good agents know when to prune memory.

5. **System message = “personality” + rules**

   * Use it to *program* the model’s role, constraints, and goals.
   * Models heavily weight system instructions.

6. **Costs & limits matter**

   * More messages → more tokens → higher cost.
   * That’s why smart memory policies (sliding windows, summarization, retrieval) are essential.
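As one example of such a policy, older turns can be collapsed into a single summary message while recent turns stay verbatim. This is a sketch under stated assumptions: `compact` and `naive_summarize` are hypothetical names, and the summarizer is a placeholder that just truncates each turn. In a real agent the summary might come from another model call.

```python
def naive_summarize(old_msgs):
    """Placeholder summarizer (assumption): first 40 characters of each turn."""
    return " | ".join(f"{m['role']}: {m['content'][:40]}" for m in old_msgs)

def compact(msgs, keep_last, summarize=naive_summarize):
    """Replace all but the last keep_last non-system messages with one summary."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    system = [m for m in msgs if m["role"] == "system"]
    rest = [m for m in msgs if m["role"] != "system"]
    if len(rest) <= keep_last:
        return list(msgs)  # nothing old enough to summarize
    old, recent = rest[:-keep_last], rest[-keep_last:]
    summary = {"role": "system",
               "content": "Summary of earlier turns: " + summarize(old)}
    return system + [summary] + recent

history = [
    {"role": "system", "content": "You are an expert software engineer."},
    {"role": "user", "content": "Write a function to swap keys and values."},
    {"role": "assistant", "content": "def swap_keys_values(d): ..."},
    {"role": "user", "content": "Add a docstring."},
    {"role": "assistant", "content": "def swap_keys_values(d): (documented)"},
]
compacted = compact(history, keep_last=2)
print([m["role"] for m in compacted])  # ['system', 'system', 'user', 'assistant']
```

The trade-off is that the summary costs fewer tokens than the turns it replaces, but any detail the summarizer drops is gone from the model's view.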


