<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/070_Selective_Memory_Sharing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook introduces a **powerful memory filtering pattern** that improves multi-agent coordination:

### **Selective Memory Sharing: Focusing Agent Attention**

---

# 🧠 Selective Memory Sharing: Using LLMs to Focus Agent Attention

In complex multi-agent systems, agents often accumulate large memory histories. But not every downstream task needs the *entire* context — just the relevant parts.

Instead of relying on rigid rule-based filters (e.g. keyword matches or timestamps), we can use the **LLM itself** to intelligently choose which memories to share.

## 🎯 The Idea

We delegate the memory selection process to the LLM so it can:

* ✂️ **Trim irrelevant content**
* 🧠 **Understand contextual nuance**
* 🧾 **Justify its reasoning**
* 🧼 **Keep downstream agents focused**

This allows one agent to **call another agent with a curated set of relevant memories**, rather than dumping the whole history.

---

## 🔧 How It Works

The `call_agent_with_selected_context` tool enables this behavior. Here's the workflow:

1. **Assign memory IDs**
   Every memory item gets a unique ID (`mem_0`, `mem_1`, ...).

2. **Present all memories to the LLM**
   The LLM sees every memory, formatted clearly with IDs.

3. **Ask the LLM to select the most relevant**
   Using structured JSON and a thoughtful schema, the LLM returns:

   * `selected_memories`: A list of relevant memory IDs
   * `reasoning`: Why it selected those items

4. **Filter memory for the downstream agent**
   Only the selected memory items are passed to the called agent.

5. **Preserve transparency**
   The original agent logs the LLM’s selection reasoning for future traceability.
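
The five steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the notebook's tool: `share_selected` and `fake_select` are hypothetical names, memories are plain dicts in a list, and `fake_select` is a keyword stub standing in for the real LLM call so the example runs offline.

```python
def tag_memories(items):
    """Step 1: give every memory a temporary ID (mem_0, mem_1, ...)."""
    return [{**item, "memory_id": f"mem_{i}"} for i, item in enumerate(items)]


def share_selected(items, task, select):
    """Steps 2-5: list memories, ask `select` to choose, filter, and log."""
    tagged = tag_memories(items)
    listing = "\n".join(f"Memory {m['memory_id']}: {m['content']}" for m in tagged)
    selection = select(task, listing)  # in the real tool this is an LLM call
    chosen = set(selection["selected_memories"])
    # Drop the temporary ID before handing memories to the next agent
    filtered = [
        {k: v for k, v in m.items() if k != "memory_id"}
        for m in tagged
        if m["memory_id"] in chosen
    ]
    log_entry = {
        "type": "system",
        "content": f"Memory selection reasoning: {selection['reasoning']}",
    }
    return filtered, log_entry


def fake_select(task, listing):
    """Hypothetical stand-in for the LLM: keep lines that mention 'cost'."""
    ids = [
        line.split(":", 1)[0].removeprefix("Memory ")
        for line in listing.splitlines()
        if "cost" in line.lower()
    ]
    return {"selected_memories": ids, "reasoning": "Kept entries that mention cost."}


history = [
    {"type": "user", "content": "We need to build a new reporting dashboard"},
    {"type": "assistant", "content": "Initial cost estimate: $50,000"},
    {"type": "system", "content": "Project deadline updated to Q3"},
]
filtered, log_entry = share_selected(history, "Review the budget", fake_select)
print(filtered)
print(log_entry["content"])
```

Passing the selector in as a function keeps the filtering logic testable without a model; swapping `fake_select` for a real structured-output call should not change the rest of the flow.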

---

## 🧪 Example

Here’s a quick snapshot of what a memory list might look like:

```python
memories = [
    {"type": "user", "content": "We need to build a new reporting dashboard"},
    {"type": "assistant", "content": "Initial cost estimate: $50,000"},
    {"type": "user", "content": "That seems high"},
    {"type": "assistant", "content": "Breakdown: $20k development, $15k design..."},
    {"type": "system", "content": "Project deadline updated to Q3"},
    {"type": "user", "content": "Can we reduce the cost?"}
]
```

The LLM might return this:

```json
{
  "selected_memories": ["mem_1", "mem_3", "mem_5"],
  "reasoning": "Selected memories containing cost information and the request for cost reduction, excluding project timeline and general discussion as they're not directly relevant to the budget review task."
}
```

So the next agent (e.g., a Budget Specialist) receives **only** the relevant context: a lightweight, focused memory that helps it stay on task and lowers the risk of it acting on unrelated details.
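
To make the mapping concrete, here is a small sketch that applies the example selection to the example memory list. It assumes IDs follow list order (`mem_0` is the first entry), as in the tagging step; the `by_id` lookup and the shortened `reasoning` text are illustrative only.

```python
memories = [
    {"type": "user", "content": "We need to build a new reporting dashboard"},
    {"type": "assistant", "content": "Initial cost estimate: $50,000"},
    {"type": "user", "content": "That seems high"},
    {"type": "assistant", "content": "Breakdown: $20k development, $15k design..."},
    {"type": "system", "content": "Project deadline updated to Q3"},
    {"type": "user", "content": "Can we reduce the cost?"},
]

selection = {
    "selected_memories": ["mem_1", "mem_3", "mem_5"],
    "reasoning": "Cost information and the request for cost reduction.",
}

# Map each ID to its memory, then keep only the IDs the LLM selected
by_id = {f"mem_{i}": m for i, m in enumerate(memories)}
shared = [by_id[mid] for mid in selection["selected_memories"] if mid in by_id]

for m in shared:
    print(f"{m['type']}: {m['content']}")
```

The Budget Specialist would see three of the six entries: the estimate, the breakdown, and the reduction request.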

---

## 💡 Why It Matters

Compared to hard-coded filters or full memory handoffs, this approach:

✅ Uses semantic understanding
✅ Justifies memory inclusion
✅ Adapts dynamically to any task
✅ Keeps the shared context lean and task-relevant

This is especially useful in **budgeting, legal review, technical triage, compliance**, or any scenario where context size matters.

---

## 🧭 Recap of Memory Sharing Patterns

| Pattern               | Use Case                                                       |
| --------------------- | -------------------------------------------------------------- |
| **Message Passing**   | Fast, focused interactions where the caller doesn’t need to know how the answer was derived. |
| **Memory Reflection** | Learn from another agent’s reasoning process.                  |
| **Memory Handoff**    | Fully transfer context for task continuation.                  |
| **Selective Sharing** | Curate relevant memory to reduce noise and enhance focus.      |






### ✅ 1. **Cost Efficiency** — *"Only pay for what you need"*

* **Fewer input tokens**: By passing only the most relevant memory entries, you shrink the downstream agent’s prompt, which matters most for long-running agents and multi-step workflows. The selection call itself still reads the full memory once, so the savings come from the delegated agent’s calls, not from the selection step.
* **Often fewer output tokens**: With less unrelated context in the prompt, the LLM is less likely to produce bloated or tangential responses.
* **Lower inference cost overall**: Hosted models such as OpenAI’s GPT, Claude, and Mistral are billed by token count, so every unnecessary memory entry adds up over time.
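
A back-of-the-envelope comparison shows when selection pays off on input tokens. This is a rough sketch with made-up numbers: it assumes the downstream agent re-reads its memory on every LLM call, and it ignores the selection prompt's own overhead, output tokens, and memory growth during the run. `compare_token_cost` is a hypothetical helper.

```python
def compare_token_cost(full_tokens, filtered_tokens, downstream_calls):
    """Return (input tokens without selection, input tokens with selection)."""
    without_selection = full_tokens * downstream_calls
    # One selection call reads the full memory, then each downstream call
    # reads only the filtered memory.
    with_selection = full_tokens + filtered_tokens * downstream_calls
    return without_selection, with_selection


for calls in (1, 5):
    without, with_sel = compare_token_cost(1000, 300, calls)
    print(f"{calls} downstream call(s): {without} vs {with_sel} input tokens")
```

With these illustrative numbers, a single downstream call costs more with selection (1300 vs 1000), while five calls cost half as much (2500 vs 5000). The pattern helps most when the delegated agent runs several steps.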

---

### 🧠 2. **Cognitive Load Management** — *"Respect the LLM’s attention span"*

* **Focus = Better performance**: LLMs are probabilistic pattern matchers, so removing irrelevant context makes it more likely that they attend to the *actual task*.
* **Less noise = fewer hallucinations**: A cluttered memory history can cause the LLM to pick up on the wrong pattern or fixate on the wrong detail.
* **Gentle guidance = consistent output**: This pattern is the systems-level equivalent of giving the model a calm, focused prompt — like “Here’s what you should pay attention to.”

---

### 🤝 Bonus Benefit: *It’s Kind Design*

* LLMs tend to *reward cooperative behavior*: when you reduce ambiguity, clean up distractions, and give them what a capable collaborator would need, they usually deliver better results.
* Whether we call it **cooperative prompting**, **cognitive hygiene**, or **empathic system design**, the outcome is the same:

  **→ Cleaner inputs produce cleaner outputs.**




## 🧠 Selective Memory Sharing: Using LLM Understanding for Context Selection

Sometimes, an agent needs to share only the **most relevant parts** of its memory with another agent — not the entire memory log. Instead of relying on rigid, rule-based filtering, we can **harness the LLM’s own understanding** of the task to decide what matters most.

This creates a more flexible, scalable, and intelligent memory-sharing process.

---

### 🔍 How It Works

1. **📎 Memory ID Tagging**
   Every memory entry is assigned a **unique ID** to allow precise reference and selection.

2. **🧾 Full Memory Context**
   The agent sends the **entire memory** (with IDs) to the LLM, giving it full visibility over past interactions.

3. **🧠 Intelligent Filtering via LLM**
   The LLM evaluates the **task description** and selects only the **most relevant memories**, returning the list in a structured JSON format.

4. **💬 Justification Logging**
   The LLM’s **reasoning** for its selection is saved in the calling agent’s memory as a single explanation covering the whole set, enabling **transparency** and **traceability**.

---

### ✅ Why This Matters

* **Efficient Token Use**: Only what’s important gets passed along.
* **Improved Agent Communication**: Follow-up agents get just the right context.
* **Interpretability**: Human reviewers (or debugging devs!) can see *why* certain memories were chosen.




In [None]:
@register_tool(description="Delegate a task to another agent with selected context")
def call_agent_with_selected_context(action_context: ActionContext,
                                   agent_name: str,
                                   task: str) -> dict:
    """Call agent with LLM-selected relevant memories."""
    agent_registry = action_context.get_agent_registry()
    agent_run = agent_registry.get_agent(agent_name)

    # Get current memory and add IDs
    current_memory = action_context.get_memory()
    memory_with_ids = []
    for idx, item in enumerate(current_memory.items):
        memory_with_ids.append({
            **item,
            "memory_id": f"mem_{idx}"
        })

    # Create schema for memory selection
    selection_schema = {
        "type": "object",
        "properties": {
            "selected_memories": {
                "type": "array",
                "items": {
                    "type": "string",
                    "description": "ID of a memory to include"
                }
            },
            "reasoning": {
                "type": "string",
                "description": "Explanation of why these memories were selected"
            }
        },
        "required": ["selected_memories", "reasoning"]
    }

    # Format memories for LLM review
    memory_text = "\n".join([
        f"Memory {m['memory_id']}: {m['content']}"
        for m in memory_with_ids
    ])

    # Ask LLM to select relevant memories
    selection_prompt = f"""Review these memories and select the ones relevant for this task:

Task: {task}

Available Memories:
{memory_text}

Select memories that provide important context or information for this specific task.
Explain your selection process."""

    # Self-prompting magic to find the most relevant memories
    selection = prompt_llm_for_json(
        action_context=action_context,
        schema=selection_schema,
        prompt=selection_prompt
    )

    # Create filtered memory from selection
    filtered_memory = Memory()
    selected_ids = set(selection["selected_memories"])
    for item in memory_with_ids:
        if item["memory_id"] in selected_ids:
            # Remove the temporary memory_id before adding
            item_copy = item.copy()
            del item_copy["memory_id"]
            filtered_memory.add_memory(item_copy)

    # Run the agent with selected memories
    result_memory = agent_run(
        user_input=task,
        memory=filtered_memory
    )

    # Add results and selection reasoning to original memory
    current_memory.add_memory({
        "type": "system",
        "content": f"Memory selection reasoning: {selection['reasoning']}"
    })

    for memory_item in result_memory.items:
        current_memory.add_memory(memory_item)

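    return {
        # Guard against an empty result memory before reading the last item
        "result": (result_memory.items[-1].get("content", "No result")
                   if result_memory.items else "No result"),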
        "shared_memories": len(filtered_memory.items),
        "selection_reasoning": selection["reasoning"]
    }

memories = [
    {"type": "user", "content": "We need to build a new reporting dashboard"},
    {"type": "assistant", "content": "Initial cost estimate: $50,000"},
    {"type": "user", "content": "That seems high"},
    {"type": "assistant", "content": "Breakdown: $20k development, $15k design..."},
    {"type": "system", "content": "Project deadline updated to Q3"},
    {"type": "user", "content": "Can we reduce the cost?"}
]

# LLM's selection might return:
{
    "selected_memories": ["mem_1", "mem_3", "mem_5"],
    "reasoning": "Selected memories containing cost information and the request for cost reduction, excluding project timeline and general discussion as they're not directly relevant to the budget review task."
}




## 🧠 What This Code Is Doing

### 🔹 **Agent 1 (the caller)**

Is responsible for:

1. **Retrieving all its memories**
2. **Labeling them with memory IDs**
3. **Prompting the LLM** to decide which of those memories are relevant for the upcoming `task`
4. **Creating a filtered memory context** based on that selection
5. **Calling Agent 2** with only the selected context

---

### 🔹 **Agent 2 (the receiver)**

Is handed:

* A **clean, minimal memory** tailored to the current task
* This lets Agent 2 focus exclusively on the **most relevant** info

It then:

* **Performs its task**
* **Returns a result**
* Optionally contributes new memory entries based on its result

---

### 🧠 Why This Pattern Matters

| Feature                    | Benefit                                                    |
| -------------------------- | ---------------------------------------------------------- |
| **Memory Selection**       | Context is curated intelligently using LLM reasoning       |
| **Minimized Token Use**    | Only the relevant context is passed — saving cost          |
| **Reduced Cognitive Load** | Agent 2 isn’t overwhelmed with irrelevant detail           |
| **Auditability**           | The reason for memory selection is logged for traceability |
| **Flexibility**            | Works for many agent-to-agent delegation scenarios         |

---

### 📌 Analogy

Think of Agent 1 as a manager assigning a task to a specialist (Agent 2). Before doing so, the manager *picks out the key background documents* needed for that specific task instead of dumping the entire file cabinet. The documents are handed over unchanged; nothing is summarized or rewritten.




This tool is a **multi-phase utility**: it performs several distinct steps, all tightly scoped to a single, clear purpose:

> **“Call another agent, but only share the most relevant memories for this specific task.”**

Under the hood, it follows a **multi-step internal process**. Think of it like:

* An envelope-packer: it gathers what needs to be sent
* A judge: it selects what's important
* A delegate: it sends that off to another agent
* A reporter: it logs the outcome

---

## 🔍 What Is This Tool Doing Step-by-Step?

Here’s a breakdown of the logical parts inside the tool:

---

### **1. Get and Tag Memory**

```python
current_memory = action_context.get_memory()
...
"memory_id": f"mem_{idx}"
```

It assigns unique IDs (`mem_0`, `mem_1`, etc.) to each memory so the LLM can reference them.

---

### **2. Define a JSON Schema for LLM to Use**

```python
selection_schema = { ... }
```

Tells the LLM *how* to respond: return a list of selected memory IDs **and** explain why.
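
Since the source does not show whether `prompt_llm_for_json` checks the response against the schema, a defensive caller could verify the shape itself. The following is a minimal hand-written check using only the standard library; `validate_selection` is a hypothetical helper, not part of the notebook's framework.

```python
def validate_selection(selection):
    """Raise ValueError unless the response matches the selection schema."""
    if not isinstance(selection, dict):
        raise ValueError("selection must be a JSON object")
    missing = {"selected_memories", "reasoning"} - selection.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    ids = selection["selected_memories"]
    if not isinstance(ids, list) or not all(isinstance(i, str) for i in ids):
        raise ValueError("selected_memories must be a list of strings")
    if not isinstance(selection["reasoning"], str):
        raise ValueError("reasoning must be a string")
    return selection


ok = validate_selection(
    {"selected_memories": ["mem_1", "mem_3"], "reasoning": "Cost-related entries."}
)
print(ok["selected_memories"])
```

Failing fast here keeps a malformed response from surfacing later as a confusing `KeyError` or `TypeError` in the filtering step.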

---

### **3. Prompt the LLM to Select the Right Memories**

```python
selection = prompt_llm_for_json(...)
```

Uses `prompt_llm_for_json` (a structured prompt function) to:

* Show all memories
* Describe the task
* Ask: *Which memories matter most for this task?*

---

### **4. Filter the Memory**

```python
if item["memory_id"] in selected_ids:
```

Builds a **filtered `Memory()` instance** with *only* the chosen memories.
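
The membership test silently ignores any ID the LLM returns that does not exist in memory. If you want to notice that case, one option is to split the returned IDs into known and unknown before filtering. This is a sketch; `split_known_ids` is a hypothetical helper, and what to do with unknown IDs (log, retry, ignore) is left open.

```python
def split_known_ids(selected_ids, tagged_memories):
    """Separate IDs that exist in memory from ones the LLM made up."""
    known = {m["memory_id"] for m in tagged_memories}
    valid = [i for i in selected_ids if i in known]
    unknown = [i for i in selected_ids if i not in known]
    return valid, unknown


tagged = [
    {"type": "user", "content": "Can we reduce the cost?", "memory_id": "mem_0"},
    {"type": "system", "content": "Project deadline updated to Q3", "memory_id": "mem_1"},
]
valid, unknown = split_known_ids(["mem_0", "mem_7"], tagged)
print(valid, unknown)
```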

---

### **5. Run the Agent with Filtered Memory**

```python
result_memory = agent_run(user_input=task, memory=filtered_memory)
```

Delegates the task to another agent, but only gives it the relevant memory, *not everything*.

---

### **6. Log the Reasoning and Results**

```python
current_memory.add_memory(...)
```

Adds both:

* The **LLM’s reasoning** for selecting the memories
* The **output** from the other agent

---

## 🧠 Why This Is Awesome

This tool gives your agent:

* **Precision sharing**: Only what matters gets passed along
* **Transparency**: You know *why* the selected memories were chosen
* **Scalability**: You avoid memory bloat and token overload
* **Kindness**: You’re not dumping a novel on another agent

---

## ✅ Summary

| Feature                             | Role                                    |
| ----------------------------------- | --------------------------------------- |
| `@register_tool(...)`               | Declares it as a single callable tool   |
| Internal memory tagging             | Prepares content for smart selection    |
| JSON schema + `prompt_llm_for_json` | Gives LLM structured decision framework |
| Memory filtering                    | Ensures clean, tight context            |
| Final memory updates                | Keeps everything traceable              |






### 🔧 Key Code Highlights (with explanations)

#### 1. **Assigning IDs to Memories**

```python
for idx, item in enumerate(current_memory.items):
    memory_with_ids.append({
        **item,
        "memory_id": f"mem_{idx}"
    })
```

> ✅ **Why it matters:**
> Gives each memory a unique `memory_id`, which is **crucial for selection** later. Without identifiers, the LLM can’t reference specific items reliably. This step turns opaque memory into something structured and addressable.

---

#### 2. **Prompting the LLM to Select Relevant Memories**

```python
selection_prompt = f"""Review these memories and select the ones relevant for this task:

Task: {task}

Available Memories:
{memory_text}

Select memories that provide important context or information for this specific task.
Explain your selection process."""
```

> ✅ **Why it matters:**
> This is the **heart of selective memory sharing**. You’re asking the LLM to apply reasoning and **filter** only what's helpful.
> You're treating the LLM like a smart assistant that curates context for downstream use — a key modern pattern.

---

#### 3. **Validating the LLM Output with a Schema**

```python
selection_schema = {
    "type": "object",
    "properties": {
        "selected_memories": {
            "type": "array",
            "items": {
                "type": "string",
                "description": "ID of a memory to include"
            }
        },
        "reasoning": {
            "type": "string",
            "description": "Explanation of why these memories were selected"
        }
    },
    "required": ["selected_memories", "reasoning"]
}
```

> ✅ **Why it matters:**
> This schema steers the LLM toward **structured, predictable output**, which is critical when tools depend on downstream parsing. Whether the response is actually validated against it depends on how `prompt_llm_for_json` is implemented.
> You **avoid ambiguous free-form replies** and **gain explainability** through the `reasoning` field.

---

#### 4. **Creating Filtered Memory Based on Selected IDs**

```python
filtered_memory = Memory()
selected_ids = set(selection["selected_memories"])
for item in memory_with_ids:
    if item["memory_id"] in selected_ids:
        item_copy = item.copy()
        del item_copy["memory_id"]
        filtered_memory.add_memory(item_copy)
```

> ✅ **Why it matters:**
> Builds a **custom memory block** to send to the target agent.
> This ensures the second agent receives **only the context it needs**, nothing more — reducing token load and improving focus.
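
One edge case the filter does not cover is an empty selection, which would hand the target agent no context at all. A possible safeguard, shown here as an assumption and not as the notebook's behavior, is to fall back to the most recent entries. `choose_context` and `fallback_last_n` are hypothetical names.

```python
def choose_context(tagged_memories, selected_ids, fallback_last_n=3):
    """Filter by selected IDs; if nothing matches, keep the latest entries."""
    chosen = set(selected_ids)
    picked = [m for m in tagged_memories if m["memory_id"] in chosen]
    if not picked:
        # Assumed policy: recent history is better than an empty context
        picked = tagged_memories[-fallback_last_n:] if fallback_last_n > 0 else []
    # Strip the temporary ID before passing memories on
    return [{k: v for k, v in m.items() if k != "memory_id"} for m in picked]


tagged = [
    {"type": "user", "content": f"note {i}", "memory_id": f"mem_{i}"}
    for i in range(4)
]
print(choose_context(tagged, ["mem_0"]))
print(choose_context(tagged, []))
```

Whether a fallback is appropriate depends on the task: for sensitive data it may be safer to share nothing and report the empty selection instead.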

---

#### 5. **Passing Filtered Memory to the Invoked Agent**

```python
result_memory = agent_run(
    user_input=task,
    memory=filtered_memory
)
```

> ✅ **Why it matters:**
> This is the actual **handoff with scoped memory**.
> Agent 2 gets to work using the **intelligently filtered** context.
> The call is modular and clean.

---

#### 6. **Storing the Reasoning in the Original Agent’s Memory**

```python
current_memory.add_memory({
    "type": "system",
    "content": f"Memory selection reasoning: {selection['reasoning']}"
})
```

> ✅ **Why it matters:**
> Builds **transparency** and **traceability** into the system.
> Future debugging, audits, or higher-level reasoning agents can **understand why** certain information was shared.

---

### 🔚 Summary

Focus on these **6 areas** because they teach you how to:

* Structure unstructured memory for LLMs to reason over
* Use the LLM not just to act, but to **choose context**
* Safely and intelligently hand off tasks between agents
* Reduce token load while **increasing explainability**
* Maintain **clean modular design** in multi-agent systems






## 🧠 How Does the LLM Know What’s in Each Memory?

It doesn’t just see the **memory ID** (`mem_0`, `mem_1`, etc.).

Instead, it sees **both**:

* The **ID** (`mem_0`)
* The **content** of that memory

This happens here in the code:

```python
memory_text = "\n".join([
    f"Memory {m['memory_id']}: {m['content']}"
    for m in memory_with_ids
])
```

So the **prompt looks something like this**:

```
Review these memories and select the ones relevant for this task:

Task: Schedule a follow-up call with the client about the contract.

Available Memories:
Memory mem_0: Had a meeting with the client last week
Memory mem_1: Client is waiting on the updated contract draft
Memory mem_2: Lunch with marketing team scheduled Thursday
...
```

Then it asks:

> Select memories that provide important context or information for this specific task. Explain your selection process.
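
To see the exact text the LLM receives, the prompt assembly can be reproduced on its own. The sketch below mirrors the tool's prompt wording; `build_selection_prompt` is a hypothetical wrapper, and the memory contents are the illustrative ones from the example above.

```python
def build_selection_prompt(task, memories):
    """Assemble the selection prompt: task first, then ID-labelled memories."""
    memory_text = "\n".join(
        f"Memory mem_{i}: {m['content']}" for i, m in enumerate(memories)
    )
    return (
        "Review these memories and select the ones relevant for this task:\n\n"
        f"Task: {task}\n\n"
        "Available Memories:\n"
        f"{memory_text}\n\n"
        "Select memories that provide important context or information "
        "for this specific task.\n"
        "Explain your selection process."
    )


prompt = build_selection_prompt(
    "Schedule a follow-up call with the client about the contract.",
    [
        {"type": "user", "content": "Had a meeting with the client last week"},
        {"type": "user", "content": "Client is waiting on the updated contract draft"},
        {"type": "user", "content": "Lunch with marketing team scheduled Thursday"},
    ],
)
print(prompt)
```

Printing the prompt during development is a cheap way to confirm that each memory appears with both its ID and its content.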

---

## 🔍 What’s the Role of the Memory ID?

The **`memory_id`** is just a reference tag — like a label on a file folder.
The LLM uses it to:

* Refer to a memory unambiguously
* Structure its response like:

  ```json
  {
    "selected_memories": ["mem_0", "mem_1"],
    "reasoning": "These relate to the client and contract context"
  }
  ```

So later, when we filter:

```python
if item["memory_id"] in selected_ids:
```

…the code knows *exactly which full memories to pass on*, and the choice is well-founded because the LLM **read the content** of each memory, not just its ID.

---

## ✅ Why This Works

| Element            | Purpose                               |
| ------------------ | ------------------------------------- |
| `memory_id`        | Unambiguous label for reference       |
| `content`          | What the LLM uses to decide relevance |
| `selection_prompt` | Guides the LLM to choose what matters |
| `schema`           | Ensures a clean, parseable response   |

---

## 🧠 Bonus: Why Use IDs at All?

You might wonder, *why not just select the content directly?*

Because:

* IDs make referencing easier and reduce the chance of the LLM misquoting or altering long text
* IDs keep the response **clean**, short, and easy to check against the known memory list
* You can map those IDs back to memory items efficiently in code




## 🧠 Selective Memory Sharing in Action

The second agent receives **only the memories related to:**

* Cost estimates
* Budget breakdowns
* The request for reduction

This gives the budget review agent **focused, high-signal context** — without distracting information about timelines, personnel, or unrelated project details.

---

## ✅ Why This Beats Rule-Based Filtering

This approach has several **clear advantages** over traditional, static filtering:

1. **Contextual Understanding**
   The LLM selects memories based on meaning and relevance — not just keyword matching or rigid rules.

2. **Preserved Reasoning**
   The LLM explains *why* certain memories were selected. This adds transparency and helps you debug or improve selection later.

3. **Adaptive & Flexible**
   You don’t need to rewrite filtering logic for each new type of task. The LLM adapts its selection to the context at hand.

4. **Audit Trail**
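   The original agent keeps a record of why memories were shared (the logged reasoning), and the tool’s return value reports how many were shared. As written, the code does not log *which* IDs were shared, so add them to the log entry if you need a full record. This is especially useful in sensitive workflows.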
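
For a fuller audit trail, the log entry could carry the shared IDs alongside the reasoning. This is a sketch of one possible record format, not something the notebook's tool does; `build_audit_entry`, the field names, and the `budget_specialist` agent name are assumptions.

```python
import json


def build_audit_entry(agent_name, task, selection):
    """Build a system memory entry recording what was shared and why."""
    record = {
        "agent": agent_name,
        "task": task,
        "shared_ids": list(selection["selected_memories"]),
        "reasoning": selection["reasoning"],
    }
    # JSON keeps the entry machine-readable for later review
    return {
        "type": "system",
        "content": "Memory selection audit: " + json.dumps(record),
    }


entry = build_audit_entry(
    "budget_specialist",
    "Review cost overruns",
    {"selected_memories": ["mem_1", "mem_3"], "reasoning": "Cost-related entries."},
)
print(entry["content"])
```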

---

## 💡 Example Use Case

> A **project planning agent** asks a **budget specialist** to review cost overruns.

Instead of handing over the **entire** project history, the agent shares:

* Resource allocation notes
* Previous expense approvals
* Any recent cost change requests

🎯 This lets the budget agent focus **only on financials** — without getting bogged down in unrelated details.



## 🧠 Recap of the Four Memory Sharing Patterns

Agent collaboration can be designed around different **memory-sharing strategies**, depending on the task’s complexity, risk, and communication goals.

### 1. 📨 Message Passing

**“Just the answer, please.”**

* **Purpose:** Keep interactions clean and minimal.
* **Mechanism:** The second agent works in isolation and returns only the final result.
* **Use When:** You don’t need to understand *how* the answer was derived.

> ✅ Great for low-stakes, modular tasks.

---

### 2. 🔍 Memory Reflection

**“Tell me what you did and how.”**

* **Purpose:** Learn from the invoked agent's reasoning process.
* **Mechanism:** After task completion, all of the second agent’s memory is copied back to the first.
* **Use When:** You want the first agent to *learn* or *audit* the reasoning used.

> ✅ Great for transparency, training, and debugging.

---

### 3. 🔁 Memory Handoff

**“Take over from here.”**

* **Purpose:** Allow one agent to fully continue where another left off.
* **Mechanism:** Full memory is *transferred* and shared.
* **Use When:** You want seamless continuation of a task with full context.

> ✅ Great for long-running tasks or handoffs in pipelines (e.g., customer service → technical support).

---

### 4. 🎯 Selective Memory Sharing

**“Here’s only what you need to know.”**

* **Purpose:** Share *just the relevant* context based on task.
* **Mechanism:** LLM chooses specific memories to share and explains its reasoning.
* **Use When:** You want to reduce noise, save tokens, and improve focus.

> ✅ Great for bandwidth optimization, clarity, and task-scoped reasoning.
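
The four mechanisms can be contrasted side by side in a toy sketch. Everything here is hypothetical: `fake_agent` stands in for a real agent run, memories are plain lists of dicts, and the selective variant takes pre-chosen indexes in place of an LLM selection.

```python
def fake_agent(task, memory):
    """Hypothetical agent: returns its input memory plus two new entries."""
    return memory + [
        {"type": "assistant", "content": f"Thinking about: {task}"},
        {"type": "assistant", "content": f"Answer for: {task}"},
    ]


def message_passing(caller_memory, task):
    result = fake_agent(task, [])      # callee works in isolation
    return result[-1]["content"]       # only the final answer comes back


def memory_reflection(caller_memory, task):
    result = fake_agent(task, [])
    caller_memory.extend(result)       # caller absorbs the callee's full trace
    return result[-1]["content"]


def memory_handoff(caller_memory, task):
    # Callee continues with the caller's full context
    return fake_agent(task, list(caller_memory))


def selective_sharing(caller_memory, task, selected_indexes):
    shared = [caller_memory[i] for i in selected_indexes]
    result = fake_agent(task, shared)  # callee sees only the curated subset
    return result[-1]["content"], len(shared)


history = [{"type": "user", "content": f"note {i}"} for i in range(3)]
print(message_passing(history, "t"), len(history))
print(selective_sharing(history, "t", [0, 2]))
print(len(memory_handoff(history, "t")))
print(memory_reflection(history, "t"), len(history))
```

The differences come down to two questions: what the callee starts with (nothing, everything, or a subset) and what flows back to the caller (an answer or a full trace).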

---

## 🎯 Choosing the Right Pattern

Ask yourself:

* ❓ **How much context** does the receiving agent need?
* ❓ Should the **calling agent understand** the reasoning of the second?
* ❓ Do you want to **preserve or reset** conversation history?
* ❓ Is there any **sensitive or irrelevant** information that must be filtered?

---

## 💡 Final Insights

* These patterns aren't mutually exclusive — real systems often use **hybrids** depending on the sensitivity, task type, and agent role.
* Managing memory is not just about cost — it’s about **precision, reliability, and safety**.
* Architecting agents with **clear memory boundaries and deliberate communication** makes your system more scalable, testable, and robust.
* And yes, **being kind to your LLMs** through focused context and modularity tends to improve results. Treat your agents like collaborators, not calculators.

