# Foundry Agent Memory

> **Author:** Ozgur Guler | AI Solution Leader, AI Innovation Hub
> **Contact:** [ozgur.guler1@gmail.com](mailto:ozgur.guler1@gmail.com)
> **© 2025 Ozgur Guler. All rights reserved.**

---

This notebook demonstrates how to add persistent memory to agents in Azure AI Foundry.

## What is Agent Memory?

Memory in Foundry Agent Service is a managed, long-term memory solution that enables:
- **Agent continuity** across sessions, devices, and workflows
- **User preference retention** (e.g., "I prefer dark roast coffee")
- **Personalized experiences** without repeating information

## How Memory Works

Memory is attached to agents using the **MemorySearchTool**:
1. Create a **Memory Store** - container for memories with chat + embedding models
2. Create a **MemorySearchTool** - tool that reads/writes to the memory store
3. Attach the tool to a **Prompt Agent** - agent with memory capabilities
4. Use **Conversations API** to interact with the agent

## Prerequisites

Before running this notebook:

1. **Azure CLI authenticated**: Run `az login` in your terminal
2. **Azure AI Foundry project**: Created via Azure Portal or `azd`
3. **Chat model deployment**: e.g., `gpt-4.1` or `gpt-5-nano`
4. **Embedding model deployment**: e.g., `text-embedding-3-small` (notebook can deploy this)
5. **Python packages**: `azure-ai-projects`, `azure-identity` (notebook installs these)

---

## Section 1: Setup and Configuration

In [None]:
# Install the preview SDK with memory support
!pip install azure-ai-projects --pre --quiet
!pip install azure-identity python-dotenv --quiet

In [None]:
import os
from dotenv import load_dotenv

# Load environment from parent directory
load_dotenv("../.env")

# Configuration
FOUNDRY_ACCOUNT = os.getenv("FOUNDRY_ACCOUNT_NAME", "ozgurguler-7212-resource")
PROJECT_NAME = os.getenv("FOUNDRY_PROJECT_NAME", "ozgurguler-7212")
PROJECT_ENDPOINT = f"https://{FOUNDRY_ACCOUNT}.services.ai.azure.com/api/projects/{PROJECT_NAME}"

# Model deployments for memory
CHAT_MODEL = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME", "gpt-5-nano")
EMBEDDING_MODEL = os.getenv("AZURE_TEXT_EMBEDDING_DEPLOYMENT_NAME", "text-embedding-3-small")

HOSTED_AGENT_NAME = "my-hosted-agent"
MEMORY_STORE_NAME = "hosted-agent-memory"

print(f"Project Endpoint: {PROJECT_ENDPOINT}")
print(f"Chat Model: {CHAT_MODEL}")
print(f"Embedding Model: {EMBEDDING_MODEL}")
print(f"Memory Store: {MEMORY_STORE_NAME}")
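
The configuration cell above falls back to built-in defaults when an environment variable is unset, which can silently point the notebook at the wrong project. As a minimal sketch of a stricter alternative, the helpers below fail fast on missing settings. `require_setting` and `build_project_endpoint` are hypothetical names introduced here for illustration, not part of any SDK; the URL shape is copied from the cell above.

```python
import os
from typing import Mapping, Optional


def require_setting(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return a non-empty setting or raise a clear error (hypothetical helper)."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise KeyError(f"Missing required setting: {name}")
    return value


def build_project_endpoint(account: str, project: str) -> str:
    """Build the project endpoint using the URL shape from the configuration cell."""
    return f"https://{account}.services.ai.azure.com/api/projects/{project}"


# Example with an explicit mapping instead of the real environment.
example_env = {
    "FOUNDRY_ACCOUNT_NAME": "my-account",
    "FOUNDRY_PROJECT_NAME": "my-project",
}
endpoint = build_project_endpoint(
    require_setting("FOUNDRY_ACCOUNT_NAME", example_env),
    require_setting("FOUNDRY_PROJECT_NAME", example_env),
)
print(endpoint)
```

Passing the mapping explicitly keeps the helper testable; calling `require_setting("FOUNDRY_ACCOUNT_NAME")` with no second argument reads from `os.environ`.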

In [None]:
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

# Initialize the client
credential = DefaultAzureCredential()
client = AIProjectClient(endpoint=PROJECT_ENDPOINT, credential=credential)

print("AIProjectClient initialized successfully")

---

## Section 1b: Verify Model Deployments

Memory requires both a **chat model** and an **embedding model**. Let's verify they exist.

In [None]:
import subprocess
import json

# Check existing model deployments
print("Checking model deployments...")

RESOURCE_GROUP = os.getenv("AZURE_RESOURCE_GROUP", "rg-ozgurguler-7212")
COGNITIVE_ACCOUNT = FOUNDRY_ACCOUNT  # Same as AI Services account

result = subprocess.run(
    ["az", "cognitiveservices", "account", "deployment", "list",
     "--name", COGNITIVE_ACCOUNT,
     "--resource-group", RESOURCE_GROUP,
     "-o", "json"],
    capture_output=True, text=True
)

if result.returncode == 0:
    deployments = json.loads(result.stdout)
    print(f"Found {len(deployments)} deployments:\n")
    
    chat_found = False
    embedding_found = False
    
    for d in deployments:
        name = d.get("name", "")
        model = d.get("properties", {}).get("model", {}).get("name", "")
        print(f"  - {name}: {model}")
        
        # The memory store references deployments by name, so match names exactly
        if name == CHAT_MODEL:
            chat_found = True
        if name == EMBEDDING_MODEL:
            embedding_found = True
    
    chat_icon = "✅" if chat_found else "❌"
    embedding_icon = "✅" if embedding_found else "❌"
    print(f"\n{chat_icon} Chat model ({CHAT_MODEL}): {'Found' if chat_found else 'NOT FOUND'}")
    print(f"{embedding_icon} Embedding model ({EMBEDDING_MODEL}): {'Found' if embedding_found else 'NOT FOUND'}")
    
    if not embedding_found:
        print("\n⚠️  Embedding model not found! Run the next cell to deploy it.")
else:
    print(f"Error checking deployments: {result.stderr}")
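
A deployment check like the one above can also be factored into a small pure function, which makes it testable without calling the Azure CLI. The sketch below assumes the JSON shape read by the cell above (`name` plus `properties.model.name`) and matches on exact deployment names, since the memory store definition refers to deployments by name. `deployment_names` and `missing_deployments` are hypothetical helper names.

```python
from typing import Any, Dict, List


def deployment_names(deployments: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map each deployment name to its underlying model name."""
    return {
        d.get("name", ""): d.get("properties", {}).get("model", {}).get("name", "")
        for d in deployments
    }


def missing_deployments(
    deployments: List[Dict[str, Any]], required: List[str]
) -> List[str]:
    """Return the required deployment names that are not present."""
    found = deployment_names(deployments)
    return [name for name in required if name not in found]


# Sample data in the same shape as the CLI output parsed above.
sample = [
    {"name": "gpt-4.1", "properties": {"model": {"name": "gpt-4.1"}}},
    {
        "name": "text-embedding-3-small",
        "properties": {"model": {"name": "text-embedding-3-small"}},
    },
]
missing = missing_deployments(sample, ["gpt-5-nano", "text-embedding-3-small"])
print(missing)  # → ['gpt-5-nano']
```

In the notebook, the `deployments` list parsed from `result.stdout` could be passed in place of `sample`.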

In [None]:
# Deploy embedding model if not found
# This is REQUIRED for memory to work

DEPLOY_EMBEDDING = False  # Set to True if embedding model is missing

if DEPLOY_EMBEDDING:
    print(f"Deploying {EMBEDDING_MODEL}...")
    
    result = subprocess.run([
        "az", "cognitiveservices", "account", "deployment", "create",
        "--name", COGNITIVE_ACCOUNT,
        "--resource-group", RESOURCE_GROUP,
        "--deployment-name", EMBEDDING_MODEL,
        "--model-name", "text-embedding-3-small",
        "--model-version", "1",
        "--model-format", "OpenAI",
        "--capacity", "10",
        "--sku", "GlobalStandard"
    ], capture_output=True, text=True, timeout=300)
    
    if result.returncode == 0:
        print(f"✅ Successfully deployed {EMBEDDING_MODEL}")
    else:
        print(f"❌ Error: {result.stderr}")
else:
    print("Embedding deployment skipped (DEPLOY_EMBEDDING = False)")
    print("Set DEPLOY_EMBEDDING = True if the embedding model is missing")

---

## Section 2: Create a Memory Store

A memory store is the container for all memories. It defines:
- Which models process memory (chat + embedding)
- What types of memory to extract (user profile, chat summaries)
- What information is relevant to store

In [None]:
from azure.ai.projects.models import (
    MemoryStoreDefaultDefinition,
    MemoryStoreDefaultOptions,
)

# Check if memory store already exists
existing_stores = client.memory_stores.list()
store_exists = any(s.name == MEMORY_STORE_NAME for s in existing_stores.data)

if store_exists:
    print(f"Memory store '{MEMORY_STORE_NAME}' already exists")
    memory_store = client.memory_stores.get(MEMORY_STORE_NAME)
else:
    # Configure memory options
    options = MemoryStoreDefaultOptions(
        chat_summary_enabled=True,  # Store summaries of conversations
        user_profile_enabled=True,  # Store user preferences/info
        user_profile_details="Store user preferences, interests, and relevant context. Avoid sensitive personal data."
    )

    # Create the memory store
    definition = MemoryStoreDefaultDefinition(
        chat_model=CHAT_MODEL,
        embedding_model=EMBEDDING_MODEL,
        options=options
    )

    memory_store = client.memory_stores.create(
        name=MEMORY_STORE_NAME,
        definition=definition,
        description="Memory store for hosted chat agent"
    )
    print(f"Created memory store: {memory_store.name}")

print("\nMemory Store Details:")
print(f"  Name: {memory_store.name}")
print(f"  Description: {memory_store.description}")

---

## Section 3: Test Memory APIs Directly

Before integrating with the hosted agent, let's test the memory APIs directly.

In [None]:
from azure.ai.projects.models import ResponsesUserMessageItemParam

# Define a test user scope
TEST_SCOPE = "test_user_001"

# Add some memories
print("Adding memories...")

user_message = ResponsesUserMessageItemParam(
    content="My name is Alex and I prefer concise technical answers. I work as a software engineer."
)

update_poller = client.memory_stores.begin_update_memories(
    name=MEMORY_STORE_NAME,
    scope=TEST_SCOPE,
    items=[user_message],
    update_delay=0  # Trigger immediately
)

# Wait for completion
update_result = update_poller.result()
print(f"Memory update completed with {len(update_result.memory_operations)} operations")

for op in update_result.memory_operations:
    print(f"  - {op.kind}: {op.memory_item.content[:100]}...")

In [None]:
from azure.ai.projects.models import MemorySearchOptions

# Search for memories
print("Searching memories...")

query = ResponsesUserMessageItemParam(content="What do you know about the user?")

search_result = client.memory_stores.search_memories(
    name=MEMORY_STORE_NAME,
    scope=TEST_SCOPE,
    items=[query],
    options=MemorySearchOptions(max_memories=5)
)

print(f"Found {len(search_result.memories)} memories:")
for mem in search_result.memories:
    print(f"  - [{mem.memory_item.memory_id}] {mem.memory_item.content}")

In [None]:
# Retrieve static memories (user profile) without a query
print("Retrieving static (profile) memories...")

static_result = client.memory_stores.search_memories(
    name=MEMORY_STORE_NAME,
    scope=TEST_SCOPE,
    # No items = retrieve static/profile memories
)

print(f"Found {len(static_result.memories)} profile memories:")
for mem in static_result.memories:
    print(f"  - {mem.memory_item.content}")

---

## Section 4: Create a Prompt Agent with Memory Tool

Rather than modifying the hosted agent's code, we attach memory to a **Prompt Agent** through the `MemorySearchTool`, so Foundry Agent Service handles memory retrieval and updates for us.

The agent will:
1. **At conversation start**: Automatically inject static memories
2. **During conversation**: Retrieve contextual memories per turn
3. **After response**: Update memories (debounced by `update_delay`)
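
The debouncing in the last step can be pictured with a toy model. The sketch below is not the service's implementation; it only illustrates the general idea of waiting for a quiet period before flushing buffered messages. The class name `DebouncedUpdater` is hypothetical, and the assumption that every new message restarts the timer is ours.

```python
from typing import Callable, List, Optional


class DebouncedUpdater:
    """Toy model of a debounced update: flush only after a quiet period."""

    def __init__(self, delay_seconds: float, on_flush: Callable[[List[str]], None]):
        self.delay_seconds = delay_seconds
        self.on_flush = on_flush
        self.pending: List[str] = []
        self.last_activity: Optional[float] = None

    def record(self, message: str, now: float) -> None:
        """Buffer a message and restart the inactivity timer."""
        self.pending.append(message)
        self.last_activity = now

    def tick(self, now: float) -> bool:
        """Flush buffered messages if the quiet period has elapsed."""
        if not self.pending or self.last_activity is None:
            return False
        if now - self.last_activity < self.delay_seconds:
            return False
        self.on_flush(self.pending)
        self.pending = []
        return True


flushed: List[List[str]] = []
updater = DebouncedUpdater(delay_seconds=300, on_flush=flushed.append)

updater.record("I love hiking", now=0)
updater.record("I also enjoy photography", now=120)  # restarts the timer
first = updater.tick(now=300)   # only 180 s of inactivity so far
second = updater.tick(now=420)  # 300 s of inactivity: both messages flush together
print(first, second, flushed)
```

Under this model, a larger delay batches more of a conversation into a single update, which is one reason a longer `update_delay` suits production traffic.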

In [None]:
from azure.ai.projects.models import (
    MemorySearchTool,
    PromptAgentDefinition,
)

# Define the user scope for memory partitioning
USER_SCOPE = "demo_user_jordan"

# Create the memory search tool
UPDATE_DELAY_SECONDS = 1  # Demo value; in production use a higher value such as 300 (5 minutes)

memory_tool = MemorySearchTool(
    memory_store_name=MEMORY_STORE_NAME,
    scope=USER_SCOPE,
    update_delay=UPDATE_DELAY_SECONDS,  # Seconds of inactivity before memories are updated
)

print("Memory Tool configured:")
print(f"  Store: {MEMORY_STORE_NAME}")
print(f"  Scope: {USER_SCOPE}")
print(f"  Update Delay: {UPDATE_DELAY_SECONDS} second(s)")

In [None]:
# Create a Prompt Agent with the memory search tool
AGENT_WITH_MEMORY_NAME = "chat-agent-with-memory"

# Check if agent already exists; only a "not found" error should trigger creation
from azure.core.exceptions import ResourceNotFoundError

try:
    existing_agent = client.agents.retrieve(agent_name=AGENT_WITH_MEMORY_NAME)
    print(f"Agent '{AGENT_WITH_MEMORY_NAME}' already exists (version: {existing_agent.version})")
    agent = existing_agent
except ResourceNotFoundError:
    # Create new agent with memory tool
    agent = client.agents.create_version(
        agent_name=AGENT_WITH_MEMORY_NAME,
        definition=PromptAgentDefinition(
            model=CHAT_MODEL,  # gpt-5-nano
            instructions="""You are a helpful AI assistant with memory capabilities.
You remember user preferences and past conversations.
Use the memory tool to recall what you know about the user.
Be friendly and personalize responses based on remembered information.""",
            tools=[memory_tool],
        )
    )
    print(f"Created agent: {agent.name} (version: {agent.version})")

print("\nAgent Details:")
print(f"  Name: {agent.name}")
print(f"  Version: {agent.version}")
print(f"  Model: {CHAT_MODEL}")

In [None]:
# Get the OpenAI client for invoking the agent
openai_client = client.get_openai_client()
print("OpenAI client ready for agent invocation")

---

## Section 5: Test Memory with Conversations API

Now we'll test the agent with memory using the Conversations API. 

The flow is:
1. Create a conversation
2. Send messages via `responses.create()` 
3. Memory is automatically updated after inactivity

In [None]:
# Conversation 1: Introduce user preferences
print("=" * 60)
print("Conversation 1: Introducing user preferences")
print("=" * 60)

# Create a new conversation
conversation1 = openai_client.conversations.create()
print(f"Created conversation: {conversation1.id}\n")

# Send a message with user preferences
user_input = "Hi! My name is Jordan and I really love hiking and outdoor activities. I also enjoy photography."

response = openai_client.responses.create(
    input=user_input,
    conversation=conversation1.id,
    extra_body={"agent": {"name": agent.name, "type": "agent_reference"}},
)

print(f"User: {user_input}")
print(f"Agent: {response.output_text}")
print(f"\nStatus: {response.status}")

---

## Section 6: Wait for Memory Extraction

Memory updates are debounced by `update_delay`. We need to wait for inactivity before memories are stored.

In [None]:
import time

# Wait for memory to be stored (update_delay=1 + processing time)
print("Waiting 65 seconds for memory extraction and storage...")
print("(In production with update_delay=300, this happens in the background)")

for i in range(65, 0, -5):
    print(f"  {i} seconds remaining...")
    time.sleep(5)

print("Done waiting!")
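
A fixed sleep either waits longer than needed or not long enough. As a hedged alternative, the sketch below polls a condition until it holds or a timeout expires. `wait_until` is a hypothetical helper, and the stand-in predicate only simulates a store that becomes ready on the third check.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_seconds: float = 90.0,
    interval_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `predicate` until it returns True or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_seconds)


# Stand-in predicate: pretends memories show up on the third check.
attempts = {"count": 0}


def memories_ready() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


ready = wait_until(memories_ready, timeout_seconds=1.0, interval_seconds=0.01)
print(ready, attempts["count"])
```

In this notebook, the predicate could wrap the `client.memory_stores.search_memories(...)` call used in Section 7 and return `bool(result.memories)`.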

In [None]:
# Conversation 2: Test memory recall in a NEW conversation
print("=" * 60)
print("Conversation 2: Testing memory recall (NEW conversation)")
print("=" * 60)

# Create a completely new conversation
conversation2 = openai_client.conversations.create()
print(f"Created NEW conversation: {conversation2.id}\n")

# Ask about what the agent remembers
user_input = "What do you know about me?"

response = openai_client.responses.create(
    input=user_input,
    conversation=conversation2.id,
    extra_body={"agent": {"name": agent.name, "type": "agent_reference"}},
)

print(f"User: {user_input}")
print(f"Agent: {response.output_text}")

# Check if agent recalled the user info
response_lower = response.output_text.lower()
if any(word in response_lower for word in ['jordan', 'hiking', 'outdoor', 'photography']):
    print("\n✅ SUCCESS: Agent remembered user information from previous conversation!")
else:
    print("\n⚠️  Agent may not have recalled memories - check if memory was stored")

In [None]:
# Conversation 2 (continued): personalized recommendation
print("=" * 60)
print("Conversation 2 (continued): Testing personalized recommendations")
print("=" * 60)

# Continue in the same conversation (conversation2) rather than creating a new one
user_input = "Can you suggest an activity for this weekend?"

response = openai_client.responses.create(
    input=user_input,
    conversation=conversation2.id,
    extra_body={"agent": {"name": agent.name, "type": "agent_reference"}},
)

print(f"User: {user_input}")
print(f"Agent: {response.output_text}")

# If the agent suggests hiking/outdoor activities, memory is working!
response_lower = response.output_text.lower()
if any(word in response_lower for word in ['hik', 'outdoor', 'trail', 'nature', 'photo', 'camera']):
    print("\n✅ SUCCESS: Agent used memory to personalize recommendation!")
else:
    print("\n⚠️  Response may not be personalized based on memories")
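
The two recall checks above repeat the same keyword test. A small helper, sketched below under the assumption that a case-insensitive substring match is good enough for a demo, also reports which keywords matched, which helps when a check fails. `matched_keywords` is a hypothetical name, and the sample reply is invented for illustration.

```python
from typing import Iterable, List


def matched_keywords(text: str, keywords: Iterable[str]) -> List[str]:
    """Return the keywords that occur in `text`, ignoring case."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


# Invented reply, used only to exercise the helper.
reply = "Since you enjoy hiking, a sunrise trail walk with your camera could be fun."
hits = matched_keywords(
    reply, ["hik", "outdoor", "trail", "nature", "photo", "camera"]
)
print(hits)  # → ['hik', 'trail', 'camera']
```

Substring checks like this are only a smoke test: a reply can mention "hiking" without drawing on memory, so inspecting the stored memories directly remains the more reliable check.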

---

## Section 7: View Stored Memories

Let's check what memories were extracted and stored.

In [None]:
# View all memories stored for our user scope
from azure.ai.projects.models import MemorySearchOptions

print(f"Searching memories for scope: {USER_SCOPE}")
print("-" * 60)

try:
    # Search all memories for the scope
    result = client.memory_stores.search_memories(
        name=MEMORY_STORE_NAME,
        scope=USER_SCOPE,
        options=MemorySearchOptions(max_memories=10)
    )
    
    if result.memories:
        print(f"Found {len(result.memories)} memories:\n")
        for mem in result.memories:
            print(f"  ID: {mem.memory_item.memory_id}")
            print(f"  Content: {mem.memory_item.content}")
            print()
    else:
        print("No memories found yet. Memory extraction may still be processing.")
        
except Exception as e:
    print(f"Error searching memories: {e}")

In [None]:
# Also check memories from our earlier API test
print(f"Memories for test scope: {TEST_SCOPE}")
print("-" * 60)

try:
    result = client.memory_stores.search_memories(
        name=MEMORY_STORE_NAME,
        scope=TEST_SCOPE,
        options=MemorySearchOptions(max_memories=10)
    )
    
    if result.memories:
        print(f"Found {len(result.memories)} memories:\n")
        for mem in result.memories:
            print(f"  Content: {mem.memory_item.content}")
    else:
        print("No memories found for test scope.")
        
except Exception as e:
    print(f"Error: {e}")

---

## Section 8: Cleanup (Optional)

Clean up resources when done testing.

In [None]:
# Delete memories for a specific scope
DELETE_SCOPE_MEMORIES = False  # Set to True to delete

if DELETE_SCOPE_MEMORIES:
    scopes_to_delete = [TEST_SCOPE, USER_SCOPE]
    for scope in scopes_to_delete:
        try:
            client.memory_stores.delete_scope(
                name=MEMORY_STORE_NAME,
                scope=scope
            )
            print(f"Deleted memories for scope: {scope}")
        except Exception as e:
            print(f"Error deleting scope {scope}: {e}")
else:
    print("Memory cleanup skipped (DELETE_SCOPE_MEMORIES = False)")

In [None]:
# Delete the agent created for memory demo
DELETE_AGENT = False  # Set to True to delete

if DELETE_AGENT:
    try:
        client.agents.delete(agent_name=AGENT_WITH_MEMORY_NAME)
        print(f"Deleted agent: {AGENT_WITH_MEMORY_NAME}")
    except Exception as e:
        print(f"Error deleting agent: {e}")
else:
    print("Agent deletion skipped (DELETE_AGENT = False)")

In [None]:
# Delete the entire memory store (use with caution!)
DELETE_MEMORY_STORE = False  # Set to True to delete

if DELETE_MEMORY_STORE:
    try:
        client.memory_stores.delete(MEMORY_STORE_NAME)
        print(f"Deleted memory store: {MEMORY_STORE_NAME}")
    except Exception as e:
        print(f"Error deleting memory store: {e}")
else:
    print("Memory store deletion skipped (DELETE_MEMORY_STORE = False)")
    print("⚠️  Warning: Deleting a memory store is irreversible!")

---

## Summary

### What We Built

1. **Created a Memory Store** - Managed storage with chat + embedding models
2. **Tested Memory APIs** - Add, search, and retrieve memories directly
3. **Created a Prompt Agent with MemorySearchTool** - Agent that reads/writes memories
4. **Verified Memory Persistence** - Agent recalls info across conversations

### Key Concepts

| Concept | Description |
|---------|-------------|
| **Memory Store** | Container for memories with chat/embedding model |
| **MemorySearchTool** | Tool that attaches memory to an agent |
| **Scope** | Partition key for user-specific memories |
| **User Profile** | Static info (name, preferences) |
| **Chat Summary** | Distilled conversation topics |
| **update_delay** | Wait time before extracting memories |

### Correct Pattern for Memory

```python
# 1. Create memory store
memory_store = client.memory_stores.create(
    name="my-memory",
    definition=MemoryStoreDefaultDefinition(...)
)

# 2. Create memory search tool
tool = MemorySearchTool(
    memory_store_name=memory_store.name,
    scope="user_123",
    update_delay=60
)

# 3. Create agent with tool
agent = client.agents.create_version(
    agent_name="my-agent",
    definition=PromptAgentDefinition(
        model="gpt-4.1",
        instructions="...",
        tools=[tool]
    )
)

# 4. Use conversations API
conversation = openai_client.conversations.create()
response = openai_client.responses.create(
    input="Hello!",
    conversation=conversation.id,
    extra_body={"agent": {"name": agent.name, "type": "agent_reference"}}
)
```

### Best Practices

- Use unique scopes per user for privacy
- Set `update_delay` to 5+ minutes (300 seconds or more) in production
- Use `scope="{{$userId}}"` for authenticated users
- Only store necessary information
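
When scopes are assigned by your own application, one way to keep them unique per user without embedding raw identifiers is to derive them from a hash. The sketch below is an assumption-laden illustration: `scope_for_user` is a hypothetical helper, and it assumes the service accepts scope strings shaped like the `test_user_001` style used earlier in this notebook.

```python
import hashlib


def scope_for_user(user_id: str, namespace: str = "user") -> str:
    """Derive a stable scope key from a user identifier (hypothetical helper)."""
    normalized = user_id.strip().lower()
    if not normalized:
        raise ValueError("user_id must not be empty")
    # Truncating to 16 hex characters keeps the key short; collisions are
    # unlikely at demo scale but not impossible.
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return f"{namespace}_{digest}"


scope_a = scope_for_user("Jordan@Example.com")
scope_b = scope_for_user("  jordan@example.com ")
scope_c = scope_for_user("alex@example.com")
print(scope_a == scope_b, scope_a != scope_c)
```

An unsalted hash of a guessable identifier such as an e-mail address can still be reversed by brute force, so treat this as a way to avoid leaking raw identifiers into logs, not as anonymization.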

### Pricing

During preview, memory features are **free**. You only pay for chat/embedding model usage.

---

## Next Steps

Continue to `../05-observability-otel-to-appinsights` for agent monitoring.

---

<div align="center">

## License & Attribution

This notebook is part of the **Azure AI Foundry Demo Repository**

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](../LICENSE)

**Original Author:** Ozgur Guler | AI Solution Leader, AI Innovation Hub

**Contact:** [ozgur.guler1@gmail.com](mailto:ozgur.guler1@gmail.com)

---

*If you use, modify, or distribute this work, you must provide appropriate credit to the original author as required by the [Apache License 2.0](../LICENSE).*

**Copyright © 2025 Ozgur Guler. All rights reserved.**

</div>