<a href="https://colab.research.google.com/github/harald-gen01/My_AI_learning_path/blob/main/Agent_Memory_Is_a_Support_Performance_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Cell 1 — Install + imports**

## TL;DR (What this Colab demonstrates)

This notebook demonstrates **agent memory as an operational performance lever** in customer support.

It runs the same “reopened ticket” scenario in two modes:

1. **Without Memory (stateless agent)**
   The agent treats the reopened issue as new, repeats discovery (asks for logs again), and follows generic troubleshooting steps.

2. **With Memory (stateful agent)**
   The agent retrieves relevant prior ticket “memory notes” (customer + issue type), reuses what was already tried, asks only **what changed**, and escalates with a complete context package when recurrence rules are met.

---

## What’s inside (plain English)

* **Procedural memory** = an approved support playbook (SOP steps + escalation rule)
* **Episodic memory** = prior ticket notes (summary, actions taken, outcome, constraints)
* **Retrieval** = memory notes are embedded and recalled via similarity search
* **Retention (TTL)** = memory expires after **N minutes** to simulate governance/forgetting
* **LLM generation (OpenAI)** = the LLM only **composes** the customer-facing message using retrieved context (it does not “know” anything magically)

---

## The point of the demo

It shows that **continuity changes behavior**:

* fewer redundant questions (less rediscovery)
* faster path to next best action
* more consistent escalation decisions
* better handoff packages (when escalation happens)

---

## KPI proxy output (what you should look at)

The notebook prints a tiny KPI delta table comparing:

* **# Questions asked** (proxy for customer back-and-forth cycles)
* **# Steps recommended** (proxy for operational effort / repeated work)
* **Repeated discovery flag** (did the agent re-ask what it already knew?)

It also shows how overly aggressive retention (TTL too short) causes the agent to revert to rediscovery.

---

## What “faster” means here (important)

This demo is **not** measuring model latency (milliseconds). “With memory” can be slower per request because it performs retrieval and writes a higher-quality response.
The KPIs are **operational speed proxies**: fewer redundant questions and repeated steps → fewer back-and-forth cycles, fewer reopens/escalations, and faster **end-to-end** time-to-resolution.

---

## How to read the results

* **TTL = 20 minutes** → memory survives → continuity preserved → less rediscovery
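* **TTL = 5 minutes** → memory expires → agent reverts to rediscovery → the repeated-discovery flag flips back to 1. (ΔQuestions stays at -1 in this run only because the fallback branch of `agent_with_memory` asks two generic questions instead of three.)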

**Executive takeaway:** Memory improves support performance, but only if retention and governance are designed intentionally.
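
To make the two TTL settings concrete, here is a minimal sketch of the cutoff arithmetic. It uses the ages of the two hand-written AcmeCo seed tickets from the dataset in this notebook (12 and 55 minutes before the fixed demo time). The helper name `survives` is illustrative and not part of the notebook.

```python
from datetime import datetime, timedelta

NOW = datetime(2026, 2, 13, 12, 0, 0)  # same fixed clock the notebook uses

def survives(created_at, now, ttl_minutes):
    """A note is kept when it is no older than the TTL cutoff (inclusive)."""
    return created_at >= now - timedelta(minutes=ttl_minutes)

# Ages in minutes before NOW, taken from the seed tickets
seed_ages = {"T-1001": 12, "T-0990": 55}

for ttl in (20, 5):
    kept = [
        ticket_id
        for ticket_id, age in seed_ages.items()
        if survives(NOW - timedelta(minutes=age), NOW, ttl)
    ]
    print(f"TTL={ttl} min -> kept: {kept}")
```

With a 20-minute TTL only the 12-minute-old ticket survives; with a 5-minute TTL neither does, so the stateful agent has nothing to retrieve.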


In [None]:
!pip -q install --upgrade \
  "openai>=1.0.0" \
  "pandas==2.2.2" \
  "numpy==1.26.4" \
  "scikit-learn==1.4.2" \
  "scipy==1.11.4"

# IMPORTANT: To avoid version issues, restart runtime after changing numpy/pandas
import os, signal
os.kill(os.getpid(), signal.SIGKILL)


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tsfresh 0.21.1 requires scipy>=1.14.0; python_version >= "3.10", but you have scipy 1.11.4 which is i

In [1]:
import os
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from sklearn.metrics.pairwise import cosine_similarity
from openai import OpenAI


**Set your OpenAI API key**

In [2]:
from google.colab import userdata
from openai import OpenAI

client = OpenAI(api_key=userdata.get("openai_api_key"))


**Configuration (models + demo knobs)**

Choose models. If you don’t know which embedding model to use, keep defaults and adjust later.

In [3]:
# Text generation model (called through the Chat Completions API in llm_compose_reply)
GEN_MODEL = "gpt-4o-mini"  # change if your account uses a different model
# Embedding model (pick one you have access to; change if needed)
EMBED_MODEL = "text-embedding-3-large"  # common OpenAI embedding model name (change if needed)

# Demo knobs
NOW = datetime(2026, 2, 13, 12, 0, 0)          # fixed time for repeatable outputs
TOP_K = 3                                       # retrieval depth
USE_LLM_TO_WRITE_REPLY = True                   # set False to keep it fully rule-based


**Seed dataset (tickets = episodic memory notes)**

Includes one “recent” memory and one older one so TTL expiry is visible.

In [4]:
np.random.seed(7)

tickets = [
    {
        "ticket_id": "T-1001",
        "customer": "AcmeCo",
        "issue_type": "intermittent_api_timeouts",
        "created_at": NOW - timedelta(minutes=12),  # recent memory (survives TTL=20)
        "summary": "Intermittent API timeouts during peak traffic. Collected logs. Suggested retry with jitter and increased client timeout.",
        "actions": "Collected logs; enabled retry with exponential backoff+jitter; increased client timeout from 3s to 6s",
        "outcome": "Improved but not fully resolved",
        "constraints": "Cannot change gateway until Q3",
        "playbook_used": "PB-API-Timeouts-v2",
        "escalation": "No"
    },
    {
        "ticket_id": "T-0990",
        "customer": "AcmeCo",
        "issue_type": "intermittent_api_timeouts",
        "created_at": NOW - timedelta(minutes=55),  # older (expires if TTL=20)
        "summary": "Older AcmeCo timeout case. Recommended generic retries, no traces captured.",
        "actions": "Recommended generic retries; no tracing",
        "outcome": "Unknown",
        "constraints": "Cannot change gateway until Q3",
        "playbook_used": "PB-API-Timeouts-v1",
        "escalation": "No"
    },
    {
        "ticket_id": "T-1044",
        "customer": "BetaBank",
        "issue_type": "auth_token_expiry",
        "created_at": NOW - timedelta(minutes=40),
        "summary": "Auth token expiry causing 401s. Identified clock skew. Recommended NTP sync and token refresh margin.",
        "actions": "Verified clock skew; enabled NTP; set refresh margin to 90s",
        "outcome": "Resolved",
        "constraints": "High compliance; no debug logs in prod",
        "playbook_used": "PB-Auth-401-v1",
        "escalation": "No"
    },
]

# Add a few synthetic tickets for retrieval variety
for i in range(12):
    customer = np.random.choice(["AcmeCo","GammaShop","DeltaTel","AcmeCo"])
    issue_type = np.random.choice(["intermittent_api_timeouts","auth_token_expiry"], p=[0.75, 0.25])
    created_at = NOW - timedelta(minutes=np.random.randint(5, 120))

    if issue_type == "intermittent_api_timeouts":
        summary = "Intermittent API timeouts observed. Reviewed metrics and logs. Recommended retry with jitter and checking upstream latency."
        actions = "Reviewed metrics; collected traces; suggested retry+jitter; recommended upstream latency check"
        playbook = "PB-API-Timeouts-v2"
    else:
        summary = "401 spikes due to token expiry edge-case. Suggested refresh margin and clock sync verification."
        actions = "Verified clock skew; adjusted refresh margin; recommended NTP"
        playbook = "PB-Auth-401-v1"

    tickets.append({
        "ticket_id": f"T-2{i:03d}",
        "customer": customer,
        "issue_type": issue_type,
        "created_at": created_at,
        "summary": summary,
        "actions": actions,
        "outcome": np.random.choice(["Resolved","Improved but not fully resolved","No change"]),
        "constraints": np.random.choice(["Cannot change gateway until Q3","Limited maintenance window","No constraint noted"]),
        "playbook_used": playbook,
        "escalation": np.random.choice(["No","No","Yes"])
    })

df = pd.DataFrame(tickets).sort_values("created_at", ascending=False)
df.head(8)


| index | ticket_id | customer | issue_type | created_at | summary | actions | outcome | constraints | playbook_used | escalation |
|---|---|---|---|---|---|---|---|---|---|---|
| 8 | T-2005 | GammaShop | intermittent_api_timeouts | 2026-02-13 11:51:00 | Intermittent API timeouts observed. Reviewed m... | Reviewed metrics; collected traces; suggested ... | No change | Limited maintenance window | PB-API-Timeouts-v2 | No |
| 0 | T-1001 | AcmeCo | intermittent_api_timeouts | 2026-02-13 11:48:00 | Intermittent API timeouts during peak traffic.... | Collected logs; enabled retry with exponential... | Improved but not fully resolved | Cannot change gateway until Q3 | PB-API-Timeouts-v2 | No |
| 2 | T-1044 | BetaBank | auth_token_expiry | 2026-02-13 11:20:00 | Auth token expiry causing 401s. Identified clo... | Verified clock skew; enabled NTP; set refresh ... | Resolved | High compliance; no debug logs in prod | PB-Auth-401-v1 | No |
| 6 | T-2003 | DeltaTel | auth_token_expiry | 2026-02-13 11:11:00 | 401 spikes due to token expiry edge-case. Sugg... | Verified clock skew; adjusted refresh margin; ... | Improved but not fully resolved | Cannot change gateway until Q3 | PB-Auth-401-v1 | No |
| 5 | T-2002 | AcmeCo | intermittent_api_timeouts | 2026-02-13 11:07:00 | Intermittent API timeouts observed. Reviewed m... | Reviewed metrics; collected traces; suggested ... | Resolved | Cannot change gateway until Q3 | PB-API-Timeouts-v2 | Yes |
| 1 | T-0990 | AcmeCo | intermittent_api_timeouts | 2026-02-13 11:05:00 | Older AcmeCo timeout case. Recommended generic... | Recommended generic retries; no tracing | Unknown | Cannot change gateway until Q3 | PB-API-Timeouts-v1 | No |
| 7 | T-2004 | AcmeCo | intermittent_api_timeouts | 2026-02-13 10:54:00 | Intermittent API timeouts observed. Reviewed m... | Reviewed metrics; collected traces; suggested ... | Resolved | No constraint noted | PB-API-Timeouts-v2 | No |
| 9 | T-2006 | AcmeCo | intermittent_api_timeouts | 2026-02-13 10:51:00 | Intermittent API timeouts observed. Reviewed m... | Reviewed metrics; collected traces; suggested ... | Resolved | No constraint noted | PB-API-Timeouts-v2 | No |


**Retention expiry (TTL in minutes)**

In [5]:
def apply_retention(df, now, ttl_minutes):
    cutoff = now - timedelta(minutes=ttl_minutes)
    kept = df[df["created_at"] >= cutoff].copy()
    dropped = df[df["created_at"] < cutoff].copy()
    return kept, dropped, cutoff

**Build “memory notes” text (for embeddings)**

In [6]:
def memory_note_text(row):
    # This is your "atomic note" text for embedding retrieval
    return (
        f"customer: {row['customer']}\n"
        f"issue_type: {row['issue_type']}\n"
        f"summary: {row['summary']}\n"
        f"actions: {row['actions']}\n"
        f"outcome: {row['outcome']}\n"
        f"constraints: {row['constraints']}\n"
        f"playbook_used: {row['playbook_used']}\n"
    )

def build_memory_notes(df_kept):
    df_kept = df_kept.copy()
    df_kept["note_text"] = df_kept.apply(memory_note_text, axis=1)
    return df_kept


**OpenAI embeddings helper**

In [7]:
def embed_texts(texts, model=EMBED_MODEL):
    # Returns list of vectors
    resp = client.embeddings.create(model=model, input=texts)
    return [d.embedding for d in resp.data]
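
If you want to exercise the retrieval logic without an API key, one option is a deterministic stand-in for `embed_texts`. The sketch below is an assumption, not the notebook's method: `fake_embed_texts` and its `dim` parameter are hypothetical, and hashed bag-of-words vectors capture only word overlap, not meaning.

```python
import hashlib
import math
import re

def fake_embed_texts(texts, dim=256):
    """Hypothetical offline stand-in: hashed bag-of-words, L2-normalised."""
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for token in re.findall(r"[a-z0-9_]+", text.lower()):
            # Stable hash (the built-in hash() is salted per process)
            digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
            vec[int(digest, 16) % dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors

def cosine(a, b):
    # Inputs are unit length, so the dot product equals the cosine
    return sum(x * y for x, y in zip(a, b))

q, near, far = fake_embed_texts([
    "AcmeCo intermittent api timeouts during peak traffic",
    "AcmeCo intermittent api timeouts retry with jitter",
    "BetaBank auth token expiry clock skew",
])
print(round(cosine(q, near), 3), round(cosine(q, far), 3))
```

Because it returns a list of float lists, a function like this could temporarily stand in for `embed_texts` while testing the TTL and retrieval plumbing. The similarity scores will differ from those of a real embedding model.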


**Build embedding index (with TTL applied)**

In [8]:
def build_embedding_index(df_kept):
    df_notes = build_memory_notes(df_kept)

    texts = df_notes["note_text"].tolist()
    if len(texts) == 0:
        # No memory survives TTL → return empty index
        return df_notes, np.zeros((0, 1), dtype=np.float32)

    vectors = embed_texts(texts, model=EMBED_MODEL)
    M = np.array(vectors, dtype=np.float32)
    return df_notes, M

def retrieve_top_k(df_notes, M, query_text, top_k=TOP_K):
    # Always return a frame with a 'similarity' column (even if empty)
    base_cols = list(df_notes.columns)
    if "similarity" not in base_cols:
        base_cols.append("similarity")

    if df_notes.empty or M.shape[0] == 0:
        return pd.DataFrame(columns=base_cols)

    q_vec = np.array(embed_texts([query_text], model=EMBED_MODEL)[0], dtype=np.float32).reshape(1, -1)
    sims = cosine_similarity(q_vec, M).flatten()

    idx = np.argsort(-sims)[:top_k]
    out = df_notes.iloc[idx].copy()
    out["similarity"] = sims[idx]
    return out.sort_values("similarity", ascending=False)
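
As an illustration of what `retrieve_top_k` computes, here is a dependency-free sketch of cosine ranking over a toy index. The note ids and 3-dimensional vectors are made up for the example; real embeddings have far more dimensions.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query_vec, index, k=3):
    """index is a list of (note_id, vector); returns (note_id, similarity), best first."""
    if not index:  # nothing survived retention
        return []
    scored = [(note_id, cosine(query_vec, vec)) for note_id, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

toy_index = [
    ("note-a", [1.0, 0.0, 0.0]),
    ("note-b", [0.8, 0.6, 0.0]),
    ("note-c", [0.0, 0.0, 1.0]),
]
ranked = top_k([1.0, 0.1, 0.0], toy_index, k=2)
print([note_id for note_id, _ in ranked])
print(top_k([1.0, 0.1, 0.0], [], k=2))
```

The empty-index branch mirrors the guard in `retrieve_top_k`: when the TTL has expired every note, retrieval returns nothing rather than failing.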



In [9]:
# --- Procedural memory (approved playbooks) ---
PLAYBOOKS = {
    "intermittent_api_timeouts": {
        "id": "PB-API-Timeouts-v2",
        "steps": [
            "Confirm timeframe + traffic pattern (peak vs off-peak).",
            "Check client-side timeouts and retry strategy (backoff + jitter).",
            "Inspect upstream latency and error rates; capture traces.",
            "If recurring within 14 days, escalate to SRE with prior evidence."
        ],
        "escalation_rule": "If same issue repeats within 14 days for same customer → escalate to SRE"
    },
    "auth_token_expiry": {
        "id": "PB-Auth-401-v1",
        "steps": [
            "Confirm timeframe + scope of 401s (specific endpoints?).",
            "Check clock skew and NTP status.",
            "Increase token refresh margin and verify token TTL handling.",
            "Escalate if persists after NTP + refresh margin changes."
        ],
        "escalation_rule": "Escalate if persists after NTP + refresh changes"
    }
}

print("✅ PLAYBOOKS loaded:", list(PLAYBOOKS.keys()))


✅ PLAYBOOKS loaded: ['intermittent_api_timeouts', 'auth_token_expiry']


In [10]:
assert "intermittent_api_timeouts" in PLAYBOOKS
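
The demo hard-codes `repeat_within_14d` on the incoming ticket. As a sketch of how that flag could be derived from ticket history instead, assume each prior ticket carries `customer`, `issue_type`, and `created_at` as in the seed data. The helper `is_repeat_within` is hypothetical.

```python
from datetime import datetime, timedelta

def is_repeat_within(prior_tickets, customer, issue_type, now, window_days=14):
    """True if the same customer reported the same issue type inside the window."""
    window_start = now - timedelta(days=window_days)
    return any(
        t["customer"] == customer
        and t["issue_type"] == issue_type
        and window_start <= t["created_at"] <= now
        for t in prior_tickets
    )

now = datetime(2026, 2, 13, 12, 0, 0)
history = [
    {"customer": "AcmeCo", "issue_type": "intermittent_api_timeouts",
     "created_at": now - timedelta(days=3)},
    {"customer": "BetaBank", "issue_type": "auth_token_expiry",
     "created_at": now - timedelta(days=30)},
]
print(is_repeat_within(history, "AcmeCo", "intermittent_api_timeouts", now))
print(is_repeat_within(history, "BetaBank", "auth_token_expiry", now))
```

A check like this only works if ticket history outlives the recurrence window. With the minute-scale TTLs used in this demo the history would already be gone, which is one reason to design retention and escalation rules together.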

**Agent behaviors (without vs with memory)**

Includes KPI proxies and an optional LLM-written reply.

In [11]:
def agent_without_memory(new_ticket):
    pb = PLAYBOOKS.get(new_ticket["issue_type"])
    questions = [
        "Can you share logs and timestamps of the timeouts?",
        "Did you change anything recently in your client or network?",
        "What is your current client timeout setting and retry behavior?"
    ]
    steps = pb["steps"][:3] if pb else ["Collect logs and investigate."]
    escalation = "Escalate" if new_ticket.get("repeat_within_14d") else "No escalation"

    reply = None
    if USE_LLM_TO_WRITE_REPLY:
        reply = llm_compose_reply(
            customer=new_ticket["customer"],
            issue_type=new_ticket["issue_type"],
            retrieved_memories=None,
            playbook=pb,
            questions=questions,
            steps=steps,
            escalation=escalation,
            style="without_memory"
        )

    return {
        "mode": "without_memory",
        "questions": questions,
        "steps": steps,
        "escalation": escalation,
        "reply": reply,
        "kpi_questions_count": len(questions),
        "kpi_steps_count": len(steps),
        "kpi_repeated_discovery": 1
    }

def agent_with_memory(new_ticket, df_notes, M, top_k=TOP_K):
    pb = PLAYBOOKS.get(new_ticket["issue_type"])

    query_text = (
        f"customer: {new_ticket['customer']}\n"
        f"issue_type: {new_ticket['issue_type']}\n"
        f"customer_report: {new_ticket['description']}\n"
    )

    retrieved = retrieve_top_k(df_notes, M, query_text, top_k=top_k)

    # Prefer same customer + issue_type
    same = retrieved[
        (retrieved["customer"] == new_ticket["customer"]) &
        (retrieved["issue_type"] == new_ticket["issue_type"])
    ]

    questions, steps = [], []
    escalation = "No escalation"
    repeated_discovery = 0

    if not same.empty:
        best = same.iloc[0]
        questions.append(
            "Last time we tried retry with backoff+jitter and increased client timeout. "
            "What changed since then (traffic pattern, deployments, upstream latency)?"
        )
        questions.append("Are timeouts happening with the same endpoints and during the same peak windows?")

        steps.append(f"Reuse prior context: {best['actions']} (outcome: {best['outcome']})")
        if pb:
            steps.extend(pb["steps"][1:3])

        if new_ticket.get("repeat_within_14d"):
            escalation = "Escalate to SRE (include prior ticket summary + evidence)"
            steps.append("Prepare escalation package: prior summary, traces/metrics, attempted mitigations, constraints.")
    else:
        repeated_discovery = 1
        questions = [
            "Can you share logs and timestamps of the timeouts?",
            "What is your current client timeout and retry strategy?"
        ]
        steps = pb["steps"][:3] if pb else ["Collect logs and investigate."]
        if new_ticket.get("repeat_within_14d"):
            escalation = "Escalate (recurrence detected; limited historical context available)"

    reply = None
    if USE_LLM_TO_WRITE_REPLY:
        reply = llm_compose_reply(
            customer=new_ticket["customer"],
            issue_type=new_ticket["issue_type"],
            retrieved_memories=retrieved,
            playbook=pb,
            questions=questions,
            steps=steps,
            escalation=escalation,
            style="with_memory"
        )

    # Safe export of retrieved memories (works even if retrieved is empty or missing columns)
    cols = ["ticket_id","customer","issue_type","created_at","outcome","constraints","similarity"]
    retrieved_memories = retrieved.reindex(columns=cols).to_dict(orient="records")

    return {
        "mode": "with_memory",
        "retrieved_memories": retrieved_memories,
        "questions": questions,
        "steps": steps,
        "escalation": escalation,
        "reply": reply,
        "kpi_questions_count": len(questions),
        "kpi_steps_count": len(steps),
        "kpi_repeated_discovery": repeated_discovery
    }

**LLM response composer (OpenAI Chat Completions API)**

This uses client.chat.completions.create(...), not the Responses API; the check in cell 14 confirms that no client.responses calls remain.

https://developers.openai.com/api/reference/python/?utm_source=chatgpt.com

In [12]:
# Debug patch: log which OpenAI methods are being called
orig_chat = client.chat.completions.create
orig_embed = client.embeddings.create

def chat_create_debug(*args, **kwargs):
    print("✅ OpenAI CALL: chat.completions.create")
    return orig_chat(*args, **kwargs)

def embed_create_debug(*args, **kwargs):
    print("✅ OpenAI CALL: embeddings.create")
    # show whether input is string or list, without printing contents
    inp = kwargs.get("input", None)
    if inp is not None:
        print("   embeddings input type:", type(inp), "len:", (len(inp) if hasattr(inp, "__len__") else "n/a"))
    return orig_embed(*args, **kwargs)

client.chat.completions.create = chat_create_debug
client.embeddings.create = embed_create_debug

print("✅ Debug patch installed")


✅ Debug patch installed
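
The patch above replaces the client methods for the rest of the session. If you would rather limit the logging to one block and restore the originals afterwards, a context manager is one way to do it. This is a sketch: `log_calls` and `DummyClient` are illustrative names, and the dummy stands in for the real client so the example runs offline.

```python
from contextlib import contextmanager

@contextmanager
def log_calls(obj, method_name, label, log):
    """Temporarily wrap obj.method_name so each call appends `label` to `log`."""
    original = getattr(obj, method_name)

    def wrapper(*args, **kwargs):
        log.append(label)
        return original(*args, **kwargs)

    setattr(obj, method_name, wrapper)
    try:
        yield
    finally:
        setattr(obj, method_name, original)  # always restore, even on error

class DummyClient:
    """Offline stand-in for an API client (illustrative only)."""
    def create(self, **kwargs):
        return {"echo": kwargs}

dummy = DummyClient()
calls = []
with log_calls(dummy, "create", "dummy.create", calls):
    result = dummy.create(model="demo")
after = dummy.create(model="demo")  # not logged: the wrapper was removed on exit
print(calls, result)
```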


In [13]:


def llm_compose_reply(customer, issue_type, retrieved_memories, playbook, questions, steps, escalation, style):
    memory_snippet = ""
    if retrieved_memories is not None and len(retrieved_memories) > 0:
        top = retrieved_memories.head(2)
        memory_lines = []
        for _, r in top.iterrows():
            memory_lines.append(
                f"- Prior ticket {r['ticket_id']} ({r['created_at']:%Y-%m-%d %H:%M}), outcome={r['outcome']}, constraints={r['constraints']}"
            )
        memory_snippet = "\n".join(memory_lines)
    else:
        memory_snippet = "- No prior memory available"

    playbook_id = playbook["id"] if playbook else "N/A"
    playbook_steps = "\n".join([f"- {s}" for s in (playbook["steps"] if playbook else [])])

    prompt = f"""
You are a senior enterprise support agent. Write a concise customer-facing message.

Context:
- Customer: {customer}
- Issue type: {issue_type}
- Mode: {style}
- Retrieved memory:
{memory_snippet}

- Approved playbook: {playbook_id}
- Playbook steps:
{playbook_steps}

What we will ask:
{chr(10).join([f"- {q}" for q in questions])}

What we will do:
{chr(10).join([f"- {s}" for s in steps])}

Escalation decision: {escalation}

Rules:
- Be professional, crisp, and action-oriented.
- Do NOT mention embeddings, vector search, or "memory systems".
- Keep under ~180 words.
""".strip()

    resp = client.chat.completions.create(
        model=GEN_MODEL,
        messages=[
            {"role": "system", "content": "You write enterprise support responses."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.2,
    )
    return resp.choices[0].message.content



In [14]:
import inspect, re

def find_responses_usage():
    suspects = []
    for name, obj in globals().items():
        if callable(obj):
            try:
                src = inspect.getsource(obj)
            except Exception:
                continue
            if "responses.create" in src or "client.responses" in src:
                suspects.append(name)
    return suspects

suspects = find_responses_usage()
suspects

['find_responses_usage']

**Run the demo with two TTL values (20 min vs 5 min)**

In [15]:
new_ticket = {
    "customer": "AcmeCo",
    "issue_type": "intermittent_api_timeouts",
    "description": "Timeouts are back during peak traffic. We already tried retries last time.",
    "repeat_within_14d": True
}

def run_demo(ttl_minutes):
    df_kept, df_dropped, cutoff = apply_retention(df, NOW, ttl_minutes)

    # Build embedding index on kept memories only
    df_notes, M = build_embedding_index(df_kept)

    out_no = agent_without_memory(new_ticket)
    out_yes = agent_with_memory(new_ticket, df_notes, M, top_k=TOP_K)

    return {
        "ttl_minutes": ttl_minutes,
        "cutoff": cutoff,
        "kept_count": len(df_kept),
        "dropped_count": len(df_dropped),
        "dropped_examples": df_dropped.sort_values("created_at").head(5)[["ticket_id","customer","issue_type","created_at"]],
        "without_memory": out_no,
        "with_memory": out_yes,
    }

demo_ttl_20 = run_demo(20)
demo_ttl_5  = run_demo(5)

demo_ttl_20["dropped_examples"], demo_ttl_5["dropped_examples"]


✅ OpenAI CALL: embeddings.create
   embeddings input type: <class 'list'> len: 2
✅ OpenAI CALL: chat.completions.create
✅ OpenAI CALL: embeddings.create
   embeddings input type: <class 'list'> len: 1
✅ OpenAI CALL: chat.completions.create
✅ OpenAI CALL: chat.completions.create
✅ OpenAI CALL: chat.completions.create


(   ticket_id   customer                 issue_type          created_at
 4     T-2001     AcmeCo  intermittent_api_timeouts 2026-02-13 10:05:00
 13    T-2010     AcmeCo  intermittent_api_timeouts 2026-02-13 10:14:00
 11    T-2008  GammaShop          auth_token_expiry 2026-02-13 10:23:00
 12    T-2009     AcmeCo  intermittent_api_timeouts 2026-02-13 10:24:00
 14    T-2011     AcmeCo  intermittent_api_timeouts 2026-02-13 10:32:00,
    ticket_id   customer                 issue_type          created_at
 4     T-2001     AcmeCo  intermittent_api_timeouts 2026-02-13 10:05:00
 13    T-2010     AcmeCo  intermittent_api_timeouts 2026-02-13 10:14:00
 11    T-2008  GammaShop          auth_token_expiry 2026-02-13 10:23:00
 12    T-2009     AcmeCo  intermittent_api_timeouts 2026-02-13 10:24:00
 14    T-2011     AcmeCo  intermittent_api_timeouts 2026-02-13 10:32:00)

**Pretty print both runs (what the CEO sees)**

In [16]:
def print_run(demo):
    print("="*90)
    print(f"TTL (retention expiry): {demo['ttl_minutes']} minutes")
    print(f"Memory cutoff time: {demo['cutoff']}")
    print(f"Memories kept: {demo['kept_count']} | dropped: {demo['dropped_count']}\n")

    w = demo["without_memory"]
    m = demo["with_memory"]

    print("=== WITHOUT MEMORY ===")
    print("Questions:")
    for q in w["questions"]:
        print("-", q)
    print("\nSteps:")
    for s in w["steps"]:
        print("-", s)
    print("\nEscalation:", w["escalation"])
    if w["reply"]:
        print("\nCustomer reply (LLM):\n", w["reply"])

    print("\n\n=== WITH MEMORY ===")
    print("Retrieved memories (top):")
    for r in m.get("retrieved_memories", []):
        print("-", r)
    print("\nQuestions:")
    for q in m["questions"]:
        print("-", q)
    print("\nSteps:")
    for s in m["steps"]:
        print("-", s)
    print("\nEscalation:", m["escalation"])
    if m["reply"]:
        print("\nCustomer reply (LLM):\n", m["reply"])

print_run(demo_ttl_20)
print_run(demo_ttl_5)


TTL (retention expiry): 20 minutes
Memory cutoff time: 2026-02-13 11:40:00
Memories kept: 2 | dropped: 13

=== WITHOUT MEMORY ===
Questions:
- Can you share logs and timestamps of the timeouts?
- Did you change anything recently in your client or network?
- What is your current client timeout setting and retry behavior?

Steps:
- Confirm timeframe + traffic pattern (peak vs off-peak).
- Check client-side timeouts and retry strategy (backoff + jitter).
- Inspect upstream latency and error rates; capture traces.

Escalation: Escalate

Customer reply (LLM):
 Subject: Assistance Required for Intermittent API Timeouts

Dear AcmeCo Team,

Thank you for reaching out regarding the intermittent API timeouts you are experiencing. To assist you effectively, we need to gather some additional information:

1. Could you please share the logs and timestamps of the timeouts?
2. Have there been any recent changes in your client configuration or network setup?
3. What are your current client timeout set

**Tiny KPI delta table (before/after, per TTL)**

In [17]:
def kpi_row(demo):
    w = demo["without_memory"]
    m = demo["with_memory"]
    return {
        "TTL_minutes": demo["ttl_minutes"],
        "Kept_memories": demo["kept_count"],
        "Questions_without": w["kpi_questions_count"],
        "Questions_with": m["kpi_questions_count"],
        "ΔQuestions": m["kpi_questions_count"] - w["kpi_questions_count"],
        "Steps_without": w["kpi_steps_count"],
        "Steps_with": m["kpi_steps_count"],
        "ΔSteps": m["kpi_steps_count"] - w["kpi_steps_count"],
        "Repeated_discovery_with": m["kpi_repeated_discovery"],
        "Escalation_without": w["escalation"],
        "Escalation_with": m["escalation"],
    }

kpi_df = pd.DataFrame([kpi_row(demo_ttl_20), kpi_row(demo_ttl_5)])

kpi_df["Interpretation"] = kpi_df["Repeated_discovery_with"].apply(
    lambda v: "Continuity preserved (less rediscovery)" if v == 0 else "Memory expired → rediscovery returns"
)

kpi_df


| index | TTL_minutes | Kept_memories | Questions_without | Questions_with | ΔQuestions | Steps_without | Steps_with | ΔSteps | Repeated_discovery_with | Escalation_without | Escalation_with | Interpretation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20 | 2 | 3 | 2 | -1 | 3 | 4 | 1 | 0 | Escalate | Escalate to SRE (include prior ticket summary ... | Continuity preserved (less rediscovery) |
| 1 | 5 | 0 | 3 | 2 | -1 | 3 | 3 | 0 | 1 | Escalate | Escalate (recurrence detected; limited histori... | Memory expired → rediscovery returns |


**Optional: quick “exec view” table**

In [18]:
kpi_df[[
    "TTL_minutes","Kept_memories",
    "Questions_without","Questions_with","ΔQuestions",
    "Steps_without","Steps_with","ΔSteps",
    "Interpretation"
]]


| index | TTL_minutes | Kept_memories | Questions_without | Questions_with | ΔQuestions | Steps_without | Steps_with | ΔSteps | Interpretation |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 20 | 2 | 3 | 2 | -1 | 3 | 4 | 1 | Continuity preserved (less rediscovery) |
| 1 | 5 | 0 | 3 | 2 | -1 | 3 | 3 | 0 | Memory expired → rediscovery returns |
