Python: Core: add experimental memory harness context provider #5613

Merged
eavanvalkenburg merged 3 commits into microsoft:main from eavanvalkenburg:harness/memory
May 4, 2026

@eavanvalkenburg
Member

@eavanvalkenburg eavanvalkenburg commented May 3, 2026

Motivation and Context

Part of the experimental Agent Harness feature; the .NET counterpart work shipped in PR #5310 (.NET: Harness Feature branch) and follow-ups #5404, #5365, #5540.

Unlike sibling PRs #5611 (mode) and #5612 (todo), the Python MemoryContextProvider does not mirror a single .NET class one-to-one. It is a distinct take on long-term memory designed for chat-driven, multi-session agents. The closest .NET cousins are:

  • dotnet/src/Microsoft.Agents.AI/Harness/FileMemory/FileMemoryProvider.cs — session-scoped file working memory; the agent uses SaveFile/ReadFile/etc. to manage its own file bag.
  • dotnet/src/Microsoft.Agents.AI/Memory/ChatHistoryMemoryProvider.cs — derives memory from chat history.

Description

Adds MemoryContextProvider to the experimental _harness namespace: an LLM-managed long-term memory with topic indexing and chat-driven extraction / consolidation. Memory is materialized on disk as a top-level MEMORY.md index plus per-topic markdown files, plus a state file for bookkeeping.
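
As a rough illustration of the layout described above, here is a minimal sketch of an index file that points at per-topic markdown files. Everything in it is an assumption made for illustration: `IndexEntry`, `render_index`, `write_index_if_changed`, the `topics/` subdirectory, and the bullet format are hypothetical stand-ins, not the schema or file format this PR ships.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical stand-in for an index record (not the real MemoryIndexEntry)."""

    topic: str
    summary: str


def render_index(entries: list[IndexEntry]) -> str:
    """Render a MEMORY.md-style index: one bullet per topic, sorted by topic name."""
    lines = ["# Memory index", ""]
    for entry in sorted(entries, key=lambda e: e.topic):
        # Assumed layout: each topic lives in topics/<topic>.md next to the index.
        lines.append(f"- [{entry.topic}](topics/{entry.topic}.md): {entry.summary}")
    return "\n".join(lines) + "\n"


def write_index_if_changed(root: Path, entries: list[IndexEntry]) -> bool:
    """Write MEMORY.md only when the rendered text differs; return True if written."""
    index_path = root / "MEMORY.md"
    rendered = render_index(entries)
    if index_path.exists() and index_path.read_text(encoding="utf-8") == rendered:
        return False  # nothing changed, so skip the write
    root.mkdir(parents=True, exist_ok=True)
    index_path.write_text(rendered, encoding="utf-8")
    return True
```

Sorting the entries keeps the rendered index deterministic, which is what makes the "skip the write when nothing changed" comparison meaningful.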

Public types:

  • MemoryContextProvider — the context provider
  • MemoryStore — abstract backend
  • MemoryFileStore — JSONL/markdown-on-disk backend
  • MemoryIndexEntry, MemoryTopicRecord — record schemas
  • DEFAULT_MEMORY_SOURCE_ID

All new public symbols are decorated with @experimental(ExperimentalFeature.HARNESS). If a sibling harness PR has not yet landed, this PR also adds the HARNESS value to the ExperimentalFeature enum and creates the (empty) _harness/__init__.py.

Relationship to .NET

| .NET | Python | Same idea? |
| --- | --- | --- |
| FileMemoryProvider (session-scoped file bag, agent-driven SaveFile/ReadFile) | n/a in this PR | No; different semantics. A Python equivalent could be added later. |
| ChatHistoryMemoryProvider (history-derived) | MemoryContextProvider (history-derived, topic-indexed, on-disk index + topic files, LLM extraction & consolidation) | Spiritually similar; mechanism diverges. The Python version adds explicit topic indexing, periodic consolidation, and a markdown-readable on-disk layout. |
| AgentFileStore (pluggable file backend) | MemoryStore / MemoryFileStore (pluggable memory backend) | Structurally similar pattern (abstract store + file-backed default). |

The Python design choices (topic index, MEMORY.md, per-topic files, configurable extraction / consolidation prompts) are intentional and tuned for chat-first, multi-session agents. They are not meant to subsume the .NET FileMemory file-bag pattern; the two can coexist as siblings.

Size note: this is the largest of the three split PRs (~1458 LOC of source). Happy to split the file-store backend or the consolidation logic into a follow-up if reviewers prefer, but the public surface is cohesive enough to ship as one unit.

This PR is one of three splitting the _harness package work apart for review:

  1. Python: Core: add experimental session-mode harness context provider #5611 — session-mode context provider — mirrors .NET AgentModeProvider
  2. Python: Core: add experimental todo-list harness context provider #5612 — todo-list context provider — mirrors .NET TodoProvider
  3. (this PR) memory context provider — semantically distinct from .NET FileMemoryProvider; closer in spirit to .NET ChatHistoryMemoryProvider but with a richer on-disk topic index

The three modules are independent. They share only the HARNESS enum entry and the (empty) _harness/__init__.py. Mechanical merge conflicts in __init__.py __all__ and _feature_stage.py are expected if a sibling lands first.

Note: a couple of pyright errors in _telemetry.py and a flaky test_detect_hosted_fallback_import_error reproduce on a clean checkout of main and on the sibling harness/mode PR — they are unrelated to this change.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? No — gated behind @experimental(ExperimentalFeature.HARNESS).

Copilot AI review requested due to automatic review settings May 3, 2026 13:31
@moonbox3 moonbox3 added the python label May 3, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR adds an experimental long-term memory context provider to the Python core package’s harness surface. It introduces a file-backed memory store plus topic/index record types so agents can persist durable memories, search transcript history, and run extraction/consolidation flows as part of the context provider pipeline.

Changes:

  • Add MemoryContextProvider, MemoryStore, MemoryFileStore, and related record/constant types under the experimental harness surface.
  • Implement filesystem-backed topic/index/state/transcript storage plus tool hooks for listing, reading, writing, deleting, searching, and consolidating memory.
  • Export the new symbols publicly, add the HARNESS experimental feature enum value, and add tests covering the new provider/store behavior.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| python/packages/core/agent_framework/_harness/_memory.py | Implements the new memory provider, record types, abstract store, and file-backed store/tooling. |
| python/packages/core/tests/core/test_harness_memory.py | Adds coverage for record serialization, file-store behavior, provider context injection, tools, consolidation, and experimental metadata. |
| python/packages/core/agent_framework/_feature_stage.py | Adds the new HARNESS experimental feature flag. |
| python/packages/core/agent_framework/__init__.py | Re-exports the new memory harness public APIs from the package root. |
| python/packages/core/agent_framework/_harness/__init__.py | Adds the harness package marker for the new namespace. |

@github-actions github-actions Bot left a comment

Automated Code Review

Reviewers: 2 | Confidence: 92%

✓ Correctness

This PR adds a memory harness system (MemoryIndexEntry, MemoryTopicRecord, MemoryStore, MemoryFileStore, MemoryContextProvider) with transcript-backed extraction, consolidation, and topic management. The code is well-structured and correctly uses the existing framework interfaces (HistoryProvider, SessionContext, FileHistoryProvider, SupportsChatGetResponse). I verified the signatures and behaviors of all referenced APIs against the source. The async tool handling, message grouping, index rebuilding, extraction/consolidation pipelines, and test assertions are all correct. No correctness bugs were found.

✗ Design Approach

The overall direction is promising, but the current design couples durable-memory context injection to the HistoryProvider hook in a way that does not compose with the framework’s existing per-service-call history mode. There is also a narrower namespace assumption in transcript search that makes the new source_id configurability only partially real. I would request changes before merging because the main provider abstraction is wired into the wrong lifecycle.

Flagged Issues

  • MemoryContextProvider inherits from HistoryProvider and does much more than history loading: its before_run adds tools, instructions, selected topic files, and recent transcript context. But the agent explicitly skips HistoryProvider.before_run during require_per_service_call_history_persistence (python/packages/core/agent_framework/_agents.py:1421-1425), and the per-service-call middleware only copies service_call_context.get_messages(include_input=True) back into the chat call (python/packages/core/agent_framework/_sessions.py:611-617, :687). That means this provider silently loses its memory tools/instructions in a supported agent mode. The better approach is to make the memory harness a plain ContextProvider that composes with an internal history/archive helper, instead of inheriting the history-provider lifecycle.
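
The lifecycle problem flagged above can be reproduced with a toy model. This is only a sketch under stated assumptions: `ContextProvider`, `HistoryProvider`, and `run_hooks` below are simplified stand-ins written for this illustration, not the framework's real classes or agent loop, and the tool name `search_memory` is hypothetical.

```python
class ContextProvider:
    """Hypothetical stand-in for a plain context provider base class."""

    def before_run(self, context: dict) -> None:
        pass


class HistoryProvider(ContextProvider):
    """Hypothetical stand-in for a provider whose only job is loading history."""


def run_hooks(providers: list, per_service_call_history: bool) -> dict:
    """Toy agent loop: history providers are skipped in per-service-call mode."""
    context = {"tools": [], "instructions": []}
    for provider in providers:
        if per_service_call_history and isinstance(provider, HistoryProvider):
            continue  # models the skip described in the review comment
        provider.before_run(context)
    return context


class InheritingMemory(HistoryProvider):
    """Memory provider that inherits the history lifecycle (the flagged design)."""

    def before_run(self, context: dict) -> None:
        context["tools"].append("search_memory")


class ComposedMemory(ContextProvider):
    """Memory provider that holds a history helper instead of inheriting from it."""

    def __init__(self, archive: HistoryProvider) -> None:
        self._archive = archive  # used internally for transcripts, not for lifecycle

    def before_run(self, context: dict) -> None:
        context["tools"].append("search_memory")
```

In this model, `run_hooks([InheritingMemory()], True)` yields no tools, while `run_hooks([ComposedMemory(HistoryProvider())], True)` still registers `search_memory`, which is the behavioral difference the review is asking for.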

Automated review by eavanvalkenburg's agents

Adds MemoryContextProvider with topic-indexed long-term memory and
chat-driven compaction. Pluggable MemoryStore backends include
MemoryFileStore. Public types: MemoryIndexEntry, MemoryTopicRecord.
Behind @experimental(ExperimentalFeature.HARNESS).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@moonbox3
Contributor

moonbox3 commented May 3, 2026

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| packages/core/agent_framework/_feature_stage.py | 117 | 6 | 94% | 99, 156, 167, 174, 207, 235 |
| packages/core/agent_framework/_harness/_memory.py | 772 | 117 | 84% | 75–76, 92, 99, 151, 164, 171–173, 196, 216, 225–229, 292, 298, 333, 414, 422, 424, 428, 431, 435, 530, 542, 560–561, 572–573, 584, 699, 716, 725, 740, 768, 771–773, 783, 805–808, 857, 867, 870, 872, 892, 902, 907, 910, 922, 990, 992, 994, 996, 998, 1000, 1070, 1087, 1115, 1144, 1152–1154, 1156, 1210, 1236–1239, 1245, 1386, 1389, 1400–1401, 1405, 1408–1410, 1416, 1420, 1425, 1429, 1434, 1438, 1446–1447, 1498, 1501–1504, 1511–1512, 1540, 1544–1552, 1569, 1595–1596, 1599–1600, 1606, 1608, 1613, 1618, 1624 |
| TOTAL | 31823 | 3693 | 88% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 6252 | 30 | 0 | 0 | 1m 43s |

- mark MemoryStore as @experimental(HARNESS) for surface consistency
- safely encode owner id and verify path containment (matches FileHistoryProvider pattern)
- namespace MemoryFileStore on-disk layout by source_id to avoid cross-provider collisions
- before_run computes index_entries once and only rewrites MEMORY.md when content changes
- asyncio locks around topic/state read-modify-write to avoid concurrent-write races
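
For readers unfamiliar with the owner-id hardening mentioned in the list above, a minimal sketch of the general technique follows. It assumes percent-encoding for path segments and a resolve-then-compare containment check; `owner_dir` is a hypothetical helper, and the actual encoding used by this PR (which follows the FileHistoryProvider pattern) may differ.

```python
from pathlib import Path
from urllib.parse import quote


def owner_dir(root: Path, source_id: str, owner_id: str) -> Path:
    """Return root/<source_id>/<owner_id>, refusing anything that escapes root."""
    # Percent-encode every reserved character, including "/" (safe="" disables
    # the default exemption for slashes), so each id maps to a single segment.
    segments = [quote(source_id, safe=""), quote(owner_id, safe="")]
    # quote() never encodes dots, so "." and ".." must be rejected explicitly.
    if any(segment in ("", ".", "..") for segment in segments):
        raise ValueError("empty or dot-only path segment")
    base = root.resolve()
    candidate = base.joinpath(*segments).resolve()
    # Defense in depth: verify containment after resolving symlinks and dots.
    if not candidate.is_relative_to(base):
        raise ValueError("resolved path escapes the storage root")
    return candidate
```

Namespacing by `source_id` first means two providers pointed at the same root directory write into separate subtrees, which is the collision the third bullet is about.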

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@eavanvalkenburg eavanvalkenburg enabled auto-merge May 3, 2026 20:17
- Atomic writes via os.replace + temp sibling for topic, state, and index files so
  crashes/disk-full failures cannot leave a truncated half-written file.
- Stop creating directories on read paths: list_topics/read_state/search_transcripts
  and get_messages return empty when nothing has been written. mkdir is deferred to
  the actual save path (write_topic/write_state/save_messages).
- Escape lines that look like markdown headings on render and unescape them on parse,
  so a memory or summary containing '## Summary'/'## Memories' cannot tamper with the
  topic file structure.
- Narrow extraction/consolidation chat-client failure handling to ChatClientException,
  asyncio.TimeoutError, and OSError. Programmer errors (AttributeError, TypeError, ...)
  now propagate so misconfigured clients fail loudly.
- Log a payload-prefix preview for every silent shape branch in _extract_memories and
  _consolidate_topic so unparsable extractor output is debuggable instead of invisible.
- Restructure _run_consolidation: read maintenance state and topic snapshot under the
  state lock, run the LLM consolidation loop without holding the state lock, and only
  advance last_consolidated_at/sessions_since_consolidation if at least one topic
  succeeded. Transient consolidation failures now leave the maintenance window in
  place so the next after_run retries instead of silently sliding forward.
- Add regression tests for: markdown-marker round-trip, atomic-write recovery on
  os.replace failure, no-mkdir on pure read paths, transient consolidation failure
  preserves state, and propagation of programmer errors.
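
The atomic-write and heading-escape items in the list above combine naturally. The sketch below shows one way to do both using only the standard library; `escape_headings`, `unescape_headings`, and `atomic_write_text` are hypothetical names, and the backslash-prefix escaping scheme is an assumption made for illustration rather than the scheme the PR uses.

```python
import os
import tempfile
from pathlib import Path


def escape_headings(text: str) -> str:
    """Prefix lines starting with '#' or a backslash so they cannot read as headings."""
    return "\n".join(
        "\\" + line if line.startswith(("#", "\\")) else line
        for line in text.split("\n")
    )


def unescape_headings(text: str) -> str:
    """Inverse of escape_headings: drop one leading backslash per escaped line."""
    return "\n".join(
        line[1:] if line.startswith("\\") else line for line in text.split("\n")
    )


def atomic_write_text(path: Path, text: str) -> None:
    """Write to a temp sibling, then os.replace it over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)  # mkdir only on the write path
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # readers see the old or the new file, never half
    except BaseException:
        # Clean up the temp file so a failed write leaves no debris behind.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

Escaping lines that already start with a backslash is what makes the round trip lossless: every escaped line gains exactly one backslash, and unescaping removes exactly one.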

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
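
The consolidation restructuring described in the commit message above (snapshot under the state lock, call the model without the lock, advance bookkeeping only on success, and let programmer errors propagate) can be sketched roughly as follows. All names here are hypothetical: `ChatClientError` stands in for the framework's chat-client exception, and `MaintenanceState` and `run_consolidation` are simplified illustrations, not the PR's `_run_consolidation`.

```python
import asyncio
from dataclasses import dataclass, field


class ChatClientError(Exception):
    """Hypothetical stand-in for the framework's chat-client exception."""


@dataclass
class MaintenanceState:
    """Simplified bookkeeping: a session counter plus memories per topic."""

    sessions_since_consolidation: int = 0
    topics: dict[str, list[str]] = field(default_factory=dict)


async def run_consolidation(state: MaintenanceState, lock: asyncio.Lock, consolidate) -> int:
    """Consolidate every topic; return how many topics succeeded."""
    # 1. Take a snapshot while holding the state lock.
    async with lock:
        snapshot = {name: list(items) for name, items in state.topics.items()}

    # 2. Run the slow model calls without holding the lock.
    results: dict[str, list[str]] = {}
    for name, items in snapshot.items():
        try:
            results[name] = await consolidate(name, items)
        except (ChatClientError, asyncio.TimeoutError, OSError):
            continue  # transient failure: leave this topic for the next run
        # Anything else (TypeError, AttributeError, ...) propagates on purpose.

    # 3. Advance the bookkeeping only if at least one topic succeeded.
    if results:
        async with lock:
            # Simplification: a real store would reconcile writes made since step 1.
            state.topics.update(results)
            state.sessions_since_consolidation = 0
    return len(results)
```

Because the counter is reset only in step 3, a run in which every topic fails leaves the maintenance window untouched, so the next run retries instead of silently moving on.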
@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue May 4, 2026
Merged via the queue into microsoft:main with commit 4a2da95 May 4, 2026
33 checks passed