Skip to content

feat: add history threading and memory-retriever registry spec#77

Merged
aswhitehouse merged 6 commits into
mainfrom
feat/history-and-memory
Apr 13, 2026
Merged

feat: add history threading and memory-retriever registry spec#77
aswhitehouse merged 6 commits into
mainfrom
feat/history-and-memory

Conversation

@aswhitehouse
Copy link
Copy Markdown
Collaborator

OAS is stateless — it never stores conversation history. This PR implements two clean memory patterns that respect that boundary:

History threading (short-term memory)

  • history is now a reserved input convention: an array of prior turns in OpenAI wire format [{"role": "user"|"assistant", "content": "…"}]
  • Python runner extracts history from input_data and passes it through the provider stack; injected between system prompt and current user message at the HTTP payload level
  • npm runner (OpenAI + Anthropic providers) receives the same treatment
  • All provider signatures updated: base.IntelligenceProvider.invoke, OpenAIProvider, AnthropicProvider, CodexProvider, CustomProvider, invoke_intelligence registry function
  • 11 new tests in tests/test_history_threading.py; all existing test fake_invoke stubs updated to accept the optional history parameter

prime-vector/memory-retriever registry spec (long-term memory)

  • New registry spec at Website/public/registry/prime-vector/memory-retriever/
  • Accepts query, memory_store_url, top_k; calls external store; returns {history: [...turns], memory_count} ready for depends_on chaining
  • Added to Website/public/registry/index.json manifest

New examples

  • examples/chat-agent/: spec.yaml + chat.py REPL for session history
  • examples/memory-chat/: pipeline.yaml chaining memory-retriever → chat via depends_on, plus README with mock memory-store script

docs/REFERENCE.md: new Memory management section covering both patterns, boundary table (what OAS does NOT do), and links to examples

What changed

Why

How tested

  • All existing tests pass (pytest tests/)
  • New tests added (if applicable)

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Refactoring / CI / tooling

Breaking changes

Checklist

  • Code follows project style (ruff check . && ruff format --check .)
  • Self-review completed
  • Version bumped if needed (pyproject.toml, templates, docs)

OAS is stateless — it never stores conversation history. This PR
implements two clean memory patterns that respect that boundary:

History threading (short-term memory)
- `history` is now a reserved input convention: an array of prior turns
  in OpenAI wire format [{"role": "user"|"assistant", "content": "…"}]
- Python runner extracts `history` from input_data and passes it through
  the provider stack; injected between system prompt and current user
  message at the HTTP payload level
- npm runner (OpenAI + Anthropic providers) receives the same treatment
- All provider signatures updated: base.IntelligenceProvider.invoke,
  OpenAIProvider, AnthropicProvider, CodexProvider, CustomProvider,
  invoke_intelligence registry function
- 11 new tests in tests/test_history_threading.py; all existing test
  fake_invoke stubs updated to accept the optional history parameter

prime-vector/memory-retriever registry spec (long-term memory)
- New registry spec at Website/public/registry/prime-vector/memory-retriever/
- Accepts query, memory_store_url, top_k; calls external store; returns
  {history: [...turns], memory_count} ready for depends_on chaining
- Added to Website/public/registry/index.json manifest

New examples
- examples/chat-agent/: spec.yaml + chat.py REPL for session history
- examples/memory-chat/: pipeline.yaml chaining memory-retriever → chat
  via depends_on, plus README with mock memory-store script

docs/REFERENCE.md: new Memory management section covering both patterns,
boundary table (what OAS does NOT do), and links to examples

Made-with: Cursor
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
open-agent-spec Ready Ready Preview, Comment Apr 13, 2026 6:09am

Request Review

@sgriffiths
Copy link
Copy Markdown
Contributor

Code review

Found 2 issues:

  1. History silently dropped when tools are activehistory is extracted from input_data on line 673 but only forwarded in the no-tools branch (invoke_intelligence on line 682). The tools branch calls _invoke_with_tools on line 678 without passing history, so any task that declares tools will lose all conversation context. The _invoke_with_tools function signature has no history parameter and its internal message list starts without prior turns.

# history is a reserved input convention — never stored by OAS, just forwarded.
history: list[dict] | None = input_data.get("history") or None
try:
tools = resolve_task_tools(spec_data, task_name)
if tools:
raw_output = _invoke_with_tools(
system, user, tools, intelligence_config, task_name
)
else:
raw_output = invoke_intelligence(system, user, intelligence_config, history)

  1. memory-retriever spec references template variables that can never be populated — The user prompt in the registry spec uses {memory_store_results} and {top_k_effective}, but neither appears in the task's input.properties (which only defines query, memory_store_url, top_k). OAS template substitution only replaces variables present in input_data — there is no mechanism to make an HTTP call to the memory store or compute intermediate values during prompt rendering. The prompt will be sent to the model with literal unfilled placeholders, making the spec non-functional as published.

https://github.com/prime-vector/open-agent-spec/blob/06c7a4e041756407e08fd94f7d23c65fc9b752ad/Website/public/registry/prime-vector/memory-retriever/1.0.0/spec.yaml#L93-L116

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

Fixes two issues from code review:

1. History silently dropped when tools active (runner.py)
   _invoke_with_tools had no history parameter so conversation context
   was always lost for tool-using tasks.
   - Added history: list[dict] | None = None to _invoke_with_tools
   - Prior turns injected before the current user turn in the initial
     message list (mirroring the no-tools path)
   - Falls back path (text-only providers) also forwards history to
     invoke_intelligence
   - Call site in _run_single_task now passes history to both branches
   - 2 new regression tests in test_history_threading.py

2. memory-retriever spec referenced unfillable template variables
   {memory_store_results} and {top_k_effective} appeared in the user
   prompt but were not in input.properties; OAS has no mechanism to
   make HTTP calls during prompt rendering so the spec was non-functional.
   Redesigned as an LLM re-ranker:
   - Input now requires `candidates` (pre-fetched turns from caller's
     store) instead of performing retrieval itself
   - {candidates} and {top_k} are always in scope from input_data
   - Spec description updated to clarify the two-layer architecture:
     store fetch (application code) + re-rank (this spec)
   - examples/memory-chat/input.json updated with sample candidates
   - examples/memory-chat/README.md rewritten with correct architecture
     diagram and Python integration snippet
   - docs/REFERENCE.md Pattern 2 updated to match

Made-with: Cursor
from .openai_http import OpenAIProvider

return OpenAIProvider().invoke(system=system, user=user, config=config)
return OpenAIProvider().invoke(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor one: CustomProvider._invoke_class has the same history gap you just fixed in _invoke_with_tools. invoke() accepts history and forwards it on the OpenAI fallback path (line 40-42), but the class-based path calls _invoke_class() on line 44 without it _invoke_class has no history parameter.
Lower impact since custom module providers are less common, but same pattern.

The OpenAI-fallback path in CustomProvider.invoke() already forwarded
history correctly, but the class-based path called _invoke_class()
without it — same pattern as the _invoke_with_tools gap fixed earlier.

Custom router classes use a flat single-string prompt interface, so
history turns are serialised and prepended between the system prompt
and the current user turn:

  <system>

  User: <prior turn>
  Assistant: <prior turn>
  …

  <current user message>

This preserves conversation context for custom providers without
requiring any changes to the router class interface (run() signature
is unchanged).

Added two tests in TestCustomProviderHistory:
- history_prepended_to_merged_prompt: verifies turns appear in the
  correct position within the merged prompt string
- no_history_omits_history_section: verifies the prompt is unchanged
  when history is None

Made-with: Cursor
Copy link
Copy Markdown
Contributor

@sgriffiths sgriffiths left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — all four review items addressed. History threading is now consistent across tools path, custom providers, and the memory-retriever spec.

@aswhitehouse aswhitehouse merged commit f10950e into main Apr 13, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants