medright/context-agent
context-agent

A new agent type for pydantic-ai: agents that spin up mini-programs running in context.

AI That Iterates Toward A Better Answer

Most AI agents stop at the first plausible response.

ContextAgent is built to keep going.

It turns quality into explicit criteria, generates candidate outputs, evaluates them, mutates its strategy, and repeats until it converges or reaches a defined limit.

That makes it a better fit for work where omission and vagueness have real consequences.

Why It Exists

One-shot chat is often enough for brainstorming.

It is not enough for work that must be complete, auditable, and aligned to visible criteria before a human should trust it.

ContextAgent is for that second category.

Best Early Uses

  • Healthcare: discharge instructions, care-gap summaries, prior authorization drafts, and structured health-data repair
  • Public service: permit packages, code-compliance reviews, and eligibility guidance
  • Interoperability and validation: structured artifacts that must satisfy validators, profiles, or completeness checks
  • Scientific and regulatory work: protocols, consent materials, and submission sections where traceability and rigor matter more than speed

The Problem

The Karpathy autoresearch pattern (generate → evaluate → mutate → repeat) demonstrated the value of structured iterative refinement, but the original loop was still orchestrated in Python.

This project lifts that pattern into a reusable agent runtime abstraction, so the loop is a first-class typed program instead of one-off notebook or script glue.

But pydantic-ai's current Agent is a single-turn request/response abstraction. To build an autoresearch loop today, you'd wire together multiple agents with manual state threading, hand-rolled loops, and ad-hoc scoring logic. The graph primitives exist in pydantic_graph, but there's no high-level API that says: "here's a program — run it iteratively until it converges."

The Solution: ContextAgent

A ContextAgent is a pydantic-ai agent type that runs context programs — structured, stateful mini-programs with explicit loop state, typed evaluation criteria, and convergence logic. The current implementation uses pydantic_graph to orchestrate the loop while feeding the accumulated program state back into each model call.

+-------------------------------------------------------+
|                     ContextAgent                      |
|                pydantic_graph runtime                 |
|                                                       |
|  +-------------+  +-------------+  +-------------+   |
|  | Generate    |->| Evaluate    |->| Decide      |   |
|  | node        |  | node        |  | node        |   |
|  +------+------+  +------+------+  +------+------+   |
|         |                |                |          |
|         v                v                v          |
|  +-------------+  +-------------+  +-------------+   |
|  | Generator   |  | Evaluator   |  | Mutate      |   |
|  | agent       |  | agent       |<-| node        |   |
|  +------+------+  +-------------+  +------+------+   |
|         ^                                 |          |
|         +---------- Mutator agent <-------+          |
|                                                       |
|  LoopState: cycle, best_score, best_output,           |
|             current_instructions, stalled_cycles,     |
|             history                                   |
+-------------------------------------------------------+

Key Concepts

Concept         Description
--------------  -----------------------------------------------------------
ContextProgram  A typed definition of a mini-program: what to generate, how to evaluate, when to stop
Criterion       A single PASS/FAIL evaluation check with typed conditions
CycleResult     Immutable snapshot of one iteration: candidates, scores, mutations, and whether the run improved
LoopState       Mutable runtime state carried across iterations: score, instructions, stop conditions, and history
ContextAgent    Orchestrator that runs the program loop using pydantic_graph underneath
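As a rough illustration of how these concepts relate, here is a simplified sketch using plain dataclasses. The real types live in src/context_agent/program.py and are pydantic models; the field names below follow the LoopState fields shown in the diagram, but the record helper is hypothetical.

```python
from dataclasses import dataclass, field

# Simplified stand-ins for the real pydantic models (illustrative only).
@dataclass(frozen=True)
class Criterion:
    name: str
    pass_when: str
    fail_when: str

@dataclass(frozen=True)
class CycleResult:
    cycle: int
    score: int      # number of criteria passed
    output: str
    improved: bool  # did this cycle beat the previous best?

@dataclass
class LoopState:
    cycle: int = 0
    best_score: int = -1
    best_output: str = ""
    current_instructions: str = ""
    stalled_cycles: int = 0
    history: list = field(default_factory=list)

    def record(self, result: CycleResult) -> None:
        """Fold one cycle's immutable snapshot into the mutable run state."""
        self.history.append(result)
        self.cycle += 1
        if result.improved:
            self.best_score = result.score
            self.best_output = result.output
            self.stalled_cycles = 0
        else:
            self.stalled_cycles += 1
```

The split mirrors the table: CycleResult is frozen (a snapshot), while LoopState is the mutable accumulator the graph carries between nodes.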

Implications for Agent Frameworks

1. Convergence as a First-Class Runtime

Today's agent patterns treat LLMs as black-box call-response units wired together by Python glue code. ContextAgent pulls the iterative refinement loop into a reusable runtime shape: generate, evaluate, mutate, and stop on convergence. That makes search and refinement a first-class agent capability rather than bespoke application logic.

2. Rich State vs. Stateless Step Calls

The important distinction here is not that Python disappears. It does not. The distinction is that the runtime preserves and reuses the full loop state across iterations instead of treating each model call as an isolated step.

  • Ad hoc orchestration: manual loops, manual state threading, ad hoc scoring logic
  • ContextAgent: explicit program state, typed criteria, graph-driven transitions, and model calls that see the accumulated context needed for the current phase

This matters because each phase can reason over the accumulated working state: prior candidates, failure patterns, current instructions, and cycle history. The loop is still externally orchestrated, but it is no longer just one-off glue code.
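One way to picture "model calls that see the accumulated context": each phase prompt is rendered from the shared loop state rather than built from scratch. A sketch under that assumption (the function name, history shape, and prompt layout are illustrative, not the repository's actual code):

```python
def render_evaluator_context(instructions, history, max_failures=5):
    """Assemble the context an evaluator-phase call might see:
    the current instructions plus a digest of recent failure patterns.

    history: list of (cycle_number, list_of_failed_criterion_names).
    """
    failures = [
        f"cycle {cycle}: failed {', '.join(failed)}"
        for cycle, failed in history
        if failed  # cycles with no failures add nothing to the digest
    ][-max_failures:]
    lines = [f"Instructions: {instructions}"]
    if failures:
        lines.append("Recent failure patterns:")
        lines.extend(f"- {item}" for item in failures)
    return "\n".join(lines)
```

The point is that the digest grows with the run: a cycle-3 evaluation call can condition on cycle-1 failures, which a stateless step call never could.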

3. Context Programs vs. One-Off Scripts

The auto-improve skill is the same generate -> evaluate -> mutate pattern written as an in-agent procedure. ContextAgent promotes that pattern into a reusable runtime abstraction: ContextProgram, Criterion, CycleResult, graph nodes, and shared loop state.

4. Self-Improving Agent Infrastructure

When context programs can target other agent artifacts (prompts, tools, system instructions), agents can improve their own infrastructure. The auto-improve skill is a manual version of this — ContextAgent makes it a reusable pattern.

The broader framing is that ContextAgent is a reusable runtime pattern first and a pydantic-ai implementation second.

Architecture

context-agent/
├── README.md
├── .env.example             # Sample runtime configuration
├── pyproject.toml           # Package metadata and CLI entry points
├── uv.lock                  # Locked dependency set for uv
├── src/context_agent/
│   ├── __init__.py          # Public API exports
│   ├── agent.py             # ContextAgent orchestration entry point
│   ├── bootstrap.py         # Prompt -> ContextProgram decomposition
│   ├── program.py           # ContextProgram, Criterion, CycleResult, ProgramResult
│   ├── nodes.py             # pydantic_graph loop nodes
│   ├── cli.py               # context-agent CLI
│   ├── ui.py                # Gradio UI entry point
│   ├── data_sources.py      # Pull-based context sources (FHIR, etc.)
│   ├── runtime_connectors.py # Runtime toolsets, MCP, and plugin loading
│   ├── plugin_examples.py   # Example runtime plugin factory
│   ├── eval_types.py        # Evaluation result models
│   ├── defaults.py          # Shared model defaults
│   └── model_retry.py       # Model retry and transient failure handling
├── examples/
│   ├── prompt_runner.py     # Shared runner for prompt-driven examples
│   ├── autoresearch_skill_improver.py # Bespoke auto-improve loop example
│   ├── fhir_a1c_optimizer.py
│   ├── fhir_care_gaps.py
│   ├── fhir_wearable_monitor.py
│   ├── healthcare_cds_optimizer.py
│   ├── k8s_hardener.py
│   ├── permit_compliance_reviewer.py
│   ├── crop_rotation_optimizer.py
│   ├── spc_rule_optimizer.py
│   ├── trial_protocol_optimizer.py
│   └── sample_inputs/
├── tests/
│   ├── test_context_agent.py
│   ├── test_cli.py
│   ├── test_runtime_connectors.py
│   └── test_ui.py
├── pydantic-ai-context-agent.patch # Upstream patch/proposal artifact
└── vendor/                  # Vendored upstream reference code

Implementation Note

ContextAgent is implemented in this repository as its own runtime layer on top of pydantic-ai and pydantic_graph.

Quick Start

from context_agent import ContextAgent, ContextProgram, Criterion

# Define evaluation criteria
criteria = [
    Criterion(name="clear_trigger", pass_when="specific testable scenarios", fail_when="vague guidance"),
    Criterion(name="actionable_steps", pass_when="concrete named actions", fail_when="'consider' or 'ensure'"),
]

# Define the program
program = ContextProgram(
    name="improve-skill",
    generator_instructions="Rewrite this SKILL.md to fix all failing criteria...",
    evaluator_instructions="Score this SKILL.md against the criteria...",
    mutator_instructions="Update the rewrite strategy based on failures...",
    criteria=criteria,
    max_cycles=3,
)

# Run it
agent = ContextAgent('openrouter:nvidia/nemotron-3-super-120b-a12b:free')
result = await agent.run_program(program, input_text=original_skill_md)
print(f"Final score: {result.best_score}/{len(criteria)}")
print(result.best_output)

CLI — Just Pass a Prompt

The fastest way to use context-agent: describe what you want. The agent bootstraps its own evaluation criteria, generator, and mutation strategy from your prompt.

# Install
cd context-agent && uv venv && uv sync --extra dev

# Install UI and OAuth extras when using context-agent-ui
uv sync --extra dev --extra ui

# If you already had the project installed before dependency changes, resync it
uv sync --extra dev

# Optional: start from the sample environment file
cp .env.example .env

# Open-ended: agent self-assembles a program from your prompt
uv run context-agent "Write a production-ready Dockerfile for a Python FastAPI app"

# Improve an existing file
uv run context-agent "Improve this system prompt for clarity and specificity" --input prompt.txt

# Evaluate without rewriting
uv run context-agent "Evaluate this API schema" --input openapi.yaml --eval-only

# More cycles + verbose output
uv run context-agent "Optimize this SQL query for performance" --input query.sql --cycles 5 --verbose

# Explicit model override
uv run context-agent "Harden this Terraform module" --input main.tf --model openrouter:nvidia/nemotron-3-super-120b-a12b:free

# Backward-compat: auto-improve a SKILL.md
uv run context-agent improve-skill --target path/to/SKILL.md

# Optional: expose the in-repo example runtime plugin factory
export CONTEXT_AGENT_TOOL_FACTORIES=context_agent.plugin_examples:build_example_runtime_plugins

# Inspect which runtime connection keys are available right now
uv run context-agent --list-connections

# Let the bootstrapper decide that CLI inspection is needed
CONTEXT_AGENT_ENABLE_CLI=1 uv run context-agent \
  "Determine what operating system this machine is running on. Use the appropriate tool if needed and answer briefly."

# Or force the CLI connector explicitly for deterministic OS checks
CONTEXT_AGENT_ENABLE_CLI=1 uv run context-agent \
  "Determine what operating system this machine is running on. Use the available command tool, prefer uname and sw_vers when available, and answer briefly." \
  --connection cli

When CONTEXT_AGENT_TOOL_FACTORIES or CONTEXT_AGENT_MCP_CONFIG are set, the UI shows any discovered tool: and mcp: keys at startup so you can see exactly which runtime surfaces came from env/config before a run begins.

ContextAgent defaults to the native pydantic-ai openrouter: provider. That provider still uses the openai Python client underneath, so the base install includes the openrouter extra from pydantic-ai-slim to bring in the required client library automatically.

Most one-shot scripts in examples/ are now thin prompt wrappers around the same bootstrap path as the CLI. The only intentionally bespoke examples left are the continuous monitoring loop and the auto-improve rubric, where the program structure itself is part of the example.

How it works under the hood:

  1. Bootstrap: An LLM call decomposes your prompt into 3-6 typed evaluation criteria + domain-specific instructions
  2. Generate: The generator agent produces candidate outputs
  3. Evaluate: The evaluator agent scores each candidate against the criteria (PASS/FAIL)
  4. Decide: Keep the best candidate if it improved the score
  5. Mutate: Analyze failures and evolve the generation strategy
  6. Repeat: Loop until convergence or max cycles
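Stripped of the LLM calls, the steps above can be sketched as a plain Python loop. The generate, evaluate, and mutate parameters stand in for the actual agent calls, and the stall_limit name is an assumption (the real stop conditions live in ContextProgram and LoopState):

```python
def run_program(generate, evaluate, mutate, instructions,
                max_cycles=3, stall_limit=2):
    """Minimal sketch of the generate -> evaluate -> decide -> mutate loop."""
    best_score, best_output, stalled = -1, None, 0
    for cycle in range(max_cycles):
        candidate = generate(instructions)            # 2. Generate
        score, failures = evaluate(candidate)         # 3. Evaluate (PASS/FAIL)
        if score > best_score:                        # 4. Decide: keep improvements
            best_score, best_output, stalled = score, candidate, 0
        else:
            stalled += 1
        if not failures or stalled >= stall_limit:    # 6. Converged or stalled
            break
        instructions = mutate(instructions, failures) # 5. Mutate the strategy
    return best_output, best_score
```

With deterministic stubs in place of the agents, the loop converges as soon as every criterion passes or the run stalls, whichever comes first.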

About

Iterative convergence agent runtime for pydantic-ai. Generate → Evaluate → Mutate → Repeat.
