A new agent type for pydantic-ai: agents that spin up mini-programs running in context.
Most AI agents stop at the first plausible response.
ContextAgent is built to keep going.
It turns quality into explicit criteria, generates candidate outputs, evaluates them, mutates its strategy, and repeats until it converges or reaches a defined limit.
That makes it a better fit for work where omission and vagueness have real consequences.
One-shot chat is often enough for brainstorming.
It is not enough for work that must be complete, auditable, and aligned to visible criteria before a human should trust it.
ContextAgent is for that second category.
- Healthcare: discharge instructions, care-gap summaries, prior authorization drafts, and structured health-data repair
- Public service: permit packages, code-compliance reviews, and eligibility guidance
- Interoperability and validation: structured artifacts that must satisfy validators, profiles, or completeness checks
- Scientific and regulatory work: protocols, consent materials, and submission sections where traceability and rigor matter more than speed
The Karpathy autoresearch pattern (generate → evaluate → mutate → repeat) demonstrated the value of structured iterative refinement, but the original loop was still orchestrated in Python.
This project lifts that pattern into a reusable agent runtime abstraction, so the loop becomes a first-class typed program rather than one-off notebook or script glue.
But pydantic-ai's current Agent is a single-turn request/response abstraction. To build
an autoresearch loop today, you'd wire together multiple agents with manual state threading,
hand-rolled loops, and ad-hoc scoring logic. The graph primitives exist in pydantic_graph,
but there's no high-level API that says: "here's a program — run it iteratively until it
converges."
A ContextAgent is a pydantic-ai agent type that runs context programs —
structured, stateful mini-programs with explicit loop state, typed evaluation criteria,
and convergence logic. The current implementation uses pydantic_graph to orchestrate
the loop while feeding the accumulated program state back into each model call.
+-------------------------------------------------------+
| ContextAgent |
| pydantic_graph runtime |
| |
| +-------------+ +-------------+ +-------------+ |
| | Generate |->| Evaluate |->| Decide | |
| | node | | node | | node | |
| +------+------+ +------+------+ +------+------+ |
| | | | |
| v v v |
| +-------------+ +-------------+ +-------------+ |
| | Generator | | Evaluator | | Mutate | |
| | agent | | agent |<-| node | |
| +------+------+ +-------------+ +------+------+ |
| ^ | |
| +---------- Mutator agent <-------+ |
| |
| LoopState: cycle, best_score, best_output, |
| current_instructions, stalled_cycles, |
| history |
+-------------------------------------------------------+
| Concept | Description |
|---|---|
| ContextProgram | A typed definition of a mini-program: what to generate, how to evaluate, when to stop |
| Criterion | A single PASS/FAIL evaluation check with typed conditions |
| CycleResult | Immutable snapshot of one iteration: candidates, scores, mutations, and whether the run improved |
| LoopState | Mutable runtime state carried across iterations: score, instructions, stop conditions, and history |
| ContextAgent | Orchestrator that runs the program loop using pydantic_graph underneath |
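To make the concepts above concrete, here is a minimal sketch of the core types as plain dataclasses. This is a simplification for illustration: the real models are richer typed models, and any field not named in the table, diagram, or quick-start example below is an assumption.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Criterion:
    # A single PASS/FAIL check; field names follow the quick-start example.
    name: str
    pass_when: str
    fail_when: str

@dataclass(frozen=True)
class CycleResult:
    # Immutable snapshot of one iteration (simplified field set).
    cycle: int
    score: int
    output: str
    improved: bool

@dataclass
class LoopState:
    # Mutable runtime state carried across iterations; fields mirror
    # the LoopState box in the architecture diagram.
    cycle: int = 0
    best_score: int = -1
    best_output: str = ""
    current_instructions: str = ""
    stalled_cycles: int = 0
    history: list = field(default_factory=list)  # list of CycleResult
```

The key design split is immutable per-cycle snapshots (`CycleResult`) versus one mutable accumulator (`LoopState`), which is what lets later phases inspect the full run history.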
Today's agent patterns treat LLMs as black-box call-response units wired together by Python
glue code. ContextAgent pulls the iterative refinement loop into a reusable runtime shape:
generate, evaluate, mutate, and stop on convergence. That makes search and refinement a
first-class agent capability rather than bespoke application logic.
The important distinction here is not that Python disappears. It does not. The distinction is that the runtime preserves and reuses the full loop state across iterations instead of treating each model call as an isolated step.
- Ad hoc orchestration: manual loops, manual state threading, ad hoc scoring logic
- ContextAgent: explicit program state, typed criteria, graph-driven transitions, and model calls that see the accumulated context needed for the current phase
This matters because each phase can reason over the accumulated working state: prior candidates, failure patterns, current instructions, and cycle history. The loop is still externally orchestrated, but it is no longer just one-off glue code.
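As an illustration of "phases reason over accumulated state", here is a hedged sketch of how a mutator's working context might be assembled from loop history. The helper name and the `(cycle, score, failed_criteria)` tuple shape are illustrative simplifications, not the actual implementation.

```python
def build_mutator_context(current_instructions, history):
    """Assemble the mutator's working context from accumulated loop state.

    `history` is a list of (cycle, score, failed_criteria) tuples — a
    stand-in for the real per-cycle snapshots.
    """
    lines = [f"Current strategy: {current_instructions}", "Prior cycles:"]
    for cycle, score, failed in history:
        failed_str = ", ".join(failed) if failed else "none"
        lines.append(f"  cycle {cycle}: score {score}, failed: {failed_str}")
    return "\n".join(lines)
```

Because every cycle's failures stay visible, the mutator can avoid re-proposing strategies that already failed — the concrete payoff of carrying shared loop state instead of making isolated model calls.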
The auto-improve skill is the same generate -> evaluate -> mutate pattern written as an
in-agent procedure. ContextAgent promotes that pattern into a reusable runtime abstraction:
ContextProgram, Criterion, CycleResult, graph nodes, and shared loop state.
When context programs can target other agent artifacts (prompts, tools, system instructions),
agents can improve their own infrastructure. The auto-improve skill is a manual version of
this — ContextAgent makes it a reusable pattern.
The broader framing is that ContextAgent is a reusable runtime pattern first and a
pydantic-ai implementation second.
context-agent/
├── README.md
├── .env.example # Sample runtime configuration
├── pyproject.toml # Package metadata and CLI entry points
├── uv.lock # Locked dependency set for uv
├── src/context_agent/
│ ├── __init__.py # Public API exports
│ ├── agent.py # ContextAgent orchestration entry point
│ ├── bootstrap.py # Prompt -> ContextProgram decomposition
│ ├── program.py # ContextProgram, Criterion, CycleResult, ProgramResult
│ ├── nodes.py # pydantic_graph loop nodes
│ ├── cli.py # context-agent CLI
│ ├── ui.py # Gradio UI entry point
│ ├── data_sources.py # Pull-based context sources (FHIR, etc.)
│ ├── runtime_connectors.py # Runtime toolsets, MCP, and plugin loading
│ ├── plugin_examples.py # Example runtime plugin factory
│ ├── eval_types.py # Evaluation result models
│ ├── defaults.py # Shared model defaults
│ └── model_retry.py # Model retry and transient failure handling
├── examples/
│ ├── prompt_runner.py # Shared runner for prompt-driven examples
│ ├── autoresearch_skill_improver.py # Bespoke auto-improve loop example
│ ├── fhir_a1c_optimizer.py
│ ├── fhir_care_gaps.py
│ ├── fhir_wearable_monitor.py
│ ├── healthcare_cds_optimizer.py
│ ├── k8s_hardener.py
│ ├── permit_compliance_reviewer.py
│ ├── crop_rotation_optimizer.py
│ ├── spc_rule_optimizer.py
│ ├── trial_protocol_optimizer.py
│ └── sample_inputs/
├── tests/
│ ├── test_context_agent.py
│ ├── test_cli.py
│ ├── test_runtime_connectors.py
│ └── test_ui.py
├── pydantic-ai-context-agent.patch # Upstream patch/proposal artifact
└── vendor/ # Vendored upstream reference code
ContextAgent is implemented in this repository as its own runtime layer on top of pydantic-ai and pydantic_graph.
from pydantic_ai import Agent
from context_agent import ContextAgent, ContextProgram, Criterion
# Define evaluation criteria
criteria = [
    Criterion(name="clear_trigger", pass_when="specific testable scenarios", fail_when="vague guidance"),
    Criterion(name="actionable_steps", pass_when="concrete named actions", fail_when="'consider' or 'ensure'"),
]
# Define the program
program = ContextProgram(
    name="improve-skill",
    generator_instructions="Rewrite this SKILL.md to fix all failing criteria...",
    evaluator_instructions="Score this SKILL.md against the criteria...",
    mutator_instructions="Update the rewrite strategy based on failures...",
    criteria=criteria,
    max_cycles=3,
)
# Run it
agent = ContextAgent('openrouter:nvidia/nemotron-3-super-120b-a12b:free')
result = await agent.run_program(program, input_text=original_skill_md)
print(f"Final score: {result.best_score}/{len(criteria)}")
print(result.best_output)

The fastest way to use context-agent: describe what you want. The agent bootstraps its own evaluation criteria, generator, and mutation strategy from your prompt.
# Install
cd context-agent && uv venv && uv sync --extra dev
# Install UI and OAuth extras when using context-agent-ui
uv sync --extra dev --extra ui
# If you already had the project installed before dependency changes, resync it
uv sync --extra dev
# Optional: start from the sample environment file
cp .env.example .env
# Open-ended: agent self-assembles a program from your prompt
uv run context-agent "Write a production-ready Dockerfile for a Python FastAPI app"
# Improve an existing file
uv run context-agent "Improve this system prompt for clarity and specificity" --input prompt.txt
# Evaluate without rewriting
uv run context-agent "Evaluate this API schema" --input openapi.yaml --eval-only
# More cycles + verbose output
uv run context-agent "Optimize this SQL query for performance" --input query.sql --cycles 5 --verbose
# Explicit model override
uv run context-agent "Harden this Terraform module" --input main.tf --model openrouter:nvidia/nemotron-3-super-120b-a12b:free
# Backward-compat: auto-improve a SKILL.md
uv run context-agent improve-skill --target path/to/SKILL.md
# Optional: expose the in-repo example runtime plugin factory
export CONTEXT_AGENT_TOOL_FACTORIES=context_agent.plugin_examples:build_example_runtime_plugins
# Inspect which runtime connection keys are available right now
uv run context-agent --list-connections
# Let the bootstrapper decide that CLI inspection is needed
CONTEXT_AGENT_ENABLE_CLI=1 uv run context-agent \
"Determine what operating system this machine is running on. Use the appropriate tool if needed and answer briefly."
# Or force the CLI connector explicitly for deterministic OS checks
CONTEXT_AGENT_ENABLE_CLI=1 uv run context-agent \
"Determine what operating system this machine is running on. Use the available command tool, prefer uname and sw_vers when available, and answer briefly." \
--connection cli

When CONTEXT_AGENT_TOOL_FACTORIES or CONTEXT_AGENT_MCP_CONFIG are set, the UI shows any discovered tool: and mcp: keys at startup so you can see exactly which runtime surfaces came from env/config before a run begins.
ContextAgent defaults to the native pydantic-ai openrouter: provider. That provider still uses the openai Python client underneath, so the base install includes the openrouter extra from pydantic-ai-slim to bring in the required client library automatically.
Most one-shot scripts in examples/ are now thin prompt wrappers around the same bootstrap path as the CLI. The only intentionally bespoke examples left are the continuous monitoring loop and the auto-improve rubric, where the program structure itself is part of the example.
How it works under the hood:
- Bootstrap: An LLM call decomposes your prompt into 3-6 typed evaluation criteria + domain-specific instructions
- Generate: The generator agent produces candidate outputs
- Evaluate: The evaluator agent scores each candidate against the criteria (PASS/FAIL)
- Decide: Keep the best candidate if it improved the score
- Mutate: Analyze failures and evolve the generation strategy
- Repeat: Loop until convergence or max cycles
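The steps above can be compressed into a pure-Python control-flow sketch. No LLM calls are made here: `generate`, `evaluate`, and `mutate` are stand-ins for the real agents, and the stall threshold of 2 non-improving cycles is an assumed default, not the project's actual setting.

```python
def run_program(generate, evaluate, mutate, instructions,
                n_criteria, max_cycles=3, stall_limit=2):
    """Sketch of the generate -> evaluate -> mutate loop."""
    best_score, best_output = -1, None
    stalled = 0
    for cycle in range(max_cycles):
        candidate = generate(instructions)       # Generate
        score = evaluate(candidate)              # Evaluate: count of PASS criteria
        if score > best_score:                   # Decide: keep only improvements
            best_score, best_output, stalled = score, candidate, 0
        else:
            stalled += 1
        # Stop on convergence (all criteria pass) or when progress stalls
        if best_score == n_criteria or stalled >= stall_limit:
            break
        # Mutate: evolve the generation strategy from the latest failure
        instructions = mutate(instructions, candidate, score)
    return best_output, best_score
```

Run with toy callables, the loop keeps the first candidate (score 1), mutates the strategy, then converges as soon as a candidate passes all 3 criteria.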