An async, modular LLM agent framework for tool use, deterministic retrieval, multi-agent orchestration, and reproducible evaluation.
EvoAgent is a Python framework for building autonomous LLM agents around a single, inspectable ReAct loop. It ships a broad built-in toolset, a deterministic-first code-retrieval stack, an MCP client, parallel sub-agents, real token-level streaming, crash-safe resume, and an OpenTelemetry-compatible tracing layer β all verified end-to-end against a live model API.
Every component (models, tools, memory, retrieval, workflow nodes, evaluators) is replaceable, and every run is traceable, making EvoAgent suitable for both production automation and agent research.
- Features
- Architecture
- Installation
- Quick Start
- Configuration
- CLI
- Built-in Tools
- Advanced Capabilities
- Testing
- Security Model
- Project Layout
- Roadmap
- Contributing
- License
- Canonical ReAct engine β one readβdecideβact loop with permission checks, context compaction, cost tracking, and a provider-safe message history (every
tool_callsturn is answered). - Rich tool suite β file editing, patching with undo, shell/Python execution, test running, glob, AST symbol outlines, deterministic code search, and web fetch/search with SSRF protection.
- Deterministic-first retrieval β symbol-aware code chunking + keyword ranking, with an optional persistent vector store and hashing embeddings layered on top.
- MCP client β connect to any Model Context Protocol server over stdio JSON-RPC and expose its tools to the agent.
- Parallel sub-agents β a
tasktool that fans out independent sub-tasks to fresh, isolated agents running concurrently. - Real streaming β token-level SSE with streamed tool-call assembly (no dropped tool calls).
- Interrupt & steering β inject instructions, stop after the current tool, cancel a long-running command, or forbid edits to specific files, all mid-run.
- Crash recovery β atomic per-run checkpoints with
agent.resume(run_id). - Reliability & safety β HTTP retry with
Retry-Afterbackoff, recursive secret redaction before any persistence, and a deny/ask/allow permission policy. - Observability β dependency-optional OpenTelemetry tracing for run, LLM, and tool spans.
- Evaluation β a Docker-free SWE-bench-style harness that produces a patch and verifies it against the instance's tests on a clean checkout.
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Agent - memory Β· checkpoints Β· tracing Β· steering β
βββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββ
βΌ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β ReAct Engine - loop Β· compaction Β· cost β
βββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββ
βΌ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Model Router - OpenAI-compatible / DeepSeek β
βββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββ
βΌ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Tool Registry - files Β· shell Β· web Β· β
β code_search Β· MCP Β· task β
βββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββ
βΌ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Permission Policy + Retrieval / Vector Store β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Requires Python 3.11+.
git clone https://github.com/mingbo-yang/EvoAgent.git
cd EvoAgent
pip install -e .
# optional extras
pip install -e ".[dev]" # tests, linting
pip install -e ".[observability]" # OpenTelemetry tracingSet an API key (DeepSeek or any OpenAI-compatible endpoint):
export DEEPSEEK_API_KEY="sk-..."Run a one-shot task from Python:
import asyncio
from pathlib import Path
from evoagent.core.agent import Agent
from evoagent.models.schema import ModelConfig
from evoagent.models.factory import ProviderFactory
from evoagent.models.router import ModelRouter
from evoagent.tools.builtin import create_builtin_registry
async def main():
workspace = Path(".")
provider = ProviderFactory.create(ModelConfig(
provider="deepseek",
model="deepseek-chat",
base_url="https://api.deepseek.com/v1",
api_key_env="DEEPSEEK_API_KEY",
))
router = ModelRouter(providers={"default": provider})
registry = create_builtin_registry(workspace, auto_approve=True)
agent = Agent(model_router=router, tool_registry=registry, workspace=workspace)
result = await agent.run("Find the failing test in this project and fix it.")
print(result.final_answer)
asyncio.run(main())Or from the terminal:
evoagent run "Summarize the structure of this repository"
evoagent chat # interactive sessionEvoAgent reads configuration from evoagent.yaml and environment variables. Initialize a project with:
evoagent initAPI keys are referenced by environment-variable name (api_key_env) and are never written to source, logs, traces, or session files.
| Variable | Purpose |
|---|---|
DEEPSEEK_API_KEY |
DeepSeek API key (required for the model) |
TAVILY_API_KEY |
Optional Tavily search API key β enables the web_search fallback |
EVOAGENT_EGRESS_ALLOWLIST |
Optional comma-separated host allowlist for web tools |
The web_search tool works without any API key by scraping Bing and
DuckDuckGo HTML results. For higher reliability you can optionally enable the
Tavily search API as a fallback β it is only called
when the free HTML backends return nothing or are unreachable, so it conserves
your Tavily credits:
export TAVILY_API_KEY="tvly-..." # optional; never commit this valueWhen set, search resolution is: Bing β DuckDuckGo β Tavily. The key is read only from the environment and is never written to source, logs, or sessions.
| Command | Description |
|---|---|
evoagent run <task> |
Run a one-shot agent task |
evoagent chat |
Start an interactive chat session |
evoagent code <task> |
Run the code agent on a software task |
evoagent eval <suite> |
Run an evaluation benchmark suite |
evoagent init |
Initialize EvoAgent in the current directory |
evoagent config |
Manage configuration |
evoagent memory |
Manage agent memories |
evoagent trace |
Inspect execution traces |
| Category | Tools |
|---|---|
| Files | read_file, write_file, edit_file, multi_edit, apply_patch, undo_last |
| Navigation | list_directory, grep, glob, outline, code_search |
| Execution | bash, python, run_tests |
| Version control | git_status, git_diff |
| Planning | write_todos, list_todos |
| Web | web_fetch, web_search (SSRF-guarded) |
| Orchestration | task (parallel sub-agents), plus any MCP server tools |
Interrupt & steering
from evoagent.core.steering import SteeringController
steering = SteeringController()
agent = Agent(..., steering=steering)
steering.inject("Also update the changelog") # queue an instruction
steering.forbid_file("config.py") # protect a file
steering.request_stop() # stop after the current tool
steering.cancel() # cancel an in-flight toolCrash recovery & resume
agent = Agent(..., checkpoint_dir=".runs")
result = await agent.run("Long multi-step task")
# After a crash/restart:
result = await agent.resume(result.run_id, follow_up="Continue where you left off")MCP client
from evoagent.mcp import MCPClient, register_mcp_tools
client = MCPClient(["python", "my_mcp_server.py"])
await register_mcp_tools(registry, client) # tools registered under mcp__*Observability
from evoagent.observability import Tracer, configure_otel
configure_otel("evoagent") # if opentelemetry SDK is installed
tracer = Tracer(use_otel=True)
agent = Agent(..., tracer=tracer)
# tracer.spans_named("tool.execute") -> recorded spanspip install -e ".[dev]"
ruff check evoagent tests
pytest -qThe suite contains 649 tests. Every major capability is additionally verified end-to-end against a live model API rather than mocks alone.
- Permission policy β deny > ask > allow, with safe defaults that block destructive shell commands and writes to system paths.
- Workspace sandboxing β file tools reject paths that escape the workspace;
globrejects..traversal. - Egress protection β web tools block requests to private, loopback, link-local, and reserved addresses (re-checked on every redirect) and honor an optional host allowlist.
- Secret redaction β API keys, tokens, and credentials are recursively redacted before any logging, tracing, or session persistence.
EvoAgent is research-grade software. Review the Security Policy before running it against untrusted inputs or in production.
evoagent/
βββ core/ # ReAct engine, agent, steering, checkpoints, cost, redaction
βββ models/ # provider abstraction, OpenAI-compatible + DeepSeek, streaming
βββ tools/ # built-in tool registry and implementations
βββ sandbox/ # permission policy + egress (SSRF) protection
βββ retrieval/ # code retriever, vector store, embeddings, keyword index
βββ rag/ # document loading, chunking, query engine
βββ mcp/ # Model Context Protocol stdio client
βββ observability/ # OpenTelemetry-compatible tracing
βββ conversation/ # sessions, runtime, context compaction
βββ memory/ # experience/reflection memory store
βββ planning/ # planner, executor, critic, reflector
βββ workflow/ # workflow graph + runtime
βββ multi_agent/ # multi-agent roles and protocols
βββ eval/ # evaluation harness + SWE-bench-style runner
βββ skills/ # reusable agent skills
βββ code/ # code-agent helpers (repo map, patching, diagnostics)
βββ logging/ # JSONL events, traces, diffs
βββ config/ # configuration models and loading
βββ cli/ # Typer CLI and terminal UI
See ROADMAP.md. Native Anthropic and Gemini adapters are planned; today they are reachable via any OpenAI-compatible gateway.
Contributions are welcome. Please read CONTRIBUTING.md, run ruff and pytest before opening a pull request, and keep changes covered by tests.
Released under the MIT License.