Skip to content

mingbo-yang/EvoAgent

Repository files navigation

EvoAgent

An async, modular LLM agent framework for tool use, deterministic retrieval, multi-agent orchestration, and reproducible evaluation.

English δΈ­ζ–‡ Python Tests License Style


EvoAgent is a Python framework for building autonomous LLM agents around a single, inspectable ReAct loop. It ships a broad built-in toolset, a deterministic-first code-retrieval stack, an MCP client, parallel sub-agents, real token-level streaming, crash-safe resume, and an OpenTelemetry-compatible tracing layer β€” all verified end-to-end against a live model API.

Every component (models, tools, memory, retrieval, workflow nodes, evaluators) is replaceable, and every run is traceable, making EvoAgent suitable for both production automation and agent research.

Table of Contents

Features

  • Canonical ReAct engine β€” one read–decide–act loop with permission checks, context compaction, cost tracking, and a provider-safe message history (every tool_calls turn is answered).
  • Rich tool suite β€” file editing, patching with undo, shell/Python execution, test running, glob, AST symbol outlines, deterministic code search, and web fetch/search with SSRF protection.
  • Deterministic-first retrieval β€” symbol-aware code chunking + keyword ranking, with an optional persistent vector store and hashing embeddings layered on top.
  • MCP client β€” connect to any Model Context Protocol server over stdio JSON-RPC and expose its tools to the agent.
  • Parallel sub-agents β€” a task tool that fans out independent sub-tasks to fresh, isolated agents running concurrently.
  • Real streaming β€” token-level SSE with streamed tool-call assembly (no dropped tool calls).
  • Interrupt & steering β€” inject instructions, stop after the current tool, cancel a long-running command, or forbid edits to specific files, all mid-run.
  • Crash recovery β€” atomic per-run checkpoints with agent.resume(run_id).
  • Reliability & safety β€” HTTP retry with Retry-After backoff, recursive secret redaction before any persistence, and a deny/ask/allow permission policy.
  • Observability β€” dependency-optional OpenTelemetry tracing for run, LLM, and tool spans.
  • Evaluation β€” a Docker-free SWE-bench-style harness that produces a patch and verifies it against the instance's tests on a clean checkout.

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Agent  -  memory Β· checkpoints Β· tracing Β· steering β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  ReAct Engine  -  loop Β· compaction Β· cost           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Model Router  -  OpenAI-compatible / DeepSeek       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Tool Registry  -  files Β· shell Β· web Β·             β”‚
β”‚  code_search Β· MCP Β· task                            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Permission Policy  +  Retrieval / Vector Store      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Installation

Requires Python 3.11+.

git clone https://github.com/mingbo-yang/EvoAgent.git
cd EvoAgent
pip install -e .

# optional extras
pip install -e ".[dev]"            # tests, linting
pip install -e ".[observability]"  # OpenTelemetry tracing

Quick Start

Set an API key (DeepSeek or any OpenAI-compatible endpoint):

export DEEPSEEK_API_KEY="sk-..."

Run a one-shot task from Python:

import asyncio
from pathlib import Path

from evoagent.core.agent import Agent
from evoagent.models.schema import ModelConfig
from evoagent.models.factory import ProviderFactory
from evoagent.models.router import ModelRouter
from evoagent.tools.builtin import create_builtin_registry


async def main():
    workspace = Path(".")
    provider = ProviderFactory.create(ModelConfig(
        provider="deepseek",
        model="deepseek-chat",
        base_url="https://api.deepseek.com/v1",
        api_key_env="DEEPSEEK_API_KEY",
    ))
    router = ModelRouter(providers={"default": provider})
    registry = create_builtin_registry(workspace, auto_approve=True)

    agent = Agent(model_router=router, tool_registry=registry, workspace=workspace)
    result = await agent.run("Find the failing test in this project and fix it.")
    print(result.final_answer)


asyncio.run(main())

Or from the terminal:

evoagent run "Summarize the structure of this repository"
evoagent chat        # interactive session

Configuration

EvoAgent reads configuration from evoagent.yaml and environment variables. Initialize a project with:

evoagent init

API keys are referenced by environment-variable name (api_key_env) and are never written to source, logs, traces, or session files.

Variable Purpose
DEEPSEEK_API_KEY DeepSeek API key (required for the model)
TAVILY_API_KEY Optional Tavily search API key β€” enables the web_search fallback
EVOAGENT_EGRESS_ALLOWLIST Optional comma-separated host allowlist for web tools

Web search

The web_search tool works without any API key by scraping Bing and DuckDuckGo HTML results. For higher reliability you can optionally enable the Tavily search API as a fallback β€” it is only called when the free HTML backends return nothing or are unreachable, so it conserves your Tavily credits:

export TAVILY_API_KEY="tvly-..."   # optional; never commit this value

When set, search resolution is: Bing β†’ DuckDuckGo β†’ Tavily. The key is read only from the environment and is never written to source, logs, or sessions.

CLI

Command Description
evoagent run <task> Run a one-shot agent task
evoagent chat Start an interactive chat session
evoagent code <task> Run the code agent on a software task
evoagent eval <suite> Run an evaluation benchmark suite
evoagent init Initialize EvoAgent in the current directory
evoagent config Manage configuration
evoagent memory Manage agent memories
evoagent trace Inspect execution traces

Built-in Tools

Category Tools
Files read_file, write_file, edit_file, multi_edit, apply_patch, undo_last
Navigation list_directory, grep, glob, outline, code_search
Execution bash, python, run_tests
Version control git_status, git_diff
Planning write_todos, list_todos
Web web_fetch, web_search (SSRF-guarded)
Orchestration task (parallel sub-agents), plus any MCP server tools

Advanced Capabilities

Interrupt & steering
from evoagent.core.steering import SteeringController

steering = SteeringController()
agent = Agent(..., steering=steering)

steering.inject("Also update the changelog")  # queue an instruction
steering.forbid_file("config.py")             # protect a file
steering.request_stop()                       # stop after the current tool
steering.cancel()                             # cancel an in-flight tool
Crash recovery & resume
agent = Agent(..., checkpoint_dir=".runs")
result = await agent.run("Long multi-step task")
# After a crash/restart:
result = await agent.resume(result.run_id, follow_up="Continue where you left off")
MCP client
from evoagent.mcp import MCPClient, register_mcp_tools

client = MCPClient(["python", "my_mcp_server.py"])
await register_mcp_tools(registry, client)   # tools registered under mcp__*
Observability
from evoagent.observability import Tracer, configure_otel

configure_otel("evoagent")           # if opentelemetry SDK is installed
tracer = Tracer(use_otel=True)
agent = Agent(..., tracer=tracer)
# tracer.spans_named("tool.execute") -> recorded spans

Testing

pip install -e ".[dev]"
ruff check evoagent tests
pytest -q

The suite contains 649 tests. Every major capability is additionally verified end-to-end against a live model API rather than mocks alone.

Security Model

  • Permission policy β€” deny > ask > allow, with safe defaults that block destructive shell commands and writes to system paths.
  • Workspace sandboxing β€” file tools reject paths that escape the workspace; glob rejects .. traversal.
  • Egress protection β€” web tools block requests to private, loopback, link-local, and reserved addresses (re-checked on every redirect) and honor an optional host allowlist.
  • Secret redaction β€” API keys, tokens, and credentials are recursively redacted before any logging, tracing, or session persistence.

EvoAgent is research-grade software. Review the Security Policy before running it against untrusted inputs or in production.

Project Layout

evoagent/
β”œβ”€β”€ core/            # ReAct engine, agent, steering, checkpoints, cost, redaction
β”œβ”€β”€ models/          # provider abstraction, OpenAI-compatible + DeepSeek, streaming
β”œβ”€β”€ tools/           # built-in tool registry and implementations
β”œβ”€β”€ sandbox/         # permission policy + egress (SSRF) protection
β”œβ”€β”€ retrieval/       # code retriever, vector store, embeddings, keyword index
β”œβ”€β”€ rag/             # document loading, chunking, query engine
β”œβ”€β”€ mcp/             # Model Context Protocol stdio client
β”œβ”€β”€ observability/   # OpenTelemetry-compatible tracing
β”œβ”€β”€ conversation/    # sessions, runtime, context compaction
β”œβ”€β”€ memory/          # experience/reflection memory store
β”œβ”€β”€ planning/        # planner, executor, critic, reflector
β”œβ”€β”€ workflow/        # workflow graph + runtime
β”œβ”€β”€ multi_agent/     # multi-agent roles and protocols
β”œβ”€β”€ eval/            # evaluation harness + SWE-bench-style runner
β”œβ”€β”€ skills/          # reusable agent skills
β”œβ”€β”€ code/            # code-agent helpers (repo map, patching, diagnostics)
β”œβ”€β”€ logging/         # JSONL events, traces, diffs
β”œβ”€β”€ config/          # configuration models and loading
└── cli/             # Typer CLI and terminal UI

Roadmap

See ROADMAP.md. Native Anthropic and Gemini adapters are planned; today they are reachable via any OpenAI-compatible gateway.

Contributing

Contributions are welcome. Please read CONTRIBUTING.md, run ruff and pytest before opening a pull request, and keep changes covered by tests.

License

Released under the MIT License.

About

A research-friendly and production-ready LLM agent framework for tool use, memory evolution, workflow orchestration, multi-agent collaboration, and reproducible evaluation.

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages