A research-and-verification pipeline coordinated by an in-process A2A registry and client. Five agents — Planner, Searcher, Reader, FactChecker, Synthesizer — run on a mix of runtimes (PocketFlow planner, smolagents search/read, LangGraph fact-check loop, Pydantic AI synthesis). The system uses live web search and page extraction instead of a local RAG corpus: it decomposes queries into claims, gathers evidence from the public web, verifies claims, and renders a structured markdown report.
                          ┌── Searcher (web search)
Planner ──► FactChecker ──┤── Reader (fetch + extract) ──► … loop … ──► Synthesizer
                          └──────────────────────────────────────────────────┘
| Agent | Runtime | Role | Output |
|---|---|---|---|
| Planner | PocketFlow | Decomposes the user query into claims and seed search queries | claims, seed_queries |
| Searcher | smolagents | Parallel Tavily + DuckDuckGo search | hits, errors |
| Reader | smolagents | Fetches URLs and extracts main text (trafilatura) | pages |
| FactChecker | LangGraph | Orchestrates search/read/LLM verify loop until evidence converges or search exhausts | verified_claims, sources |
| Synthesizer | Pydantic AI | Structured report from verified claims + citations | ReportOutput → markdown |
Orchestration: workflow/coordinator.py — run_research_async / run_research_sync drive the pipeline via A2AClient and AgentRegistry.
A2A: a2a_research.a2a — role-scoped AgentExecutor registration, agent cards, in-process task store.
Evidence: a2a_research.tools — web_search, fetch_and_extract.
UI: Mesop web app (src/a2a_research/ui/app.py).
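The Searcher row above ("Parallel Tavily + DuckDuckGo search", output `hits, errors`) can be pictured with the sketch below. It is not the project's `web_search` implementation: the provider functions are hypothetical stubs returning canned data, and the merge policy (cap per provider, deduplicate by URL, collect errors instead of raising) is an assumption.

```python
import asyncio

# Hypothetical stand-ins for real provider calls; they return canned hits.
async def search_tavily(query: str) -> list[dict]:
    return [{"url": "https://example.com/a"}, {"url": "https://example.com/b"}]

async def search_ddg(query: str) -> list[dict]:
    return [{"url": "https://example.com/b"}, {"url": "https://example.com/c"}]

async def search_down(query: str) -> list[dict]:
    raise RuntimeError("rate limited")  # simulates a failing provider

async def merged_search(query: str, providers: dict, max_results: int = 5) -> dict:
    """Run all providers concurrently, cap each one, and merge hits by URL."""
    results = await asyncio.gather(
        *(fn(query) for fn in providers.values()), return_exceptions=True
    )
    hits, errors, seen = [], [], set()
    for name, result in zip(providers, results):
        if isinstance(result, Exception):
            errors.append(f"{name}: {result}")  # record the failure, keep going
            continue
        for hit in result[:max_results]:        # per-provider cap
            if hit["url"] not in seen:          # drop duplicate URLs
                seen.add(hit["url"])
                hits.append({**hit, "provider": name})
    return {"hits": hits, "errors": errors}

outcome = asyncio.run(
    merged_search("what is RAG?", {"tavily": search_tavily, "ddg": search_ddg})
)
degraded = asyncio.run(
    merged_search("what is RAG?", {"tavily": search_tavily, "ddg": search_down})
)
print(len(outcome["hits"]), degraded["errors"])
```

Returning errors alongside hits lets one provider fail without losing the other provider's results.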
# 1. Install dependencies (use make dev to also setup pre-commit hooks)
make install
# Or: uv sync --all-groups
# 2. Configure credentials
# macOS/Linux: cp .env.example .env
# Windows PowerShell: Copy-Item .env.example .env
# Edit .env — set LLM_API_KEY, optional TAVILY_API_KEY for better search
# 3. Start the Mesop UI
make mesop
# Or: uv run mesop src/a2a_research/ui/app.py
# Opens at http://localhost:32123
# Run tests (no API key required for unit tests)
make test
# Or: uv run pytest

All settings are environment variables. The LLM stack is provider-agnostic (same LLM_* keys drive Planner, FactChecker, Searcher/Reader hosts, and the Pydantic AI Synthesizer).
| Variable | Description | Default |
|---|---|---|
| LLM_PROVIDER | Vendor: openrouter, openai, anthropic, google, ollama | openrouter |
| LLM_MODEL | Model name (provider-specific) | openrouter/elephant-alpha |
| LLM_BASE_URL | OpenAI-compatible base URL (OpenRouter/Ollama); optional for native Anthropic/Google | see .env.example |
| LLM_API_KEY | API key for your chosen provider | (required for cloud) |

| Variable | Description | Default |
|---|---|---|
| TAVILY_API_KEY | Tavily API key; blank = DuckDuckGo-only search | (empty) |
| SEARCH_MAX_RESULTS | Per-provider hit cap (Tavily + DDG merged) | 5 |
| RESEARCH_MAX_ROUNDS | Max FactChecker loop rounds | 3 |
| WORKFLOW_TIMEOUT | Coordinator timeout (seconds) | 180 |

| Variable | Description | Default |
|---|---|---|
| LOG_LEVEL | DEBUG, INFO, WARNING, ERROR | INFO |
| MESOP_PORT | Mesop web server port | 32123 |
OpenRouter (default):
LLM_PROVIDER=openrouter
LLM_MODEL=openrouter/elephant-alpha
LLM_API_KEY=sk-or-...

OpenAI:
LLM_PROVIDER=openai
LLM_MODEL=gpt-4o-mini
LLM_API_KEY=sk-...

Ollama (local, no API key):
LLM_PROVIDER=ollama
LLM_MODEL=llama3.2
LLM_BASE_URL=http://localhost:11434/v1
EMBEDDING_PROVIDER=ollama
EMBEDDING_BASE_URL=http://localhost:11434
EMBEDDING_API_KEY=

Anthropic:
LLM_PROVIDER=anthropic
LLM_MODEL=claude-sonnet-4-20250514
LLM_API_KEY=sk-ant-...

User query
│
▼
Planner ──► LLM ──► claims + seed_queries
│
▼
FactChecker (LangGraph loop)
│ Searcher ──► Tavily + DDG ──► URLs
│ Reader ──► fetch + trafilatura ──► page text
│ Verify ──► LLM ──► verdicts / follow-up queries
│ (repeat until converged, max rounds, or search exhausted)
│
▼
Synthesizer ──► LLM (structured) ──► ReportOutput → markdown
Planner, FactChecker verification, and Synthesizer call get_llm() in providers.py. Searcher/Reader use smolagents with OpenAI-compatible models from the same LLM_* settings.
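
The FactChecker loop in the diagram (search, read, verify, repeat until converged, max rounds, or search exhausted) can be sketched in plain Python. This is only an illustration of the control flow, not the LangGraph implementation: `search`, `read`, and `verify` are hypothetical stubs over canned data, the verify step uses keyword matching instead of an LLM, and "converged" is assumed to mean that no follow-up queries remain.

```python
# Canned "web" for the sketch: query -> URLs, URL -> page text.
FAKE_INDEX = {"rag definition": ["https://example.com/rag"]}
FAKE_PAGES = {"https://example.com/rag": "RAG combines retrieval with generation."}

def search(queries):
    return [url for q in queries for url in FAKE_INDEX.get(q, [])]

def read(urls):
    return [FAKE_PAGES[u] for u in urls if u in FAKE_PAGES]

def verify(claims, evidence):
    """Stub verifier: a claim is SUPPORTED if its keyword appears in any page."""
    verdicts, follow_ups = {}, []
    for claim in claims:
        hit = any(claim["keyword"] in page.lower() for page in evidence)
        verdicts[claim["text"]] = "SUPPORTED" if hit else "INSUFFICIENT_EVIDENCE"
        if not hit:
            follow_ups.append(claim["follow_up"])
    return verdicts, follow_ups

def fact_check(claims, seed_queries, max_rounds=3):
    queries, evidence, seen = list(seed_queries), [], set()
    verdicts = {c["text"]: "INSUFFICIENT_EVIDENCE" for c in claims}
    stop = "max_rounds"
    for _ in range(max_rounds):
        new_urls = [u for u in dict.fromkeys(search(queries)) if u not in seen]
        if not new_urls:              # nothing new to read
            stop = "search_exhausted"
            break
        seen.update(new_urls)
        evidence += read(new_urls)
        verdicts, queries = verify(claims, evidence)
        if not queries:               # no follow-up queries left
            stop = "converged"
            break
    return verdicts, sorted(seen), stop

claims = [
    {"text": "RAG combines retrieval with generation", "keyword": "retrieval", "follow_up": "rag retrieval"},
    {"text": "RAG was invented in 1950", "keyword": "1950", "follow_up": "rag history"},
]
verdicts, sources, stop = fact_check(claims, ["rag definition"])
print(stop, verdicts)
```

In this run the second round finds no new URLs for the follow-up query, so the loop stops with one claim still lacking evidence.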
Public API:
from a2a_research.workflow import run_research_sync, run_research_async

Implementation lives in src/a2a_research/workflow/coordinator.py (linear coordinator over A2AClient). Older PocketFlow builder/adapter modules may still exist for experiments; the Mesop UI and tests target run_research_async.
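
One plausible shape for an async entry point with a sync wrapper and a coordinator timeout is sketched below. It does not reproduce `coordinator.py`: `run_pipeline`, `run_with_timeout`, and `run_sync` are hypothetical names, the pipeline body is a stub, and reporting a timeout through an `error` field is an assumption.

```python
import asyncio

async def run_pipeline(query: str, delay: float = 0.0) -> dict:
    """Hypothetical stand-in for the Planner -> FactChecker -> Synthesizer chain."""
    await asyncio.sleep(delay)  # simulates agent and network latency
    return {"query": query, "final_report": f"# Report\n\n{query}", "error": None}

async def run_with_timeout(query: str, timeout: float, delay: float = 0.0) -> dict:
    """Async entry point: cancel the pipeline once the timeout elapses."""
    try:
        return await asyncio.wait_for(run_pipeline(query, delay), timeout=timeout)
    except asyncio.TimeoutError:
        return {"query": query, "final_report": None,
                "error": f"timed out after {timeout}s"}

def run_sync(query: str, timeout: float = 180.0, delay: float = 0.0) -> dict:
    """Sync wrapper for scripts; must not be called from inside a running event loop."""
    return asyncio.run(run_with_timeout(query, timeout, delay))

ok = run_sync("What is RAG?")
slow = run_sync("What is RAG?", timeout=0.01, delay=0.5)
print(ok["error"], "|", slow["error"])
```

Keeping the async function as the primary entry point suits a UI that already runs an event loop, while the sync wrapper is convenient for scripts and tests.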
The Mesop app exposes five sections:
- Query input: textarea at the bottom; submitting triggers the full pipeline
- Agent timeline: per-role card for Planner, Searcher, Reader, FactChecker, Synthesizer (PENDING → RUNNING → COMPLETED/FAILED)
- Verified claims: verdict badges (✅ SUPPORTED / ❌ REFUTED / ⚠️ INSUFFICIENT_EVIDENCE / …), confidence, sources, snippets
- Sources panel: deduplicated URLs from the FactChecker
- Final report: markdown from the Synthesizer (session.final_report)
ResearchSession is the source of truth for timeline, errors, and outputs.
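
The per-role status progression shown in the agent timeline (PENDING → RUNNING → COMPLETED/FAILED) can be modelled as a small state machine. The sketch below is illustrative only: `Status`, `ALLOWED`, and `Timeline` are hypothetical names, and the real `ResearchSession` fields may differ.

```python
from enum import Enum

class Status(str, Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"

# Legal transitions; COMPLETED and FAILED are terminal.
ALLOWED = {
    Status.PENDING: {Status.RUNNING},
    Status.RUNNING: {Status.COMPLETED, Status.FAILED},
    Status.COMPLETED: set(),
    Status.FAILED: set(),
}

ROLES = ["Planner", "Searcher", "Reader", "FactChecker", "Synthesizer"]

class Timeline:
    """Tracks one status per role and rejects illegal transitions."""

    def __init__(self, roles):
        self.status = {role: Status.PENDING for role in roles}
        self.errors = []

    def advance(self, role, new_status, error=None):
        current = self.status[role]
        if new_status not in ALLOWED[current]:
            raise ValueError(f"{role}: {current.value} -> {new_status.value} is not allowed")
        self.status[role] = new_status
        if error is not None:
            self.errors.append(f"{role}: {error}")

timeline = Timeline(ROLES)
timeline.advance("Planner", Status.RUNNING)
timeline.advance("Planner", Status.COMPLETED)
timeline.advance("Searcher", Status.RUNNING)
timeline.advance("Searcher", Status.FAILED, error="no results")
print({role: status.value for role, status in timeline.status.items()})
```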
src/a2a_research/
├── agents/ # Planner (pocketflow), Searcher/Reader (smolagents), FactChecker (langgraph), Synthesizer (pydantic_ai)
├── a2a/ # AgentRegistry, A2AClient, agent cards, task helpers
├── workflow/ # run_research_* coordinator
├── tools/ # web_search, fetch_and_extract
├── models/ # ResearchSession, Claim, AgentRole, ReportOutput, WebSource, …
├── providers.py # get_llm() — shared LLM vendor abstraction
├── ui/ # Mesop web app (app.py, components, …)
└── settings.py # pydantic-settings (`LLM_*`, Tavily, timeouts, …)
data/corpus/ # Optional sample markdown (not used by the default web pipeline)
tests/ # Pytest suite (no API key required for unit tests)
src/a2a_research/agents/ also ships three basic reference chat agents built on alternative frameworks. They are not wired into the research pipeline by default, but they can be registered as A2A handlers when you want to experiment with a different runtime.
| Folder | Framework | Canonical primitive | Highlight |
|---|---|---|---|
| agents/langgraph/ | LangGraph | StateGraph + MessagesState + InMemorySaver | Multi-turn memory via thread_id=session.id |
| agents/pydantic_ai/ | Pydantic AI | Agent(model, instructions=…) with OpenAIChatModel | Typed, deps_type for request-scoped context |
| agents/smolagents/ | smolagents | ToolCallingAgent(tools=[], …) with OpenAIServerModel | No Python execution; see folder README for the CodeAgent security note |

Each folder exposes the same surface:
from a2a_research.agents.langgraph import chat_invoke # or pydantic_ai / smolagents
from a2a_research.models import ResearchSession
session = ResearchSession(query="What is RAG?")
result = chat_invoke(session)
print(result.raw_content)

Run the standalone CLI demo per framework:
uv run python -m a2a_research.agents.langgraph "hi"
uv run python -m a2a_research.agents.pydantic_ai "hi"
uv run python -m a2a_research.agents.smolagents "hi"

Replace a default agent by registering an executor factory, that is, a zero-argument callable that returns a new AgentExecutor instance (the base class comes from a2a.server.agent_execution):
from a2a_research.a2a import register_executor_factory
from a2a_research.agents.pocketflow.planner.main import PlannerExecutor
from a2a_research.models import AgentRole
register_executor_factory(AgentRole.PLANNER, PlannerExecutor)

Use any class or zero-argument callable that produces an executor; the built-in defaults are registered from a2a_research.a2a.registry._register_defaults.
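
The idea behind a role-scoped factory registry can be shown with a self-contained sketch. It does not mirror the internals of `a2a_research.a2a.registry`: `Role`, `register_factory`, `create_executor`, and both executor classes are hypothetical names invented for illustration.

```python
from enum import Enum
from typing import Callable

class Role(Enum):
    PLANNER = "planner"
    SEARCHER = "searcher"

class DefaultPlanner:
    def execute(self, payload: str) -> str:
        return f"default plan for {payload!r}"

class CustomPlanner:
    def execute(self, payload: str) -> str:
        return f"custom plan for {payload!r}"

# Maps each role to a zero-argument callable that builds a fresh executor.
_FACTORIES: dict[Role, Callable[[], object]] = {}

def register_factory(role: Role, factory: Callable[[], object]) -> None:
    """Register (or replace) the factory used for a role."""
    _FACTORIES[role] = factory

def create_executor(role: Role) -> object:
    """Build a new executor for the role; a class works because calling it is zero-argument."""
    try:
        factory = _FACTORIES[role]
    except KeyError:
        raise LookupError(f"no executor registered for {role.name}") from None
    return factory()

register_factory(Role.PLANNER, DefaultPlanner)   # built-in default
register_factory(Role.PLANNER, CustomPlanner)    # later registration wins
first = create_executor(Role.PLANNER)
second = create_executor(Role.PLANNER)
print(first.execute("What is RAG?"), first is second)
```

Storing factories rather than instances means each request gets its own executor, so per-request state does not leak between runs.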
See each folder's README.md for framework-specific details (streaming hooks, multi-turn patterns, DI, and security notes).
# Start UI (ensure .env has LLM_API_KEY; optional TAVILY_API_KEY)
make mesop
# Open http://localhost:32123

from a2a_research.workflow import run_research_sync
session = run_research_sync("What is retrieval-augmented generation?")
print(session.final_report)
for claim in session.claims:
    print(f"{claim.verdict.value} ({claim.confidence:.0%}): {claim.text}")
print(session.error)

The project includes a self-documenting Makefile. Run make or make help to see all available commands:
$ make
install Install package with uv
dev Full dev setup: install + activate pre-commit hooks
test Run pytest suite with coverage
watch Run pytest in watch mode (re-runs on file changes)
lint Run ruff linter
format Format code with ruff
format-check Check formatting without modifying files
typecheck Run mypy type checker
typecheck-ty Run ty type checker
check Run all quality checks (no tests)
all Run tests + all quality checks (CI-ready)
clean Remove build artifacts and cache directories
mesop Start Mesop UI (with MESOP_STATE_SESSION_BACKEND=memory)
htmlcov Generate HTML coverage report

Setup for development:
make dev # Install + activate pre-commit hooks

During development:
make test # Run tests
make watch # Run tests in watch mode (TDD)

Before committing:
make check # Run lint + typecheck + format-check (fast)
make all # Run everything including tests (CI-ready)

Run the UI:
make mesop # Start the Mesop UI dev server

You can also run tools directly without Make:
uv run ruff check src/ tests/ # lint
uv run ruff format src/ tests/ # format
uv run mypy src/ # type check (strict py311)
uv run ty check src/ # type check with ty
uv run pytest # run test suite

Install pre-commit hooks to catch issues before pushing:
pre-commit install

GitHub Actions runs linting, formatting checks, type checking, and tests on every push and pull request to main. See .github/workflows/ci.yml for details.