Template repository for building protoLabs A2A agents on LangGraph.

The purpose of this repo is to keep the boring parts — A2A spec handling, cost/extension emission, tracing, release pipeline — stable across every agent in the fleet, so that forking an agent comes down to rewriting `SOUL.md`, `graph/prompts.py`, and `tools/lg_tools.py`, and not much else.
Canonical reference implementation: protoLabsAI/quinn. Quinn was the first agent built on this template — it's a good example of what a filled-in fork looks like end-to-end.
Try it in 5 minutes: clone, `pip install -r requirements.txt`, `python server.py`, open http://localhost:7870, and walk the setup wizard — no forking, no sed, no Docker required to get your first agent talking. See the first-agent tutorial.
When you're ready to ship your own: click "Use this template" at the top of the GitHub repo, then follow Customize & deploy for the fork / rename / release-pipeline wiring.
| Concern | Where it lives | What it does |
|---|---|---|
| A2A server | `a2a_handler.py` | JSON-RPC 2.0 over `/a2a`, SSE streaming, `tasks/*` lifecycle, push notifications, well-known agent card, dual token-shape parsing |
| Agent runtime | `graph/agent.py`, `server.py` | LangGraph `create_agent()` wired to the A2A handler, with streaming token capture for cost-v1 |
| LLM gateway | `graph/llm.py` | OpenAI-compatible client pointed at LiteLLM — swap models by editing the gateway config, not the fork |
| Subagents | `graph/subagents/config.py` | DeerFlow-pattern delegation via a `task()` tool; one placeholder worker ships |
| Starter tools | `tools/lg_tools.py` | Free, keyless tools so a fresh fork can demo real behaviour: `echo`, `current_time`, `calculator` (safe AST eval), `web_search` (DuckDuckGo), `fetch_url` |
| Tracing | `tracing.py` | Langfuse `trace_session` with distributed `a2a.trace` propagation and the OTel cross-context-detach filter |
| Observability | `metrics.py`, `audit.py` | Prometheus metrics with per-agent prefix, JSONL audit log with trace IDs |
| Output protocol | `graph/output_format.py` | `<scratch_pad>` / `<output>` parsing so the model can think without it leaking to users |
| UI | `chat_ui.py`, `static/` | Gradio chat with PWA shell, dark theme, offline fallback |
| Release pipeline | `.github/workflows/*.yml` | Autonomous semver bumps, GHCR image push, GitHub release with filtered notes, optional Discord post |
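The starter-tools row mentions a calculator built on safe AST evaluation. A minimal sketch of that pattern — walk the parsed expression tree and allow only whitelisted arithmetic operators, never calling `eval()` — might look like this (names are illustrative, not the actual `tools/lg_tools.py` implementation):

```python
import ast
import operator

# Whitelisted operators -- any node type outside this map raises,
# so arbitrary Python (imports, attribute access, calls) can't run.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def calculator(expression: str) -> float:
    """Evaluate a basic arithmetic expression without eval()."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return _eval(ast.parse(expression, mode="eval").body)
```

Anything the whitelist doesn't cover — `__import__("os")`, attribute lookups, string constants — raises instead of executing, which is what makes the tool safe to expose keylessly.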
```bash
# 1. Get the code (no fork needed for a first run)
git clone https://github.com/protoLabsAI/protoAgent.git my-agent
cd my-agent

# 2. Install deps into a venv
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 3. Run the server — no env vars required
python server.py

# 4. Open the wizard — pick your endpoint, pick a model, name the
#    agent, pick a persona preset, hit Launch. The chat UI appears
#    on the same page.
open http://localhost:7870
```

The first-agent tutorial walks through every wizard step with screenshots.
Once you're happy and want to ship it as your own image in your own GHCR: Customize & deploy.
```
┌──────────────┐     A2A JSON-RPC + SSE      ┌─────────────────┐
│   Consumer   │ ──────────────────────────▶ │   a2a_handler   │
│  (any A2A    │                             │   (FastAPI)     │
│   client)    │ ◀──── cost-v1 DataPart ─────│                 │
└──────────────┘                             └────────┬────────┘
                                                      │
                                                      ▼
                                             ┌─────────────────┐
                                             │  graph/agent.py │
                                             │  (LangGraph     │
                                             │   create_agent) │
                                             └────────┬────────┘
                                                      │
                                                      ▼
                                             ┌─────────────────┐
                                             │     LiteLLM     │ ← model selection
                                             │     gateway     │   lives here,
                                             └─────────────────┘   not in code
```
The A2A handler never talks to the LLM directly — it submits a
message to the LangGraph runtime, which owns the tool loop, the
subagent task() delegation, and the structured-output protocol.
| URI | Declared on card | Emitted at runtime |
|---|---|---|
| cost-v1 (`https://protolabs.ai/a2a/ext/cost-v1`) | Yes | Yes — every terminal task carries a cost-v1 DataPart with token usage + `durationMs` |
| `a2a.trace` propagation | No (it's a protocol convention, not a card extension) | Yes — reads the caller's Langfuse trace context from `params.metadata["a2a.trace"]` and nests this agent's trace under it |
Declare additional extensions on the card in `server.py::_build_agent_card` when your agent's skills actually mutate shared state (see effect-domain-v1 in the Workstacean docs for when this applies).
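As a rough sketch of what a terminal task's cost-v1 DataPart could contain — the field names here are hypothetical; the extension spec at the URI above is authoritative:

```python
def cost_v1_part(prompt_tokens: int, completion_tokens: int, duration_ms: int) -> dict:
    """Hypothetical shape of the cost-v1 DataPart attached to a terminal task.

    Field names are illustrative; consult the cost-v1 extension spec for
    the real schema.
    """
    return {
        "kind": "data",
        "data": {
            "extension": "https://protolabs.ai/a2a/ext/cost-v1",
            "usage": {
                "promptTokens": prompt_tokens,
                "completionTokens": completion_tokens,
                "totalTokens": prompt_tokens + completion_tokens,
            },
            "durationMs": duration_ms,
        },
    }
```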
The A2A handler supports both token shapes the spec permits. Both produce `Authorization: Bearer shared-secret` on outgoing webhooks. If your fork is getting 401s on callbacks, check which shape the consumer is sending before changing anything — `_extract_push_token` in `a2a_handler.py` reads both, and the test suite covers both.
| What | Where | How to use |
|---|---|---|
| Prometheus metrics | `/metrics` | Scrape; metric prefix is `AGENT_NAME_*` (sanitised) |
| JSONL audit log | `/sandbox/audit/audit.jsonl` | `jq` for forensic replay; every entry has `trace_id` |
| Langfuse traces | `LANGFUSE_*` env vars | Trace tag is `AGENT_NAME`, so filter by tag to find this agent's runs |
| Container logs | `docker logs <container>` | INFO is the default — `LOG_LEVEL=DEBUG` for more |
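Forensic replay of the audit log doesn't require `jq` — since every entry carries a `trace_id`, a few lines of Python do the same filtering (helper name is illustrative):

```python
import json
from pathlib import Path


def entries_for_trace(audit_path: str, trace_id: str) -> list[dict]:
    """Return every audit entry belonging to one trace, in file order.

    Assumes the JSONL layout described above: one JSON object per line,
    each carrying a "trace_id" field.
    """
    records = [
        json.loads(line)
        for line in Path(audit_path).read_text().splitlines()
        if line.strip()
    ]
    return [r for r in records if r.get("trace_id") == trace_id]
```

Pair the `trace_id` you find here with the Langfuse tag filter to see the same run from both ends.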
The included GitHub Actions pipeline is optional but opinionated.
- On every merge to `main` → `docker-publish.yml` builds and pushes `ghcr.io/protolabsai/<image>:latest` + `sha-<short>`. Watchtower (or similar) can poll `latest` for auto-deploy.
- When a non-release PR merges → `prepare-release.yml` opens a "chore: release vX.Y.Z" bump PR, auto-merges it, and pushes a semver tag.
- When a semver tag lands → `release.yml` builds and pushes the stable semver Docker tags, creates a GitHub release with filtered notes, and posts a Discord embed via `scripts/post-release-notes.mjs`.
All three workflows gate on `github.repository == 'protoLabsAI/<name>'` so they no-op on clones that haven't updated the owner — this avoids surprise releases on forks. Update the repo check in all three files when forking.
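In workflow YAML, that gate is a one-line `if:` condition on the job; the job name below is illustrative, and a fork would swap in its own `owner/name`:

```yaml
jobs:
  publish:
    # No-ops on clones until the owner/name matches this repo
    if: github.repository == 'protoLabsAI/protoAgent'
    runs-on: ubuntu-latest
```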
- Python 3.12+
- Docker (for the bundled deployment)
- A LiteLLM-compatible OpenAI gateway somewhere on the network (see `config/langgraph-config.yaml`)
- Optional: Langfuse, Prometheus, Discord webhook
protoAgent includes an end-to-end skill loop where successful subagent workflows are captured as reusable skills, retrieved automatically on future tasks, and periodically optimised by the skill curator.
| Component | Where it lives | What it does |
|---|---|---|
| Skill emission | `graph/extensions/skills.py` | Captures `task()` results as `SkillV1Artifact` when `emit_skill=True` |
| Skill index | `/sandbox/skills/index.jsonl` | JSONL store of accumulated skill recipes |
| Knowledge injection | `graph/middleware/knowledge.py` | Queries the index before each LLM call, injects top-k matching skills as context |
| Skill curator | `graph/skills/curator.py` | Periodic agent that deduplicates, decays, and prunes the skill index |
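To make the knowledge-injection step concrete, here is a sketch of a pre-call retrieval pass over the JSONL index. The word-overlap scorer and the `description` field are assumptions for illustration — the real middleware lives in `graph/middleware/knowledge.py` and may rank differently:

```python
import json
from pathlib import Path


def top_k_skills(index_path: str, query: str, k: int = 3) -> list[dict]:
    """Score stored skills against the incoming task text, return top-k matches.

    Illustrative sketch: ranks by word overlap between the query and each
    skill's "description" field, dropping skills with no overlap at all.
    """
    query_words = set(query.lower().split())
    skills = [
        json.loads(line)
        for line in Path(index_path).read_text().splitlines()
        if line.strip()
    ]

    def score(skill: dict) -> int:
        return len(query_words & set(skill.get("description", "").lower().split()))

    ranked = sorted(skills, key=score, reverse=True)
    return [s for s in ranked[:k] if score(s) > 0]
```

Whatever survives the cut gets injected into the LLM context ahead of the call, so earlier successful recipes steer the new run.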
```bash
# Dry-run — see what would change without touching the index
python -m graph.skills.curator --dry-run

# Full curation pass (reads index.jsonl, writes audit.jsonl)
python -m graph.skills.curator
```

The curator applies a 90-day confidence half-life (a skill's confidence halves for every 90 days it goes unused), clusters near-duplicate skills by similarity and keeps the highest-confidence copy, and prunes any skill whose confidence has fallen below 0.2.
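The decay and prune rules above reduce to two small formulas. A sketch with the constants from the docs (function names are illustrative, not the curator's real API):

```python
# Constants from the curator docs: 90-day half-life, prune below 0.2
HALF_LIFE_DAYS = 90.0
PRUNE_THRESHOLD = 0.2


def decayed_confidence(confidence: float, days_unused: float) -> float:
    """Exponential decay: confidence halves for every 90 unused days."""
    return confidence * 0.5 ** (days_unused / HALF_LIFE_DAYS)


def should_prune(confidence: float, days_unused: float) -> bool:
    """A skill is pruned once its decayed confidence drops below 0.2."""
    return decayed_confidence(confidence, days_unused) < PRUNE_THRESHOLD
```

So a skill that starts at 1.0 survives roughly 200 unused days before crossing the 0.2 threshold, while a 0.5-confidence skill crosses it in about 120.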
See docs/tutorials/skill-loop.md for a complete end-to-end example and cron setup.
This is a template repo — bugs and improvements to the shared
runtime (a2a_handler.py, graph/agent.py, extension support,
release pipeline) land here. Domain-specific agent logic lives
in the fork, not here.