QA platform for AI agents. Capture production traces, auto-generate chaos-tested regression suites, enforce agent contracts at the PR gate.
Tracq turns every production failure into a replayable test with fault injection, so the next version of your agent can't regress.
- Quick Start
- What Tracq Does
- Prerequisites
- Local Development
- SDK
- API
- Architecture
- Testing
- Production Deployment
- Scripts
- License
```bash
# 1. Start infrastructure
docker compose up -d

# 2. Start the API (keep it running in its own terminal)
cd apps/api && PYTHONPATH=. uv run uvicorn app.main:app --reload --port 8000

# 3. Start the frontend (in a second terminal, from the repo root)
cd apps/web && pnpm install && pnpm dev

# 4. Seed demo data (in a third terminal, from the repo root)
bash scripts/seed.sh
```

Open localhost:3000 and log in with demo@tracq.dev / tracq_demo_2026!.
- Trace capture — Auto-instruments your agent via OpenTelemetry. Captures system prompts, tool calls, LLM responses, costs, and latencies across any framework.
- World Model — Temporal knowledge graph (Graphiti + FalkorDB) that learns your agent's behavior patterns and failure modes over time.
- Auto-regression tests — Every production failure automatically becomes a deduplicated, replayable test case. No manual curation.
- Chaos simulation — Replays your agent with real LLM calls against faulted tool responses: 429s, timeouts, schema drift, partial objects, rate limits.
- Agent contracts — Auto-inferred from traces. Declares tools, permissions, budgets, SLOs, and routing constraints. Versioned, diffable, enforceable.
- PR gate — Blocks merges when agent behavior regresses. Explainable diffs show exactly what changed: tools added, permissions widened, cost-per-success increased.
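The chaos simulation idea can be illustrated with a small, self-contained sketch. This is not Tracq's implementation: the `FaultInjector` class, the exception types, and the fault names are hypothetical, chosen only to show how a tool function can be wrapped so that some of its responses are replaced by a timeout, a 429 error, schema drift, or a partial object.

```python
import random
from typing import Any, Callable


class ToolTimeout(Exception):
    """Raised in place of a tool response to simulate a timeout."""


class RateLimited(Exception):
    """Raised to simulate an HTTP 429 Too Many Requests response."""

    status = 429


# Hypothetical fault types, loosely mirroring the list above.
FAULTS = ("timeout", "rate_limit", "schema_drift", "partial_object")


class FaultInjector:
    """Wraps a tool function and injects a fault on roughly fault_rate of calls."""

    def __init__(self, fault_rate: float, seed: int = 0) -> None:
        self.fault_rate = fault_rate
        self.rng = random.Random(seed)  # seeded so a replay is reproducible

    def wrap(self, tool: Callable[..., dict]) -> Callable[..., dict]:
        def faulted(*args: Any, **kwargs: Any) -> dict:
            if self.rng.random() >= self.fault_rate:
                return tool(*args, **kwargs)  # no fault: pass the real response through
            fault = self.rng.choice(FAULTS)
            if fault == "timeout":
                raise ToolTimeout(f"{tool.__name__} timed out")
            if fault == "rate_limit":
                raise RateLimited(f"{tool.__name__} returned 429")
            response = tool(*args, **kwargs)
            if fault == "schema_drift":
                # Rename every key so callers expecting the old schema break.
                return {f"{key}_v2": value for key, value in response.items()}
            # partial_object: keep only the first half of the fields.
            keys = list(response)
            return {key: response[key] for key in keys[: len(keys) // 2]}

        return faulted
```

Seeding the random generator means the same sequence of faults is injected on every replay, which is what makes a faulted run comparable across two versions of an agent.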
| Tool | Version | Install |
|---|---|---|
| Docker | 24+ | docker.com |
| Python | 3.12+ | python.org |
| uv | latest | curl -LsSf https://astral.sh/uv/install.sh | sh |
| Node.js | 20+ | nodejs.org |
| pnpm | 9+ | npm install -g pnpm |
```bash
docker compose up -d
```

This starts:
| Service | Port | Purpose |
|---|---|---|
| PostgreSQL | 5432 | Metadata (users, orgs, contracts, test suites) |
| ClickHouse | 8123 | Trace analytics (traces, tool_calls, llm_calls) |
| Redis | 6379 | Cache and sessions |
| FalkorDB | 6380 | Knowledge graph (World Model) |
| MinIO | 9090 (API), 9091 (console) | S3-compatible artifact storage |
| Temporal | 7233 | Workflow orchestration |
| Temporal UI | 8080 | Workflow dashboard |
Default credentials for all dev services: tracq / tracq_dev.
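Before starting the API, it can help to confirm that each container is accepting connections. This is a minimal sketch using only the Python standard library; `DEV_SERVICES` copies the ports from the table above, and `port_open` / `check_ports` are hypothetical helpers, not part of the repo. An open TCP port only shows that something is listening, not that the service is healthy.

```python
import socket

# Host ports from the dev service table (all on localhost).
DEV_SERVICES = {
    "PostgreSQL": 5432,
    "ClickHouse": 8123,
    "Redis": 6379,
    "FalkorDB": 6380,
    "MinIO API": 9090,
    "MinIO console": 9091,
    "Temporal": 7233,
    "Temporal UI": 8080,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(services: dict[str, int], host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepts TCP connections."""
    return {name: port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, ok in check_ports(DEV_SERVICES).items():
        print(f"{name:<14} {'up' if ok else 'DOWN'}")
```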
```bash
cd apps/api
set -a && source ../../.env && set +a   # export every variable defined in the repo-root .env
PYTHONPATH=. uv run uvicorn app.main:app --reload --port 8000
```

API docs at localhost:8000/api/docs.
```bash
cd apps/web
pnpm install
pnpm dev
```

Frontend at localhost:3000.
```bash
cd apps/worker
PYTHONPATH=. uv run python -m tracq_worker.worker
```

The worker is required for background test suite execution and change intelligence workflows.
```bash
bash scripts/seed.sh
```

Creates a demo account, 2 agents, 12 traces (including failures), inferred contracts, and compiled regression tests. Login: demo@tracq.dev / tracq_demo_2026!.
Copy the example and fill in values:

```bash
cp .env.production.example .env
```

Key variables:
| Variable | Default (dev) | Description |
|---|---|---|
| DATABASE_URL | postgresql+asyncpg://tracq:tracq_dev@localhost:5432/tracq | PostgreSQL async connection |
| CLICKHOUSE_URL | http://tracq:tracq_dev@localhost:8123/tracq | ClickHouse HTTP endpoint |
| REDIS_URL | redis://localhost:6379/0 | Redis connection |
| S3_ENDPOINT | http://localhost:9090 | MinIO/S3 endpoint |
| JWT_SECRET | (generate) | Secret for signing JWTs; generate with openssl rand -hex 32 |
| TEMPORAL_ADDRESS | localhost:7233 | Temporal server |
Feature flags (all off by default): FF_TEMPORAL_WORKFLOW, FF_GITHUB_APP, FF_CHANGE_INTELLIGENCE, FF_ENFORCEMENT.
See apps/api/app/config.py for the full list.
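DATABASE_URL and CLICKHOUSE_URL each pack the user, password, host, port, and database name into one URL. A quick way to check what a value resolves to is the standard library's `urllib.parse.urlsplit`. This sketch (the `describe_url` helper is hypothetical) is only a sanity check and is independent of however apps/api/app/config.py reads its settings.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a connection URL into its parts for a quick sanity check."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        # The path holds the database name (or the Redis DB index) after the slash.
        "database": parts.path.lstrip("/") or None,
    }


print(describe_url("postgresql+asyncpg://tracq:tracq_dev@localhost:5432/tracq"))
```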
```bash
pip install tracq                # Core
pip install "tracq[openai]"      # + OpenAI auto-instrumentation
pip install "tracq[anthropic]"   # + Anthropic auto-instrumentation
pip install "tracq[langchain]"   # + LangChain/LangGraph
pip install "tracq[all]"         # Everything
```

```python
import tracq

tracq.init()
# That's it. Traces flow automatically for:
# OpenAI, Anthropic, LangChain, LangGraph, CrewAI, LlamaIndex
```

Set the TRACQ_API_KEY and TRACQ_BASE_URL environment variables (TRACQ_BASE_URL defaults to http://localhost:8000).
Async (recommended):

```python
from tracq import AsyncLogRun

lr = AsyncLogRun(api_key="tr_xxx")


async def search_kb(query):
    # Placeholder for your own knowledge-base tool
    return {"results": []}


@lr.trace("support-agent")
async def handle_request(user_input):
    result = await search_kb(user_input)
    lr.tool_call("search_kb", input={"q": user_input}, output=result)
    lr.llm_call("claude-sonnet-4", provider="anthropic",
                prompt_tokens=500, completion_tokens=100, cost_usd=0.01)
    return "Response"
```

Sync:
```python
from tracq import LogRun

lr = LogRun(api_key="tr_xxx")


@lr.trace("research-agent")
def handle_request(user_input):
    lr.tool_call("search", input={"q": user_input}, output={"results": 5})
    lr.llm_call("gpt-4o", provider="openai",
                prompt_tokens=500, completion_tokens=100)
    return "Response"
```

Context manager:
```python
# lr is a client created as in the examples above
with lr.start_trace("my-agent", name="custom-trace") as trace:
    trace.full_input = "User input"
    lr.tool_call("search", input={"q": "test"})
    trace.full_output = "Output"
```

Events and constraints:
```python
lr.event("payment_validated", {"amount": 99.99, "currency": "USD"})
lr.constraint("max_refund_amount", 500.0, "Maximum refund without manager approval")
```

Base path: /api/v1. Interactive docs at /api/docs.
Auth:

- POST /auth/signup: Create an account (email, name, org_name, password)
- POST /auth/login: Log in; returns a JWT
- POST /auth/api-keys: Create an API key with scopes (trace:write, trace:read, test:read, contract:read)
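A minimal sketch of calling the signup endpoint with only the standard library. It assumes the API accepts a JSON body containing the four documented fields and is running at the default http://localhost:8000; `build_signup_request` is a hypothetical helper, and the response is not parsed because its shape is not documented here.

```python
import json
import urllib.request

# Default dev API address plus the documented /api/v1 base path.
BASE_URL = "http://localhost:8000/api/v1"


def build_signup_request(
    email: str, name: str, org_name: str, password: str
) -> urllib.request.Request:
    """Build (but do not send) a POST /auth/signup request with a JSON body."""
    body = json.dumps(
        {"email": email, "name": name, "org_name": org_name, "password": password}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/auth/signup",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_signup_request("me@example.com", "Me", "Example Org", "change-me-please")
# To send it against a running API:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```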
Traces:

- POST /traces/: Ingest a single trace
- POST /traces/bulk: Bulk ingest
- GET /traces/: List traces with pagination
Contracts:

- POST /contracts/infer/{agent_id}: Auto-infer a contract from production traces
Tests:

- POST /tests/compile/{agent_id}/{trace_id}: Compile a regression test from a trace
- POST /tests/suites/{suite_id}/run: Execute a test suite
- GET /tests/runs/{run_id}: Poll run results
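Because GET /tests/runs/{run_id} is a polling endpoint, a client typically loops until the run finishes. Below is a sketch of such a loop with a deadline. The `fetch_run` callable, the `status` field, and the terminal status names are assumptions, since the response schema is not documented here; injecting `fetch_run` (and `sleep`) keeps the loop independent of any particular HTTP client.

```python
import time
from typing import Callable

# Assumed terminal statuses; check the interactive docs at /api/docs for the real values.
TERMINAL_STATUSES = {"passed", "failed", "error"}


def poll_run(
    fetch_run: Callable[[str], dict],
    run_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_run(run_id) until its status is terminal or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        run = fetch_run(run_id)
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} still {run.get('status')!r} after {timeout}s")
        sleep(interval)
```

In practice `fetch_run` would wrap an authenticated GET to /api/v1/tests/runs/{run_id} and return the decoded JSON body.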
Simulation:

- POST /simulation/: Run a chaos simulation with real LLM calls
Other: Agents, World Model, Fault Taxonomy, Conversations, Gate Decisions, Change Sets, Issues, Proactive Scenarios, Dashboard.
```
apps/
  api/           -> FastAPI backend (Python 3.12, Pydantic v2)
  web/           -> Next.js 16, React 19, Tailwind 4, shadcn/ui
  worker/        -> Temporal workers (suite runs, change intelligence)
packages/
  sdk-python/    -> Python SDK (async/sync clients, OTel exporter, auto-instrumentation)
  contracts/     -> Agent Contract schema (agent.yaml spec)
scripts/         -> seed.sh, deploy.sh, demo_sdk.py
```
Data stores:
- ClickHouse — Traces, tool calls, LLM calls (OLAP, columnar, fast aggregation)
- PostgreSQL 16 — Users, orgs, contracts, test suites, gate decisions (relational, ACID)
- FalkorDB + Graphiti — World Model (temporal knowledge graph of agent behavior)
- Redis — Cache, sessions, rate limiting
- MinIO — Replay artifacts, evidence bundles (S3-compatible)
Orchestration: Temporal for durable workflow execution (test suite runs, change intelligence pipelines).
```bash
# Each command runs from the repo root; the subshell keeps your working directory unchanged.

# Run all API tests
(cd apps/api && uv run pytest)

# Run a specific API test file
(cd apps/api && uv run pytest tests/test_auth_rbac.py)

# Run worker tests
(cd apps/worker && uv run pytest)

# Run contract schema tests
(cd packages/contracts && uv run pytest)

# Run SDK tests
(cd packages/sdk-python && uv run pytest)
```

Tests use real databases (no mocks), so make sure the stack from docker compose up -d is running.
```bash
cp .env.production.example .env.production
```

Generate a separate random value for each of POSTGRES_PASSWORD, CLICKHOUSE_PASSWORD, REDIS_PASSWORD, and JWT_SECRET by running this once per variable:

```bash
openssl rand -hex 32
```

Set your DOMAIN (e.g., tracq.dev).
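If openssl is not available, Python's `secrets` module gives an equivalent value: `secrets.token_hex(32)` returns 64 hex characters built from 32 random bytes, the same shape as `openssl rand -hex 32`. This sketch generates a distinct value per variable and prints lines you could paste into .env.production; the `generate_secrets` helper is hypothetical.

```python
import secrets

# Variables that each need their own random value.
SECRET_VARS = ("POSTGRES_PASSWORD", "CLICKHOUSE_PASSWORD", "REDIS_PASSWORD", "JWT_SECRET")


def generate_secrets(names=SECRET_VARS, nbytes: int = 32) -> dict[str, str]:
    """Return a fresh hex secret (2 * nbytes characters) for each variable name."""
    return {name: secrets.token_hex(nbytes) for name in names}


if __name__ == "__main__":
    for name, value in generate_secrets().items():
        print(f"{name}={value}")
```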
```bash
bash scripts/deploy.sh
```

This builds and starts all services via docker-compose.prod.yml, with Caddy as the reverse proxy (automatic HTTPS via Let's Encrypt).
Everything from the dev stack, plus:
- API container (FastAPI + Uvicorn)
- Web container (Next.js standalone build)
- Worker container (Temporal workers)
- Caddy reverse proxy (ports 80/443, automatic TLS)
| Script | Usage | What it does |
|---|---|---|
| scripts/seed.sh | bash scripts/seed.sh | Seeds demo account, agents, traces, contracts, and tests |
| scripts/deploy.sh | bash scripts/deploy.sh | Production deployment with Docker Compose + Caddy |
| scripts/demo_sdk.py | API_KEY=tr_xxx python scripts/demo_sdk.py | Demonstrates SDK decorator, context manager, and error capture |
| scripts/test_with_metaforms.py | API_KEY=tr_xxx python scripts/test_with_metaforms.py | Sends realistic multi-agent traces (5 agents, tool calls, failures) |
Proprietary.