The Open Control Plane for AI Agents
Framework-agnostic observability for AI agents—like LangGraph Studio, but works with ANY framework. Open-source alternative to Dynatrace, DataDog, and LangSmith.
The Observability Control Plane Race
In February 2026, Dynatrace repositioned observability as the "operating system for AI agents"—the control plane that coordinates execution, not just monitors it.
This creates vendor lock-in risks:
- Observability platforms become gatekeepers for agent deployment
- Framework-specific tools force architectural decisions
- Proprietary formats trap your data
- $100K-$500K+ annual licensing for production agents
Production teams need:
- Framework flexibility (teams use 2-3 frameworks, not one)
- Data sovereignty (your traces, your infrastructure)
- Cost control ($0 licensing vs $100K+/year)
- Migration safety (change frameworks without losing observability)
Market validation:
- 94% of production deployments need observability (InfoQ 2025)
- LangGraph rated S-tier specifically for visual debugging
- Vendor lock-in = #1 concern for production teams
As major vendors pivot observability from "insight" to "execution authority" (Dynatrace, Feb 2026), we're building the open alternative.
Control Plane Pattern:
- Observability isn't just monitoring—it's the coordination layer for agent operations
- Framework-agnostic: works with LangChain, CrewAI, AutoGen, raw Python
- Data sovereignty: your traces, your infrastructure, your control
What you get:
- Visual execution traces - See exactly what your agent did, step-by-step
- Step-level debugging - Inspect inputs, outputs, LLM calls, reasoning
- Production monitoring - Real-time alerts, cost tracking, quality metrics
- Framework-agnostic - One tool for all your agents (no vendor lock-in)
- Open-source - Apache 2.0 license, self-hosted, full control
Cost comparison (100 agents, 3 years):
- Agent Observability Kit: ~$10K-$18K (infrastructure only)
- Dynatrace: ~$300K-$900K (licensing + infrastructure)
- DataDog: ~$240K-$720K (licensing + infrastructure)
```bash
pip install agent-observability-kit
```

```python
from agent_observability import observe, trace, init_tracer
from agent_observability.span import SpanType

# Initialize
tracer = init_tracer(agent_id="my-agent")

# Decorate your functions
@observe(span_type=SpanType.AGENT_DECISION)
def choose_action(state):
    # Your agent logic here
    action = my_llm.predict(state)
    return action

# Or use context managers
with trace("my_agent_run"):
    result = choose_action(current_state)
```

```python
from agent_observability.integrations import LangChainCallbackHandler

# Add to your LangChain calls
handler = LangChainCallbackHandler(agent_id="my-agent")
chain.run(
    input="query",
    callbacks=[handler]  # ← Automatic tracing!
)
```

```bash
# Start the web UI
python server/app.py

# Open browser
open http://localhost:5000
```

Every trace includes:
```json
{
  "trace_id": "tr_abc123",
  "agent_id": "customer-service-agent",
  "framework": "langchain",
  "spans": [
    {
      "name": "classify_intent",
      "span_type": "agent_decision",
      "inputs": {"query": "Why was I charged twice?"},
      "outputs": {"intent": "billing_issue"},
      "llm_calls": [
        {
          "model": "claude-3-5-sonnet",
          "prompt": "Classify this query: ...",
          "response": "billing_issue",
          "tokens": {"input": 234, "output": 12},
          "latency_ms": 450,
          "cost": 0.0023
        }
      ],
      "duration_ms": 520,
      "status": "success"
    }
  ]
}
```

The Phase 2 dashboard now includes:
- Framework Distribution - Visual breakdown of traces by framework (🟦 🟩 🟧)
- Performance by Framework - Average latency and success rate per framework
- Active Adapters - Real-time status of detected frameworks
- Smart Filters - One-click filtering by framework
```
┌─────────────────────────────────────┐
│ Trace: Customer Service Flow │
├─────────────────────────────────────┤
│ │
│ [User Query] │
│ ↓ │
│ ┌─────────────┐ │
│ │ Classify │ 🟢 250ms │
│ │ Intent │ │
│ └─────────────┘ │
│ ↓ │
│ ┌─────────────┐ │
│ │ Check │ 🟢 150ms │
│ │ Order │ │
│ └─────────────┘ │
│ ↓ │
│ ┌─────────────┐ │
│ │ Generate │ 🟢 340ms │
│ │ Response │ │
│ └─────────────┘ │
│ ↓ │
│ [Response to User] │
│ │
└─────────────────────────────────────┘
```
Click any node to see:
- Full LLM prompt & response
- Input/output data
- Token usage & cost
- Error details (if failed)
```python
from agent_observability.integrations import LangChainCallbackHandler

handler = LangChainCallbackHandler(agent_id="my-agent")
chain.run(input="...", callbacks=[handler])
```

```python
from agent_observability import init_tracer

# Auto-detects CrewAI!
tracer = init_tracer(agent_id="my-crew")

# Use CrewAI normally - automatically traced!
crew.kickoff()
```

→ Full CrewAI Integration Guide

```python
from agent_observability import init_tracer

# Auto-detects AutoGen!
tracer = init_tracer(agent_id="my-agents")

# All agent messages automatically traced!
user.initiate_chat(assistant, message="...")
```

→ Full AutoGen Integration Guide

```python
from agent_observability import observe
from agent_observability.span import SpanType

@observe(span_type=SpanType.AGENT_DECISION)
def my_agent_function(input):
    return process(input)
```

```python
from agent_observability import init_tracer

# Auto-detects ALL frameworks!
tracer = init_tracer(agent_id="hybrid-system")

# LangChain, CrewAI, AutoGen - all in ONE trace!
langchain_chain.run(...)   # Traced
crew.kickoff(...)          # Traced
user.initiate_chat(...)    # Traced

# View unified trace at http://localhost:5000
```

```
projects/observability-toolkit/
├── src/agent_observability/
│   ├── tracer.py            # Core tracing SDK
│   ├── storage.py           # Trace persistence
│   ├── span.py              # Data structures
│   └── integrations/        # Framework plugins
│       ├── langchain.py
│       └── openclaw.py
├── server/
│   ├── app.py               # Flask web server
│   └── static/              # Web UI
│       ├── index.html
│       ├── trace-viewer.html
│       └── style.css
├── examples/
│   ├── basic_example.py
│   └── langchain_example.py
└── tests/
```
Core SDK:
- ✅ Universal tracing decorators (`@observe`)
- ✅ Context managers (`with trace()`)
- ✅ LLM call tracking
- ✅ Error capture (see the sketch after this list)
- ✅ JSON-based storage
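Error capture goes through the same decorator path as the earlier `@observe` example: if a decorated function raises, the failure should land on the span rather than disappear. A minimal sketch, assuming the `init_tracer`, `@observe`, and `SpanType.AGENT_DECISION` API shown above; exactly how the exception is recorded (status value, error fields) is an assumption based on the trace schema, not a documented guarantee:

```python
from agent_observability import init_tracer, observe
from agent_observability.span import SpanType

tracer = init_tracer(agent_id="billing-agent")

@observe(span_type=SpanType.AGENT_DECISION)
def lookup_invoice(invoice_id: str) -> dict:
    # Simulate a downstream failure so the resulting span records an error
    raise TimeoutError(f"billing API timed out for {invoice_id}")

try:
    lookup_invoice("inv_123")
except TimeoutError:
    # Assumption: the decorator captures the exception details on the span
    # (e.g. a non-"success" status plus error details) and re-raises, so the
    # failed step is inspectable in the trace viewer like any other step.
    pass
```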
Framework Integrations:
- ✅ LangChain callback handler
- ✅ CrewAI adapter (auto-detection)
- ✅ AutoGen adapter (auto-detection)
- ✅ OpenClaw native support
- ✅ Multi-framework tracing (single trace, multiple frameworks)
Web UI:
- ✅ Trace list with filtering
- ✅ Execution graph visualization
- ✅ Step-level inspection
- ✅ LLM call details
- ✅ Real-time updates
- ✅ Framework badges in UI (🟦 LangChain, 🟩 CrewAI, 🟧 AutoGen)
- ✅ Framework filters (show/hide by framework)
- ✅ Framework-specific detail panels
- ✅ Multi-framework insights dashboard
- ✅ Adapter status indicators
- ✅ Framework-specific color coding in timeline
Roadmap:
- Interactive debugging (pause/resume traces)
- Trace comparison (before/after optimization)
- AI-powered root cause analysis
- Performance profiling
- Real-time dashboards
- Cost tracking & alerts
- Quality metrics (accuracy, latency, success rate)
- Anomaly detection (ML-based)
- Multi-tenancy
- Role-based access control
- Self-hosted deployment (Docker, K8s)
- PII redaction
- Compliance (SOC2, GDPR)
```bash
cd examples
python basic_example.py
```

This generates several demo traces showing:
- Successful multi-step workflows
- Error handling
- LLM call tracking
- Performance metrics
```bash
export OPENAI_API_KEY="sk-..."
python langchain_example.py
```

```bash
cd server
python app.py

# Open http://localhost:5000
```

Performance:
- <1% latency impact (async data collection)
- <5MB memory per 1000 traces
- No blocking I/O (background storage)
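"No blocking I/O" means trace writes never sit on the agent's hot path. The kit's actual internals are not shown in this README; the following is a generic sketch of the queue-plus-worker-thread pattern that "async data collection" and "background storage" refer to:

```python
import json
import queue
import threading
from pathlib import Path

class BackgroundTraceWriter:
    """Generic sketch: hand finished traces to a worker thread so the
    agent never blocks on disk I/O. Not the kit's actual implementation."""

    def __init__(self, out_dir: str = "~/.openclaw/traces"):
        self.out_dir = Path(out_dir).expanduser()
        self.out_dir.mkdir(parents=True, exist_ok=True)
        self._queue: "queue.Queue[dict]" = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, trace: dict) -> None:
        # Called on the hot path: enqueue only, return immediately.
        self._queue.put(trace)

    def _drain(self) -> None:
        # Runs in the background thread: persist one trace per JSON file.
        while True:
            trace = self._queue.get()
            path = self.out_dir / f"{trace['trace_id']}.json"
            path.write_text(json.dumps(trace, indent=2))
            self._queue.task_done()

writer = BackgroundTraceWriter()
writer.write({"trace_id": "tr_demo", "agent_id": "demo", "spans": []})
```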
Storage:
- Default: JSON files in `~/.openclaw/traces/` (see the sketch after this list)
- Production: ClickHouse, TimescaleDB, or S3
- Retention: Configurable (default 90 days)
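Because the default backend is plain JSON files in the schema shown earlier, traces can be analyzed with standard tooling and no extra dependencies. A minimal sketch, assuming one trace per `.json` file under the default directory and the field names from the example trace above:

```python
import json
from pathlib import Path

trace_dir = Path("~/.openclaw/traces").expanduser()

for trace_file in sorted(trace_dir.glob("*.json")):
    trace = json.loads(trace_file.read_text())
    spans = trace.get("spans", [])
    llm_calls = [call for span in spans for call in span.get("llm_calls", [])]

    # Roll up cost, latency, and failures across the whole trace
    total_cost = sum(call.get("cost", 0.0) for call in llm_calls)
    total_ms = sum(span.get("duration_ms", 0) for span in spans)
    failed = [span["name"] for span in spans if span.get("status") != "success"]

    print(
        f"{trace.get('trace_id')}: {trace.get('framework')} | "
        f"{len(spans)} spans | {total_ms} ms | ${total_cost:.4f} | "
        f"failed: {failed or 'none'}"
    )
```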
Privacy:
- Local-first: All data stored on your machine
- No telemetry: We don't collect anything
- Redaction: Optional PII masking (emails, SSNs, etc.)
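Redaction here means masking obvious PII before a trace is persisted or shared. The kit's own redaction hook is not documented in this README, so the following is a generic sketch of the kind of masking the bullet refers to (email and US SSN patterns only; real deployments need a broader policy):

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    """Mask emails and SSN-shaped strings in a span's inputs/outputs."""
    text = EMAIL.sub("[REDACTED_EMAIL]", text)
    return SSN.sub("[REDACTED_SSN]", text)

print(redact("Contact jane@example.com, SSN 123-45-6789, about the refund."))
# -> Contact [REDACTED_EMAIL], SSN [REDACTED_SSN], about the refund.
```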
We're in active development! Contributions welcome:
- Fork the repo
- Create a feature branch
- Add tests
- Submit PR
Priority areas:
- Additional framework integrations (beyond LangChain, CrewAI, and AutoGen)
- Production monitoring features
- Performance optimizations
Apache 2.0 - See LICENSE
Inspired by:
- LangGraph Studio - Best-in-class visual debugging
- LangSmith - Production observability for LLMs
- OpenTelemetry - Distributed tracing standard
Built by Kai 🌊 (itskai.dev). Open-source, framework-agnostic observability for AI agents.
From Discovery #10:
"LangGraph is S-tier specifically because of state graph debugging and visual execution traces. The most-read Data Science Collective article in 2025 was about LangGraph debugging."
Visual debugging is why developers choose frameworks.
We're making that capability universal—no framework lock-in.
Questions? Open an issue or reach out via GitHub issues or itskai.dev
Star the repo if you find this useful! ⭐

