Observability
sarmakska edited this page May 3, 2026 · 2 revisions
Three layers, all on by default: structured logs, OpenTelemetry traces, and a health endpoint.
Every tool call logs as JSON via structlog:

```json
{"timestamp": "2026-05-03T12:34:56", "level": "info", "event": "tool_call", "tool": "search_docs", "duration_ms": 142, "user": "anon"}
```

Pipe stdout to your log aggregator (BetterStack, Axiom, Datadog, Loki).
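
The project itself uses structlog; the sketch below is a stdlib-only approximation of the same record shape, assuming each tool is wrapped by a decorator. `log_tool_call` and its `stream` parameter are hypothetical names, not part of this project's API, and logging failed calls at `error` level is likewise an assumption. It writes one JSON object per line, the format most aggregators ingest from stdout.

```python
import json
import sys
import time
from functools import wraps
from typing import Any, Callable, Optional, TextIO


def log_tool_call(
    tool: str, user: str = "anon", stream: Optional[TextIO] = None
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical decorator: time a tool call and emit one JSON log line."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            level = "info"
            try:
                return fn(*args, **kwargs)
            except Exception:
                level = "error"  # assumption: failed calls are logged at error level
                raise
            finally:
                # Runs on success and failure, so every call produces exactly one line.
                record = {
                    "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime()),
                    "level": level,
                    "event": "tool_call",
                    "tool": tool,
                    "duration_ms": round((time.perf_counter() - start) * 1000),
                    "user": user,
                }
                out = stream if stream is not None else sys.stdout
                out.write(json.dumps(record) + "\n")

        return wrapper

    return decorator
```

Swapping the hand-built dict for structlog's `JSONRenderer` processor gives the same one-object-per-line output with less code.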
Set `MCP_OTEL_ENDPOINT` to your collector:

```bash
MCP_OTEL_ENDPOINT=https://otel.your-domain.com:4317
```

Each tool call gets a span. Spans include the tool name, duration, success/failure, and any custom attributes you add via `tracer.start_span()`.
`GET /health` returns:

```json
{"ok": true, "tools_registered": 7, "uptime_seconds": 12345}
```

Use it as a readiness probe in Kubernetes or as an uptime check from your monitoring service.
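
A small probe sketch, assuming the server listens on `http://localhost:8000` (hypothetical; substitute your own host and port). It counts the server as healthy only on HTTP 200 with `"ok": true` and at least one registered tool; the tool-count check is an extra assumption, not something the endpoint documents.

```python
import http.client
import json
import urllib.request
from typing import List

DEFAULT_URL = "http://localhost:8000/health"  # hypothetical host and port


def is_healthy(body: bytes) -> bool:
    """Accept a /health payload only if it reports ok and at least one tool."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        return False
    if not isinstance(payload, dict):
        return False
    tools = payload.get("tools_registered", 0)
    return payload.get("ok") is True and isinstance(tools, int) and tools > 0


def check(url: str = DEFAULT_URL, timeout: float = 3.0) -> bool:
    """GET the endpoint; connection errors, HTTP errors, and timeouts count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except (OSError, http.client.HTTPException):
        return False


def main(argv: List[str]) -> int:
    """Exit code for a probe: 0 when healthy, 1 otherwise."""
    target = argv[0] if argv else DEFAULT_URL
    return 0 if check(target) else 1
```

To run it as an exec probe or a scheduled uptime check, finish the script with `if __name__ == "__main__": sys.exit(main(sys.argv[1:]))`.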

Alerts worth setting:

| Signal | Threshold | Why |
|---|---|---|
| 5xx rate | > 1% over 5 min | Internal errors |
| Tool latency P99 | > 5 s | Downstream slowness |
| Auth failure rate | > 10/min | Possible attack |
| Memory growth | Sustained over 1 h | Likely leak |
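
Most monitoring services evaluate rules like these for you, but the arithmetic is simple enough to sketch. `RequestWindow` below is hypothetical: it keeps the last 5 minutes of requests in memory, computes the 5xx rate and a nearest-rank P99 latency, and reports which of the first two rows would fire. It assumes each recorded request is one tool call; auth-failure counts and memory trends would need their own inputs.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple

ERROR_RATE_THRESHOLD = 0.01  # 1%, from the table
P99_THRESHOLD_S = 5.0        # 5 s, from the table


@dataclass
class RequestWindow:
    """Hypothetical in-memory window of (timestamp, status, latency) samples."""

    window_s: float = 300.0  # 5 minutes, matching the 5xx-rate rule
    _events: Deque[Tuple[float, int, float]] = field(
        default_factory=deque, init=False, repr=False
    )

    def record(self, ts: float, status: int, latency_s: float) -> None:
        """Add one request; timestamps are assumed to arrive in order."""
        self._events.append((ts, status, latency_s))
        self._evict(ts)

    def _evict(self, now: float) -> None:
        # Drop samples that have fallen out of the window.
        cutoff = now - self.window_s
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def error_rate(self, now: float) -> float:
        self._evict(now)
        if not self._events:
            return 0.0
        errors = sum(1 for _, status, _ in self._events if 500 <= status <= 599)
        return errors / len(self._events)

    def p99_latency(self, now: float) -> Optional[float]:
        self._evict(now)
        if not self._events:
            return None
        latencies = sorted(lat for _, _, lat in self._events)
        rank = -(-len(latencies) * 99 // 100)  # nearest rank: ceil(0.99 * n) in integers
        return latencies[rank - 1]

    def firing(self, now: float) -> List[str]:
        """Return the table rows whose thresholds are currently exceeded."""
        fired = []
        if self.error_rate(now) > ERROR_RATE_THRESHOLD:
            fired.append("5xx rate")
        p99 = self.p99_latency(now)
        if p99 is not None and p99 > P99_THRESHOLD_S:
            fired.append("Tool latency P99")
        return fired
```

Nearest rank always returns an observed sample; many monitoring backends interpolate between samples instead, so their P99 can differ slightly from this one.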

Not worth alerting on:

- Occasional timeouts on single tool calls. The client-side retry handles them.
- Scattered 401s when nobody is in the office and someone is fuzzing your URL.
- Cold-start latency on the first request after a deploy.