Observability
Two layers: structured logs always, span export when you opt in.
The server logs through structlog as JSON, written to stderr so the stdio transport keeps stdout reserved for JSON-RPC. Every tool call emits a tool_call event with the tool name and duration; failures emit tool_call_failed with the error:
{"timestamp": "2026-05-31T12:34:56", "level": "info", "event": "tool_call", "tool": "list_files", "duration_ms": 1.42}Pipe stderr to your log aggregator.
Span export is opt-in. Set an OTLP collector endpoint and every tool call is exported as a span:
MCP_OTEL_ENDPOINT=https://otel.your-domain.com:4317
MCP_SERVICE_NAME=mcp-server-toolkit
Each span is named tool.<name> and carries mcp.tool.name, mcp.tool.argument_count, mcp.tool.duration_ms, and mcp.tool.error when the handler raises. The span is created in Registry.call, so it wraps validation and execution and records exceptions. When MCP_OTEL_ENDPOINT is unset, tracing is a no-op and only the structured logs flow.
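The opt-in behaviour and the span attributes can be pictured with a small standard-library sketch. This is not the server's actual `Registry.call` code: `TracingConfig`, `load_tracing_config`, and `span_attributes` are hypothetical names, the fallback service name is an assumption, and storing the exception message in `mcp.tool.error` is a guess at the attribute's value.

```python
import os
from dataclasses import dataclass
from typing import Any, Dict, Mapping, Optional


@dataclass(frozen=True)
class TracingConfig:
    endpoint: Optional[str]
    service_name: str

    @property
    def enabled(self) -> bool:
        # Tracing is a no-op unless an endpoint was configured.
        return self.endpoint is not None


def load_tracing_config(env: Optional[Mapping[str, str]] = None) -> TracingConfig:
    """Read the two documented variables; an empty endpoint counts as unset."""
    env = os.environ if env is None else env
    endpoint = env.get("MCP_OTEL_ENDPOINT", "").strip() or None
    # Assumed default; the real server may use a different fallback.
    service_name = env.get("MCP_SERVICE_NAME", "mcp-server-toolkit")
    return TracingConfig(endpoint=endpoint, service_name=service_name)


def span_attributes(
    tool: str,
    arguments: Mapping[str, Any],
    duration_ms: float,
    error: Optional[BaseException] = None,
) -> Dict[str, Any]:
    """Build the attribute set described above for a span named tool.<name>."""
    attrs: Dict[str, Any] = {
        "mcp.tool.name": tool,
        "mcp.tool.argument_count": len(arguments),
        "mcp.tool.duration_ms": duration_ms,
    }
    if error is not None:
        # Assumption: the attribute holds the exception message.
        attrs["mcp.tool.error"] = str(error)
    return attrs


config = load_tracing_config({"MCP_OTEL_ENDPOINT": "https://otel.your-domain.com:4317"})
attrs = span_attributes("list_files", {"path": "."}, 1.42)
failed = span_attributes("list_files", {}, 3.0, error=ValueError("bad path"))
```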
GET /health is always open and returns:
{"ok": true, "tools_registered": 6, "uptime_seconds": 12345.6}Use it as a readiness or liveness probe; the container image already wires it into a Docker HEALTHCHECK.
| Signal | Threshold | Why |
|---|---|---|
| 5xx rate | > 1% over 5 min | Internal errors |
| Tool latency P99 (mcp.tool.duration_ms) | > 5s | Downstream slowness |
| 401 rate | sustained spike | Possible credential attack |
| 429 rate | sustained | Rate limit too tight or abuse |
| Memory growth | sustained over 1h | Likely leak |
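To make two of the thresholds in the table concrete, here is a small sketch that evaluates them over a window of samples. It assumes a nearest-rank percentile, which may differ from how your monitoring backend computes P99, and the function names are hypothetical; collecting the samples for the 5-minute window is left to the caller.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile; raises ValueError on empty input."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_alert(durations_ms: Sequence[float], threshold_ms: float = 5000.0) -> bool:
    """True when P99 of mcp.tool.duration_ms exceeds the 5 s threshold."""
    return percentile(durations_ms, 99) > threshold_ms


def error_rate_alert(total: int, errors: int, threshold: float = 0.01) -> bool:
    """True when the 5xx share of requests in the window exceeds 1%."""
    if total == 0:
        return False  # no traffic, nothing to alert on
    return errors / total > threshold


slow_window = [100.0] * 98 + [6000.0] * 2
calm_window = [100.0] * 100
```

With these samples, `latency_alert(slow_window)` fires because the 99th-ranked value is one of the 6000 ms calls, while `latency_alert(calm_window)` does not.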
The following are expected and usually do not warrant an alert:
- Occasional single-tool timeouts; the client retries.
- 401s from idle fuzzing of a public URL.
- Cold-start latency on the first request after a deploy.