Commit a73f60b

feat: enhance telemetry with Prometheus metrics and comprehensive instrumentation (#592)
* feat: enhance telemetry with Prometheus metrics and comprehensive instrumentation

  - Add Prometheus HTTP server for metrics exposure on configurable port
  - Add comprehensive metrics collection:
    - Tool execution counters and duration histograms
    - Active conversation gauges
    - LLM request counters by provider/model/success
  - Add OpenAI and Anthropic API auto-instrumentation
  - Add distributed tracing to chat.step, API v2 operations, and tool execution
  - Update documentation with Prometheus setup, metrics examples, and queries
  - Add telemetry configuration via PROMETHEUS_PORT and PROMETHEUS_ADDR env vars

  This provides comprehensive observability for gptme operations including:

  - Performance monitoring and bottleneck identification
  - Resource usage tracking
  - API call success rates and latencies
  - Tool usage patterns and execution times

* minor fixes
* fix: prepare for otlp setup
* fix: fixed typing
* fix: improve tool call telemetry
1 parent 92b6818 commit a73f60b

File tree

11 files changed: +321 −41 lines

.github/workflows/lint.yml

Lines changed: 1 addition & 1 deletion

@@ -40,7 +40,7 @@ jobs:
       - name: Install dependencies
         run: |
           make build
-          poetry install -E server -E browser
+          poetry install -E server -E browser -E telemetry
           poetry run pip install tomli tomli_w
       - name: Typecheck
         run: |

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion

@@ -15,7 +15,7 @@ repos:
     rev: v1.12.0
     hooks:
       - id: mypy
-        additional_dependencies: [types-tabulate, types-docutils, tomli, tomli_w, opentelemetry-api, opentelemetry-sdk]
+        additional_dependencies: [types-tabulate, types-docutils, tomli, tomli_w, opentelemetry-api, opentelemetry-sdk, prometheus-client, prompt_toolkit, click, pytest, openai, rich, tomlkit]
         args: [--ignore-missing-imports, --check-untyped-defs]
   - repo: local
     hooks:

docs/contributing.rst

Lines changed: 80 additions & 4 deletions

@@ -67,37 +67,113 @@ To enable telemetry during development:
        -p 9411:9411 \
        cr.jaegertracing.io/jaegertracing/jaeger:latest

-3. Set the telemetry environment variable:
+3. (Optional) Run Prometheus for metrics collection:
+
+   .. code-block:: bash
+
+      # Simple default Prometheus
+      docker run --name prometheus -d -p 127.0.0.1:9090:9090 prom/prometheus
+
+      # Or with custom config to scrape gptme metrics on port 8000
+      cat > scripts/prometheus.yml << EOF
+      global:
+        scrape_interval: 15s
+      scrape_configs:
+        - job_name: 'gptme'
+          static_configs:
+            - targets: ['host.docker.internal:8000']
+          metrics_path: '/metrics'
+      EOF
+
+      docker run --rm --name prometheus \
+        -p 9090:9090 \
+        -v $(pwd)/scripts/prometheus.yml:/etc/prometheus/prometheus.yml \
+        prom/prometheus --enable-feature=otlp-write-receiver
+
+4. Set the telemetry environment variables:

    .. code-block:: bash

       export GPTME_TELEMETRY_ENABLED=true
       export OTLP_ENDPOINT=http://localhost:4317  # optional (default)
+      export PROMETHEUS_PORT=8000  # optional (default)
+      export PROMETHEUS_ADDR=0.0.0.0  # optional (default: localhost, use 0.0.0.0 for Docker access)

-4. Run gptme:
+5. Run gptme:

    .. code-block:: bash

       poetry run gptme 'hello'
       # or gptme-server
       poetry run gptme-server

-5. View traces in Jaeger UI:
+6. View data:

-   You can view traces in the Jaeger UI at http://localhost:16686.
+   - **Traces**: Jaeger UI at http://localhost:16686
+   - **Metrics**: Prometheus UI at http://localhost:9090
+   - **Raw metrics**: Direct metrics endpoint at http://localhost:8000/metrics

 Once enabled, gptme will automatically:

 - Trace function execution times
 - Record token processing metrics
 - Monitor request durations
 - Instrument Flask and HTTP requests
+- Expose Prometheus metrics at the ``/metrics`` endpoint

 The telemetry data helps identify:

 - Slow operations and bottlenecks
 - Token processing rates
 - Tool execution performance
+- Resource usage patterns
+
+Available Metrics
+~~~~~~~~~~~~~~~~~
+
+The following metrics are automatically collected:
+
+- ``gptme_tokens_processed_total``: Counter of tokens processed by type
+- ``gptme_request_duration_seconds``: Histogram of request durations by endpoint
+- ``gptme_tool_calls_total``: Counter of tool calls made by tool name
+- ``gptme_tool_duration_seconds``: Histogram of tool execution durations by tool name
+- ``gptme_active_conversations``: Gauge of currently active conversations
+- ``gptme_llm_requests_total``: Counter of LLM API requests by provider, model, and success status
+- HTTP request metrics (from Flask instrumentation)
+- OpenAI/Anthropic API call metrics (from LLM instrumentations)
+
+Example Prometheus Queries
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Here are some useful Prometheus queries for monitoring gptme:
+
+.. code-block:: promql
+
+   # Average tool execution time by tool
+   rate(gptme_tool_duration_seconds_sum[5m]) / rate(gptme_tool_duration_seconds_count[5m])
+
+   # Most used tools
+   topk(10, rate(gptme_tool_calls_total[5m]))
+
+   # LLM request success rate
+   rate(gptme_llm_requests_total{success="true"}[5m]) / rate(gptme_llm_requests_total[5m])
+
+   # Tokens processed per second
+   rate(gptme_tokens_processed_total[5m])
+
+   # Active conversations
+   gptme_active_conversations
+
+   # Request latency percentiles
+   histogram_quantile(0.95, rate(gptme_request_duration_seconds_bucket[5m]))
+
+Environment Variables
+~~~~~~~~~~~~~~~~~~~~~
+
+- ``GPTME_TELEMETRY_ENABLED``: Enable/disable telemetry (default: false)
+- ``OTLP_ENDPOINT``: OTLP endpoint for traces (default: http://localhost:4317)
+- ``PROMETHEUS_PORT``: Port for Prometheus metrics endpoint (default: 8000)
+- ``PROMETHEUS_ADDR``: Address for Prometheus metrics endpoint (default: localhost, use 0.0.0.0 for Docker access)

 Release
 -------
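
The docs above point at three endpoints; a quick programmatic check of the raw metrics endpoint is a useful smoke test. A minimal sketch, assuming the default PROMETHEUS_PORT of 8000 and that gptme is running with GPTME_TELEMETRY_ENABLED=true:

# Minimal sketch: confirm the metrics endpoint is serving gptme series.
# Assumes GPTME_TELEMETRY_ENABLED=true and the default PROMETHEUS_PORT=8000.
from urllib.request import urlopen

with urlopen("http://localhost:8000/metrics", timeout=5) as resp:
    body = resp.read().decode()

# Print only gptme-specific samples, skipping the # HELP / # TYPE comments.
for line in body.splitlines():
    if line.startswith("gptme_"):
        print(line)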

gptme/chat.py

Lines changed: 1 addition & 0 deletions

@@ -336,6 +336,7 @@ def _wait_for_tts_if_enabled() -> None:
     stop()


+@trace_function(name="chat.step", attributes={"component": "chat"})
 def step(
     log: Log | list[Message],
     stream: bool,
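
The `@trace_function` decorator comes from gptme/telemetry.py; its body is not part of this diff. A minimal sketch of how such a decorator can be built on the public OpenTelemetry API, assuming the real implementation differs in details such as tracer lookup and a no-op path when telemetry is disabled:

# Hypothetical sketch of a trace_function decorator (the actual
# implementation lives in gptme/telemetry.py and is not shown here).
import functools
from collections.abc import Callable
from typing import Any

from opentelemetry import trace


def trace_function(
    name: str | None = None, attributes: dict[str, Any] | None = None
) -> Callable:
    """Wrap a function call in an OpenTelemetry span."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            tracer = trace.get_tracer("gptme")
            # Start a span named after the function (or the given name)
            # and attach any static attributes to it.
            with tracer.start_as_current_span(name or func.__qualname__) as span:
                for key, value in (attributes or {}).items():
                    span.set_attribute(key, value)
                return func(*args, **kwargs)

        return wrapper

    return decorator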

gptme/llm/llm_openai.py

Lines changed: 2 additions & 0 deletions

@@ -185,6 +185,7 @@ def chat(messages: list[Message], model: str, tools: list[ToolSpec] | None) -> s
     api_model = model if is_proxy else base_model

     from openai import NOT_GIVEN  # fmt: skip
+    from openai.types.chat import ChatCompletionMessageToolCall  # fmt: skip

     messages_dicts, tools_dict = _prepare_messages_for_api(messages, model, tools)

@@ -201,6 +202,7 @@ def chat(messages: list[Message], model: str, tools: list[ToolSpec] | None) -> s
     result = []
     if choice.finish_reason == "tool_calls":
         for tool_call in choice.message.tool_calls or []:
+            assert isinstance(tool_call, ChatCompletionMessageToolCall)
             result.append(
                 f"@{tool_call.function.name}({tool_call.id}): {tool_call.function.arguments}"
             )
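
The added assert is presumably a narrowing aid for the type checker (the commit message mentions "fixed typing"): it pins down the element type of `message.tool_calls` before `.function.name` and `.function.arguments` are accessed. The same pattern in isolation, as a self-contained illustration rather than gptme code:

# Illustration only: after the assert, a type checker narrows `value`
# from str | bytes to str, so str-only methods are safe to call.
def shout(value: str | bytes) -> str:
    assert isinstance(value, str)
    return value.upper()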

gptme/server/api_v2_sessions.py

Lines changed: 4 additions & 1 deletion

@@ -11,11 +11,11 @@
 import threading
 import uuid
 from collections import defaultdict
+from collections.abc import Generator
 from dataclasses import dataclass, field
 from datetime import datetime, timedelta
 from enum import Enum
 from pathlib import Path
-from collections.abc import Generator

 import flask
 from dotenv import load_dotenv

@@ -27,6 +27,7 @@
 from ..llm.models import get_default_model
 from ..logmanager import LogManager, prepare_messages
 from ..message import Message
+from ..telemetry import trace_function
 from ..tools import ToolUse, get_tools, init_tools
 from .api_v2_common import ErrorEvent, EventType, msg2dict
 from .openapi_docs import (

@@ -168,6 +169,7 @@ def _append_and_notify(manager: LogManager, session: ConversationSession, msg: M
     )


+@trace_function("api_v2.step", attributes={"component": "api_v2"})
 def step(
     conversation_id: str,
     session: ConversationSession,

@@ -353,6 +355,7 @@ def start_tool_execution(

     # This function would ideally run asynchronously to not block the request
     # For simplicity, we'll run it in a thread
+    @trace_function("api_v2.execute_tool", attributes={"component": "api_v2"})
     def execute_tool_thread():
         config = Config.from_workspace(workspace=chat_config.workspace)
         config.chat = chat_config
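
One caveat worth noting here (an observation, not something this diff addresses): OpenTelemetry context in Python is stored in contextvars, which do not automatically flow into a manually created threading.Thread, so a span started inside `execute_tool_thread` may appear as a new root trace rather than a child of the request span. A minimal sketch of explicit propagation, with hypothetical names:

# Hypothetical sketch (names are illustrative, not from this diff):
# explicitly carry the current OpenTelemetry context into a worker
# thread so spans started there parent onto the request span.
import threading
from collections.abc import Callable

from opentelemetry import context, trace

tracer = trace.get_tracer("gptme")


def run_in_thread(work: Callable[[], None]) -> threading.Thread:
    ctx = context.get_current()  # capture context on the request thread

    def runner() -> None:
        token = context.attach(ctx)  # restore it inside the worker
        try:
            with tracer.start_as_current_span("api_v2.execute_tool"):
                work()
        finally:
            context.detach(token)

    thread = threading.Thread(target=runner, daemon=True)
    thread.start()
    return thread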

gptme/telemetry.py

Lines changed: 105 additions & 2 deletions

@@ -26,20 +26,31 @@
 _meter = None
 _token_counter = None
 _request_histogram = None
+_tool_counter = None
+_tool_duration_histogram = None
+_active_conversations_gauge = None
+_llm_request_counter = None

 TELEMETRY_AVAILABLE = False
 TELEMETRY_IMPORT_ERROR = None

 try:
     from opentelemetry import metrics, trace  # fmt: skip
-    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter  # fmt: skip
+    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
+        OTLPSpanExporter,  # fmt: skip
+    )
     from opentelemetry.exporter.prometheus import PrometheusMetricReader  # fmt: skip
+    from opentelemetry.instrumentation.anthropic import (
+        AnthropicInstrumentor,  # fmt: skip
+    )
     from opentelemetry.instrumentation.flask import FlaskInstrumentor  # fmt: skip
+    from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor  # fmt: skip
     from opentelemetry.instrumentation.requests import RequestsInstrumentor  # fmt: skip
     from opentelemetry.sdk.metrics import MeterProvider  # fmt: skip
     from opentelemetry.sdk.resources import Resource  # fmt: skip
     from opentelemetry.sdk.trace import TracerProvider  # fmt: skip
     from opentelemetry.sdk.trace.export import BatchSpanProcessor  # fmt: skip
+    from prometheus_client import start_http_server  # fmt: skip

     TELEMETRY_AVAILABLE = True
 except ImportError as e:

@@ -56,9 +67,17 @@ def init_telemetry(
     service_name: str = "gptme",
     enable_flask_instrumentation: bool = True,
     enable_requests_instrumentation: bool = True,
+    enable_openai_instrumentation: bool = True,
+    enable_anthropic_instrumentation: bool = True,
+    prometheus_port: int = 8000,
 ) -> None:
     """Initialize OpenTelemetry tracing and metrics."""
     global _telemetry_enabled, _tracer, _meter, _token_counter, _request_histogram
+    global \
+        _tool_counter, \
+        _tool_duration_histogram, \
+        _active_conversations_gauge, \
+        _llm_request_counter

     # Check if telemetry is enabled via environment variable
     if os.getenv("GPTME_TELEMETRY_ENABLED", "").lower() not in ("true", "1", "yes"):

@@ -89,7 +108,15 @@
         if hasattr(tracer_provider, "add_span_processor"):
             tracer_provider.add_span_processor(span_processor)  # type: ignore

-        # Initialize metrics
+        # Initialize metrics with Prometheus reader
+        prometheus_port = int(os.getenv("PROMETHEUS_PORT", prometheus_port))
+        prometheus_addr = os.getenv("PROMETHEUS_ADDR", "localhost")
+
+        # Start Prometheus HTTP server to expose metrics
+        start_http_server(port=prometheus_port, addr=prometheus_addr)
+
+        # Initialize PrometheusMetricReader which pulls metrics from the SDK
+        # on-demand to respond to scrape requests
         prometheus_reader = PrometheusMetricReader()
         metrics.set_meter_provider(MeterProvider(metric_readers=[prometheus_reader]))
         _meter = metrics.get_meter(service_name)

@@ -107,13 +134,43 @@
             unit="seconds",
         )

+        _tool_counter = _meter.create_counter(
+            name="gptme_tool_calls",
+            description="Number of tool calls made",
+            unit="calls",
+        )
+
+        _tool_duration_histogram = _meter.create_histogram(
+            name="gptme_tool_duration_seconds",
+            description="Tool execution duration in seconds",
+            unit="seconds",
+        )
+
+        _active_conversations_gauge = _meter.create_up_down_counter(
+            name="gptme_active_conversations",
+            description="Number of active conversations",
+            unit="conversations",
+        )
+
+        _llm_request_counter = _meter.create_counter(
+            name="gptme_llm_requests",
+            description="Number of LLM API requests made",
+            unit="requests",
+        )
+
         # Auto-instrument Flask and requests if enabled
         if enable_flask_instrumentation:
             FlaskInstrumentor().instrument()

         if enable_requests_instrumentation:
             RequestsInstrumentor().instrument()

+        if enable_openai_instrumentation:
+            OpenAIInstrumentor().instrument()
+
+        if enable_anthropic_instrumentation:
+            AnthropicInstrumentor().instrument()
+
         _telemetry_enabled = True

         # Import console for user-visible messages

@@ -122,6 +179,9 @@
         # Log to console so users know telemetry is active
         console.log("📊 Telemetry enabled - performance metrics will be collected")
         console.log(f"🔍 Traces will be sent via OTLP to {otlp_endpoint}")
+        console.log(
+            f"📈 Prometheus metrics available at http://{prometheus_addr}:{prometheus_port}/metrics"
+        )

     except Exception as e:
         logger.error(f"Failed to initialize telemetry: {e}")

@@ -181,6 +241,49 @@ def record_request_duration(
         _request_histogram.record(duration, {"endpoint": endpoint, "method": method})


+def record_tool_call(
+    tool_name: str,
+    duration: float | None = None,
+    success: bool = True,
+    error_type: str | None = None,
+    error_message: str | None = None,
+) -> None:
+    """Record tool call metrics."""
+    if not is_telemetry_enabled() or _tool_counter is None:
+        return
+
+    attributes = {"tool_name": tool_name, "success": str(success).lower()}
+
+    if error_type:
+        attributes["error_type"] = error_type
+    if error_message:
+        # Truncate long error messages
+        attributes["error_message"] = error_message[:200]
+
+    _tool_counter.add(1, attributes)
+
+    if duration is not None and _tool_duration_histogram is not None:
+        _tool_duration_histogram.record(duration, attributes)
+
+
+def record_conversation_change(delta: int) -> None:
+    """Record change in active conversations (+1 for new, -1 for ended)."""
+    if not is_telemetry_enabled() or _active_conversations_gauge is None:
+        return
+
+    _active_conversations_gauge.add(delta)
+
+
+def record_llm_request(provider: str, model: str, success: bool = True) -> None:
+    """Record LLM API request metrics."""
+    if not is_telemetry_enabled() or _llm_request_counter is None:
+        return
+
+    _llm_request_counter.add(
+        1, {"provider": provider, "model": model, "success": str(success).lower()}
+    )
+
+
 def measure_tokens_per_second(func: F) -> F:
     """Decorator to measure tokens per second for LLM operations."""
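
The diff does not show where these helpers are called from. A plausible call site for record_tool_call, sketched with a hypothetical run_tool standing in for the real tool executor:

# Hypothetical call site for record_tool_call; run_tool is a stand-in
# for the real tool executor, not a gptme function.
import time
from collections.abc import Callable

from gptme.telemetry import record_tool_call


def run_tool_with_metrics(tool_name: str, run_tool: Callable[[], None]) -> None:
    start = time.monotonic()
    try:
        run_tool()
    except Exception as e:
        # Record the failure with its duration and error details, then re-raise.
        record_tool_call(
            tool_name,
            duration=time.monotonic() - start,
            success=False,
            error_type=type(e).__name__,
            error_message=str(e),
        )
        raise
    record_tool_call(tool_name, duration=time.monotonic() - start, success=True)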
