Conversation
Instrument the inference server with OpenTelemetry to participate in distributed traces initiated by the async-serverless system. When enabled via `OTEL_TRACING_ENABLED=True`, the server extracts W3C `traceparent` headers from incoming requests, creates spans for key operations, and exports traces to an OTLP collector.

Key changes:
- New `inference/core/telemetry.py` module with TracerProvider setup, span helpers, and error recording
- FastAPI auto-instrumentation for HTTP server spans
- Manual spans on `model.load`, `model.infer`, and all Roboflow API calls
- Error recording on all route exception handlers
- Trace context (`traceparent`) propagation through SDK HTTP calls for remote workflow step execution
- OTel context propagation through workflow `ThreadPoolExecutor`
- `trace_id`/`span_id` injection into structlog entries
- Zero overhead when disabled (default)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
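The `trace_id`/`span_id` structlog injection mentioned above could take the shape of a processor like the sketch below. It is a minimal sketch assuming the public opentelemetry API, not the server's actual implementation; it degrades gracefully when OTel is not installed or no span is active.

```python
def trace_context_log_processor(logger, method_name, event_dict):
    """structlog processor: add trace_id/span_id when a span is active."""
    try:
        from opentelemetry import trace
    except ImportError:
        return event_dict  # OTel not installed: leave the entry untouched
    ctx = trace.get_current_span().get_span_context()
    if ctx.is_valid:
        event_dict["trace_id"] = format(ctx.trace_id, "032x")
        event_dict["span_id"] = format(ctx.span_id, "016x")
    return event_dict
```

Registered in the structlog processor chain, this runs on every log call and is a no-op outside of a recording span.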
When the inference server is hit directly (no parent `traceparent`), the sampling rate controls which requests are traced. This adds an `X-Force-Trace: true` header that overrides sampling for a specific request, useful for debugging in production.

How it works:
- `_ForceTraceASGIMiddleware` (outermost ASGI layer) reads the header and sets a ContextVar before the OTel instrumentor runs
- `_ForceTraceRootSampler` checks the ContextVar and force-samples when set, otherwise delegates to the ratio-based sampler
- `ParentBased` wrapping ensures that when a parent `traceparent` exists (e.g. from async-serverless), the parent's decision is honoured and the force-trace header is ignored — no duplicate tracing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
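The header-to-ContextVar step can be sketched as below. The constant and ContextVar names mirror the description; `read_force_trace` is a hypothetical helper, and the real middleware wraps the ASGI app rather than exposing a function.

```python
from contextvars import ContextVar

FORCE_TRACE_HEADER = b"x-force-trace"  # ASGI header names are lowercase bytes
_force_trace_flag: ContextVar[bool] = ContextVar("force_trace", default=False)

def read_force_trace(scope: dict) -> bool:
    """Set the flag from raw ASGI headers, before the OTel instrumentor runs.

    The root sampler later reads _force_trace_flag to decide whether to
    force-sample this request instead of applying the ratio sampler.
    """
    if scope.get("type") == "http":
        for name, value in scope.get("headers", []):
            if name == FORCE_TRACE_HEADER and value.lower() == b"true":
                _force_trace_flag.set(True)
                return True
    return False
```

Because the flag lives in a ContextVar, each request task sees its own value, and the sampler can observe it without any coupling to the HTTP layer.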
- `telemetry.py`: wrap opentelemetry imports in try/except so the module is always safe to import even without opentelemetry installed. All public helpers (`start_span`, `record_error`, `inject_trace_context`) degrade to noops.
- Move `from inference.core.telemetry` imports to top level in `roboflow_api.py`, `error_handlers.py`, `base.py`
- Add `inject_trace_context()` helper to `inference_sdk/config.py`, replacing scattered try/except ImportError blocks in `client.py`, `executors.py`, and `request_building.py`
- Add top-level guarded `otel_context` import in workflow executor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
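The guarded-import pattern can be sketched as follows. This is a simplified sketch, not the actual `telemetry.py`: it shows two helpers degrading to noops when opentelemetry is absent.

```python
import contextlib

try:
    from opentelemetry import trace  # optional dependency
    _OTEL_AVAILABLE = True
except ImportError:
    _OTEL_AVAILABLE = False

def start_span(name: str):
    # Real span context manager when OTel is importable, nullcontext
    # otherwise, so callers never need to guard.
    if not _OTEL_AVAILABLE:
        return contextlib.nullcontext()
    return trace.get_tracer("inference").start_as_current_span(name)

def record_error(error: Exception) -> None:
    # No-op unless a real, recording span is active.
    if not _OTEL_AVAILABLE:
        return
    span = trace.get_current_span()
    if span.is_recording():
        span.record_exception(error)
```

The key property is that importing the module never fails and every helper is callable unconditionally, so business logic stays OTel-unaware.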
…metry

- Add `set_span_attribute()` to `telemetry.py` — callers no longer need to guard against None spans; business logic stays OTel-unaware
- Move `add_trace_context` from `logger.py` to `telemetry.py` as `trace_context_log_processor`, using the module's `_OTEL_AVAILABLE` flag instead of per-call try/except ImportError
- Remove unused `as span` capture in `base.py` model.load block
- Add None guard to SDK's `inject_trace_context` for consistency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nager

When `USE_INFERENCE_MODELS=True`, model loading happens inside the inference-models package and is invisible to the inference server. This adds a `TracingModelAccessManager` that hooks into AutoModel's callback system to create spans covering the full loading pipeline.

- `TracingModelAccessManager` subclasses `LiberalModelAccessManager` (preserves default behavior, adds tracing on top)
- `on_model_package_access_granted` starts an `inference_models.load` span
- `on_model_loaded` ends the span (captures total load time)
- File operations recorded as span events
- Factory function returns None when OTel/inference-models unavailable
- All 5 adapter classes pass the manager to `AutoModel.from_pretrained()`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds OpenTelemetry metrics exported via OTLP to the same collector as traces. Metrics are never sampled — they aggregate across 100% of requests regardless of trace sampling rate.

Metrics added:
- `inference.models.loaded` (UpDownCounter) — currently loaded model count
- `inference.model.loads` (Counter by `model.id`) — total cold starts
- `inference.model.unloads` (Counter by `model.id`) — total unloads
- `inference.model.load.duration` (Histogram by `model.id`) — load time
- `inference.model.infer.count` (Counter by `model.id`) — inference count
- `inference.model.infer.duration` (Histogram by `model.id`) — inference latency
- `inference.roboflow_api.duration` (Histogram by function) — API call latency
- `inference.errors` (Counter by `error.type`) — error count

MeterProvider shares the same OTLP endpoint, protocol, and resource as the TracerProvider. Metrics are pushed every 10 seconds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
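Creating and recording such instruments can be sketched with the opentelemetry metrics API. The instrument names come from the list above; the helper `record_inference` is a hypothetical wrapper, and the guard keeps the module importable without opentelemetry.

```python
try:
    from opentelemetry import metrics
    _meter = metrics.get_meter("inference")
    _infer_count = _meter.create_counter("inference.model.infer.count")
    _infer_duration = _meter.create_histogram(
        "inference.model.infer.duration", unit="s"
    )
except ImportError:
    _infer_count = _infer_duration = None

def record_inference(model_id: str, seconds: float) -> None:
    """Record one inference. Runs for 100% of requests: metrics bypass
    trace sampling entirely, so counts stay accurate at any sample rate."""
    if _infer_count is None:
        return  # opentelemetry not installed: noop
    _infer_count.add(1, {"model.id": model_id})
    _infer_duration.record(seconds, {"model.id": model_id})
```

With no MeterProvider configured by the application, the API hands back noop instruments, so the calls are safe in every deployment mode.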
Uses GLOBAL_INFERENCE_SERVER_ID so metrics backends (Grafana, etc.) can distinguish pods. Enables per-pod grouping for gauges like inference.models.loaded while still allowing fleet-wide aggregation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CompositePropagator is at opentelemetry.propagators.composite (not opentelemetry.propagation.composite), and TraceContextTextMapPropagator is at opentelemetry.trace.propagation.tracecontext. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The OTLP HTTP exporter doesn't accept insecure=True (gRPC-only). HTTP is insecure by default when using http:// scheme. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`development/otel/start-otel-dev.sh` spins up grafana/otel-lgtm in Docker and imports a pre-configured Inference Server dashboard with panels for inference count, latency, model loads, and traces.

Usage:
- `./development/otel/start-otel-dev.sh` — start
- `./development/otel/start-otel-dev.sh stop` — stop

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the OTLP collector is down, the SDK logs full connection-refused tracebacks every export cycle. This is expected and harmless (data is dropped), but clutters server logs. Set the export loggers to CRITICAL so only truly fatal issues are logged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the previous CRITICAL-level suppression with a log filter that emits a single warning on first export failure and suppresses the noisy tracebacks. Resets when the collector comes back. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
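Such a filter might look like the sketch below, built on the stdlib `logging.Filter`. It assumes that any non-error record on the exporter's logger signals recovery; the real reset condition may differ.

```python
import logging

class ExportErrorFilter(logging.Filter):
    """Warn once on the first export failure, swallow the repeated
    tracebacks, and re-arm once the collector comes back."""

    def __init__(self) -> None:
        super().__init__()
        self._failing = False

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.ERROR:
            if self._failing:
                return False  # suppress repeated connection-refused tracebacks
            self._failing = True
            record.msg = "OTLP export failing; suppressing repeats until recovery"
            record.args = ()
            record.exc_info = None  # drop the noisy traceback
            return True
        self._failing = False  # collector is back; warn again on next failure
        return True
```

Attached to the exporter loggers, this keeps one actionable warning per outage instead of a traceback every export cycle.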
The metrics exporter logger is opentelemetry.sdk.metrics._internal.export, not opentelemetry.sdk.metrics.export. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New env vars:
- `OTEL_TRACE_EXPORT_INTERVAL_MS` (default: 5000) — BatchSpanProcessor
- `OTEL_METRIC_EXPORT_INTERVAL_MS` (default: 10000) — PeriodicExportingMetricReader

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
By default, data persists across stop/start (docker start reuses the container). Use --clean to wipe all traces/metrics and start fresh. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move OTel instrumentation from the inference server's external `TracingModelAccessManager` into the inference-models library itself, following OTel library author guidelines.

inference-models now:
- Depends on opentelemetry-api (optional extra) — noop when not installed
- Wraps `AutoModel.from_pretrained()` in an `inference_models.from_pretrained` span
- Auto-instruments returned model instances (`infer`, `pre_process`, `forward`, `post_process`) via monkey-patching — zero burden on model authors
- Instruments Roboflow API calls and weight downloads with spans
- All instrumentation is noop when no SDK is configured by the application

Removes `TracingModelAccessManager` from the inference server — the library handles its own tracing now. Spans automatically appear as children of the inference server's model.load span via in-process context propagation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
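The monkey-patching approach can be sketched as below. `instrument_model` and the span names are illustrative assumptions, not the library's actual API; the method list matches the commit.

```python
import contextlib
import functools

def _span(name: str):
    # Real span when opentelemetry is importable, nullcontext otherwise.
    try:
        from opentelemetry import trace
    except ImportError:
        return contextlib.nullcontext()
    return trace.get_tracer("inference_models").start_as_current_span(name)

def instrument_model(model):
    """Wrap the hot-path methods of one model instance in spans.

    Patching the instance (not the class) means model authors never see
    the instrumentation, and uninstrumented instances are unaffected.
    """
    for name in ("infer", "pre_process", "forward", "post_process"):
        method = getattr(model, name, None)
        if method is None:
            continue

        # Bind method/name as defaults to avoid the late-binding closure trap.
        @functools.wraps(method)
        def wrapper(*args, _method=method, _name=name, **kwargs):
            with _span(f"inference_models.{_name}"):
                return _method(*args, **kwargs)

        setattr(model, name, wrapper)
    return model
```

When no TracerProvider is configured, the spans are non-recording, so the wrapped methods behave identically to the originals.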
…ation

Replace manual `inject_trace_context()` calls in the SDK with the `RequestsInstrumentor`, which globally patches the requests library to:
- Automatically inject `traceparent`/`tracestate` on all outgoing requests
- Create `http.client` child spans with standard HTTP attributes

This covers all `requests.*` calls across the inference server, SDK, and inference-models — including ones we might have missed manually.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove built-in OTel from inference-models per maintainer request. The inference server's model.load and model.infer spans already cover timing from the server's perspective. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
```python
        timeout=ROBOFLOW_API_REQUEST_TIMEOUT,
        verify=ROBOFLOW_API_VERIFY_SSL,
    )
with start_span(
```
Since RequestsInstrumentor().instrument() is called in telemetry.py, every requests call automatically gets an HTTP client span with http.url, method, status code, etc.
This span looks redundant.
```python
# Extract trace_id from traceparent header if present
# (reading from header due to ContextVar isolation in BaseHTTPMiddleware)
traceparent = request.headers.get("traceparent")
if traceparent:
    parts = traceparent.split("-")
    if len(parts) >= 3:
        log_fields["trace_id"] = parts[1]
```
You might be able to extract the current trace_id using the otel library, something like:

```python
# in telemetry.py
from opentelemetry import trace

def current_trace_id():
    span_context = trace.get_current_span().get_span_context()
    if span_context.is_valid:
        return format(span_context.trace_id, "032x")
    return None
```
ecarrara left a comment
## Code Review: Add OpenTelemetry tracing and metrics
### Strengths

- Clean API surface — all public helpers are noop-safe, so callers never need to guard against OTel being absent
- Force-trace feature (`X-Force-Trace` header) is a thoughtful addition for on-demand debugging in production
- Export error filter suppresses noisy connection-refused tracebacks — good operational ergonomics
- Dev tooling — the shell script + Grafana dashboard make it easy to try locally
- Zero overhead when disabled — feature-flagged at startup, not per-request
### Issues & Suggestions

**1. `otel_context.attach()` without `detach()` leaks context in thread pool (Bug)**

`inference/core/workflows/execution_engine/v1/executor/core.py` — `otel_context.attach()` returns a token that must be passed to `otel_context.detach(token)` when the scope ends. In a `ThreadPoolExecutor`, threads are reused, so attached contexts accumulate:

```python
# Current:
otel_context.attach(otel_ctx)

# Should be:
token = otel_context.attach(otel_ctx)
try:
    ...  # step execution
finally:
    otel_context.detach(token)
```

This is a correctness bug — leaked contexts can cause spans in subsequent tasks on the same thread to be parented incorrectly.
**2. `_force_trace_flag` ContextVar is never reset**

`_ForceTraceASGIMiddleware` sets `_force_trace_flag.set(True)` but never resets it to False. Since ASGI runs requests in separate tasks, this is probably fine in practice (each task gets the default False). However, if any middleware or framework code reuses the same task context, the flag could leak. A try/finally reset would be safer:

```python
async def __call__(self, scope, receive, send):
    if scope["type"] == "http":
        for header_name, header_value in scope.get("headers", []):
            if header_name == FORCE_TRACE_HEADER:
                if header_value.lower() == b"true":
                    _force_trace_flag.set(True)
                break
    try:
        await self.app(scope, receive, send)
    finally:
        _force_trace_flag.set(False)
```

**3. Duplicate `record_api_call` on every exception path (DRY)**
In `roboflow_api.py`, `record_api_call(function.__name__, time.perf_counter() - t_start)` is repeated in every except block plus the happy path. A try/finally would be cleaner and also ensures the metric is recorded if an unexpected exception type is raised:

```python
try:
    try:
        return function(*args, **kwargs)
    except RetryRequestError as error:
        raise error.inner_error
    except ...:
        ...
finally:
    record_api_call(function.__name__, time.perf_counter() - t_start)
```

**4. Access log trace_id parses raw header instead of OTel context**
http_api.py — The comment says this is due to "ContextVar isolation in BaseHTTPMiddleware", which is a known Starlette issue. However, this means the access log only gets the incoming traceparent trace ID, which won't exist for server-initiated traces (e.g., X-Force-Trace without a parent). Consider documenting this limitation or using trace.get_current_span() if you can move the access log away from BaseHTTPMiddleware.
**5. `insecure=True` hardcoded for gRPC exporter**
telemetry.py — The gRPC exporter always uses insecure=True. Fine for local dev but could be a concern in production if someone points OTEL_EXPORTER_ENDPOINT at a remote collector. Consider making this configurable or defaulting based on the endpoint.
**6. `OTEL_SERVICE_NAME` collides with standard OTel env var**
The OTel SDK itself reads OTEL_SERVICE_NAME as a standard environment variable. Since you're also reading it in env.py and passing it explicitly to Resource.create(), there's a risk of confusion — the SDK's auto-configuration and your explicit config may interact unexpectedly. Consider using a prefixed name (e.g., INFERENCE_OTEL_SERVICE_NAME) or documenting that you intentionally reuse the standard var.
**7. No tests**
At minimum, unit tests for:
- `start_span`/`record_error`/`set_span_attribute` noop behavior when OTel is absent
- `_ForceTraceRootSampler` sampling logic
- `_ExportErrorFilter` suppress/reset behavior
- `trace_context_log_processor` structlog integration
### Summary
| Priority | Issue |
|---|---|
| Must fix | otel_context.attach() without detach() — context leak in thread pool |
| Should fix | record_api_call duplication → use try/finally |
| Should fix | Add unit tests for the telemetry module |
| Consider | Reset _force_trace_flag in finally block |
| Consider | Make gRPC insecure configurable |
| Consider | Rename or document OTEL_SERVICE_NAME collision with standard OTel env var |
## Summary
Adds OpenTelemetry distributed tracing and metrics to the inference server. Integrates with the tracing in async-serverless#192 for end-to-end traces across the system.
Disabled by default (`OTEL_TRACING_ENABLED=False`). Zero overhead when off. Safe when no collector is running.

## Tracing

- `opentelemetry-instrumentation-fastapi` (extracts `traceparent`, creates server spans)
- `opentelemetry-instrumentation-requests` (injects `traceparent`, creates `http.client` spans)
- `model.load` and `model.infer` spans with `model.id` attributes
- `roboflow_api.call` and `roboflow_api.http_get` spans
- OTel context propagation through the workflow `ThreadPoolExecutor`
- `trace_id`/`span_id` injected into structlog entries
- `X-Force-Trace: true` header forces sampling on standalone requests; honoured parent decisions take priority

## Metrics (alongside existing Prometheus)
Never sampled — 100% of requests.

| Metric | Labels |
|---|---|
| `inference.models.loaded` | |
| `inference.model.loads` / `.unloads` | `model.id` |
| `inference.model.load.duration` | `model.id` |
| `inference.model.infer.count` | `model.id` |
| `inference.model.infer.duration` | `model.id` |
| `inference.roboflow_api.duration` | `roboflow_api.function` |
| `inference.errors` | `error.type` |
All metrics include `service.instance.id` for per-pod breakdown.

## Configuration

| Variable | Default | Notes |
|---|---|---|
| `OTEL_TRACING_ENABLED` | `False` | |
| `OTEL_SERVICE_NAME` | `inference-server` | |
| `OTEL_EXPORTER_PROTOCOL` | `grpc` | `grpc` or `http` |
| `OTEL_EXPORTER_ENDPOINT` | `localhost:4317` | |
| `OTEL_SAMPLING_RATE` | `1.0` | |
| `OTEL_TRACE_EXPORT_INTERVAL_MS` | `5000` | |
| `OTEL_METRIC_EXPORT_INTERVAL_MS` | `10000` | |

## Local dev tool
Pre-configured dashboard at http://localhost:3000 (admin/admin).
## Files changed (12 files)

- `requirements/requirements.http.txt`
- `inference/core/env.py` — `OTEL_*` env vars
- `inference/core/telemetry.py`
- `inference/core/interfaces/http/http_api.py`
- `inference/core/managers/base.py`
- `inference/core/interfaces/http/error_handlers.py`
- `inference/core/roboflow_api.py`
- `inference/core/logger.py`
- `inference/core/workflows/.../executor/core.py`
- `inference_sdk/http/client.py`
- `development/otel/start-otel-dev.sh`
- `development/otel/dashboard.json`

🤖 Generated with Claude Code