Telemetry

sarmakska edited this page May 31, 2026 · 1 revision

ai-eval-runner emits one OpenTelemetry span per run and one per example. Each span carries semantic attributes that you can export to any OTLP collector. Telemetry ships as an optional extra, so the runner has no hard dependency on a telemetry stack.

Enabling it

uv sync --extra otel

Configure an exporter through the standard environment variables, for example:

OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_SERVICE_NAME=ai-eval-runner

With the otel extra installed and a tracer provider configured, spans flow to your collector. Without the extra, the runner still records attributes internally (so tests and call sites are identical) but exports nothing.
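The page does not say whether the runner installs a tracer provider itself. If it does not, a provider can be set up in your own entry point before the run starts. The sketch below is one way to do that with the OpenTelemetry Python SDK and the OTLP/HTTP exporter (port 4318 is the OTLP/HTTP default). The function name configure_tracing is hypothetical, and it is an assumption that the otel extra pulls in these packages.

```python
def configure_tracing() -> bool:
    """Install a global tracer provider that exports spans over OTLP/HTTP.

    Returns False when the OpenTelemetry SDK or the OTLP exporter is missing,
    so callers can carry on without telemetry.
    """
    try:
        from opentelemetry import trace
        from opentelemetry.exporter.otlp.proto.http.trace_exporter import (
            OTLPSpanExporter,
        )
        from opentelemetry.sdk.trace import TracerProvider
        from opentelemetry.sdk.trace.export import BatchSpanProcessor
    except ImportError:
        return False

    # OTLPSpanExporter() reads OTEL_EXPORTER_OTLP_ENDPOINT from the environment,
    # and the provider's default resource picks up OTEL_SERVICE_NAME, so no
    # endpoint or service name is hard-coded here.
    provider = TracerProvider()
    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
    trace.set_tracer_provider(provider)
    return True
```

Call configure_tracing() once at startup. A False return means the telemetry packages are not installed and nothing will be exported.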

What is captured

Two span types are emitted:

  • aieval.run, once per run
  • aieval.example, once per example

Attribute names follow the OpenTelemetry GenAI semantic conventions where a matching attribute exists, and use the aieval.* namespace for runner-specific signals:

Attribute                   | Span    | Meaning
gen_ai.request.model        | both    | The model under evaluation
gen_ai.system               | both    | The provider (sarmalink, openai)
aieval.run.name             | run     | The eval name
aieval.run.dataset_version  | run     | The dataset content version
aieval.run.git_sha          | run     | The working tree git SHA
aieval.run.pass_rate        | run     | Final pass rate
aieval.run.avg_latency_ms   | run     | Mean provider latency
aieval.example.index        | example | The example index
aieval.example.latency_ms   | example | Provider latency for the example
aieval.score.<scorer>       | example | The score from each scorer
aieval.example.error        | example | The error message when an example fails
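To make the table concrete, the sketch below assembles the attribute set for one aieval.example span, including how the aieval.score.<scorer> placeholder expands to one key per scorer. The helper example_attributes, the model name, and the scorer names are hypothetical and used only for illustration. This is not the runner's own code.

```python
from typing import Any, Dict, Optional


def example_attributes(
    model: str,
    system: str,
    index: int,
    latency_ms: float,
    scores: Dict[str, float],
    error: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the attributes listed above for a single aieval.example span."""
    attrs: Dict[str, Any] = {
        "gen_ai.request.model": model,
        "gen_ai.system": system,
        "aieval.example.index": index,
        "aieval.example.latency_ms": latency_ms,
    }
    # One attribute per scorer: aieval.score.<scorer>
    for scorer, score in scores.items():
        attrs[f"aieval.score.{scorer}"] = score
    # The error attribute is only present when the example failed.
    if error is not None:
        attrs["aieval.example.error"] = error
    return attrs


attrs = example_attributes(
    model="example-model",  # hypothetical model name
    system="openai",
    index=0,
    latency_ms=812.5,
    scores={"exact_match": 1.0, "contains": 0.0},  # hypothetical scorers
)
```

Here attrs contains aieval.score.exact_match and aieval.score.contains, and has no aieval.example.error key because no error was passed.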

Design

The capture lives in aieval/core/telemetry.py. A span() context manager yields a SpanHandle whose set() method records further attributes once they are known, such as scores computed inside the span. The handle keeps the captured attribute dict available in both modes, so the same code path works whether or not OpenTelemetry is installed. Tests can therefore assert on captured attributes without standing up a collector.
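The pattern described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the contents of aieval/core/telemetry.py. Only the names span(), SpanHandle, and set() come from this page. The signatures, the attributes field, and the tracer name "aieval" are guesses.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, Optional

try:
    # Present only when the optional telemetry dependency is installed.
    from opentelemetry import trace as _otel_trace
except ImportError:
    _otel_trace = None


class SpanHandle:
    """Keeps a plain dict of attributes and mirrors them to a real span if any."""

    def __init__(self, name: str, attributes: Dict[str, Any], otel_span: Any = None):
        self.name = name
        self.attributes: Dict[str, Any] = dict(attributes)
        self._otel_span = otel_span

    def set(self, key: str, value: Any) -> None:
        # Always record locally so tests can inspect the captured attributes.
        self.attributes[key] = value
        if self._otel_span is not None:
            self._otel_span.set_attribute(key, value)


@contextmanager
def span(name: str, attributes: Optional[Dict[str, Any]] = None) -> Iterator[SpanHandle]:
    attrs = dict(attributes or {})
    if _otel_trace is None:
        # No OpenTelemetry available: capture attributes, export nothing.
        yield SpanHandle(name, attrs)
        return
    tracer = _otel_trace.get_tracer("aieval")  # tracer name is an assumption
    with tracer.start_as_current_span(name, attributes=attrs) as otel_span:
        yield SpanHandle(name, attrs, otel_span)


# Usage: the call site is identical with or without OpenTelemetry installed.
with span("aieval.example", {"aieval.example.index": 3}) as handle:
    handle.set("aieval.score.exact_match", 1.0)  # hypothetical scorer name
```

After the block exits, handle.attributes holds both the initial attribute and the score set inside the span, which is what a test would assert on.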
