Telemetry
ai-eval-runner emits OpenTelemetry spans for each run and each example. The spans carry semantic attributes and can be exported to any OTLP collector. Telemetry is behind an optional extra, so the runner has no hard dependency on a telemetry stack. Install the extra with:
uv sync --extra otel
Configure an exporter through the standard environment variables, for example:
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_SERVICE_NAME=ai-eval-runner
With the otel extra installed and a tracer provider configured, spans flow to your collector. Without the extra, the runner still records attributes internally (so tests and call sites are identical) but exports nothing.
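The optional-extra behaviour described above is commonly implemented with an import guard. The sketch below is an assumption about how such a guard might look, not the runner's actual code; `OTEL_AVAILABLE` and `get_tracer_or_none` are hypothetical names introduced for illustration.

```python
# Hypothetical import guard; these names are illustrative, not the runner's API.
try:
    from opentelemetry import trace  # shipped by the optional otel extra
    OTEL_AVAILABLE = True
except ImportError:
    trace = None
    OTEL_AVAILABLE = False


def get_tracer_or_none(name: str = "aieval"):
    """Return an OpenTelemetry tracer when the extra is installed, else None."""
    if not OTEL_AVAILABLE:
        return None
    # trace.get_tracer is part of the opentelemetry-api package.
    return trace.get_tracer(name)


tracer = get_tracer_or_none()
print("otel available" if tracer is not None else "capture only")
```

Call sites can then branch on a single `None` check instead of repeating the import logic.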
Two span types are emitted:
- `aieval.run`, once per run
- `aieval.example`, once per example
Attribute names follow the GenAI semantic conventions where they exist, and use the aieval.* namespace for runner-specific signal:
| Attribute | Span | Meaning |
|---|---|---|
| `gen_ai.request.model` | both | The model under evaluation |
| `gen_ai.system` | both | The provider (sarmalink, openai) |
| `aieval.run.name` | run | The eval name |
| `aieval.run.dataset_version` | run | The dataset content version |
| `aieval.run.git_sha` | run | The working tree git SHA |
| `aieval.run.pass_rate` | run | Final pass rate |
| `aieval.run.avg_latency_ms` | run | Mean provider latency |
| `aieval.example.index` | example | The example index |
| `aieval.example.latency_ms` | example | Provider latency for the example |
| `aieval.score.<scorer>` | example | The score from each scorer |
| `aieval.example.error` | example | The error message when an example fails |
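To make the attribute names concrete, here is a minimal sketch of how the two attribute sets might be assembled as plain dicts. The helper names, the record keys (`passed`, `latency_ms`), and the reading of pass rate as the fraction of passing examples are assumptions for illustration; the runner's own data shapes may differ.

```python
from statistics import mean


def example_attributes(index, latency_ms, scores, error=None):
    """Build example-span attributes; one aieval.score.<scorer> key per scorer."""
    attrs = {
        "aieval.example.index": index,
        "aieval.example.latency_ms": latency_ms,
    }
    for scorer, value in scores.items():
        attrs[f"aieval.score.{scorer}"] = value
    if error is not None:
        attrs["aieval.example.error"] = error
    return attrs


def run_attributes(name, model, system, examples):
    """Build run-span attributes from hypothetical per-example records."""
    if not examples:
        raise ValueError("at least one example record is required")
    return {
        "gen_ai.request.model": model,
        "gen_ai.system": system,
        "aieval.run.name": name,
        # Assumed definition: fraction of examples that passed.
        "aieval.run.pass_rate": sum(e["passed"] for e in examples) / len(examples),
        "aieval.run.avg_latency_ms": mean(e["latency_ms"] for e in examples),
    }


records = [
    {"passed": True, "latency_ms": 120.0},
    {"passed": False, "latency_ms": 80.0},
]
run_attrs = run_attributes("smoke", "example-model", "openai", records)
ex_attrs = example_attributes(0, 120.0, {"exact_match": 1.0})
print(run_attrs)
print(ex_attrs)
```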
The capture lives in aieval/core/telemetry.py. A span() context manager yields a SpanHandle whose set() method records further attributes once they are known, such as scores computed inside the span. The handle keeps the captured attribute dict available in both modes, so the same code path works whether or not OpenTelemetry is installed and tests can assert on captured attributes without standing up a collector.
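The pattern described above can be sketched as follows. This is not the contents of aieval/core/telemetry.py; the signature of `span()`, the `tracer` and `attributes` parameters, and the `otel_span` field are assumptions made to show how one handle can serve both modes.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, Optional


@dataclass
class SpanHandle:
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)
    otel_span: Optional[Any] = None  # real OpenTelemetry span, when exporting

    def set(self, key: str, value: Any) -> None:
        """Record an attribute locally and mirror it to the real span, if any."""
        self.attributes[key] = value
        if self.otel_span is not None:
            self.otel_span.set_attribute(key, value)


@contextmanager
def span(
    name: str,
    tracer: Optional[Any] = None,
    attributes: Optional[Dict[str, Any]] = None,
) -> Iterator[SpanHandle]:
    """Yield a SpanHandle; export only when an OpenTelemetry tracer is passed."""
    initial = dict(attributes or {})
    if tracer is None:
        # Capture-only mode: nothing is exported.
        handle = SpanHandle(name)
        for key, value in initial.items():
            handle.set(key, value)
        yield handle
    else:
        with tracer.start_as_current_span(name) as otel_span:
            handle = SpanHandle(name, otel_span=otel_span)
            for key, value in initial.items():
                handle.set(key, value)
            yield handle


# Capture-only usage: no collector or OpenTelemetry install is needed.
with span("aieval.example", attributes={"aieval.example.index": 0}) as handle:
    handle.set("aieval.score.exact_match", 1.0)

print(handle.attributes)
```

Because the handle always keeps its own dict, a test can assert on `handle.attributes` directly, whichever mode is active.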