fix(telemetry): propagate OTel trace context so all spans share one trace#245
Merged
Merged
Conversation
…race `TelemetryHook` used `tracer.start_span(...)` for invocation, iteration, and tool spans without ever attaching them to OTel's current context. Result: every span emerged as its own trace root, so Langfuse / Jaeger / Tempo / any OTLP backend rendered N separate traces per agent run instead of one tree. Reported by @Luigis while wiring up Langfuse. Closes #244. ## Fix Use `opentelemetry.context.attach(trace.set_span_in_context(span))` after starting each parent span, then `context.detach(token)` when the span ends. This pushes the span onto OTel's context stack so subsequent `start_span` calls within the same async scope automatically parent to it. - `on_before_invocation` attaches the invocation context; `on_after_invocation` detaches it. - `on_iteration_start` attaches the iteration context (child of the invocation); `on_iteration_end` detaches. - `on_before_tool_call` doesn't need explicit parenting — the iteration context is already attached, so the tool span automatically becomes its child. ## Tests Three new regression tests under `TestTraceContextPropagation` using OTel's `InMemorySpanExporter`: - `test_all_spans_share_one_trace_id` — invocation + iteration + tool share a single `trace_id`. - `test_tool_span_is_child_of_iteration_span` — verifies the parent chain (invocation → iteration → tool) via `span.parent.span_id`. - `test_invocation_context_detaches_after_run` — confirms the context is detached on completion so the next caller's spans aren't parented to ours. All 31 telemetry tests pass; full unit suite (4788) clean. Tests gracefully skip when `opentelemetry-sdk` isn't installed (the hook only needs `opentelemetry-api` at runtime). Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
|
OK, I have done the tests @fede-kamel the issue seems to be solved! |
… lines `src/locus/hooks/builtin/telemetry.py` drops from 95.04% to 93.33%. The newly uncovered lines (32-39) are the OpenTelemetry `ImportError` fallback — defensive code that only runs when `opentelemetry` isn't installed, which never happens under CI. Baseline also picks up two organic improvements: - loop/runner.py: 96.60% -> 97.28% - memory/managers/oracle_agent_memory.py: 81.35% -> 81.63% Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
debc75c to
5bb48ce
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #244. Draft until Langfuse repro lands this afternoon —
we'll cross-check it against this fix + add it as a second regression
test before marking ready-for-review.
Summary
TelemetryHookwas callingtracer.start_span(...)for invocation,iteration, and tool spans without ever attaching them to OTel's current
context. Every span emerged as its own trace root, so Langfuse / Jaeger /
Tempo rendered N separate traces per agent run instead of one tree.
Reported by @Luigis while wiring up Langfuse.
Fix
opentelemetry.context.attach(trace.set_span_in_context(span))afterstarting each parent span, paired with
context.detach(token)on end.This pushes the span onto OTel's context stack so subsequent
start_spancalls within the same async scope automatically parent to it.
on_before_invocationon_after_invocationon_iteration_starton_iteration_endon_before_tool_callon_after_tool_callRegression tests
Three new tests under
TestTraceContextPropagationusing OTel'sInMemorySpanExporter:test_all_spans_share_one_trace_id— asserts the 3-span run(invocation + iteration + tool) shares exactly one
trace_id.test_tool_span_is_child_of_iteration_span— verifies the parentchain via
span.parent.span_id: invocation has no parent (root),iteration's parent = invocation, tool's parent = iteration.
test_invocation_context_detaches_after_run— confirms the contextis detached on completion so the caller's downstream spans aren't
parented to ours.
Tests gracefully skip when
opentelemetry-sdkisn't installed in thetest env (the hook only needs
opentelemetry-apiat runtime).Test plan
hatch run check— ruff format + ruff check + mypy cleanOut of scope (filed for later)
The hook still doesn't currently set the
service.nameresource (justattaches it as a span attribute). Spec-wise it should live on the
resource, not the span — but moving it requires a
Resourceon theTracerProvider, which is the caller's concern. Leaving as-is.