Propagate OTel context into spawned snapshot and move-in tasks #4149
`Task.Supervisor.async_nolink` / `start_child` start new Erlang processes that do not inherit the caller's OTel context. Spans created inside these tasks via `with_child_span` (e.g. `shape_snapshot.execute_for_shape` and its children) were therefore silently dropped on the initial-snapshot and move-in code paths, because `with_child_span` requires a parent span in the current process's context. Capture the context via `:otel_ctx.get_current()` before spawning and attach it inside the task closure with `:otel_ctx.attach/1`, mirroring the pattern already used for `state.otel_ctx` in the snapshotter's `handle_continue`.
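Below is a dependency-free model of the bug and the fix. It assumes, as described above, that `with_child_span` records a span only when the current process already carries a parent. It stands in for the OTel context with a plain process-dictionary key; to our understanding the default `otel_ctx` implementation also keeps the current context in the process dictionary, but treat that as an assumption. `ChildSpanDemo`, `:demo_parent`, and the Agent-based recorder are hypothetical and are not Electric's or OpenTelemetry's API.

```elixir
defmodule ChildSpanDemo do
  # Hypothetical model: the :demo_parent key stands in for the parent span
  # held in the process's OTel context; an Agent collects "recorded" spans.
  @parent_key :demo_parent

  # Simplified stand-in for with_child_span: records a span only when the
  # current process already has a parent, otherwise it just runs `fun`.
  def with_child_span(recorder, name, fun) do
    case Process.get(@parent_key) do
      nil ->
        fun.()

      parent ->
        Agent.update(recorder, &[{parent, name} | &1])
        fun.()
    end
  end

  # Runs span-creating work in a supervised task. When `propagate?` is true,
  # the caller's parent is captured before spawning and re-installed inside
  # the task closure (the capture-then-attach pattern).
  def run_in_task(sup, recorder, propagate?) do
    Process.put(@parent_key, "shape_snapshot")
    ctx = Process.get(@parent_key)

    sup
    |> Task.Supervisor.async_nolink(fn ->
      if propagate?, do: Process.put(@parent_key, ctx)
      with_child_span(recorder, "execute_for_shape", fn -> :ok end)
    end)
    |> Task.await()

    Agent.get(recorder, & &1)
  end
end

{:ok, sup} = Task.Supervisor.start_link()
{:ok, dropped} = Agent.start_link(fn -> [] end)
{:ok, kept} = Agent.start_link(fn -> [] end)
ChildSpanDemo.run_in_task(sup, dropped, false)
# => []
ChildSpanDemo.run_in_task(sup, kept, true)
# => [{"shape_snapshot", "execute_for_shape"}]
```

Without propagation, the task starts with an empty dictionary, finds no parent, and the span is silently skipped. This is the same shape as the dropped `shape_snapshot.*` subtree.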
Claude Code Review
Summary
Commit
What's Working Well

Issues Found
Critical (Must Fix): None.
Important (Should Fix): None.
Suggestions (Nice to Have)
File: The
Missing File: Both functions are public and referenced from the

```elixir
@doc """
Captures the current span and baggage context so it can be propagated to another process.
Use with `set_current_context/1`.
"""
@spec get_current_context() :: otel_ctx()
def get_current_context do
  {current_span_context(), :otel_baggage.get_all()}
end

@doc """
Restores a span and baggage context previously captured by `get_current_context/0`.
Call this at the start of a spawned task to link its spans to the originating trace.
"""
@spec set_current_context(otel_ctx()) :: :ok
def set_current_context({span_ctx, baggage}) do
  :otel_tracer.set_current_span(span_ctx)
  :otel_baggage.set(baggage)
end
```

Issue Conformance
No linked public issue. The PR description, root cause analysis, and fix approach are clear and complete. The implementation matches what was described.

Previous Review Status
Review iteration: 5 | 2026-04-28
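The suggested helper pair captures and restores a `{span_ctx, baggage}` tuple. Here is a minimal stand-in that models only that contract, assuming both halves can be kept under process-dictionary keys. `CtxHelpers`, `:demo_span`, `:demo_baggage`, and `report_from_child/1` are hypothetical; the real helpers call `:otel_tracer` and `:otel_baggage` instead. The sketch uses `Task.Supervisor.start_child/2`, the fire-and-forget variant, so the child reports back by message.

```elixir
defmodule CtxHelpers do
  # Hypothetical stand-in for the {span_ctx, baggage} pair. The real helper
  # reads these through the OpenTelemetry API, not plain dictionary keys.
  @type otel_ctx :: {term(), map()}

  @spec get_current_context() :: otel_ctx()
  def get_current_context do
    {Process.get(:demo_span), Process.get(:demo_baggage, %{})}
  end

  @spec set_current_context(otel_ctx()) :: :ok
  def set_current_context({span_ctx, baggage}) do
    Process.put(:demo_span, span_ctx)
    Process.put(:demo_baggage, baggage)
    :ok
  end

  # Spawns a fire-and-forget child that restores the caller's context first
  # and then sends back the context it sees.
  def report_from_child(sup) do
    parent = self()
    ctx = get_current_context()

    {:ok, _pid} =
      Task.Supervisor.start_child(sup, fn ->
        :ok = set_current_context(ctx)
        send(parent, {:child_ctx, get_current_context()})
      end)

    receive do
      {:child_ctx, child_ctx} -> child_ctx
    after
      1_000 -> :timeout
    end
  end
end

{:ok, sup} = Task.Supervisor.start_link()
CtxHelpers.set_current_context({:span_a, %{"tenant" => "t1"}})
CtxHelpers.report_from_child(sup)
# => {:span_a, %{"tenant" => "t1"}}
```

The capture happens in the caller, before `start_child/2`. Calling `get_current_context/0` inside the closure would read the child's own (empty) context.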
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```diff
@@              Coverage Diff              @@
##             main    #4149       +/-   ##
===========================================
- Coverage   89.20%   66.70%   -22.50%
===========================================
  Files          25      135      +110
  Lines        2520    17505    +14985
  Branches      636     4137     +3501
===========================================
+ Hits         2248    11677     +9429
- Misses        270     5825     +5555
- Partials        2        3        +1
```
Replace `:otel_ctx.get_current/attach/detach` in `PartialModes` with `OpenTelemetry.get_current_context/0` and `set_current_context/1`, matching the pattern already used in `shape_log_collector.ex` and `consumer.ex`. The helper pair just propagates the current span and baggage into the new process, which is all these short-lived tasks need; no detach dance required.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
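Why short-lived tasks can skip the detach step: each `Task.Supervisor` task is a fresh process whose dictionary disappears when it exits. A context installed there therefore cannot leak into the caller or into later tasks. This is a stand-in sketch using a hypothetical `:demo_ctx` dictionary key rather than real OTel calls.

```elixir
defmodule NoDetachDemo do
  # Hypothetical key standing in for the OTel context slot.
  @key :demo_ctx

  # Installs `ctx` inside a short-lived task and returns what the task saw.
  # No restore step: the task's dictionary is discarded when it exits.
  def run_task(sup, ctx) do
    sup
    |> Task.Supervisor.async_nolink(fn ->
      Process.put(@key, ctx)
      Process.get(@key)
    end)
    |> Task.await()
  end

  # Returns {seen_by_first_task, seen_by_second_task, caller_value_after}.
  def demo(sup) do
    Process.put(@key, :caller_ctx)
    first = run_task(sup, :task_ctx)

    # A new task starts with a fresh dictionary, so nothing leaks from the first.
    second =
      sup
      |> Task.Supervisor.async_nolink(fn -> Process.get(@key) end)
      |> Task.await()

    {first, second, Process.get(@key)}
  end
end

{:ok, sup} = Task.Supervisor.start_link()
NoDetachDemo.demo(sup)
# => {:task_ctx, nil, :caller_ctx}
```

A long-lived process that handles many units of work is different. There, restoring the previous context after each unit still matters.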
Follow the same pattern as the previous commit and `shape_log_collector`/`consumer`: use the `OpenTelemetry.get_current_context/0` and `set_current_context/1` helpers instead of raw `:otel_ctx.get_current/attach/detach`. Drops the detach dance for both the `handle_continue` entry in `Snapshotter` and the nested `Task` in `start_streaming_snapshot_from_db`, and updates the producer in `Shapes.get_or_create_shape_handle` to capture the context via the same helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
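A nested `Task` is a second process boundary, so the context must be captured again inside the first task before the inner one is spawned. This is a stand-in sketch with a hypothetical `:demo_ctx` key. It does not reproduce `start_streaming_snapshot_from_db`; it only models the two hops.

```elixir
defmodule NestedHopDemo do
  # Hypothetical key standing in for the OTel context slot.
  @key :demo_ctx

  # The outer task restores the caller's context, then spawns an inner task.
  # `propagate_inner?` controls whether the second hop is propagated too.
  def run(sup, propagate_inner?) do
    Process.put(@key, :root_span)
    outer_ctx = Process.get(@key)

    sup
    |> Task.Supervisor.async_nolink(fn ->
      # First hop: caller -> outer task.
      Process.put(@key, outer_ctx)
      # Capture again inside the outer task for the second hop.
      inner_ctx = Process.get(@key)

      sup
      |> Task.Supervisor.async_nolink(fn ->
        if propagate_inner?, do: Process.put(@key, inner_ctx)
        Process.get(@key)
      end)
      |> Task.await()
    end)
    |> Task.await()
  end
end

{:ok, sup} = Task.Supervisor.start_link()
NestedHopDemo.run(sup, false)
# => nil
NestedHopDemo.run(sup, true)
# => :root_span
```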
Follows up on the reviewer's suggestion: `get_current_context/0` returns
a {span_ctx, baggage} tuple, not a map. Expose an `otel_ctx` @type on
the OpenTelemetry module and reference it from
`Consumer.initialize_shape_opts` so the spec matches the real shape of
the value being carried.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
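The value carried through the shape options is the `{span_ctx, baggage}` tuple, which is later restored on the other side of a process boundary. The hypothetical GenServer below sketches that flow. It is modelled on the `handle_continue` pattern the PR mentions for the snapshotter, with the typespec written as a tuple rather than `map()`. `ContinueCtxDemo`, its `:otel_ctx` option, and the dictionary keys are stand-ins, not Electric's actual `Consumer` code.

```elixir
defmodule ContinueCtxDemo do
  # Hypothetical GenServer modelling the handle_continue pattern: the caller's
  # context travels in the start opts and is restored in handle_continue/2.
  use GenServer

  # Stand-in for a {span_ctx, baggage} tuple type.
  @type otel_ctx :: {term(), map()}
  @type opts :: [otel_ctx: otel_ctx()]

  @spec start_link(opts()) :: GenServer.on_start()
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    {:ok, %{otel_ctx: Keyword.fetch!(opts, :otel_ctx)}, {:continue, :restore_ctx}}
  end

  @impl true
  def handle_continue(:restore_ctx, %{otel_ctx: {span_ctx, baggage}} = state) do
    # Hypothetical dictionary keys standing in for set_current_context/1.
    Process.put(:demo_span, span_ctx)
    Process.put(:demo_baggage, baggage)
    {:noreply, state}
  end

  @impl true
  def handle_call(:current_ctx, _from, state) do
    {:reply, {Process.get(:demo_span), Process.get(:demo_baggage)}, state}
  end
end

{:ok, pid} = ContinueCtxDemo.start_link(otel_ctx: {:span_a, %{"shape" => "s1"}})
GenServer.call(pid, :current_ctx)
# => {:span_a, %{"shape" => "s1"}}
```

Because `handle_continue/2` runs before the server processes any message, the first call already sees the restored context.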
✅ Deploy Preview for electric-next ready!
This PR has been released! 🚀 The following packages include changes from this PR:
Thanks for contributing to Electric!
Summary
Fixes a bug where telemetry spans defined with `OpenTelemetry.with_child_span` inside spawned tasks (initial-snapshot and move-in code paths) were silently dropped. This hid expected fine-grained spans such as `shape_snapshot.execute_for_shape`, `shape_snapshot.query_fn`, `shape_snapshot.checkout_wait`, `shape_snapshot.setup`, and `shape_snapshot.query` from Honeycomb on hosts whose traffic is dominated by initial snapshots.

Root cause
`with_child_span/4` only creates a span when there is already a parent span in the current Erlang process's OTel context. `Task.Supervisor.async_nolink` / `Task.Supervisor.start_child` start new processes that do not inherit the caller's OTel context, so `in_span_context?()` returns `false` and the whole span subtree is dropped. Three spawn sites were affected:
- `Electric.Shapes.Consumer.Snapshotter.start_streaming_snapshot_from_db/4`
- `Electric.Shapes.PartialModes.query_move_in_async/5`
- `Electric.Shapes.PartialModes.query_move_in/5`

(`PartialModes.query_subset/4` is called synchronously from an HTTP-request process that already has a parent span, so it was not affected.)

Fix
Capture the context via `:otel_ctx.get_current()` before each spawn and attach it inside the task closure with `:otel_ctx.attach/1` (detached in `after`). This mirrors the pattern already used for `state.otel_ctx` in `Snapshotter.handle_continue/2`.

Test plan
- `mix compile` clean
- `mix test test/electric/shapes/consumer_test.exs`: 29 passing
- `mix test test/electric/shapes/consumer/move_ins_test.exs test/electric/shapes/consumer/initial_snapshot_test.exs`: 60 passing
- `name = shape_snapshot.execute_for_shape AND shape.query_reason = "initial_snapshot"` returns rows in Honeycomb (previously 0 across all hosts over 24h)

Refs: https://github.com/electric-sql/alco-agent-tasks/issues/27
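In the fix as first described, the context was attached inside the task and detached in an `after` block. To our understanding, `:otel_ctx.attach/1` returns a token that `:otel_ctx.detach/1` uses to restore the prior context. The model below imitates that contract with a plain process-dictionary key and makes no OTel calls. `AttachDetachDemo` and `:demo_ctx` are hypothetical.

```elixir
defmodule AttachDetachDemo do
  # Hypothetical model of attach/detach on a process-local context slot.
  # attach/1 installs `ctx` and returns a token (here simply the previous
  # value). detach/1 puts that previous value back.
  @key :demo_ctx

  def attach(ctx), do: Process.put(@key, ctx)
  def detach(token), do: Process.put(@key, token)

  # Runs `fun` with `ctx` attached and restores the previous context in
  # `after`, even if `fun` raises.
  def with_ctx(ctx, fun) do
    token = attach(ctx)

    try do
      fun.()
    after
      detach(token)
    end
  end
end

Process.put(:demo_ctx, :idle_ctx)
AttachDetachDemo.with_ctx(:request_ctx, fn -> Process.get(:demo_ctx) end)
# => :request_ctx
Process.get(:demo_ctx)
# => :idle_ctx
```

The `after` restore matters most when the process outlives the work. For one-shot spawned tasks, the later commits drop it.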
🤖 Generated with Claude Code