Skip to content

Propagate OTel context into spawned snapshot and move-in tasks#4149

Merged
alco merged 5 commits intomainfrom
erik/issue-27-otel-context-propagation
Apr 28, 2026
Merged

Propagate OTel context into spawned snapshot and move-in tasks#4149
alco merged 5 commits intomainfrom
erik/issue-27-otel-context-propagation

Conversation

@alco
Copy link
Copy Markdown
Member

@alco alco commented Apr 22, 2026

vetted by human ready for review

Summary

Fixes a bug where telemetry spans defined with OpenTelemetry.with_child_span inside spawned tasks (initial-snapshot and move-in code paths) were silently dropped, hiding expected fine-grained spans such as shape_snapshot.execute_for_shape, shape_snapshot.query_fn, shape_snapshot.checkout_wait, shape_snapshot.setup, and shape_snapshot.query from Honeycomb on hosts whose traffic is dominated by initial snapshots.

Root cause

with_child_span/4 only creates a span when there is already a parent span in the current Erlang process's OTel context. Task.Supervisor.async_nolink / Task.Supervisor.start_child start new processes that do not inherit the caller's OTel context, so in_span_context?() returns false and the whole span subtree is dropped. Three spawn sites were affected:

  • Electric.Shapes.Consumer.Snapshotter.start_streaming_snapshot_from_db/4
  • Electric.Shapes.PartialModes.query_move_in_async/5
  • Electric.Shapes.PartialModes.query_move_in/5

(PartialModes.query_subset/4 is called synchronously from an HTTP-request process that already has a parent span — it was not affected.)

Fix

Capture the context via :otel_ctx.get_current() before each spawn and attach it inside the task closure with :otel_ctx.attach/1 (detached in after). This mirrors the pattern already used for state.otel_ctx in Snapshotter.handle_continue/2.

Test plan

  • mix compile clean
  • mix test test/electric/shapes/consumer_test.exs — 29 passing
  • mix test test/electric/shapes/consumer/move_ins_test.exs test/electric/shapes/consumer/initial_snapshot_test.exs — 60 passing
  • After deploy: verify name = shape_snapshot.execute_for_shape AND shape.query_reason = "initial_snapshot" returns rows in Honeycomb (previously 0 across all hosts over 24h)

Refs: https://github.com/electric-sql/alco-agent-tasks/issues/27

🤖 Generated with Claude Code

`Task.Supervisor.async_nolink` / `start_child` start new Erlang processes
that do not inherit the caller's OTel context. Spans created inside these
tasks via `with_child_span` (e.g. `shape_snapshot.execute_for_shape` and
its children) were therefore silently dropped on the initial-snapshot and
move-in code paths, because `with_child_span` requires a parent span in
the current process's context.

Capture the context via `:otel_ctx.get_current()` before spawning and
attach it inside the task closure with `:otel_ctx.attach/1`, mirroring
the pattern already used for `state.otel_ctx` in the snapshotter's
`handle_continue`.
@alco alco added the claude label Apr 22, 2026
@claude
Copy link
Copy Markdown

claude Bot commented Apr 22, 2026

Claude Code Review

Summary

Commit 85c491076 adds integration test coverage to otel-export.lux to verify that the OTel context propagation fix actually emits spans from spawned tasks. The PR is complete and ready to merge.

What's Working Well

  • Integration test verifies the exact bug that was fixed: The lux test now checks that shape_snapshot.execute_for_shape spans appear in the OTel collector for both the Snapshotter path (shape.query_reason: initial_snapshot) and the PartialModes path (shape.query_reason: subset_query) — these are the two paths that were broken before this PR.
  • Test order is logical: trigger initial snapshot → wait for span, insert a row → trigger subset query → wait for span, live poll for the change. The sequence creates the conditions needed for each code path.
  • No detach needed in Snapshotter: handle_continue/2 terminates the process ({:stop, :normal, state}), so the OTel process dictionary context is cleaned up on exit — the removed ctx_token/detach was technically correct before but is not needed given this process lifecycle.

Issues Found

Critical (Must Fix)

None.

Important (Should Fix)

None.

Suggestions (Nice to Have)

@moduledoc arity typo in open_telemetry.ex

File: packages/sync-service/lib/electric/telemetry/open_telemetry.ex:22

The @moduledoc bullet says:

See `get_current_context/1` and `set_current_context/1`.

get_current_context takes 0 arguments — the correct reference is `get_current_context/0`. The @typedoc added in b230e901b correctly uses get_current_context/0, so this is an inconsistency in the module doc.

Missing @doc and @spec on get_current_context/0 and set_current_context/1

File: packages/sync-service/lib/electric/telemetry/open_telemetry.ex:305,310

Both functions are public and referenced from the @moduledoc but have only inline comments. Per project convention (@spec on all public functions, @doc for documentation) they should have:

@doc """
Captures the current span and baggage context so it can be propagated to another process.
Use with `set_current_context/1`.
"""
@spec get_current_context() :: otel_ctx()
def get_current_context do
  {current_span_context(), :otel_baggage.get_all()}
end

@doc """
Restores a span and baggage context previously captured by `get_current_context/0`.
Call this at the start of a spawned task to link its spans to the originating trace.
"""
@spec set_current_context(otel_ctx()) :: :ok
def set_current_context({span_ctx, baggage}) do
  :otel_tracer.set_current_span(span_ctx)
  :otel_baggage.set(baggage)
end

Issue Conformance

No linked public issue. The PR description, root cause analysis, and fix approach are clear and complete. The implementation matches what was described.

Previous Review Status

  • All issues from iterations 1–4 resolved. The otel_ctx type (b230e901b) and the integration test verification (85c491076) complete the PR.

Review iteration: 5 | 2026-04-28

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 66.70%. Comparing base (388ec63) to head (85c4910).
⚠️ Report is 54 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #4149       +/-   ##
===========================================
- Coverage   89.20%   66.70%   -22.50%     
===========================================
  Files          25      135      +110     
  Lines        2520    17505    +14985     
  Branches      636     4137     +3501     
===========================================
+ Hits         2248    11677     +9429     
- Misses        270     5825     +5555     
- Partials        2        3        +1     
Flag Coverage Δ
packages/agents 52.91% <ø> (?)
packages/agents-runtime 78.95% <ø> (?)
packages/agents-server 66.42% <ø> (?)
packages/agents-server-ui 0.00% <ø> (?)
packages/electric-ax 31.19% <ø> (?)
packages/experimental 87.73% <ø> (ø)
packages/react-hooks 86.48% <ø> (ø)
packages/start 82.83% <ø> (ø)
packages/typescript-client 94.30% <ø> (ø)
packages/y-electric 56.05% <ø> (ø)
typescript 66.70% <ø> (-22.50%) ⬇️
unit-tests 66.70% <ø> (-22.50%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

alco and others added 4 commits April 22, 2026 02:17
Replace :otel_ctx.get_current/attach/detach in PartialModes with
OpenTelemetry.get_current_context/1 and set_current_context/1, matching
the pattern already used in shape_log_collector.ex and consumer.ex.

The helper pair just propagates the current span + baggage into the new
process, which is all these short-lived tasks need — no detach dance
required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow the same pattern as the previous commit and
shape_log_collector/consumer: use OpenTelemetry.get_current_context/1
and set_current_context/1 helpers instead of raw :otel_ctx.
get_current/attach/detach. Drops the detach dance for both the
handle_continue entry in Snapshotter and the nested Task in
start_streaming_snapshot_from_db, and updates the producer in
Shapes.get_or_create_shape_handle to capture the context via the
same helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follows up on the reviewer's suggestion: `get_current_context/0` returns
a {span_ctx, baggage} tuple, not a map. Expose an `otel_ctx` @type on
the OpenTelemetry module and reference it from
`Consumer.initialize_shape_opts` so the spec matches the real shape of
the value being carried.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@netlify
Copy link
Copy Markdown

netlify Bot commented Apr 28, 2026

Deploy Preview for electric-next ready!

Name Link
🔨 Latest commit 85c4910
🔍 Latest deploy log https://app.netlify.com/projects/electric-next/deploys/69f09bf6fe7b440008e13d6a
😎 Deploy Preview https://deploy-preview-4149--electric-next.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

Copy link
Copy Markdown
Contributor

@robacourt robacourt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@alco alco merged commit 4da5618 into main Apr 28, 2026
83 of 84 checks passed
@alco alco deleted the erik/issue-27-otel-context-propagation branch April 28, 2026 13:03
@github-actions
Copy link
Copy Markdown
Contributor

This PR has been released! 🚀

The following packages include changes from this PR:

  • @core/sync-service@1.6.1

Thanks for contributing to Electric!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants