Skip to content

Fix ArithmeticError in ServeShapePlug when parse_body halts before telemetry span#3929

Merged
alco merged 6 commits intomainfrom
worktree-replicated-plotting-bunny
Mar 2, 2026
Merged

Fix ArithmeticError in ServeShapePlug when parse_body halts before telemetry span#3929
alco merged 6 commits intomainfrom
worktree-replicated-plotting-bunny

Conversation

@alco
Copy link
Copy Markdown
Member

@alco alco commented Feb 27, 2026

Summary

  • Moved parse_body plug after start_telemetry_span in the pipeline to prevent ArithmeticError when parse_body halts (e.g. oversized body, bad JSON) before the telemetry span is initialized
  • The custom halt/1 calls end_telemetry_span() which tried to compute System.monotonic_time() - nil, crashing the request

Fixes #3919

Test plan

  • Existing serve_shape_plug_test.exs tests pass (31 tests, 0 failures)
  • Add new unit tests that fail without the fix

🤖 Generated with Claude Code

alco and others added 2 commits February 27, 2026 21:18
…meticError

When parse_body halted a request before start_telemetry_span had
executed, the custom halt/1 would call end_telemetry_span which tried
to compute `System.monotonic_time() - nil`, causing an ArithmeticError.

Fixes #3919

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.21%. Comparing base (749fb4d) to head (9725a13).
⚠️ Report is 5 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #3929       +/-   ##
===========================================
+ Coverage   75.75%   87.21%   +11.46%     
===========================================
  Files          11       25       +14     
  Lines         693     2394     +1701     
  Branches      174      600      +426     
===========================================
+ Hits          525     2088     +1563     
- Misses        167      304      +137     
- Partials        1        2        +1     
Flag Coverage Δ
packages/experimental 87.73% <ø> (ø)
packages/react-hooks 86.48% <ø> (ø)
packages/start 82.83% <ø> (ø)
packages/typescript-client 91.88% <ø> (?)
packages/y-electric 56.05% <ø> (ø)
typescript 87.21% <ø> (+11.46%) ⬆️
unit-tests 87.21% <ø> (+11.46%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

These tests verify that POST requests with invalid JSON or non-object
JSON bodies return proper 400 responses without crashing. Before the
plug ordering fix, these paths would trigger an ArithmeticError in
end_telemetry_span.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alco alco marked this pull request as ready for review February 27, 2026 21:23
@alco alco self-assigned this Mar 2, 2026
@alco alco added the claude label Mar 2, 2026
@claude
Copy link
Copy Markdown

claude Bot commented Mar 2, 2026

Claude Code Review

Summary

This PR fixes a real production ArithmeticError crash with a clean, minimal one-line reordering: moving plug :parse_body to after plug :start_telemetry_span so that the custom halt/1 override can always call end_telemetry_span() without accessing a nil private map. The fix is correct and the approach is sound.

What's Working Well

  • Surgical fix: The change is exactly one line moved in the pipeline — minimal blast radius with maximum correctness.
  • Comment now matches code: The existing comment # start_telemetry_span needs to always be the first plug after fetching query params. already expressed the correct invariant; this PR finally enforces it.
  • Changeset present: @core/sync-service: patch with a clear description — correctly categorized.
  • Tests demonstrate the fix: Both new tests would crash with ArithmeticError before the fix, confirming they're true regression tests, not just documentation.

Issues Found

Important (Should Fix)

Two of four parse_body error paths are not tested — including the production trigger

The linked issue (#3919) explicitly identifies the body-exceeds-size-limit path ({:more, _, conn}) as the production trigger, and {:error, reason} as another untested path. The PR adds tests for the JSON decode errors (paths 1 and 2) but leaves the actual crash trigger uncovered.

File: packages/sync-service/test/electric/plug/serve_shape_plug_test.exs

Suggested additions to the "parse_body error handling" describe block using Repatch to mock Plug.Conn.read_body/2:

test "returns 413 for oversized body without crashing", ctx do
  Repatch.patch(Plug.Conn, :read_body, fn conn, _opts -> {:more, "partial", conn} end)

  conn =
    Plug.Test.conn(:post, "/?offset=-1", "body")
    |> put_req_header("content-type", "application/json")
    |> call_serve_shape_plug(ctx)

  assert conn.status == 413
  assert %{"error" => "Request body too large"} = Jason.decode!(conn.resp_body)
end

test "returns 400 for body read failure without crashing", ctx do
  Repatch.patch(Plug.Conn, :read_body, fn conn, _opts -> {:error, :timeout, conn} end)

  conn =
    Plug.Test.conn(:post, "/?offset=-1", "body")
    |> put_req_header("content-type", "application/json")
    |> call_serve_shape_plug(ctx)

  assert conn.status == 400
  assert %{"error" => "Failed to read request body"} = Jason.decode!(conn.resp_body)
end

Suggestions (Nice to Have)

1. Defensive nil-check in end_telemetry_span as defense-in-depth

The plug ordering constraint is implicit and comment-only — a future developer could break the invariant again. Adding a guard in end_telemetry_span/2 would prevent the same class of bug from recurring:

# packages/sync-service/lib/electric/plug/serve_shape_plug.ex:283
defp end_telemetry_span(%Conn{assigns: assigns} = conn, _ \\ nil) do
  span = conn.private[:electric_telemetry_span]
  
  OpenTelemetry.execute(
    [:electric, :plug, :serve_shape],
    %{
      count: 1,
      bytes: assigns[:streaming_bytes_sent] || 0,
      monotonic_time: System.monotonic_time(),
      duration: if span, do: System.monotonic_time() - span[:start_time], else: 0
    },
    ...
  )
  ...
end

This is a nice-to-have — the reordering fix is correct and sufficient on its own.

2. put_resp_content_type runs after parse_body — JSON error responses lack Content-Type: application/json

This is a pre-existing issue (not introduced by this PR), but worth noting: parse_body error responses call send_resp(400, Jason.encode!(...)) and halt() before plug :put_resp_content_type, "application/json" runs. So those 400/413 error responses don't carry the correct content-type header. A future PR could move put_resp_content_type before parse_body, or have parse_body set the header explicitly on error paths.

Issue Conformance

Fully addresses #3919. The root cause analysis in the issue is accurate, and the PR's reordering approach directly eliminates the crash path. The PR description matches the implementation. The issue was well-specified with exact reproduction steps, affected versions, and the two commits responsible — nothing is left unaddressed.


Review iteration: 1 | 2026-03-02

Copy link
Copy Markdown
Contributor

@robacourt robacourt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@alco
Copy link
Copy Markdown
Member Author

alco commented Mar 2, 2026

@claude Two things:

  1. Add two tests suggested in the Important (Should Fix) section above.
  2. Move the put_resp_content_type plug to be the first plug in the above, above plug :fetch_query_params.

@claude
Copy link
Copy Markdown

claude Bot commented Mar 2, 2026

Claude finished @alco's task in 2m 21s —— View job


Changes implemented

  • Read relevant source files
  • Add test for oversized body (413) without crashing
  • Add test for body read failure (400) without crashing
  • Move put_resp_content_type plug to be first, above fetch_query_params
  • Commit and push changes

Commit: 1beba88

Summary of changes

serve_shape_plug.exput_resp_content_type moved to be the very first plug, so Content-Type: application/json is set on all responses including early parse_body error paths:

plug :put_resp_content_type, "application/json"   # ← moved here (was after parse_body)
plug :fetch_query_params
plug :start_telemetry_span
plug :parse_body

serve_shape_plug_test.exs — Two new tests added to the "parse_body error handling" describe block using Repatch.patch to mock Plug.Conn.read_body/2:

Note: The review suggested {:error, :timeout, conn} for the read-failure mock, but Plug.Conn.read_body/2 returns a 2-tuple {:error, reason} on failure (no conn in the tuple — it's the caller's responsibility to use the original conn). The test uses the correct 2-tuple form.

@netlify
Copy link
Copy Markdown

netlify Bot commented Mar 2, 2026

Deploy Preview for electric-next ready!

Name Link
🔨 Latest commit 1beba88
🔍 Latest deploy log https://app.netlify.com/projects/electric-next/deploys/69a567099a05d8000898b537
😎 Deploy Preview https://deploy-preview-3929--electric-next.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

alco and others added 3 commits March 2, 2026 11:52
Ensures all error responses (including those from parse_body) carry the
correct Content-Type: application/json header.

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alco alco force-pushed the worktree-replicated-plotting-bunny branch from 1beba88 to 9725a13 Compare March 2, 2026 10:54
@alco alco merged commit 58c06d8 into main Mar 2, 2026
61 of 64 checks passed
@alco alco deleted the worktree-replicated-plotting-bunny branch March 2, 2026 11:09
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 2, 2026

This PR has been released! 🚀

The following packages include changes from this PR:

  • @core/sync-service@1.4.11

Thanks for contributing to Electric!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ArithmeticError in ServeShapePlug.end_telemetry_span/2 when parse_body halts before telemetry span starts

2 participants