feat(sync-service): cheap admission control (Tier 0) with post-load_shape reclassification#4359
feat(sync-service): cheap admission control (Tier 0) with post-load_shape reclassification#4359alco wants to merge 24 commits into
Conversation
Captures the design and PR-description draft for Tier 0 cheap admission plus post-load_shape reclassification. Plan tracks #4291. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Atomically moves an in-flight permit between buckets. Returns
{:error, :overloaded} when the destination is at cap and leaves the source
unchanged. Used by the upcoming reclassification step in ServeShapePlug.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4359 +/- ##
==========================================
+ Coverage 59.18% 59.73% +0.55%
==========================================
Files 305 290 -15
Lines 28444 28579 +135
Branches 7493 7766 +273
==========================================
+ Hits 16834 17073 +239
+ Misses 11596 11489 -107
- Partials 14 17 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Documents the caller-must-hold-from-permit precondition, adds `current` to swap_rejected telemetry metadata for parity with the existing reject event, and renames a local variable for consistency with the rest of the module. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…fication Moves :check_admission to the front of ServeShapePlug's pipeline and classifies requests on a single :ets.member/2 of the per-stack shape-meta table. Deletes :resolve_existing_shape — admission no longer touches the SQLite-backed ShapeDb (bottleneck 1 of #4266). Adds :reclassify_admission_kind between :load_shape and :serve_shape_response. Once load_shape returns, the handler atomically swaps its :initial permit for an :existing permit, freeing the :initial slot for the next validate-and-load wave while this request streams. If :existing is at cap, the handler keeps its :initial permit; the next swap attempt will succeed once :existing drains. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit added a nil-guard in check_admission to handle api_opts that omit max_concurrent_requests, but that introduced a silent bypass for misconfiguration. Reverts the guard and gives the Api struct a real default so direct plug_opts/1 callers (e.g. secure-mode tests) populate the field automatically. Misconfiguration once again surfaces as a Map.fetch! crash instead of silently disabling admission. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lback logging - Comment the unknown-handle branch in admission_kind/1 so the two :initial arms don't read like a copy-paste oversight. - Logger.debug on the reclassify_admission_kind fallback (try_swap returning :overloaded) so operators can spot the "stuck on :initial" path with debug logging enabled. The rejection itself remains observable via the existing [:electric, :admission_control, :swap_rejected] telemetry emitted by try_swap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 44d35d2c41
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Claude Code ReviewSummaryIncremental review of #4359 (iteration 6). Four new commits, all responses to iteration 5's open suggestions: What Was Addressed Since Iteration 5
New Code Review (delta only)Nothing new to flag — all four commits are surgical, no functional changes. IssuesNone. Suggestions
Issue ConformancePR cites #4291 (primary, "Closes"), #4292 (parent), #4266 (bottleneck 1). Previous Review Status
Review iteration: 6 | 2026-05-20 |
Reverts the struct-level default added in 739ae54. Production already populates max_concurrent_requests before Api.plug_opts/1 is reached (via Electric.Application.api_configuration/1 -> Electric.Config.get_env/1 -> the Defaults map at lib/electric/config.ex:73, which is the real source of truth at %{initial: 300, existing: 10_000}). The struct default was only ever consulted by tests that build Api opts directly without going through the production-config plumbing. Instead, route those test fixtures through Electric.Config.get_env/1 so they see the same defaults production sees: - test/support/component_setup.ex build_router_opts/2 (covers most router tests via the canonical helper). - test/electric/plug/serve_shape_plug_logging_test.exs build_plug_opts/1. - The secure-mode api_opts fixture in router_test.exs. Misconfiguration once again surfaces as a Map.fetch! crash instead of being silently papered over by a hardcoded struct default that did not match the production numbers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Moves :put_resp_content_type before :check_admission so 503 admission rejections still carry `Content-Type: application/json`. Prior order put admission first, but Api.Response.error/3 -> Api.Response.send/2 does not set the content-type header itself — only the JSON-encoded body. Strict clients, CDNs, and observability tooling could fall back to `application/octet-stream` on the rejection path the PR was optimizing for. Adds a regression assertion in the existing 503 router test. Reported by Claude review on PR #4359. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous implementation used a single ets:update_counter/4 call
with a threshold-clamped increment for the destination column and a
decrement for the source column. That combination has two distinct
concurrency hazards under contention:
1. **Codex's race (rollback under-counts to_kind).** When N callers
race past a saturated to_kind, every caller captures the same
clamped post-increment value `cap + 1` — but only the first
increment actually moved the row. All N rollbacks then decrement
the destination by 1, removing real permits and breaking cap
enforcement for subsequent acquires until something drains.
2. **Claude's race (concurrent acquire on from_kind sees under-count).**
Because from_kind was decremented in the same atomic op as the
to_kind increment, a concurrent try_acquire(from_kind) could
observe a transiently lowered from_kind, succeed, and push
from_kind above its cap on swap rollback.
The fix is to switch to two unclamped ops:
1. Plain increment of to_kind. Each caller's `+1` is real, so each
rejected caller's rollback removes exactly its own contribution.
2. If new_to > cap: atomic rollback (to_kind -= 1). from_kind is
untouched, so the caller keeps its from_kind permit.
3. Otherwise: atomic decrement of from_kind (success).
Cost: 2 atomic ETS ops on both the success and reject paths (vs. 1 on
success / 2 on reject before). The mid-state between the two success
ops shows total = original_total + 1, which can spuriously reject a
concurrent try_acquire(to_kind) but never over-admit — a strictly
safer trade than the previous "from + to invariant" framing, which
turned out to be wrong under the threshold-clamp behaviour.
Adds two regression tests:
- `concurrent rejected swaps preserve the destination counter`
(Codex's race).
- `from_kind cap is preserved against concurrent try_acquire during swap`
(Claude's race).
Also addresses two smaller review notes:
- Adds the missing @SPEC for try_swap/4.
- Switches `swap_rejected` telemetry's `current` measurement from
`cap` to `new_to` (the actual post-increment value the racer
captured), matching the `:reject` event's post-increment pattern.
Reported by Codex and Claude reviews on PR #4359.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tables StatusMonitor, EtsInspector, and LsnTracker each own a read-mostly ETS table that every shape HTTP request hits on the fast path: - StatusMonitor: `service_status/1` runs from `Api.validate_params` -> `hold_until_stack_ready` for every request. - EtsInspector `pg_inspector_table`: `load_relation_oid`, `load_relation_info`, `load_column_info`, `load_supported_features` all do an ETS lookup here before falling back to the GenServer. - LsnTracker: `determine_global_last_seen_lsn` reads it for every non-`offset=:now` request. All three tables were created without `read_concurrency`, so reads contend on the default ETS lock together with the infrequent writes that the owning GenServer makes. The sibling hot-path tables (shape_meta_table, AdmissionControl counter, PureFileStorage stack table, WriteBuffer, ConsumerRegistry) already set the flag; these three are the remaining holes on the per-request read path and the ones most likely to serialize under the concurrency the new admission control allows through. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reclassify after the Snapshotter signals snapshot_started, not at load_shape return. Holding :initial across the await_snapshot_start wait makes admission-gate backpressure cover snapshot-pool contention: when Snapshotters queue for a PG connection, requests stay parked in :initial and new requests get rejected at the gate (503 + Retry-After) instead of piling up as an unbounded Postgrex checkout queue. The await result is propagated via request.snapshot_status and consumed by Shapes.get_merged_log_stream, replacing its own await_snapshot_start call. Issuing it twice would race the consumer's stop_and_clean on the snapshot-failure path, turning legitimate SnapshotError 503s into must_refetch 409s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alco active snapshot is still holding a connection so shouldn't it still count towards "things contending for a snapshot-pool connection"? |
magnetised
left a comment
There was a problem hiding this comment.
really appreciate the work that's gone into this. awesome stuff. have a couple of questions, one in the review one on the main pr, but am assuming that I've just not understood something.
🌟
| with {stack_id, :initial} <- Process.get(@admission_permit_key), | ||
| max = Map.fetch!(config[:api].max_concurrent_requests, :existing), | ||
| :ok <- | ||
| Electric.AdmissionControl.try_swap(stack_id, :initial, :existing, max_concurrent: max) do |
There was a problem hiding this comment.
what does it mean if the try_swap fails?
There was a problem hiding this comment.
The call to try_swap fails only if the :existing bucket is already at capacity. An unlikely case in practice since we configure it with a higher value than :initial.
In any case, if the swap fails, the current request keeps holding the :initial permit until it's released in the after clause of the call function.
You're right. With the changes made in this PR, the So the protection of the snapshot pull is more hypothetical than practical, considering that the In other words, it may end up being a moot point in practice, but reclassifying a shape req mid-flight has potential to make new shape creation faster (by not waiting on the full snapshot creation+serving cycle) but doesn't really make it worse in the thundering herd scenario where shape reqs are already timing out on connection pool checkout en masse. |
Summary
Two changes to
ServeShapePlug's pipeline that together make admission control cheap to run and accurately bound the resource it's supposed to protect.:resolve_existing_shapeis deleted. The admission classifier now reads a single ETS table (shape_meta_tablefor the stack) viaShapeStatus.has_shape_handle?/2. No SQLite, no GenServer call. Admission moves to the front of the plug pipeline so rejected requests don't pay for validation, shape lookup, or any of the other per-request setup.:initialto:existing. The:initialcap now bounds requests that are in validate-and-load or contending for a snapshot-pool connection, not requests currently streaming their response.Pipeline change
Before:
After:
Why this is cheaper
:resolve_existing_shapecalled intoElectric.Shapes.fetch_handle_by_shape/2, which falls through toShapeDb.handle_for_shape/2— a SQLite read on the read pool. Issue #4266 calls this out as the first bottleneck on the pre-admission path: every request, including the ones admission is about to reject, performs that read. Under thundering herd, the read pool saturates before admission can shed anything.The new classifier is
:ets.member(shape_meta_table(stack_id), handle). One lookup, no pool, no GenServer.The rescue on
ArgumentErrorcovers the brief window during stack startup where the per-stack meta table hasn't been created yet — those requests classify as:initial, which is the safer bucket.What this PR addresses, what it doesn't
ShapeCache (resolved within this PR). Before this branch, every shape request — handle or not — could end up in a
GenServer.call(ShapeCache, {:create_or_wait_shape_handle, …}, 30_000)on a cache miss, with no upstream cap on concurrent callers. With cheap admission::existingrequests short-circuit on thevalidate_shape_handleETS hash check and never touch ShapeCache; only:initialrequests can hit{:create_or_wait_shape_handle, …}, and their population is hard-capped by the:initialbucket. ShapeCache's max instantaneous concurrent-caller count is nowmax_concurrent_requests.initialrather than "however many clients showed up." The 30s@call_timeoutnow exists only to absorb pathological filesystem tail latencies, not to throttle queue depth.ShapeCache mailbox-overfill on coalesced creates (related concern, tracked in #4372). Even with the
:initialbucket capping the population, all:initialrequests for the same uncached shape still pile into the ShapeCache mailbox: handler 1 does the actual creation, handlers 2..N short-circuit via the SQLite critical fetch, but every caller occupies a mailbox slot for the duration of handler 1's work plus its own dispatch latency. #4372 proposes a GenServer-owned per-shape ETS lock observed by callers (via:ets.memberbefore they send the call) plus aPollWaitfallback — same pattern as #4371, with tighter backoff defaults to match shape-creation timescales.StatusMonitor (out of scope, tracked in #4371).
Api.validate_paramscallsStatusMonitor.wait_untilfor every request. The fast path is an ETS read and is unaffected. The slow path — stack starting, post-deploy, recovering from sleep — funnels every concurrent waiter through the StatusMonitor mailbox and replies serially on the readiness flip. The:initialbucket caps inflight requests across the validate+load_shape span but does not eliminate that single-process reply burst. Acceptable for now; #4371 has the per-process polling design with the adaptive congestion flag.EtsInspector (out of scope, tracked in #4370). Used in
validate_requestto build%Shape{}. Warm cache is in-process ETS. Cold cache and PG-degraded paths serialize every concurrent request through the inspector mailbox, and crucially, failed DB lookups are not cached — so under PG pool exhaustion the inspector re-attempts the same failing query for each queued message. The:initialbucket bounds the herd size but does not stop the inspector from re-burning the DB pool on every retry.Why reclassification happens at snapshot start
release_admission_permit/0runs incall/2'safterclause — i.e., afterApi.Response.send_stream/2has finished draining the body. For an initial snapshot of a non-trivial shape, or a long-poll, that's seconds to minutes.So before this PR,
:initialpermits were held for the entire request lifetime:After this change:
:initialresidency drops from "entire request lifetime" to "validate + load_shape + Snapshotter pool checkout." Two things matter about that last span:1. For coalesced handlers on a popular shape, the wait is essentially free.
ShapeCache.await_snapshot_startshort-circuits onShapeStatus.snapshot_started?(one ETS lookup) when the snapshot is already running. So requests on a warm shape add at most one ETS read betweenload_shapeand the swap.2. For requests that trigger genuine shape creation, the wait is what makes
:initialprotect the snapshot pool. When the Snapshotter is queued waiting for a PG connection from the snapshot pool, the handler stays parked in:initial. Once the Snapshotter actually checks out a connection and casts{:snapshot_started, _}, the handler reclassifies and the next handler can enter. So:initialis no longer just "validate-and-load throughput" — it's also "number of requests that have caused a Snapshotter to either start or wait on the pool." Under PG-pool saturation, new requests get rejected at the admission gate (503 withRetry-After) instead of piling up as an unbounded Postgrex checkout queue.The downstream
get_merged_log_streamcall no longer issues its ownawait_snapshot_start: the plug's result is propagated viarequest.snapshot_statusand consumed inline. This avoids racing the consumer'sstop_and_cleanon the snapshot-failure path, which would otherwise turn legitimateSnapshotError503s intomust_refetch409s.Permit-release safety against handler death
Permits are stashed in
Process.put(@admission_permit_key, …)at acquire/reclassify time and released incall/2'safterclause. Theafterclause covers all in-process termination paths (normal return,raise,throw,halt, mid-stream client disconnect). The remaining concern — being killed mid-flight by an external EXIT signal — is absorbed by Thousand Island's connection handler GenServer, which setsProcess.flag(:trap_exit, true)(deps/thousand_island/lib/thousand_island/handler.ex:343). Trapping converts incoming EXIT signals to mailbox messages, so the handler finishes the current request normally, releases the permit, and only then terminates on the nextreceive. The remaining gaps (:brutal_killafter shutdown timeout, externally-sent:kill) only happen during BEAM shutdown, where the ETS counters reset along with the VM.Related
Closes #4291 — AdmissionControl logic becomes expensive under thundering herd
🤖 Generated with Claude Code