Background
Electric.ShapeCache.get_or_create_shape_handle/3 is called from Api.load_shape_info on every shape request that misses the in-process fast path. The fast path checks ShapeStatus.fetch_handle_by_shape (WriteBuffer ETS, then SQLite read connection) followed by fetch_latest_offset (Storage). On a miss, the caller falls through to GenServer.call(ShapeCache, {:create_or_wait_shape_handle, …}, 30_000).
When N concurrent requests for the same uncached shape arrive, all of them miss the fast path and all of them enqueue a GenServer.call. Handler 1 does the actual creation work (safe_maybe_create_shape → add_shape + start_shape); handlers 2..N short-circuit via fetch_handle_by_shape_critical finding the shape that handler 1 just created. So the work is deduplicated, but every caller still occupies a mailbox slot for the full duration of handler 1's creation plus its own dispatch latency. Under filesystem tail latency, shape creation reaches multiple seconds and the queue depth is real.
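To make the pile-up concrete, here is a minimal sketch that reproduces it with a toy GenServer. MailboxPileupSketch, its message shape, and the sleep standing in for shape creation are all hypothetical; only GenServer.call and Process.info(pid, :message_queue_len) are real APIs. It is an illustration of the effect, not the ShapeCache code.

```elixir
defmodule MailboxPileupSketch do
  # Hypothetical stand-in for ShapeCache: creation is simulated by a sleep.
  use GenServer

  def start_link(create_ms), do: GenServer.start_link(__MODULE__, create_ms)

  @impl true
  def init(create_ms), do: {:ok, %{create_ms: create_ms, shapes: %{}}}

  @impl true
  def handle_call({:create_or_wait, shape}, _from, state) do
    case state.shapes do
      # Handlers 2..N: the shape already exists, reply with the cached handle.
      %{^shape => handle} ->
        {:reply, handle, state}

      # Handler 1: do the (slow) creation work.
      _ ->
        Process.sleep(state.create_ms)
        handle = make_ref()
        {:reply, handle, put_in(state.shapes[shape], handle)}
    end
  end

  # Fires n concurrent calls for the same shape and samples the mailbox
  # length while the first creation is still running.
  def queue_depth_during_fan_out(n, create_ms) do
    {:ok, pid} = start_link(create_ms)

    tasks =
      for _ <- 1..n do
        Task.async(fn -> GenServer.call(pid, {:create_or_wait, :shape_a}) end)
      end

    Process.sleep(div(create_ms, 2))
    {:message_queue_len, depth} = Process.info(pid, :message_queue_len)
    handles = Task.await_many(tasks)
    {depth, handles |> Enum.uniq() |> length()}
  end
end
```

Calling MailboxPileupSketch.queue_depth_during_fan_out(10, 200) returns the sampled queue depth and the number of distinct handles. The work is deduplicated (one distinct handle), yet the waiting callers are all parked in the mailbox while the first handler runs.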
This is a sibling of #4370 (EtsInspector) and #4371 (StatusMonitor) under the thundering-herd umbrella #4266. The cheap admission control (#4292 / #4291 / PR #4359) bounds the concurrent population that can reach ShapeCache at all via the :initial bucket, but does not prevent admitted requests for the same shape from piling into the ShapeCache mailbox.
Goal
Prevent the ShapeCache mailbox from accumulating concurrent calls for the same fresh shape. Only one caller per unique shape-creation event should issue a GenServer.call; every other caller should observe the in-progress state via ETS and poll until the result is ready.
Design
The lock is GenServer-owned for writes, caller-readable for routing decisions. Callers never write to the lock table, eliminating any stale-claim concerns.
The poll predicate, shape_has_been_activated?/2, checks the last_used_timestamp that update_last_read_time_to_now/2 sets at the very end of start_shape, so it serves as a "creation fully succeeded" signal. If creation fails, clean_shape removes the row, so waiters never observe :ready and eventually time out.
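A minimal sketch of the caller flow, server handler, and poll predicate described above, assuming toy stand-ins for the real modules. ShapeCacheSketch, both table names, the :create_delay option, the fixed 5 ms poll sleep, and the :creations call are hypothetical. The real code would go through ShapeStatus, safe_maybe_create_shape/2, and PollWait, and in this sketch the GenServer owns the tables itself.

```elixir
defmodule ShapeCacheSketch do
  # Hypothetical stand-in for Electric.ShapeCache, for illustration only.
  use GenServer

  @lock :shape_create_lock_sketch
  @shapes :shape_status_sketch

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  # Caller flow: fast path, then a read-only lock check, then GenServer.call.
  def get_or_create_shape_handle(shape, timeout \\ 5_000) do
    deadline = System.monotonic_time(:millisecond) + timeout

    with :pending <- fetch_ready(shape) do
      if :ets.member(@lock, shape) do
        # Creation is in progress: poll ETS instead of queueing a call.
        poll(shape, deadline)
      else
        GenServer.call(__MODULE__, {:create_or_wait_shape_handle, shape}, timeout)
      end
    end
  end

  # Poll predicate: ready only once the "activated" flag has been written.
  defp fetch_ready(shape) do
    case :ets.lookup(@shapes, shape) do
      [{^shape, handle, true}] -> {:ok, handle}
      _ -> :pending
    end
  end

  defp poll(shape, deadline) do
    case fetch_ready(shape) do
      {:ok, handle} ->
        {:ok, handle}

      :pending ->
        if System.monotonic_time(:millisecond) >= deadline do
          {:error, :timeout}
        else
          Process.sleep(5)
          poll(shape, deadline)
        end
    end
  end

  @impl true
  def init(opts) do
    :ets.new(@lock, [:named_table, :public, :set, read_concurrency: true])
    :ets.new(@shapes, [:named_table, :public, :set, read_concurrency: true])
    {:ok, %{create_delay: Keyword.get(opts, :create_delay, 50), creations: 0}}
  end

  # Server handler: only the GenServer writes the lock, and it clears the
  # lock in the same invocation that set it, even if creation raises.
  @impl true
  def handle_call({:create_or_wait_shape_handle, shape}, _from, state) do
    case fetch_ready(shape) do
      {:ok, handle} ->
        {:reply, {:ok, handle}, state}

      :pending ->
        :ets.insert(@lock, {shape})

        try do
          Process.sleep(state.create_delay) # simulated creation work
          handle = "handle-#{:erlang.phash2(shape)}"
          :ets.insert(@shapes, {shape, handle, true})
          {:reply, {:ok, handle}, %{state | creations: state.creations + 1}}
        after
          :ets.delete(@lock, shape)
        end
    end
  end

  def handle_call(:creations, _from, state), do: {:reply, state.creations, state}
end
```

Callers that arrive before the handler sets the lock still queue a call and are answered from the critical fetch. Callers that arrive afterwards never touch the mailbox.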
Properties
Mailbox growth bounded by a single race window. Once a shape's lock is set, all new callers bypass the GenServer entirely. The mailbox grows only during the window between "first caller's call queued" and "handler dequeues and sets the lock" — i.e. mailbox dispatch latency in the common case, the previous shape's creation time in the contended case (and even then, capped at "one entry per fast-path-miss within the window").
No :wait reply needed. Any caller whose call reaches the handler with the lock unset is either the creator (does the work) or finds the shape already exists via the critical fetch (reply with the cached result). There's no third response code.
No stale-lock recovery needed. Callers don't write the lock, so they can't strand it. The GenServer always clears the lock in the same handler invocation that set it. On GenServer restart, init/1 wipes the lock table.
Waiters that never observe :ready return :timeout after the deadline. Same trade-off as #4370 / #4371.
Backoff considerations — PollWait must accept per-caller config
This is the part to be aware of when implementing.
The defaults in #4371 (initial 25ms → doubling → 500ms cap) are tuned for stack-readiness waiting, where the underlying event flips on a second-to-minute timescale and aggressive polling is wasted. ShapeCache is the opposite: a single shape creation completes in tens to low-hundreds of milliseconds in the common case. Polling at a 500ms cap would routinely add hundreds of milliseconds of latency for no benefit.
ShapeCache's caller should pass its own backoff. Schedule: 5 → 10 → 20 → 40 → 80 → 100 → 100 … ms. Most shape creations are answered within the first 2–3 polls.
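The schedule above is a doubling backoff with a cap. As a rough illustration, the sketch below generates it, along with the #4371 defaults for comparison. BackoffSketch and the option names :initial_ms and :max_ms are assumptions made for this example, not the PollWait API.

```elixir
defmodule BackoffSketch do
  # Hypothetical helper: doubling backoff with a cap, as a lazy stream.
  def schedule(opts \\ []) do
    initial = Keyword.get(opts, :initial_ms, 5)
    cap = Keyword.get(opts, :max_ms, 100)
    Stream.iterate(initial, &min(&1 * 2, cap))
  end
end

# ShapeCache-style overrides: 5 ms initial, 100 ms cap.
shape_cache = BackoffSketch.schedule(initial_ms: 5, max_ms: 100) |> Enum.take(7)
IO.inspect(shape_cache, charlists: :as_lists)
# [5, 10, 20, 40, 80, 100, 100]

# Cumulative time spent sleeping after each poll.
IO.inspect(Enum.scan(shape_cache, &+/2), charlists: :as_lists)
# [5, 15, 35, 75, 155, 255, 355]

# StatusMonitor-style defaults from #4371: 25 ms initial, 500 ms cap.
status_monitor = BackoffSketch.schedule(initial_ms: 25, max_ms: 500) |> Enum.take(6)
IO.inspect(status_monitor, charlists: :as_lists)
# [25, 50, 100, 200, 400, 500]
```

With the 5 ms schedule, a waiter has slept 35 ms in total after three polls. With the 25 ms / 500 ms defaults, it has slept 175 ms by the same point and the gap between polls keeps growing to 500 ms.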
The PollWait primitive proposed in #4371 already accepts these as opts, so no primitive change is needed — only that the implementation honour the per-call overrides without baking the StatusMonitor defaults into a global config.
Crash safety
If the GenServer crashes mid-creation, its state is lost but the named, public lock table persists. On restart, init/1 runs :ets.delete_all_objects(shape_create_lock(stack_id)) to wipe stale entries. Any in-flight waiters that were polling time out after their existing deadline; new requests start fresh.
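A minimal sketch of the restart-time wipe, with one caveat worth stating: an ETS table is destroyed when its owning process exits unless an heir is configured. For the lock table to persist across a ShapeCache crash, it would need to be owned or inherited by a longer-lived process. The sketch tolerates both cases by creating the table only when it is missing. LockTableSketch, the table name, and the {:lock, shape} message are hypothetical. The issue also specifies write_concurrency: :auto (OTP 25+), which is left out here for portability.

```elixir
defmodule LockTableSketch do
  # Hypothetical stand-in for ShapeCache's init/1 crash-recovery step.
  use GenServer

  @table :shape_create_lock_crash_sketch

  # Unlinked start so the demo can kill the server without killing the caller.
  def start, do: GenServer.start(__MODULE__, [], name: __MODULE__)

  def table, do: @table

  # Creates the table if it does not exist. Whoever calls this first owns it.
  def ensure_table do
    case :ets.whereis(@table) do
      :undefined ->
        :ets.new(@table, [:named_table, :public, :set, read_concurrency: true])

      _ref ->
        @table
    end
  end

  @impl true
  def init(_opts) do
    ensure_table()
    # Wipe any lock entries stranded by a previous incarnation.
    :ets.delete_all_objects(@table)
    {:ok, %{}}
  end

  # Simulates "lock set, creation still running" for the demo.
  @impl true
  def handle_call({:lock, shape}, _from, state) do
    :ets.insert(@table, {shape})
    {:reply, :ok, state}
  end
end
```

If a longer-lived process calls LockTableSketch.ensure_table() first, a lock entry written by the server survives a kill of the server and is removed again when init/1 runs on restart.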
Suggested execution
Electric.PollWait landing as part of #4371.
Add the per-shape lock ETS table (shape_create_lock:<stack_id>), created in ShapeCache's init/1 with :public, :set, :named_table, read_concurrency: true, write_concurrency: :auto. Cleared in init/1 to handle crash recovery.
Modify get_or_create_shape_handle/3 to add the :ets.member check before GenServer.call, with the polling fallback.
Modify the handle_call({:create_or_wait_shape_handle, …}, …) body to set/clear the lock around safe_maybe_create_shape/2.
Test:
Concurrent fan-out for the same fresh shape: assert mailbox depth is bounded by the brief-window count, and N–1 waiters receive the result via polling.
Crash recovery: kill the GenServer mid-creation, assert lock table is empty after restart, assert new requests succeed.
Backoff: assert poll cadence matches the configured schedule (within jitter).