Description
Summary
When Electric restarts and restores shapes with subqueries from disk, the first WAL event(s) that arrive trigger RuntimeError: Materializer for stack "..." and handle "..." is not available errors. This is a race condition between eager filter restoration and lazy consumer/materializer startup.
Root cause
On startup, two things happen in sequence:
- ShapeLogCollector restores filters eagerly for all persisted shapes, including shapes with subquery dependencies. After this, the filter knows about all shapes and can try to evaluate subquery WHERE clauses.
- Consumer and Materializer processes are started lazily — only when the first event arrives for a shape (via ConsumerRegistry).
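The eager-restore / lazy-start split can be illustrated with a small, self-contained Elixir sketch (module and registry names are made up for illustration, not Electric's actual API):

```elixir
# Sketch only: phase 1 restores filter state eagerly, phase 2 starts the
# backing process lazily on first use. Names are illustrative.
{:ok, _} = Registry.start_link(keys: :unique, name: Sketch.Registry)

defmodule Sketch do
  # Phase 1 (eager, at boot): the filter learns about every persisted shape.
  def restore_filters(handles), do: MapSet.new(handles)

  # Phase 2 (lazy, on first event): the shape's process is started on demand.
  def ensure_started(handle) do
    case Registry.lookup(Sketch.Registry, handle) do
      [{pid, _}] ->
        pid

      [] ->
        {:ok, pid} =
          Agent.start_link(fn -> handle end,
            name: {:via, Registry, {Sketch.Registry, handle}})

        pid
    end
  end
end

filters = Sketch.restore_filters(["shape-1"])
true = MapSet.member?(filters, "shape-1")
# The filter already knows the shape, but no process backs it yet, so any
# GenServer.call routed through the registry here would exit with :noproc:
[] = Registry.lookup(Sketch.Registry, "shape-1")
```

Between restore and the first event, the filter can reference a shape whose Materializer has no pid; that window is exactly where the reported errors occur.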
When a WAL event arrives, the flow is:
ShapeLogCollector.handle_call
→ EventRouter.route_change
→ Filter.affected_shapes
→ WhereCondition.other_shapes_affected
→ refs_fun.(shape)
→ Materializer.get_all_as_refs
→ GenServer.call(materializer) ← process doesn't exist yet!
The filter needs the Materializer to resolve subquery link values, but the Materializer is only started after Filter.affected_shapes returns and ConsumerRegistry.publish triggers lazy consumer startup.
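The failing lookup is the standard via-registry pattern: resolve a registered name, and raise if no live process is behind it. A rough sketch of what `get_all_as_refs` runs into (module layout, names, and message format are assumptions inferred from the log output, not the actual source):

```elixir
# Illustrative only: during the race window the `nil` branch is hit,
# because the Materializer is started lazily and hasn't been launched yet.
defmodule Sketch.Materializer do
  def get_all_as_refs(stack_id, handle) do
    case GenServer.whereis(name(stack_id, handle)) do
      nil ->
        raise "Materializer for stack #{inspect(stack_id)} and handle " <>
                "#{inspect(handle)} is not available"

      pid ->
        GenServer.call(pid, :get_link_values)
    end
  end

  defp name(stack_id, handle),
    do: {:via, Registry, {Sketch.Registry, {:materializer, stack_id, handle}}}
end
```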
The error is caught by Filter.affected_shapes which falls back to returning all shape IDs (safe but noisy).
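That catch-all presumably amounts to a rescue-and-fall-back around the filter evaluation; a hedged sketch of the pattern (function names assumed, not copied from the source):

```elixir
# Sketch of the safe-but-noisy fallback: on any evaluation error, log it and
# report every known shape as affected so no event is dropped.
def affected_shapes(filter, change) do
  try do
    do_affected_shapes(filter, change)
  rescue
    error ->
      require Logger

      Logger.error(
        "Unexpected error in Filter.affected_shapes: " <>
          Exception.format(:error, error, __STACKTRACE__)
      )

      # Fallback: notify every shape; correct, but defeats filtering.
      all_shape_ids(filter)
  end
end
```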
Reproduction
- Create a table with some data:
CREATE TABLE hot_rows (
id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
grp_a integer NOT NULL,
grp_b integer NOT NULL,
grp_c integer NOT NULL,
payload text NOT NULL DEFAULT '',
updated_at timestamptz NOT NULL DEFAULT now()
);
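The issue leaves the seed step as a comment; one possible way to populate the table (a sketch only; row count and group-value ranges are arbitrary choices):

```sql
-- Sketch: 10,000 rows with group values uniformly in 1..5, so each
-- subquery condition (grp_a = 1, etc.) matches roughly a fifth of the table.
INSERT INTO hot_rows (grp_a, grp_b, grp_c, payload)
SELECT (1 + floor(random() * 5))::int,
       (1 + floor(random() * 5))::int,
       (1 + floor(random() * 5))::int,
       md5(random()::text)
FROM generate_series(1, 10000);
```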
-- Insert rows with random group values
- Create a shape with a subquery WHERE clause:
where=id IN (SELECT id FROM hot_rows WHERE grp_a = 1)
OR id IN (SELECT id FROM hot_rows WHERE grp_b = 1)
OR id IN (SELECT id FROM hot_rows WHERE grp_c = 1)
- Restart Electric (shapes restore from disk).
- Update any row in the table — without any HTTP client connecting first:
UPDATE hot_rows SET payload = md5(random()::text), updated_at = now()
WHERE id = (SELECT id FROM hot_rows ORDER BY random() LIMIT 1);
- Observe the error flood — one error per subquery dependency shape, per change in the transaction. Increasing the number of modified rows in a single transaction, or the number of shapes, proportionally increases the number of logged errors.
Example output
Startup completes normally, shapes and filters are restored:
13:03:52.356 [notice] Restored publication filters in 3ms
13:03:52.380 [notice] Restored filters for 4 shapes in 28ms
13:03:52.380 [notice] Consumers ready in 39ms (4 shapes)
A single-row UPDATE arrives ~2 seconds later. The filter tries to evaluate the subquery WHERE clauses, but the Materializer processes don't exist yet. With 3 subqueries, this produces 2 errors: one for the old_record evaluation and one for the record evaluation in UpdatedRecord handling:
13:03:54.827 [info] Received relation "public"."hot_rows"
13:03:54.919 [error] Unexpected error in Filter.WhereCondition.affected_shapes:
** (RuntimeError) Materializer for stack "single_stack" and handle "16096921-1772539368604093" is not available
(electric 1.4.11) lib/electric/shapes/consumer/materializer.ex:98: Electric.Shapes.Consumer.Materializer.genserver_get_link_values/1
(electric 1.4.11) lib/electric/shapes/consumer/materializer.ex:111: anonymous fn/2 in Electric.Shapes.Consumer.Materializer.get_all_as_refs/2
...
(electric 1.4.11) lib/electric/shapes/filter/where_condition.ex:210: anonymous fn/6 in Electric.Shapes.Filter.WhereCondition.other_shapes_affected/4
...
(electric 1.4.11) lib/electric/shapes/event_router.ex:149: Electric.Shapes.EventRouter.find_affected_shapes_for_change/2
13:03:54.942 [error] Unexpected error in Filter.WhereCondition.affected_shapes:
** (RuntimeError) Materializer for stack "single_stack" and handle "16096921-1772539368604093" is not available
...
Only after the errors do consumer and materializer processes get lazily started:
13:03:54.943 [debug] Starting Electric.Shapes.Consumer for 16096921-1772539368604093
13:03:54.954 [debug] Started consumer for existing handle 16096921-1772539368604093
13:03:54.954 [debug] Starting Electric.Shapes.Consumer for 79160600-1772539368614088
13:03:54.955 [debug] Started consumer for existing handle 79160600-1772539368614088
13:03:54.955 [debug] Starting Electric.Shapes.Consumer for 95336946-1772539368614654
13:03:54.956 [debug] Started consumer for existing handle 95336946-1772539368614654
...
13:03:55.022 [debug] Starting Electric.Shapes.Consumer.Materializer for 16096921-1772539368604093
13:03:55.023 [debug] Starting Electric.Shapes.Consumer.Materializer for 79160600-1772539368614088
13:03:55.023 [debug] Starting Electric.Shapes.Consumer.Materializer for 95336946-1772539368614654
13:03:55.023 [debug] Starting Electric.Shapes.Consumer for 76149289-1772539368615417
Impact
- The catch-all in Filter.affected_shapes returns all shape IDs when the error occurs, so no events are lost — but every shape gets notified of every change during the race window, which is wasteful.
- Each change in the WAL produces one error per subquery dependency, so a multi-row transaction causes an error flood.
- Rare in production because HTTP clients typically reconnect after a restart before the first WAL event arrives, which triggers eager consumer startup. Observed twice on the hertz instance (ELECTRIC-5D7, ELECTRIC-5PF).