Summary
On 27 Jan 2026 at ~18:09-18:11 UTC, the ShapeLogCollector on the maxwell production instance grew to 3.5GB of memory, causing the server to crash. Telemetry confirms the ShapeLogCollector was the primary memory consumer.
Timeline
- 18:10:09.998 - Two new shapes with subquery dependencies created:
  - 97489818-1769537409997837 (offers shape with projects dependency)
  - 30042127-1769537409998609 (offer_items shape depending on offers)
- 18:10:10.437 - Materializer for shape 97489818 crashes with "Key already exists" error
- 18:10:10.449 - ShapeLogCollector crashes with FunctionClauseError in DependencyLayers.add_after_dependencies/3
- 18:10:10.45x - Cascade of 80+ Materializer/Consumer shutdowns begins
- 18:10:29.702 - Memory alarm triggered: {:process_memory_high_watermark, #PID<0.179627.0>}
- 18:12 - Container restarts
Root Cause Analysis
Bug 1: Materializer "Key already exists" race condition
The Materializer received a NewRecord for a key that already existed in its index. This happened because:
- Shape 97489818 was created for public.offers with a subquery dependency on public.projects
- The Materializer started, subscribed to the Consumer, then began reading from storage
- A transaction arrived with a move-in event for offer d3c8d8a5-5060-4a36-a67d-240de0c95a88
- The record was already in the snapshot (matched via is_template = true OR the subquery), AND was delivered via replication with move_tags
- The Materializer's apply_changes raised at line 317:
if is_map_key(index, key), do: raise("Key #{key} already exists")
The race window exists because the Materializer subscribes to the Consumer BEFORE reading from storage:
# In handle_continue(:start_materializer, ...)
Consumer.subscribe_materializer(stack_id, shape_handle, self()) # <- Subscribes first
{:noreply, state, {:continue, {:read_stream, shape_storage}}} # <- Then reads storage
Bug 2: DependencyLayers missing function clause
When shape 30042127 (which depends on 97489818) tried to register after 97489818 crashed:
Electric.Shapes.DependencyLayers.add_after_dependencies([], "30042127-...", MapSet.new(["97489818-..."]))
The function has no clause for when layers is empty but deps_to_find is NOT empty:
# This clause only matches when deps_to_find is empty
defp add_after_dependencies([], shape_handle, deps_to_find) when map_size(deps_to_find) == 0 do
  [MapSet.new([shape_handle])]
end
# Missing: clause for when layers is empty but deps_to_find is not
Memory Growth Hypothesis
The 3.5GB growth in ShapeLogCollector is likely caused by:
- Message queue accumulation - During the cascade failure, messages piled up faster than they could be processed:
  - Shape registration updates
  - Transaction fragments from replication
  - Flush notifications
  - Shutdown/down notifications
- State accumulation during failed recovery - The system spent ~2 minutes (18:10:10 to 18:12) in a failed state with continuous "Stack not ready" errors, potentially accumulating state.
- Binary heap growth - JSON-encoded log entries and transaction data accumulating without GC during the cascade.
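The message-queue hypothesis could be checked on a live node by sampling the suspect process with Process.info/2. The following is a minimal sketch, not Electric code: the QueueProbe module and the flooded dummy process are hypothetical and exist only to show what the sampled numbers look like when a process stops draining its mailbox.

```elixir
defmodule QueueProbe do
  @moduledoc "Hypothetical helper: samples mailbox length and memory of a process."

  # Returns a map with the mailbox length, total memory in bytes and heap size
  # in words, or nil if the process is no longer alive.
  def sample(pid) when is_pid(pid) do
    case Process.info(pid, [:message_queue_len, :memory, :total_heap_size]) do
      nil -> nil
      info -> Map.new(info)
    end
  end
end

# A process that never reads its mailbox stands in for a stalled collector.
pid = spawn(fn -> Process.sleep(:infinity) end)
before = QueueProbe.sample(pid)

# Flood it with messages shaped loosely like transaction fragments.
Enum.each(1..10_000, fn i ->
  send(pid, {:txn_fragment, i, String.duplicate("x", 100)})
end)

after_flood = QueueProbe.sample(pid)

IO.inspect(before.message_queue_len, label: "queue before")
IO.inspect(after_flood.message_queue_len, label: "queue after")
IO.inspect(after_flood.memory - before.memory, label: "memory growth (bytes)")

Process.exit(pid, :kill)
```

On a running node the same call could be pointed at the collector's pid. How that pid is looked up depends on how the process is registered, which this report does not show.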
Evidence
Crash logs
18:10:10.437 [error] GenServer Materializer "97489818-..." terminating
** (RuntimeError) Key "public"."offers"/"d3c8d8a5-5060-4a36-a67d-240de0c95a88" already exists
18:10:10.449 [error] GenServer ShapeLogCollector terminating
** (FunctionClauseError) no function clause matching in Electric.Shapes.DependencyLayers.add_after_dependencies/3
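Both log lines can be reproduced outside Electric. The sketch below is a set of hypothetical stand-ins, not the project's implementation: CrashRepro is an invented module, the index is a plain map, the shape handles "shape-a" and "shape-b" are placeholders, and the pending dependencies are modelled as a plain list so that the empty case can be pattern-matched directly.

```elixir
defmodule CrashRepro do
  # Hypothetical stand-ins; neither function is Electric's actual code.

  # Strict insert: raises when the key is already present in the index.
  def insert_strict(index, key, value) do
    if is_map_key(index, key), do: raise("Key #{key} already exists")
    Map.put(index, key, value)
  end

  # Only "no layers, no pending dependencies" is handled, so a call with an
  # empty layer list and pending dependencies matches no clause.
  def add_after_dependencies([], shape_handle, []), do: [MapSet.new([shape_handle])]
end

key = ~s("public"."offers"/"d3c8d8a5-5060-4a36-a67d-240de0c95a88")

# The snapshot already holds the record that replication is about to deliver.
snapshot_index = %{key => %{"id" => "d3c8d8a5-5060-4a36-a67d-240de0c95a88"}}

first =
  try do
    CrashRepro.insert_strict(snapshot_index, key, %{})
  rescue
    e in RuntimeError -> Exception.message(e)
  end

second =
  try do
    CrashRepro.add_after_dependencies([], "shape-b", ["shape-a"])
  rescue
    e in FunctionClauseError -> e.__struct__
  end

IO.puts(first)
IO.inspect(second)
```

One detail worth checking against the real module: map_size/1 applied to a MapSet counts the struct's fields rather than its members, so a guard written with map_size/1 behaves differently depending on whether deps_to_find is a plain map or a MapSet.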
Transaction details
%Electric.Replication.Changes.NewRecord{
  relation: {"public", "offers"},
  record: %{"id" => "d3c8d8a5-5060-4a36-a67d-240de0c95a88"},
  key: "\"public\".\"offers\"/\"d3c8d8a5-5060-4a36-a67d-240de0c95a88\"",
  move_tags: ["e12422d3af57a36d01a50b4645a517e4"] # <- Move-in event
}
Proposed Fixes
Fix 1: Make Materializer idempotent
In lib/electric/shapes/consumer/materializer.ex, skip duplicates instead of raising:
%Changes.NewRecord{key: key, record: record, move_tags: move_tags},
{{index, tag_indices}, counts_and_events} ->
  if is_map_key(index, key) do
    # Already exists - skip duplicate (can happen during snapshot/replication race)
    {{index, tag_indices}, counts_and_events}
  else
    {value, original_string} = cast!(record, state)
    index = Map.put(index, key, value)
    # ...rest of logic
  end
Fix 2: Add missing DependencyLayers clause
In lib/electric/shapes/dependency_layers.ex:
# Handle case where dependency shapes haven't been added yet
# Assumes `require Logger` is already present in the module
defp add_after_dependencies([], shape_handle, deps_to_find) when map_size(deps_to_find) > 0 do
  # Log warning about missing dependencies
  Logger.warning("Adding shape #{shape_handle} but dependencies #{inspect(deps_to_find)} not found in layers")
  [MapSet.new([shape_handle])]
end
Fix 3: Reorder Materializer startup (optional)
Subscribe to Consumer AFTER reading from storage to minimize the race window.
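The effect of the ordering can be illustrated with a toy model. This is a sketch under stated assumptions, not Electric's startup code: StartupOrderSketch is a hypothetical module, events are {time, key} pairs, a storage read at time read_at sees every event up to and including that time, and a subscription made at time sub_at delivers only events after it. Any offset-based catch-up the real Consumer may provide is not modelled.

```elixir
defmodule StartupOrderSketch do
  @moduledoc """
  Toy model (not Electric code). A storage read at `read_at` sees events with
  time <= read_at; a subscription made at `sub_at` delivers events with
  time > sub_at.
  """

  # Keys seen both in the snapshot and via the subscription.
  def duplicates(events, read_at, sub_at) do
    for {t, key} <- events, t <= read_at and t > sub_at, do: key
  end

  # Keys seen in neither the snapshot nor the subscription.
  def missed(events, read_at, sub_at) do
    for {t, key} <- events, t > read_at and t <= sub_at, do: key
  end
end

events = [{1, "a"}, {2, "b"}, {3, "c"}]

# Subscribe at t=1, then read at t=2: "b" is delivered twice.
IO.inspect(StartupOrderSketch.duplicates(events, 2, 1), label: "subscribe first, duplicates")
IO.inspect(StartupOrderSketch.missed(events, 2, 1), label: "subscribe first, missed")

# Read at t=1, then subscribe at t=2: no duplicates, but "b" is not seen.
IO.inspect(StartupOrderSketch.duplicates(events, 1, 2), label: "read first, duplicates")
IO.inspect(StartupOrderSketch.missed(events, 1, 2), label: "read first, missed")
```

Under these assumptions, subscribing first produces duplicates and reading first produces gaps. That suggests treating the reorder as a complement to the idempotent handling in Fix 1 rather than a replacement for it.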
Environment
- Instance: maxwell (eu-west-1)
- Version: Electric 1.3.3
- Stack ID: 2a649dc5-b661-4918-b283-06999429a156
Related
- Crash dump not available (ephemeral storage wiped on restart)
- See also: issue for crash dump persistence