FlushTracker stall: consumer killed without terminate callback (OOM, :kill signal)

Parent: #3980

## Scenario

A consumer process is killed in a way that bypasses the `terminate/2` callback:

- **`:kill` signal**: Even with `Process.flag(:trap_exit, true)` (consumer.ex:108), a `:kill` exit signal terminates the process immediately without calling `terminate/2`. Per the Erlang docs for `exit/2`, an exit signal with reason `kill` cannot be trapped: the receiving process exits unconditionally with reason `killed`.
- **Memory limits (OOM)**: The process exceeds a per-process `max_heap_size` limit configured with `kill: true`, which the VM enforces with an untrappable kill, or the OS OOM killer terminates the BEAM process.
- **`:brutal_kill` supervisor shutdown**: If the consumer's child spec uses `shutdown: :brutal_kill`, the supervisor terminates it with `Process.exit(pid, :kill)`.
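The difference between a graceful stop and a `:kill` can be reproduced with a minimal GenServer that traps exits the same way the consumer does. `KillDemo` and `terminate_ran?/1` are hypothetical names for this sketch only; the sketch relies on the fact that messages from one process to another arrive in send order, so a `terminate/2` notification, if sent, is already in the mailbox when the `:DOWN` message is received.

```elixir
defmodule KillDemo do
  use GenServer

  def start(notify_pid), do: GenServer.start(__MODULE__, notify_pid)

  @impl true
  def init(notify_pid) do
    # Same setup as the consumer: trap exits so terminate/2 can run on shutdown
    Process.flag(:trap_exit, true)
    {:ok, notify_pid}
  end

  @impl true
  def terminate(reason, notify_pid) do
    # Tell the caller that the cleanup callback actually executed
    send(notify_pid, {:terminate_called, reason})
    :ok
  end

  # Starts a fresh server, stops it with `stop_fun`, and reports whether terminate/2 ran
  def terminate_ran?(stop_fun) do
    {:ok, pid} = start(self())
    ref = Process.monitor(pid)
    stop_fun.(pid)

    # Wait until the process is really gone
    receive do
      {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
    end

    # Any terminate/2 notification was sent before :DOWN, so it is already queued
    receive do
      {:terminate_called, _reason} -> true
    after
      0 -> false
    end
  end
end
```

`KillDemo.terminate_ran?(&GenServer.stop(&1, :shutdown))` returns `true`, while `KillDemo.terminate_ran?(&Process.exit(&1, :kill))` returns `false` despite `trap_exit` being set.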

## What happens

1. A transaction arrives. `ShapeLogCollector.publish` → `ConsumerRegistry.publish` → `broadcast` delivers the event to the consumer.
2. The consumer processes the event and replies `:ok`. `FlushTracker.handle_txn_fragment` records the shape in `last_flushed` and `min_incomplete_flush_tree`.
3. The consumer is killed (`:kill` signal, OOM, etc.) before the storage flush callback fires and `notify_flushed` is called.
4. `terminate/2` does NOT run. No cleanup happens. No `handle_writer_termination`, no `remove_shape_async`, no `FlushTracker.handle_shape_removed`.
5. The shape's entry in FlushTracker becomes the permanent minimum, blocking `last_global_flushed_offset` from advancing.
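Step 5 can be illustrated with a deliberately simplified model. `ToyFlushTracker` is hypothetical and is not Electric's `FlushTracker`; it assumes integer offsets, treats the global flushed offset as "one before the oldest unflushed offset", and assumes a single flush notification covers everything delivered to that shape.

```elixir
defmodule ToyFlushTracker do
  # Hypothetical model. pending: shape_id -> oldest delivered-but-unflushed offset
  defstruct pending: %{}, latest: 0

  # A transaction at `offset` was delivered to every shape in `shape_ids`
  def handle_txn(%__MODULE__{} = t, shape_ids, offset) do
    # put_new keeps the *oldest* unflushed offset if the shape is already pending
    pending = Enum.reduce(shape_ids, t.pending, &Map.put_new(&2, &1, offset))
    %{t | pending: pending, latest: offset}
  end

  # The consumer for `shape_id` reports that everything delivered to it is durable
  def notify_flushed(%__MODULE__{} = t, shape_id),
    do: %{t | pending: Map.delete(t.pending, shape_id)}

  # Cleanup path normally reached via terminate/2; never called after a :kill
  def handle_shape_removed(%__MODULE__{} = t, shape_id),
    do: %{t | pending: Map.delete(t.pending, shape_id)}

  # Nothing pending: everything seen so far is flushed
  def last_global_flushed(%{pending: p, latest: latest}) when map_size(p) == 0, do: latest

  # Otherwise the oldest pending entry is the permanent minimum
  def last_global_flushed(%{pending: p}), do: (p |> Map.values() |> Enum.min()) - 1
end
```

Delivering offset 10 to shapes `"a"` and `"b"`, then offset 20 to `"a"` only, with `"a"` flushing both times, leaves `last_global_flushed/1` at 9 for as long as `"b"` stays pending. Only `handle_shape_removed(t, "b")`, the call that never happens after a `:kill`, lets it advance to 20.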

## Why existing fixes don't help

- **#3864** (fixing `handle_writer_termination` clause 3): Irrelevant — `terminate` never runs, so `handle_writer_termination` is never called.
- **#3975** (broadcast-time recovery): Only detects dead consumers at the next `publish` call. The FlushTracker entry from the *previous* successful delivery remains stuck. Even if a replacement consumer is started for the next transaction, `notify_flushed` for the old transaction's offset will never arrive.

## Fix

This scenario can only be addressed by an active detection mechanism in ShapeLogCollector (see parent issue #3980). The terminate callback path is insufficient by definition — no amount of improvement to `terminate` or `handle_writer_termination` can help when the callback doesn't execute.
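One possible shape for such active detection, offered only as a sketch: the collector calls `Process.monitor/1` on each consumer. Unlike `terminate/2`, a monitor's `:DOWN` message is delivered even when the target dies from an untrappable `:kill` (with reason `:killed`). `MonitoringCollector` and its functions are hypothetical and are not `ShapeLogCollector`'s API; the actual design is tracked in #3980.

```elixir
defmodule MonitoringCollector do
  # Hypothetical sketch of monitor-based cleanup, not ShapeLogCollector's real API
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, :ok)

  def register_consumer(server, shape_id, pid),
    do: GenServer.call(server, {:register, shape_id, pid})

  def record_delivery(server, shape_id, offset),
    do: GenServer.call(server, {:delivered, shape_id, offset})

  def pending(server), do: GenServer.call(server, :pending)

  @impl true
  def init(:ok), do: {:ok, %{monitors: %{}, pending: %{}}}

  @impl true
  def handle_call({:register, shape_id, pid}, _from, state) do
    # The monitor fires on any exit, including :kill, which skips terminate/2
    ref = Process.monitor(pid)
    {:reply, :ok, %{state | monitors: Map.put(state.monitors, ref, shape_id)}}
  end

  def handle_call({:delivered, shape_id, offset}, _from, state) do
    # Keep the oldest unflushed offset per shape
    {:reply, :ok, %{state | pending: Map.put_new(state.pending, shape_id, offset)}}
  end

  def handle_call(:pending, _from, state), do: {:reply, state.pending, state}

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, state) do
    # Drop the dead consumer's pending entry so it can no longer pin the minimum
    {shape_id, monitors} = Map.pop(state.monitors, ref)
    {:noreply, %{state | monitors: monitors, pending: Map.delete(state.pending, shape_id)}}
  end

  def handle_info(_msg, state), do: {:noreply, state}
end
```

Because the cleanup is driven by the collector's own monitor rather than by code running in the dying process, it works the same way for `:kill`, `max_heap_size` kills, and `:brutal_kill` shutdowns.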
