
fix: erase persistent_term leak in GRPC.Client.Connection on disconnect#509

Merged
sleipnir merged 1 commit into elixir-grpc:master from ryochin:fix/persistent-term-leak-on-disconnect on Mar 11, 2026

Conversation

Contributor

@ryochin ryochin commented Mar 11, 2026

Problem

GRPC.Client.Connection.init/1 stores a persistent_term entry keyed by the channel's ref:

# init/1
:persistent_term.put(
  {__MODULE__, :lb_state, state.virtual_channel.ref},
  state.virtual_channel
)

On disconnect, handle_call({:disconnect, _}) drops the virtual_channel from state via Map.drop/2, then stops the GenServer via {:continue, :stop}:

# handle_call({:disconnect, ...})
keys_to_delete = [:real_channels, :virtual_channel]
new_state = Map.drop(state, keys_to_delete)
{:reply, resp, new_state, {:continue, :stop}}

However, the persistent_term entry is never erased -- neither in the disconnect handler nor in terminate/2 (which is a no-op: def terminate(_reason, _state), do: :ok).

Because each connect/2 call generates a fresh ref via make_ref(), every connect/disconnect cycle permanently leaks one persistent_term entry.

Impact

Applications that create short-lived connections (e.g. connect -> RPC -> disconnect per request) accumulate persistent_term entries indefinitely. Unlike regular process memory, persistent_term entries are not garbage-collected and persist for the lifetime of the BEAM node, causing steady memory growth with no upper bound.
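A quick way to see this growth on a live node (a sketch, not part of the PR): `:persistent_term.info/0` reports the node-wide entry count and total memory, so the leak shows up as a monotonically increasing count. The `{:leak_demo, ...}` key below is a stand-in for the real `{GRPC.Client.Connection, :lb_state, ref}` key.

```elixir
# Observe node-wide persistent_term growth via :persistent_term.info/0.
%{count: count_before, memory: mem_before} = :persistent_term.info()

# Simulate 1_000 leaked entries keyed by fresh refs, mirroring the
# make_ref()-keyed entries that the connection process leaves behind:
# once the ref goes out of scope, nothing can ever name the key again.
for _ <- 1..1_000 do
  :persistent_term.put({:leak_demo, make_ref()}, %{adapter_payload: :stub})
end

%{count: count_after, memory: mem_after} = :persistent_term.info()
IO.puts("entries: +#{count_after - count_before}, bytes: +#{mem_after - mem_before}")
```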

Proposed fix

Erase the persistent_term entry in handle_call({:disconnect, ...}), before Map.drop removes the ref from state:

def handle_call({:disconnect, %Channel{adapter: adapter} = channel}, _from, state) do
    resp = {:ok, %Channel{channel | adapter_payload: %{conn_pid: nil}}}
    :persistent_term.erase({__MODULE__, :lb_state, channel.ref})
    ...

Additionally, terminate/2 should erase the entry as a safety net for abnormal termination paths where disconnect is never called:

def terminate(_reason, %{virtual_channel: %{ref: ref}}) do
    :persistent_term.erase({__MODULE__, :lb_state, ref})
rescue
    _ -> :ok
end
def terminate(_reason, _state), do: :ok

Note: terminate/2 alone is insufficient because Map.drop(state, [:real_channels, :virtual_channel]) removes the ref from state before terminate is called in the normal disconnect path. Both locations are needed.
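To make the ordering concrete, here is a minimal stand-alone GenServer (hypothetical names, not the real module) that mirrors the Map.drop-then-stop pattern: the state handed to terminate/2 is the post-drop state returned from handle_call/3, so the ref needed to erase the persistent_term entry is already gone by then.

```elixir
defmodule DropDemo do
  use GenServer

  def init(state), do: {:ok, state}

  def handle_call(:disconnect, _from, state) do
    # Drop the key, then stop -- same shape as the real disconnect handler.
    new_state = Map.drop(state, [:virtual_channel])
    {:reply, :ok, new_state, {:continue, :stop}}
  end

  def handle_continue(:stop, state), do: {:stop, :normal, state}

  def terminate(_reason, state) do
    # terminate/2 receives the *post-drop* state: no :virtual_channel key,
    # so it cannot recover the ref to erase the persistent_term entry.
    send(:drop_demo_listener, {:terminate_state, state})
    :ok
  end
end

Process.register(self(), :drop_demo_listener)
{:ok, pid} = GenServer.start(DropDemo, %{virtual_channel: %{ref: make_ref()}})
:ok = GenServer.call(pid, :disconnect)

receive do
  {:terminate_state, state} ->
    IO.inspect(Map.has_key?(state, :virtual_channel))
end
# prints false: the ref is unreachable by the time terminate/2 runs
```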

Reproduction

Requires a gRPC server listening on localhost:50051 (any service will do).

# If GRPC.Client.Supervisor is not already running:
{:ok, _} = DynamicSupervisor.start_link(strategy: :one_for_one, name: GRPC.Client.Supervisor)

count_lb_entries = fn ->
  :persistent_term.get()
  |> Enum.count(fn
    {{GRPC.Client.Connection, :lb_state, _}, _} -> true
    _ -> false
  end)
end

before = count_lb_entries.()

for _ <- 1..100 do
  {:ok, ch} = GRPC.Stub.connect("localhost:50051")
  GRPC.Stub.disconnect(ch)
end

after_ = count_lb_entries.()
IO.puts("leaked entries: #{after_ - before}")
# => leaked entries: 100 (expected: 0)

Each connect/disconnect cycle leaks one persistent_term entry because
the entry is never erased. Since persistent_term is not garbage-collected,
this causes unbounded memory growth on long-running nodes.
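For nodes that have already leaked, a hypothetical one-off remediation (not part of this PR) is to scan `:persistent_term.get/0` and erase the stale keys. Note that each `:persistent_term.erase/1` forces a scan of every process heap on the node, so this sweep is expensive and best run once, off-peak.

```elixir
# Sweep stale lb_state entries left behind by prior connect/disconnect
# cycles. The pattern in the generator silently skips non-matching entries.
swept =
  for {{GRPC.Client.Connection, :lb_state, _ref} = key, _value} <- :persistent_term.get() do
    :persistent_term.erase(key)
  end

IO.puts("swept #{length(swept)} stale entries")
```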
@sleipnir
Collaborator

Thank you @ryochin

@sleipnir sleipnir merged commit 50877ae into elixir-grpc:master Mar 11, 2026
7 checks passed
cgreeno added a commit to cgreeno/grpc that referenced this pull request Apr 28, 2026
…ycles

Mirrors the regression pattern from PR elixir-grpc#509, which added a 100-cycle test
to prove the persistent_term leak fix. Same idea, applied to the tables
the refactor introduced: the shared Registry and the per-LB ETS tables.

Snapshots :ets.all() and the Registry size before the loop, cycles 500
connect+disconnect pairs, asserts the Registry returns to its starting
size and no more than a handful of new tables exist (VM may create a
few incidentally during the run).
