The following was reported by Magnus Egeland.
Multicast with IgnoreErrors() registers entries in MessageRouter.pending that are never removed. One-way handlers return nil, nil, so HandleRequest sends no response, and without a response nothing ever removes the pending entry. The map grows without bound.
Impact: At 1000 req/s with 4 nodes over 10 s, pending reaches 20,000+ entries per node. Memory grows without bound, and any requeuePendingMsgs() call becomes catastrophically expensive.
func TestMulticastIgnoreErrorsLeaksRouterPending(t *testing.T) {
	systems := gorums.TestSystems(t, 3)
	for _, sys := range systems {
		sys.RegisterService(nil, func(srv *gorums.Server) {
			// One-way handler: returning nil, nil means no response is sent.
			srv.RegisterHandler(mock.TestMethod, func(_ gorums.ServerCtx, _ *gorums.Message) (*gorums.Message, error) {
				return nil, nil
			})
		})
	}
	// Wait until all three nodes are part of the configuration.
	gorums.WaitForConfigCondition(t, systems[0].Config, func(cfg gorums.Configuration) bool {
		return cfg.Size() == 3
	})
	cfg := systems[0].OutboundConfig()
	// Send 1000 one-way multicasts, ignoring errors.
	for i := range 1000 {
		ctx := gorums.TestContext(t, 5*time.Second)
		gorums.Multicast(cfg.Context(ctx), pb.String(fmt.Sprintf("mc-%d", i)),
			mock.TestMethod, gorums.IgnoreErrors())
	}
	// Give in-flight messages time to be delivered before checking.
	time.Sleep(500 * time.Millisecond)
	// Every node should have an empty pending map at this point.
	for _, node := range cfg.Nodes() {
		if pc := node.PendingCount(); pc > 0 {
			t.Errorf("node %d: pending = %d; expected 0", node.ID(), pc)
		}
	}
}
node 1: pending = 0 ← self-node (local dispatch)
node 2: pending = 1000 ← LEAK
node 3: pending = 1000 ← LEAK
--- FAIL
There is an easy fix and a better fix. I will post the "better" fix soon, even though it touches more parts of the code.