
bug: Multicast with IgnoreErrors() leaks router pending entries #326

@meling

The following was reported by Magnus Egeland.

Multicast with IgnoreErrors() registers entries in MessageRouter.pending that are never removed. One-way handlers return nil, nil, so HandleRequest sends no response. With no response to trigger removal, the pending entry is never cleaned up, and the map grows without bound.

Impact: At 1000 req/s with 4 nodes over 10s, pending reaches 20,000+ entries/node. Unbounded memory growth; any requeuePendingMsgs() call becomes catastrophically expensive.

func TestMulticastIgnoreErrorsLeaksRouterPending(t *testing.T) {
    systems := gorums.TestSystems(t, 3)
    for _, sys := range systems {
        sys.RegisterService(nil, func(srv *gorums.Server) {
            // One-way handler: returns no response message and no error.
            srv.RegisterHandler(mock.TestMethod, func(_ gorums.ServerCtx, _ *gorums.Message) (*gorums.Message, error) {
                return nil, nil
            })
        })
    }
    // Wait until the first system's configuration reports all 3 nodes.
    gorums.WaitForConfigCondition(t, systems[0].Config, func(cfg gorums.Configuration) bool {
        return cfg.Size() == 3
    })
    cfg := systems[0].OutboundConfig()
    // Send 1000 one-way multicasts, ignoring errors.
    for i := range 1000 {
        ctx := gorums.TestContext(t, 5*time.Second)
        gorums.Multicast(cfg.Context(ctx), pb.String(fmt.Sprintf("mc-%d", i)),
            mock.TestMethod, gorums.IgnoreErrors())
    }
    // Give in-flight messages time to be delivered and handled.
    time.Sleep(500 * time.Millisecond)
    // No pending entries should remain on any node.
    for _, node := range cfg.Nodes() {
        if pc := node.PendingCount(); pc > 0 {
            t.Errorf("node %d: pending = %d; expected 0", node.ID(), pc)
        }
    }
}

Output:

node 1: pending = 0     ← self-node (local dispatch)
node 2: pending = 1000  ← LEAK
node 3: pending = 1000  ← LEAK
--- FAIL

There is an easy fix and a better fix. I will soon post the "better" fix, even though it touches more parts of the code.
