
Throughput regression from capacity-wake suppression in #216 #223

@hardbyte

Summary

A focused bisect at 128 workers shows that PR #216 (ec7d9ae, "Reduce empty capacity wake claims") accounts for essentially all of the throughput regression between alpha.3 and current main.

| Commit | Label | Run 1 | Run 2 | Mean @ 128w |
| --- | --- | --- | --- | --- |
| d7748d0 | alpha.3 baseline | 4,973 | 4,766 | 4,870 |
| 8ebcbc5 | after #214, claimer cap | 5,316 | 5,012 | 5,164 |
| d930e72 | after #215, striped claims | 4,676 | 4,997 | 4,837 |
| ec7d9ae | after #216, empty wake | 2,921 | 2,254 | 2,588 |
| 5a8db21 | after #220, current | 2,869 | 2,590 | 2,729 |
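
To put the drop in relative terms, each commit's two runs can be averaged and compared with the alpha.3 baseline. The following is a small Python sketch over the numbers in the table; the dictionary layout and function names are illustrative and not part of the benchmark harness.

```python
# Throughput per run at 128 workers, copied from the bisect table.
runs = {
    "d7748d0": (4973, 4766),  # alpha.3 baseline
    "8ebcbc5": (5316, 5012),  # after #214
    "d930e72": (4676, 4997),  # after #215
    "ec7d9ae": (2921, 2254),  # after #216
    "5a8db21": (2869, 2590),  # after #220, current
}


def mean(pair):
    """Arithmetic mean of the two runs for one commit."""
    return sum(pair) / len(pair)


BASELINE = mean(runs["d7748d0"])


def change_vs_baseline(commit):
    """Percent change of a commit's mean throughput relative to alpha.3."""
    return (mean(runs[commit]) - BASELINE) / BASELINE * 100.0


for commit in runs:
    print(f"{commit}  vs alpha.3: {change_vs_baseline(commit):+6.1f}%")
```

On these numbers ec7d9ae lands roughly 47% below the alpha.3 baseline, while d930e72, the commit immediately before it, is within 1% of the baseline.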

Reading

Likely cause

The capacity-wake suppression introduced in #216 is globally too aggressive. Under high concurrency, completion-driven capacity wakes appear to be useful work-dispatch signals even after prior empty claims. Suppressing those wakes leaves workers under-refilled and shifts too much responsibility to the slower fallback poll path.
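
The suspected mechanism can be illustrated with a toy model. This is not the code from #216: the `Dispatcher` class, the `suppress_after_empty` flag, and the rule "ignore capacity wakes while the last claim was empty" are assumptions made here purely to show how a freed slot can sit idle until the fallback poll runs.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Dispatcher:
    """Hypothetical dispatcher; models only the wake/claim decision."""
    suppress_after_empty: bool
    queue: deque = field(default_factory=deque)
    free_slots: int = 0
    last_claim_empty: bool = False
    dispatched: list = field(default_factory=list)

    def claim(self):
        """Fill free slots from the queue and record whether the claim was empty."""
        claimed = 0
        while self.free_slots > 0 and self.queue:
            self.dispatched.append(self.queue.popleft())
            self.free_slots -= 1
            claimed += 1
        self.last_claim_empty = claimed == 0
        return claimed

    def on_capacity_release(self):
        """A worker finished a job and freed one slot."""
        self.free_slots += 1
        if self.suppress_after_empty and self.last_claim_empty:
            return 0  # wake suppressed: the slot stays idle until the next poll
        return self.claim()

    def on_fallback_poll(self):
        """The slower periodic path; always attempts a claim."""
        return self.claim()


def dispatched_before_poll(suppress):
    """Return (jobs dispatched before the poll, jobs dispatched after it)."""
    d = Dispatcher(suppress_after_empty=suppress)
    d.on_capacity_release()              # queue is empty, so this claim is empty
    d.queue.extend(["job-a", "job-b"])   # work arrives afterwards
    d.on_capacity_release()              # a second worker finishes
    before = len(d.dispatched)
    d.on_fallback_poll()
    return before, len(d.dispatched)


print("suppressed:  ", dispatched_before_poll(True))
print("unsuppressed:", dispatched_before_poll(False))
```

In this model the suppressed dispatcher hands out nothing until the poll fires, while the unsuppressed one refills both slots on the second completion. If the real dispatcher behaves similarly, that would be consistent with workers being under-refilled at high concurrency.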

Candidate fix

Revert or rewrite the capacity-release wake suppression. A narrower candidate is to keep the exponential fallback poll backoff from #216, but restore alpha.3-style behavior where every capacity release wake drains the dispatcher.

A probe branch exists at brian/probe-unsuppress-capacity-wakes for that shape.
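
The narrower candidate can be sketched as two independent paths: the fallback poll keeps an exponential backoff driven by consecutive empty claims, while the capacity-release path never consults that state. The names below (`PollBackoff`, `drain`, `base_s`, `max_s`) and the delay values are hypothetical and are not taken from #216 or from the probe branch.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PollBackoff:
    """Exponential backoff for the fallback poll only (illustrative values)."""
    base_s: float = 0.05
    max_s: float = 1.0
    empty_streak: int = 0

    def next_delay(self, claimed: int) -> float:
        """Reset on a productive poll, otherwise double the delay up to max_s."""
        if claimed > 0:
            self.empty_streak = 0
        else:
            self.empty_streak += 1
        return min(self.max_s, self.base_s * 2 ** self.empty_streak)


def drain(queue: deque, free_slots: int) -> list:
    """Claim as many queued jobs as there are free slots."""
    claimed = []
    while free_slots > 0 and queue:
        claimed.append(queue.popleft())
        free_slots -= 1
    return claimed


def on_capacity_release(queue: deque, free_slots: int) -> list:
    """Every capacity release drains; the backoff state is not consulted."""
    return drain(queue, free_slots)


def on_fallback_poll(queue: deque, free_slots: int, backoff: PollBackoff):
    """Poll path: claim, then let the outcome decide the next poll delay."""
    claimed = drain(queue, free_slots)
    return claimed, backoff.next_delay(len(claimed))


queue = deque()
backoff = PollBackoff()
for _ in range(5):
    _, delay = on_fallback_poll(queue, 1, backoff)
    print(f"empty poll, next delay {delay:.2f}s")

queue.append("job-a")
print("capacity release claims:", on_capacity_release(queue, 1))
```

Keeping the two paths separate means an idle queue still backs off its polling, but a completion under load always triggers an immediate refill attempt.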
