Skip to content

testing: parallel benchmark results are poorly documented #31884

@aclements

Description

@aclements

The way parallel benchmark results are reported is easy to misinterpret, and the correct way to interpret them is not documented. Typically this makes perfectly-scalable benchmarks appear to perform much worse at low parallelism, and makes not-at-all-scalable benchmarks appear to be doing fine. For example, this came up in #31820 (comment)

Specifically, the "ns/op" reported is not CPU-ns/op, it's wall-ns/op. For example, suppose each op takes exactly 100 ns, regardless of parallelism. If the single-threaded benchmark runs for 1 sec, it will execute 10,000,000 ops, so ns/op = 1s/10,000,000 ops = 100 ns/op. But if the same benchmark runs 4-way parallel for 1 sec, it will execute 40,000,000 ops, so ns/op = 1s/40,000,000 = 25 ns/op. (I really wish it didn't work this way...)

Interpreting the results of CPU-bound parallel benchmarks is further complicated by hyper-threading (though this isn't the fault of the testing package).

I don't think we can change the reported ns/op at this point. We could perhaps introduce a new metric for parallel benchmarks. At the very least, we should document this.

Metadata

Metadata

Assignees

No one assigned

    Labels

    DocumentationIssues describing a change to documentation.NeedsFixThe path to resolution is known, but the work has not been done.

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions