Skip to content

cmd/compile: reorganize associative computation to allow superscalar execution #49331

@josharian

Description

@josharian

The compiler currently compiles a+b+c+d as a+(b+(c+d)). It should use (a+b)+(c+d) instead, because the latter can be executed out of order.

More broadly, we should balance trees of associative computation.

Doing this in the compiler with a rule tailored for a single computation type in round 4 of md5block.go yielded a 15% throughput improvement. (See below for details.)

It's not obvious to me whether we can do this with carefully crafted rewrite rules or whether a dedicated pass would be better. But it looks like there may be significant performance wins available on tight mathematical code.

I don't plan to work on this further, but I really hope someone else picks it up.

cc @randall77 @martisch @FiloSottile @mmcloughlin @mdempsky


To reproduce the md5 results, disable the optimized assembly routines, and add this rewrite rule:

(Add32 <t> c:(Const32) (Add32 d:(Load _ _) (Add32 x:(Xor32 _ _) a:(Add32 _ _)))) => (Add32 (Add32 <t> c d) (Add32 <t> x a))

and disable this one to avoid an infinite loop:

(Add32 (Add32 i:(Const32 <t>) z) x) && (z.Op != OpConst32 && x.Op != OpConst32) => (Add32 i (Add32 <t> z x))

Metadata

Metadata

Assignees

No one assigned

    Labels

    NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.Performancecompiler/runtimeIssues related to the Go compiler and/or runtime.

    Type

    No type

    Projects

    Status

    Triage Backlog

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions