cmd/compile: reorganize associative computation to allow superscalar execution #49331

Open
josharian opened this issue Nov 4, 2021 · 3 comments
@josharian
Contributor

@josharian josharian commented Nov 4, 2021

The compiler currently compiles a+b+c+d as a+(b+(c+d)). It should use (a+b)+(c+d) instead: a+b and c+d do not depend on each other, so a superscalar CPU can execute them in parallel, out of order.

More broadly, we should balance trees of associative computation.
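As a rough illustration of what "balancing" means, here is a minimal Go sketch over a hypothetical Expr type (a toy, not the compiler's SSA representation). It builds both the right-leaning chain and a balanced tree from the same operands and measures the longest chain of dependent additions. Splitting the operand list in half recursively gives about log2(n) levels instead of n-1.

```go
package main

import "fmt"

// Expr is a hypothetical expression node used only for illustration:
// either a leaf operand or the sum of two subexpressions.
type Expr struct {
	Leaf        string
	Left, Right *Expr
}

// String prints the expression fully parenthesized.
func (e *Expr) String() string {
	if e.Left == nil {
		return e.Leaf
	}
	return "(" + e.Left.String() + "+" + e.Right.String() + ")"
}

// chain builds the right-leaning shape a+(b+(c+d)).
func chain(ops []string) *Expr {
	if len(ops) == 1 {
		return &Expr{Leaf: ops[0]}
	}
	return &Expr{Left: &Expr{Leaf: ops[0]}, Right: chain(ops[1:])}
}

// balance builds a balanced tree by splitting the operands in half.
// This regrouping is only valid for associative ops such as integer addition.
func balance(ops []string) *Expr {
	if len(ops) == 1 {
		return &Expr{Leaf: ops[0]}
	}
	mid := len(ops) / 2
	return &Expr{Left: balance(ops[:mid]), Right: balance(ops[mid:])}
}

// depth is the length of the longest chain of dependent additions,
// a rough proxy for latency when independent adds can run in parallel.
func depth(e *Expr) int {
	if e.Left == nil {
		return 0
	}
	l, r := depth(e.Left), depth(e.Right)
	if r > l {
		l = r
	}
	return 1 + l
}

func main() {
	ops := []string{"a", "b", "c", "d"}
	ch, bal := chain(ops), balance(ops)
	fmt.Println(ch, depth(ch))   // (a+(b+(c+d))) 3
	fmt.Println(bal, depth(bal)) // ((a+b)+(c+d)) 2
}
```

With eight operands the gap widens: the chain has depth 7, the balanced tree depth 3.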

Doing this in the compiler, with a rewrite rule tailored to the single expression shape in round 4 of md5block.go, yielded a 15% throughput improvement. (See below for details.)

It's not obvious to me whether we can do this with carefully crafted rewrite rules or whether a dedicated pass would be better. But it looks like there may be significant performance wins available on tight mathematical code.

I don't plan to work on this further, but I really hope someone else picks it up.

cc @randall77 @martisch @FiloSottile @mmcloughlin @mdempsky


To reproduce the md5 results, disable the optimized assembly routines (so the pure-Go md5block.go is used) and add this rewrite rule:

(Add32 <t> c:(Const32) (Add32 d:(Load _ _) (Add32 x:(Xor32 _ _) a:(Add32 _ _)))) => (Add32 (Add32 <t> c d) (Add32 <t> x a))

and disable this one, which would otherwise move the constant back out (turning (c+d)+(x+a) into c+(d+(x+a))) and cause an infinite rewrite loop:

(Add32 (Add32 i:(Const32 <t>) z) x) && (z.Op != OpConst32 && x.Op != OpConst32) => (Add32 i (Add32 <t> z x))
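To see why the second rule has to go, here is a toy Go model of the two rewrites on a hypothetical Node tree. It is only a sketch: it ignores the real rules' operand-type restrictions (Load, Xor32, Add32) and keeps just the constant/non-constant distinction, so it models the interaction between the rules, not the compiler's rewrite engine.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for an SSA value: a named leaf
// (possibly a constant) or an Add of two nodes.
type Node struct {
	Name        string
	Const       bool
	Left, Right *Node
}

func leaf(name string, isConst bool) *Node { return &Node{Name: name, Const: isConst} }
func add(l, r *Node) *Node                 { return &Node{Left: l, Right: r} }
func isAdd(n *Node) bool                   { return n.Left != nil }

func (n *Node) String() string {
	if !isAdd(n) {
		return n.Name
	}
	return "(" + n.Left.String() + "+" + n.Right.String() + ")"
}

// balanceRule mimics the new md5 rule: c+(d+(x+a)) => (c+d)+(x+a),
// with c constant.
func balanceRule(n *Node) (*Node, bool) {
	if isAdd(n) && n.Left.Const && isAdd(n.Right) && isAdd(n.Right.Right) {
		c, d := n.Left, n.Right.Left
		x, a := n.Right.Right.Left, n.Right.Right.Right
		return add(add(c, d), add(x, a)), true
	}
	return n, false
}

// constOutRule mimics the existing rule: (i+z)+x => i+(z+x) when i is
// constant and z, x are not.
func constOutRule(n *Node) (*Node, bool) {
	if isAdd(n) && isAdd(n.Left) && n.Left.Left.Const && !n.Left.Right.Const && !n.Right.Const {
		i, z, x := n.Left.Left, n.Left.Right, n.Right
		return add(i, add(z, x)), true
	}
	return n, false
}

func main() {
	c, d := leaf("c", true), leaf("d", false)
	x, a := leaf("x", false), leaf("a", false)
	e := add(c, add(d, add(x, a)))
	// Cap the iterations; with both rules enabled this would never stop.
	for step := 0; step < 4; step++ {
		fmt.Println(e)
		if next, ok := balanceRule(e); ok {
			e = next
			continue
		}
		if next, ok := constOutRule(e); ok {
			e = next
			continue
		}
		break
	}
}
```

It prints (c+(d+(x+a))) and ((c+d)+(x+a)) alternately: each rule undoes the other.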
@seebs
Contributor

@seebs seebs commented Nov 4, 2021

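	package main

	import "fmt"

	func main() {
		a := float64(1 << 53)
		b := float64(1)
		c := float64(1)
		d := float64(0)
		e := a + (b + (c + d)) // 2^53 + 2, exactly representable
		f := (a + b) + (c + d) // a+b rounds back down to 2^53, so f == 2^53
		fmt.Println(e == f)    // false
	}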

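It's probably fine for integers: as far as I can tell, given the guarantees Go makes (integer overflow wraps around), integer addition is associative in Go. For floating-point values it isn't always: above, e is 2^53+2 but f is 2^53, because a+b rounds back down to a.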
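For contrast, a minimal Go sketch of the integer case, assuming int32 operands picked so that one grouping overflows and the other does not. Because signed overflow in Go wraps deterministically, both groupings still produce the same result.

```go
package main

import (
	"fmt"
	"math"
)

// chain and tree compute the same four-term sum in the two shapes
// discussed in this issue.
func chain(a, b, c, d int32) int32 { return a + (b + (c + d)) }
func tree(a, b, c, d int32) int32  { return (a + b) + (c + d) }

func main() {
	// In tree, a+b overflows int32; in chain, only the final add does.
	// Go defines signed overflow to wrap (two's complement), so the
	// two shapes agree anyway.
	a, b, c, d := int32(math.MaxInt32), int32(1), int32(-1), int32(5)
	fmt.Println(chain(a, b, c, d), tree(a, b, c, d)) // -2147483644 -2147483644
}
```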

@josharian
Contributor Author

@josharian josharian commented Nov 5, 2021

@seebs the SSA backend has different ops for different data types (integer and floating-point addition are separate ops) exactly to avoid this kind of issue. (We have @dr2chase to thank for that.) Relatedly, this kind of optimization is also not appropriate for adding pointers, as intermediate results may be invalid pointers.
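A hypothetical sketch of how a balancing pass could lean on that per-type split: it would whitelist the ops it is allowed to regroup. The Op type and constants below are local stand-ins that only mirror the idea of the compiler's separate integer, float and pointer add ops; they are not the real ssa.Op values.

```go
package main

import "fmt"

// Op is a hypothetical stand-in for an SSA opcode.
type Op int

const (
	Add32 Op = iota
	Add64
	Add32F
	Add64F
	AddPtr
)

// reassociable reports whether a balancing pass could safely regroup a
// chain of this op. Integer adds wrap, so they are associative; float adds
// round at every step; pointer adds may create invalid intermediate pointers.
func reassociable(op Op) bool {
	switch op {
	case Add32, Add64:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(reassociable(Add32), reassociable(Add64F), reassociable(AddPtr)) // true false false
}
```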

@mdempsky
Member

@mdempsky mdempsky commented Nov 5, 2021

> Relatedly this kind of optimization is also not appropriate for adding pointers, as intermediate results may be invalid pointers.

Note that it is safe (AFAICT) to rewrite (ptr + x1) + x2 into ptr + (x1 + x2), since that only removes the intermediate pointer ptr + x1; the other direction is unsafe because it introduces one. I'm skeptical this is useful in practice, though.
