Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: do more optimizations for ARMv6/V7 #19843

Closed
benshi001 opened this issue Apr 5, 2017 · 8 comments

Comments

Projects
None yet
4 participants
@benshi001
Copy link
Member

commented Apr 5, 2017

More ARMv6 & ARMv7 instructions are already merged into branch master. The compiler can use them to do further optimization.

Such as
(Mul16 x y) -> (MULBB x y)
(ADD z (MULBB x y) -> MULABB(x y z)
(SUB x (MUL y z) -> MULS(x y z) and so on.

I have prepared a optimization CL for all these MULA like instructions. But the final form depends on
how issue #19141 is fixed.

@bradfitz bradfitz changed the title More Optimization for ARMv6/V7 cmd/compile: do more optimizations for ARMv6/V7 Apr 5, 2017

@bradfitz bradfitz added the Performance label Apr 5, 2017

@bradfitz bradfitz added this to the Go1.9Maybe milestone Apr 5, 2017

@benshi001

This comment has been minimized.

Copy link
Member Author

commented May 5, 2017

Unfortunately the go1 benchmark shows no improvement with such optimization.

(Mul16 x y) -> (MULBB x y)
(ADD z (MULBB x y) -> MULABB(x y z)
(SUB z (MUL x y) -> MULS(x y z)

@randall77

This comment has been minimized.

Copy link
Contributor

commented May 5, 2017

go1 benchmarks (or any benchmark, really) aren't always useful. Their performance is a useful signal when:

  1. You think your change should make them faster, and it doesn't.
  2. You think your change should not make them slower, and it does.

For instance, I suspect the go1 benchmarks never do a 16-bit multiply. So only number 2 is in play.

If you have an optimization you think is useful but doesn't affect the go1 benchmarks, write your own benchmark which is helped by your change.

@benshi001

This comment has been minimized.

Copy link
Member Author

commented Jun 5, 2017

  1. MULBB, MULABB
  2. MULS
  3. MMUL, MMULA, MMULS

Try them again in Go1.10.

@benshi001

This comment has been minimized.

Copy link
Member Author

commented Jun 5, 2017

Please also change milestone to go1.10

@randall77 randall77 modified the milestones: Go1.10, Go1.9Maybe Jun 5, 2017

@benshi001

This comment has been minimized.

Copy link
Member Author

commented Jun 12, 2017

  1. MULAF, MULSF, MULAD, MULSD // float point version MLA/MLS
@benshi001

This comment has been minimized.

Copy link
Member Author

commented Aug 28, 2017

MULBB/MULABB have no improvement compared to MUL/MULA.

@gopherbot

This comment has been minimized.

Copy link

commented Sep 5, 2017

Change https://golang.org/cl/61150 mentions this issue: cmd/compile: optimize ARM code with efficient FP instructions

@benshi001

This comment has been minimized.

Copy link
Member Author

commented Sep 8, 2017

MULAF, MULSF, MULAD, MULSD did not always improve code quality, since
they required resultInArg0 set to true. So they benefited
a = a + b * c
but not
a = b + c * d especially b was still in use.

@gopherbot gopherbot closed this in 2899c3e Sep 11, 2017

@golang golang locked and limited conversation to collaborators Sep 11, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.