cmd/compile: arm64 multiplication with constant optimization #67575

Open
egonelbre opened this issue May 22, 2024 · 3 comments
Labels
arch-arm64 compiler/runtime Issues related to the Go compiler and/or runtime. help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance

@egonelbre
Contributor

egonelbre commented May 22, 2024

While looking into optimizing ed25519 verification on ARM64, I noticed that x * 19 is not optimized into shifted adds.

func multiply19(x uint64) uint64 {
    return x * 19
}
// which compiles to:
MOVD    $19, R1
MUL     R1, R0, R0

This can be reduced to:

ADD     R0<<3, R0, R1
ADD     R1<<1, R0, R0

The general form seems to be:

x * c, where
    c = 1 + 2^N + 2^(N+M);
    N >= 1, M >= 1 // 19 itself has N = 1, M = 3

Then the multiplication can be rewritten as:
    x + (x + x << M) << N

This can be checked with:
    (c-1)&1 == 0 && bits.OnesCount(c - 1) == 2

Which holds for numbers like:
    7, 11, 13, 19, 21, 25, 35, 37, 41, 49, 67, 69, 73, 81, 97, 131, 133, 137, 145, 161, 193...
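
The check and the rewrite can be combined into a small Go sketch that recovers N and M from the constant. The helper names `splitTwoBit` and `mulTwoBit` are assumptions made for this example; they only model the arithmetic and are not compiler code.

```go
package main

import (
	"fmt"
	"math/bits"
)

// splitTwoBit reports N and M such that c == 1 + 2^N + 2^(N+M).
// Hypothetical helper; ok is false when c does not have that shape.
func splitTwoBit(c uint64) (n, m int, ok bool) {
	r := c - 1
	if r&1 != 0 || bits.OnesCount64(r) != 2 {
		return 0, 0, false
	}
	n = bits.TrailingZeros64(r) // position of the low set bit
	m = bits.Len64(r) - 1 - n   // distance from the low bit to the high bit
	return n, m, true
}

// mulTwoBit computes x*c as x + (x + x<<M)<<N.
func mulTwoBit(x uint64, n, m int) uint64 {
	return x + (x+x<<m)<<n
}

func main() {
	for c := uint64(1); c < 200; c++ {
		if n, m, ok := splitTwoBit(c); ok {
			x := uint64(0x0123456789abcdef)
			fmt.Println(c, n, m, x*c == mulTwoBit(x, n, m))
		}
	}
}
```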

There is also a similar reduction that can be done:

x * c, where
    c = 1 + 2^N + 2^M + 2^(N+M), N < M
    N >= 1  // N = 1 also works, e.g. 15 with N = 1, M = 2

Then the multiplication can be rewritten as:
    x = x + x<<N; x = x + x<<M

This can be checked with:
    (c-1)&1 == 0 && bits.OnesCount(c - 1) == 3 && highbit - lowbit == midbit

Which holds for numbers like:
    15, 27, 45, 51, 85, 99, 153, 165, 195...
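
Since 1 + 2^N + 2^M + 2^(N+M) factors as (1 + 2^N) * (1 + 2^M), this form amounts to two dependent shift-adds. A Go sketch of the check and the rewrite follows; `splitProduct` and `mulProduct` are hypothetical names used only here.

```go
package main

import (
	"fmt"
	"math/bits"
)

// splitProduct reports N < M such that c == (1 + 2^N) * (1 + 2^M).
// Hypothetical helper; ok is false when c does not have that shape.
func splitProduct(c uint64) (n, m int, ok bool) {
	r := c - 1
	if r&1 != 0 || bits.OnesCount64(r) != 3 {
		return 0, 0, false
	}
	low := bits.TrailingZeros64(r)
	mid := bits.TrailingZeros64(r & (r - 1)) // clear the lowest set bit first
	high := bits.Len64(r) - 1
	if high-low != mid {
		return 0, 0, false
	}
	return low, mid, true
}

// mulProduct computes x*c with two dependent shift-adds.
func mulProduct(x uint64, n, m int) uint64 {
	x = x + x<<n
	return x + x<<m
}

func main() {
	for c := uint64(1); c < 200; c++ {
		if n, m, ok := splitProduct(c); ok {
			x := uint64(0x0123456789abcdef)
			fmt.Println(c, n, m, x*c == mulProduct(x, n, m))
		}
	}
}
```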

I didn't verify, but this might be useful on amd64 as well.


I can send a CL about this, but I'm not sure whether there are some corner cases with high c values that I'm not thinking of. Similarly, I wasn't able to figure out how to write the second reduction in SSA rules.
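
On the arithmetic side at least, large constants should not be a problem by themselves: shifts and adds on uint64 wrap modulo 2^64 in the same way the multiplication does. The sketch below spot-checks that with the top term in the highest bits of c. `shiftAdd` is a made-up helper, and this says nothing about how the compiler represents c internally.

```go
package main

import "fmt"

// shiftAdd computes x + (x + x<<m)<<n, i.e. x * (1 + 2^n + 2^(n+m)) mod 2^64.
// Hypothetical helper for illustration only.
func shiftAdd(x uint64, n, m uint) uint64 {
	return x + (x+x<<m)<<n
}

func main() {
	xs := []uint64{1, 3, 0xdeadbeefcafebabe, ^uint64(0)}
	// n+m == 63 puts the top term of c in the highest bit.
	for _, nm := range [][2]uint{{1, 62}, {31, 32}, {62, 1}} {
		n, m := nm[0], nm[1]
		c := uint64(1) + uint64(1)<<n + uint64(1)<<(n+m)
		for _, x := range xs {
			fmt.Println(c, x, x*c == shiftAdd(x, n, m))
		}
	}
}
```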

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label May 22, 2024
@randall77
Contributor

amd64 already reduces *19 like you suggest.

This all sounds reasonable to me.
The second reduction can be done using rules by just repeating the inner operation - the repeated part will get CSEd subsequently. (Some of the amd64 rules do exactly that.)

@randall77 randall77 added this to the Unplanned milestone May 22, 2024
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 22, 2024
@egonelbre
Contributor Author

egonelbre commented May 23, 2024

Some additional notes. It seems there are a few additional variants that follow similar patterns:

c = 1 + 2^H - 2^L, where L < H
  => (x + x<<H) - x<<L

c = 2^H - 2^L, where L < H
  => (x<<H) - x<<L

c = 2^L + 2^H, where L < H
  => (x<<L) + x<<H
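
A minimal Go sketch for spot-checking these three shapes, with the 2^H term added and the 2^L term subtracted in the difference forms. All helper names are invented for this example, and L < H is assumed rather than checked.

```go
package main

import "fmt"

// mulOnePlusDiff handles c = 1 + 2^h - 2^l (hypothetical helper).
func mulOnePlusDiff(x uint64, l, h uint) uint64 { return (x + x<<h) - x<<l }

// mulDiff handles c = 2^h - 2^l (hypothetical helper).
func mulDiff(x uint64, l, h uint) uint64 { return x<<h - x<<l }

// mulSum handles c = 2^l + 2^h (hypothetical helper).
func mulSum(x uint64, l, h uint) uint64 { return x<<l + x<<h }

func main() {
	const l, h = 2, 5 // constants are 29, 28 and 36 respectively
	for _, x := range []uint64{1, 10, 1 << 61, ^uint64(0)} {
		fmt.Println(
			x*(1+1<<h-1<<l) == mulOnePlusDiff(x, l, h),
			x*(1<<h-1<<l) == mulDiff(x, l, h),
			x*(1<<l+1<<h) == mulSum(x, l, h),
		)
	}
}
```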

PS: does anyone know whether these rules have some standard naming?

@cooler-SAI

Some additional notes. It seems there are a few additional variants that follow similar patterns:

c = 1 + 2^H - 2^L, where L < H
  => (x + x<<H) - x<<L

c = 2^H - 2^L, where L < H
  => (x<<H) - x<<L

c = 2^L + 2^H, where L < H
  => (x<<L) + x<<H

PS: does anyone know whether these rules have some standard naming?

nope, looks like it can be readable for anyone
