New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: lower % using multiply and subtract when the result of / is known #16416

Open
josharian opened this Issue Jul 18, 2016 · 8 comments

Comments

Projects
None yet
4 participants
@josharian
Contributor

josharian commented Jul 18, 2016

CL 25004 allows amd64 to compute a % b and a / b with a single instruction. However, rewriting code to use it currently risks a regression on architectures with a divmod instruction. We could probably do the rewrite at a generic level instead, and lower a % b to a - b * (a / b) on architectures without divmod. Then we could use the clearer, faster formulation everywhere.

cc @randall77 @MichaelTJones @martisch @bradfitz

@josharian josharian added this to the Unplanned milestone Jul 18, 2016

@randall77

This comment has been minimized.

Contributor

randall77 commented Jul 19, 2016

The tricky part is doing a % b -> a - b * (a/b) only when the divide is also used in the function. You don't want to do this rewrite if the divide isn't used elsewhere.

@MichaelTJones

This comment has been minimized.

Contributor

MichaelTJones commented Jul 19, 2016

Would SSA phi handling not kill the path to an unused result?

From: Keith Randall notifications@github.com
Reply-To: golang/go reply@reply.github.com
Date: Monday, July 18, 2016 at 5:04 PM
To: golang/go go@noreply.github.com
Cc: Michael Jones michael.jones@gmail.com, Mention mention@noreply.github.com
Subject: Re: [golang/go] cmd/compile: lower % using multiply and subtract when the result of / is known (#16416)

The tricky part is doing a % b -> a - b * (a/b) only when the divide is also used in the function. You don't want to do this rewrite if the divide isn't used elsewhere.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

@randall77

This comment has been minimized.

Contributor

randall77 commented Jul 19, 2016

SSA does delete unused code. I don't think that's relevant here. For the function

func f(x, y int) int {
   return x % y
}

We want to compile it to a MOD instruction, not a DIV, MUL, and SUB. In contrast,

func g(x, y int) (int, int) {
   return x/y, x%y
}

We want to compile it to a DIV, MUL, and SUB instead of a DIV and MOD.

This assumes we have no DIVMOD instruction, like x86 has. In that case, we want to use that instruction regardless (unless it is for some reason slower than DIV or MOD by itself).

@minux

This comment has been minimized.

Member

minux commented Jul 19, 2016

We already has the concept of multiple results (for ARM 64-bit arithmetic),
could we introduce divmod SSA Op and always use it when converting both
ODIV and OMOD to SSA. CSE will eliminate multiple uses and then when
lowering, use DIV/MOD/DIVMOD or DIV+MUL+SUB instruction as appropriate
depending on which results are being used?

@randall77

This comment has been minimized.

Contributor

randall77 commented Aug 31, 2016

@minux, we could do that but it's a bit of special-case magic.

The other option is to always rewrite x % y -> x - y *(x/y), then rewrite it back to mod during lowering if there's only one use of x/y. That would require some coordination with CSE to make sure we count the uses of x/y correctly. Still some magic, maybe.

@josharian

This comment has been minimized.

Contributor

josharian commented Sep 1, 2016

Thinking out loud, here's a possibly terrible, half-baked idea. Introduce a Cheaper Op. (Choose? Any?) It takes a variable number of arguments, all of which must contain equivalent values. Rewrite x%y to Cheaper(x%y, x-y*(x/y)). During arch-specific optimizations, as appropriate, rewrite Cheaper(Select(...), ...) to the Select and afterwards (lower in file), rewrite Cheaper(v, w) to w.

@MichaelTJones

This comment has been minimized.

Contributor

MichaelTJones commented Sep 1, 2016

Isn’t this pretty much what the assembler-time instruction selection is doing? Just riffing on the open-thinking…

We could have some Mod-like compile-time pseudo instructions

N DIV D => Q,R

N MOD D => R

With the logic like this:

PseudoDiv(N,D) => Phi(Q), Phi(R)

Where what gets used later determines what instructions gets scheduled up front. This is the same idea as Josh’s but using (abusing?) the existing phi() mechanism of SSA.

From: Josh Bleecher Snyder notifications@github.com
Reply-To: golang/go reply@reply.github.com
Date: Wednesday, August 31, 2016 at 11:16 PM
To: golang/go go@noreply.github.com
Cc: Michael Jones michael.jones@gmail.com, Mention mention@noreply.github.com
Subject: Re: [golang/go] cmd/compile: lower % using multiply and subtract when the result of / is known (#16416)

Thinking out loud, here's a possibly terrible, half-baked idea. Introduce a Cheaper Op. (Choose? Any?) It takes a variable number of arguments, all of which must contain equivalent values. Rewrite x%y to Cheaper(x%y, x-y*(x/y)). During arch-specific optimizations, as appropriate, rewrite Cheaper(Select(...), ...) to the Select and afterwards (lower in file), rewrite Cheaper(v, w) to w.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

@josharian

This comment has been minimized.

Contributor

josharian commented Sep 5, 2016

Poked at this a bit. Two observations:

  • For constant denominator (n / 10, n % 10), which is probably the most common case, we already use magic multiplication and then a - b * (a / b) on all platforms. I benchmarked and this appears 5x faster on x86 than using IDIVQ (2ns vs 10ns). There's a little bit of extra overhead with the IDIVQ checking for div by -1, but that branch would have been perfectly predictable in my benchmark.
  • For ARM, where DIV and MOD are lowered into runtime calls, adapting the existing amd64 approach and adding new ADIVMOD(U) calls and matching runtime functions would have high yield. But this only matters for the non-const case.

Before continuing on the original issue, I think it's worth knowing how common non-const divmod is. And even then, adding ADIVMOD(U) to ARM is probably the place to start.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment