Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Power op ^ is not optimized for a: int; echo a ^ 2 case (minor bug) #10910

Closed
StefanSalewski opened this issue Mar 27, 2019 · 14 comments

Comments

Projects
None yet
6 participants
@StefanSalewski
Copy link

commented Mar 27, 2019

Generally I prefer writing a ^ 2 instead of a * a when a is an int, and I indeed recommend that.

From code size compiled with -d:release and inspired by implementation of ^ operator I have the strong impression that a * a gives much better code than a ^ 2. Code size difference is more than 50 bytes. Really not an important issue, but maybe easy to fix?

Example

import math

proc main =
  let a = 3
  echo a ^ 2

main()
@zevv

This comment has been minimized.

Copy link
Contributor

commented Mar 27, 2019

These are fundamentally different operations. a * a is a fixed product, while the ^ proc can raise to any power. I guess there could be optimization for the case a ^2, but then again, should a ^ 3 be optimized as well?

@StefanSalewski

This comment has been minimized.

Copy link
Author

commented Mar 27, 2019

Yes, a ^ 2 and a ^ 3 would be nice. Note that we generally have expressions like

(last.x - curr.x) ^ 2 # last.x and curr.x are int field here. Do not want to type (last.x - curr.x) * (last.x - curr.x)

(OK, we may create and strongly advertise proc sqr() for this use case instead, but that will not prevent people from using ^)

@dawkot

This comment has been minimized.

Copy link

commented Mar 28, 2019

I guess you could make ^ a macro that would generate the optimal code when the power is a literal or a constant.

@krux02

This comment has been minimized.

Copy link
Contributor

commented Mar 28, 2019

Well, you can use pow, as it will generate a call to pow in C. Modern C/C++ compilers will optimize that one to a muliplication without any call into a function or loop, when the exponent is a constant integer.

proc main =
  let a = 3.0
  echo pow(a, 2)

main()

Currently I am a bit puzzled by the fact that we have both math.pow and math.`^` as they should do exactly the same(?).

@StefanSalewski

This comment has been minimized.

Copy link
Author

commented Mar 29, 2019

Well, you can use pow

I was concerned about use of ^ for plain integer arguments! As in

import math, random
proc main =

  var a, b: int
  a = rand(7)
  # b = (a + 1) ^ 2 # bigger executable
  # b = sqr(a + 1) # sqr() not available 
  b = (a + 1) * (a + 1) # fine but much typing work
  echo b

main()

Executable code size is 59416 vs 59328 with -d:release. So I strongly assume performance is concerned too.

When ^ works for ints, people generally use it. So it should give optimal code, or work not at all. (No one would try b = int((a.float + 1) ^ 2) and expect same code as for (a + 1) * (a + 1) )

Fixing this issue may be some fun, but maybe it is harder than it appears, and I am not sure if my skills are good enough for a good fix already. But I may try eventually.

@krux02

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2019

I am tagging this as low priority, as nobody is working on it. It is not a bug nor do I see a significant performance problem. And since you can define your own local sqr to square a number (with an even better name in my opinion) I don't see a problem for the pregramming language here.

@krux02 krux02 added the Low Priority label Mar 29, 2019

@StefanSalewski

This comment has been minimized.

Copy link
Author

commented Mar 29, 2019

Yes, low priority is fine, I would have tagged it that myself, but I have never managed tagging.

Performance impact is indeed not that big:

import random, math

proc main =
  var s, j: int
  for i in 0 .. 10000000:
    j = rand(7)
    s += j * j # j ^ 2
    # s += j ^ 2
  echo s

main()

with -d:release runtime is 0.190s vs 0.1590s

But my conclusion is still, it was a bad idea to enable ^ for ints without optimizing it well.

@mratsim

This comment has been minimized.

Copy link
Collaborator

commented Mar 29, 2019

You can use a term-rewriting macro in a helper file to do the rewrite for you today.
This is described for multiplication unrolling into addition in the manual: https://nim-lang.org/docs/manual.html#term-rewriting-macros

Otherwise I think the easiest way to implement this request is to have a static int overload.

proc `^`(x: int, y: static int): int =
  when y = 2: x * x
  when y = 3: x * x * x
  else: powImpl(x, y)
@StefanSalewski

This comment has been minimized.

Copy link
Author

commented Mar 29, 2019

mratsim, thank you very much for the suggestions. Sound both good, will try that eventually.

@krux02

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2019

@StefanSalewski I would like to warn you though, term-rewriting macros have significant impact on compile times the compiler tries to match them everywhere.

@StefanSalewski

This comment has been minimized.

Copy link
Author

commented May 9, 2019

I just came back to this issue. I tried hard to avoid ^ op for performance critical code, but then I have expressions like

if d < ((e.org.point.x - e.dest.point.x) * (e.org.point.x - e.dest.point.x) + (e.org.point.y - e.dest.point.y) * (e.org.point.y - e.dest.point.y)) * 1e-12:

Not that nice.

So I considered making a ^ proc variant for the plain case where exponent is a plain static int in range 2, 3, 4. That should cover 99% of all use cases.

But the funny fact is: Current ^ op is only slow, because it is not inlined! Latest test with latest Nim devel and gcc 9.1 gives perfect timing when inlined. I still wonder how gcc can optimize the Nim loop that well, as exponent is not a constant.

So what shall we do? Just add inline pragma? Or add variant with small static exponent?

Here my test code, perfect timing with the added inline pragma and compiled with -d:release of course:

# nim c -d:release k.nim
import random, math

proc `^`*[T](x: T, y: Natural): T {.inline.} =
  ## Computes ``x`` to the power ``y``.
  ##
  ## Exponent ``y`` must be non-negative, use
  ## `pow proc <#pow,float64,float64>`_ for negative exponents.
  ##
  ## See also:
  ## * `pow proc <#pow,float64,float64>`_ for negative exponent or
  ##   floats
  ## * `sqrt proc <#sqrt,float64>`_
  ## * `cbrt proc <#cbrt,float64>`_
  ##
  ## .. code-block:: nim
  ##  echo 2^3  # 8
  ##  echo -2^3 # -8
  when compiles(y >= T(0)):
    assert y >= T(0)
  else:
    assert T(y) >= T(0)
  var (x, y) = (x, y)
  result = 1

  while true:
    if (y and 1) != 0:
      result *= x
    y = y shr 1
    if y == 0:
      break
    x *= x

proc main =
  var s, j: int
  for i in 0 .. 10000000:
    j = rand(7)
    #s += j * j # j ^ 2
    s += j ^ 2
  echo s

main()
@StefanSalewski

This comment has been minimized.

Copy link
Author

commented May 9, 2019

This is the C and Assemly code for ^ without inline pragma:

N_LIB_PRIVATE N_NIMCALL(NI, roof__07l2KPpVguQjVV5JTOdtEg)(NI x, NI y) {
	NI result;
	tyTuple_1v9bKyksXWMsm0vNwmZ4EuQ T1_;
	NI x_2;
	NI y_2;
	result = (NI)0;
	T1_.Field0 = x;
	T1_.Field1 = y;
	x_2 = T1_.Field0;
	y_2 = T1_.Field1;
	result = ((NI) 1);
	{
		while (1) {
			{
				if (!!(((NI)(((NI) (y_2)) & ((NI) 1)) == ((NI) 0)))) goto LA6_;
				stareq__ogcC1Md4c289bEhAZWpmZUwsystem((&result), x_2);
			}
			LA6_: ;
			y_2 = ((NI) ((NI)((NU64)(((NI) (y_2))) >> (NU64)(((NI) 1)))));
			{
				if (!(((NI) (y_2)) == ((NI) 0))) goto LA10_;
				goto LA2;
			}
			LA10_: ;
			stareq__ogcC1Md4c289bEhAZWpmZUwsystem((&x_2), x_2);
		}
	} LA2: ;
	return result;
}
roof__07l2KPpVguQjVV5JTOdtEg:
.LFB16:
	.cfi_startproc
	movl	$1, %eax
	jmp	.L15
	.p2align 4,,10
	.p2align 3
.L20:
	imulq	%rdi, %rdi
.L15:
	testb	$1, %sil
	je	.L13
	imulq	%rdi, %rax
.L13:
	shrq	%rsi
	jne	.L20
	ret
	.cfi_endproc
.LFE16:
	.size	roof__07l2KPpVguQjVV5JTOdtEg, .-roof__07l2KPpVguQjVV5JTOdtEg
	.p2align 4
	.globl	main_r7VpEd9b9aQPUbk4pk8k4iZQ
	.hidden	main_r7VpEd9b9aQPUbk4pk8k4iZQ
	.type	main_r7VpEd9b9aQPUbk4pk8k4iZQ, @function
@StefanSalewski

This comment has been minimized.

Copy link
Author

commented May 9, 2019

My current suggestion would be to add one of the following two procs to math module for static exponent. The second one has less source code and gcc unrolls the loop perfectly, so that may be the better solution?

proc `^`*[T](x: T, y: static[Natural]): T {.inline.} =
  when y < 7:
    when y == 0:
      result = T(1)
    when y == 1:
      result = x
    when y == 2:
      result = x * x
    when y == 3:
      result = x * x * x
    when y == 4:
      result = x * x
      result *= result
    when y == 5:
      result = x * x
      result *= (result * x)
    when y == 6:
      result = x * x
      result *= (result * result)
  else:
    result = math.`^`(x, y)

proc `^`*[T](x: T, y: static[Natural]): T {.inline.} =
  when y < 10:
    result = T(1)
    var i = y
    while i > 0:
      result *= x
      dec(i)
  else:
    result = math.`^`(x, y)
@Araq

This comment has been minimized.

Copy link
Member

commented May 10, 2019

I just came back to this issue. I tried hard to avoid ^ op for performance critical code, but then I have expressions like

proc sqr*[T](x: T): T {.inline.} = x * x

narimiran added a commit to narimiran/Nim that referenced this issue May 21, 2019

narimiran added a commit to narimiran/Nim that referenced this issue May 21, 2019

@narimiran narimiran closed this in 9bd4347 May 21, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.