Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance downgrade of Range.foreach compared to Scala2 #13402

Closed
lesiak opened this issue Aug 27, 2021 · 6 comments
Closed

Performance downgrade of Range.foreach compared to Scala2 #13402

lesiak opened this issue Aug 27, 2021 · 6 comments

Comments

@lesiak
Copy link

lesiak commented Aug 27, 2021

Compiler version

Scala3: 3.0.0, 3.0.1
Scala2: 2.13.0
JVM: openjdk 11.0.9.1
Scalameter: 0.21

Minimized code

package scalashop

import org.scalameter.{Key, Warmer, config}


object TestDoubleLoopPerformance {
  val standardConfig = config(
    Key.exec.minWarmupRuns := 5,
    Key.exec.maxWarmupRuns := 10,
    Key.exec.benchRuns := 10,
    // Key.verbose := true
  ) withWarmer(Warmer.Default())


  def main(args: Array[String]): Unit = {
    val whileLoopTime = standardConfig measure {
      for (iter <- 0 to 100) {
        sumWithWhileLoop(1000, 2000)
      }
    }
    println(s"while loop time: $whileLoopTime")

    val rangeLoopTime = standardConfig measure {
      for (iter <- 0 to 100) {
        sumWithRange(1000, 2000)
      }
    }
    println(s"range loop time: $rangeLoopTime")
  }

  private def sumWithRange(xEnd: Int, yEnd: Int) = {
    var sum = 0
    for {
      x <- 0 until xEnd
      y <- 0 until yEnd
    } sum += 1
    sum
  }

  private def sumWithWhileLoop(xEnd: Int, yEnd: Int) = {
    var sum = 0
    var x = 0
    while (x < xEnd) {
      var y = 0
      while (y < yEnd) {
        sum += 1
        y += 1
      }
      x += 1
    }
    sum
  }
}

Output

Scala2:
while loop time: 0.015882699999999996 ms
range loop time: 0.20659249999999996 ms

Scala3:
while loop time: 0.011921900000000001 ms
range loop time: 464.97765689999994 ms

while loop benchmark is roughly comparable, there is a massive downgrade in performance for Range.

I decompiled the code using IntelliJ
Scala2:

private int sumWithRange(final int xEnd, final int yEnd) {
    IntRef sum = IntRef.create(0);
    scala.runtime.RichInt..MODULE$.until$extension(scala.Predef..MODULE$.intWrapper(0), xEnd).foreach$mVc$sp((x) -> {
        scala.runtime.RichInt..MODULE$.until$extension(scala.Predef..MODULE$.intWrapper(0), yEnd).foreach$mVc$sp((y) -> {
            ++sum.elem;
        });
    });
    return sum.elem;
}

Scala3:

private int sumWithRange(final int xEnd, final int yEnd) {
    IntRef sum = IntRef.create(0);
    scala.runtime.RichInt..MODULE$.until$extension(scala.Predef..MODULE$.intWrapper(0), xEnd).foreach((x) -> {
        scala.runtime.RichInt..MODULE$.until$extension(scala.Predef..MODULE$.intWrapper(0), yEnd).foreach((y) -> {
            int var3 = sum.elem + 1;
            sum.elem = var3;
        });
    });
    return sum.elem;
}

And javap output:

Scala2:

private int sumWithRange(int, int);
    Code:
       0: iconst_0
       1: invokestatic  #187                // Method scala/runtime/IntRef.create:(I)Lscala/runtime/IntRef;
       4: astore_3
       5: getstatic     #192                // Field scala/runtime/RichInt$.MODULE$:Lscala/runtime/RichInt$;
       8: getstatic     #62                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
      11: iconst_0
      12: invokevirtual #196                // Method scala/Predef$.intWrapper:(I)I
      15: iload_1
      16: invokevirtual #200                // Method scala/runtime/RichInt$.until$extension:(II)Lscala/collection/immutable/Range;
      19: iload_2
      20: aload_3
      21: invokedynamic #210,  0            // InvokeDynamic #2:apply$mcVI$sp:(ILscala/runtime/IntRef;)Lscala/runtime/java8/JFunction1$mcVI$sp;
      26: invokevirtual #214                // Method scala/collection/immutable/Range.foreach$mVc$sp:(Lscala/Function1;)V
      29: aload_3
      30: getfield      #218                // Field scala/runtime/IntRef.elem:I
      33: ireturn

Scala3:

private int sumWithRange(int, int);
    Code:
       0: iconst_0
       1: invokestatic  #185                // Method scala/runtime/IntRef.create:(I)Lscala/runtime/IntRef;
       4: astore_3
       5: getstatic     #190                // Field scala/runtime/RichInt$.MODULE$:Lscala/runtime/RichInt$;
       8: getstatic     #144                // Field scala/Predef$.MODULE$:Lscala/Predef$;
      11: iconst_0
      12: invokevirtual #194                // Method scala/Predef$.intWrapper:(I)I
      15: iload_1
      16: invokevirtual #198                // Method scala/runtime/RichInt$.until$extension:(II)Lscala/collection/immutable/Range;
      19: aload_0
      20: iload_2
      21: aload_3
      22: invokedynamic #209,  0            // InvokeDynamic #2:apply$mcVI$sp:(Lscalashop/TestDoubleLoopPerformance$;ILscala/runtime/IntRef;)Lscala/runtime/java8/JFunction1$mcVI$sp;
      27: invokevirtual #213                // Method scala/collection/immutable/Range.foreach:(Lscala/Function1;)V
      30: aload_3
      31: getfield      #217                // Field scala/runtime/IntRef.elem:I
      34: ireturn

Conclusions

Scala2 uses Range.foreach$mVc$sp
Scala3 uses Range.foreach

See https://www.javadoc.io/doc/org.scala-lang.modules/scala-java8-compat_2.11/0.8.0-RC1/scala/runtime/java8/JFunction1$mcVI$sp.html#apply:DmcVI:Dsp-int-

Original SO post:
https://stackoverflow.com/questions/68941194/range-benchmark-sluggish-after-migrating-to-scala3

Expectation

Performance of Scala3 code is on par with Scala2

@LPTK
Copy link
Contributor

LPTK commented Aug 27, 2021

This is a failure of specialization. Maybe @liufengyun has an idea why it fails, given he implemented #10452.

@liufengyun
Copy link
Contributor

liufengyun commented Aug 27, 2021

This is a failure of specialization. Maybe @liufengyun has an idea why it fails, given he implemented #10452.

This is because we only implement function specialization, but not method specialization and class specialization. For this particular example, method specialization will ensure that the call f(i) in Range.foreach will call the specialized apply, thus avoid the boxing.

Method specialization and class specialization are up to debate, due to their complications (Method specialization is of less concern compared to class specialization). In Scala 3, marking Range.foreach as inline will have the same effect as method specialization.

As a side note, JVM inlining & optimization seem to be not good enough in the presence of lambdas. The loss of the optimization opportunity might be related to the complex underlying mechanism for creating and representing lambdas.

@smarter
Copy link
Member

smarter commented Aug 27, 2021

Duplicate of #8880

@smarter smarter marked this as a duplicate of #8880 Aug 27, 2021
@smarter smarter closed this as completed Aug 27, 2021
@lesiak
Copy link
Author

lesiak commented Aug 27, 2021

@smarter While the motivation of this issue is the same as #8880, I believe this one is much smaller in scope.
Range benchmark was sluggish even under Scala2 (and #8880 seems to focus on that), but the lack of specialization for methods made Scala3 even worse.

@smarter
Copy link
Member

smarter commented Aug 27, 2021

Reimplementing method specialization is not trivial and not necessarily simpler than some of the alternatives discussed in #8880, in any case what's missing here is someone to take charge of finding a solution to the problem.

@bblfish
Copy link

bblfish commented Sep 14, 2021

This is a failure of specialization. Maybe @liufengyun has an idea why it fails, given he implemented #10452.

This is because we only implement function specialization, but not method specialization and class specialization. For this particular example, method specialization will ensure that the call f(i) in Range.foreach will call the specialized apply, thus avoid the boxing.

To help us folks who are just zooming past, can someone show how the code looks like when it is using the right form of specialisation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants