Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String pattern matching slower than Java String switches, doesn't generate a tableswitch #11740

Closed
lihaoyi-databricks opened this issue Sep 17, 2019 · 9 comments · Fixed by scala/scala#8451

Comments

@lihaoyi-databricks
Copy link

lihaoyi-databricks commented Sep 17, 2019

The following Scala code:

package app
object App1{
  def main(args: Array[String]): Unit = {
    val start: Long = System.currentTimeMillis()
    var count: Int = 0
    var total: Int = 0
    val items = Array("i", "am", "cow", "hear", "me", "moo", "i", "weigh", "twice", "as", "much", "as", "you")
    val num = items.length
    while(count < 1000000000){
      count += 1
      items(count % num) match{
        case "i" => total += 1
        case "am" => total += 2
        case "cow" => total += 3
        case "hear" => total += 4
        case "me" => total += 5
        case "moo" => total += 6
        case "weigh" => total += 7
        case "twice" => total += 8
        case "as" => total += 9
        case "much" => total += 10
        case "you" => total += 11
      }
    }

    println(total)
    println((System.currentTimeMillis() - start) + "s")
  }
}

Runs about 15-20% slower than the equivalent Java code:

package app;
public class App2{
    public static void main(String[] args){
       long start = System.currentTimeMillis();
       int count = 0;
       int total = 0;
       String[] items = {"i", "am", "cow", "hear", "me", "moo", "i", "weigh", "twice", "as", "much", "as", "you"};
       int num = items.length;
       while(count < 1000000000){
           count += 1;
           switch(items[count % num]){
               case "i": total += 1; break;
               case "am": total += 2; break;
               case "cow": total += 3; break;
               case "hear": total += 4; break;
               case "me": total += 5; break;
               case "moo": total += 6; break;
               case "weigh": total += 7; break;
               case "twice": total += 8; break;
               case "as": total += 9; break;
               case "much": total += 10; break;
               case "you": total += 11; break;
           }
       }
       System.out.println(total);
       System.out.println((System.currentTimeMillis() - start) + "s");
    }
}

The Scala code takes 10600 ± 500 milliseconds, while the Java code takes 8600 ± 200ms (on my 2015 Macbook Pro running Java 1.8.0_131 Scala 2.13.0), a difference of 20% or so. This is because the Scala code generates a naive sequence of cascading if-else checks, while the Java code generates first a tableswitch instruction on the pre-computed hash-codes of the various case string literals, and only after looking up the hash code does it perform an equality check for confirmation (more than one check in the case of a hash collision)

I don't see any reason why the Scala code should not run just as fast as the Java equivalent, and anyway it's surprising that the roughly-equivalent source code doesn't generate roughly-equivalent bytecode.

@lihaoyi-databricks
Copy link
Author

lihaoyi-databricks commented Sep 17, 2019

I also got a smaller speedup (~5%) using the same technique pattern matching on a sealed trait of 20 case classes, using item.getClass.getName.hashCode to populate the initial switch before falling back to isInstanceOf checking. 5% is probably not fussing over, but the 20% speedup for string switches seems significant

@plokhotnyuk
Copy link

plokhotnyuk commented Sep 17, 2019

@lihaoyi-databricks it depends on number of cases and a pattern of input data for the switch.

@martijnhoekstra
Copy link

martijnhoekstra commented Sep 17, 2019

I tried this once in a different context and ran into the problem that this is really difficult to benchmark, because String memoizes its hashcode -- checking the same string in a loop is not a good benchmark, you have to re-create the string instances, and then to properly compare take a benchmark where you only loop through and re-create the string instances.

EDIT: depending on the string, it's likely still faster. The context I tried this in at the time was case class matching on class name, but Class also memoizes its name (which then memoizes its hashcode), so there is a time/space tradeoff there that's not obvious in the benchmark.

@eed3si9n
Copy link
Member

For a potential contributor: The relevant code might be around

https://github.com/scala/scala/blob/43e040ff7e4ba92ccf223e77540580b32c1473c0/src/compiler/scala/tools/nsc/transform/patmat/MatchOptimization.scala#L507-L513

      // Constant folding sets the type of a constant tree to `ConstantType(Constant(folded))`
      // The tree itself can be a literal, an ident, a selection, ...
      object SwitchablePattern { def unapply(pat: Tree): Option[Tree] = pat.tpe match {
        case const: ConstantType if const.value.isIntRange =>
          Some(Literal(Constant(const.value.intValue))) // TODO: Java 7 allows strings in switches
        case _ => None
      }}

The todo traces itself back to virtual pat mat PR (scala/scala#202).

@sjrd
Copy link
Member

sjrd commented Sep 17, 2019

I think I said somewhere, back in the day, that this optimization should be performed by the backend, because it is very platform-dependent. I'm pretty sure the strategy would yield worse performance on JS (it's just a guess, though).

@martijnhoekstra
Copy link

Right, that's a different issue. Emitting a lookup switch by detecting and re-writing branching string equality checks directly to bytecode is not easy.

@sjrd
Copy link
Member

sjrd commented Sep 18, 2019

You do it in cleanup if you want. That phase runs after Scala.js IR generation, so it only applies to the JVM.

@hrhino hrhino self-assigned this Sep 19, 2019
@lihaoyi-databricks
Copy link
Author

@sjrd I have confirmed that hashcode-based switching performs worse in Scala.js (Chrome 77.0.3865.90) than Scala-JVM (Java 1.8.0_131):

Scala.js Scala-JVM
hashCode string matching 22444s 8108s
raw string matching 6559s 10282s

Surprisingly the raw string matching in JS performs best of all, even better than the hashcode-based matching on the JVM; I assume that the JS optimizer has some special casing to recognize this as a special form and provide bespoke optimizations for it

@sjrd
Copy link
Member

sjrd commented Sep 29, 2019

Thanks for doing the test!

hrhino added a commit to hrhino/scala that referenced this issue Oct 1, 2019
Switchable matches with string-typed scrutinee survive the pattern
matcher in the same way as those on integer types do: as a series of
`CaseDef`s with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that.
No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of `throw new MatchError`.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740
@hrhino hrhino added the has PR label Oct 1, 2019
@hrhino hrhino modified the milestones: Backlog, 2.13.2 Oct 1, 2019
hrhino added a commit to hrhino/scala that referenced this issue Oct 1, 2019
Switchable matches with string-typed scrutinee survive the pattern
matcher in the same way as those on integer types do: as a series of
`CaseDef`s with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that.
No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of `throw new MatchError`.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740
retronym pushed a commit to retronym/scala that referenced this issue Oct 4, 2019
Switchable matches with string-typed scrutinee survive the pattern
matcher in the same way as those on integer types do: as a series of
`CaseDef`s with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that.
No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of `throw new MatchError`.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740
retronym pushed a commit to retronym/scala that referenced this issue Oct 4, 2019
Switchable matches with string-typed scrutinee survive the pattern
matcher in the same way as those on integer types do: as a series of
`CaseDef`s with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that.
No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of `throw new MatchError`.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740
hrhino added a commit to hrhino/scala that referenced this issue Oct 4, 2019
Switchable matches with string-typed scrutinee survive the pattern
matcher in the same way as those on integer types do: as a series of
`CaseDef`s with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that.
No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of `throw new MatchError`.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740

Co-Authored-By: "Jason Zaugg" <jzaugg@gmail.com>
hrhino added a commit to hrhino/scala that referenced this issue Oct 7, 2019
Switchable matches with string-typed scrutinee survive the pattern
matcher in the same way as those on integer types do: as a series of
`CaseDef`s with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that.
No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of `throw new MatchError`.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740

Co-Authored-By: "Jason Zaugg" <jzaugg@gmail.com>
hrhino added a commit to hrhino/scala that referenced this issue Oct 8, 2019
Switchable matches with string-typed scrutinee survive the pattern
matcher in the same way as those on integer types do: as a series of
`CaseDef`s with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that.
No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of `throw new MatchError`.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740

Co-Authored-By: "Jason Zaugg" <jzaugg@gmail.com>
hrhino added a commit to hrhino/scala that referenced this issue Oct 10, 2019
Switchable matches with string-typed scrutinee survive the pattern
matcher in the same way as those on integer types do: as a series of
`CaseDef`s with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that.
No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of `throw new MatchError`.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740

Co-Authored-By: "Jason Zaugg" <jzaugg@gmail.com>
hrhino added a commit to hrhino/scala that referenced this issue Oct 11, 2019
Switchable matches with string-typed scrutinee survive the pattern
matcher in the same way as those on integer types do: as a series of
`CaseDef`s with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that.
No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of `throw new MatchError`.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740

Co-Authored-By: "Jason Zaugg" <jzaugg@gmail.com>
harpocrates added a commit to harpocrates/dotty that referenced this issue Mar 29, 2021
The pattern matcher will now emit `Match` with `String` scrutinee as
well as the existing `Int` scrutinee. The JVM backend handles this case
by emitting bytecode that switches on the String's `hashCode` (this
matches what Java does). The SJS already handles `String` matches.

The approach is similar to scala/scala#8451 (see scala/bug#11740 too),
except that instead of doing a transformation on the AST, we just emit the
right bytecode straight away. This is desirable since it means that
Scala.js (and any other backend) can choose their own optimised strategy
for compiling a match on strings.
harpocrates added a commit to harpocrates/dotty that referenced this issue Mar 29, 2021
The pattern matcher will now emit `Match` with `String` scrutinee as
well as the existing `Int` scrutinee. The JVM backend handles this case
by emitting bytecode that switches on the String's `hashCode` (this
matches what Java does). The SJS already handles `String` matches.

The approach is similar to scala/scala#8451 (see scala/bug#11740 too),
except that instead of doing a transformation on the AST, we just emit the
right bytecode straight away. This is desirable since it means that
Scala.js (and any other backend) can choose their own optimised strategy
for compiling a match on strings.
harpocrates added a commit to harpocrates/dotty that referenced this issue Aug 18, 2021
The pattern matcher will now emit `Match` with `String` scrutinee as
well as the existing `Int` scrutinee. The JVM backend handles this case
by emitting bytecode that switches on the String's `hashCode` (this
matches what Java does). The SJS already handles `String` matches.

The approach is similar to scala/scala#8451 (see scala/bug#11740 too),
except that instead of doing a transformation on the AST, we just emit the
right bytecode straight away. This is desirable since it means that
Scala.js (and any other backend) can choose their own optimised strategy
for compiling a match on strings.
harpocrates added a commit to harpocrates/dotty that referenced this issue Aug 18, 2021
The pattern matcher will now emit `Match` with `String` scrutinee as
well as the existing `Int` scrutinee. The JVM backend handles this case
by emitting bytecode that switches on the String's `hashCode` (this
matches what Java does). The SJS already handles `String` matches.

The approach is similar to scala/scala#8451 (see scala/bug#11740 too),
except that instead of doing a transformation on the AST, we just emit the
right bytecode straight away. This is desirable since it means that
Scala.js (and any other backend) can choose their own optimised strategy
for compiling a match on strings.

Fixes scala#11923
olsdavis pushed a commit to olsdavis/dotty that referenced this issue Apr 4, 2022
The pattern matcher will now emit `Match` with `String` scrutinee as
well as the existing `Int` scrutinee. The JVM backend handles this case
by emitting bytecode that switches on the String's `hashCode` (this
matches what Java does). The SJS already handles `String` matches.

The approach is similar to scala/scala#8451 (see scala/bug#11740 too),
except that instead of doing a transformation on the AST, we just emit the
right bytecode straight away. This is desirable since it means that
Scala.js (and any other backend) can choose their own optimised strategy
for compiling a match on strings.

Fixes scala#11923
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants