Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String pattern matching slower than Java String switches, doesn't generate a tableswitch #11740

Open
lihaoyi-databricks opened this issue Sep 17, 2019 · 9 comments · May be fixed by scala/scala#8451

Comments

@lihaoyi-databricks
Copy link

commented Sep 17, 2019

The following Scala code:

package app
object App1{
  def main(args: Array[String]): Unit = {
    val start: Long = System.currentTimeMillis()
    var count: Int = 0
    var total: Int = 0
    val items = Array("i", "am", "cow", "hear", "me", "moo", "i", "weigh", "twice", "as", "much", "as", "you")
    val num = items.length
    while(count < 1000000000){
      count += 1
      items(count % num) match{
        case "i" => total += 1
        case "am" => total += 2
        case "cow" => total += 3
        case "hear" => total += 4
        case "me" => total += 5
        case "moo" => total += 6
        case "weigh" => total += 7
        case "twice" => total += 8
        case "as" => total += 9
        case "much" => total += 10
        case "you" => total += 11
      }
    }

    println(total)
    println((System.currentTimeMillis() - start) + "s")
  }
}

Runs about 15-20% slower than the equivalent Java code:

package app;
public class App2{
    public static void main(String[] args){
       long start = System.currentTimeMillis();
       int count = 0;
       int total = 0;
       String[] items = {"i", "am", "cow", "hear", "me", "moo", "i", "weigh", "twice", "as", "much", "as", "you"};
       int num = items.length;
       while(count < 1000000000){
           count += 1;
           switch(items[count % num]){
               case "i": total += 1; break;
               case "am": total += 2; break;
               case "cow": total += 3; break;
               case "hear": total += 4; break;
               case "me": total += 5; break;
               case "moo": total += 6; break;
               case "weigh": total += 7; break;
               case "twice": total += 8; break;
               case "as": total += 9; break;
               case "much": total += 10; break;
               case "you": total += 11; break;
           }
       }
       System.out.println(total);
       System.out.println((System.currentTimeMillis() - start) + "s");
    }
}

The Scala code takes 10600 ± 500 milliseconds, while the Java code takes 8600 ± 200ms (on my 2015 Macbook Pro running Java 1.8.0_131 Scala 2.13.0), a difference of 20% or so. This is because the Scala code generates a naive sequence of cascading if-else checks, while the Java code generates first a tableswitch instruction on the pre-computed hash-codes of the various case string literals, and only after looking up the hash code does it perform an equality check for confirmation (more than one check in the case of a hash collision)

I don't see any reason why the Scala code should not run just as fast as the Java equivalent, and anyway it's surprising that the roughly-equivalent source code doesn't generate roughly-equivalent bytecode.

@lihaoyi-databricks

This comment has been minimized.

Copy link
Author

commented Sep 17, 2019

I also got a smaller speedup (~5%) using the same technique pattern matching on a sealed trait of 20 case classes, using item.getClass.getName.hashCode to populate the initial switch before falling back to isInstanceOf checking. 5% is probably not fussing over, but the 20% speedup for string switches seems significant

@plokhotnyuk

This comment has been minimized.

Copy link

commented Sep 17, 2019

@lihaoyi-databricks it depends on number of cases and a pattern of input data for the switch.

@martijnhoekstra

This comment has been minimized.

Copy link

commented Sep 17, 2019

I tried this once in a different context and ran into the problem that this is really difficult to benchmark, because String memoizes its hashcode -- checking the same string in a loop is not a good benchmark, you have to re-create the string instances, and then to properly compare take a benchmark where you only loop through and re-create the string instances.

EDIT: depending on the string, it's likely still faster. The context I tried this in at the time was case class matching on class name, but Class also memoizes its name (which then memoizes its hashcode), so there is a time/space tradeoff there that's not obvious in the benchmark.

@SethTisue SethTisue added this to the Backlog milestone Sep 17, 2019
@eed3si9n

This comment has been minimized.

Copy link
Member

commented Sep 17, 2019

For a potential contributor: The relevant code might be around

https://github.com/scala/scala/blob/43e040ff7e4ba92ccf223e77540580b32c1473c0/src/compiler/scala/tools/nsc/transform/patmat/MatchOptimization.scala#L507-L513

      // Constant folding sets the type of a constant tree to `ConstantType(Constant(folded))`
      // The tree itself can be a literal, an ident, a selection, ...
      object SwitchablePattern { def unapply(pat: Tree): Option[Tree] = pat.tpe match {
        case const: ConstantType if const.value.isIntRange =>
          Some(Literal(Constant(const.value.intValue))) // TODO: Java 7 allows strings in switches
        case _ => None
      }}

The todo traces itself back to virtual pat mat PR (scala/scala#202).

@sjrd

This comment has been minimized.

Copy link
Member

commented Sep 17, 2019

I think I said somewhere, back in the day, that this optimization should be performed by the backend, because it is very platform-dependent. I'm pretty sure the strategy would yield worse performance on JS (it's just a guess, though).

@martijnhoekstra

This comment has been minimized.

Copy link

commented Sep 18, 2019

Right, that's a different issue. Emitting a lookup switch by detecting and re-writing branching string equality checks directly to bytecode is not easy.

@sjrd

This comment has been minimized.

Copy link
Member

commented Sep 18, 2019

You do it in cleanup if you want. That phase runs after Scala.js IR generation, so it only applies to the JVM.

@hrhino hrhino self-assigned this Sep 19, 2019
@lihaoyi-databricks

This comment has been minimized.

Copy link
Author

commented Sep 29, 2019

@sjrd I have confirmed that hashcode-based switching performs worse in Scala.js (Chrome 77.0.3865.90) than Scala-JVM (Java 1.8.0_131):

Scala.js Scala-JVM
hashCode string matching 22444s 8108s
raw string matching 6559s 10282s

Surprisingly the raw string matching in JS performs best of all, even better than the hashcode-based matching on the JVM; I assume that the JS optimizer has some special casing to recognize this as a special form and provide bespoke optimizations for it

@sjrd

This comment has been minimized.

Copy link
Member

commented Sep 29, 2019

Thanks for doing the test!

hrhino added a commit to hrhino/scala that referenced this issue Oct 1, 2019
Switchable matches with string-typed scrutinee survive the pattern
matcher in the same way as those on integer types do: as a series of
`CaseDef`s with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that.
No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of `throw new MatchError`.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740
@hrhino hrhino added the has PR label Oct 1, 2019
@hrhino hrhino modified the milestones: Backlog, 2.13.2 Oct 1, 2019
hrhino added a commit to hrhino/scala that referenced this issue Oct 1, 2019
Switchable matches with string-typed scrutinee survive the pattern
matcher in the same way as those on integer types do: as a series of
`CaseDef`s with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that.
No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of `throw new MatchError`.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740
retronym added a commit to retronym/scala that referenced this issue Oct 4, 2019
Switchable matches with string-typed scrutinee survive the pattern
matcher in the same way as those on integer types do: as a series of
`CaseDef`s with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that.
No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of `throw new MatchError`.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740
retronym added a commit to retronym/scala that referenced this issue Oct 4, 2019
Switchable matches with string-typed scrutinee survive the pattern
matcher in the same way as those on integer types do: as a series of
`CaseDef`s with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that.
No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of `throw new MatchError`.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740
hrhino added a commit to hrhino/scala that referenced this issue Oct 4, 2019
Switchable matches with string-typed scrutinee survive the pattern
matcher in the same way as those on integer types do: as a series of
`CaseDef`s with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that.
No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of `throw new MatchError`.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740

Co-Authored-By: "Jason Zaugg" <jzaugg@gmail.com>
hrhino added a commit to hrhino/scala that referenced this issue Oct 7, 2019
Switchable matches with string-typed scrutinee survive the pattern
matcher in the same way as those on integer types do: as a series of
`CaseDef`s with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that.
No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of `throw new MatchError`.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740

Co-Authored-By: "Jason Zaugg" <jzaugg@gmail.com>
hrhino added a commit to hrhino/scala that referenced this issue Oct 8, 2019
Switchable matches with string-typed scrutinee survive the pattern
matcher in the same way as those on integer types do: as a series of
`CaseDef`s with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that.
No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of `throw new MatchError`.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740

Co-Authored-By: "Jason Zaugg" <jzaugg@gmail.com>
hrhino added a commit to hrhino/scala that referenced this issue Oct 10, 2019
Switchable matches with string-typed scrutinee survive the pattern
matcher in the same way as those on integer types do: as a series of
`CaseDef`s with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that.
No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of `throw new MatchError`.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740

Co-Authored-By: "Jason Zaugg" <jzaugg@gmail.com>
hrhino added a commit to hrhino/scala that referenced this issue Oct 11, 2019
Switchable matches with string-typed scrutinee survive the pattern
matcher in the same way as those on integer types do: as a series of
`CaseDef`s with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that.
No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of `throw new MatchError`.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740

Co-Authored-By: "Jason Zaugg" <jzaugg@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.