Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2.12.9 in phase jvm takes ~200 secs, up from ~1.4 sec #11667

Closed
yoohaemin opened this issue Aug 7, 2019 · 22 comments
Closed

2.12.9 in phase jvm takes ~200 secs, up from ~1.4 sec #11667

yoohaemin opened this issue Aug 7, 2019 · 22 comments

Comments

@yoohaemin
Copy link

yoohaemin commented Aug 7, 2019

We have a sbt build with 10 subprojects, and one of them started taking a lot more time to compile after bumping to 2.12.9. We do not have a small reproducible build, but only a thread dump of scalac and output from -Ystatistics at this point (of 2.12.9 and 2.12.8 for comparison). You can find them here.

In the file, you can notice that phase jvm took 1411.476ms (3.1%) with 2.12.8, but 197014.957ms (83.9%) with 2.12.9. Using a profiler shows that scalac is stuck in scala.tools.asm.tree.analysis.Frame.merge during the whole time.

Tried on GraalVM EE (macos) and openjdk11 with jvmci on (nixos), with consistent results.

@SethTisue
Copy link
Member

@lrytz any idea what the next investigation steps here ought to be?

@lrytz
Copy link
Member

lrytz commented Aug 7, 2019

@yoohaemin thanks for the report. Did GC show up in the profiler / could you give it a try with a larger -Xmx?

@lrytz
Copy link
Member

lrytz commented Aug 7, 2019

Looking at the git history of the backend, I don't see an obvious candidate that could be the cause of the regression. Maybe scala/scala#7780.

@yoohaemin
Copy link
Author

yoohaemin commented Aug 7, 2019

I just ran with -Xmx4g, but heap size doesn't seem to go over 1.5G at all, and GC kicks in during compilation multiple times but it is really short and quick.

@plokhotnyuk
Copy link

@yoohaemin have you tried to attach async-profiler and plot flame graphs? possible it will help to spot some new details...

@lrytz
Copy link
Member

lrytz commented Aug 7, 2019

OK, thanks. Do you have the optimizer enabled (with -opt)? If not, the only optimization that runs is dead code elimination. You could disable that with -opt:l:none.

I just realized that DCE doesn't bail out on large methods (https://github.com/scala/scala/blob/v2.12.8/src/compiler/scala/tools/nsc/backend/jvm/opt/LocalOpt.scala#L294), it should check sizeOKForBasicValue. But that was the same in 2.12.8, so the regression must be caused by something else.

I'm not sure how to diagnose this from the outside. I could most likely identify the issue quickly if I had access to the soruce code. @yoohaemin would it be possible for you to share it (privately)?

@ghostdogpr
Copy link

(I'm working with @yoohaemin)

@lrytz We didn't have the optimizer enabled. I just tried with -opt:l:none and it's better but still far from 2.12.8 numbers:

  • 2.12.8 jvm phase: 1594.854ms
  • 2.12.8 jvm phase with -opt:l:none: 1606.082ms
  • 2.12.9 jvm phase: 162922.174ms
  • 2.12.9 jvm phase with -opt:l:none: 35246.268ms

Unfortunately we can't share our code, and this project is pretty big so it's not easy to isolate the problem into a smaller build.

We haven't tried the async profiler yet, just VisualVM sampling that shows it is stuck on scala.tools.asm.tree.analysis.BasicInterpreter.merge(BasicInterpreter.java:50):

 at scala.tools.asm.tree.analysis.BasicInterpreter.merge(BasicInterpreter.java:50)
        at scala.tools.asm.tree.analysis.Frame.merge(Frame.java:682)
        at scala.tools.asm.tree.analysis.Analyzer.merge(Analyzer.java:541)
        at scala.tools.asm.tree.analysis.Analyzer.analyze(Analyzer.java:194)
        at scala.tools.nsc.backend.jvm.analysis.BackendUtils$AsmAnalyzer.<init>(BackendUtils.scala:99)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt.removeUnreachableCodeImpl(LocalOpt.scala:509)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt.removalRound$2(LocalOpt.scala:294)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt.methodOptimizations(LocalOpt.scala:371)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt.$anonfun$methodOptimizations$1(LocalOpt.scala:209)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt.$anonfun$methodOptimizations$1$adapted(LocalOpt.scala:208)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt$$Lambda$16801/324387393.apply(Unknown Source)
        at scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:160)
        at scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:160)
        at scala.collection.TraversableOnce$$Lambda$13966/448664299.apply(Unknown Source)
        at scala.collection.Iterator.foreach(Iterator.scala:941)
        at scala.collection.Iterator.foreach$(Iterator.scala:941)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:160)
        at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:158)
        at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:108)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt.methodOptimizations(LocalOpt.scala:208)
        at scala.tools.nsc.backend.jvm.PostProcessor.$anonfun$localOptimizations$1(PostProcessor.scala:130)
        at scala.tools.nsc.backend.jvm.PostProcessor.localOptimizations(PostProcessor.scala:130)
        at scala.tools.nsc.backend.jvm.PostProcessor.sendToDisk(PostProcessor.scala:64)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler.$anonfun$postProcessUnit$3(GeneratedClassHandler.scala:123)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler.$anonfun$postProcessUnit$2(GeneratedClassHandler.scala:123)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler$$Lambda$16088/261666137.apply$mcV$sp(Unknown Source)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at scala.tools.nsc.backend.jvm.PostProcessorFrontendAccess$PostProcessorFrontendAccessImpl.withThreadLocalReporter(PostProcessorFrontendAccess.scala:225)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler.$anonfun$postProcessUnit$1(GeneratedClassHandler.scala:122)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler$$Lambda$16084/1543680797.apply$mcV$sp(Unknown Source)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler$$Lambda$16085/620050906.apply(Unknown Source)
        at scala.util.Success.$anonfun$map$1(Try.scala:255)
        at scala.util.Success.map(Try.scala:213)
        at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
        at scala.concurrent.Future$$Lambda$16086/289061864.apply(Unknown Source)
        at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
        at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
        at scala.concurrent.impl.Promise$$Lambda$16087/1619313601.apply(Unknown Source)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$SyncWritingClassHandler$$anonfun$$lessinit$greater$1.execute(GeneratedClassHandler.scala:177)
        at scala.concurrent.impl.ExecutionContextImpl.execute(ExecutionContextImpl.scala:24)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
        at scala.concurrent.impl.Promise$KeptPromise$Kept.onComplete(Promise.scala:372)
        at scala.concurrent.impl.Promise$KeptPromise$Kept.onComplete$(Promise.scala:371)
        at scala.concurrent.impl.Promise$KeptPromise$Successful.onComplete(Promise.scala:379)
        at scala.concurrent.impl.Promise.transform(Promise.scala:33)
        at scala.concurrent.impl.Promise.transform$(Promise.scala:31)
        at scala.concurrent.impl.Promise$KeptPromise$Successful.transform(Promise.scala:379)
        at scala.concurrent.Future.map(Future.scala:292)
        at scala.concurrent.Future.map$(Future.scala:292)
        at scala.concurrent.impl.Promise$KeptPromise$Successful.map(Promise.scala:379)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler.postProcessUnit(GeneratedClassHandler.scala:118)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler.process(GeneratedClassHandler.scala:111)
        at scala.tools.nsc.backend.jvm.CodeGen.genUnit(CodeGen.scala:73)
        at scala.tools.nsc.backend.jvm.GenBCode$BCodePhase.apply(GenBCode.scala:74)
        at scala.tools.nsc.Global$GlobalPhase.applyPhase(Global.scala:453)
        at scala.tools.nsc.Global$GlobalPhase.run(Global.scala:399)
        at scala.tools.nsc.backend.jvm.GenBCode$BCodePhase.super$run(GenBCode.scala:80)
        at scala.tools.nsc.backend.jvm.GenBCode$BCodePhase.$anonfun$run$1(GenBCode.scala:80)
        at scala.tools.nsc.backend.jvm.GenBCode$BCodePhase.run(GenBCode.scala:78)
        at scala.tools.nsc.Global$Run.compileUnitsInternal(Global.scala:1503)
        at scala.tools.nsc.Global$Run.compileUnits(Global.scala:1487)
        at scala.tools.nsc.Global$Run.compileSources(Global.scala:1480)
        at scala.tools.nsc.Global$Run.compile(Global.scala:1606)

@ghostdogpr
Copy link

With -opt:l:none it's still spending time in scala.tools.asm.Frame.merge but with a different call stack:

at scala.tools.asm.Frame.merge(Frame.java:1175)
        at scala.tools.asm.MethodWriter.computeAllFrames(MethodWriter.java:1617)
        at scala.tools.asm.MethodWriter.visitMaxs(MethodWriter.java:1553)
        at scala.tools.asm.tree.MethodNode.accept(MethodNode.java:810)
        at scala.tools.asm.tree.MethodNode.accept(MethodNode.java:690)
        at scala.tools.asm.tree.ClassNode.accept(ClassNode.java:426)
        at scala.tools.nsc.backend.jvm.PostProcessor.serializeClass(PostProcessor.scala:140)
        at scala.tools.nsc.backend.jvm.PostProcessor.sendToDisk(PostProcessor.scala:72)

@lrytz
Copy link
Member

lrytz commented Aug 7, 2019

With -opt:l:none it's still spending time in scala.tools.asm.Frame.merge but with a different call stack.

That's interesting. It really could be due to the ASM upgrade.

I suspect that there is a specific bytecode shape that trips up the ASM analyzer.

You could add https://scala-ci.typesafe.com/artifactory/scala-integration as a resolver and test with Scala versions 2.12.9-bin-b63ccf2 and 2.12.9-bin-ce9a857. That's before and after scala/scala#7780.

You also could try to:

  • git clone git@github.com:scala/scala.git
  • git checkout v2.12.9
  • sbt intellij
  • open src/intellij/scala.ipr in IntelliJ
  • run your sbt build with sbt -J-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
  • Connect the IntelliJ debugger
  • Try to suspend at a breakpoint once the compiler is stuck in Frame.merge
  • Go up the stack to a place where the MethodNode is in scope
  • Use "evaluate expression" and call AsmUtils.traceMethod(methodNode). This would print the ASM representation of the bytecode to the console

@ghostdogpr
Copy link

Interesting, it is fast with both snapshots. We'll try to find the commit that introduces the problem.

@lrytz
Copy link
Member

lrytz commented Aug 7, 2019

Here's a git bisect script that you could adapt and use: https://gist.github.com/lrytz/f92e5db08c6bad14aaf0b890ec5d3b4d

You'd have to measure the time somehow since the build doesn't fail, just takes a long time.

@ghostdogpr
Copy link

It seems to be caused by scala/scala#8061
2.12.9-bin-d386f96 is fast, 2.12.9-bin-197ff13 is slow.

@SethTisue SethTisue added this to the 2.12.10 milestone Aug 8, 2019
@SethTisue
Copy link
Member

attn @retronym

yoohaemin added a commit to yoohaemin/scala-bug-11667-reproduction that referenced this issue Aug 8, 2019
yoohaemin added a commit to yoohaemin/scala-bug-11667-reproduction that referenced this issue Aug 8, 2019
@yoohaemin
Copy link
Author

yoohaemin commented Aug 8, 2019

I brought a minimal reproduction after we investigated this issue following advice from @lrytz. Thanks everyone for the help.

https://github.com/yoohaemin/scala-bug-11667-reproduction

@SethTisue
Copy link
Member

believed fixed by scala/scala#8327

@yoohaemin @ghostdogpr would appreciate if you could test 2.12.10-bin-de413b9 and verify

@SethTisue
Copy link
Member

I tried it with https://github.com/yoohaemin/scala-bug-11667-reproduction just now and this looks good:

[info]   jvm                         : 1 spans, ()2205.651ms (8.2%)

@ghostdogpr
Copy link

@SethTisue we will be able to test on our project on Monday. But based on the repro project it looks good!

@plokhotnyuk
Copy link

@ghostdogpr it looks that in runtime your http-handler burns a lot of CPU cycles in the routing match expression too... will you consider to use some kind of prefix tree which supports wildcard for path elements instead?

@retronym
Copy link
Member

Indeed... the generated code is likely too large to be JIT compiled by the JVM, so you have the overhead of interpretation on top of any issues with cascading if/else to get down the matching handler.

@yoohaemin
Copy link
Author

Splitting the routes was indeed on our todo list (for both compile-time and runtime performance), but was a low priority since performance was not exactly our biggest concern at the moment. In any case, that big pattern match did help spot a compiler regression ;)

@yoohaemin
Copy link
Author

@SethTisue 2.12.10-bin-de413b9 works well 👍

@SethTisue
Copy link
Member

cool, thanks for confirming

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants