Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2.12.9 in phase jvm takes ~200 secs, up from ~1.4 sec #11667

Closed
yoohaemin opened this issue Aug 7, 2019 · 22 comments

Comments

@yoohaemin
Copy link

commented Aug 7, 2019

We have a sbt build with 10 subprojects, and one of them started taking a lot more time to compile after bumping to 2.12.9. We do not have a small reproducible build, but only a thread dump of scalac and output from -Ystatistics at this point (of 2.12.9 and 2.12.8 for comparison). You can find them here.

In the file, you can notice that phase jvm took 1411.476ms (3.1%) with 2.12.8, but 197014.957ms (83.9%) with 2.12.9. Using a profiler shows that scalac is stuck in scala.tools.asm.tree.analysis.Frame.merge during the whole time.

Tried on GraalVM EE (macos) and openjdk11 with jvmci on (nixos), with consistent results.

@SethTisue

This comment has been minimized.

Copy link
Member

commented Aug 7, 2019

@lrytz any idea what the next investigation steps here ought to be?

@lrytz

This comment has been minimized.

Copy link
Member

commented Aug 7, 2019

@yoohaemin thanks for the report. Did GC show up in the profiler / could you give it a try with a larger -Xmx?

@lrytz

This comment has been minimized.

Copy link
Member

commented Aug 7, 2019

Looking at the git history of the backend, I don't see an obvious candidate that could be the cause of the regression. Maybe scala/scala#7780.

@yoohaemin

This comment has been minimized.

Copy link
Author

commented Aug 7, 2019

I just ran with -Xmx4g, but heap size doesn't seem to go over 1.5G at all, and GC kicks in during compilation multiple times but it is really short and quick.

@plokhotnyuk

This comment has been minimized.

Copy link

commented Aug 7, 2019

@yoohaemin have you tried to attach async-profiler and plot flame graphs? possible it will help to spot some new details...

@lrytz

This comment has been minimized.

Copy link
Member

commented Aug 7, 2019

OK, thanks. Do you have the optimizer enabled (with -opt)? If not, the only optimization that runs is dead code elimination. You could disable that with -opt:l:none.

I just realized that DCE doesn't bail out on large methods (https://github.com/scala/scala/blob/v2.12.8/src/compiler/scala/tools/nsc/backend/jvm/opt/LocalOpt.scala#L294), it should check sizeOKForBasicValue. But that was the same in 2.12.8, so the regression must be caused by something else.

I'm not sure how to diagnose this from the outside. I could most likely identify the issue quickly if I had access to the soruce code. @yoohaemin would it be possible for you to share it (privately)?

@ghostdogpr

This comment has been minimized.

Copy link

commented Aug 7, 2019

(I'm working with @yoohaemin)

@lrytz We didn't have the optimizer enabled. I just tried with -opt:l:none and it's better but still far from 2.12.8 numbers:

  • 2.12.8 jvm phase: 1594.854ms
  • 2.12.8 jvm phase with -opt:l:none: 1606.082ms
  • 2.12.9 jvm phase: 162922.174ms
  • 2.12.9 jvm phase with -opt:l:none: 35246.268ms

Unfortunately we can't share our code, and this project is pretty big so it's not easy to isolate the problem into a smaller build.

We haven't tried the async profiler yet, just VisualVM sampling that shows it is stuck on scala.tools.asm.tree.analysis.BasicInterpreter.merge(BasicInterpreter.java:50):

 at scala.tools.asm.tree.analysis.BasicInterpreter.merge(BasicInterpreter.java:50)
        at scala.tools.asm.tree.analysis.Frame.merge(Frame.java:682)
        at scala.tools.asm.tree.analysis.Analyzer.merge(Analyzer.java:541)
        at scala.tools.asm.tree.analysis.Analyzer.analyze(Analyzer.java:194)
        at scala.tools.nsc.backend.jvm.analysis.BackendUtils$AsmAnalyzer.<init>(BackendUtils.scala:99)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt.removeUnreachableCodeImpl(LocalOpt.scala:509)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt.removalRound$2(LocalOpt.scala:294)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt.methodOptimizations(LocalOpt.scala:371)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt.$anonfun$methodOptimizations$1(LocalOpt.scala:209)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt.$anonfun$methodOptimizations$1$adapted(LocalOpt.scala:208)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt$$Lambda$16801/324387393.apply(Unknown Source)
        at scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:160)
        at scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:160)
        at scala.collection.TraversableOnce$$Lambda$13966/448664299.apply(Unknown Source)
        at scala.collection.Iterator.foreach(Iterator.scala:941)
        at scala.collection.Iterator.foreach$(Iterator.scala:941)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:160)
        at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:158)
        at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:108)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt.methodOptimizations(LocalOpt.scala:208)
        at scala.tools.nsc.backend.jvm.PostProcessor.$anonfun$localOptimizations$1(PostProcessor.scala:130)
        at scala.tools.nsc.backend.jvm.PostProcessor.localOptimizations(PostProcessor.scala:130)
        at scala.tools.nsc.backend.jvm.PostProcessor.sendToDisk(PostProcessor.scala:64)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler.$anonfun$postProcessUnit$3(GeneratedClassHandler.scala:123)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler.$anonfun$postProcessUnit$2(GeneratedClassHandler.scala:123)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler$$Lambda$16088/261666137.apply$mcV$sp(Unknown Source)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at scala.tools.nsc.backend.jvm.PostProcessorFrontendAccess$PostProcessorFrontendAccessImpl.withThreadLocalReporter(PostProcessorFrontendAccess.scala:225)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler.$anonfun$postProcessUnit$1(GeneratedClassHandler.scala:122)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler$$Lambda$16084/1543680797.apply$mcV$sp(Unknown Source)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler$$Lambda$16085/620050906.apply(Unknown Source)
        at scala.util.Success.$anonfun$map$1(Try.scala:255)
        at scala.util.Success.map(Try.scala:213)
        at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
        at scala.concurrent.Future$$Lambda$16086/289061864.apply(Unknown Source)
        at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
        at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
        at scala.concurrent.impl.Promise$$Lambda$16087/1619313601.apply(Unknown Source)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$SyncWritingClassHandler$$anonfun$$lessinit$greater$1.execute(GeneratedClassHandler.scala:177)
        at scala.concurrent.impl.ExecutionContextImpl.execute(ExecutionContextImpl.scala:24)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
        at scala.concurrent.impl.Promise$KeptPromise$Kept.onComplete(Promise.scala:372)
        at scala.concurrent.impl.Promise$KeptPromise$Kept.onComplete$(Promise.scala:371)
        at scala.concurrent.impl.Promise$KeptPromise$Successful.onComplete(Promise.scala:379)
        at scala.concurrent.impl.Promise.transform(Promise.scala:33)
        at scala.concurrent.impl.Promise.transform$(Promise.scala:31)
        at scala.concurrent.impl.Promise$KeptPromise$Successful.transform(Promise.scala:379)
        at scala.concurrent.Future.map(Future.scala:292)
        at scala.concurrent.Future.map$(Future.scala:292)
        at scala.concurrent.impl.Promise$KeptPromise$Successful.map(Promise.scala:379)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler.postProcessUnit(GeneratedClassHandler.scala:118)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler.process(GeneratedClassHandler.scala:111)
        at scala.tools.nsc.backend.jvm.CodeGen.genUnit(CodeGen.scala:73)
        at scala.tools.nsc.backend.jvm.GenBCode$BCodePhase.apply(GenBCode.scala:74)
        at scala.tools.nsc.Global$GlobalPhase.applyPhase(Global.scala:453)
        at scala.tools.nsc.Global$GlobalPhase.run(Global.scala:399)
        at scala.tools.nsc.backend.jvm.GenBCode$BCodePhase.super$run(GenBCode.scala:80)
        at scala.tools.nsc.backend.jvm.GenBCode$BCodePhase.$anonfun$run$1(GenBCode.scala:80)
        at scala.tools.nsc.backend.jvm.GenBCode$BCodePhase.run(GenBCode.scala:78)
        at scala.tools.nsc.Global$Run.compileUnitsInternal(Global.scala:1503)
        at scala.tools.nsc.Global$Run.compileUnits(Global.scala:1487)
        at scala.tools.nsc.Global$Run.compileSources(Global.scala:1480)
        at scala.tools.nsc.Global$Run.compile(Global.scala:1606)
@ghostdogpr

This comment has been minimized.

Copy link

commented Aug 7, 2019

With -opt:l:none it's still spending time in scala.tools.asm.Frame.merge but with a different call stack:

at scala.tools.asm.Frame.merge(Frame.java:1175)
        at scala.tools.asm.MethodWriter.computeAllFrames(MethodWriter.java:1617)
        at scala.tools.asm.MethodWriter.visitMaxs(MethodWriter.java:1553)
        at scala.tools.asm.tree.MethodNode.accept(MethodNode.java:810)
        at scala.tools.asm.tree.MethodNode.accept(MethodNode.java:690)
        at scala.tools.asm.tree.ClassNode.accept(ClassNode.java:426)
        at scala.tools.nsc.backend.jvm.PostProcessor.serializeClass(PostProcessor.scala:140)
        at scala.tools.nsc.backend.jvm.PostProcessor.sendToDisk(PostProcessor.scala:72)
@lrytz

This comment has been minimized.

Copy link
Member

commented Aug 7, 2019

With -opt:l:none it's still spending time in scala.tools.asm.Frame.merge but with a different call stack.

That's interesting. It really could be due to the ASM upgrade.

I suspect that there is a specific bytecode shape that trips up the ASM analyzer.

You could add https://scala-ci.typesafe.com/artifactory/scala-integration as a resolver and test with Scala versions 2.12.9-bin-b63ccf2 and 2.12.9-bin-ce9a857. That's before and after scala/scala#7780.

You also could try to:

  • git clone git@github.com:scala/scala.git
  • git checkout v2.12.9
  • sbt intellij
  • open src/intellij/scala.ipr in IntelliJ
  • run your sbt build with sbt -J-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
  • Connect the IntelliJ debugger
  • Try to suspend at a breakpoint once the compiler is stuck in Frame.merge
  • Go up the stack to a place where the MethodNode is in scope
  • Use "evaluate expression" and call AsmUtils.traceMethod(methodNode). This would print the ASM representation of the bytecode to the console
@ghostdogpr

This comment has been minimized.

Copy link

commented Aug 7, 2019

Interesting, it is fast with both snapshots. We'll try to find the commit that introduces the problem.

@lrytz

This comment has been minimized.

Copy link
Member

commented Aug 7, 2019

Here's a git bisect script that you could adapt and use: https://gist.github.com/lrytz/f92e5db08c6bad14aaf0b890ec5d3b4d

You'd have to measure the time somehow since the build doesn't fail, just takes a long time.

@ghostdogpr

This comment has been minimized.

Copy link

commented Aug 8, 2019

It seems to be caused by scala/scala#8061
2.12.9-bin-d386f96 is fast, 2.12.9-bin-197ff13 is slow.

@SethTisue SethTisue added this to the 2.12.10 milestone Aug 8, 2019

@SethTisue

This comment has been minimized.

Copy link
Member

commented Aug 8, 2019

attn @retronym

@SethTisue SethTisue added the blocker label Aug 8, 2019

yoohaemin added a commit to yoohaemin/scala-bug-11667-reproduction that referenced this issue Aug 8, 2019

yoohaemin added a commit to yoohaemin/scala-bug-11667-reproduction that referenced this issue Aug 8, 2019

@SethTisue SethTisue referenced this issue Aug 8, 2019
0 of 1 task complete
@yoohaemin

This comment has been minimized.

Copy link
Author

commented Aug 8, 2019

I brought a minimal reproduction after we investigated this issue following advice from @lrytz. Thanks everyone for the help.

https://github.com/yoohaemin/scala-bug-11667-reproduction

@SethTisue

This comment has been minimized.

Copy link
Member

commented Aug 16, 2019

believed fixed by scala/scala#8327

@yoohaemin @ghostdogpr would appreciate if you could test 2.12.10-bin-de413b9 and verify

@SethTisue SethTisue closed this Aug 16, 2019

@SethTisue

This comment has been minimized.

Copy link
Member

commented Aug 16, 2019

I tried it with https://github.com/yoohaemin/scala-bug-11667-reproduction just now and this looks good:

[info]   jvm                         : 1 spans, ()2205.651ms (8.2%)
@ghostdogpr

This comment has been minimized.

Copy link

commented Aug 16, 2019

@SethTisue we will be able to test on our project on Monday. But based on the repro project it looks good!

@plokhotnyuk

This comment has been minimized.

Copy link

commented Aug 16, 2019

@ghostdogpr it looks that in runtime your http-handler burns a lot of CPU cycles in the routing match expression too... will you consider to use some kind of prefix tree which supports wildcard for path elements instead?

@retronym

This comment has been minimized.

Copy link
Member

commented Aug 16, 2019

Indeed... the generated code is likely too large to be JIT compiled by the JVM, so you have the overhead of interpretation on top of any issues with cascading if/else to get down the matching handler.

@yoohaemin

This comment has been minimized.

Copy link
Author

commented Aug 19, 2019

Splitting the routes was indeed on our todo list (for both compile-time and runtime performance), but was a low priority since performance was not exactly our biggest concern at the moment. In any case, that big pattern match did help spot a compiler regression ;)

@yoohaemin

This comment has been minimized.

Copy link
Author

commented Aug 19, 2019

@SethTisue 2.12.10-bin-de413b9 works well 👍

@SethTisue

This comment has been minimized.

Copy link
Member

commented Aug 19, 2019

cool, thanks for confirming

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.