Assorted compiler optimizations #6209

Merged
merged 6 commits into from Mar 21, 2018

Conversation

retronym
Member

@retronym retronym commented Dec 4, 2017

No description provided.

@scala-jenkins scala-jenkins added this to the 2.13.0-M3 milestone Dec 4, 2017
@retronym
Member Author

retronym commented Dec 4, 2017

Selected from #6115

@retronym retronym closed this Dec 4, 2017
@retronym retronym changed the title Assorted compiler optimizations Assorted compiler optimizations [ci:last-only] Dec 4, 2017
@retronym retronym reopened this Dec 4, 2017
@retronym retronym changed the title Assorted compiler optimizations [ci:last-only] Assorted compiler optimizations [ci: last-only] Dec 4, 2017
@retronym retronym force-pushed the faster/december branch 4 times, most recently from 05fc487 to 0094ced Compare December 4, 2017 23:54
@retronym
Member Author

retronym commented Dec 5, 2017

Here are the benchmark charts for the recent history before this PR, and for the commits in this PR.

The aggregate improvement appears to be ~6%.

Only the final commit is benchmarked so far. The other points will fill in over the coming hours.

@retronym retronym added the WIP label Dec 5, 2017
@retronym
Member Author

retronym commented Dec 6, 2017

0094cede424bc392c10d38eb88206f10a84111bf,  33550.76,    797.87,    321.75,    176.17,optimize enclosingRootClass
527611907fa0a8f67c8831b3efcc05f624b24916,  33680.53,    800.64,    321.64,    174.08,Optimize generic sig parser
80cb8e8df19eb176a099534cbfa771b9b308b0bf,  34035.56,    799.24,    325.25,    176.28,Optimize nested class collection
351fcb02f37b71f3adef39f717400d17c5bb7baa,  33931.40,    799.52,    323.17,    177.42,Use AnyRefMap in hot parts of the compiler.
a56056fa02f78a90c4e5ca9caa837dc75110087a,  34766.59,    813.61,    328.78,    176.12,Avoid nonEmpty in hot paths
79e9a6840133b946dfd96a76766b5cbdcadf8f80,  34577.24,    817.09,    326.97,    177.02,Optimize uncurry info transform
9282310887da4d11b0174c1b4b96c002e647e63c,      0.00,      0.00,      0.00,      0.00,Assert that uncurry transform is redundantly expanding aliases twice
b9452ef78b12390aed0f707174fc49714fb0cee6,  34676.71,    822.57,    329.30,    177.18,Optimize IndexedSeqOptimized.toList
05d409f07f9a55875a6d036e42d22ef34708b27f,  35669.56,    838.38,    331.44,    180.34,Merge pull request #6060 from retronym/faster/virtual-transform-minimal
abfeaa43b7bfc090feb5a17303a48e6f18ba9332,      0.00,    843.46,    343.54,    184.59,Merge pull request #6123 from joroKr21/hk-existential-bounds
f0b952d2e4772b120de7caad259a8098203120d6,  35813.36,    852.75,    341.03,    183.67,Merge pull request #6016 from som-snytt/issue/9750

@retronym
Member Author

retronym commented Dec 6, 2017

The biggest wins appear to be

  • Optimize IndexedSeqOptimized.toList
  • Use AnyRefMap in hot parts of the compiler.
  • Optimize generic sig parser
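For readers unfamiliar with it, `scala.collection.mutable.AnyRefMap` is a mutable hash map specialised for keys that are `AnyRef`. The following is only a minimal usage sketch; the object and method names are hypothetical, and it does not show which compiler maps this PR actually changed.

```scala
import scala.collection.mutable

// Hypothetical illustration, not code from this PR.
object AnyRefMapSketch {
  // Count occurrences of each name using an AnyRefMap keyed by String.
  def countNames(names: Seq[String]): mutable.AnyRefMap[String, Int] = {
    val counts = mutable.AnyRefMap.empty[String, Int]
    names.foreach { n =>
      // getOrElse and update behave as on any other mutable.Map
      counts(n) = counts.getOrElse(n, 0) + 1
    }
    counts
  }
}
```

Because it implements the `mutable.Map` interface, swapping it in for a `mutable.HashMap` with reference-typed keys is usually a local change.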

The toList change could be deemed slightly controversial as it calls apply in reverse order. This would change the order of side effects in views over IndexedSeqOptimized.
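To make the ordering concern concrete, here is a minimal sketch of a back-to-front `toList`. `RecordingSeq` is a hypothetical stand-in for an indexed collection whose `apply` has an observable side effect; it is not part of the standard library or of this PR.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical illustration, not the code from this PR.
object ToListOrderSketch {
  // Indexed container that records the order in which elements are read.
  final class RecordingSeq[A](elems: Vector[A]) {
    val accessLog: ArrayBuffer[Int] = ArrayBuffer.empty[Int]
    def length: Int = elems.length
    def apply(i: Int): A = { accessLog += i; elems(i) }
  }

  // Build the list by prepending, which requires reading the last element first.
  def toListReversed[A](xs: RecordingSeq[A]): List[A] = {
    var i = xs.length - 1
    var result: List[A] = Nil
    while (i >= 0) {
      result = xs(i) :: result
      i -= 1
    }
    result
  }
}
```

The resulting list is in the usual order, but `accessLog` shows the indices being read from last to first, which is the change in side-effect order described above.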

The uncurry change needs more investigation on the correctness front: the intermediate commit I added to assert what I'd assumed in the refactoring failed. It does not appear to be a major performance win, so I'll defer that change until I have time to investigate.

"optimize enclosingRootClass" needed some last-minute changes to be able to distinguish JavaMirror-based symbol tables (reflect.runtime.universe and mkToolBox) from those built on a traditional classpath. There is a chance that the isJavaMirrorUniverse call I needed to add introduces a cost that outweighs the benefit of not walking the owner chain. I'll play with alternatives and microbenchmark before committing to a change.

@retronym
Member Author

retronym commented Dec 6, 2017

I've pushed the trimmed-down version discussed above. For posterity, the original sequence of commits is in branch faster/december-take1.

@retronym retronym changed the title Assorted compiler optimizations [ci: last-only] Assorted compiler optimizations Dec 6, 2017
@retronym
Member Author

retronym commented Dec 6, 2017

/sync

@adriaanm adriaanm modified the milestones: 2.13.0-M3, 2.13.0-M4 Dec 13, 2017
@retronym retronym requested a review from lrytz December 20, 2017 07:01
@retronym retronym removed the WIP label Dec 20, 2017
Member

@jvican jvican left a comment

The changes to the statistics infrastructure LGTM. I would also welcome a more aggressive move of some other compiler-specific statistics (those not potentially interesting to users) to the hot mode.

Member

@lrytz lrytz left a comment

I find it a bit surprising that nonEmpty -> !isEmpty makes a measurable difference. Do you have an explanation? Could we add an override somewhere in the collections?
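As a rough sketch of what such an override could look like, consider a miniature hierarchy in which `nonEmpty` defaults to `!isEmpty` and concrete classes answer directly. All names here are hypothetical and this is not the real `scala.collection` hierarchy; whether the extra call layer explains the measured difference is not established in this thread.

```scala
// Hypothetical miniature hierarchy, for illustration only.
object NonEmptySketch {
  trait MiniSeq[A] {
    def isEmpty: Boolean
    // Default: defined in terms of isEmpty, so each use makes a second call.
    def nonEmpty: Boolean = !isEmpty
  }

  sealed abstract class MiniList[A] extends MiniSeq[A]

  final case class Cons[A](head: A, tail: MiniList[A]) extends MiniList[A] {
    def isEmpty: Boolean = false
    // Override answers directly instead of delegating to isEmpty.
    override def nonEmpty: Boolean = true
  }

  final class Empty[A] extends MiniList[A] {
    def isEmpty: Boolean = true
    override def nonEmpty: Boolean = false
  }
}
```

The overrides keep the two methods consistent by construction, which is the invariant any real override would need to preserve.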

i -= 1
}
result
}
Member

cc @szeiger - does that need to be ported to collection-strawman? I couldn't find WrappedArray there.

To avoid the reversed iteration order, we could use a version that mutates ::.tl

  override def toList: List[A] = {
    if (length == 0) Nil
    else {
      val res = immutable.::(apply(0), Nil)
      var cur = res
      var i = 1
      while (i < length) {
        val next = immutable.::(apply(i), Nil)
        cur.tl = next
        cur = next
        i += 1
      }
      res
    }
  }

Contributor

probably local val length as well

@retronym retronym added the performance the need for speed. usually compiler performance, sometimes runtime performance. label Feb 1, 2018
@jvican
Member

jvican commented Feb 11, 2018

I've applied b175218 in my scalac fork to compile Scalatest's test suite with statistics enabled, and I'm seeing an improvement of about 20s (from 111s to 90s). The Scalatest test suite has around 300,000 LOC.

Contributor

@mkeskells mkeskells left a comment

What is the effect with statistics disabled? It seems that this includes removal of a lot of statistics capture, but I am only looking at a quick summary on my phone.

My understanding is that statistics have low to nil overhead when disabled, and that is the case we are optimising for. Statistics are useful to identify the bottlenecks, but the only performance figures that matter are those in the production use case, IMO.

It would be good to see a breakdown of the different changes for this PR rather than just a summary. I have some tooling to assist with that; if you need it, contact me or @rorygraves for details.

@retronym
Member Author

The only contentious part of this is 84890f3. @szeiger perhaps you can think about whether reversing the order of calls to IndexedSeqOptimized.apply in toList is okay, or if we should refactor to use tail mutation (perhaps via a ListBuffer). Merging in the meantime.
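For comparison, a forward-order variant built on `ListBuffer` might look like the sketch below. The object name and the `length`/`elemAt` parameters are hypothetical stand-ins for the collection's own `length` and `apply`; this is not the code that was merged.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical illustration, not code from this PR.
object ToListForwardSketch {
  // Reads elements from index 0 upwards, so side effects of elemAt keep their order.
  def toListForward[A](length: Int, elemAt: Int => A): List[A] = {
    val len = length // read the length once, outside the loop
    val buf = new ListBuffer[A]
    var i = 0
    while (i < len) {
      buf += elemAt(i)
      i += 1
    }
    // ListBuffer appends by mutating the tail internally; toList hands back the built list.
    buf.toList
  }
}
```

This keeps the tail mutation inside `ListBuffer` rather than touching `::.tl` directly, at the cost of allocating the buffer object.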

Labels
performance the need for speed. usually compiler performance, sometimes runtime performance.
6 participants