Prune polymorphic implicits more aggressively #6580

milessabin · 2018-04-30T18:36:17Z

In rankImplicits, before we attempt to fully typecheck the pending candidate implicit, we first attempt to partially instantiate type variables in both the candidate and the target type and check for compatibility. If the compatibility check fails we can immediately prune the the candidate without having to fully typecheck it.

In the kinds of implicit searches typical of the inductive style found in shapeless and related libraries this can result in a drastic reduction in the search space and a corresponding reduction in compile times.

This commit is much simpler, more generally applicable, and less invasive than my earlier work on inductive implicits in #6481 which was doing similar pruning almost by accident. It turns out that almost all of the speedups in that earlier PR were due to the pruning rather than to any special casing of inductive patterns.

The compilation benchmark (a shapeless-style "select element by type from a large HList") from that PR is carried over here (in test/induction) and shows the same performance improvements,

1) baseline - scalac 2.13.x
2)            scalac 2.13.x with matchesPtInst

               (1)   (2)
  HList Size
   50           4     3
  100           7     3
  150          15     4
  200          28     4
  250          48     5
  300          81     6
  350         126     8
  400         189    11
  450         322    13
  500         405    16

           Compile time in seconds

As an added bonus users of shapeless and shapeless-based libraries which use shapeless's Lazy type will see benefits immediately without needing to wait for and port to byname implicit arguments.

This is a backport to 2.12.x of, scala#6580 In rankImplicits, before we attempt to fully typecheck the pending candidate implicit, we first attempt to partially instantiate type variables in both the candidate and the target type and check for compatibility. If the compatibility check fails we can immediately prune the the candidate without having to fully typecheck it. In the kinds of implicit searches typical of the inductive style found in shapeless and related libraries this can result in a drastic reduction in the search space and a corresponding reduction in compile times. This commit is much simpler, more generally applicable, and less invasive than my earlier work on inductive implicits in, scala#6481 which was doing similar pruning almost by accident. It turns out that almost all of the speedups in that earlier PR were due to the pruning rather than to any special casing of inductive patterns. The compilation benchmark (a shapeless-style "select element by type from a large HList") from that PR is carried over here (in test/induction) and shows the same performance improvements, 1) baseline - scalac 2.12.5 2) scalac 2.12.5 with matchesPtInst (1) (2) HList Size 50 4 3 100 7 4 150 14 4 200 25 5 250 46 5 300 74 6 350 118 8 400 183 10 450 298 12 500 421 15 Compile time in seconds As an added bonus users of shapeless and shapeless based libraries which use shapeless's Lazy type will see benefits immediately without needing to wait for and port to byname implicit arguments.

dwijnand · 2018-04-30T20:14:15Z

test/induction/inductive-implicits-bench.scala

+    Int ::
+    Int ::
+    Int ::
+    Int ::


you could write these 10 per line..
easier to read, easier to count, lose the every-10-lines separator.. win win win

(edit: if you rebase or otherwise need to re-write this commit)

Fair enough, but 55j works pretty well in vim ;-)

milessabin · 2018-04-30T21:22:59Z

Ported to Dotty here: scala/scala3#4426.

adriaanm

Very exciting! I'd like to collect some more data, and have a few suggestions for code tweaks.

adriaanm · 2018-05-01T10:37:20Z

src/compiler/scala/tools/nsc/typechecker/Implicits.scala

+      // can't be solved for. Rather than attempt to patch things up later we just skip those cases
+      // altogether.
+      def isInstantiable =
+        !isView || (wildPt.isGround && !wildPt.exists { case _: BoundedWildcardType => true ; case _ => false })


I'd pull out the isInstantiable check to the matchesPtInst call below, and cache whether wildPt contains TypeVars (!isGround) or BoundedWildcardTypes. I believe that test could be written more directly as !wildPt.exists { case _: BoundedWildcardType | _: TypeVar => true ; case _ => false }.

adriaanm · 2018-05-01T10:39:44Z

src/compiler/scala/tools/nsc/typechecker/Implicits.scala

+          try {
+            val tvars = allUndetparams map freshVar
+            val tpInstantiated = tp.instantiateTypeParams(allUndetparams, tvars)
+            if(!matchesPt(tpInstantiated, wildPt, allUndetparams)) false


I'd like to see some statistics of which branch prunes the most in a few projects. Maybe we don't even need to solve types to still get some nice pruning? Could we first (1) check whether tp.instantiateTypeParams(allUndetparams, allUndetparams map (_ => WildcardType)) matches wildPt, then (2) check with type vars, and (3) finally solve?

Would like to see hit counts for branches (1), (2), (3).

I tried various subsets of what's being done here to see if I could reduce the amount of work being done. I have one more thing to try, but I'm confident that the solving is required. I did some logging of the branches and anecdotally I'd say that pruning occurred mainly post-solving and then about equally between the checkbounds call and the final instantiated matchesPt. Agreed that it'd be useful to quantify this, but obviously it's very dependent on the code being compiled ... it'd be good if we could agree on a suitable corpus.

adriaanm · 2018-05-01T10:40:47Z

test/induction/inductive-implicits-bench.scala

@@ -0,0 +1,619 @@
+// Compiled with ./build/pack/bin/scalac -J-Xss4M -J-Xmx2G test/induction/inductive-implicits-bench.scala


let's move this to our compiler benchmark project. I don't want tests that are not run in CI -- they will just bitrot.

Sure. Whereabouts?

https://github.com/scala/compiler-benchmark

Got the project, unsure of the form in which to include it. Currently asking here ... pointers welcome.

milessabin · 2018-05-01T16:59:39Z

I added some statistics counters to the 2.12.x branch and tested a few projects,

The top-level count is the number of calls of matchesPtInst and the subcounts are of mismatches at the first matchesPt, the checkbounds and the final matchesPt respectively.

test/induction/inductive.sh: // One big induction
implicits instantiated for pruning: 1002
  instantiated mismatches     : 500 (49.9%)

shapeless/coreJVM: compile // Library code, mostly explicit
implicits instantiated for pruning: 9801
  immediate mismatches        : 15 (0.2%)
  checkbounds failures        : 6732 (68.7%)
  instantiated mismatches     : 20 (0.2%)

shapeless/coreJVM: test:compile // Lots of small inductions
implicits instantiated for pruning: 80510
  immediate mismatches        : 6666 (8.3%)
  checkbounds failures        : 5436 (6.8%)
  instantiated mismatches     : 4840 (6.0%)

cats/coreJVM: compile // Library code, mostly explicit
implicits instantiated for pruning: 2208
  immediate mismatches        : 8 (0.4%)
  checkbounds failures        : 43 (1.9%)
  instantiated mismatches     : 4 (0.2%)

cats/coreJVM: test:compile // Surprisingly explicit
implicits instantiated for pruning: 2671
  immediate mismatches        : 28 (1.0%)
  instantiated mismatches     : 1 (0.0%)

@adriaanm I can add the counters on this branch if you think that would be valuable.

milessabin · 2018-05-02T14:09:13Z

Benchmark moved to the compiler-benchmark project here. Also incorporated @dwijnand's layout suggestion ;-)

When narrowing a constraint, we check if its bounds are now equal to remove it from the constraint set. This allows the corresponding type variable to be instantiated in TyperState#gc instead of accumulating uninstantiated type variables that we traverse again and again in Inferencing#interpolateTypeVars. This reduces the running time of the added testcase (copied from scala/scala#6580) from ~6 seconds to ~4 seconds. If we fully uncomment the testcase (and increase the JVM stack size), the running time goes from > 300 seconds to ~24 seconds. This is not as good as what scala/scala#6580 but is already much more reasonable that what we did before.

When narrowing a constraint, we check if its bounds are now equal to remove it from the constraint set. This allows the corresponding type variable to be instantiated in TyperState#gc instead of accumulating uninstantiated type variables that we traverse again and again in Inferencing#interpolateTypeVars. This reduces the running time of the added testcase (copied from scala/scala#6580) from ~6 seconds to ~4 seconds. If we fully uncomment the testcase (and increase the JVM stack size), the running time goes from > 300 seconds to ~24 seconds. This is not as good as what scala/scala#6580 achieves but is already much more reasonable that what we did before.

This is a backport to 2.12.x of, scala#6580 In rankImplicits, before we attempt to fully typecheck the pending candidate implicit, we first attempt to partially instantiate type variables in both the candidate and the target type and check for compatibility. If the compatibility check fails we can immediately prune the the candidate without having to fully typecheck it. In the kinds of implicit searches typical of the inductive style found in shapeless and related libraries this can result in a drastic reduction in the search space and a corresponding reduction in compile times. This commit is much simpler, more generally applicable, and less invasive than my earlier work on inductive implicits in, scala#6481 which was doing similar pruning almost by accident. It turns out that almost all of the speedups in that earlier PR were due to the pruning rather than to any special casing of inductive patterns. The compilation benchmark (a shapeless-style "select element by type from a large HList") from that PR is carried over here (in test/induction) and shows the same performance improvements, 1) baseline - scalac 2.12.5 2) scalac 2.12.5 with matchesPtInst (1) (2) HList Size 50 4 3 100 7 4 150 14 4 200 25 5 250 46 5 300 74 6 350 118 8 400 183 10 450 298 12 500 421 15 Compile time in seconds As an added bonus users of shapeless and shapeless based libraries which use shapeless's Lazy type will see benefits immediately without needing to wait for and port to byname implicit arguments.

Addressed issues raised in review.

milessabin · 2018-05-03T18:40:50Z

Rebased, addressed issues raised in review, moved benchmark to compiler-benchmark and fixed an issue with Specs2 revealed in a 2.12.x community build run.

smarter · 2018-05-03T20:40:32Z

This PR puzzled me: in the benchmark file, the only optimization this can possibly be doing is skipping the typechecking of 499 inHead calls, surely that can't be that expensive?

I'm not a JVM performance expert but I know how to use jstack, that was enough to realize that we spent most of our time not typechecking, or doing implicit searches, or anything useful like that. It turns out the real culprit is computing error messages!. These messages will never be displayed since some other branch of the implicit search will succeed, but they're eagerly computed (in Dotty, error messages are always lazy to avoid this sort of things).

As a proof of concept, here's a one-liner fix that gives the same speed-up as this PR: smarter@4b9b95c

milessabin · 2018-05-03T20:57:41Z

@smarter blimey!

Your one-liner doesn't account for all of the speed up of this PR. For me it takes the time down from > 400s to about 25s, whereas this PR takes it down to 14s. But as you say, it does appear that the error message generation accounts for a remarkably large proportion of the time!

smarter · 2018-05-03T21:17:17Z

This was fun. For the record I got a very similar speed-up in Dotty by fixing a completely different issue: scala/scala3#4447 :)

smarter · 2018-05-03T21:40:19Z

Recommendation for an actual fix without a huge refactoring of how error messages are handled: change Implicits#fail and Implicits#failure to take a => String instead of a String.

mkeskells · 2018-05-03T21:59:33Z

Ah - 94% of the CPU taken generating 'helpful' details.
@smarter 🥇 That must be a record or some form in the ratio of smallest fix to largest benefit.
@milessabin 🥈 at another 40% of what is left that a pretty healthy second place

I am very happy if I can get a solid 1% or 2% in a PR 😢 . I could not find a medal that low to compare

smarter · 2018-05-03T22:03:18Z

I am very happy if I can get a solid 1% or 2% in a PR cry

I think the secret is finding an unusual test case that stresses the compiler in some way :).

milessabin · 2018-05-03T22:06:28Z

I can't reproduce the Jenkins failure locally. Finger's X'd it's spurious ...

adriaanm · 2018-08-03T14:39:06Z

Ping? I don't think this is crucial for M5. It can happen post 2.13.0 as far as I'm concerned.

milessabin · 2018-08-03T19:04:40Z

I'm still trying to track down an issue which showed up in the 2.12.x community build. I'll update here as soon as I have a fix there.

SethTisue · 2018-08-07T01:30:27Z

(this has now missed M5, but maybe it could still make 2.13.0-RC1. regardless, officially milestoning as 2.13.x based on Adriaan's remark)

milessabin · 2018-08-07T11:33:29Z

Can you give me until COB tomorrow? I think I've fixed the issue revealed in the 2.12.x community build, but just need a little more time to confirm that.

adriaanm · 2018-08-07T12:02:34Z

Sure, ping me when you feel this is good to go and we'll try to get it in.

milessabin · 2018-08-07T18:55:15Z

@adriaanm this is ready to go now.

It roughly halves compilation time for the large hlist select benchmark which is now in scala/compiler-benchmark here. On my machine the compile time is ~32s compared with ~61s on the prior commit.

milessabin · 2018-08-07T22:06:31Z

@adriaan FYI, if you're wondering why this is still quite a bit slower than I was reporting earlier, so was I ...

I've just done a rather laborious git bisect and discovered that it's down to #6530. Currently on plain 2.13.x the benchmark compiles in 22s with #6530 reverted, and this PR compiles the benchmark in 14s with #6530 reverted, which is back to the numbers I originally reported.

That slowdown is completely orthogonal to this PR ... I'll see if there's anything I can do about it.

milessabin · 2018-08-07T22:54:24Z

The slowdown caused by #6530 is fixed in #7012, and combined with this PR the "select from large hlist" benchmark compiles in ~16s again.

adriaanm · 2018-08-08T11:14:19Z

src/compiler/scala/tools/nsc/typechecker/Implicits.scala

+      // Nothings which can't be solved for. Rather than attempt to patch things up later we
+      // just skip those cases altogether.
+      lazy val wildPtIsInstantiable =
+        !wildPt.exists { case _: BoundedWildcardType | _: TypeVar => true ; case tp if typeIsNothing(tp) => true; case _ => false }


I'm not convinced we should look for Nothing here. Also, we could remove the double negation (it's only used as !wildPtIsInstantiable) and call this wildPtNotFullyDefined. That then leads to another simplification: the RHS could be just !isFullyDefined(wildPt).

Point taken on the double negation ... I'll fix.

We really do need the check for Nothing though: see test/files/pos/prune-poly-infer-nothing.scala ... without the check for Nothing the correctly inferred Nothing for O would be retracted resulting in a false negative. That test was minimized from a failure in http4s in the 2.12 community build.

Aside from that, isFullyDefined is too restrictive ... it rules out cases which should be allowed (eg. any case where the pt contains a vanilla wildcard).

In rankImplicits, before we attempt to fully typecheck the pending candidate implicit, we first attempt to partially instantiate type variables in both the candidate and the target type and check for compatibility. If the compatibility check fails we can immediately prune the the candidate without having to fully typecheck it. In the kinds of implicit searches typical of the inductive style found in shapeless and related libraries this can result in a drastic reduction in the search space and a corresponding reduction in compile times. As an added bonus users of shapeless and shapeless based libraries which use shapeless's Lazy type will see benefits immediately without needing to wait for and port to byname implicit arguments.

milessabin · 2018-08-08T12:23:41Z

Also rebased.

SethTisue · 2018-08-09T20:14:35Z

does this need a community build before merging?

milessabin · 2018-08-09T22:47:46Z

@SethTisue It wouldn't hurt, but I'm confident that I worked through the issues on the 2.12.x community build.

scala-jenkins added this to the 2.13.0-M5 milestone Apr 30, 2018

milessabin added the performance the need for speed. usually compiler performance, sometimes runtime performance. label Apr 30, 2018

milessabin modified the milestones: 2.13.0-M5, 2.13.0-M4 Apr 30, 2018

milessabin requested review from adriaanm and retronym April 30, 2018 18:36

milessabin mentioned this pull request Apr 30, 2018

Topic/inductive implicits 2.13.x #6481

Closed

milessabin mentioned this pull request Apr 30, 2018

Prune polymorphic implicits more aggressively (2.12.x backport) #6582

Closed

dwijnand reviewed Apr 30, 2018

View reviewed changes

adriaanm previously requested changes May 1, 2018

View reviewed changes

milessabin mentioned this pull request May 2, 2018

Topic/implicit poly prune scala/compiler-benchmark#64

Merged

smarter mentioned this pull request May 2, 2018

More efficient garbage collection of type variables scala/scala3#4447

Merged

milessabin force-pushed the topic/implicit-poly-prune-2.13.x branch from 765b3ed to 75e7803 Compare May 3, 2018 18:36

smarter added a commit to smarter/scala that referenced this pull request May 3, 2018

Alternative fix for scala#6580

4b9b95c

SethTisue modified the milestones: 2.13.0-M5, 2.13.x Aug 7, 2018

milessabin force-pushed the topic/implicit-poly-prune-2.13.x branch from 1f44f82 to ff6fe9a Compare August 7, 2018 18:45

This was referenced Aug 7, 2018

Speed up implicit resolution by avoiding allocations when traversing TypeRefs in core #7012

Merged

Avoid compiler hang in implicit resolution #6530

Merged

milessabin modified the milestones: 2.13.1, 2.13.0-M5 Aug 7, 2018

adriaanm mentioned this pull request Aug 8, 2018

Slow, exponential compile times for HLists > 200 elements scala/bug#8478

Open

adriaanm reviewed Aug 8, 2018

View reviewed changes

milessabin added 2 commits August 8, 2018 13:00

We don't need no double negations.

ce3b37c

milessabin force-pushed the topic/implicit-poly-prune-2.13.x branch from ff6fe9a to ce3b37c Compare August 8, 2018 12:23

adriaanm approved these changes Aug 10, 2018

View reviewed changes

adriaanm merged commit b8e609b into scala:2.13.x Aug 10, 2018

SethTisue added the release-notes worth highlighting in next release notes label Aug 10, 2018

milessabin mentioned this pull request Aug 14, 2018

[REVERTED] Prune polymorphic implicits more aggressively #7067

Merged

This was referenced Nov 4, 2018

Byname implicits parameters with recursive dictionaries scala/scala-dev#419

Closed

Faster compilation of inductive implicits scala/scala-dev#421

Closed

milessabin mentioned this pull request Mar 28, 2019

Change implicit resolution rule wrt implicit parameters scala/scala3#6071

Merged

		@@ -0,0 +1,619 @@
		// Compiled with ./build/pack/bin/scalac -J-Xss4M -J-Xmx2G test/induction/inductive-implicits-bench.scala

Prune polymorphic implicits more aggressively #6580

Prune polymorphic implicits more aggressively #6580

Conversation

milessabin commented Apr 30, 2018 • edited Loading

dwijnand Apr 30, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

milessabin commented Apr 30, 2018

adriaanm left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

milessabin commented May 1, 2018

milessabin commented May 2, 2018

milessabin commented May 3, 2018

smarter commented May 3, 2018

milessabin commented May 3, 2018

smarter commented May 3, 2018

smarter commented May 3, 2018

mkeskells commented May 3, 2018

smarter commented May 3, 2018

milessabin commented May 3, 2018

adriaanm commented Aug 3, 2018

milessabin commented Aug 3, 2018

SethTisue commented Aug 7, 2018

milessabin commented Aug 7, 2018

adriaanm commented Aug 7, 2018

milessabin commented Aug 7, 2018

milessabin commented Aug 7, 2018

milessabin commented Aug 7, 2018

Choose a reason for hiding this comment

milessabin Aug 8, 2018 • edited Loading

Choose a reason for hiding this comment

milessabin commented Aug 8, 2018

SethTisue commented Aug 9, 2018

milessabin commented Aug 9, 2018

milessabin commented Apr 30, 2018 •

edited

Loading

dwijnand Apr 30, 2018 •

edited

Loading

milessabin Aug 8, 2018 •

edited

Loading