-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prune polymorphic implicits more aggressively #6580
Prune polymorphic implicits more aggressively #6580
Conversation
This is a backport to 2.12.x of, scala#6580 In rankImplicits, before we attempt to fully typecheck the pending candidate implicit, we first attempt to partially instantiate type variables in both the candidate and the target type and check for compatibility. If the compatibility check fails we can immediately prune the the candidate without having to fully typecheck it. In the kinds of implicit searches typical of the inductive style found in shapeless and related libraries this can result in a drastic reduction in the search space and a corresponding reduction in compile times. This commit is much simpler, more generally applicable, and less invasive than my earlier work on inductive implicits in, scala#6481 which was doing similar pruning almost by accident. It turns out that almost all of the speedups in that earlier PR were due to the pruning rather than to any special casing of inductive patterns. The compilation benchmark (a shapeless-style "select element by type from a large HList") from that PR is carried over here (in test/induction) and shows the same performance improvements, 1) baseline - scalac 2.12.5 2) scalac 2.12.5 with matchesPtInst (1) (2) HList Size 50 4 3 100 7 4 150 14 4 200 25 5 250 46 5 300 74 6 350 118 8 400 183 10 450 298 12 500 421 15 Compile time in seconds As an added bonus users of shapeless and shapeless based libraries which use shapeless's Lazy type will see benefits immediately without needing to wait for and port to byname implicit arguments.
Int :: | ||
Int :: | ||
Int :: | ||
Int :: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could write these 10 per line..
easier to read, easier to count, lose the every-10-lines separator.. win win win
(edit: if you rebase or otherwise need to re-write this commit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, but 55j
works pretty well in vim ;-)
Ported to Dotty here: scala/scala3#4426. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very exciting! I'd like to collect some more data, and have a few suggestions for code tweaks.
// can't be solved for. Rather than attempt to patch things up later we just skip those cases | ||
// altogether. | ||
def isInstantiable = | ||
!isView || (wildPt.isGround && !wildPt.exists { case _: BoundedWildcardType => true ; case _ => false }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd pull out the isInstantiable
check to the matchesPtInst
call below, and cache whether wildPt
contains TypeVar
s (!isGround
) or BoundedWildcardType
s. I believe that test could be written more directly as !wildPt.exists { case _: BoundedWildcardType | _: TypeVar => true ; case _ => false }
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
try { | ||
val tvars = allUndetparams map freshVar | ||
val tpInstantiated = tp.instantiateTypeParams(allUndetparams, tvars) | ||
if(!matchesPt(tpInstantiated, wildPt, allUndetparams)) false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see some statistics of which branch prunes the most in a few projects. Maybe we don't even need to solve types to still get some nice pruning? Could we first (1) check whether tp.instantiateTypeParams(allUndetparams, allUndetparams map (_ => WildcardType))
matches wildPt
, then (2) check with type vars, and (3) finally solve?
Would like to see hit counts for branches (1), (2), (3).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried various subsets of what's being done here to see if I could reduce the amount of work being done. I have one more thing to try, but I'm confident that the solving is required. I did some logging of the branches and anecdotally I'd say that pruning occurred mainly post-solving and then about equally between the checkbounds
call and the final instantiated matchesPt
. Agreed that it'd be useful to quantify this, but obviously it's very dependent on the code being compiled ... it'd be good if we could agree on a suitable corpus.
@@ -0,0 +1,619 @@ | |||
// Compiled with ./build/pack/bin/scalac -J-Xss4M -J-Xmx2G test/induction/inductive-implicits-bench.scala |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's move this to our compiler benchmark project. I don't want tests that are not run in CI -- they will just bitrot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Whereabouts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got the project, unsure of the form in which to include it. Currently asking here ... pointers welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done ...
I added some statistics counters to the 2.12.x branch and tested a few projects, The top-level count is the number of calls of
@adriaanm I can add the counters on this branch if you think that would be valuable. |
When narrowing a constraint, we check if its bounds are now equal to remove it from the constraint set. This allows the corresponding type variable to be instantiated in TyperState#gc instead of accumulating uninstantiated type variables that we traverse again and again in Inferencing#interpolateTypeVars. This reduces the running time of the added testcase (copied from scala/scala#6580) from ~6 seconds to ~4 seconds. If we fully uncomment the testcase (and increase the JVM stack size), the running time goes from > 300 seconds to ~24 seconds. This is not as good as what scala/scala#6580 but is already much more reasonable that what we did before.
When narrowing a constraint, we check if its bounds are now equal to remove it from the constraint set. This allows the corresponding type variable to be instantiated in TyperState#gc instead of accumulating uninstantiated type variables that we traverse again and again in Inferencing#interpolateTypeVars. This reduces the running time of the added testcase (copied from scala/scala#6580) from ~6 seconds to ~4 seconds. If we fully uncomment the testcase (and increase the JVM stack size), the running time goes from > 300 seconds to ~24 seconds. This is not as good as what scala/scala#6580 achieves but is already much more reasonable that what we did before.
This is a backport to 2.12.x of, scala#6580 In rankImplicits, before we attempt to fully typecheck the pending candidate implicit, we first attempt to partially instantiate type variables in both the candidate and the target type and check for compatibility. If the compatibility check fails we can immediately prune the the candidate without having to fully typecheck it. In the kinds of implicit searches typical of the inductive style found in shapeless and related libraries this can result in a drastic reduction in the search space and a corresponding reduction in compile times. This commit is much simpler, more generally applicable, and less invasive than my earlier work on inductive implicits in, scala#6481 which was doing similar pruning almost by accident. It turns out that almost all of the speedups in that earlier PR were due to the pruning rather than to any special casing of inductive patterns. The compilation benchmark (a shapeless-style "select element by type from a large HList") from that PR is carried over here (in test/induction) and shows the same performance improvements, 1) baseline - scalac 2.12.5 2) scalac 2.12.5 with matchesPtInst (1) (2) HList Size 50 4 3 100 7 4 150 14 4 200 25 5 250 46 5 300 74 6 350 118 8 400 183 10 450 298 12 500 421 15 Compile time in seconds As an added bonus users of shapeless and shapeless based libraries which use shapeless's Lazy type will see benefits immediately without needing to wait for and port to byname implicit arguments.
765b3ed
to
75e7803
Compare
Rebased, addressed issues raised in review, moved benchmark to compiler-benchmark and fixed an issue with Specs2 revealed in a 2.12.x community build run. |
This PR puzzled me: in the benchmark file, the only optimization this can possibly be doing is skipping the typechecking of 499 I'm not a JVM performance expert but I know how to use As a proof of concept, here's a one-liner fix that gives the same speed-up as this PR: smarter@4b9b95c |
@smarter blimey! Your one-liner doesn't account for all of the speed up of this PR. For me it takes the time down from > 400s to about 25s, whereas this PR takes it down to 14s. But as you say, it does appear that the error message generation accounts for a remarkably large proportion of the time! |
This was fun. For the record I got a very similar speed-up in Dotty by fixing a completely different issue: scala/scala3#4447 :) |
Recommendation for an actual fix without a huge refactoring of how error messages are handled: change |
Ah - 94% of the CPU taken generating 'helpful' details. I am very happy if I can get a solid 1% or 2% in a PR 😢 . I could not find a medal that low to compare |
I think the secret is finding an unusual test case that stresses the compiler in some way :). |
I can't reproduce the Jenkins failure locally. Finger's X'd it's spurious ... |
Ping? I don't think this is crucial for M5. It can happen post 2.13.0 as far as I'm concerned. |
I'm still trying to track down an issue which showed up in the 2.12.x community build. I'll update here as soon as I have a fix there. |
(this has now missed M5, but maybe it could still make 2.13.0-RC1. regardless, officially milestoning as 2.13.x based on Adriaan's remark) |
Can you give me until COB tomorrow? I think I've fixed the issue revealed in the 2.12.x community build, but just need a little more time to confirm that. |
Sure, ping me when you feel this is good to go and we'll try to get it in. |
1f44f82
to
ff6fe9a
Compare
@adriaan FYI, if you're wondering why this is still quite a bit slower than I was reporting earlier, so was I ... I've just done a rather laborious git bisect and discovered that it's down to #6530. Currently on plain 2.13.x the benchmark compiles in 22s with #6530 reverted, and this PR compiles the benchmark in 14s with #6530 reverted, which is back to the numbers I originally reported. That slowdown is completely orthogonal to this PR ... I'll see if there's anything I can do about it. |
// Nothings which can't be solved for. Rather than attempt to patch things up later we | ||
// just skip those cases altogether. | ||
lazy val wildPtIsInstantiable = | ||
!wildPt.exists { case _: BoundedWildcardType | _: TypeVar => true ; case tp if typeIsNothing(tp) => true; case _ => false } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced we should look for Nothing
here. Also, we could remove the double negation (it's only used as !wildPtIsInstantiable
) and call this wildPtNotFullyDefined
. That then leads to another simplification: the RHS could be just !isFullyDefined(wildPt)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point taken on the double negation ... I'll fix.
We really do need the check for Nothing
though: see test/files/pos/prune-poly-infer-nothing.scala
... without the check for Nothing
the correctly inferred Nothing
for O
would be retracted resulting in a false negative. That test was minimized from a failure in http4s in the 2.12 community build.
Aside from that, isFullyDefined
is too restrictive ... it rules out cases which should be allowed (eg. any case where the pt
contains a vanilla wildcard).
In rankImplicits, before we attempt to fully typecheck the pending candidate implicit, we first attempt to partially instantiate type variables in both the candidate and the target type and check for compatibility. If the compatibility check fails we can immediately prune the the candidate without having to fully typecheck it. In the kinds of implicit searches typical of the inductive style found in shapeless and related libraries this can result in a drastic reduction in the search space and a corresponding reduction in compile times. As an added bonus users of shapeless and shapeless based libraries which use shapeless's Lazy type will see benefits immediately without needing to wait for and port to byname implicit arguments.
ff6fe9a
to
ce3b37c
Compare
Also rebased. |
does this need a community build before merging? |
@SethTisue It wouldn't hurt, but I'm confident that I worked through the issues on the 2.12.x community build. |
In
rankImplicits
, before we attempt to fully typecheck the pending candidate implicit, we first attempt to partially instantiate type variables in both the candidate and the target type and check for compatibility. If the compatibility check fails we can immediately prune the the candidate without having to fully typecheck it.In the kinds of implicit searches typical of the inductive style found in shapeless and related libraries this can result in a drastic reduction in the search space and a corresponding reduction in compile times.
This commit is much simpler, more generally applicable, and less invasive than my earlier work on inductive implicits in #6481 which was doing similar pruning almost by accident. It turns out that almost all of the speedups in that earlier PR were due to the pruning rather than to any special casing of inductive patterns.
The compilation benchmark (a shapeless-style "select element by type from a large HList") from that PR is carried over here (in test/induction) and shows the same performance improvements,
As an added bonus users of shapeless and shapeless-based libraries which use shapeless's
Lazy
type will see benefits immediately without needing to wait for and port to byname implicit arguments.