
Adding a return type slows execution down #2573

Closed
lizmat opened this issue Dec 29, 2018 · 4 comments

3 participants

@lizmat (Contributor) commented Dec 29, 2018

Adding a return type to a sub has a significant execution-time impact:

```
$ perl6 -e 'sub a(--> Int:D) { 42 }; for ^1_000_000 -> int $_ { a }; say now - INIT now'
0.25032401

$ perl6 -e 'sub a() { 42 }; for ^1_000_000 -> int $_ { a }; say now - INIT now'
0.12778614
```

Which is apparently the same as explicitly specifying `--> Mu`:

```
$ perl6 -e 'sub a(--> Mu) { 42 }; for ^1_000_000 -> int $_ { a }; say now - INIT now'
0.1243782
```

So, are we cutting corners here when the return signature is Mu? Or is type checking anything that isn't Mu so much more expensive? Or is there an optimization opportunity there?
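For context, a quick introspection check supports the "same as `--> Mu`" observation: a sub with no declared return type reports `Mu` as its return type. This is a small Rakudo sketch using the standard `Signature.returns` API (sub names are made up for illustration):

```raku
# A sub with no return constraint and one with an explicit --> Mu
# report the same declared return type via introspection.
sub untyped()     { 42 }
sub typed(--> Mu) { 42 }

say &untyped.signature.returns;  # (Mu)
say &typed.signature.returns;    # (Mu)
```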

FWIW, the --profile outputs for the two initial examples are identical, except for the amount of CPU used by sub a and the number of calls to identity (twice as many in the non-Mu case).

The reason I'm making this an issue is that I recently started to add a lot of missing return signatures to the core. Which had a very definite impact on the size of CORE.setting.moarvm (14064864 -> 14118208) and an impact on spectest (363 -> 381 seconds). So I'm now considering reverting all of that work :-(

@lizmat (Contributor, Author) commented Dec 29, 2018

Oddly enough, there is a benefit to specifying types on parameters:

```
$ perl6 -e 'sub a(Int:D $a) { }; for ^10_000_000 -> int $_ { a 42 }; say now - INIT now'
0.0878005

$ perl6 -e 'sub a($a) { }; for ^10_000_000 -> int $_ { a 42 }; say now - INIT now'
0.2394543
```
@jnthn (Member) commented Dec 29, 2018

> So, are we cutting corners here when the return signature is Mu?

Yes, we're taking that obvious easy optimization opportunity and simply skipping the check entirely when the return type is Mu.

> Or is type checking anything that isn't Mu so much more expensive?

If we removed that optimization, I expect we'd see the same kind of numbers for Mu also.

> Or is there an optimization opportunity there?

The obvious one for this particular case would be for the optimizer to spot that the value being returned will always match, and just not emit the type check bytecode sequence, much as it does in the Mu case. However, I got curious why it's so costly, given that we should be optimizing this pretty well.
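A minimal sketch of the distinction being drawn here: when the returned expression's type is statically known to satisfy the declared constraint, the check is provably redundant and could be elided at compile time, just as it already is for `Mu`; when it isn't, a run-time check (or at best a spesh guard) has to remain. The sub names below are illustrative:

```raku
# `a` always returns the literal 42, so Int:D is provable at compile
# time and the return-check bytecode could be dropped entirely.
sub a(--> Int:D) { 42 }

# `b` returns whatever it was given, so the constraint can only be
# verified when the call actually runs.
sub b(Mu \x --> Int:D) { x }

say a();         # 42
say b(1);        # 1
try b("oops");   # the Int:D return check rejects the Str
say "return typecheck failed" if $!;
```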

It turns out that if you add an argument to the sub, then the difference narrows quite a bit:

```
$ perl6 -e 'sub a($) { 42 }; for ^10_000_000 -> int $_ { a 1 }; say now - INIT now'
0.1267724

$ perl6 -e 'sub a($ --> Int:D) { 42 }; for ^10_000_000 -> int $_ { a 1 }; say now - INIT now'
0.16717362
```

The reason seems to be that spesh mishandles zero-argument frames, producing a certain specialization rather than an observed type specialization, which in turn means the spesh plugin for return handling doesn't get fully optimized out. In an otherwise very tight loop, that will add up.

> FWIW, the --profile outputs for the two initial examples are identical, except for the amount of CPU used by sub a and the number of calls to identity (twice as many in the non-Mu case).

I think the significant thing about the identity calls will be the inline rate, which I expect will be 50% in the non-Mu case. There will also be extra cost in evaluating the spesh plugin guard tree on every call, because it isn't being optimized out. (Also, identity calls tend to optimize down to just the profile entry/exit instructions, so those numbers mostly measure profiler overhead.)

> Which had a very definite impact on the size of CORE.setting.moarvm (14064864 -> 14118208)

Returns were originally a sequence of two calls to Perl 6 extension ops: one for decontainerization, and one after that for return type checking. Both were replaced a while back with a spesh plugin usage each, which is a short sequence of bytecode instructions, but that certainly adds up when done in numerous places. This change let us fix longstanding bugs (the Proxy one) and improve post-specialization performance, since in numerous cases the specializer can reduce them to nothing or just a simple guard, and even when it does have some work to do (on decontainerization) we can optimize and JIT it into a simple pointer deref or two.
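To make those two steps concrete, here is a hedged user-level sketch (plain Rakudo semantics, not the emitted bytecode) of what return handling has to accomplish: decontainerize the returned value, then check it against the declared type. The exception class name is what Rakudo typically reports; sub names are made up:

```raku
sub a(--> Int:D) {
    my $x = 42;   # $x is a Scalar container holding the Int 42
    $x;           # returning decontainerizes, then checks against Int:D
}
say a();          # 42

sub bad(--> Int:D) { "nope" }
try bad();            # the return type check rejects the Str
say $!.^name if $!;   # typically X::TypeCheck::Return in Rakudo
```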

In theory, it's possible to collapse the two return-handling spesh plugin resolutions/applications down into a single one, giving reduced bytecode size. In practice, it's not quite obvious how to do that and at the same time retain the static optimizations. It'll need some thought and experimentation. That'd give us a path to smaller code, though.

> an impact on spectest (363 -> 381 seconds)

While we know spectest is about the worst case for optimization (very little code becomes hot), 4% is indeed a bit more than I'd have expected there. I wonder if the change impacted startup time a bit? I'll see if I can reproduce that effect locally once I get back to working on stuff after the New Year break. :-)

> So I'm now considering reverting all of that work :-(

Let's keep it in there; we need to do it at some point anyway, and I think we'll be able to gain back a good amount of the performance (or it'll just get lost in wider improvements, which is also fine). It's useful for introspection purposes, and that is good for exploration, as well as for folks building things like auto-complete in IDEs that benefit from more complete type information in CORE.setting.
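As a small illustration of that introspection benefit, return types added to signatures become directly queryable through the standard `Signature` API, which is exactly what IDE-style tooling would lean on (the sub here is hypothetical):

```raku
sub chars-of(Str:D $s --> Int:D) { $s.chars }

# The gist of a Signature includes the declared return constraint,
# and .returns exposes the return type object itself.
say &chars-of.signature;           # e.g. (Str:D $s --> Int:D)
say &chars-of.signature.returns;   # the declared return type object
```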

@MasterDuke17 (Contributor) commented Dec 29, 2018

FWIW, benchable (and confirmed by committable) found a significant slowdown introduced at a41c37c. `sub a(--> Int:D) { 42 }; for ^1_000_000 -> int $_ { a; a; a; a; a }; say now - INIT now` went from 0.1s before to 1.3s after. That commit is an NQP bump, and the MoarVM commits it brought in (MoarVM/MoarVM@2018.06-391-g91d2878...2018.06-395-g0c5f6e5) seem like they might be relevant.

@lizmat (Contributor, Author) commented Dec 29, 2018

```
$ perl6 -e 'class A { method a(--> Int:D) { 42 } }; for ^10_000_000 -> int $_ { A.a }; say now - INIT now'
0.151725

$ perl6 -e 'class A { method a() { 42 } }; for ^10_000_000 -> int $_ { A.a }; say now - INIT now'
0.126501
```

Which appears to confirm that the slowdown is a lot smaller when arguments are passed to the block, as there always are with methods (the invocant is always passed).
