Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Many improvements to smartmatching and given/when #4653

Merged
merged 13 commits into from Dec 27, 2021

Conversation

vrurg
Copy link
Member

@vrurg vrurg commented Nov 27, 2021

  • Junctions are now welcome on RHS and as when topic
  • when now uses optimized code unless the topic is a Junction. Falls back to ACCEPTS+Bool in the latter case.
  • smartmatch optimization is now more elaborate about its arguments. It implicitly Bool-ifies ACCEPTS return value unless its LHS is a regex or negation is to be done.
  • more optimizations for literal and type matching

Overall, with this PR it must not be a problem if ACCEPTS gets autothreaded and results in a junctionized return. Either it will be collapsed with boolification where smartmatch semantics is expected, or junctionized outcome of a regex match is expected.

@vrurg
Copy link
Member Author

vrurg commented Nov 27, 2021

PR resolves Raku/problem-solving#297. Fixes #2676 and, perhaps, a few more issues (yet to be investigated).

@vrurg
Copy link
Member Author

vrurg commented Nov 27, 2021

A couple of implementation notes.

To get junctions work as topics with when I had to add additional istype check into the optimized code @MasterDuke17 produced to fall back to ACCEPTS when appropriate. Then I compared this branch with 2021.10 release. And it seems that the branch is even somewhat faster – likely because of all the MoarVM optimizations done since the release.

As I have noticed in the commit comment, Range will not work with custom user-provided cmp operator. This was always the case and I don't feel ok about trying to fix it as it would require looking for client's context and locating the operator proto in there. Yet, considering that we also always had Range.ACCEPTS with Mu-typed topic, I decided there is no loss in keeping it this way but adding the code for operator lookup as otherwise it makes no sense. If this approach is considered redundant, not only the corresponding candidate could be removed, but there is a chance for slight optimization of the multi-dispatch by removing Junction candidate and providing Range with own proto method ACCEPTS.

@vrurg
Copy link
Member Author

vrurg commented Nov 28, 2021

Turning this into draft because there is some more work in progress on optimizing when and fixing cases where class defines custom ACCEPTS.

@vrurg vrurg force-pushed the problem-solving-297 branch 2 times, most recently from 0e1efd8 to bbbef1b Compare December 9, 2021 03:31
@vrurg vrurg marked this pull request as ready for review December 23, 2021 03:07
@vrurg
Copy link
Member Author

vrurg commented Dec 23, 2021

I think I'm done with this. The PR passes spectests, including the additions from Raku/roast#774. Aside of the fix for junctions on LHS and as when topic, I couldn't resist optimizing some most common cases. The optimization tries to cover cases, where compile time values are known, sometimes replacing a smartmatch with a plain boolean constant. For example, this would happen for:

sub foo(--> Num) { ... }
if foo() ~~ Numeric { ... }

Note that it means no call to foo() would ever take place.

Overall, the basic principles of optimization I tried to follow were:

  • where possible, get rid of routine calls by replacing their code with nqp ops
  • reduce allocations by unwrapping instantiation ASTs and pulling out any constant data available
  • calculate op outcomes at compile time where possible

Because the above involves a lot of analyzing of value ASTs, I added a new class Operand to the optimizer code which does all this work.

Unfortunately, I barely can provide many benchmark results as dispatching is seemingly taking so good care of the cases I tried, that sometimes a test loop happens to be even faster, than a base-line loop with no smartmatch in it. But at least in one case of smartmatching against pairs I did observe a speedup of 49%:

constant COUNT=1_000_000;
my $base = 0;
my $s = now;
for ^COUNT {
    my $v = $foo.bar1 === True;
}
$base = now - $s;

note "Base: ", $base;

my $total = 0;
$s = now;
for ^COUNT {
    my $v = $foo ~~ :bar1;
}
$total = now - $s;

note "Total: ", $total, ", actual: ", $total - $base;

@vrurg
Copy link
Member Author

vrurg commented Dec 23, 2021

Update. With JVM fixed I've benchmarked it too. 88% for pair, 92% for $v ~~ Num.

@vrurg vrurg changed the title Improve handling of junctions on LHS and by given/when Many improvements to smartmatching and given/when Dec 26, 2021
vrurg added a commit to vrurg/roast that referenced this pull request Dec 26, 2021
- `when` now uses optimized code unless the topic is a Junction. Falls
  back to `ACCEPTS`+`Bool` in the latter case.
  - smartmatch optimization is now more elaborate about its arguments.
  It
    implicitly `Bool`-ifies `ACCEPTS` return value unless its LHS is a
      regex or negation is to be done.

      Overall, with this commit it must not be a problem if `ACCEPTS`
      gets
      autothreaded and results in a junctionized return. Either it will
      be
      collapsed with boolification where smartmatch semantics is
      expected, or
      junctionized outcome of a regex match is expected.
It would manually thread over topic if it's a junction.

The default `ACCEPTS` method now expects `Any`-typed topic. Since we
don't have `cmp` operator for `Mu`, `Range`, in order to handle any
user-created direct `Mu` child, tries to find user-defined
`&infix:<cmp>` and throws with `X::Range::Incomparable` if can't find
any appropriate.

There are still two problems here. First, there is no support for custom
user `cmp` operator for `Any`-inheriting classes. Second, custom
operator couldn't be found for `Mu`-inheriting classes if they're part
of a junction. In the latter case the problem is about the fact that
`CLIENT::` namespace would point into `Junction` class.

Both issues can only be fixed at the cost of performance loss which is
barely acceptable considering the wide use of ranges across the
codebase.
It is expected to return the Match instance itself no matter what
argument it is given.
It doesn't have a candidate for non-`Any` children, thus declaring the
proto with `Any` as the first positional makes full sense.
This is only related to typematching kind of method. Since all versions
of ACCEPTS methods in core doing this are reducable to simple
`nqp::istype` we can still use this optimization. But if a non-core
class introduces its own version we must use it for smartmatching.
- Moved `when` optimization from Actions into Optimizer
- Unified `when` and smartmatch optimiztions
- Extended optimizations of some static cases
  For example, reducing typematching to `nqp::istype` if `ACCEPTS`
  method is not overriden or no new candidates added by user code
- Added class `Operand` which does its best to provide information about
  an expression operand
Topicalization is not needed in typematch case if RHS uses the default
ACCEPTS method.

Also, when we know variable/routine type and can ensure it cannot be a
Regexp then the code can be simplified to just `ACCEPTS`+`Bool` method
calls.

Aside of this, smartmatch optimizator depends on operand nodes to be
pre-optimized. This lets it do better analysis. But it was also
resulting in incorrect statistics collected about localizable variables
whenever the smartmatching node is replaced entirely.

Fixing this required a new approach with two-pass optimization where the
first pass is done without collecting the variable statistics and
localization pass. The statistic collection is now prevented by wrapping
the `@!block_var_stack` attribute into a specialized `BlockVarStack`
class. All method calls used to be done on the stack top element must
now be made via `BlockVarStack` `do` method. The class supports dry run
mode when method invocations via `do` are ignored.

Smartmatch optimizer uses dry run for first-pass optimization of its
operands.

Some code re-arrangements took place to improve re-usability.
Use compile-time known setting-only `Pair` typeobject.
Not that I mentioned any visible speedup on core compilation, but it
doesn't hurt to have it. It definitely spares a couple of CPU cycles
here and there.
The outcome would significantly vary depending on the actual code, but
in the most typical case of `$obj ~~ :foo` I have observed a speedup of
up to 49%.
Null pointer exception again.
@vrurg vrurg merged commit a68c5e4 into rakudo:master Dec 27, 2021
vrurg added a commit to vrurg/roast that referenced this pull request Dec 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant