doc/Language/performance.pod6

=begin pod :tag<tutorial>

=TITLE Performance

=SUBTITLE Measuring and improving run-time or compile-time performance

This page is about L<computer
performance|https://en.wikipedia.org/wiki/Computer_performance> in the context
of Perl 6.

=head1 First, profile your code

B<Make sure you're not wasting time on the wrong code>: start by identifying
your L<"critical 3%"|https://en.wikiquote.org/wiki/Donald_Knuth> by "profiling"
your code's performance. The rest of this document shows you how to do that.

=head2 Time with C<now - INIT now>

Expressions of the form C<now - INIT now>, where C<INIT> is a L<phase in the
running of a Perl 6 program|/language/phasers>, provide a great idiom for timing
code snippets.

Use the C<m: your code goes here> L<#perl6 channel
evalbot|/language/glossary#camelia> to write lines like:

    =begin code :skip-test
    m: say now - INIT now
    rakudo-moar abc1234: OUTPUT«0.0018558␤»
    =end code

The C<now> to the left of C<INIT> runs 0.0018558 seconds I<later> than the
C<now> to the right of the C<INIT> because the latter occurs during L<the INIT
phase|/language/phasers#INIT>.

=head2 Profile locally

When using the L<MoarVM|http://moarvm.org> backend, the
L<Rakudo|http://rakudo.org> compiler's C<--profile> command line option writes
the profile data to an HTML file. If the profile data is too big, it could take a
long time for a browser to open the file. In that case, output to a file with a
C<.json> extension, then open the file with
L<Qt viewer|https://github.com/tadzik/p6profiler-qt>.

To deal with even larger profiles, output to a file with a C<.sql> extension.
This will write the profile data as a series
of SQL statements, suitable for opening in SQLite.

    =begin code :skip-test
    # create a profile
    perl6 --profile --profile-filename=demo.sql -e 'say (^20).combinations(3).elems'

    # create a SQLite database
    sqlite3 demo.sqlite

    # load the profile data
    sqlite> .read demo.sql

    # the query below is equivalent to the default view of the "Routines" tab in the HTML profile
    sqlite> select
          case when r.name = "" then "<anon>" else r.name end,
          r.file,
          r.line,
          sum(entries) as entries,
          sum(case when rec_depth = 0 then inclusive_time else 0 end) as inclusive_time,
          sum(exclusive_time) as exclusive_time
        from
          call c,
          routines r
        where
          c.id = r.id
        group by
          c.id
        order by
          inclusive_time desc
        limit 30;
    =end code

To learn how to interpret the profile info, use the C<prof-m: your code goes
here> evalbot (explained above) and ask questions on the channel.

=head2 Profile compiling

If you want to profile the time and memory it takes to compile your code, use
Rakudo's C<--profile-compile> option.

=head2 Create or view benchmarks

Use L<perl6-bench|https://github.com/japhb/perl6-bench>.

If you run perl6-bench for multiple compilers (typically, versions of Perl 5,
Perl 6, or NQP), results for each are visually overlaid on the same graphs,
to provide for quick and easy comparison.

=head2 Share problems

Once you've used the above techniques to identify the code to improve,
you can then begin to address (and share) the problem with others:

=item For each problem, distill it down to a one-liner or the
gist and either provide performance numbers or make the snippet small enough
that it can be profiled using C<prof-m: your code or gist URL goes here>.

=item Think about the minimum speed increase (or ram reduction or whatever) you
need/want, and think about the cost associated with achieving that goal. What's
the improvement worth in terms of people's time and energy?

=item Let others know if your Perl 6 use-case is in a production setting or just for fun.

=head1 Solve problems

This bears repeating: B<make sure you're not wasting time on the wrong code>.
Start by identifying the L<"critical 3%"|https://en.wikiquote.org/wiki/Donald_Knuth>
of your code.

=head2 Line by line

A quick, fun, productive way to try improve code line-by-line is to collaborate
with others using the L<#perl6|/language/glossary#IRC> evalbot
L<camelia|/language/glossary#camelia>.

=head2 Routine by routine

With multidispatch, you can drop in new variants of routines "alongside" existing
ones:

    # existing code generically matches a two arg foo call:
    multi foo(Any $a, Any $b) { ... }

    # new variant takes over for a foo("quux", 42) call:
    multi foo("quux", Int $b) { ... }

The call overhead of having multiple C<foo> definitions is generally
insignificant (though see discussion of C<where> below), so if your new
definition handles its particular case more efficiently than the previously
existing set of definitions, then you probably just made your code that much
more efficient for that case.

=head2 Speed up type-checks and call resolution

Most L<C<where> clauses|/type/Signature#Type_Constraints> – and thus most
L<subsets|https://design.perl6.org/S12.html#Types_and_Subtypes> – force dynamic
(run-time) type checking and call resolution for any call it I<might> match.
This is slower, or at least later, than compile-time.

Method calls are generally resolved as late as possible (dynamically at run-time),
whereas sub calls are generally resolved statically at compile-time.

=head2 Choose better algorithms

One of the most reliable techniques for making large performance improvements,
regardless of language or compiler, is to pick a more appropriate algorithm.

A classic example is
L<Boyer-Moore|https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm>.
To match a small string in a large string, one obvious way to do it is to
compare the first character of the two strings and then, if they match, compare
the second characters, or, if they don't match, compare the first character of
the small string with the second character in the large string, and so on. In
contrast, the Boyer-Moore algorithm starts by comparing the *last* character of
the small string with the correspondingly positioned character in the large
string. For most strings, the Boyer-Moore algorithm is close to N times faster
algorithmically, where N is the length of the small string.

The next couple sections discuss two broad categories for algorithmic
improvement that are especially easy to accomplish in Perl 6. For more on this
general topic, read the wikipedia page on L<algorithmic
efficiency|https://en.wikipedia.org/wiki/Algorithmic_efficiency>, especially the
'See also' section near the end.

=head3 Change sequential/blocking code to parallel/non-blocking

This is another very important class of algorithmic improvement.

See the slides for
L<Parallelism, Concurrency, and Asynchrony in Perl 6|http://jnthn.net/papers/2015-yapcasia-concurrency.pdf#page=17>
and/or L<the matching video|https://www.youtube.com/watch?v=JpqnNCx7wVY&list=PLRuESFRW2Fa77XObvk7-BYVFwobZHdXdK&index=8>.

=head2 Use existing high performance code

There are plenty of high performance C libraries that you can use within Perl 6 and
L<NativeCall|/language/nativecall> makes it easy to create wrappers for them. There's
experimental support for C++ libraries, too.

If you want to L<use Perl 5 modules in Perl 6|http://stackoverflow.com/a/27206428/1077672>,
mix in Perl 6 types and the L<Meta-Object Protocol|/language/mop>.

More generally, Perl 6 is designed to smoothly interoperate with other languages and
there are a number of L<modules aimed at facilitating the use of libs from
other langs|https://modules.perl6.org/#q=inline>.

=head2 Make the Rakudo compiler generate faster code

To date, the focus for the compiler has been correctness, not
how fast it generates code or how fast or lean the code it
generates runs. But that's expected to change, eventually... You
can talk to compiler devs on the freenode IRC channels #perl6 and #moarvm about
what to expect. Better still, you can contribute yourself:

=item Rakudo is largely written in Perl 6. So if you can write Perl 6, then you
can hack on the compiler, including optimizing any of the large body of existing
high-level code that impacts the speed of your code (and everyone else's).

=item Most of the rest of the compiler is written in a small language called
L<NQP|https://github.com/perl6/nqp> that's basically a subset of Perl 6. If you
can write Perl 6, you can fairly easily learn to use and improve the mid-level
NQP code too, at least from a pure language point of view. To dig into NQP and
Rakudo's guts, start with
L<NQP and internals course|http://edumentab.github.io/rakudo-and-nqp-internals-course/>.

=item If low-level C hacking is your idea of fun, checkout
L<MoarVM|http://moarvm.org> and visit the freenode IRC channel #moarvm
(L<logs|https://irclog.perlgeek.de/moarvm/>).

=head2 Still need more ideas?

Some known current Rakudo performance weaknesses not yet covered in this page
include the use of gather/take, junctions, regexes and string handling in
general.

If you think some topic needs more coverage on this page, please submit a PR or
tell someone your idea. Thanks. :)

=head1 Not getting the results you need/want?

If you've tried everything on this page to no avail, please consider discussing
things with a compiler dev on #perl6, so we can learn from your use-case and what
you've found out about it so far.

Once a dev knows of your plight, allow enough time for
an informed response (a few days or weeks, depending on the exact nature of your
problem and potential solutions).

If I<that> hasn't worked out, please consider filing an issue about your experience at
L<our user experience repo|https://github.com/perl6/user-experience/issues> before moving on.

Thanks. :)

=end pod