=begin pod :tag<tutorial>
=TITLE Performance
=SUBTITLE Measuring and improving runtime or compile-time performance
This page is about L<computer performance|https://en.wikipedia.org/wiki/Computer_performance> in the context
of Perl 6.
=head1 First, profile your code
B<Make sure you're not wasting time on the wrong code>: start by identifying
your L<"critical 3%"|https://en.wikiquote.org/wiki/Donald_Knuth> by "profiling"
your code's performance. The rest of this section shows you how to do that.
=head2 Time with C<now - INIT now>
Expressions of the form C<now - INIT now>, where C<INIT> is a L<phase in the
running of a Perl 6 program|/language/phasers>, provide a great idiom for timing
code snippets.
Use the C<m: your code goes here> L<#perl6 channel
evalbot|/language/glossary#camelia> to write lines like:
=begin code :lang<irc command>
m: say now - INIT now
rakudo-moar abc1234: OUTPUT«0.0018558␤»
=end code
The C<now> to the left of C<INIT> runs 0.0018558 seconds I<later> than the
C<now> to the right of the C<INIT> because the latter occurs during L<the INIT
phase|/language/phasers#INIT>.
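The same idiom works in a script. The following is a minimal sketch, not a benchmark harness: the C<timed> helper and the workload are hypothetical names chosen for illustration. It times one block by subtracting two C<now> values, then reports the time elapsed since the INIT phase.

=begin code
# Hypothetical helper: run a block and return the elapsed time as a Duration
sub timed(&code) {
    my $start = now;
    code();
    now - $start
}

# Time one specific piece of work
my $elapsed = timed { my $sum = [+] ^100_000 };
say "summing took $elapsed seconds";

# Time since the INIT phase, i.e. roughly since the program started running
my $since-init = now - INIT now;
say "total so far: $since-init seconds";
=end code

Timings like these vary from run to run, so it is usually worth repeating a measurement a few times before drawing conclusions.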
=head2 Profile locally
When using the L<MoarVM|https://moarvm.org> backend, the
L<Rakudo|https://rakudo.org> compiler's C<--profile> command line option writes
the profile data to an HTML file.
This file will open to the "Overview" section, which gives some overall data
about how the program ran, e.g., total runtime and time spent doing garbage
collection. One important piece of information you'll get here is the percentage of
the total call frames (i.e., blocks) that were interpreted (slowest, in red),
L<speshed|/language/glossary#index-entry-Spesh> (faster, in orange), and jitted
(fastest, in green).
The next section, "Routines", is probably where you'll spend the most time. It has
a sortable and filterable table of routine (or block) name+file+line, the number of
times it ran, the inclusive time (time spent in that routine + time spent in all
routines called from it), exclusive time (just the time spent in that routine), and
whether it was interpreted, speshed, or jitted (same color code as the "Overview"
page). Sorting by exclusive time is a good way to know where to start optimizing.
Routines with a filename that starts like C<SETTING::src/core/> or C<gen/moar/> are
from the compiler. A good way to see just the stuff from your own code is to put
the filename of the script you profiled in the "Name" search box.
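To get a feel for the "Routines" table, it can help to profile a tiny script with one obviously expensive routine. The sketch below assumes a hypothetical file name, C<hot.p6>, and hypothetical routine names; only the C<--profile> option comes from the text above.

=begin code
# hot.p6 (hypothetical file name); profile it with: perl6 --profile hot.p6

# Cheap routine: a single addition
sub cheap(Int $n) { $n + 1 }

# Deliberately wasteful routine: builds and joins a list of strings on every call
sub costly(Int $n) { (^$n).map(*.Str).join.chars }

my $total = 0;
for ^2_000 -> $i {
    $total += cheap($i);
    $total += costly(200);
}
say $total;
=end code

Typing C<hot.p6> into the "Name" search box should narrow the table to the routines defined in this file, and C<costly> would be expected to show a much larger inclusive time than C<cheap>.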
The "Call Graph" section gives a flame graph representation of much of the same
information as the "Routines" section.
The "Allocations" section gives you information about the number of objects of each
type that were allocated, as well as which routines did the allocating.
The "GC" section gives you detailed information about all the garbage collections
that occurred.
The "OSR / Deopt" section gives you information about On Stack Replacements (OSRs),
which occur when routines are "upgraded" from interpreted to speshed or jitted. Deopts
are the opposite: they occur when speshed or jitted code has to be "downgraded" to
being interpreted.
If the profile data is too big, it could take a long time for a browser to open
the file. In that case, output to a file with a C<.json> extension using the
C<--profile-filename> option, then open the file with the
L<Qt viewer|https://github.com/tadzik/p6profiler-qt>.
To deal with even larger profiles, output to a file with a C<.sql> extension.
This will write the profile data as a series
of SQL statements, suitable for opening in SQLite.
=begin code :lang<shell>
# create a profile
perl6 --profile --profile-filename=demo.sql -e 'say (^20).combinations(3).elems'
# create a SQLite database
sqlite3 demo.sqlite
# load the profile data
sqlite> .read demo.sql
# the query below is equivalent to the default view of the "Routines" tab in the HTML profile
sqlite> select
      case when r.name = "" then "<anon>" else r.name end as name,
      r.file,
      r.line,
      sum(entries) as entries,
      sum(case when rec_depth = 0 then inclusive_time else 0 end) as inclusive_time,
      sum(exclusive_time) as exclusive_time
    from
      calls c,
      routines r
    where
      c.id = r.id
    group by
      c.id
    order by
      inclusive_time desc
    limit 30;
=end code
To learn how to interpret the profile info, use the C<prof-m: your code goes
here> evalbot (explained above) and ask questions on the IRC channel.
=head2 Profile compiling
If you want to profile the time and memory it takes to compile your code, use
Rakudo's C<--profile-compile> or C<--profile-stage> options.
=head2 Create or view benchmarks
Use L<perl6-bench|https://github.com/japhb/perl6-bench>.
If you run perl6-bench for multiple compilers (typically, versions of Perl 5,
Perl 6, or NQP), results for each are visually overlaid on the same graphs,
to provide for quick and easy comparison.
=head2 Share problems
Once you've used the above techniques to identify the code to improve,
you can then begin to address (and share) the problem with others:
=item For each problem, distill it down to a one-liner or a
gist, and either provide performance numbers or make the snippet small enough
that it can be profiled using C<prof-m: your code or gist URL goes here>.
=item Think about the minimum speed increase (or RAM reduction or whatever) you
need/want, and think about the cost associated with achieving that goal. What's
the improvement worth in terms of people's time and energy?
=item Let others know if your Perl 6 use-case is in a production setting or just for fun.
=head1 Solve problems
This bears repeating: B<make sure you're not wasting time on the wrong code>.
Start by identifying the L<"critical 3%"|https://en.wikiquote.org/wiki/Donald_Knuth>
of your code.
=head2 Line by line
A quick, fun, productive way to try to improve code line-by-line is to collaborate
with others using the L<#perl6|/language/glossary#IRC> evalbot
L<camelia|/language/glossary#camelia>.
=head2 Routine by routine
With multidispatch, you can drop in new variants of routines "alongside" existing
ones:
    # existing code generically matches a two arg foo call:
    multi sub foo(Any $a, Any $b) { ... }
    # new variant takes over for a foo("quux", 42) call:
    multi sub foo("quux", Int $b) { ... }
The call overhead of having multiple C<foo> definitions is generally
insignificant (though see discussion of C<where> below), so if your new
definition handles its particular case more efficiently than the previously
existing set of definitions, then you probably just made your code that much
more efficient for that case.
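As a runnable illustration of the idea above, here is a minimal sketch in which the routine bodies are placeholders invented for this example. The narrower candidate is chosen only when its literal and type constraints match; every other call falls through to the generic candidate.

=begin code
# Generic candidate: accepts any two arguments
multi sub foo(Any $a, Any $b) { "generic: $a, $b" }

# Narrower candidate: chosen when the first argument is the literal "quux"
# and the second argument is an Int
multi sub foo("quux", Int $b) { "special case for quux with $b" }

say foo("bar", 1);      # generic: bar, 1
say foo("quux", 42);    # special case for quux with 42
=end code

Callers do not change at all; only the dispatch target does, which is what makes this a low-risk way to speed up one specific case.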
=head2 Speed up type-checks and call resolution
Most L<C<where> clauses|/type/Signature#Type_constraints> – and thus most
L<subsets|https://design.perl6.org/S12.html#Types_and_Subtypes> – force dynamic
(runtime) type checking and call resolution for any call they I<might> match.
This is slower, or at least later, than compile-time.
Method calls are generally resolved as late as possible (dynamically at runtime),
whereas sub calls are generally resolved statically at compile-time.
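The difference can be measured directly. The following sketch uses hypothetical names (C<Even>, C<half-checked>, C<half-plain>) and compares a parameter constrained by a C<where>-based subset with one constrained by a plain nominal type. The absolute numbers, and even the size of the gap, are assumptions that depend on the Rakudo version.

=begin code
# A subset defined with a where clause: the check runs each time a value is bound
subset Even of Int where * %% 2;
sub half-checked(Even $n) { $n div 2 }

# The same routine with a plain nominal type constraint
sub half-plain(Int $n) { $n div 2 }

# Call each routine the same number of times and report the elapsed time
for &half-checked, &half-plain -> &f {
    my $start = now;
    f($_ * 2) for ^50_000;
    say "{ &f.name }: { now - $start } seconds";
}
=end code

If the C<where> check is only needed at the edges of a program, one option is to validate once on input and use nominal types in the hot path.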
=head2 Choose better algorithms
One of the most reliable techniques for making large performance improvements,
regardless of language or compiler, is to pick a more appropriate algorithm.
A classic example is
L<Boyer-Moore|https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm>.
To match a small string in a large string, one obvious way to do it is to
compare the first character of the two strings and then, if they match, compare
the second characters, or, if they don't match, compare the first character of
the small string with the second character in the large string, and so on. In
contrast, the Boyer-Moore algorithm starts by comparing the I<last> character of
the small string with the correspondingly positioned character in the large
string. For most strings, the Boyer-Moore algorithm is close to N times faster
algorithmically, where N is the length of the small string.
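To make the idea concrete, here is a sketch of the Boyer-Moore-Horspool simplification, which keeps the "compare from the end and skip ahead" idea but is not the full Boyer-Moore algorithm. The routine name C<horspool-index> is hypothetical. In real code the built-in C<index> method is the thing to call, so this is purely illustrative.

=begin code
# Return the position of the first match of $needle in $haystack, or Nil
sub horspool-index(Str $haystack, Str $needle) {
    my @h = $haystack.comb;
    my @n = $needle.comb;
    my $m = @n.elems;
    return 0 if $m == 0;

    # For each needle character except the last, record how far the window may
    # shift when that character is aligned with the needle's last position
    my %shift;
    %shift{ @n[$_] } = $m - 1 - $_ for ^($m - 1);

    my $pos = 0;
    while $pos <= @h.elems - $m {
        # Compare from the last character of the needle backwards
        my $i = $m - 1;
        $i-- while $i >= 0 && @h[$pos + $i] eq @n[$i];
        return $pos if $i < 0;
        # Characters that do not occur in the needle allow a full-length shift
        $pos += %shift{ @h[$pos + $m - 1] } // $m;
    }
    Nil
}

say horspool-index("hello world", "world");    # 6
say "hello world".index("world");              # 6
=end code

When the character under the end of the window does not occur in the small string at all, the window jumps forward by the full length of the small string, which is where the "close to N times faster" behavior comes from.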
The next couple of sections discuss two broad categories for algorithmic
improvement that are especially easy to accomplish in Perl 6. For more on this
general topic, read the Wikipedia page on L<algorithmic efficiency|https://en.wikipedia.org/wiki/Algorithmic_efficiency>, especially the
'See also' section near the end.
=head3 Change sequential/blocking code to parallel/non-blocking
This is another very important class of algorithmic improvement.
See the slides for
L<Parallelism, Concurrency, and Asynchrony in Perl 6|https://jnthn.net/papers/2015-yapcasia-concurrency.pdf#page=17>
and/or L<the matching video|https://www.youtube.com/watch?v=JpqnNCx7wVY&list=PLRuESFRW2Fa77XObvk7-BYVFwobZHdXdK&index=8>.
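As a small taste of what this looks like in practice, the sketch below runs the same independent work sequentially, with C<start> and C<await>, and with C<hyper>. The C<slow-square> routine is a hypothetical stand-in for any slow unit of work that does not depend on the others.

=begin code
# Hypothetical stand-in for a slow, independent unit of work
sub slow-square(Int $n) { sleep 0.05; $n * $n }

# Sequential: each call blocks until the previous one has finished
my @sequential = (1..8).map: &slow-square;

# Parallel: start schedules each call on the thread pool and returns a Promise;
# await blocks until all of them are kept and returns the results in order
my @promises = (1..8).map: -> $n { start slow-square($n) };
my @parallel = await @promises;

# hyper spreads the map over worker threads while preserving result order
my @hypered = (1..8).hyper(:batch(1)).map(&slow-square);

say @sequential eqv @parallel;
=end code

Wrapping each version in a C<now> subtraction is an easy way to check whether the parallel forms actually pay off for a given workload; for very cheap work, the scheduling overhead can outweigh the gain.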
=head2 Use existing high performance code
There are plenty of high performance C libraries that you can use within Perl 6 and
L<NativeCall|/language/nativecall> makes it easy to create wrappers for them. There's
experimental support for C++ libraries, too.
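For instance, a wrapper for a single C function can be a one-line declaration. This is a minimal sketch that assumes a typical Unix-like system where the C standard library is already loaded into the process; when C<is native> is given no library name, the symbol is looked up among the libraries that are already loaded.

=begin code
use NativeCall;

# Bind the C standard library's strlen; the Perl 6 Str is passed
# to C as a NUL-terminated string
sub strlen(Str --> size_t) is native { * }

say strlen("hello");
=end code

For a library that is not already loaded, the library name goes in the trait, as in C<is native('foo')>, where C<foo> is a placeholder.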
You can also L<use Perl 5 modules in Perl 6|https://stackoverflow.com/a/27206428/1077672>,
mixing in Perl 6 types and the L<Meta-Object Protocol|/language/mop>.
More generally, Perl 6 is designed to smoothly interoperate with other languages and
there are a number of L<modules aimed at facilitating the use of libraries from other languages|https://modules.perl6.org/#q=inline>.
=head2 Make the Rakudo compiler generate faster code
To date, the focus for the compiler has been correctness, not
how fast it generates code or how fast or lean the code it
generates runs. But that's expected to change, eventually... You
can talk to compiler devs on the freenode IRC channels #perl6 and #moarvm about
what to expect. Better still, you can contribute yourself:
=item Rakudo is largely written in Perl 6. So if you can write Perl 6, then you
can hack on the compiler, including optimizing any of the large body of existing
high-level code that impacts the speed of your code (and everyone else's).
=item Most of the rest of the compiler is written in a small language called
L<NQP|https://github.com/perl6/nqp> that's basically a subset of Perl 6. If you
can write Perl 6, you can fairly easily learn to use and improve the mid-level
NQP code too, at least from a pure language point of view. To dig into NQP and
Rakudo's guts, start with
L<NQP and internals course|https://edumentab.github.io/rakudo-and-nqp-internals-course/>.
=item If low-level C hacking is your idea of fun, check out
L<MoarVM|https://moarvm.org> and visit the freenode IRC channel #moarvm
(L<logs|https://colabti.org/irclogger/irclogger_logs/moarvm>).
=head2 Still need more ideas?
Some known current Rakudo performance weaknesses not yet covered in this page
include the use of gather/take, junctions, regexes and string handling in
general.
If you think some topic needs more coverage on this page, please submit a PR or
tell someone your idea. Thanks. :)
=head1 Not getting the results you need/want?
If you've tried everything on this page to no avail, please consider discussing
things with a compiler dev on #perl6, so we can learn from your use-case and what
you've found out about it so far.
Once a dev knows of your plight, allow enough time for
an informed response (a few days or weeks, depending on the exact nature of your
problem and potential solutions).
If I<that> hasn't worked out, please consider filing an issue about your experience at
L<our user experience repo|https://github.com/perl6/user-experience/issues> before moving on.
Thanks. :)
=end pod
# vim: expandtab softtabstop=4 shiftwidth=4 ft=perl6