Quick start instructions (see below for a few more details of what
these do if you are curious):
(1) Check below for install instructions specific to your OS.
Either: (2a) Install Leiningen and run lein-init.sh, which creates
directories 'lein' and '.m2' in your home directory (takes about 2
mins on my system after Leiningen is installed), or (2b) edit
env.sh to specify the location of your Clojure JAR files. Look
for the CLOJURE_CLASSPATH variable settings.
(2a) Leiningen installation instructions:
http://github.com/technomancy/leiningen
    Then use Leiningen to install multiple Clojure JAR files in the
    $HOME/lein directory.
% ./lein-init.sh
(3) Use init.sh to create largish input and expected output files
(total about 850 Mbytes of storage), by running some Java
programs:
    % ./init.sh output
    Time required on some of my systems: 6-7 mins, or 18 mins on
    Windows + Cygwin running in a VM.
(4) Run Clojure 1.2 versions of all of the benchmark programs:
% ./run-all.sh clj-1.2
    Time required on some of my systems: 25-35 mins (about half the
    time is spent running the long fannkuch benchmark, and about 1/4
    on the long knucleotide benchmark).
Another example: Run Java, Clojure 1.2, Clojure 1.3 alpha1, and
Clojure 1.3 alpha3 versions of all of the benchmark programs.
% ./run-all.sh long java clj-1.2 clj-1.3-alpha1 clj-1.3-alpha3
(5) If you want all results recorded to an XML file, look in
env.sh for MP_COMMON_ARGS and the comments before it on how to
enable this. Then do a ./run-all.sh command, or individual
batch.sh commands as described below. You can then convert the
XML file to a CSV file for easier importing into spreadsheets
with:
% bin/xmltable2csv $HOME/results.xml > results.csv
Systems on which this has been tested:
Mac OS X 10.5.8
Mac OS X 10.6.6, both with and without MacPorts installed
  Ubuntu 10.04 LTS, 32-bit and 64-bit
Windows XP SP3 + Cygwin 1.7.7
You need a Java Development Kit installed.  A Java Virtual Machine
with no Java compiler (javac) is not enough: a compiler is needed for
the Java source files used in the "./init.sh output" step above to
create the input files.  You also need a JVM to run Leiningen, if you
use that.
Tested with recent Java 1.6.0.X HotSpot JVMs from Sun, and JRockit
from Oracle on Windows XP.
----------------------------------------------------------------------
Install instructions specific to Mac OS X
----------------------------------------------------------------------
The executable program bin/timemem-darwin must be run as root on Mac
OS X 10.6 (Snow Leopard). This is not needed for 10.5 (Leopard). See
below for more details if you are curious. One way to do this is to
make the program setuid root by running the commands below, entering
your password when prompted:
% sudo chown root bin/timemem-darwin
% sudo chmod 4555 bin/timemem-darwin
The following MacPorts packages can be useful. There may be
equivalents under Homebrew -- I haven't tried them.
gmp - If you want to run those pidigits programs that use the GNU MP
library.
git-core - optional, but useful for getting updated versions of
clojure-contrib from github.com
gcc44 - Big, takes a while to install, and only useful if you want to
compile some of the parallel C memory bandwidth benchmarking
programs in src/stream.
p5-xml-libxml - Required if you have installed some version of perl
within MacPorts (or more likely, it was installed as a dependency
required by something else you did choose to install). You can
tell if the output of 'which perl' shows /opt/local/bin/perl or
similar. Without this package, using a MacPorts-installed Perl
will likely result in error messages like the one below when you
try to run programs with measureproc (invoked by batch.sh):
----------------------------------------------------------------------
Can't locate XML/LibXML.pm in @INC (@INC contains: /opt/local/lib/perl5/site_perl/5.12.3/darwin-multi-2level /opt/local/lib/perl5/site_perl/5.12.3 /opt/local/lib/perl5/vendor_perl/5.12.3/darwin-multi-2level /opt/local/lib/perl5/vendor_perl/5.12.3 /opt/local/lib/perl5/5.12.3/darwin-multi-2level /opt/local/lib/perl5/5.12.3 /opt/local/lib/perl5/site_perl /opt/local/lib/perl5/vendor_perl .) at ../bin/measureproc line 22.
BEGIN failed--compilation aborted at ../bin/measureproc line 22.
----------------------------------------------------------------------
You should have a Sun/Oracle/Apple Hotspot JDK installed by default
already with java* commands in /usr/bin and some other support files
(headers for JNI, etc.) in subdirectories beneath
/System/Library/Frameworks/JavaVM.framework
More details on bin/timemem-darwin root permissions:
It needs this permission in order to measure the memory usage of the
benchmark programs, for the same reasons that top(1) and ps(1) need to
run as root. You can examine the source code of the program in
src/timemem-darwin/timemem-darwin.c if you want to know what it is
doing. The only thing it needs root permission for is using Mach OS
calls to get the current memory usage of a child process. It seems
getrusage() and wait4() system calls no longer measure the maximum
resident set size of a child process in OS X 10.6, like they used to
in 10.5. I've filed a bug on Apple's developer web site in January
2011, but this behavior change may be intentional for all I know.
----------------------------------------------------------------------
Install instructions specific to Ubuntu Linux 10.04
----------------------------------------------------------------------
If you want the Sun/Oracle JDK, not OpenJDK (which is the default on
Ubuntu as of 10.04, or perhaps even earlier versions of Ubuntu),
follow these steps:
% sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
After that, you can either run the following commands, or use the
graphical Synaptic Package Manager to add the sun-java6-jdk package.
% sudo apt-get update
% sudo apt-get install sun-java6-jdk
The following packages are also useful to add on Ubuntu Linux:
libgmp3-dev - If you want to run those pidigits programs that use the
GNU MP library.
git-core - optional, but useful for getting updated versions of
clojure-contrib from github.com
Source:
http://happy-coding.com/install-sun-java6-jdk-on-ubuntu-10-04-lucid/
----------------------------------------------------------------------
Install instructions specific to Windows+Cygwin
----------------------------------------------------------------------
Cygwin can be obtained from http://www.cygwin.com
Besides the basic packages added with every Cygwin installation, you
must install at least some of the following for this software to work:
perl (category Interpreters and Perl) and libxml2 (many categories,
including Devel) - needed if you want to run any of the included
Perl scripts, which includes bin/measureproc, the recommended way
to make time and memory measurements of the benchmark programs.
wget (in category Web) - needed if you use Leiningen
git (category Devel) - optional, but useful for getting updated
versions of clojure-contrib from github.com
Edit env.sh so that JAVA_BIN points at the directory where your java
and javac commands are installed. Examples are there already for the
default install locations of JRockit and Hotspot JVMs (at least
particular version numbers of them). If you install the Sun/Oracle
Hotspot JDK, its default location is C:\Program
Files\Java\jdk<version>. For Oracle JRockit the default install
location is C:\Program Files\Java\jrmc-<version>.
I've tried to make all bash scripts handle spaces in filenames by
using "${BASH_VARIABLE_NAME}" syntax to substitute the values of shell
variables. There is some weirdness in some places in the scripts
because Cygwin commands accept paths in Unix notation with /
separators, but when invoking the JVM commands, they often require
Windows path names with \ separators. The Cygwin command 'cygpath' is
useful in converting between these two formats.
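A minimal sketch of both points, using a hypothetical temporary path
(not a path from this repo).  The cygpath call is guarded, since
cygpath exists only under Cygwin:

```shell
# Quoting "${VAR}" keeps a space-containing path as one argument.
BASE=$(mktemp -d)
DIR="${BASE}/path with spaces"
mkdir -p "${DIR}"        # quoted: creates exactly one directory
ls -d "${DIR}"           # quoted again when the path is passed along
if command -v cygpath >/dev/null 2>&1; then
  cygpath -w "${DIR}"    # Unix form -> Windows form with \ separators
fi
```

Without the quotes, the shell would split the value on whitespace and
mkdir would create three directories instead of one.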
Caveat: I don't yet know how to compile the JNI interface to the GMP
library so that the Java, Scala, or Clojure programs using that native
library can run. The gcc4-core (category Devel) and libgmp-devel
(category Libs and Math) Cygwin packages can certainly help in
compiling the files, but so far I haven't compiled them in a way that
they can be used by a JVM running on Windows. Please let me know if
you figure out how to do this.
----------------------------------------------------------------------
The Perl, SBCL, and Haskell programs have been well tested on Mac OS
X, and more lightly tested on Linux and Windows. The main issue is
getting the relatively recent versions of SBCL and GHC, and some
benchmark programs require additional libraries on top of that
(e.g. for regex matching).
The non-Clojure versions of these benchmark programs have been
downloaded from this web site:
http://shootout.alioth.debian.org
See the file COPYING for the licenses under which these files are
distributed.
So far I have Clojure implementations for the benchmarks noted in the
table below.
                     Clojure     Easy to      Clojure
Benchmark            program     parallel-    program is
                     written?    ize?         parallelized?
-------------------  ----------  -----------  -------------
binarytrees          yes         yes          yes
chameneos-redux      -           yes          -
-------------------  ----------  -----------  -------------
fannkuch             yes         yes          yes
  (^^^ deprecated on shootout web site)
fannkuchredux        yes         yes          yes
fasta                yes         no           no
-------------------  ----------  -----------  -------------
knucleotide          yes         yes          yes
mandelbrot           yes         yes          yes
meteor-contest       -           ?            -
-------------------  ----------  -----------  -------------
nbody                yes         no           no
pidigits             yes         no           no
regexdna             yes         part of it   yes
-------------------  ----------  -----------  -------------
revcomp              yes         no           no
spectralnorm         yes         yes          yes
thread-ring          -           yes          -
-------------------  ----------  -----------  -------------
I have hacked up some shell scripts to automate the compilation and
running of some of these programs. They have been tested on Mac OS X
10.5.8 and 10.6.4 with recent versions of the Glasgow Haskell Compiler
ghc, SBCL Common Lisp, Perl, several versions of Sun's Java VM (for
Windows XP, Linux, and Mac), Oracle's JRockit JVM for Windows, and
Clojure 1.2.0 and 1.3.0 alpha1 (and earlier 1.0.0 or shortly before
that release).  Some of the benchmarks use Clojure transients and thus
require 1.1 or later, and some use deftype, requiring Clojure 1.2.0 or
later (or whenever deftype was introduced). See below for the steps I
used to install SBCL and ghc.
You should edit the file env.sh to point at the location of your
Clojure JAR files, and perhaps give explicit path names to some of the
other language implementations, if you wish.
If you want to create Clojure JAR files in the locations already
specified in the file env.sh, you can install Leiningen and then run
the following script. Doing so will create directories named 'lein'
and '.m2' in your home directory, if they do not already exist, and
fill them with a few dozen files each.
Note: This step is optional, but if you do not do it, you must either
edit env.sh to point at your Clojure JAR files, or put Clojure JAR
files in the locations mentioned in that file.
% ./lein-init.sh
The knucleotide, regexdna, and revcomp benchmark input files are quite
large (250 Mbytes), and not included in the distribution. These input
files must be generated by running this shell script:
% ./init.sh
That will generate the input files only. You can also choose to
generate those plus the "expected output files" (which are just the
output of the Java versions of the benchmark programs) by running
this:
% ./init.sh output
If you have the input files generated, you can run all of the
implementations of the knucleotide benchmark, with the quick, medium,
and long test input sizes, using these commands:
% cd knucleotide
% ./batch.sh
You can also pick and choose which benchmark lengths to run, and which
language implementations to run, by giving options like these on the
command line:
% ./batch.sh java clj-1.2 clj-1.3-alpha1 quick medium
That would run the Java, Clojure 1.2, and Clojure 1.3 alpha1 versions
of the benchmark, each with the quick and medium size input files.
Note that the order of the command line arguments is not important.
The following command would run only the Clojure 1.2 version with the
long input file:
% ./batch.sh clj-1.2 long
You can also run the following command from the root directory of this
package, and it will run a batch.sh command with the same command line
arguments in each of the benchmark subdirectories, e.g.
./run-all.sh long java clj-1.2
will run the long benchmark for Java and Clojure 1.2 in all of the
subdirectories mentioned in the 'for' line you can see for yourself in
the run-all.sh script.
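The dispatch pattern described above can be sketched roughly as
follows.  The directory names here are placeholders, not the actual
list hard-coded in run-all.sh:

```shell
# Run the same batch.sh arguments in each benchmark subdirectory
# (directory names below are examples only).
run_all() {
  for dir in binarytrees fannkuchredux knucleotide; do
    if [ -x "$dir/batch.sh" ]; then
      ( cd "$dir" && ./batch.sh "$@" )   # subshell: cwd is restored after
    fi
  done
}
```

Running it as `run_all long java clj-1.2` would forward those
arguments unchanged to each subdirectory's batch.sh.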
Note that all of the shootout web site results are for what are called
the 'long' benchmarks in this package. The short and medium tests are
primarily for quicker cycling through the edit-compile-run-debug loop,
when you are developing a benchmark program yourself.
If you find any improvements to the Clojure versions, I'd love to hear
about them.
The files RESULTS-clj-1.1 and RESULTS-clj-1.2 in the results directory
contain some summarized execution times from running these programs
on my home iMac.  The file results-java-clj-1.2-1.3a1.xls is an Excel
spreadsheet containing run time results for several different JVMs and
operating systems, all on the same hardware as each other (but not the
same hardware as the RESULTS-clj-1.1 and RESULTS-clj-1.2 files above,
so do not compare results across these files directly).
Andy Fingerhut
andy_fingerhut@alum.wustl.edu
----------------------------------------------------------------------
On a Mac OS X machine (tested on 10.5.8 and 10.6.4 at least), download
and install MacPorts from here:
http://www.macports.org
After following the instructions there for installing MacPorts, you
can install the Glasgow Haskell compiler with the command:
sudo port install ghc
And SBCL with the threads (i.e. multi-threading) option enabled with
this command:
sudo port install sbcl@+threads
Here are the versions I currently have installed, which were used to
produce some of my benchmark results:
% port installed ghc sbcl
The following ports are currently installed:
ghc @6.10.1_8+darwin_9_i386 (active)
sbcl @1.0.24_0+darwin_9_i386+html+test+threads (active)
% java -version
java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03-211)
Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02-83, mixed mode)
% javac -version
javac 1.6.0_13
% sbcl --version
SBCL 1.0.29
% ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.10.1
----------------------------------------------------------------------
I've also done some testing of this set of scripts on an Ubuntu 10.04
Desktop i386 installation, in a VMWare Fusion virtual machine running
on my Mac (and earlier tested on Ubuntu 9.04).
I've tried it with these versions of packages installed using Ubuntu's
Synaptic Package Manager.
sun-java6-jdk 6-14-0ubuntu1.9.04
sbcl 1.0.42 (and earlier tested with 1.0.18.0-2)
ghc 6.8.2dfsg1-1ubuntu1
% java -version
java version "1.6.0_14"
Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
Java HotSpot(TM) Client VM (build 14.0-b16, mixed mode, sharing)
% javac -version
javac 1.6.0_14
% sbcl --version
SBCL 1.0.18.debian
% ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.8.2
Apparently the Haskell knucleotide program requires GHC 6.10 or
later, and gives a compilation error with GHC 6.8.x, which is
currently the latest available through Ubuntu's distribution system.
If you
really want to run that benchmark, you could try installing GHC 6.10
yourself. This web page may provide the right recipe for doing so. I
haven't tried it myself.
http://www.johnmacfarlane.net/Gitit%20on%20Ubuntu
----------------------------------------------------------------------
Some features of Clojure that are definitely slow, and whether these
Clojure programs use those features
----------------------------------------------------------------------
The following notes come from a reply to a question on Stack Overflow
about why the Clojure programs are slower than the Scala programs.
http://stackoverflow.com/questions/4148382/why-does-clojure-do-worse-than-scala-in-the-alioth-benchmarks
Note that significant speed improvements were made to many of the
Clojure programs on the Computer Language Benchmarks Game web site
after this Stack Overflow discussion occurred, so what the discussion
participants saw then was noticeably slower (relative to Java 6
-server) than what is on the web site now.
----------------------------------------
For example the following features in Clojure are all very cool and
useful for development convenience, but incur some runtime performance
overhead:
* Lazy sequences and lists
* Dynamic Java interoperability using reflection
* Runtime function composition / first class functions
* Multimethods / dynamic dispatch
* Dynamic compilation with eval or on the REPL
* BigInteger arithmetic
If you want absolute maximum performance (at the cost of some extra
complexity), you would want to rewrite code to avoid these and use
things like:
* Static type hinting (to avoid reflection)
* Transients
* Macros (for compile time code manipulation)
* Protocols
* Java primitives and arrays
* loop / recur for iteration
----------------------------------------
I have attempted to categorize which Clojure programs in this
collection use the features above, and which do not, in an Excel
spreadsheet. See the file results/clojure-features-used.xls. This
spreadsheet should be easily readable using Open Office or one of its
derivatives, as no "fancy features" are used in it.  It is simply a
more convenient way for me to record this information in table form,
and to later rearrange and view it in different ways, than a text
file would be.
----------------------------------------------------------------------
Do these programs fall into a microbenchmark pitfall?
----------------------------------------------------------------------
Cliff Click's slides for his talk "How NOT To Write A Microbenchmark",
given at JavaOne 2002 (Sun's 2002 Worldwide Java Developer
Conference).
http://www.azulsystems.com/events/javaone_2002/microbenchmarks.pdf
I mention this because (1) it is worth reading for anyone interested
in benchmarks, to avoid pitfalls, and (2) so I can make a list of
pitfalls mentioned in there, and for each Clojure program describe
whether I believe it has fallen into that pitfall.
Are the shootout benchmark programs microbenchmarks? Dr. Click gives
his definition on p. 8, and several examples throughout his talk. The
shootout benchmark programs are relatively small, and in some cases
the datasets are "large" (i.e. hundreds of megabytes, but none are a
gigabyte). However, there is a significant difference between these
benchmarks vs. the examples in Dr. Click's slides: the goal with these
benchmarks is to measure the entire run time of the whole program, not
some smaller part of it. We aren't trying to see whether a particular
function call takes 7 microsec versus 15 microsec. We are trying to
find out how much time is required for all of the calls made together
during the entire run, and find a program that is as fast as possible
for that total time.
Now it is true that for most of these programs, 99% of the time is
spent in a relatively small fraction of the lines of code.  For example,
in knucleotide.clj-8b.clj, almost all of the compute time is spent in
the last 13 lines of the function tally-dna-subs-with-len. In
nbody.clj-12.clj, most of the time is spent in the last 10 lines of
the function advance, which calls the 5-line p-dt! and the 18-line
v-dt!, which in turn calls the 4-line v+!, for a total of 27 lines of
"hot code" in a 252-line file.
One piece of advice from Dr. Click's slides that is most relevant is
"Know what you test" (title of p. 32). These shootout benchmark
programs are not intended to discover some particular fact, and remove
all other variables, _unless_ your goal is to discover how well this
particular program performs on the task at hand. For example, the
knucleotide benchmark programs do not measure only the performance of
the hash table implementation (because there is at least some work in
doing other things, like reading in a file that is hundreds of
megabytes long), but that is certainly the most significant portion of
the program's run time.
In general, when comparing Java and Clojure run times, or any two
languages implemented on the Java Virtual Machine in this collection
of benchmarks, realize that the following run times are _included_ in
the times that you see reported:
JVM initialization time
loading classes from compiled class files
time spent reading the input data file (if any) and writing output
any time spent doing JIT compilation
any time spent performing garbage collection
The following times are _not_ included in the times you see reported:
Compiling source files to class files
All of the Clojure programs are ahead of time (or AOT) compiled, and
compiled class files are saved on disk, before the measurement clock
is started.
Is this fair? In the sense that the same kinds of processing time are
included in measurements of both Java and Clojure, it is fair. Is it
what _you_ personally want to measure? That depends upon what you
want to measure. You are welcome to measure other things about these
programs if you wish, and report the times you get.
Because the goal here is to measure the entire run time of these
programs, "warming up the JVM" via JIT compilation is all included.
There is no reason (with this goal) to first run the JVM until it is
warmed up, and then run "the real test". Warmup is part of the real
test.
In all of these programs, there should be little or no dead code to be
eliminated. All results calculated are printed out in some form.
Sometimes only a small summary of the calculated results is
actually printed, but it is always intended to be a summary such that
no known compiler is "smart" enough to figure out that it could
reduce the processing required in order to print the desired
results.  Could a human do that?  In some cases, it is pretty
easy for a human to change the program to produce the same output in a
similar way. For example, in the knucleotide benchmark the program
must print out the number of occurrences of the substring "GGT" in a
large input string. However, the instructions for the benchmark
require that first a table is made that maps _all_ length 3 substrings
of the input string to a count of the number of occurrences, then
extract out the entry for "GGT" and print its count. Obviously a
person can change the program to reduce the time and space required if
the goal is only to calculate counts for the substring "GGT", but it
is breaking the rules of the benchmark to do so.
Cliff Click's advice (summarized greatly):
+ Is there loop unrolling going on behind the scenes? (pp. 17-18)
+ "Beware 'hidden' Cache Blowout" - Does the data set fit into L1
  cache? (pp. 28-29)
+ Explicitly Handle GC: Either don't allocate in main loops, so no GC
pauses, or run long enough to reach GC steady state. (p. 30) TBD:
Consider adding instrumentation to measure total GC time in each run
involving a JVM.
+ Know what you test (p. 32) - use profiler to measure where most of
the time is spent, and report that with the results.
+ Be aware of system load - How to verify total CPU time taken by all
other processes on the system? I can eyeball it if I watch the
Activity Monitor on a Mac, or top on Linux, Windows Task Manager on
Windows, but is there an automated way to easily determine the total
CPU time used by all other processes during a particular time
interval?
+ Be aware of clock granularity (p. 33): This should not be an issue
since all of the Clojure programs run for at least 10 sec, and clock
granularity is definitely better than 0.1 sec. (TBD: Verify these
numbers)
+ Run to steady state, at least 10 sec (p. 33): In many of these
programs, each iteration is doing something a bit different, but the
general idea of running at least 10 sec is a good one.
+ Fixed size datasets (p. 34): Every run with input data has the same
input file. For each one, state the size of the input data that
actually needs to be "worked on".
+ Constant amount of work per iteration (p. 34): This is not really
relevant for these benchmarks, if we are only paying attention to
the total time to complete the benchmark run. It is only relevant
if we are trying to calculate a time per iteration of some loop,
which we are not.
+ Avoid the 'eqntott Syndrome' (p. 34): Again, we are not looking at
run times of separate iterations and comparing them to each other.
There is some warmup time of doing JIT compilation in all of these
runs, but it is there every single time, and it is there for Clojure
_and_ Java, and any other JVM-based language implementation.
+ Avoid 'dead' loops (p. 35): The final answer is always printed, and
the computation is non-trivial in all cases I can recall. Good to
verify this for each benchmark program individually.
+ Be explicit about GC (p. 36): Does this mean to measure it and
report it separately?
+ JIT performance may change over time. Warmup loop + test code
before ANY timing (p. 36): This is relevant if we want to measure
the time per iteration of some loop. That is not the goal for these
benchmark programs. We only care about the time to get the final
result.
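On the open question above about system load: one possible automated
approach (a hedged sketch, assuming a ps whose TIME column is in
[[dd-]hh:]mm:ss form, as on common Linux and Mac implementations) is
to sum the accumulated CPU time of every process before and after a
benchmark run:

```shell
# Sum accumulated CPU seconds across all processes, via ps.  Sample
# before and after a run; the difference (minus the benchmark's own
# share) estimates CPU time used by everything else on the system.
total_cpu_seconds() {
  ps -eo time= | awk '
    {
      t = $1; days = 0
      if (index(t, "-") > 0) { split(t, p, "-"); days = p[1]; t = p[2] }
      n = split(t, a, ":"); secs = 0
      for (i = 1; i <= n; i++) secs = secs * 60 + a[i]
      total += secs + days * 86400
    }
    END { printf "%d\n", total }'
}
before=$(total_cpu_seconds)
# ... run the benchmark here ...
after=$(total_cpu_seconds)
echo "CPU seconds accumulated system-wide in the interval: $((after - before))"
```

This only counts processes still alive at each sample, so short-lived
processes that start and exit between the two samples are missed; it
is an estimate, not an exact accounting.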
----------------------------------------------------------------------
Profiling results
----------------------------------------------------------------------
TBD: Record a summary for the fastest known Clojure programs, or maybe
all of them, where they spend the most time, i.e. the call stacks that
(combined) take up at least half of the program execution time,
according to a Java profiler.