
Getting Started with Core_bench


Core_bench is a micro-benchmarking library for OCaml that can measure the execution costs of operations taking anywhere from about 1ns to 100ms. It aims to measure such short-lived computations precisely, while accounting for delayed GC costs and noise introduced by other activity on the system.

The easiest way to get started is with an example:

open Core
open Core_bench

let main () =
  Random.self_init ();
  let x = Random.float 10.0 in
  let y = Random.float 10.0 in
  Command.run (Bench.make_command [
    Bench.Test.create ~name:"Float add" (fun () ->
      ignore (x +. y));
    Bench.Test.create ~name:"Float mul" (fun () ->
      ignore (x *. y));
    Bench.Test.create ~name:"Float div" (fun () ->
      ignore (x /. y));
  ])

let () = main ()
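
The wiki does not show a build setup; assuming the source file is named z.ml (matching the z.exe used below) and a dune-based project, a stanza along these lines should work. On recent Core releases, Command.run may have moved to the core_unix.command_unix library, which would then also need to be listed.

(executable
 (name z)
 (libraries core core_bench))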

Compiling this gives you an executable; running it produces:

$ ./z.exe 
Estimated testing time 30s (3 benchmarks x 10s). Change using -quota SECS.
┌───────────┬──────────┬─────────┬────────────┐
│ Name      │ Time/Run │ mWd/Run │ Percentage │
├───────────┼──────────┼─────────┼────────────┤
│ Float add │   2.53ns │   2.00w │     41.04% │
│ Float mul │   2.50ns │   2.00w │     40.63% │
│ Float div │   6.16ns │   2.00w │    100.00% │
└───────────┴──────────┴─────────┴────────────┘

If any of the functions resulted in allocation of words on the major heap (mjWd) or promotions, columns corresponding to those would be displayed automatically. In general, if a column does not have significant values, it is not displayed. The options one most commonly wants to change are the time quota for testing (the -quota or -q flag) and which columns are enabled or disabled.
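
For example, to run with a one-second quota and an explicit set of columns (the column names accepted on the command line are listed in the -? output further down; this invocation is a sketch rather than a copy of a real session):

$ ./z.exe -quota 1 time cycles alloc percentage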

In the simplest case, a benchmark is just a unit -> unit thunk and a name:

    Bench.Test.create ~name:"Float add" (fun () -> ignore (x +. y));

One can also create indexed benchmarks, which can be helpful in understanding non-linearities in the execution profiles of functions. For example:

open Core
open Core_bench

let main () =
  Command.run (Bench.make_command [
    Bench.Test.create_indexed
      ~name:"Array.create"
      ~args:[1;10;100;200;300;400]
      (fun len ->
         Staged.stage (fun () -> ignore(Array.create ~len 0)));
  ])

let () = main ()

which produces:

$ ./z.exe -q 3
Estimated testing time 18s (6 benchmarks x 3s). Change using -quota SECS.
┌──────────────────┬────────────┬─────────┬──────────┬────────────┐
│ Name             │   Time/Run │ mWd/Run │ mjWd/Run │ Percentage │
├──────────────────┼────────────┼─────────┼──────────┼────────────┤
│ Array.create:1   │    26.60ns │   2.00w │          │      0.99% │
│ Array.create:10  │    35.29ns │  11.00w │          │      1.31% │
│ Array.create:100 │   108.39ns │ 101.00w │          │      4.03% │
│ Array.create:200 │   178.45ns │ 201.00w │          │      6.64% │
│ Array.create:300 │ 1_996.86ns │         │  301.00w │     74.25% │
│ Array.create:400 │ 2_689.28ns │         │  401.00w │    100.00% │
└──────────────────┴────────────┴─────────┴──────────┴────────────┘
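
The Staged.stage wrapper lets the outer function do per-size setup that stays outside the timed region; only the inner thunk is measured. As a sketch of that pattern (this example is not from the wiki; it benchmarks Array.fold over arrays built during setup):

open Core
open Core_bench

let main () =
  Command.run (Bench.make_command [
    Bench.Test.create_indexed
      ~name:"Array.fold"
      ~args:[10; 100; 1000]
      (fun len ->
         (* Setup: build the array once per size, outside the timed thunk. *)
         let arr = Array.create ~len 0 in
         Staged.stage (fun () ->
           (* Only this thunk is measured. *)
           ignore (Array.fold arr ~init:0 ~f:(+))));
  ])

let () = main ()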

Core_bench produces self-documenting executables. This documentation closely corresponds to the functionality exposed through the .mli file and is a great way to interactively explore what the various options do. At the time of this writing, -? displays:

Benchmark for Float add, Float mul, Float div

  z.exe [COLUMN ...]

Columns that can be specified are:
	time       - Number of nano secs taken.
	cycles     - Number of CPU cycles (RDTSC) taken.
	alloc      - Allocation of major, minor and promoted words.
	gc         - Show major and minor collections per 1000 runs.
	percentage - Relative execution time as a percentage.
	speedup    - Relative execution cost as a speedup.
	samples    - Number of samples collected for profiling.

Columns with no significant values will not be displayed. The
following columns will be displayed by default:
	time alloc percentage

Error Estimates
===============
To display error estimates, prefix the column name (or
regression) with a '+'. Example +time.

(1) R^2 is the fraction of the variance of the responder (such as
runtime) that is accounted for by the predictors (such as number of
runs).  More informally, it describes how good a fit we're getting,
with R^2 = 1 indicating a perfect fit and R^2 = 0 indicating a
horrible fit. Also see:
http://en.wikipedia.org/wiki/Coefficient_of_determination

(2) Bootstrapping is used to compute 95% confidence intervals
for each estimate.

Because we expect runtime to be very highly correlated with number of
runs, values very close to 1 are typical; an R^2 value for 'time' that
is less than 0.99 should cause some suspicion, and a value less than
0.9 probably indicates either a shortage of data or that the data is
erroneous or peculiar in some way.

Specifying additional regressions
=================================
The built-in columns encode common analyses that apply to most
functions. Bench allows the user to specify custom analyses, to help
understand relationships specific to a particular function, using the
flag "-regression". It is worth noting that this feature requires
some understanding of both linear regression and how various quantities
relate to each other in the OCaml runtime.  To specify a regression
one must specify the responder variable and a comma-separated list
of predictor variables.

For example: +Time:Run,mjGC,Comp

which asks bench to estimate execution time using three predictors,
namely the number of runs, major GCs and compactions, and to display
error estimates. Drop the prefix '+' to suppress error estimation. The
variables available for regression include:
	Time  - Time
	Cycls - Cycles
	Run   - Runs per sampled batch
	mGC   - Minor Collections
	mjGC  - Major Collections
	Comp  - Compactions
	mWd   - Minor Words
	mjWd  - Major Words
	Prom  - Promoted Words
	One   - Constant predictor for estimating measurement overhead


=== flags ===

  [-all-values]         Show all column values, including very small ones.
  [-ascii]              Display data in simple ascii based tables.
  [-ci-absolute]        Display 95% confidence interval in absolute numbers
  [-clear-columns]      Don't display default columns. Only show user specified
                        ones.
  [-display STYLE]      Table style (short, tall, line, blank or column).
                        Default short.
  [-fork]               Fork and run each benchmark in separate child-process
  [-geometric SCALE]    Use geometric sampling. (default 1.01)
  [-linear INCREMENT]   Use linear sampling to explore number of runs, example
                        1.
  [-load FILE]          Analyze previously saved data files and
                        don't run tests. [-load] can be specified multiple
                        times.
  [-no-compactions]     Disable GC compactions.
  [-overheads]          Show measurement overheads, when applicable.
  [-quota SECS]         Time quota allowed per test (default 10s).
  [-reduced-bootstrap]  Reduce the number of bootstrapping iterations
  [-regression REGR]    Specify additional regressions (See -? help).
  [-save]               Save benchmark data to .txt files.
  [-stabilize-gc]       Stabilize GC between each sample capture.
  [-v]                  High verbosity level.
  [-width WIDTH]        width limit on column display (default 200).
  [-build-info]         print info about this build and exit
  [-version]            print the version of this build and exit
  [-help]               print this help text and exit
                        (alias: -?)
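
Putting a few of these options together: the invocation below (a sketch based on the help text above, not output from a real session) would use a five-second quota, display the time column with bootstrapped error estimates, and add the custom regression described earlier.

$ ./z.exe -quota 5 +time -regression +Time:Run,mjGC,Comp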