Add support for confidence intervals #69

chrisseaton · 2016-05-16T13:19:09Z

Current output:

Warming up --------------------------------------
                 mul   243.455k i/100ms
                 pow   274.906k i/100ms
Calculating -------------------------------------
                 mul      5.635M (± 8.4%) i/s -     27.997M in   5.014078s
                 pow      7.674M (± 5.1%) i/s -     38.487M in   5.029545s

Comparison:
                 pow:  7674355.1 i/s
                 mul:  5635018.0 i/s - 1.36x  slower

With confidence intervals:

Warming up --------------------------------------
                 mul   244.458k i/100ms
                 pow   282.247k i/100ms
Calculating -------------------------------------
                 mul      5.996M (± 0.6%) i/s -     30.068M in   5.020666s
                 pow      8.035M (± 1.1%) i/s -     39.797M in   5.000516s
                   with 95.0% confidence

Comparison:
                 pow:  8034594.6 i/s
                 mul:  5996097.6 i/s - 1.34x  (± 0.02) slower
                   with 95.0% confidence

Why are confidence intervals good? The standard deviation isn't really actionable. If I tell you something is plus/minus X SD, what can you do with that? If I tell you something is plus/minus X and I'm 95% confident about that, then you can theoretically use that in a quantitative assessment of the risk and cost of being wrong, and use that to make a decision. The standard deviation also isn't parametric: you can't make it smaller if you want more certainty, or larger if you are more relaxed.
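
To make the idea concrete, here is a minimal sketch of a percentile bootstrap confidence interval for the mean of a set of iterations-per-second samples. It is a generic illustration, not the algorithm the kalibera gem implements; the method name `bootstrap_mean_ci`, its parameters, and the sample values are all assumptions made up for this example.

```ruby
# Percentile bootstrap confidence interval for the mean of a sample.
# Generic sketch only; this is NOT the kalibera gem's implementation.
def bootstrap_mean_ci(samples, confidence: 0.95, iterations: 2_000, rng: Random.new(42))
  raise ArgumentError, "need at least one sample" if samples.empty?

  n = samples.size
  means = Array.new(iterations) do
    # Resample n values with replacement and record the mean of the resample.
    total = 0.0
    n.times { total += samples[rng.rand(n)] }
    total / n
  end
  means.sort!

  # Cut off (1 - confidence) / 2 of the resampled means at each end.
  alpha = (1.0 - confidence) / 2.0
  lower = means[(alpha * (iterations - 1)).floor]
  upper = means[((1.0 - alpha) * (iterations - 1)).ceil]
  [lower, upper]
end

# Hypothetical i/s samples, invented for illustration.
ips_samples = [5.61e6, 5.70e6, 5.58e6, 5.66e6, 5.63e6, 5.72e6, 5.59e6, 5.65e6]
mean = ips_samples.reduce(:+) / ips_samples.size
lower, upper = bootstrap_mean_ci(ips_samples)
puts format("mean %.3fM i/s, 95%% CI [%.3fM, %.3fM]", mean / 1e6, lower / 1e6, upper / 1e6)
```

Unlike a standard deviation, the `confidence` argument is a knob: asking for a different confidence level changes the width of the reported interval.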

Another big benefit is that we can show a confidence interval for the comparison as well! This isn't something that is possible at the moment.
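
As a rough illustration of an interval on the comparison itself, the sketch below bootstraps the ratio of two means. It assumes a plain percentile bootstrap, which may differ from what the kalibera gem does internally; `bootstrap_ratio_ci` and the sample arrays are hypothetical names and made-up data.

```ruby
# Percentile bootstrap confidence interval for the ratio mean(fast) / mean(slow).
# Generic sketch only; this is NOT the kalibera gem's implementation.
def bootstrap_ratio_ci(fast, slow, confidence: 0.95, iterations: 2_000, rng: Random.new(42))
  raise ArgumentError, "both sample sets must be non-empty" if fast.empty? || slow.empty?

  # Mean of one resample (with replacement) of the given samples.
  resampled_mean = lambda do |samples|
    total = 0.0
    samples.size.times { total += samples[rng.rand(samples.size)] }
    total / samples.size
  end

  # Resample both sides independently and record the ratio each time.
  ratios = Array.new(iterations) { resampled_mean.call(fast) / resampled_mean.call(slow) }.sort

  alpha = (1.0 - confidence) / 2.0
  [ratios[(alpha * (iterations - 1)).floor], ratios[((1.0 - alpha) * (iterations - 1)).ceil]]
end

# Hypothetical i/s samples, invented for illustration.
pow_ips = [8.01e6, 8.10e6, 7.95e6, 8.06e6, 8.03e6, 7.98e6]
mul_ips = [5.98e6, 6.02e6, 5.95e6, 6.04e6, 5.99e6, 6.00e6]

ratio = (pow_ips.reduce(:+) / pow_ips.size) / (mul_ips.reduce(:+) / mul_ips.size)
ratio_lower, ratio_upper = bootstrap_ratio_ci(pow_ips, mul_ips)
puts format("%.2fx slower, 95%% CI [%.2fx, %.2fx]", ratio, ratio_lower, ratio_upper)
```

If the whole interval sits above 1.0, the slowdown is unlikely to be noise at the chosen confidence level.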

Finally, I think the standard deviation is overly conservative, and confidence intervals are smaller in practice. From experience using benchmark-ips, the standard deviations we currently use are not useful because they're so large.

Adds an optional dependency on the kalibera gem.

@thedarkone what do you think?

thedarkone · 2016-05-16T14:49:33Z

> @thedarkone what do you think?

Haha, you make my PR #68 look like something coming from a village idiot 😆.

This is how it handles my bench from #68:

Warming up --------------------------------------
 work(1000 + r(500))    15.000  i/100ms
 work(1000 + r(400))    16.000  i/100ms
Calculating -------------------------------------
 work(1000 + r(500))    155.906  (± 0.9%) i/s -    780.000  in   5.008610s
 work(1000 + r(400))    161.551  (± 0.7%) i/s -    816.000  in   5.053923s
                   with 95.0% confidence

Comparison:
 work(1000 + r(400)):      161.6 i/s
 work(1000 + r(500)):      155.9 i/s - 1.04x  (± 0.01) slower
                   with 95.0% confidence

Awesome!

chrisseaton · 2016-05-16T15:52:25Z

Here are some slides from the Benchmarking '16 conference about the confidence intervals I'm using: http://soft-dev.org/events/bench16/slides/Tomas_Kalibera.pdf

chrisseaton · 2016-05-16T19:20:13Z

And this is what confidence intervals look like compared to the SD for the ERB benchmark from MRI.

chrisseaton · 2016-07-27T20:10:39Z

@evanphx it looks like this is now failing CI because master is failing (I just merged it in).

Besides that, do you have any opinions on the PR?

The key advantage is it gives you a CI for the speedup as well as the absolute measurement.

evanphx · 2016-07-28T04:32:35Z

Looks great! I think the key changes can end up on disk, but those are only saved briefly, so it shouldn't be an issue.

I'll check CI in the morning, but looks fine!

kbrock · 2016-08-04T03:44:41Z

Thanks @chrisseaton - this is great

chrisseaton added 13 commits April 29, 2016 23:29

Remove unused stats code.

17c27bf

Abstract the statistical model used for the central tendency and error.

0a141d4

Also abstract the code for the slowdown.

16f0e7b

Rename stddev_percentage to the generic error_percentage.

b42c3df

Basics of bootstrap confidence intervals.

bdea869

Errors on the comparison.

08f9ffe

Nice error if kalibera can't be loaded.

b87ee5c

More work on README.

943a488

Remove debug code left in by mistake.

500c97e

Example advanced configuration.

f3f26e8

Statistics footers.

c48b4cb

Fix tests I broke.

8e0d150

Older versions of Ruby need an encoding magic comment.

816806b

Merge branch 'master' into kalibera-confidence-intervals

c9eeb95

# Conflicts: # lib/benchmark/timing.rb

evanphx merged commit 8188aee into evanphx:master Aug 3, 2016

evanphx mentioned this pull request Aug 9, 2016

Configurable sampling frequency/duration #68

Closed

tgxworld mentioned this pull request Aug 12, 2016

API change without deprecation in Benchmark::IPS::Report #76

Closed
