
output median #171

Closed
hosewiejacke opened this issue Jun 1, 2019 · 10 comments · Fixed by #176
Labels: feature-request · question (Further information is requested)

Comments

hosewiejacke commented Jun 1, 2019

I'd like to measure the median instead of the mean. Currently, I have a Perl script that reads the JSON output and calculates the quartiles, but that's not very convenient. Do you think it would be good to output the median (instead of, or in addition to, the mean)?

Here are some points:

  • If cron fires during a benchmark, the mean might be off.
  • It seems many academic papers prefer the median.

PS: Thanks for hyperfine. It's much more convenient than time.

sharkdp added the question label on Jun 2, 2019
sharkdp (Owner) commented Jun 2, 2019

Thank you for your feedback!

> Do you think it would be good to output the median (instead of, or in addition to, the mean)?

I'm not sure. I want hyperfine to be a very easy-to-use tool. This also means that I would like the output to be easily readable and understandable.

> If cron fires during a benchmark, the mean might be off.

Yes. For this reason, hyperfine includes statistical outlier detection, which actually uses the median and the median absolute deviation (MAD) as robust estimators:

/// A module for statistical outlier detection.
///
/// References:
/// - Boris Iglewicz and David Hoaglin (1993), "Volume 16: How to Detect and Handle Outliers",
///   The ASQC Basic References in Quality Control: Statistical Techniques, Edward F. Mykytka,
///   Ph.D., Editor.
use statistical::median;

/// Minimum modified Z-score for a datapoint to be an outlier. Here, 1.4826 is a factor that
/// converts the MAD to an estimator for the standard deviation. The second factor is the number
/// of standard deviations.
pub const OUTLIER_THRESHOLD: f64 = 1.4826 * 10.0;

/// Compute modified Z-scores for a given sample. An (unmodified) Z-score is defined by
/// `(x_i - x_mean) / x_stddev`, whereas the modified Z-score is defined by
/// `(x_i - x_median) / MAD`, where MAD is the median absolute deviation.
///
/// References:
/// - <https://en.wikipedia.org/wiki/Median_absolute_deviation>
pub fn modified_zscores(xs: &[f64]) -> Vec<f64> {
    assert!(!xs.is_empty());

    // Compute the sample median:
    let x_median = median(xs);

    // Compute the absolute deviations from the median:
    let deviations: Vec<f64> = xs.iter().map(|x| (x - x_median).abs()).collect();

    // Compute the median absolute deviation (MAD):
    let mad = median(&deviations);

    // Compute the modified Z-scores, (x_i - x_median) / MAD:
    xs.iter().map(|&x| (x - x_median) / mad).collect()
}
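
As a quick illustration of how these scores can be used (a minimal sketch; the has_outliers helper below is hypothetical and not part of the hyperfine source):

/// Hypothetical helper (not in hyperfine): check whether any measurement's
/// modified Z-score exceeds the outlier threshold in either direction.
pub fn has_outliers(times: &[f64]) -> bool {
    modified_zscores(times)
        .iter()
        .any(|z| z.abs() > OUTLIER_THRESHOLD)
}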

If a cron job were to fire in the middle of a benchmark, the outlier detection would hopefully catch it. For example:

▶ hyperfine 'md5sum $(which hyperfine)' &; sleep 2 && stress --cpu 12 --timeout 1 --quiet
  Time (mean ± σ):      17.8 ms ±  19.4 ms    [User: 11.9 ms, System: 1.9 ms]
  Range (min … max):    12.2 ms … 122.0 ms    193 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark
  on a quiet PC without any interferences from other programs. It might help to
  use the '--warmup' or '--prepare' options.

Here, we run a benchmark for md5sum $(which hyperfine), but start 12 CPU stress workers after 2 seconds of benchmarking time. Not only does hyperfine warn you that there were statistical outliers; the problem is also easy to see from the large standard deviation and the wide min … max range. (The mean without the additional overhead should be around 13.4 ms.)
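
As a rough sanity check on those numbers (with a hypothetical split of the 193 runs, say about 186 quiet runs near 13.4 ms and 7 stressed runs near 122 ms): the mean works out to roughly (186 × 13.4 + 7 × 122) / 193 ≈ 17.3 ms, in the vicinity of the reported 17.8 ms, while the median would stay close to 13.4 ms.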

> It seems many academic papers prefer the median.

Interesting. Did you see any arguments for why the median is preferred over the mean? Is it just because it's robust against outliers? It would be great if you could include the references here.

sharkdp (Owner) commented Jun 2, 2019

Oh, also: did you see the scripts/ folder?

https://github.com/sharkdp/hyperfine/tree/master/scripts

For the benchmark example above, the advanced_statistics.py script produces this output:

▶ python scripts/advanced_statistics.py md5sum.json 
Command 'md5sum $(which hyperfine)'
  mean:      0.018 s
  stddev:    0.020 s
  median:    0.013 s

  percentiles:
     P_05 .. P_95:    0.013 s .. 0.017 s
     P_25 .. P_75:    0.013 s .. 0.014 s  (IQR = 0.002 s)

(granted, this demonstrates quite well that the median is a robust measure)
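
For readers who just want the gist of what the script computes, here is a minimal sketch of a percentile calculation over the measured times (illustrative only; the linear-interpolation scheme is an assumption, not necessarily what advanced_statistics.py implements):

// Illustrative sketch (not the actual script): median and percentiles
// via linear interpolation between the two closest ranks of sorted data.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = p / 100.0 * (sorted.len() - 1) as f64;
    let (lo, hi) = (rank.floor() as usize, rank.ceil() as usize);
    let frac = rank - lo as f64;
    sorted[lo] + frac * (sorted[hi] - sorted[lo])
}

fn main() {
    // Hypothetical sample of measured times in seconds.
    let mut times = vec![0.013, 0.014, 0.013, 0.122, 0.013, 0.014];
    times.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let median = percentile(&times, 50.0);
    let iqr = percentile(&times, 75.0) - percentile(&times, 25.0);
    println!("median: {:.3} s, IQR: {:.3} s", median, iqr);
}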

hosewiejacke (Author) commented Jun 3, 2019 via email

hosewiejacke (Author) commented Jun 3, 2019 via email

sharkdp (Owner) commented Jun 3, 2019

> I'd be fine with printing the median (or even all quartiles) only with --export-json or --export-csv.

That's certainly something we can implement 👍

> Yes, I often get the outlier warnings, e.g. with hyperfine --warmup 10 -r 100 "/bin/ls" "/usr/local/bin/exa". After quitting cron and most programs, and disabling CPU adjustment, there are fewer warnings, but they're still there.

When benchmarking I/O-heavy programs like ls, the biggest offenders for me are typically Spotify and Dropbox. If I close these two, I get far fewer outliers.

> I guess you can't completely get rid of them on a multitasking OS.

That's certainly true, yes.

> I liked this paper (Besser Benchmarken) as a general introduction: https://pp.info.uni-karlsruhe.de/uploads/publikationen/bechberger16bachelorarbeit.pdf
> This paper (The correct way to summarize benchmark results) argues against using the mean for normalized numbers: https://www.cse.unsw.edu.au/~cs9242/18/papers/Fleming_Wallace_86.pdf

Great, thank you for the references! I will have a look.

> Don't get me wrong: I'm not complaining about hyperfine. IMO, it's better than any other benchmarking tool as it is.

No worries, I didn't get you wrong. I'm very grateful for any feedback I can get!

If there are good reasons for showing the median instead of the mean (by default, in the terminal output), I'm also open to changing this behavior.

> I felt it may be worth doing it natively...

Adding the median is certainly something we can do within hyperfine, yes.

> (TBH, I try to avoid external dependencies in Python because of this: https://xkcd.com/1987/)

The scripts I provide are really meant more as a starting point for "power users" who want to analyze the benchmark results further. I also hate Python's package-management ecosystem, but what can I do? How would I avoid using something like matplotlib?

hosewiejacke (Author) commented Jun 4, 2019 via email

sharkdp (Owner) commented Jun 6, 2019

> For ordinary plots, gnuplot is not bad.

I use gnuplot a lot (it's a love-hate relationship), but I think that's arguably even worse than depending on matplotlib. I'm pretty sure most Windows users do not have gnuplot installed, for example.

sharkdp (Owner) commented Jun 8, 2019

Implemented in #176

sharkdp (Owner) commented Jun 8, 2019

Released in v1.6.0.

hosewiejacke (Author) commented Jun 8, 2019 via email
