
output median #171

Closed
hosewiejacke opened this issue Jun 1, 2019 · 10 comments · Fixed by #176
Labels: feature-request · question (Further information is requested)

Comments

hosewiejacke commented Jun 1, 2019

I'd like to measure the median instead of the mean. Currently, I have a Perl script that reads the JSON output and calculates the quartiles, but that's not very convenient. Do you think it would be good to output the median (instead of, or in addition to, the mean)?

Here are some points:

  • If cron fires during a benchmark, the mean might be off.
  • It seems many academic papers prefer the median.

PS: Thanks for hyperfine. It's much more convenient than time.

sharkdp added the question label on Jun 2, 2019
sharkdp (Owner) commented Jun 2, 2019

Thank you for your feedback!

> Do you think it would be good to output the median (instead of, or in addition to, the mean)?

I'm not sure. I want hyperfine to be a very easy-to-use tool. This also means that I would like the output to be easily readable and understandable.

> If cron fires during a benchmark, the mean might be off.

Yes. For this reason, hyperfine includes statistical outlier detection, which actually uses the median and the median absolute deviation (MAD) as robust estimators:

/// A module for statistical outlier detection.
///
/// References:
/// - Boris Iglewicz and David Hoaglin (1993), "Volume 16: How to Detect and Handle Outliers",
///   The ASQC Basic References in Quality Control: Statistical Techniques, Edward F. Mykytka,
///   Ph.D., Editor.
use statistical::median;

/// Minimum modified Z-score for a datapoint to be an outlier. Here, 1.4826 is a factor that
/// converts the MAD to an estimator for the standard deviation. The second factor is the number
/// of standard deviations.
pub const OUTLIER_THRESHOLD: f64 = 1.4826 * 10.0;

/// Compute modified Z-scores for a given sample. An (unmodified) Z-score is defined by
/// `(x_i - x_mean) / x_stddev`, whereas the modified Z-score is defined by
/// `(x_i - x_median) / MAD`, where MAD is the median absolute deviation.
///
/// References:
/// - <https://en.wikipedia.org/wiki/Median_absolute_deviation>
pub fn modified_zscores(xs: &[f64]) -> Vec<f64> {
    assert!(!xs.is_empty());

    // Compute the sample median:
    let x_median = median(xs);

    // Compute the absolute deviations from the median:
    let deviations: Vec<f64> = xs.iter().map(|x| (x - x_median).abs()).collect();

    // Compute the median absolute deviation (MAD):
    let mad = median(&deviations);

    // Compute the modified Z-scores, (x_i - x_median) / MAD:
    xs.iter().map(|&x| (x - x_median) / mad).collect()
}
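
As a quick illustration of how these scores can be used (a minimal sketch; the has_outliers helper below is hypothetical and not part of the hyperfine source):

/// Hypothetical helper (not in hyperfine): check whether any measurement's
/// modified Z-score exceeds the outlier threshold in either direction.
pub fn has_outliers(times: &[f64]) -> bool {
    modified_zscores(times)
        .iter()
        .any(|z| z.abs() > OUTLIER_THRESHOLD)
}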

If a cron job were to fire in the middle of a benchmark, the outlier detection would hopefully catch it. For example:

▶ hyperfine 'md5sum $(which hyperfine)' &; sleep 2 && stress --cpu 12 --timeout 1 --quiet
  Time (mean ± σ):      17.8 ms ±  19.4 ms    [User: 11.9 ms, System: 1.9 ms]
  Range (min … max):    12.2 ms … 122.0 ms    193 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark
  on a quiet PC without any interferences from other programs. It might help to
  use the '--warmup' or '--prepare' options.

Here, we run a benchmark for md5sum $(which hyperfine), but start 12 CPU stress workers after 2 seconds of benchmarking time. Not only does hyperfine warn you that there were statistical outliers; the problem is also easy to see from the large standard deviation and the wide min … max range. (The mean without the additional overhead should be around 13.4 ms.)
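
As a rough sanity check on those numbers (with a hypothetical split of the 193 runs, say about 186 quiet runs near 13.4 ms and 7 stressed runs near 122 ms): the mean works out to roughly (186 × 13.4 + 7 × 122) / 193 ≈ 17.3 ms, in the vicinity of the reported 17.8 ms, while the median would stay close to 13.4 ms.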

> It seems many academic papers prefer the median.

Interesting. Did you see any arguments for why the median is preferred over the mean? Is it just because it's robust against outliers? It would be great if you could include the references here.

sharkdp (Owner) commented Jun 2, 2019

Oh, also: did you see the scripts/ folder?

https://github.com/sharkdp/hyperfine/tree/master/scripts

For the benchmark example above, the advanced_statistics.py script produces this output:

▶ python scripts/advanced_statistics.py md5sum.json 
Command 'md5sum $(which hyperfine)'
  mean:      0.018 s
  stddev:    0.020 s
  median:    0.013 s

  percentiles:
     P_05 .. P_95:    0.013 s .. 0.017 s
     P_25 .. P_75:    0.013 s .. 0.014 s  (IQR = 0.002 s)

(granted, this demonstrates quite well that the median is a robust measure)
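
For readers who just want the gist of what the script computes, here is a minimal sketch of a percentile calculation over the measured times (illustrative only; the linear-interpolation scheme is an assumption, not necessarily what advanced_statistics.py implements):

// Illustrative sketch (not the actual script): median and percentiles
// via linear interpolation between the two closest ranks of sorted data.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = p / 100.0 * (sorted.len() - 1) as f64;
    let (lo, hi) = (rank.floor() as usize, rank.ceil() as usize);
    let frac = rank - lo as f64;
    sorted[lo] + frac * (sorted[hi] - sorted[lo])
}

fn main() {
    // Hypothetical sample of measured times in seconds.
    let mut times = vec![0.013, 0.014, 0.013, 0.122, 0.013, 0.014];
    times.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let median = percentile(&times, 50.0);
    let iqr = percentile(&times, 75.0) - percentile(&times, 25.0);
    println!("median: {:.3} s, IQR: {:.3} s", median, iqr);
}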

hosewiejacke (Author) commented Jun 3, 2019 via email

hosewiejacke (Author) commented Jun 3, 2019 via email

sharkdp (Owner) commented Jun 3, 2019

> I'd be fine with printing the median (or even all quartiles) only with --export-json or --export-csv.

That's certainly something we can implement 👍

> Yes, I often get the outlier warnings, e.g. with hyperfine --warmup 10 -r 100 "/bin/ls" "/usr/local/bin/exa". After quitting cron and most programs, and disabling CPU adjustment, there are fewer warnings, but they're still there.

When benchmarking I/O-heavy programs like ls, the biggest offenders for me are typically Spotify and Dropbox. If I close these two, I get far fewer outliers.

> I guess you can't completely get rid of them on a multitasking OS.

That's certainly true, yes.

> I liked this paper (Besser Benchmarken) as a general introduction: https://pp.info.uni-karlsruhe.de/uploads/publikationen/bechberger16bachelorarbeit.pdf
> This paper (The correct way to summarize benchmark results) argues against using the mean for normalized numbers: https://www.cse.unsw.edu.au/~cs9242/18/papers/Fleming_Wallace_86.pdf

Great, thank you for the references! I will have a look.

> Don't get me wrong: I'm not complaining about hyperfine. IMO, it's better than any other benchmarking tool as it is.

No worries, I didn't get you wrong. I'm very grateful for any feedback I can get!

If there are good reasons for showing the median instead of the mean (by default, in the terminal output), I'm also open to changing this behavior.

> I felt it may be worth doing it natively...

Adding the median is certainly something we can do within hyperfine, yes.

> (TBH, I try to avoid external dependencies in Python because of this: https://xkcd.com/1987/)

The scripts I provide are really meant more as a starting point for "power users" who want to analyze the benchmark results further. I also hate Python's package-management ecosystem, but what can I do? How would I avoid using something like matplotlib?

hosewiejacke (Author) commented Jun 4, 2019 via email

sharkdp (Owner) commented Jun 6, 2019

> For ordinary plots, gnuplot is not bad.

I use gnuplot a lot (it's a love-hate relationship), but I think that's arguably even worse than depending on matplotlib. I'm pretty sure most Windows users do not have gnuplot installed, for example.

sharkdp (Owner) commented Jun 8, 2019

Implemented in #176

sharkdp (Owner) commented Jun 8, 2019

Released in v1.6.0.

hosewiejacke (Author) commented Jun 8, 2019 via email
