Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Divide by zero and other issues when detecting outliers #329

Closed
tdulcet opened this issue Oct 9, 2020 · 2 comments
Closed

Divide by zero and other issues when detecting outliers #329

tdulcet opened this issue Oct 9, 2020 · 2 comments
Labels
bug good first issue Good for newcomers

Comments

@tdulcet
Copy link

tdulcet commented Oct 9, 2020

When calculating the modified Z-scores, if many of the times are the same, the MAD will be zero (see here), which means this line will divide by zero:

xs.iter().map(|&x| (x - x_median) / mad).collect()

The above link has a simple example that you can add to your test suit to reproduce this issue.


Additionally, the test suite correctly takes the absolute value of the modified Z-scores:

.filter(|&&s| s.abs() > OUTLIER_THRESHOLD)

However, the program does not:
let scores = modified_zscores(&times_real);
if scores[0] > OUTLIER_THRESHOLD {
warnings.push(Warnings::SlowInitialRun(times_real[0]));
} else if scores.iter().any(|&s| s > OUTLIER_THRESHOLD) {

Which means that it is missing 50% of the outliers.

I found these issues while testing hyperfine and implementing my Bash port. Thank you for making this wonderful tool!

@sharkdp
Copy link
Owner

sharkdp commented Oct 9, 2020

Thank you very much for your feedback and for this detailed bug report!

which means this line will divide by zero

I guess this could be related to #319 as well, but I did not take a closer look yet. Definitely looks like a problem.

[…] However, the program does not […] Which means that it is missing 50% of the outliers.

Oh that is bad. The comment of the modified_zscores function also seems wrong because it includes the absolute value ("whereas the modified Z-score is defined by |x_i - x_median|/MAD"), which would mean that everything is fine. But the implementation returns (x_i - x_median) / MAD.

How could this have gone unnoticed for so long :-/. The only comfort is that the outliers which were not detected are "fast outliers" - which are quite uncommon in practice.

@sharkdp
Copy link
Owner

sharkdp commented Oct 16, 2020

Fixed in hyperfine v1.11.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants