One often wants to see a report that compares only two or just a few fuzzers. E.g., compare only
- afl with libfuzzer, or
- two different versions of the same fuzzer, or
- only fuzzers with dynamic binary instrumentation, etc.
This can be supported by adding a --fuzzers flag to the report generator, where users can list the fuzzers they would like the report to compare. We should allow specifying fuzzers with different versions, by either tagging them with their version number or perhaps with an experiment name. E.g., generate_report --fuzzers afl:v2.5 afl:v2.4, or generate_report --fuzzers afl:experiment-2020-01-15 afl:experiment-2020-02-15.
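A minimal sketch of how such a flag could be parsed, assuming the `name:tag` syntax from the examples above. `FuzzerSpec`, `parse_fuzzer_spec`, and `build_parser` are hypothetical names for illustration, not existing report-generator code, and whether the tag is read as a version or as an experiment name is left open here.

```python
import argparse
from typing import NamedTuple, Optional


class FuzzerSpec(NamedTuple):
    """One entry of the proposed --fuzzers flag (hypothetical type)."""
    name: str
    tag: Optional[str]  # version or experiment name; None if not given


def parse_fuzzer_spec(text):
    """Parse 'afl' or 'afl:v2.5' into a FuzzerSpec."""
    name, sep, tag = text.partition(':')
    # Reject an empty name ('' or ':v2.5') and a dangling colon ('afl:').
    if not name or (sep and not tag):
        raise argparse.ArgumentTypeError(f'invalid fuzzer spec: {text!r}')
    return FuzzerSpec(name, tag or None)


def build_parser():
    """Build a parser with only the proposed flag (assumed CLI shape)."""
    parser = argparse.ArgumentParser(prog='generate_report')
    parser.add_argument(
        '--fuzzers',
        nargs='+',
        type=parse_fuzzer_spec,
        default=None,  # None means: compare every fuzzer in the data
        help='fuzzers to compare, optionally tagged as name:tag')
    return parser


if __name__ == '__main__':
    args = build_parser().parse_args(['--fuzzers', 'afl:v2.5', 'afl:v2.4'])
    for spec in args.fuzzers:
        print(spec.name, spec.tag)
```

Leaving the default at `None` lets the report generator keep its current behavior of comparing everything when the flag is omitted.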
This is also useful to have because specifying a smaller or different subset of fuzzers can affect the result of the top-level statistical analysis (Friedman test, critical difference plot). Further, when we compare only two fuzzers, we can do more precise statistical tests with more specific visualizations. On the benchmark level, we don't need a pairwise comparison matrix of statistical significance, since we only have a single pair. On the experiment level, we don't need the Friedman/Nemenyi test (which compares more than two fuzzers and is rather conservative); we can use the Wilcoxon signed-rank test instead, which is specifically designed for comparing two things (i.e., matched samples for the different benchmarks).
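To make the two-fuzzer case concrete, here is a rough standard-library sketch of an exact two-sided Wilcoxon signed-rank test on matched per-benchmark scores. The function name `signed_rank_test` and the input numbers are made up for illustration and are not experiment results; in practice one would more likely call `scipy.stats.wilcoxon` than hand-roll this.

```python
from collections import Counter


def signed_rank_test(scores_a, scores_b):
    """Exact two-sided Wilcoxon signed-rank test on paired samples.

    Each position i holds the scores of the two fuzzers on benchmark i.
    Returns (statistic, p_value), where statistic = min(W+, W-).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError('samples must be paired')
    # Zero differences carry no sign information and are dropped.
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    if not diffs:
        raise ValueError('all paired differences are zero')

    # Rank |diff| ascending; ties get the average rank. Ranks are stored
    # doubled so that averaged ranks such as 2.5 stay integers.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    doubled_ranks = [0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order) and
               abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        for k in range(start, end + 1):
            doubled_ranks[order[k]] = start + end + 2  # 2 * average rank
        start = end + 1

    w_plus = sum(r for r, d in zip(doubled_ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(doubled_ranks, diffs) if d < 0)
    statistic = min(w_plus, w_minus)

    # Exact null distribution of W+: under the null hypothesis each rank
    # is positive or negative with probability 1/2, independently.
    counts = Counter({0: 1})
    for rank in doubled_ranks:
        updated = Counter()
        for total, ways in counts.items():
            updated[total] += ways         # this rank gets a minus sign
            updated[total + rank] += ways  # this rank gets a plus sign
        counts = updated
    tail = sum(ways for total, ways in counts.items() if total <= statistic)
    p_value = min(1.0, 2 * tail / 2 ** len(diffs))
    return statistic / 2, p_value


if __name__ == '__main__':
    # Made-up per-benchmark scores for two fuzzers (illustration only).
    fuzzer_a = [10, 12, 14, 16, 18, 20]
    fuzzer_b = [9, 10, 11, 12, 13, 14]
    print(signed_rank_test(fuzzer_a, fuzzer_b))
```

With only six benchmarks and every difference pointing the same way, the smallest achievable two-sided p-value is 2/64, so the number of benchmarks bounds how strong a conclusion this test can support.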
Different report when we're comparing only two things:
when we compare only two fuzzers, we can do more precise statistical tests with more specific visualizations. On the benchmark level, we don't need a pairwise comparison matrix of statistical significance, since we only have a single pair. On the experiment level, we don't need the Friedman/Nemenyi test (which compares more than two fuzzers and is rather conservative); we can use the Wilcoxon signed-rank test instead, which is specifically designed for comparing two things (i.e., matched samples for the different benchmarks).
Should we reopen this issue, or would you prefer creating separate issues for these?