Support specifying which fuzzers to include in the report #79

lszekeres · 2020-03-09T19:29:52Z

One often wants to see a report that compares only two or just a few fuzzers. E.g., compare only

afl with libfuzzer, or
two different versions of the same fuzzer, or
only fuzzers with dynamic binary instrumentation, etc.

This can be supported by adding a --fuzzers flag to the report generator, where users can list the fuzzers they would like the report to compare. We should allow specifying fuzzers with different versions, by either tagging them with their version number or perhaps with an experiment name. E.g., generate_report --fuzzers afl:v2.5 afl:v2.4, or generate_report --fuzzers afl:experiment-2020-01-15 afl:experiment-2020-02-15.
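To make the proposed syntax concrete, here is a minimal sketch of how `name:tag` arguments could be parsed with Python's standard `argparse`. This is not the actual `generate_report` code: `FuzzerSpec`, `parse_fuzzer_spec`, and `build_parser` are hypothetical names introduced only for illustration, and the tag is treated as an opaque string (a version or an experiment name).

```python
import argparse
from typing import NamedTuple, Optional


class FuzzerSpec(NamedTuple):
    """Hypothetical container for one --fuzzers entry."""
    name: str
    tag: Optional[str]  # version or experiment name; None when untagged


def parse_fuzzer_spec(text: str) -> FuzzerSpec:
    """Split 'afl:v2.5' into ('afl', 'v2.5'); a plain 'afl' has no tag."""
    name, sep, tag = text.partition(":")
    if not name or (sep and not tag):
        # argparse turns this into a normal usage error for the user.
        raise argparse.ArgumentTypeError(f"invalid fuzzer spec: {text!r}")
    return FuzzerSpec(name, tag or None)


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the proposed --fuzzers flag (assumed shape)."""
    parser = argparse.ArgumentParser(prog="generate_report")
    parser.add_argument(
        "--fuzzers",
        nargs="+",
        type=parse_fuzzer_spec,
        default=None,  # None means: include every fuzzer in the experiment
        help="fuzzers to compare, optionally tagged as name:tag",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--fuzzers", "afl:v2.5", "afl:v2.4"])
    for spec in args.fuzzers:
        print(spec.name, spec.tag)
```

Leaving the default as `None` keeps the current behavior (report on all fuzzers) when the flag is omitted.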

This is also useful because restricting the report to a smaller or different subset of fuzzers can change the result of the top-level statistical analysis (Friedman test, critical difference plot). Further, when we compare only two fuzzers, we can run more precise statistical tests with more specific visualizations. On the benchmark level, we don't need a pairwise comparison matrix of statistical significance, since there is only a single pair. On the experiment level, we don't need the Friedman/Nemenyi test (which compares more than two fuzzers and is rather conservative); instead we can use the Wilcoxon signed-rank test, which is specifically designed for comparing two things (i.e., matched samples across the different benchmarks).
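As a rough illustration of the two-fuzzer case, the sketch below computes an exact two-sided Wilcoxon signed-rank test on matched per-benchmark scores in plain Python. The function name and the sample numbers are made up for illustration, and the brute-force enumeration is only practical for a small number of benchmarks; a real report would more likely call `scipy.stats.wilcoxon`.

```python
from itertools import product


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    x[i] and y[i] are the scores of the two fuzzers on benchmark i.
    Returns (statistic, p_value) with statistic = min(W+, W-).
    Zero differences are dropped; tied |differences| get average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    stat = min(w_plus, w_minus)
    total = w_plus + w_minus

    # Under the null hypothesis every difference is equally likely to be
    # positive or negative, so enumerate all 2^n sign assignments.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= stat:
            extreme += 1
    return stat, extreme / 2 ** n


if __name__ == "__main__":
    # Hypothetical median coverage of two fuzzers on five benchmarks.
    fuzzer_a = [10, 12, 14, 16, 18]
    fuzzer_b = [9, 10, 11, 12, 13]
    print(wilcoxon_signed_rank(fuzzer_a, fuzzer_b))
```

With only five benchmarks the smallest achievable two-sided p-value is 2/32, so even a fuzzer that wins everywhere cannot reach p < 0.05; the test becomes informative only with more benchmarks.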


jonathanmetzman · 2020-03-25T19:24:58Z

This issue is fixed.

lszekeres · 2020-03-25T20:39:47Z

> This issue is fixed.

Not everything is fixed yet. There's also:

Support version/experiment tagging, e.g.,

--fuzzers afl:v2.5 afl:v2.4.

and

Different report when we're comparing only two things:

when we compare only two fuzzers, we can run more precise statistical tests with more specific visualizations. On the benchmark level, we don't need a pairwise comparison matrix of statistical significance, since there is only a single pair. On the experiment level, we don't need the Friedman/Nemenyi test (which compares more than two fuzzers and is rather conservative); instead we can use the Wilcoxon signed-rank test, which is specifically designed for comparing two things (i.e., matched samples across the different benchmarks).

Should we reopen this issue, or would you prefer creating separate issues for these?

jonathanmetzman · 2020-03-25T21:25:54Z

My bad, reopening is fine.

inferno-chromium added the enhancement label Mar 9, 2020

lszekeres mentioned this issue Mar 9, 2020

Support tracks to compare classes of fuzzers (binary-only, source-only, hybrids, etc) #76

Open

jonathanmetzman mentioned this issue Mar 9, 2020

[generate_report] Support fuzzer selection with -f option #80

Merged

jonathanmetzman closed this as completed Mar 25, 2020

jonathanmetzman reopened this Mar 25, 2020
