AFLplusplus declined in aggregate rank from 2020-03-09 to 2020-03-11 #113

jonathanmetzman · 2020-03-16T21:52:28Z

2020-03-11 was the first experiment we ran with the new AFL++ fuzzer.py.

The aggregate ranking in this report claims AFL++ does about the same as AFL, whereas older reports, such as 2020-03-09, consistently show otherwise.

Since AFL++ already used afl-clang-fast in 2020-03-09, the aggregate change could be related to any of the following changes in fuzzer.py:

Use of fidgety AFL.
Use of libdislocator instead of ASAN.
Use of laf-intel.
Use of instrim.

I think changes causing the slip fall into two broad categories.

The change indeed hurts performance and FuzzBench is correctly pointing this out.
(Possible for build-time changes like instrumentation.) FuzzBench's use of sancov means that new program behavior found using special instrumentation like laf-intel cannot be detected. The current metrics FuzzBench uses may be biased against these kinds of instrumentation.

I quickly eyeballed the two reports and put my observations about performance changes on different benchmarks in this table:

Benchmark	Change from 2020-03-09 to 2020-03-11
bloaty	decline
curl	significant decline
freetype	decline
irssi	significant decline (2nd->last place)
lcms	very significant decline
libjpeg-turbo	drop from top tier to next
libpcap	increase from bottom tier to top tier
libxml	decline from upper tier to lower
sqlite3_ossfuzz	big increase to join qsym in top tier

I think libpcap and sqlite3_ossfuzz are interesting success stories.

jonathanmetzman · 2020-03-16T21:55:29Z

CC @andreafioraldi

I think we can run an experiment where we turn off some of the 4 changes I mention above to find out what's responsible.
Obviously, if it is an issue of FuzzBench being biased, I will try to fix it (though I think we may need to start using crashes to do this).
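To make the "turn off some of the 4 changes" idea concrete, here is a minimal sketch of a leave-one-out set of variants. The labels and the `ablation_variants` helper are made up for illustration; they are not option names from fuzzer.py, and this is not how FuzzBench experiments are actually defined.

```python
# The four changes listed at the top of this issue. These labels are
# illustrative only; they are not identifiers taken from fuzzer.py.
CHANGES = ("fidgety", "libdislocator", "laf-intel", "instrim")


def ablation_variants(changes=CHANGES):
    """Return a baseline with every change on, plus one variant per change
    in which exactly that change is turned off (leave-one-out)."""
    variants = {"all-on": frozenset(changes)}
    for off in changes:
        variants["no-" + off] = frozenset(changes) - {off}
    return variants


if __name__ == "__main__":
    for name, enabled in ablation_variants().items():
        print(f"{name:18s} enabled: {', '.join(sorted(enabled))}")
```

Comparing each "no-X" variant against "all-on" on the same benchmarks would point at the change responsible, at the cost of five fuzzer configurations instead of one.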

vanhauser-thc · 2020-03-16T23:38:23Z

the drop in coverage: my guess is that it is fidgety (-d). deterministic fuzzing is, in my experience, really good for initial path discovery and later for finding features and options, so skipping it is likely not beneficial.

the big improvements are likely based on laf-intel.

instrim results in fewer map collisions and more speed, and libdislocator is also a speed gain. more speed + fewer collisions = better coverage, so they should result in a medium improvement.
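As a rough illustration of why instrumenting fewer locations means fewer collisions, here is a back-of-the-envelope sketch. It assumes every edge lands on a uniformly random slot of a 64 KiB map (AFL's default map size), which only approximates how edge IDs are really computed. `expected_collisions` is a made-up helper, not part of AFL or AFL++.

```python
MAP_SIZE = 1 << 16  # AFL's default coverage map: 64 KiB of byte counters


def expected_collisions(edges, map_size=MAP_SIZE):
    """Expected number of edges that end up sharing a slot with an earlier
    edge, ASSUMING each edge hits a uniformly random slot of the map."""
    # Expected number of distinct slots occupied after `edges` random draws.
    occupied = map_size * (1.0 - (1.0 - 1.0 / map_size) ** edges)
    return edges - occupied


if __name__ == "__main__":
    for edges in (5_000, 20_000, 50_000):
        lost = expected_collisions(edges)
        print(f"{edges:6d} edges -> ~{lost:.0f} colliding "
              f"({100 * lost / edges:.1f}%)")
```

Under this simplified model the colliding fraction grows with the number of instrumented edges, so anything that instruments fewer locations in the same map should collide less.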

jonathanmetzman · 2020-03-16T23:53:03Z

I suspect mopt removal had a lot to do with it as well. Mopt seems to do terrifically on fuzzbench. I didn't realize this option was removed from AFL++ when I made this issue, but this definitely would be my first guess.

jonathanmetzman · 2020-03-16T23:56:17Z

laf-intel is my guess for why libpcap_fuzz_both is so much better now. It has no seeds and all other AFL based fuzzers do very badly compared to every single non-afl based fuzzer (I assume because they can instrument compares whereas the other AFL configurations we were using don't do this).

I think if we run another experiment with mopt-afl++, it will show that it was the reason for the overall drop and we can close this issue.

vanhauser-thc · 2020-03-17T07:23:46Z

@andreafioraldi how about we remove the -d and see how this changes the performance? This would be a great opportunity to see the difference for that.

Also you should update the checkout as this makes a good change to instrim, e.g. to a57896a7ce7f2d51aad001234c0686e237eea54f

(I would propose you take authority on this fuzzer instance and I take the one on aflplusplus-mopt (if that PR is accepted) so we don't interfere with each other's experiments.)

@jonathanmetzman when is the next run of fuzzbench?

inferno-chromium · 2020-03-17T14:58:14Z

we run stuff periodically every few days, https://www.fuzzbench.com/reports/index.html
next one should start in next day or so, can you propose a PR for this change and then we can take it as part of next one.

vanhauser-thc · 2020-03-18T09:01:55Z

let's hope #122 makes it into the next run then :)

andreafioraldi · 2020-03-18T16:43:05Z

Hi,
my guess is that the problem is not -d (that is an improvement when dealing with structured inputs), but libdislocator.
If an application intensively allocates small chunks, libdislocator (and GWP-ASan too) is not the best choice, because two syscalls are performed for each allocation.
This should also explain the high instability between different experiments on the same target.
I removed optional flags like -p and -L to evaluate the baseline AFL++ mutator, so the drop in coverage was expected (but not one this large).
InsTrim was a bet. @vanhauser-thc suggested it, but I never evaluated it, so maybe it is detrimental on some targets.

vanhauser-thc · 2020-03-18T16:54:21Z

@andreafioraldi the decline can be attributed to the removal of MOpt (#113 (comment))
Instrim is always an improvement. There is no case, and it should not be possible, in which it is worse than normal afl-clang-fast (except if we have a bug in instrim :) )

andreafioraldi · 2020-03-26T08:50:02Z

Given the new report, https://www.fuzzbench.com/reports/2020-03-19/index.html, my guess about dislocator was right. -d is also an improvement when doing a single run.

jonathanmetzman · 2020-03-26T23:40:40Z

> Given the new report, https://www.fuzzbench.com/reports/2020-03-19/index.html, my guess about dislocator was right. -d is also an improvement when doing a single run.

I agree, -d seems to be way better on fuzzbench.
I wonder if -d should be the default in AFL(++) WDYT?

vanhauser-thc · 2020-03-27T14:13:38Z

> > Given the new report, https://www.fuzzbench.com/reports/2020-03-19/index.html, my guess about dislocator was right. -d is also an improvement when doing a single run.
>
> I agree, -d seems to be way better on fuzzbench.
> I wonder if -d should be the default in AFL(++) WDYT?

in a real, effective fuzzing campaign you have a master and several slaves. By default deterministic fuzzing is only done on the master, which is the optimal behaviour IMHO.
If we change this we a) need another command line parameter to enable deterministic fuzzing and b) need to enable it automatically for the -M master switch.
I will put it as a reminder in your issue where we collect potential changes in default behavior: AFLplusplus/AFLplusplus#185
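For readers who have not run a parallel campaign, here is a small sketch of the command lines being discussed. It only builds argv lists from afl-fuzz flags mentioned in this thread or standard at the time (-i, -o, -M, -S, -d). The helper names, directory names and target path are placeholders, and flag defaults may differ in later releases.

```python
def afl_cmd(role, name, in_dir="in", out_dir="sync", target=("./target", "@@")):
    """Build an afl-fuzz argv for one instance of a parallel campaign.

    role is "main" (-M, the instance that runs the deterministic stages)
    or "secondary" (-S, which skips them). Paths are placeholders.
    """
    flag = {"main": "-M", "secondary": "-S"}[role]
    return ["afl-fuzz", "-i", in_dir, "-o", out_dir, flag, name, "--", *target]


def single_fidgety_cmd(in_dir="in", out_dir="out", target=("./target", "@@")):
    """Single instance with -d, i.e. the deterministic stages are skipped."""
    return ["afl-fuzz", "-i", in_dir, "-o", out_dir, "-d", "--", *target]


if __name__ == "__main__":
    campaign = [afl_cmd("main", "m0")]
    campaign += [afl_cmd("secondary", f"s{i}") for i in range(1, 4)]
    for argv in campaign + [single_fidgety_cmd()]:
        print(" ".join(argv))
```

The single-instance -d form is the one that matters for a benchmark that runs one fuzzer process per trial, while the -M/-S split is the campaign setup described above.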

vanhauser-thc · 2020-03-27T14:15:36Z

@jonathanmetzman how about we add an afl fuzzer without -d and then compare the stock google afl's against each other? of course this makes more sense for targets that have a corpus

andreafioraldi · 2020-03-27T14:18:43Z

there is afl_deterministic

inferno-chromium · 2020-03-27T14:22:35Z

yes right, afl_deterministic is that one.

Given the performance of afl_deterministic, we removed it, and -d is now the default for all AFL-based fuzzers.

andreafioraldi · 2020-03-27T15:32:01Z

I also feel that, due to the good initial corpus, roadblock-bypassing techniques such as laf-intel may be detrimental. There is also a benchmark that is super strange in this last run:

aflplusplus_mopt is -L 0 -p fast, that is, MOpt + AFLFast, two fuzzers that performed very badly on this target

inferno-chromium · 2020-03-27T15:42:44Z

Closing this issue, feel free to file a tracking issue for specific bugs.

vanhauser-thc · 2020-03-27T17:12:13Z

@andreafioraldi aflplusplus_mopt also has laf-intel, and that is why it's good (the target does not have a corpus). MOpt and aflfast don't have laf-intel.

jonathanmetzman · 2020-03-27T18:34:06Z

> > > Given the new report, https://www.fuzzbench.com/reports/2020-03-19/index.html, my guess about dislocator was right. -d is also an improvement when doing a single run.
> >
> > I agree, -d seems to be way better on fuzzbench.
> > I wonder if -d should be the default in AFL(++) WDYT?
>
> in a real, effective fuzzing campaign you have a master and several slaves. By default deterministic fuzzing is only done on the master, which is the optimal behaviour IMHO.

Maybe deterministic steps could be on by default only when -M is used, and off by default otherwise (though that might be too confusing).

inferno-chromium closed this as completed Mar 27, 2020
