
AFLplusplus declined in aggregate rank from 2020-03-09 to 2020-03-11 #113

Closed
jonathanmetzman opened this issue Mar 16, 2020 · 19 comments

@jonathanmetzman
Contributor

2020-03-11 was the first experiment we ran with the new AFL++ fuzzer.py.

The aggregate ranking in this report claims AFL++ does about the same as AFL, whereas older reports, such as 2020-03-09, consistently show otherwise.
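This thread does not describe exactly how FuzzBench computes its aggregate rank. The minimal sketch below assumes the aggregate is the mean of per-benchmark ranks on median coverage, which may differ from FuzzBench's actual method. It shows how a fuzzer can win some benchmarks clearly, lose others, and still end up level with AFL. The benchmark names and coverage numbers are hypothetical.

```python
from statistics import mean


def per_benchmark_ranks(scores: dict[str, float]) -> dict[str, float]:
    """Rank fuzzers on one benchmark (1 = best score); tied fuzzers share the average rank."""
    ordered = sorted(scores, key=lambda fuzzer: scores[fuzzer], reverse=True)
    ranks: dict[str, float] = {}
    i = 0
    while i < len(ordered):
        # Extend j over every fuzzer tied with ordered[i].
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared_rank = (i + 1 + j + 1) / 2  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[ordered[k]] = shared_rank
        i = j + 1
    return ranks


def aggregate_rank(results: dict[str, dict[str, float]]) -> dict[str, float]:
    """results maps benchmark -> {fuzzer -> median coverage}; returns fuzzer -> mean rank, best first."""
    per_fuzzer: dict[str, list[float]] = {}
    for scores in results.values():
        for fuzzer, rank in per_benchmark_ranks(scores).items():
            per_fuzzer.setdefault(fuzzer, []).append(rank)
    means = {fuzzer: mean(ranks) for fuzzer, ranks in per_fuzzer.items()}
    return dict(sorted(means.items(), key=lambda item: item[1]))


# Hypothetical numbers: AFL++ wins one benchmark and loses the other.
example = {
    "benchmark_a": {"afl": 100.0, "aflplusplus": 120.0},
    "benchmark_b": {"afl": 90.0, "aflplusplus": 80.0},
}
print(aggregate_rank(example))  # both fuzzers end up with a mean rank of 1.5
```

Averaging ranks discards the size of each win or loss. A large gain on one benchmark therefore counts no more than a small loss on another.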

Since AFL++ already used afl-clang-fast in 2020-03-09, the aggregate change could be related to any of the following changes in fuzzer.py:

  1. Use of fidgety AFL (-d, which skips the deterministic mutation stages).
  2. Use of libdislocator instead of ASAN.
  3. Use of laf-intel.
  4. Use of instrim.

I think the changes that could cause the slip fall into two broad categories:

  1. The change indeed hurts performance and fuzzbench is correctly pointing this out.
  2. (Possible only for build-time changes such as instrumentation.) FuzzBench measures coverage with sancov, so new program behavior found through special instrumentation such as laf-intel may not be detected as new coverage. The current metrics FuzzBench uses may therefore be biased against this kind of instrumentation.

I quickly eyeballed the two reports and put my observations about performance changes on different benchmarks in this table:

| Benchmark | Change from 2020-03-09 to 2020-03-11 |
| --- | --- |
| bloaty | decline |
| curl | significant decline |
| freetype | decline |
| irssi | significant decline (from 2nd place to last place) |
| lcms | very significant decline |
| libjpeg-turbo | drop from top tier to the next tier |
| libpcap | increase from bottom tier to top tier |
| libxml | decline from upper tier to lower tier |
| sqlite3_ossfuzz | big increase, joining qsym in the top tier |

I think libpcap and sqlite are interesting success stories.

@jonathanmetzman
Contributor Author

CC @andreafioraldi

I think we can run an experiment where we turn off some of the 4 changes mentioned above to find out which one is responsible.
If it turns out that FuzzBench is biased, I will try to fix that (though I think we may need to start using crashes to do so).

@vanhauser-thc
Collaborator

My guess is that the drop in coverage is due to fidgety mode (-d). In my experience, deterministic fuzzing is really good for initial path discovery and, later on, for finding features and options, so skipping it is likely not beneficial.

The big improvements are likely due to laf-intel.

InsTrim results in fewer map collisions and more speed, and libdislocator is also a speed gain. More speed plus fewer collisions means better coverage, so together they should result in a moderate improvement.
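The link between fewer instrumented locations and fewer map collisions can be made concrete with the standard balls-into-bins estimate. If n edges hash into m slots, the expected number of occupied slots is m(1 - (1 - 1/m)^n), and every edge beyond that shares a slot with another edge. A minimal sketch follows. It assumes edge indices behave like uniformly random hashes, which simplifies AFL's scheme (index `cur_location ^ prev_location`, with `prev_location` set to the previous `cur_location >> 1`). The edge counts are made up, not measured on any benchmark.

```python
MAP_SIZE = 1 << 16  # AFL's default coverage bitmap size: 65,536 one-byte slots


def expected_collisions(num_edges: int, map_size: int = MAP_SIZE) -> float:
    """Expected number of edges that land in an already-occupied bitmap slot,
    assuming each edge hashes to an independent, uniformly random slot."""
    # Expected number of distinct occupied slots after num_edges uniform insertions.
    occupied = map_size * (1 - (1 - 1 / map_size) ** num_edges)
    return num_edges - occupied


# Hypothetical edge counts: instrumenting fewer locations means fewer edges competing for slots.
for edges in (20_000, 30_000):
    print(edges, round(expected_collisions(edges)))
```

Collisions grow roughly with the square of the edge count while the map is sparse. Trimming the number of instrumented locations, which is InsTrim's stated goal, therefore reduces collisions more than proportionally.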

@jonathanmetzman
Contributor Author

jonathanmetzman commented Mar 16, 2020

I suspect the removal of MOpt had a lot to do with it as well. MOpt seems to do terrifically on FuzzBench. I didn't realize MOpt had been removed from the AFL++ configuration when I opened this issue, but it would definitely be my first guess.

@jonathanmetzman
Contributor Author

jonathanmetzman commented Mar 16, 2020

laf-intel is my guess for why libpcap_fuzz_both is so much better now. It has no seeds, and all other AFL-based fuzzers do very badly on it compared to every single non-AFL-based fuzzer (I assume because the non-AFL fuzzers can instrument comparisons, whereas the other AFL configurations we were using don't).

I think if we run another experiment with mopt-afl++, it will show that it was the reason for the overall drop and we can close this issue.
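laf-intel rewrites multi-byte comparisons at compile time into chains of single-byte comparisons, so a coverage-guided fuzzer gets feedback each time it guesses one more byte correctly. The Python sketch below models that effect only; it is not laf-intel's LLVM pass, and the magic value and branch names are hypothetical.

```python
MAGIC = bytes([0xA1, 0xB2, 0xC3, 0xD4])  # hypothetical 4-byte value guarding a code path


def whole_compare(data: bytes, coverage: set[str]) -> bool:
    """Original code: one 32-bit comparison, i.e. a single branch that only
    yields new coverage once all four bytes are right."""
    if data[:4] == MAGIC:
        coverage.add("magic_ok")
        return True
    return False


def split_compare(data: bytes, coverage: set[str]) -> bool:
    """laf-intel-style rewrite: one branch per byte, so every correct byte
    is new coverage the fuzzer can keep and build on."""
    for i, expected in enumerate(MAGIC):
        if i >= len(data) or data[i] != expected:
            return False
        coverage.add(f"byte_{i}_ok")
    coverage.add("magic_ok")
    return True


# An input that only gets the first byte right.
partial = bytes([0xA1, 0x00, 0x00, 0x00])
whole_cov: set[str] = set()
split_cov: set[str] = set()
whole_compare(partial, whole_cov)
split_compare(partial, split_cov)
print(whole_cov, split_cov)  # set() {'byte_0_ok'}
```

With the whole comparison, a random mutator has to hit all 32 bits at once. With the split version, it can solve the value one byte at a time, which matters most on seedless targets like libpcap_fuzz_both.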

@vanhauser-thc
Collaborator

@andreafioraldi how about we remove the -d and see how this changes the performance? This would be a great opportunity to see the difference for that.

Also, you should update the checkout, e.g. to a57896a7ce7f2d51aad001234c0686e237eea54f, since it includes a good change to InsTrim.

(I would propose that you take ownership of this fuzzer instance and I take aflplusplus-mopt, if that PR is accepted, so we don't interfere with each other's experiments.)

@jonathanmetzman when is the next run of fuzzbench?

@inferno-chromium
Collaborator

We run experiments periodically, every few days: https://www.fuzzbench.com/reports/index.html
The next one should start in the next day or so. Can you propose a PR for this change so we can include it in the next run?

@vanhauser-thc
Collaborator

Let's hope #122 makes it into the next run then :)

@andreafioraldi
Contributor

Hi,
My guess is that the problem is not -d (which is an improvement when dealing with structured inputs) but libdislocator.
If an application intensively allocates small chunks, libdislocator (and GWP-ASan too) is not the best choice, because two syscalls are performed for each allocation.
This would also explain the high instability between different experiments on the same target.
I removed optional flags like -p and -L to evaluate the baseline AFL++ mutator, so a drop in coverage was expected (but not one this large).
InsTrim was a bet: @vanhauser-thc suggested it, but I never evaluated it, so maybe it is detrimental on some targets.
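To see why many small allocations hurt, here is a simplified cost model of a guard-page allocator in the spirit of libdislocator. Each allocation gets its own fresh mapping (one `mmap` call) plus an inaccessible guard page right after the buffer (one `mprotect` call), so an overflow faults immediately. The model ignores any per-allocation metadata the real library keeps, assumes a 4 KiB page, and does not allocate anything; the workload numbers are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size


def guard_page_cost(request: int, page_size: int = PAGE_SIZE) -> dict[str, int]:
    """Simplified per-allocation cost of a guard-page allocator: the request is
    rounded up to whole pages, one extra guard page follows it, and setting this
    up takes two syscalls (map the region, then protect the guard page)."""
    data_pages = max(1, -(-request // page_size))  # ceiling division, at least one page
    mapped = (data_pages + 1) * page_size  # data pages plus one guard page
    return {
        "mapped_bytes": mapped,
        "wasted_bytes": mapped - request,
        "syscalls": 2,
    }


# Hypothetical workload: one million 16-byte allocations.
per_alloc = guard_page_cost(16)
print(per_alloc["mapped_bytes"], 1_000_000 * per_alloc["syscalls"])  # 8192 2000000
```

A general-purpose malloc usually serves such small requests from memory it has already mapped, without entering the kernel. That is why the per-allocation syscalls dominate on allocation-heavy targets.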

@vanhauser-thc
Collaborator

@andreafioraldi the decline can be attributed to the removal of MOpt (#113 (comment))
InsTrim is always an improvement. There is no case, and it is not possible, in which it is worse than normal afl-clang-fast (unless we have a bug in InsTrim :) ).

@andreafioraldi
Contributor

Given the new report, https://www.fuzzbench.com/reports/2020-03-19/index.html, my guess about libdislocator was right. -d is also an improvement when doing a single run.

@jonathanmetzman
Contributor Author

I agree, -d seems to be way better on FuzzBench.
I wonder if -d should be the default in AFL(++). WDYT?

@vanhauser-thc
Collaborator

In a really effective fuzzing campaign you have a master and several slaves. By default, deterministic fuzzing is only done on the master, which is the optimal behavior IMHO.
If we change this, we a) need another command-line parameter to enable deterministic fuzzing and b) should enable it automatically for the -M master switch.
I will put it as a reminder in your issue where we collect potential changes in default behavior: AFLplusplus/AFLplusplus#185
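The difference between the current default and the proposed one can be written as two small decision functions. This is a minimal sketch, assuming AFL's usual rules: -M runs the deterministic stages, -S implies -d, and a standalone instance runs them unless -d is given. `FuzzerArgs` and its `enable_deterministic` field are hypothetical and stand in for the new command-line parameter proposed above, which does not exist yet.

```python
from dataclasses import dataclass


@dataclass
class FuzzerArgs:
    master: bool = False                # -M
    secondary: bool = False             # -S
    skip_deterministic: bool = False    # -d
    enable_deterministic: bool = False  # hypothetical new flag proposed in this thread


def deterministic_today(args: FuzzerArgs) -> bool:
    """Current behavior: -M forces deterministic stages, -S implies -d,
    and a standalone instance runs them unless -d is given."""
    if args.master:
        return True
    if args.secondary:
        return False
    return not args.skip_deterministic


def deterministic_proposed(args: FuzzerArgs) -> bool:
    """Proposal: skip deterministic stages by default and enable them
    only for -M or when the new flag is passed."""
    return args.master or args.enable_deterministic
```

Under the proposal, -d becomes a no-op for standalone runs, and a lone instance (as in FuzzBench) gets fidgety behavior without any extra flag.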

@vanhauser-thc
Collaborator

@jonathanmetzman how about we add an AFL fuzzer without -d and then compare the two stock Google AFL configurations against each other? Of course, this makes more sense for targets that have a corpus.

@andreafioraldi
Contributor

There is afl_deterministic.

@inferno-chromium
Collaborator

Yes, right, afl_deterministic is that one.

Given afl_deterministic's performance, we removed it, and -d is now the default for all AFL-based fuzzers.

@andreafioraldi
Contributor

andreafioraldi commented Mar 27, 2020

I also feel that, due to the good initial corpus, roadblock-bypassing techniques such as laf-intel may be detrimental. There is also a very strange result on one benchmark in this last run:

[Figure: libpcap_fuzz_both ranking]

aflplusplus_mopt uses -L 0 -p fast, i.e. MOpt + AFLFast, two fuzzers that performed very badly on this target.
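For reference, the runtime flags being compared here can be assembled into an afl-fuzz command line with a small helper. The function `afl_fuzz_argv` and the paths in the example are hypothetical, and this is not FuzzBench's fuzzer.py. Compile-time choices such as laf-intel or InsTrim are made when the target is built, so they never appear among these flags.

```python
from typing import List, Optional


def afl_fuzz_argv(target: str, in_dir: str, out_dir: str, *,
                  skip_deterministic: bool = False,
                  mopt: bool = False,
                  schedule: Optional[str] = None) -> List[str]:
    """Assemble an afl-fuzz command line for some of the runtime options discussed in this thread."""
    argv = ["afl-fuzz", "-i", in_dir, "-o", out_dir]
    if skip_deterministic:
        argv.append("-d")  # fidgety mode: skip the deterministic stages
    if mopt:
        argv += ["-L", "0"]  # MOpt mode, entering pacemaker mode immediately
    if schedule is not None:
        argv += ["-p", schedule]  # power schedule, e.g. "fast" (AFLFast's schedule)
    # Without an @@ placeholder, afl-fuzz feeds each test case on stdin.
    argv += ["--", target]
    return argv


# Runtime flags of the aflplusplus_mopt configuration as described above (paths are hypothetical).
print(" ".join(afl_fuzz_argv("./fuzz_target", "seeds", "findings", mopt=True, schedule="fast")))
```

Two fuzzers with identical command lines can still differ in how their targets were instrumented. That distinction matters when comparing aflplusplus_mopt with the standalone MOpt and AFLFast entries.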

@inferno-chromium
Collaborator

Closing this issue; feel free to file tracking issues for specific bugs.

@vanhauser-thc
Collaborator

@andreafioraldi aflplusplus_mopt also has laf-intel, and that is why it does well here (this target has no corpus); the MOpt and AFLFast fuzzers don't have laf-intel.

@jonathanmetzman
Contributor Author

Maybe the deterministic steps could be enabled by default when -M is used and disabled otherwise (though that might be too confusing).

4 participants