Experiment Announcement Thread #249

jonathanmetzman opened this issue Apr 22, 2020 · 37 comments

@jonathanmetzman commented Apr 22, 2020

New experiments will be announced on this issue as discussed on #205.
I think this is a good place to discuss experiments as well.

jonathanmetzman pinned this issue Apr 22, 2020
@jonathanmetzman

https://www.fuzzbench.com/reports/2020-04-20-aflplusplus/index.html contains afl++ variants.
The experiment is still running: most of the trials have completed, though some started only earlier today, and coverage measurement still needs to finish.

I just started another experiment https://www.fuzzbench.com/reports/2020-04-21/index.html that includes every fuzzer except for the afl++ variants benchmarked above.

Once both of these experiments complete I will make a combined report.

Note that coverage measurement for the previous experiment https://www.fuzzbench.com/reports/2020-04-14/index.html didn't complete because of a bug I have since fixed.

@vanhauser-thc

@jonathanmetzman when do you expect the 2020-04-20-aflplusplus one to finish measurement? And will the title then be updated to no longer say "incomplete"?

No need to combine this with 2020-04-21, btw; this will just make the overall report graphs unreadable with so many entries :)

@jonathanmetzman

> @jonathanmetzman when do you expect the 2020-04-20-aflplusplus one to finish measurement? And will the title then be updated to no longer say "incomplete"?

The title gets updated automatically.
Hard to say when it will complete; I expect a day or two from now.
The recent fixes we made and the removal of irssi should reduce measurement time considerably.
Unfortunately, the afl++ experiment started before that.

> No need to combine this with 2020-04-21, btw; this will just make the overall report graphs unreadable with so many entries :)

OK, sounds good to me. Let me know if you want a report with just the AFL++ fuzzers (I can do that as well).

@vanhauser-thc

> OK, sounds good to me. Let me know if you want a report with just the AFL++ fuzzers (I can do that as well).

That would be good. However, the new run does not have the irssi target in there ... does that work without messing up the benchmark calculation?

@jonathanmetzman

I can exclude that benchmark.

@jonathanmetzman

> https://www.fuzzbench.com/reports/2020-04-20-aflplusplus/index.html contains afl++ variants.
> The experiment is still running: most of the trials have completed, though some started only earlier today, and coverage measurement still needs to finish.
>
> I just started another experiment https://www.fuzzbench.com/reports/2020-04-21/index.html that includes every fuzzer except for the afl++ variants benchmarked above.
>
> Once both of these experiments complete I will make a combined report.

Both of these experiments have finished measuring. I think we are going to prioritize speeding up measurement.

@jonathanmetzman

> OK, sounds good to me. Let me know if you want a report with just the AFL++ fuzzers (I can do that as well).

> That would be good. However, the new run does not have the irssi target in there ... does that work without messing up the benchmark calculation?

Made this combination report: https://www.fuzzbench.com/reports/2020-04-21-and-20-aflplusplus/index.html
It contains all benchmarks (except woff and irssi) and all afl++-based fuzzers from 2020-04-21 and 2020-04-20-aflplusplus.

@vanhauser-thc

@jonathanmetzman thanks! Whenever you can, start the next batch - that one will be very interesting, especially the variants with increased map sizes :)

@jonathanmetzman

> @jonathanmetzman thanks! Whenever you can, start the next batch - that one will be very interesting, especially the variants with increased map sizes :)

Not exactly sure when this will be, but probably by Friday; I'm trying to work on some long-term improvements and planning this week.

@jonathanmetzman

I started experiments for AFL++ and fastcgs with and without huge page tables.

@jonathanmetzman

> I started experiments for AFL++ and fastcgs with and without huge page tables.

Experiments are done.
https://www.fuzzbench.com/reports/2020-05-01-fastcgs/index.html compared versions of fastcgs with and without support for huge page tables. CC @alifahmed

https://www.fuzzbench.com/reports/2020-05-01-aflplusplus-1/index.html and https://www.fuzzbench.com/reports/2020-05-01-aflplusplus-2/index.html compared afl++ variants.

@jonathanmetzman commented May 5, 2020

I'm starting to work on using resources more intelligently in experiments (particularly for fuzzer variants or features that are in development).
In the next experiments I will test using preemptible instances for trials.
The differences you will see are:

  1. Experiments will be 23 hours (since preemptible instances cannot last longer than 24 hours and we need time to start up).
  2. Some trials (generally 5-15%, according to the docs) will not complete.
     Assuming that 3 trials do not complete for a fuzzer-benchmark pair, it only means that comparisons between it and other fuzzers will be slightly less significant (see the back-of-the-envelope sketch below).
     We may be able to fix this in the future by restarting preempted trials, but for now we won't.
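
To put rough numbers on point 2, here's a back-of-the-envelope sketch (illustrative only, not FuzzBench code; the 20-trial count is just an example):

```python
# Rough sketch: how the 5-15% preemption rate quoted above thins out the
# trials for one fuzzer-benchmark pair. The trial count is illustrative.
TRIALS = 20

for preempt_rate in (0.05, 0.15):
    expected_lost = TRIALS * preempt_rate
    print(f"{preempt_rate:.0%} preemption: expect ~{expected_lost:.0f} lost, "
          f"~{TRIALS - expected_lost:.0f}/{TRIALS} trials completed")
```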

@jonathanmetzman commented May 5, 2020

Running a new experiment with the main fuzzers and the new benchmark.
The experiment is using preemptible VMs:
https://www.fuzzbench.com/reports/2020-05-04-preempt-new-bench/index.html

EDIT:

I'll discuss the results from that experiment here to reduce spam on this thread.


@jonathanmetzman

A 15-trial full experiment: https://www.fuzzbench.com/reports/2020-05-24/index.html

@jonathanmetzman commented May 31, 2020

20-trial experiment comparing aflplusplus_optimal, aflplusplus_shmem, and ankou (with a buggy integration): https://www.fuzzbench.com/reports/2020-05-28/index.html

@jonathanmetzman

https://www.fuzzbench.com/reports/2020-06-12/index.html is an experiment with libfuzzer_nocmp, aflcc, manul, and afl++ (and its variants), combined with 2020-05-24.

@vanhauser-thc

The results of the new experiment https://www.fuzzbench.com/reports/2020-07-17/index.html currently look very different from previous runs.
What has changed? Is it using the new coverage? Or have the benchmark targets changed?

Was there already an assessment of which of the two coverage methods is better, or what the advantages/disadvantages are?
Thanks!

@jonathanmetzman commented Jul 16, 2020

What did you notice that's different in 2020-07-17? I stopped that experiment to run 2020-07-13 (the AFL++ experiment I was supposed to run for you a few days ago but accidentally didn't, because of a bug in the service code; 2020-07-13 is running right now).

We're only partway through 2020-07-17, but I ran it to make sure that the results are the same as usual. 2020-07-17 was run using #509, which will allow non-FuzzBench maintainers to easily contribute benchmarks from OSS-Fuzz. In theory the results should be the same even though the builds are different, so any differences are very helpful for me.

@jonathanmetzman

I moved 2020-07-17 here since I don't consider it an official experiment.

> Was there already an assessment of which of the two coverage methods is better, or what the advantages/disadvantages are?
> Thanks!

Note that 2020-07-17 used sancov, the current coverage implementation; only clang-cov-test used clang coverage. So far it looks totally fine to replace sancov with clang cov, but we're still investigating.

@vanhauser-thc

> What did you notice that's different in 2020-07-17?

honggfuzz is not in the top list and fastcgs_lm is instead ... both unusual. Also, aflplusplus_ctx_nozerosingle has no reason to be in third place and should rather be around ankou.
Sure, it has only been running for a third of the time, but usually there are no dramatic improvements or degradations over the whole benchmark after 6 hours.

@vanhauser-thc

Also, 2020-07-17 has not been updated anymore since it was moved.

@jonathanmetzman

Also 2020-07-17 is not updated anymore since moved

I stopped that experiment to run yours.

@alifahmed

There is a very noticeable bump in edge coverage between all the reports before 07/25 and after 08/03.

https://www.fuzzbench.com/reports/2020-07-25/index.html
https://www.fuzzbench.com/reports/2020-08-03/index.html

I am wondering what caused it.

@vanhauser-thc

True! And systemd has much less now, which is weird.

@inferno-chromium

> There is a very noticeable bump in edge coverage between all the reports before 07/25 and after 08/03.
>
> https://www.fuzzbench.com/reports/2020-07-25/index.html
> https://www.fuzzbench.com/reports/2020-08-03/index.html
>
> I am wondering what caused it.

We moved to using clang code coverage instead of sancov.

@vanhauser-thc

What is measured now? Edges? Basic blocks? Lines of code? Instructions?

@inferno-chromium

> What is measured now? Edges? Basic blocks? Lines of code? Instructions?

They are called regions in clang code coverage: https://llvm.org/docs/CoverageMappingFormat.html#id14. Regions have character-level precision, e.g. there are multiple regions in the single line `return x || y && z;` (see the sketch below).
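
If you want to see it locally, here is a minimal sketch (illustrative only, not FuzzBench code) that compiles that exact line with clang's source-based coverage and prints per-region counts; it assumes clang, llvm-profdata, and llvm-cov are on PATH:

```python
# Minimal, illustrative sketch: compile the example line with clang's
# source-based coverage and show its per-region execution counts.
import os
import pathlib
import subprocess

pathlib.Path("demo.c").write_text(
    "int f(int x, int y, int z) { return x || y && z; }\n"
    "int main(void) { f(1, 0, 0); return 0; }\n"
)

# Build with coverage mapping instrumentation.
subprocess.run(
    ["clang", "-fprofile-instr-generate", "-fcoverage-mapping",
     "demo.c", "-o", "demo"],
    check=True,
)

# Run the binary; it writes a raw profile to demo.profraw.
subprocess.run(
    ["./demo"],
    env={**os.environ, "LLVM_PROFILE_FILE": "demo.profraw"},
    check=True,
)

# Index the profile, then print region markers: the one `return` line is
# split into several regions, and f(1, 0, 0) short-circuits at `x`, so
# only some of them are counted as executed.
subprocess.run(
    ["llvm-profdata", "merge", "-sparse", "demo.profraw",
     "-o", "demo.profdata"],
    check=True,
)
subprocess.run(
    ["llvm-cov", "show", "./demo", "-instr-profile=demo.profdata",
     "-show-regions"],
    check=True,
)
```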

@alifahmed

Thanks! I think the report graphs should indicate this change to avoid confusion. Currently the graphs still say "Reached edge coverage". Shouldn't it be something like region/code coverage?

@inferno-chromium commented Aug 15, 2020

> Thanks! I think the report graphs should indicate this change to avoid confusion. Currently the graphs still say "Reached edge coverage". Shouldn't it be something like region/code coverage?

I think we discussed it a little internally; some people felt that just "region" can be confusing, and the community is more accustomed to edges for coverage. @lszekeres - can you revisit and fix this confusion sometime next week?

@lszekeres

> Thanks! I think the report graphs should indicate this change to avoid confusion. Currently the graphs still say "Reached edge coverage". Shouldn't it be something like region/code coverage?

> I think we discussed it a little internally; some people felt that just "region" can be confusing, and the community is more accustomed to edges for coverage. @lszekeres - can you revisit and fix this confusion sometime next week?

Yes, let me clarify this in the report (along with the other things we discussed, to make it clearer).

@inferno-chromium

Heads up, there will be some hiccups as we transition to clang code coverage, add differential coverage metrics (see #657), and stabilize things.

@mboehme commented Sep 13, 2020

[LibFuzzer / Entropic] No Restart

Details

Currently, in fork mode, LibFuzzer runs a new instance (job) every five minutes and then merges the generated corpus into the main corpus. The new LF instance (job) starts with a very small subset of the main corpus.

In this commit, in fork mode, LibFuzzer runs a new instance only when a crash is encountered and merges the current corpus into the main corpus every five minutes. The new LF instance (job) always starts with the entire main corpus.

Some seed files might have been deleted before the merge, so there is a thread-unsafe quick check that the file still exists before it is merged (sketched below).
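
In Python-flavored pseudocode (a sketch of the logic only; the real change is in LibFuzzer's C++ fork mode, and the names here are made up):

```python
# Illustrative sketch of the merge logic described above; the real change
# is in LibFuzzer's C++ fork mode, and the names here are made up.
import os
import shutil

def merge_job_corpus(job_corpus_dir: str, main_corpus_dir: str) -> None:
    """Fold a job's corpus back into the main corpus every five minutes."""
    for name in os.listdir(job_corpus_dir):
        path = os.path.join(job_corpus_dir, name)
        # Thread-unsafe quick check: another job may have deleted this seed
        # between listing and copying, so skip files that no longer exist.
        if not os.path.exists(path):
            continue
        shutil.copy(path, os.path.join(main_corpus_dir, name))
```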

Results

Normalized coverage achieved across all benchmarks (Log-Time): [image]

Critical Difference @1 hour: [image]

Critical Difference @4 hours: [image]

Critical Difference @23 hours

@mboehme commented Sep 29, 2020

[AFL++] Boosting the fast and coe schedules

(Might prepare another experiment and update here)

Details

In terms of normalized coverage (mean of means across all benchmarks), FAST3 consistently performs best.

[image]

[image]

For most subjects, the new schedules outperform the baseline schedules. FAST3 performs really well on bloaty, lcms, and proj4. FAST4 performs really well on the new benchmarks nginx and libxslt. On libxslt, COE3 quickly catches up. Also, EXPLOIT is often worst but performs exceptionally well on vorbis, wolff, curl, and freetype2.

Note: for the ragged graphs (e.g., zlib, libxslt, libjpeg, openthread, vorbis, re2, and lcms), if we had more trials we wouldn't see such random jumps but smoother progress. In this data, there are 20 trials for all time stamps, benchmarks, and fuzzers. For some subjects, more trials would be beneficial.

Log Time: [image]

Linear Time: [image]

@inferno-chromium

@mboehme - regenerated https://www.fuzzbench.com/reports/experimental/2020-09-26/index.html with just that experiment's results, no merges.

@mboehme commented Sep 29, 2020

Thanks @inferno-chromium!

@inferno-chromium

An experiment that uses saturated OSS-Fuzz corpora (from months/years of continuous fuzzing) is now available: https://www.fuzzbench.com/reports/2020-10-11-saturated-ossfuzz-corpus/index.html

jonathanmetzman unpinned this issue Feb 27, 2022