Experiment Announcement Thread #249

jonathanmetzman opened this issue Apr 22, 2020 · 37 comments

@jonathanmetzman commented Apr 22, 2020

New experiments will be announced on this issue as discussed on #205.
I think this is a good place to discuss experiments as well.

jonathanmetzman pinned this issue Apr 22, 2020
@jonathanmetzman

https://www.fuzzbench.com/reports/2020-04-20-aflplusplus/index.html contains afl++ variants.
The experiment is still running: most of the trials have completed, though some started only earlier today, and coverage measurement still needs to finish.

I just started another experiment https://www.fuzzbench.com/reports/2020-04-21/index.html that includes every fuzzer except for the afl++ variants benchmarked above.

Once both of these experiments complete I will make a combined report.

Note that coverage measurement for the previous experiment https://www.fuzzbench.com/reports/2020-04-14/index.html didn't complete because of a bug I have since fixed.

@vanhauser-thc

@jonathanmetzman when do you expect the 2020-04-20-aflplusplus one to finish measurement? And will the title then be updated to no longer say "incomplete"?

No need to combine this with 2020-04-21, btw; this will just make the overall report graphs unreadable with so many entries :)

@jonathanmetzman

> @jonathanmetzman when do you expect the 2020-04-20-aflplusplus one to finish measurement? And will the title then be updated to no longer say "incomplete"?

The title gets updated automatically.
Hard to say when it will complete; I expect a day or two from now.
The recent fixes we made and the removal of irssi should reduce measurement time considerably.
Unfortunately, the afl++ experiment started before that.

> No need to combine this with 2020-04-21, btw; this will just make the overall report graphs unreadable with so many entries :)

OK, sounds good to me. Let me know if you want a report with just the AFL++ fuzzers (I can do that as well).

@vanhauser-thc

> OK, sounds good to me. Let me know if you want a report with just the AFL++ fuzzers (I can do that as well).

That would be good. However, the new run does not have the irssi target in there ... does that work without messing up the benchmark calculation?

@jonathanmetzman

I can exclude that benchmark.

@jonathanmetzman

> https://www.fuzzbench.com/reports/2020-04-20-aflplusplus/index.html contains afl++ variants.
> The experiment is still running: most of the trials have completed, though some started only earlier today, and coverage measurement still needs to finish.
>
> I just started another experiment https://www.fuzzbench.com/reports/2020-04-21/index.html that includes every fuzzer except for the afl++ variants benchmarked above.
>
> Once both of these experiments complete I will make a combined report.

Both of these experiments have finished measuring. I think we are going to prioritize speeding up measurement.

@jonathanmetzman

> OK, sounds good to me. Let me know if you want a report with just the AFL++ fuzzers (I can do that as well).

> That would be good. However, the new run does not have the irssi target in there ... does that work without messing up the benchmark calculation?

Made this combination report: https://www.fuzzbench.com/reports/2020-04-21-and-20-aflplusplus/index.html
It contains all benchmarks (except woff and irssi) and all afl++-based fuzzers from 2020-04-21 and 2020-04-20-aflplusplus.

@vanhauser-thc

@jonathanmetzman thanks! Whenever you can, start the next batch - that one will be very interesting, especially the variants with increased map sizes :)

@jonathanmetzman

> @jonathanmetzman thanks! Whenever you can, start the next batch - that one will be very interesting, especially the variants with increased map sizes :)

Not exactly sure when this will be, but probably by Friday; I'm trying to work on some long-term improvements and planning this week.

@jonathanmetzman

I started experiments for AFL++ and fastcgs with and without huge page tables.

@jonathanmetzman

> I started experiments for AFL++ and fastcgs with and without huge page tables.

Experiments are done.
https://www.fuzzbench.com/reports/2020-05-01-fastcgs/index.html compared versions of fastcgs with and without support for huge page tables. CC @alifahmed

https://www.fuzzbench.com/reports/2020-05-01-aflplusplus-1/index.html and https://www.fuzzbench.com/reports/2020-05-01-aflplusplus-2/index.html compared afl++ variants.

@jonathanmetzman commented May 5, 2020

I'm starting to work on using resources more intelligently in experiments (particularly for fuzzer variants or features that are in development).
In the next experiments I will test using preemptible instances for trials.
The differences you will see are:

  1. Experiments will be 23 hours (since preemptible instances cannot last longer than 24 hours and we need time to start up).
  2. Some trials (generally 5-15%, according to the docs) will not complete.
     Assuming that 3 trials do not complete for a fuzzer-benchmark pair, it only means that comparisons between it and other fuzzers will be slightly less significant (see the back-of-the-envelope sketch below).
     We may be able to fix this in the future by restarting preempted trials, but for now we won't.
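
To put rough numbers on point 2, here's a back-of-the-envelope sketch (illustrative only, not FuzzBench code; the 20-trial count is just an example):

```python
# Rough sketch: how the 5-15% preemption rate quoted above thins out the
# trials for one fuzzer-benchmark pair. The trial count is illustrative.
TRIALS = 20

for preempt_rate in (0.05, 0.15):
    expected_lost = TRIALS * preempt_rate
    print(f"{preempt_rate:.0%} preemption: expect ~{expected_lost:.0f} lost, "
          f"~{TRIALS - expected_lost:.0f}/{TRIALS} trials completed")
```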

@jonathanmetzman commented May 5, 2020

Running a new experiment with the main fuzzers and the new benchmark.
The experiment is using preemptible VMs:
https://www.fuzzbench.com/reports/2020-05-04-preempt-new-bench/index.html

EDIT:

I'll discuss the results from that experiment here to reduce spam on this thread.


@jonathanmetzman

A 15-trial full experiment: https://www.fuzzbench.com/reports/2020-05-24/index.html

@jonathanmetzman commented May 31, 2020

20-trial experiment comparing aflplusplus_optimal, aflplusplus_shmem, and ankou (with a buggy integration): https://www.fuzzbench.com/reports/2020-05-28/index.html

@jonathanmetzman

https://www.fuzzbench.com/reports/2020-06-12/index.html is an experiment with libfuzzer_nocmp, aflcc, manul, and afl++ (and its variants), combined with 2020-05-24.

@vanhauser-thc

The results of the new experiment https://www.fuzzbench.com/reports/2020-07-17/index.html currently look very different from previous runs.
What has changed? Is it using the new coverage? Or have the benchmark targets changed?

Was there already an assessment of which of the two coverage methods is better, or what the advantages/disadvantages are?
Thanks!

@jonathanmetzman commented Jul 16, 2020

What did you notice that's different in 2020-07-17? I stopped that experiment to run 2020-07-13 (the AFL++ experiment I was supposed to run for you a few days ago but accidentally didn't, because of a bug in the service code; 2020-07-13 is running right now).

We're only partway through 2020-07-17, but I ran it to make sure that the results are the same as usual. 2020-07-17 was run using #509, which will allow non-FuzzBench maintainers to easily contribute benchmarks from OSS-Fuzz. In theory the results should be the same even though the builds are different, so any differences are very helpful for me.

@jonathanmetzman

I moved 2020-07-17 here since I don't consider it an official experiment.

> Was there already an assessment of which of the two coverage methods is better, or what the advantages/disadvantages are?
> Thanks!

Note that 2020-07-17 used sancov, the current coverage implementation; only clang-cov-test used clang coverage. So far it looks totally fine to replace sancov with clang cov, but we're still investigating.

@vanhauser-thc

> What did you notice that's different in 2020-07-17?

honggfuzz is not in the top list and fastcgs_lm is instead ... both unusual. Also, aflplusplus_ctx_nozerosingle has no reason to be in third place and should rather be around ankou.
Sure, it has only been running for a third of the time, but usually there are no dramatic improvements or degradations over the whole benchmark after 6 hours.

@vanhauser-thc

Also, 2020-07-17 has not been updated anymore since it was moved.

@jonathanmetzman

Also 2020-07-17 is not updated anymore since moved

I stopped that experiment to run yours.

@alifahmed

There is a very noticeable bump in edge coverage between all the reports before 07/25 and after 08/03.

https://www.fuzzbench.com/reports/2020-07-25/index.html
https://www.fuzzbench.com/reports/2020-08-03/index.html

I am wondering what caused it.

@vanhauser-thc

True! And systemd has much less now, which is weird.

@inferno-chromium

> There is a very noticeable bump in edge coverage between all the reports before 07/25 and after 08/03.
>
> https://www.fuzzbench.com/reports/2020-07-25/index.html
> https://www.fuzzbench.com/reports/2020-08-03/index.html
>
> I am wondering what caused it.

We moved to using clang code coverage instead of sancov.

@vanhauser-thc

What is measured now? Edges? Basic blocks? Lines of code? Instructions?

@inferno-chromium

> What is measured now? Edges? Basic blocks? Lines of code? Instructions?

They are called regions in clang code coverage: https://llvm.org/docs/CoverageMappingFormat.html#id14. Regions have character-level precision, e.g. there are multiple regions in the single line `return x || y && z;` (see the sketch below).
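
If you want to see it locally, here is a minimal sketch (illustrative only, not FuzzBench code) that compiles that exact line with clang's source-based coverage and prints per-region counts; it assumes clang, llvm-profdata, and llvm-cov are on PATH:

```python
# Minimal, illustrative sketch: compile the example line with clang's
# source-based coverage and show its per-region execution counts.
import os
import pathlib
import subprocess

pathlib.Path("demo.c").write_text(
    "int f(int x, int y, int z) { return x || y && z; }\n"
    "int main(void) { f(1, 0, 0); return 0; }\n"
)

# Build with coverage mapping instrumentation.
subprocess.run(
    ["clang", "-fprofile-instr-generate", "-fcoverage-mapping",
     "demo.c", "-o", "demo"],
    check=True,
)

# Run the binary; it writes a raw profile to demo.profraw.
subprocess.run(
    ["./demo"],
    env={**os.environ, "LLVM_PROFILE_FILE": "demo.profraw"},
    check=True,
)

# Index the profile, then print region markers: the one `return` line is
# split into several regions, and f(1, 0, 0) short-circuits at `x`, so
# only some of them are counted as executed.
subprocess.run(
    ["llvm-profdata", "merge", "-sparse", "demo.profraw",
     "-o", "demo.profdata"],
    check=True,
)
subprocess.run(
    ["llvm-cov", "show", "./demo", "-instr-profile=demo.profdata",
     "-show-regions"],
    check=True,
)
```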

@alifahmed

Thanks! I think the report graphs should indicate this change to avoid confusion. Currently the graphs still say "Reached edge coverage". Shouldn't it be something like region/code coverage?

@inferno-chromium commented Aug 15, 2020

> Thanks! I think the report graphs should indicate this change to avoid confusion. Currently the graphs still say "Reached edge coverage". Shouldn't it be something like region/code coverage?

I think we discussed it a little internally; some people felt that just "region" can be confusing, and the community is more accustomed to edges for coverage. @lszekeres - can you revisit and fix this confusion sometime next week?

@lszekeres

> Thanks! I think the report graphs should indicate this change to avoid confusion. Currently the graphs still say "Reached edge coverage". Shouldn't it be something like region/code coverage?

> I think we discussed it a little internally; some people felt that just "region" can be confusing, and the community is more accustomed to edges for coverage. @lszekeres - can you revisit and fix this confusion sometime next week?

Yes, let me clarify this in the report (along with the other things we discussed, to make it clearer).

@inferno-chromium

Heads up, there will be some hiccups as we transition to clang code coverage, add differential coverage metrics (see #657), and stabilize things.

@mboehme commented Sep 13, 2020

[LibFuzzer / Entropic] No Restart

Details

Currently, in fork mode, LibFuzzer runs a new instance (job) every five minutes and then merges the generated corpus into the main corpus. The new LF instance (job) starts with a very small subset of the main corpus.

In this commit, in fork mode, LibFuzzer runs a new instance only when a crash is encountered and merges the current corpus into the main corpus every five minutes. The new LF instance (job) always starts with the entire main corpus.

Some seed files might have been deleted before the merge, so there is a thread-unsafe quick check that the file still exists before it is merged (sketched below).
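
In Python-flavored pseudocode (a sketch of the logic only; the real change is in LibFuzzer's C++ fork mode, and the names here are made up):

```python
# Illustrative sketch of the merge logic described above; the real change
# is in LibFuzzer's C++ fork mode, and the names here are made up.
import os
import shutil

def merge_job_corpus(job_corpus_dir: str, main_corpus_dir: str) -> None:
    """Fold a job's corpus back into the main corpus every five minutes."""
    for name in os.listdir(job_corpus_dir):
        path = os.path.join(job_corpus_dir, name)
        # Thread-unsafe quick check: another job may have deleted this seed
        # between listing and copying, so skip files that no longer exist.
        if not os.path.exists(path):
            continue
        shutil.copy(path, os.path.join(main_corpus_dir, name))
```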

Results

Normalized coverage achieved across all benchmarks (Log-Time): [image]

Critical Difference @1 hour: [image]

Critical Difference @4 hours: [image]

Critical Difference @23 hours

@mboehme commented Sep 29, 2020

[AFL++] Boosting the fast and coe schedules

(Might prepare another experiment and update here)

Details

In terms of normalized coverage (mean of means across all benchmarks), FAST3 consistently performs best.

[image]

[image]

For most subjects, the new schedules outperform the baseline schedules. FAST3 performs really well on bloaty, lcms, and proj4. FAST4 performs really well on the new benchmarks nginx and libxslt. On libxslt, COE3 quickly catches up. Also, EXPLOIT is often worst but performs exceptionally well on vorbis, wolff, curl, and freetype2.

Note: for the ragged graphs (e.g., zlib, libxslt, libjpeg, openthread, vorbis, re2, and lcms), if we had more trials we wouldn't see such random jumps but smoother progress. In this data, there are 20 trials for all time stamps, benchmarks, and fuzzers. For some subjects, more trials would be beneficial.

Log Time: [image]

Linear Time: [image]

@inferno-chromium

@mboehme - regenerated https://www.fuzzbench.com/reports/experimental/2020-09-26/index.html with just that experiment's results, no merges.

@mboehme commented Sep 29, 2020

Thanks @inferno-chromium!

@inferno-chromium

An experiment that uses saturated OSS-Fuzz corpora (from months/years of continuous fuzzing) is now available: https://www.fuzzbench.com/reports/2020-10-11-saturated-ossfuzz-corpus/index.html

jonathanmetzman unpinned this issue Feb 27, 2022