Experiment Announcement Thread #249
https://www.fuzzbench.com/reports/2020-04-20-aflplusplus/index.html contains afl++ variants. I just started another experiment, https://www.fuzzbench.com/reports/2020-04-21/index.html, that includes every fuzzer except the afl++ variants benchmarked above. Once both of these experiments complete I will make a combined report. Note that coverage measurement for the previous experiment, https://www.fuzzbench.com/reports/2020-04-14/index.html, didn't complete because of a bug I have since fixed.
@jonathanmetzman When do you expect the 2020-04-20-aflplusplus one to be finished with the measurement? And is the title then updated to not mention "incomplete"? No need to combine this with 2020-04-21, btw; that would just make the overall report graphs unreadable with so many entries :)
Title gets updated automatically.
OK, sounds good to me. Let me know if you want just the AFL++ fuzzers then (I can do that as well).
That would be good. However, the new run does not have the irssi target in it ... does that work without messing up the benchmark calculation?
I can exclude that benchmark.
Both of these experiments have finished measuring. I think we are going to prioritize speeding up measuring.
Made this combined report: https://www.fuzzbench.com/reports/2020-04-21-and-20-aflplusplus/index.html
@jonathanmetzman Thanks! Start the next batch whenever you can - that one will be very interesting, especially the variants with increased map sizes :)
Not exactly sure when this will be, but probably by Friday; I'm trying to work on some long-term improvements and planning this week.
I started experiments for AFL++ and fastcgs with and without huge page tables.
Experiments are done. https://www.fuzzbench.com/reports/2020-05-01-aflplusplus-1/index.html and https://www.fuzzbench.com/reports/2020-05-01-aflplusplus-2/index.html compare afl++ variants.
I'm starting to work on using resources more intelligently in experiments (particularly for fuzzer variants or features that are in development).
Running a new experiment with the main fuzzers and the new benchmark. EDIT: I'll discuss the results from that experiment here to reduce spam on this thread.
A 15-trial full experiment: https://www.fuzzbench.com/reports/2020-05-24/index.html
A 20-trial experiment comparing aflplusplus_optimal, aflplusplus_shmem, and ankou (with a buggy integration): https://www.fuzzbench.com/reports/2020-05-28/index.html
https://www.fuzzbench.com/reports/2020-06-12/index.html is an experiment with libfuzzer_nocmp, aflcc, manul, and afl++ (and its variants), combined with 2020-05-24.
The results of the new experiment https://www.fuzzbench.com/reports/2020-07-17/index.html currently look very different from previous runs. Was there already an assessment of which of the two coverage methods is better, or what the advantages/disadvantages are?
What did you notice that's different in 2020-07-17? I stopped that experiment to run 2020-07-13 (the AFL++ experiment I was supposed to run for you a few days ago but accidentally didn't, because of a bug in the service code; 2020-07-13 is running right now). We're only partway into 2020-07-17, but I ran it to make sure that the results are the same as usual. 2020-07-17 was run using #509, which will allow non-FuzzBench maintainers to easily contribute benchmarks from OSS-Fuzz. In theory the results should be the same even though the builds are different, but any differences are very helpful for me.
I moved 2020-07-17 here since I don't consider it an official experiment.
Note that 2020-07-17 used sancov, the current coverage implementation; only clang-cov-test used clang coverage. So far it looks totally fine to replace sancov with clang coverage, but we're still investigating.
honggfuzz is not in the top list and fastcgs_lm is instead - both unusual. Also, aflplusplus_ctx_nozerosingle has no reason to be in third place and should rather be around ankou.
Also, 2020-07-17 is not updated anymore since it was moved.
I stopped that experiment to run yours.
There is a very noticeable bump in edge coverage between all the reports before 07/25 and those after 08/03 (https://www.fuzzbench.com/reports/2020-07-25/index.html). I am wondering what caused it?
True! And systemd has much less coverage now, which is weird.
We moved to using clang code coverage instead of sancov.
What is measured now? Edges? Basic blocks? Lines of code? Instructions?
It is called regions in clang code coverage (https://llvm.org/docs/CoverageMappingFormat.html#id14). Regions have character-level precision, e.g. there can be multiple regions in just one line such as `return x || y && z`.
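To make the region granularity concrete, here is a minimal C sketch (the function `f` and its values are illustrative, not from FuzzBench): under clang's coverage mapping, the single `return` line below maps to several regions, and an input covers a sub-region only if the corresponding operand is actually evaluated.

```c
/* Illustrative only: when compiled with clang's coverage
 * instrumentation (-fprofile-instr-generate -fcoverage-mapping),
 * the return line below maps to multiple source regions: the whole
 * expression plus sub-regions for the short-circuited operands
 * y and z. */
static int f(int x, int y, int z) {
    return x || (y && z);  /* several regions on this one line */
}
```

Running the instrumented binary and inspecting the profile with `llvm-profdata merge` followed by `llvm-cov show` reports per-region counts; for example, an input with `x = 1` never evaluates the `y && z` sub-regions, so they stay uncovered even though the line itself executed.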
Thanks! I think the report graphs should indicate this change to avoid confusion. Currently the graphs still say "Reached edge coverage". Should it not be something like region/code coverage?
I think we discussed it a little internally; some people felt that just "regions" can be confusing, and the community is more accustomed to edges for coverage. @lszekeres - can you revisit and fix this confusion sometime next week?
Yes, let me clarify this in the report (along with the other things we discussed to make it clearer).
Heads up, there will be some hiccups as we transition to clang code coverage, add differential coverage metrics (see #657), and stabilize things.
[LibFuzzer / Entropic] No Restart
Details: Currently, in fork mode, LibFuzzer runs a new instance (job) every five minutes and then merges the generated corpus into the main corpus. The new LF instance (job) starts with a very small subset of the main corpus. With this commit, in fork mode, LibFuzzer runs a new instance only when a crash is encountered, and merges the current corpus into the main corpus every five minutes. The new LF instance (job) always starts with the entire main corpus. Some seed files might have been deleted before the merge, so there is a thread-unsafe quick check whether the file still exists before it is merged.
Results (figures): normalized coverage achieved across all benchmarks (log time); critical difference at 1 hour, 4 hours, and 23 hours.
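The existence check described above can be sketched roughly as follows (a hypothetical helper, not the actual libFuzzer code; `merge_seed` and the paths are made up for illustration):

```c
#include <sys/stat.h>

/* Hypothetical sketch of the merge step described above: another job
 * may delete a seed file between listing and merging, so do a quick
 * (intentionally thread-unsafe) existence check and skip missing
 * files instead of failing the whole merge. */
static int merge_seed(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;  /* file vanished; skip it */
    /* ...merge the file's contents into the main corpus here... */
    return 1;
}
```

Note the check only narrows the race window (the file can still disappear between the `stat` and the read), which is why the description above calls it thread-unsafe.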
[AFL++] Boosting the fast and coe schedules (might prepare another experiment and update here)
Details: In terms of normalized coverage (mean of means across all benchmarks), FAST3 consistently performs best. For most subjects, the new schedules outperform the baseline schedules. FAST3 performs really well on bloaty, lcms, and proj4. FAST4 performs really well on the new benchmarks nginx and libxslt; on libxslt, COE3 quickly catches up. Also, EXPLOIT is often worst but performs exceptionally well on vorbis, wolff, curl, and freetype2. Note: for the ragged graphs (e.g., zlib, libxslt, libjpeg, openthread, vorbis, re2, and lcms), with more trials we wouldn't see such random jumps but smoother progress. In this data there are 20 trials for all timestamps, benchmarks, and fuzzers; for some subjects, more trials would be beneficial.
Figures: log time; linear time.
@mboehme - regenerated https://www.fuzzbench.com/reports/experimental/2020-09-26/index.html with just that experiment's results, no merges.
Thanks @inferno-chromium!
An experiment that uses saturated OSS-Fuzz corpora [from months/years of continuous fuzzing] is now available: https://www.fuzzbench.com/reports/2020-10-11-saturated-ossfuzz-corpus/index.html
New experiments will be announced on this issue as discussed on #205.
I think this is a good place to discuss experiments as well.