symcc-aflplusplus: add new fuzzer. #1165

DavidKorczynski · 2021-06-01T13:17:22Z

Following the two recent symcc_afl experiments (1 and 2), I thought it would be interesting to see how symcc and aflplusplus combine, which is the purpose of this PR.

I combine symcc and aflplusplus in the same way that symcc and afl are combined. That is, I don't use the aflplusplus custom mutator here but instead reuse the symcc_afl setup, because I read a comment from @vanhauser-thc on the fuzzing Discord that the custom mutator runs way too often (please do let me know if I misread this, @vanhauser-thc).

The only differences between this PR and the symcc_afl integration are that I clone aflplusplus instead of afl and use the aflplusplus fuzzer.py script to build aflplusplus; other than that, it's the same as symcc_afl. I am not super familiar with aflplusplus, @vanhauser-thc, so if anything in aflplusplus means I can't do this, please let me know :)!

I tested this on the bloaty target and things seem to be running the way they should: I see symcc and afl output. I am currently testing on more benchmarks.

vanhauser-thc · 2021-06-01T13:25:41Z

fuzzers/symcc_aflplusplus/fuzzer.py

+    print('[run_fuzzer] Running AFL for SymCC')
+    aflplusplus_fuzzer.prepare_fuzz_environment(input_corpus)
+    launch_afl_thread(input_corpus, output_corpus, target_binary,
+                      ["-M", "afl-master"])


-M enables options we don't want in a benchmark run, so use -S afl instead.
Also, please add os.environ['AFL_DISABLE_TRIM'] = "1"

why are there two launch_afl_thread calls?
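A minimal sketch of the two changes requested in this review (-S instead of -M, and AFL_DISABLE_TRIM), assuming the launcher simply builds an afl-fuzz command line plus an environment. The helper name build_secondary_afl_command is hypothetical and is not the PR's launch_afl_thread; -i, -o, -S and the -- separator are standard afl-fuzz options, and AFL_DISABLE_TRIM is the AFL++ environment variable named in the review.

```python
import os
from typing import Dict, List, Optional, Tuple


def build_secondary_afl_command(
        input_corpus: str,
        output_corpus: str,
        target_binary: str,
        instance_name: str = 'afl',
        base_env: Optional[Dict[str, str]] = None
) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical helper: returns (argv, env) for one AFL++ secondary instance."""
    # Copy the environment so the caller's os.environ is not mutated.
    env = dict(os.environ if base_env is None else base_env)
    # Disable test-case trimming, as requested in the review.
    env['AFL_DISABLE_TRIM'] = '1'
    argv = [
        'afl-fuzz',
        '-i', input_corpus,
        '-o', output_corpus,
        # -S (secondary) rather than -M (main): per the review, -M enables
        # options that are unwanted in a benchmark run.
        '-S', instance_name,
        '--', target_binary,
    ]
    return argv, env
```

A caller would then start the fuzzer with something like subprocess.Popen(argv, env=env). Returning the environment instead of writing to os.environ keeps the setting scoped to the AFL process.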

Thanks for the rapid review. There are two launch_afl_thread calls because I am following the symcc documentation, which asks for two AFL instances (one with -M and one with -S): https://github.com/eurecom-s3/symcc/blob/9b20609adab02279c181010c8b1e61a9a9acac62/docs/Fuzzing.txt#L91

Why would there be any value in this? On the contrary, this makes the comparison even less fair :)
You are comparing a single-instance fuzzer with a two-instance fuzzer + symcc.

afl + symcc is already unfair vs afl.
You would need to compare afl + afl vs afl + symcc.

I did that in an Eclipser run to see if it was really that good, but it turned out that afl++ + afl++ is better than afl++ + Eclipser ...

(the instances only have one CPU assigned so that is the right way to test IMHO)

If a single CPU is dedicated to each run, shouldn't that make it fair? The Eclipser experiment is interesting, but again, with a single dedicated CPU it shouldn't matter whether you run N instances or 1. If anything, the setups with multiple analysis tools should be at a disadvantage because of the extra context switches, unless you assume the two instances together receive more total CPU time than the single instance.

It makes a difference because a single fuzzer does not utilize the CPU 100%. The more fuzzers you run, the closer to 100% you get. E.g. on a 10 cores * 2 threads = 20 thread machine, in my experience you reach peak exec/s at ~25 fuzzing threads (with afl++, where we pin to CPUs; with libfuzzer it might be even more, because its instances are less effective on their own).

But to make a fairer comparison, add a fuzzer configuration that runs just the two afl instances and no symcc; that gives you a baseline to compare against (although 2x afl + symcc should still have an advantage simply because it has a 3rd fuzzer doing work).

Yeah, I agree. I will add this too so we can make that comparison as well. I will also switch to using a single AFL instance, although I will have to check that it doesn't do anything unexpected to symcc.

vanhauser-thc · 2021-06-01T13:30:51Z

I combine symcc and aflplusplus in the same way that symcc and afl are combined. That is, I don't use the aflplusplus custom mutator here but instead reuse the symcc_afl setup, because I read a comment from @vanhauser-thc on the fuzzing Discord that the custom mutator runs way too often (please do let me know if I misread this, @vanhauser-thc).

yes that is correct.

I notice that you do not clone from the original symcc repo but from your company's import, which you did not fork from the original one, so it takes a lot of effort to identify the exact changes you made to it. (this is not best practice ...)

Did you make changes to the symcc fuzzer helper script? Richard and I found several issues there that caused the helper to terminate during real fuzzing. Plus I made more changes for better results (although I am not 100% sure they are good, so a second pair of eyes would be good -> eurecom-s3/symcc#46).

DavidKorczynski · 2021-06-01T13:48:23Z

The main reason for copying the git repository rather than forking it is that you can't search in forked repos. The repository I work on has two main changes: updates so the code can be compiled with the LLVM in the fuzzbench/oss-fuzz docker images, and some initial PoC work on pure concolic execution (no combination with a fuzzer), which is still very much early-stage and exploratory.

I'm happy to help if you want to pick up any of these changes.

I made no changes to the helper script, but I have seen the changes you made. So far I have observed the helper crash only once, but since you and Richard ran into issues, I think it likely crashed at some point during the experiments. Investigating this more deeply is on my to-do list.

vanhauser-thc · 2021-06-01T13:57:06Z

The cause of the crashes is that the rust tool panics on any return code it doesn't expect. (E.g. a timeout triggers this: it happens when a queue entry was on the border of timing out during fuzzing but didn't, and then does time out when run with afl-showmap.)
In my assessment the whole afl-showmap step is not helpful, so I removed it completely :)
If you keep it, you have to ensure that this doesn't happen.
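To illustrate the failure mode described here, a Python sketch (the real helper is written in Rust; run_showmap_tolerant is a hypothetical name) of a caller that classifies the exit status of a showmap-like command instead of aborting on anything unexpected. The mapping 0 = ok, 1 = timeout, 2 = crash is an assumption based on classic afl-showmap behaviour and should be checked against the AFL or AFL++ version in use.

```python
import subprocess
from typing import List, Optional

# Assumed afl-showmap exit-status convention; verify for your AFL version.
SHOWMAP_STATUS = {0: 'ok', 1: 'timeout', 2: 'crash'}


def run_showmap_tolerant(argv: List[str],
                         wall_timeout: Optional[float] = None) -> str:
    """Runs a showmap-like command and classifies the outcome instead of raising.

    Returns 'ok', 'timeout', 'crash' or 'unexpected:<code>'.
    """
    try:
        proc = subprocess.run(argv,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL,
                              timeout=wall_timeout,
                              check=False)
    except subprocess.TimeoutExpired:
        # Our own wall-clock budget was exceeded (the child has been killed):
        # report it as a timeout rather than crashing the helper.
        return 'timeout'
    # Any other status (including negative values for signals) is reported,
    # not treated as fatal.
    return SHOWMAP_STATUS.get(proc.returncode, f'unexpected:{proc.returncode}')
```

The caller can then skip or log an input that times out under afl-showmap and keep going, which is the behaviour the panic prevents.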

DavidKorczynski · 2021-06-01T14:01:35Z

Interesting. My approach was essentially this: first, get symcc working in fuzzbench as presented in the paper, with as few modifications as possible (in the afl-symcc case these are only the LLVM changes, i.e. compatibility changes rather than functional changes). Second, look for improvements, and I guess modifying the helper counts as such an improvement. But thanks for the heads-up, I will take a look at it.

DavidKorczynski · 2021-06-01T14:24:38Z

In my assessment the whole afl-showmap step is not helpful, so I removed it completely :)

I'm not sure I understand why this is the case, for the same reasons that @sebastianpoeplau highlights in eurecom-s3/symcc#49 (comment).

DavidKorczynski · 2021-06-01T17:10:01Z

I made no changes to the helper script, but I have seen the changes you made. So far I have observed the helper crash only once, but since you and Richard ran into issues, I think it likely crashed at some point during the experiments. Investigating this more deeply is on my to-do list.

As a quick note on this: I have now run it on the libhtp benchmark multiple times, once with the AFL used in symcc_afl and once with the afl from the symcc repository, and found no issues. I then ran it with aflplusplus, and the panics from running afl-showmap happened immediately.

DavidKorczynski · 2021-06-02T20:37:53Z

Okay, I have now looked in more detail at combining symcc and aflplusplus. Some changes in the behaviour of afl-showmap didn't translate so well. First, the map size was harder to control: even when I explicitly set the environment variable AFL_MAP_SIZE, it wouldn't always use the specified value. I am not sure whether this is an issue on my end or something to do with release 3.10c of AFLplusplus, which describes the environment variable as "(mostly) obsolete".

Although that problem was easily solved, I then ran into issues in symcc where the bitmaps generated by afl-showmap never had any changes, and thus symcc would never see any "new" inputs. The solution I went for was to avoid afl-showmap entirely and put all seeds generated by symcc into the afl queue. This is the same as what @vanhauser-thc suggested. However, it changes the behaviour of symcc, since symcc will put many more seeds into the afl queue, and I am not really sure how this translates in practice, as the number of seeds from symcc can quickly blow up if duplicates are not handled.

One more minor change: I increased the timeout between symcc runs from 5 to 20 seconds.
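A Python sketch of the modified workflow described in these two paragraphs (the actual helper is written in Rust; this is not its code). Instead of filtering SymCC outputs through afl-showmap, every new output is copied into a queue directory that AFL is assumed to sync from. Content-hash deduplication is shown as one possible guard against the seed blow-up mentioned above, and rounds are separated by a configurable pause (20 seconds here). The functions import_new_symcc_outputs and helper_loop, the directory layout, and the ',symcc' name suffix are hypothetical; the 'id:NNNNNN' prefix mirrors AFL's queue file naming.

```python
import hashlib
import os
import shutil
import time
from typing import Callable, Set


def import_new_symcc_outputs(symcc_out_dir: str, afl_queue_dir: str,
                             seen_hashes: Set[str]) -> int:
    """Copies not-yet-seen SymCC outputs into afl_queue_dir.

    Returns the number of files copied in this call.
    """
    os.makedirs(afl_queue_dir, exist_ok=True)
    copied = 0
    for name in sorted(os.listdir(symcc_out_dir)):
        path = os.path.join(symcc_out_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, 'rb') as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest in seen_hashes:
            continue  # Same bytes already imported: drop the duplicate.
        seen_hashes.add(digest)
        # 'id:NNNNNN' mirrors AFL's queue naming; ',symcc' is a hypothetical
        # tag recording where the input came from.
        dest_name = f'id:{len(seen_hashes):06d},symcc'
        shutil.copyfile(path, os.path.join(afl_queue_dir, dest_name))
        copied += 1
    return copied


def helper_loop(rounds: int,
                run_symcc_round: Callable[[], None],
                symcc_out_dir: str,
                afl_queue_dir: str,
                interval_s: float = 20.0,
                sleep: Callable[[float], None] = time.sleep) -> int:
    """Hypothetical driver: run SymCC, import its outputs, pause, repeat."""
    seen_hashes: Set[str] = set()
    total = 0
    for i in range(rounds):
        run_symcc_round()
        total += import_new_symcc_outputs(symcc_out_dir, afl_queue_dir,
                                          seen_hashes)
        if i + 1 < rounds:
            sleep(interval_s)  # Pause between rounds (20 s instead of 5 s).
    return total
```

Exact-content hashing only removes byte-identical duplicates; inputs that differ slightly but exercise the same paths still pass through, which is the kind of filtering afl-showmap was meant to provide.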

I didn't include a new experiment that uses multiple aflplusplus processes (regarding the concerns here #1165 (comment)). This is because multiple processes don't seem to be an advantage for AFL, based on the experiment here https://www.fuzzbench.com/reports/experimental/2021-06-01-symccafl/index.html: after 10 hours, the single-process and two-process versions of afl are very close to each other on the majority of benchmarks, and each outperforms the other on a few. If this symcc_aflplusplus combination performs well and there is doubt about whether that is because of the multiple processes, we can launch another experiment to test that hypothesis.

DavidKorczynski · 2021-06-03T11:22:36Z

@vanhauser-thc I fixed up os.environ['AFL_DISABLE_TRIM'] = "1" and -S afl. Unless you have any blockers, @inferno-chromium, I would be happy to have this one merged.

vanhauser-thc · 2021-06-03T12:01:08Z

I noticed before that you don't seem to have the patience to wait until results are stable :p
It will still take another 24h until results are stable, but even now you can see that there is a huge difference between symcc_afl and symcc_afl_single.
Don't ask me, though, what is happening with afl_two_instances, which is worse than single afl. But then again, the benchmark is only about 40% complete.

DavidKorczynski · 2021-06-03T12:45:25Z

My thinking here is that if we assume the multi-process part does not influence AFL, then it likely won't have an impact on the two AFL processes in the SymCC run either. That makes me think a likely reason why symcc_afl_single performs worse than symcc_afl is that something in the scheduling between symcc and afl changed (this is also part of why I am now trying to increase the timer between afl and symcc, i.e. to see whether symcc + 1 afl can match the performance of symcc + 2 afl). Part of why I'm a bit sceptical of switching to a single afl + symcc is that the symcc authors describe the symcc + fuzzing setup as using two AFL instances, and I assume there is a reason for that beyond more processes giving better results (https://github.com/eurecom-s3/symcc/blob/9b20609adab02279c181010c8b1e61a9a9acac62/docs/Fuzzing.txt#L91)

I'm happy to wait for the current experiment to complete, but I'm also happy to keep working with experiments running in parallel.

inferno-chromium · 2021-06-03T14:42:25Z

@laurentsimon - can you please take a final review pass and merge.

laurentsimon · 2021-06-03T16:43:47Z

I then ran into issues in symcc where the bitmaps generated by afl-showmap never had any changes and thus symcc would never see any "new" inputs

Note that afl does not seem to flush the bitmap to disk when writing it, so there's no guarantee it's on disk. That could be why you don't see changes.
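As a general illustration of the point about flushing (not a description of what afl or afl-showmap actually does): data written through a buffered file API sits in a user-space buffer until it is flushed or the file is closed, os.fsync is what asks the OS to persist it to storage, and a reader racing the writer can also see a partially written file. One common pattern, sketched here with a hypothetical write_bitmap_atomically helper, is to write to a temporary file, flush and fsync it, and then atomically rename it over the target so readers see either the old file or the complete new one.

```python
import os
import tempfile


def write_bitmap_atomically(path: str, bitmap: bytes) -> None:
    """Hypothetical helper: replaces `path` with `bitmap` without partial reads."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory: a rename is only
    # atomic within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix='.bitmap-')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(bitmap)
            f.flush()             # Push Python's user-space buffer to the OS.
            os.fsync(f.fileno())  # Ask the OS to persist the data to storage.
        os.replace(tmp_path, path)  # Atomically swap in the complete file.
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Whether either effect explains the unchanged bitmaps seen with aflplusplus is the open question in this thread; the sketch only shows how a writer can rule it out.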

laurentsimon · 2021-06-03T16:47:10Z

Part of why I'm a bit sceptical of switching to a single afl + symcc is that the symcc authors describe the symcc + fuzzing setup as using two AFL instances, and I assume there is a reason for that beyond more processes giving better results (https://github.com/eurecom-s3/symcc/blob/9b20609adab02279c181010c8b1e61a9a9acac62/docs/Fuzzing.txt#L91)

Ask them to be sure. I doubt there's a technical reason. Early papers on fuzzing used one fuzzer instance and 2 concolic replayers... researchers probably just reused the same setup to make comparison easier.

DavidKorczynski · 2021-06-03T16:48:50Z

I asked here: #1166 (comment)

Just asked a bit more formally on the SymCC project: eurecom-s3/symcc#58

DavidKorczynski · 2021-06-03T16:50:34Z

This behaviour, where the output from afl-showmap showed no new inputs in the symcc context, only occurred with afl-plusplus and not with afl. But it might not be that important, so I would be happy to see the results from disabling that part of the symcc workflow.

laurentsimon · 2021-06-03T16:52:21Z

A few of the checks are failing. They seem to just be flaky tests, i.e. unrelated to your changes. Please confirm.

DavidKorczynski · 2021-06-03T16:53:17Z

I confirm. I went through them and they look like the same kind of failures as in the previous symcc PRs: the symcc CI runs out of disk. But I have tested on several benchmarks and it looks good on my end.

laurentsimon · 2021-06-03T16:56:55Z

Note that afl does not seem to flush the bitmap to disk when writing it, so there's no guarantee it's on disk. That could be why you don't see changes.

This behaviour, where the output from afl-showmap showed no new inputs in the symcc context, only occurred with afl-plusplus and not with afl. But it might not be that important, so I would be happy to see the results from disabling that part of the symcc workflow.

aflpp is built on afl, AFAIK, and seems to use the same code for saving the bitmap. I'll let the author chime in.

symcc-aflplusplus: add new fuzzer.

6cac72d

google-cla bot added the cla: yes label Jun 1, 2021

DavidKorczynski marked this pull request as draft June 1, 2021 13:18

DavidKorczynski added 2 commits June 1, 2021 14:22

updated fuzzers.yml.

f695239

use the afl_fuzzer prepare environment.

07cd428

vanhauser-thc reviewed Jun 1, 2021

DavidKorczynski mentioned this pull request Jun 1, 2021

AFL integration with multiple AFL instances running and a SymCC instance with only a single AFL process. #1166

Merged

DavidKorczynski added 8 commits June 2, 2021 21:07

update aflplusplus set up.

8f36874

change the commit for symcc_afl to try new changes.

a8728c6

merge

4290696

switch symcc_afl_single to new update.

5222846

added experiment request.

8e1f65f

fix formatting.

d05c404

update experiment description to be more accurate.

97630d1

one more update in experiment description.

bdf60fd

DavidKorczynski marked this pull request as ready for review June 2, 2021 20:22

Disable benchmark for symcc_aflplusplus similar to symcc_afl

dd9e9ee

inferno-chromium requested a review from laurentsimon June 3, 2021 14:41

laurentsimon approved these changes Jun 3, 2021

DavidKorczynski mentioned this pull request Jun 3, 2021

Recommended workflow for afl and symcc combination eurecom-s3/symcc#58

Closed

inferno-chromium merged commit d51da57 into google:master Jun 3, 2021

wideglide mentioned this pull request Jun 12, 2021

Finding a kernel that qsym works on #1179

Open
