new_edge and new_features stat may be incorrect (e.g. -max_len is used or -use_value_profile) #802
@mbarbella-chromium @mukundv-chrome for the short term, should we exclude the random max length strategy? What would be the best option for your ongoing experiment(s)? Another, admittedly bad, idea is to do an extra run with …
Discussed offline. The experimental data should be good for now, so I'll upload a CL to record …
@inferno-chromium suggested a nice solution on the libFuzzer side (to print stats after reading the first corpus dir). I've uploaded a CL as https://reviews.llvm.org/D66020, but we probably need to choose some other keyword.
Summary: The purpose is to be able to extract the number of new edges added to the original (i.e. output) corpus directory after doing the merge.

Use case example: in ClusterFuzz, we do a merge after every fuzzing session, to avoid uploading too many corpus files, and we also record coverage stats at that point. Having a separate line indicating stats after reading the initial output corpus directory would make the stats extraction easier for both humans and parsing scripts.

Context: google/clusterfuzz#802.
Reviewers: morehouse, hctim
Reviewed By: hctim
Subscribers: delcypher, #sanitizers, llvm-commits, kcc
Tags: #llvm, #sanitizers
Differential Revision: https://reviews.llvm.org/D66020
git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@368461 91177308-0d34-0410-b5e6-96231b3b80d8
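For illustration, a stats line like the one this CL adds could be consumed by a small parser on the ClusterFuzz side. This is only a sketch: the sample log lines and the exact `cov:` / `ft:` layout below are illustrative assumptions, not verbatim libFuzzer output, and should be checked against the merge log of the libFuzzer version in use.

```python
import re

# Match "cov: <edges> ft: <features>" counters in a merge log.
# The format is an assumption for illustration purposes.
STAT_RE = re.compile(r"cov:\s*(\d+)\s+ft:\s*(\d+)")

def parse_merge_stats(log_text):
    """Return (edges, features) from the last matching stats line, or None."""
    matches = STAT_RE.findall(log_text)
    if not matches:
        return None
    edges, features = matches[-1]
    return int(edges), int(features)

# Illustrative (not verbatim) merge log excerpt:
sample_log = """\
MERGE-OUTER: 1500 files, 1200 in the initial corpus
#1200 INITED cov: 310 ft: 905
#1500 DONE   cov: 342 ft: 981
"""
print(parse_merge_stats(sample_log))  # last stats line wins
```

With a dedicated line printed after reading the initial output corpus, the same regex could be applied to that line alone to get the baseline numbers directly.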
One more relevant problem: we don't use value profiling during the merge. I'll fix it as well when making the other merge changes.
One more thing that came up during discussion with @jonathanmetzman today. With this new merge thing, we might want to do `gsutil cp` and `gsutil rm` after every fuzzing run, to essentially "prune" the corpus.
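A minimal sketch of that per-run pruning idea: after a merge, upload only the units that survived and delete the remote ones that did not. The bucket layout and unit names below are hypothetical; the commands are only built, never executed, so the flow itself can be inspected.

```python
# Build (but do not run) the gsutil commands for post-run corpus pruning.
# Bucket name and corpus path layout are hypothetical.
def build_prune_commands(bucket, kept_units, removed_units):
    commands = []
    for unit in kept_units:
        commands.append(["gsutil", "cp", unit, f"gs://{bucket}/corpus/"])
    for unit in removed_units:
        commands.append(["gsutil", "rm", f"gs://{bucket}/corpus/{unit}"])
    return commands

cmds = build_prune_commands("my-corpus-bucket", ["unit_a"], ["unit_b"])
for c in cmds:
    print(" ".join(c))
```

Doing this after every run would shrink the work left for the recurring pruning task, as described above.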
FTR, the most recent Clang roll in Chromium should include my libFuzzer change.
libFuzzer: use two step merge after fuzzing for accurate stats (#802). (#815)
* Use the two step merge process to collect accurate stats for the new corpus units.
* Working on the tests
* Add files for two merge steps
* Make the tests work with two step process
* Improve the tests, add Minijail, Fuchsia, and Android support.
* Self review and minor clean up
* Add default values for device specific dirs
* Address comments by Jonathan
* Refactor minimize_corpus and revert to the original interface
* Address other review comments
* Minor fixes / comments clarifications.
* Adjust the timeout after the first step
* Address final comments, remove old stats code, test everywhere
* Fix old stats test
Revert "libFuzzer: use two step merge after fuzzing for accurate stats (#802). (#815)"
Reland "libFuzzer: use two step merge after fuzzing for accurate stats (#802). (#815)" (#1261)
* Add a fallback to the single step merge + convert assertions into errors
* Patch existing tests to work
* Update integration test and add a new one.
* Add dict and options file for the old fuzzer
* Make the check work for Fuchsia
* Another fix for Fuchsia
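The fallback behavior added in this reland can be sketched as control flow. The function names here are hypothetical and the merge implementations are injected as callables; a real implementation would shell out to the target with libFuzzer's `-merge=1` (and, for the two-step variant, a merge control file), with exact flags depending on the libFuzzer version.

```python
def merge_with_fallback(two_step_merge, single_step_merge, log=print):
    """Try the two-step merge; on a recoverable error, fall back."""
    try:
        return two_step_merge(), "two_step"
    except RuntimeError as e:
        # Assertions were converted into errors so that a broken two-step
        # merge degrades gracefully instead of failing the whole task.
        log(f"two-step merge failed ({e}); falling back to single step")
        return single_step_merge(), "single_step"

# Hypothetical stand-ins for the real merge invocations:
def broken_two_step():
    raise RuntimeError("target binary does not support multistep merge")

def single_step():
    return {"new_units": 7}

result, mode = merge_with_fallback(broken_two_step, single_step)
print(mode)
```

Injecting the merge callables keeps the fallback logic testable without running a real fuzz target.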
I've checked BigQuery stats for the past 15 days for both Chromium ClusterFuzz and OSS-Fuzz. The change has definitely had an effect -- I can share the stats with Googlers. One thing that is noticeable and expected is a shift in the average … Some other movements are also evident, though they are hard to comment on, as the previous stats were incomplete: we skipped recording them for certain strategies.
…ep (#802). I think this case is very unlikely to happen, but better safe than sorry I guess.
As per Abhishek's suggestion, I've checked whether this new implementation affected the number of post-fuzzing merge timeouts. Looking at the logs for the past 7+ days, the overall number of timeouts is not affected, which is expected, as the overall complexity of performing a merge hasn't changed much.
Re-evaluating some strategies / metrics after the change has been deployed for ~2 weeks (access to the link is restricted for non-Googlers):
- on average, the DFT strategy is still losing to the runs where it's not being used
- ML RNN looks pretty good; it seems to be useful, as 66.7% of fuzz targets perform better with ML RNN, while only 32% of targets perform better without it (all on average)

The graphs seem to be broken, possibly due to #1304. I've run some queries myself in https://docs.google.com/spreadsheets/d/1dNgK9xg9zkdds6ckFalNWniVI2FtVZylkfcFKR52kX0/edit#gid=0. It might be a little hard to read, but an awesome observation here is how … Once the Data Studio charts are fixed, we should be able to observe even more changes.
Has this been fixed?
I think so: #1915
If we fuzz with some `-max_len` value, the `initial_edge_coverage` stat will not represent the actual edge coverage of the full corpus, because libFuzzer will ignore the contents past the first `max_len` bytes in every input.

During the merge, we reasonably remove the `-max_len` argument, in order to avoid truncating the corpus and losing the coverage we have. The resulting `edge_coverage` is extracted from the merge log and might be greater than the `initial_edge_coverage` even if we didn't generate any inputs, just because `-max_len` caused the initial coverage to be incomplete.

The resulting `new_edge` metric is calculated as the difference between these two (excluding the corpus subset strategy, as in that case we realized the very same problem and fixed it some time ago):

clusterfuzz/src/python/bot/fuzzers/libFuzzer/stats.py, line 291 in 649d43c
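The skew described above can be shown with a toy calculation. This is a hypothetical helper, not the actual `stats.py` code; the numbers are made up to illustrate how a `-max_len`-truncated baseline inflates `new_edge`.

```python
# Hypothetical sketch of the stat in question: new_edge is derived as the
# final edge coverage minus the initial edge coverage.
def compute_new_edge(initial_edge_coverage, edge_coverage):
    return edge_coverage - initial_edge_coverage

# Without -max_len the difference reflects genuinely new coverage:
print(compute_new_edge(300, 310))

# With -max_len during fuzzing but not during the merge, the initial number
# is an undercount (inputs were truncated), so the "new" edges are inflated
# even if fuzzing discovered nothing beyond the pre-existing corpus:
print(compute_new_edge(250, 310))
```

In the second call, a large part of the reported difference was always present in the corpus; it only looks new because the baseline was measured on truncated inputs.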
I don't have a good solution in mind. We can skip the `new_edge` metric for the random max length strategy, but that would mean we lose a signal for the strategy. My best idea so far is to make libFuzzer's merge give us all the numbers (i.e. initial coverage + new coverage), but that would complicate libFuzzer's interface, so I don't really like this either.
UPD: we don't use value profiling during the merge, which means we might be missing some features.
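A sketch of the fix implied here: keep value profiling enabled when building the merge invocation so that feature counts match the fuzzing runs. The target path and directory names are hypothetical; `-merge=1` and `-use_value_profile=1` are real libFuzzer flags, and only the argument list is constructed here.

```python
# Build a libFuzzer merge command line (argument construction only).
# Target path and corpus directory names are hypothetical.
def build_merge_args(target, output_dir, input_dirs, use_value_profile=True):
    args = [target, "-merge=1"]
    if use_value_profile:
        # Keep feature accounting consistent with the fuzzing sessions.
        args.append("-use_value_profile=1")
    return args + [output_dir] + list(input_dirs)

print(build_merge_args("./fuzz_target", "corpus", ["new_units"]))
```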
UPD 2: One more thing that came up during discussion with @jonathanmetzman today. With this new merge thing, we might want to do `gsutil cp` and `gsutil rm` after every fuzzing run, to essentially "prune" the corpus. That way, the recurring corpus pruning task would have much less of a load and would be more like a clean up / stats collection task, rather than a real heavy corpus pruning job.

Action Items
- … (`merge_edge_coverage` column), sha1 check for corpus filenames.
- … `_is_multistep_merge_supported` using a less hacky approach (check the help message).
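The second action item (capability detection via the help message) could be sketched as below. The flag name checked here, `merge_control_file`, is an assumption about what the real `_is_multistep_merge_supported` would look for, and the help texts are illustrative strings rather than real fuzzer output.

```python
# Decide whether the two-step merge is supported by inspecting the target's
# help output instead of probing runtime behavior. The flag name checked is
# an assumption for illustration.
def is_multistep_merge_supported(help_text):
    return "merge_control_file" in help_text

old_help = "Usage: ./fuzzer [-flag=value] [dirs]\n  merge  1  merge corpora"
new_help = old_help + "\n  merge_control_file  0  internal flag used by merge"
print(is_multistep_merge_supported(old_help), is_multistep_merge_supported(new_help))
```

In a real implementation the help text would be captured by running the target with its help flag and checking the output once per build.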