
new_edge and new_features stat may be incorrect (e.g. -max_len is used or -use_value_profile) #802

Open
3 of 6 tasks
Dor1s opened this issue Aug 8, 2019 · 11 comments
Labels
bug Something isn't working

Comments

Dor1s commented Aug 8, 2019

  1. If we fuzz with some -max_len value, the initial_edge_coverage stat will not represent the actual edge coverage of the full corpus, because libFuzzer ignores everything past the first max_len bytes of every input.

  2. During the merge, we deliberately remove the -max_len argument, in order to avoid truncating the corpus and losing coverage we already have.

  3. The resulting edge_coverage is extracted from the merge log and might be greater than the initial_edge_coverage even if we didn't generate any inputs, simply because -max_len caused the initial coverage to be incomplete.

  4. The resulting new_edge metric is calculated as the difference between these two (excluding the corpus subset strategy, where we noticed the very same problem and fixed it some time ago).

I don't have a good solution in mind. We could skip the new_edge metric for the random max length strategy, but that would mean losing the signal for that strategy.

My best idea so far is to make libFuzzer's merge give us all the numbers (i.e. initial coverage + new coverage), but that would complicate libFuzzer's interface, so I don't really like it either.
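
To make the bias concrete, here is a minimal sketch of the arithmetic (the names and numbers are illustrative only, not the actual ClusterFuzz launcher code):

```python
# Illustrative sketch only; not the real ClusterFuzz code.

def compute_new_edges(initial_edge_coverage, final_edge_coverage):
  """new_edges as currently derived from the two values parsed from the logs."""
  return final_edge_coverage - initial_edge_coverage

# Fuzzing run started with -max_len=64: inputs were truncated, so only part
# of the corpus coverage was observed before fuzzing began.
initial_edge_coverage = 1200  # under-counted because of -max_len truncation

# The merge runs without -max_len, so the full corpus is exercised.
final_edge_coverage = 1350

# Reported as 150 "new" edges even if fuzzing discovered nothing new.
print(compute_new_edges(initial_edge_coverage, final_edge_coverage))
```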

UPD: we don't use value profiling during the merge, which means we might be missing some features.

UPD 2: One more thing that came up during discussion with @jonathanmetzman today. With the new merge workflow, we might want to run gsutil cp and gsutil rm after every fuzzing run to effectively "prune" the corpus. That way the recurring corpus pruning task would carry much less load and become more of a clean-up / stats collection task than a real, heavy corpus pruning job.

Action Items

  • 1. Modify libFuzzer's merge process so that it can provide accurate stats. Done in https://reviews.llvm.org/D66107.
  • 2. Switch ClusterFuzz's launcher code to use the new merge process.
  • 3. Monitor the new behavior, debug and fix any failures.
  • 4. Revisit the old code for parsing merge stats (and other overlapping stats); consider removing it (e.g. the merge_edge_coverage column) and the sha1 check for corpus filenames.
  • 5. Rewrite _is_multistep_merge_supported using a less hacky approach (check the help message); see the sketch after this list.
  • 6. Update Android test binaries: Add two-step merge testing for Android #1732.
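
A possible shape for item 5, sketched under the assumption that the check simply scans the target's -help=1 output for a flag the two-step merge relies on (the flag choice and the rest are assumptions, not the actual ClusterFuzz implementation):

```python
# Hypothetical sketch; the real launcher helper may look different.
import subprocess

def _is_multistep_merge_supported(target_path):
  """Return True if the libFuzzer target lists the merge_control_file flag,
  which the two-step merge relies on, in its -help=1 output."""
  try:
    result = subprocess.run([target_path, '-help=1'],
                            capture_output=True, text=True, timeout=30)
  except (OSError, subprocess.TimeoutExpired):
    return False
  return 'merge_control_file' in result.stdout + result.stderr
```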
Dor1s added the bug (Something isn't working) label on Aug 8, 2019
Dor1s commented Aug 8, 2019

@mbarbella-chromium @mukundv-chrome for the short term, should we exclude random max length strategy? What'll be the best option for your ongoing experiment(s)?

Another bad idea is to do an extra run with -runs=0 on the full corpus to get the real initial_edge_coverage. We could do it only for the runs where random max length is used, but that would be quite expensive and unwise anyway :(
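
For reference only, such an extra pass would look roughly like the sketch below: run the target over the full corpus with -runs=0 and read the coverage value from the INITED line (names are illustrative and the exact log wording may differ):

```python
# Sketch of the dismissed idea: a separate -runs=0 pass over the full corpus
# (without -max_len) to observe the real initial edge coverage.
import re
import subprocess

def real_initial_edge_coverage(target_path, corpus_dir):
  proc = subprocess.run([target_path, '-runs=0', corpus_dir],
                        capture_output=True, text=True)
  # libFuzzer logs to stderr, e.g. "#1234 INITED cov: 567 ft: 890 ...".
  match = re.search(r'INITED\s+cov:\s+(\d+)', proc.stderr)
  return int(match.group(1)) if match else None
```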

Dor1s commented Aug 8, 2019

> @mbarbella-chromium @mukundv-chrome for the short term, should we exclude random max length strategy? What'll be the best option for your ongoing experiment(s)?

Discussed offline. The experimental data should be good for now, so I'll upload a CL to record new_edges = 0 when the random max length strategy is used.

Dor1s self-assigned this on Aug 8, 2019
Dor1s commented Aug 9, 2019

@inferno-chromium suggested a nice solution on the libFuzzer side (print stats after reading the first corpus dir). I've uploaded a CL as https://reviews.llvm.org/D66020, but we probably need to choose some other keyword: INITED is supposed to indicate that ALL input files have been read, so in our case it's kind of PARTIALLY INITED :)
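
For context, the launcher would then be able to diff two stats lines from a single merge log: one printed after the initial (output) corpus dir is read, and one printed at the end. A rough parsing sketch follows; the exact wording of the log line is an assumption here:

```python
# Sketch only; the exact MERGE-OUTER line format is approximate.
import re

MERGE_STATS_RE = re.compile(
    r'MERGE-OUTER:\s+(\d+) new files with (\d+) new features added; '
    r'(\d+) new coverage edges')

def extract_merge_stats(merge_log):
  """Return (new_files, new_features, new_edges) from the last stats line."""
  matches = MERGE_STATS_RE.findall(merge_log)
  if not matches:
    return None
  # With the D66020 change there can be an intermediate stats line as well;
  # the last occurrence reflects the completed merge.
  return tuple(int(value) for value in matches[-1])
```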

dtzWill pushed a commit to llvm-mirror/compiler-rt that referenced this issue Aug 9, 2019
Summary:
The purpose is to be able to extract the number of new edges added to
the original (i.e. output) corpus directory after doing the merge. Use case
example: in ClusterFuzz, we do merge after every fuzzing session, to avoid
uploading too many corpus files, and we also record coverage stats at that
point. Having a separate line indicating stats after reading the initial output
corpus directory would make the stats extraction easier for both humans and
parsing scripts.

Context: google/clusterfuzz#802.

Reviewers: morehouse, hctim

Reviewed By: hctim

Subscribers: delcypher, #sanitizers, llvm-commits, kcc

Tags: #llvm, #sanitizers

Differential Revision: https://reviews.llvm.org/D66020

git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@368461 91177308-0d34-0410-b5e6-96231b3b80d8
llvm-git-migration pushed a commit to llvm/llvm-project that referenced this issue Aug 9, 2019
(Same commit message as above; llvm-svn: 368461.)
Dor1s commented Sep 3, 2019

One more relevant problem: we don't use -use_value_profile=1 during the merge, which means we discard some features that could have been discovered during fuzzing. That explains why the new_features stat so often has non-zero values.

I'll fix it as well when making the other merge changes.
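
Roughly speaking, the fix boils down to passing the same -use_value_profile setting to the merge invocation that was used during fuzzing. A minimal sketch, with made-up argument plumbing rather than the actual launcher code:

```python
# Illustrative only; argument plumbing is hypothetical.
def build_merge_args(target_path, output_corpus_dir, new_units_dir,
                     use_value_profile=False):
  args = [target_path, '-merge=1']
  if use_value_profile:
    # Keep value profiling consistent between fuzzing and merge so features
    # found with -use_value_profile=1 are not dropped at merge time.
    args.append('-use_value_profile=1')
  # The first directory is the one units are merged into; the second holds
  # the newly generated units.
  args += [output_corpus_dir, new_units_dir]
  return args
```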

Dor1s changed the title from "new_edge stat may be incorrect when -max_len is used" to "new_edge and new_features stat may be incorrect (e.g. -max_len is used or -use_value_profile)" on Sep 3, 2019
Dor1s commented Sep 6, 2019

One more thing that came up during discussion with @jonathanmetzman today. With the new merge workflow, we might want to run gsutil cp and gsutil rm after every fuzzing run to effectively "prune" the corpus. That way the recurring corpus pruning task would carry much less load and become more of a clean-up / stats collection task than a real, heavy corpus pruning job.
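
A minimal sketch of what that per-run sync could look like, assuming plain gsutil invocations from the bot (the helper name and layout are made up for illustration):

```python
# Illustrative sketch only; helper name and layout are hypothetical.
import os
import subprocess

def sync_corpus_after_run(local_corpus_dir, gcs_corpus_url):
  """Upload units the merge kept; a matching `gsutil rm` pass (not shown)
  would delete the remote units the merge pruned."""
  files = [os.path.join(local_corpus_dir, f)
           for f in sorted(os.listdir(local_corpus_dir))]
  if files:
    # -n skips files that already exist at the destination.
    subprocess.check_call(['gsutil', '-m', 'cp', '-n'] + files +
                          [gcs_corpus_url])
```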

Dor1s commented Sep 19, 2019

FTR, the most recent Clang roll in Chromium should include my -merge=1 change, so that is not a blocker anymore, but there are still some other things that can delay switching to the new merge logic.

Dor1s added a commit that referenced this issue Dec 11, 2019
libFuzzer: use two step merge after fuzzing for accurate stats (#802). (#815)

* Use the two step merge process to collect accurate stats for the new corpus units.

* Working on the tests

* add files for two merge steps

* Make the tests work with two step process

* improve the tests, add Minijail, Fuchsia, and Android support.

* self review and minor clean up

* add default values for device specific dirs

* Address comments by Jonathan

* Refactor minimize_corpus and revert to the original interface

* Address other review comments

* Minor fixes / comments clarifications.

* Adjust the timeout after the first step

* address final comments, remove old stats code, test everywhere

* Fix old stats test
Dor1s added a commit that referenced this issue Dec 11, 2019
Dor1s added a commit that referenced this issue Dec 12, 2019
Dor1s added a commit that referenced this issue Dec 12, 2019
#802). (#815)"

(The revert quotes the original PR description, as above.)
Dor1s added a commit that referenced this issue Dec 12, 2019
#802). (#815)" (#1261)

* Reland "libFuzzer: use two step merge after fuzzing for accurate stats (#802). (#815)"

* (The original PR's commit list is repeated here, as above.)

* Add a fallback to the single step merge + convert assertions into errors

* Patch existing tests to work

* Update integration test and add a new one.

* add dict and options file for the old fuzzer

* make the check work for Fuchsia

* Another fix for Fuchsia
Dor1s commented Dec 16, 2019

I've checked BigQuery stats for the past 15 days for both Chromium ClusterFuzz and OSS-Fuzz.

The change has definitely had an effect -- I can share stats with Googlers.

One noticeable and expected change is that the average new_edges value has dropped.

Some other movements are also evident, though they are hard to comment on, as the stats were previously incomplete because we skipped recording them for certain strategies.

Dor1s added a commit that referenced this issue Dec 17, 2019
Use correct exception for timeout detected in _minimize_corpus_two_step (#802).

I think this case is very unlikely to happen, but better safe than sorry I guess.
Dor1s added a commit that referenced this issue Dec 17, 2019
Use correct exception for timeout detected in _minimize_corpus_two_step (#802). (#1289)

* Use correct exception for timeout detected in _minimize_corpus_two_step (#802).

I think this case is very unlikely to happen, but better safe than sorry I guess.

* Also add logging of failed merge output in _merge_new_units

* fix a typo
Dor1s commented Dec 17, 2019

Per Abhishek's suggestion, I've checked whether the new implementation affected the number of post-fuzzing merge timeouts. Looking at the logs for the past 7+ days, the overall number of timeouts is not affected, which is expected, since the overall complexity of performing a merge hasn't changed much.

# Chromium ClusterFuzz, number of "Merging new testcases timed out" errors a day.
2019-12-10 : 1423
2019-12-11 : 1246
2019-12-12 : 1435 # two step merge was deployed at the end of this day
2019-12-13 : 1497
2019-12-14 : 1330
2019-12-15 : 1363
2019-12-16 : 1446
2019-12-17 : 1128 # not a full day

Dor1s commented Dec 26, 2019

Re-evaluating some strategies / metrics after the change has been deployed for ~2 weeks (access to the links is restricted for non-Googlers):

  1. DFT on OSS-Fuzz: https://docs.google.com/spreadsheets/d/14hAEuRsfofSiInUTR7KA48SbGY4QPuxHOzObjVPInQk/edit#gid=1395928105

     On average, the DFT strategy is still losing to the runs where it's not being used.

  2. ML RNN on ClusterFuzz: https://docs.google.com/spreadsheets/d/13te_2ZcRQhbm25B_h2Xg8-TFV0TBn4TSSBmbTNLy2_E/edit#gid=0

     ML RNN looks pretty good and seems to be useful: 66.7% of fuzz targets perform better with ML RNN, while only 32% of targets perform better without it (all on average).

  3. MAB probabilities: https://datastudio.google.com/c/u/0/reporting/1crV7jDmoFh25j2ZMATEMa3X0HfsKqGOQ/page/QAox

     The graphs seem to be broken, possibly due to #1304.

I've run some queries myself in https://docs.google.com/spreadsheets/d/1dNgK9xg9zkdds6ckFalNWniVI2FtVZylkfcFKR52kX0/edit#gid=0

It might be a little hard to read, but a great observation here is how the fork strategy has been showing up among half of the top strategies since Dec 16th. Until then, the fork strategy was never in the top 10, simply because we did not have good stats for it. The ML RNN probability has also jumped by ~10-20%, but that might be a temporary fluctuation.

Once the Data Studio charts are fixed, we should be able to observe even more changes.

li-zhipeng pushed a commit to webrtc-lizp/chromium-llvm-project-compiler-rt-lib-fuzzer that referenced this issue Jan 6, 2020
(Same commit message as above.)
nwellnhof commented:
> One more relevant problem: we don't use -use_value_profile=1 during the merge, which means we discard some features that could have been discovered during fuzzing. That explains why the new_features stat so often has non-zero values.
>
> I'll fix it as well when making the other merge changes.

Has this been fixed?

Dor1s commented Feb 26, 2024

> One more relevant problem: we don't use -use_value_profile=1 during the merge, which means we discard some features that could have been discovered during fuzzing. That explains why the new_features stat so often has non-zero values.
>
> I'll fix it as well when making the other merge changes.
>
> Has this been fixed?

I think so: #1915
