
Differential graphs on report #657

Merged 35 commits on Aug 15, 2020
Conversation

@Shadoom7 (Contributor) commented Aug 10, 2020

This CL implements differential graphs to measure the uniqueness of each fuzzer. Demo report.

The first graph is the normal ranking plot. The shadowed regions show how many unique regions are covered by each fuzzer.

[Image: bloaty_fuzz_target_ranking]

The second graph is a ranking plot showing the unique regions covered by each fuzzer. It is kept beside the first graph in case someone needs to see the ranking based on unique-region coverage or the exact coverage numbers.

[Image: bloaty_fuzz_target_ranking_rare_region]

The third graph is a square matrix where the number in each cell is the number of regions covered by the fuzzer of the column but not by the fuzzer of the row.

[Image: bloaty_fuzz_target_correlation_plot]
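As an illustration of how such a matrix can be computed, here is a minimal pandas sketch; covered_regions and its contents are hypothetical stand-ins for the per-fuzzer region sets:

```python
import pandas as pd

# Hypothetical per-fuzzer sets of covered regions.
covered_regions = {
    'afl': {'r1', 'r2', 'r3'},
    'libfuzzer': {'r2', 'r3', 'r4'},
    'honggfuzz': {'r1', 'r4', 'r5'},
}
fuzzers = sorted(covered_regions)

# Cell (row, column) counts the regions covered by the column fuzzer
# but not by the row fuzzer, so the diagonal is all zeros.
matrix = pd.DataFrame(
    [[len(covered_regions[column] - covered_regions[row])
      for column in fuzzers]
     for row in fuzzers],
    index=fuzzers,
    columns=fuzzers)
print(matrix)
```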

@jonathanmetzman (Contributor)

I didn't realize the coverage reports show every file on the first page. @Dor1s, is there a way to get clang coverage to generate the reports so that they show things at a directory level? I think that would be much more readable.

@Dor1s (Contributor) commented Aug 11, 2020

> @Dor1s, is there a way to get clang coverage to generate the reports so that they show things at a directory level? I think that would be much more readable.

Yes, you would need to post-process the report like this: https://github.com/google/oss-fuzz/blob/0987ddf994e06dd663954e8b65fdb74653b60b2f/infra/base-images/base-runner/coverage#L165

where coverage_helper calls into coverage_utils.py: https://github.com/google/oss-fuzz/blob/0987ddf994e06dd663954e8b65fdb74653b60b2f/infra/base-images/base-runner/coverage_helper#L17

which gets cloned from the Chromium repo like this: https://github.com/google/oss-fuzz/blob/0987ddf994e06dd663954e8b65fdb74653b60b2f/infra/base-images/base-runner/Dockerfile#L39

You do need to grab the full folder from Chromium, as it has the necessary HTML templates. The good thing is that it hasn't changed in a while, so you can grab it once and it should hopefully keep working forever.

We were thinking about upstreaming directory view support to LLVM, but for some reason it didn't happen.

@jonathanmetzman (Contributor)

Does this work with merging experiments?

@jonathanmetzman (Contributor)

There's a bit of a problem here and with coverage reports. Previously, it was not the end of the world if one fuzzer got more trials than another, but for these metrics, fuzzers with more trials will have an unfair advantage and better results. I'm not sure how we should handle this.

@jonathanmetzman (Contributor)

Some thoughts for the future: I think an aggregate of all of these metrics could be useful, so that across all FuzzBench programs we can see a graph of which fuzzers covered the most unique regions. But I don't think that should be part of this PR.

@jonathanmetzman (Contributor)

@lszekeres, @Shadoom7 mentioned to me that you wanted the shadowing on the main graphs. There's no explanation for it, though, which I don't think is good, since it's not intuitive what it means. I think we can either explain the shadowing or get rid of it and keep only the rare-region bar graph. I don't think it makes sense to have both the rare-region bar graph and the shadowing, since they convey the same information.

@lszekeres (Member) commented Aug 11, 2020

> @lszekeres, @Shadoom7 mentioned to me that you wanted the shadowing on the main graphs. There's no explanation for it, though, which I don't think is good, since it's not intuitive what it means. I think we can either explain the shadowing or get rid of it and keep only the rare-region bar graph. I don't think it makes sense to have both the rare-region bar graph and the shadowing, since they convey the same information.

Yes, I suggested the shadows because:

  1. seeing the total coverage is important context for the unique coverage, and
  2. that way we don't need to add yet another plot for something we can show on an existing plot.

So I think a separate "rare bar graph" is not really necessary. Wdyt?

And yes, the plots definitely need a description/explanation of how to read them underneath.

@Shadoom7 (Contributor, Author)

> Yes, I suggested the shadows because:
>
>   1. seeing the total coverage is important context for the unique coverage, and
>   2. that way we don't need to add yet another plot for something we can show on an existing plot.
>
> So I think a separate "rare bar graph" is not really necessary. Wdyt?
>
> And yes, the plots definitely need a description/explanation of how to read them underneath.

Yeah, that's true. However, the shadow has some drawbacks from my perspective:

  1. It doesn't give you the rank or the numbers. I've tried putting numbers above the shadows, but it looks very messy.
  2. The original graph is the median coverage ranking plot, but we can't do the same for unique regions, since measuring unique regions at the per-trial level doesn't make sense (should we compare trial 1 of all fuzzers, or what?).

Wdyt?

@Shadoom7 (Contributor, Author)

> Yes, you would need to post-process the report like this: https://github.com/google/oss-fuzz/blob/0987ddf994e06dd663954e8b65fdb74653b60b2f/infra/base-images/base-runner/coverage#L165
>
> where coverage_helper calls into coverage_utils.py: https://github.com/google/oss-fuzz/blob/0987ddf994e06dd663954e8b65fdb74653b60b2f/infra/base-images/base-runner/coverage_helper#L17
>
> which gets cloned from the Chromium repo like this: https://github.com/google/oss-fuzz/blob/0987ddf994e06dd663954e8b65fdb74653b60b2f/infra/base-images/base-runner/Dockerfile#L39
>
> You do need to grab the full folder from Chromium, as it has the necessary HTML templates. The good thing is that it hasn't changed in a while, so you can grab it once and it should hopefully keep working forever.
>
> We were thinking about upstreaming directory view support to LLVM, but for some reason it didn't happen.

Thank you, Max! I'll try it.

@jonathanmetzman (Contributor)

> Yeah, that's true. However, the shadow has some drawbacks from my perspective:
>
>   1. It doesn't give you the rank or the numbers. I've tried putting numbers above the shadows, but it looks very messy.
>   2. The original graph is the median coverage ranking plot, but we can't do the same for unique regions, since measuring unique regions at the per-trial level doesn't make sense (should we compare trial 1 of all fuzzers, or what?).

Sorry, I don't understand this. Is the shadow the unique coverage of the median?

> Wdyt?

How about we don't do shadows on the main graph (keeping it clean and understandable), but add a second graph that ranks each fuzzer by unique coverage, where each bar contains the total edges (?) and a shadow of the unique edges? This way we put things in context, share all of the info we want to convey, don't show anything redundant, and don't reduce the clarity of the main graph.
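Something like this minimal matplotlib sketch is what that proposal could look like; the fuzzer names, numbers, and styling are all made up:

```python
import matplotlib.pyplot as plt

# Hypothetical totals and unique-edge counts per fuzzer, ranked by
# unique coverage.
fuzzers = ['aflplusplus', 'honggfuzz', 'libfuzzer']
total_edges = [5200, 4800, 4500]
unique_edges = [300, 120, 90]

fig, axes = plt.subplots()
# The full-height bars give the total coverage as context...
axes.bar(fuzzers, total_edges, color='lightgray', edgecolor='0.2',
         label='total edges')
# ...while the overlaid "shadow" bars show the unique coverage.
axes.bar(fuzzers, unique_edges, color='steelblue', edgecolor='0.2',
         label='unique edges')
axes.set(ylabel='Reached edge coverage')
axes.legend()
fig.savefig('unique_coverage_ranking.svg')
```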

@jonathanmetzman (Contributor)

> Thank you, Max! I'll try it.

Let's do that later; this PR is the priority. I should have mentioned that when I raised this issue. Thank you for the advice, Max.

@Shadoom7 (Contributor, Author) commented Aug 11, 2020

> Sorry, I don't understand this. Is the shadow the unique coverage of the median?

Sorry for the confusion. The shadow is the unique coverage of all trials aggregated, but the background graph is the median coverage. That's why I don't think it's a good fit.

> How about we don't do shadows on the main graph (keeping it clean and understandable), but add a second graph that ranks each fuzzer by unique coverage, where each bar contains the total edges (?) and a shadow of the unique edges? This way we put things in context, share all of the info we want to convey, don't show anything redundant, and don't reduce the clarity of the main graph.

Yeah, I think it's a good idea.

@Shadoom7 (Contributor, Author) commented Aug 12, 2020

This patch doesn't do merging right now. Here are my ideas for merging:

  1. At the end of each experiment, save the JSON summary files at the per-fuzzer level. (Currently they are saved at the per-experiment level.)
  2. When generating the report, fetch the JSON summary file inside the BenchmarkResults class. (Though I don't think it's a good idea to give the class access to the bucket, I can't think of another way around it.)
  3. Since experiment_df already drops the old duplicate benchmark-fuzzer pairs when the merge_with_clobber flag is on, we just fetch the JSON summary file according to the new experiment_filestore column in experiment_df for each benchmark-fuzzer pair.
  4. As @jonathanmetzman suggested to me, if the trial counts differ between fuzzers on the same benchmark, we pick the smallest count and truncate the other fuzzers' results (see the sketch below). This also means we need to store the JSON summary files at the per-trial level. The second option is to pick only one trial from each fuzzer-benchmark pair to store and measure.

Wdyt? @jonathanmetzman @lszekeres @inferno-chromium
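A minimal sketch of the truncation in point 4, assuming a per-benchmark dataframe with 'fuzzer' and 'trial_id' columns (both column names are assumptions):

```python
import pandas as pd

def truncate_to_min_trials(benchmark_df):
    """Keeps only the first N trials of each fuzzer on this benchmark,
    where N is the smallest trial count any fuzzer has."""
    min_trials = benchmark_df.groupby('fuzzer')['trial_id'].nunique().min()

    def keep_first_n(fuzzer_df):
        # Truncate each fuzzer's trials to the common minimum count.
        kept = sorted(fuzzer_df.trial_id.unique())[:min_trials]
        return fuzzer_df[fuzzer_df.trial_id.isin(kept)]

    return benchmark_df.groupby('fuzzer', group_keys=False).apply(keep_first_n)
```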

@lszekeres (Member)

> > How about we don't do shadows on the main graph (keeping it clean and understandable), but add a second graph that ranks each fuzzer by unique coverage, where each bar contains the total edges (?) and a shadow of the unique edges?
>
> Yeah, I think it's a good idea.

SGTM!

@lszekeres (Member)

> 2. When generating the report, fetch the JSON summary file inside the BenchmarkResults class. (Though I don't think it's a good idea to give the class access to the bucket, I can't think of another way around it.)

Yes, please consider doing the downloading outside of BenchmarkResults and passing the object in instead.

> 4. As @jonathanmetzman suggested to me, if the trial counts differ between fuzzers on the same benchmark, we pick the smallest count and truncate the other fuzzers' results. This also means we need to store the JSON summary files at the per-trial level. The second option is to pick only one trial from each fuzzer-benchmark pair to store and measure.
>
> Wdyt? @jonathanmetzman @lszekeres @inferno-chromium

I'm not sure it's strictly necessary to add that complexity. How often does it happen that we don't have 20 trials? (Also, why does it happen?) If it happens often, we might instead want to do our best to make sure we do have 20 when possible. And if it still happens sometimes, it's not the end of the world; we already put an "alert" in the report in those cases, no?

@jonathanmetzman (Contributor)

> I'm not sure it's strictly necessary to add that complexity. How often does it happen that we don't have 20 trials? (Also, why does it happen?) If it happens often, we might instead want to do our best to make sure we do have 20 when possible. And if it still happens sometimes, it's not the end of the world; we already put an "alert" in the report in those cases, no?

I think it happens quite often, though it isn't the norm. A quick look at past reports can verify this (example: https://www.fuzzbench.com/reports/2020-07-25/index.html). I don't know why it happens, but I would assume one reason is that some fuzzers OOM. This solution seems much easier than ensuring we have 20 trials for everything.

@Shadoom7 (Contributor, Author)

> Yes, please consider doing the downloading outside of BenchmarkResults and passing the object in instead.

The thing is that we can't load all the JSON files ahead of time; that would cause OOMs. Another way is to copy the JSON files to a local directory first, pass their local paths to the class as a parameter, and let the class load the local JSON files. Do you think we should do it this way?
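A sketch of that flow, reusing get_fuzzer_filestore_path and filestore_utils.cp from this PR; the layout of the summary files inside the filestore is a guess:

```python
import os

def fetch_json_summaries(benchmark_df, fuzzers, dst_dir):
    """Copies each fuzzer's coverage JSON summary to |dst_dir| and returns
    the local paths, so BenchmarkResults never touches the bucket itself."""
    local_paths = {}
    for fuzzer in fuzzers:
        filestore_path = get_fuzzer_filestore_path(benchmark_df, fuzzer)
        # The exact location of the summary file is an assumption.
        src_file = os.path.join(filestore_path, 'coverage', fuzzer + '.json')
        dst_file = os.path.join(dst_dir, fuzzer + '.json')
        filestore_utils.cp(src_file, dst_file)
        local_paths[fuzzer] = dst_file
    return local_paths
```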


@inferno-chromium (Collaborator) left a comment

Very nicely done. Let's finish fixing nits, test on a quick 30-minute experiment, and land.

@inferno-chromium (Collaborator)

Feel free to commit once tested on a small experiment.


```python
def get_fuzzer_filestore_path(benchmark_df, fuzzer):
    """Gets the filestore_path for |fuzzer| in |benchmark_df|."""
    fuzzer_df = benchmark_df[benchmark_df.fuzzer == fuzzer]
```
Contributor:

This is supposed to work with merging experiments, right? Can you confirm that it does? Looking at this function, it intuitively feels like it does not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I'll test it!

Contributor:

Like, I guess this sort of assumes that the benchmarks for a fuzzer all come from the same experiment. I don't think we make that assumption elsewhere, though. I think the assumption is fine to make, though it might break in some rare cases (such as when we add benchmarks). Can you at least document this assumption, please?

Contributor (Author):

I just tested it and it works. The benchmark_df here is the dataframe for one particular benchmark, so we are not assuming that all benchmarks for a fuzzer come from the same experiment. We are only assuming that all trials for a given fuzzer-benchmark pair come from the same experiment, which is what merge_with_clobber guarantees.
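Documenting that assumption could look like the sketch below; everything past the snippet's first three lines is an assumed completion, not necessarily the PR's actual code:

```python
import posixpath

def get_fuzzer_filestore_path(benchmark_df, fuzzer):
    """Gets the filestore_path for |fuzzer| in |benchmark_df|.

    Assumes all trials for a given fuzzer-benchmark pair come from the
    same experiment (which merge_with_clobber guarantees).
    """
    fuzzer_df = benchmark_df[benchmark_df.fuzzer == fuzzer]
    # All rows share one experiment, so taking the first value is safe.
    filestore_path = fuzzer_df.experiment_filestore.unique()[0]
    experiment = fuzzer_df.experiment.unique()[0]
    return posixpath.join(filestore_path, experiment)
```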

"""Returns the filestore name of the |fuzzer_name|."""
filestore_path = self._get_experiment_filestore_path(fuzzer_name)
gcs_prefix = 'gs://'
gcs_http_prefix = 'https://storage.googleapis.com/'
Contributor:

:-( We don't make gsutil a dependency of FuzzBench. This method of downloading the data over HTTP lets us avoid that, but it doesn't support a Unix filesystem as a filestore. @lszekeres, thoughts?

Contributor:

Actually, I think my understanding of what this is used for is wrong. @Shadoom7, is this function only used for displaying a link in a report? Maybe we can implement some of this as a helper function in filestore_utils that just assumes the HTTP link, and refactor later when we actually support local experiments.

@Shadoom7 (Contributor, Author), Aug 14, 2020:

Yes, it's only used to display the link. Do you mean we should add a function to filestore_utils that returns the HTTP link, and just call that function here, ignoring the case where filestore_path is local?

Contributor:

Yes, exactly.
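A sketch of such a helper; the function name and its placement in filestore_utils are hypothetical:

```python
GCS_PREFIX = 'gs://'
GCS_HTTP_PREFIX = 'https://storage.googleapis.com/'

def get_filestore_http_link(filestore_path):
    """Returns the public HTTP link for a gs:// filestore path.

    Local filestore paths are returned unchanged for now; they can be
    handled properly once local experiments are supported.
    """
    if filestore_path.startswith(GCS_PREFIX):
        return GCS_HTTP_PREFIX + filestore_path[len(GCS_PREFIX):]
    return filestore_path
```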

Contributor (Author):

Should we put it in filestore_utils, though? I thought filestore_utils only contained terminal commands related to the filestore.

@lszekeres (Member), Aug 14, 2020:

I think Jonathan's original question is more relevant to line 63 of coverage_data_utils.py (filestore_utils.cp(src_file, dst_file)), which copies the JSON file with either gsutil cp or cp. If we want to eliminate generate_report.py's dependency on gsutil (which would be a good idea), what we could do is:

  1. in the case of a gs:// path, get the file via HTTP from storage.googleapis.com (e.g., with urllib), as sketched below, and
  2. in the case of a local path, simply open the file directly.

wdyt?
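A minimal sketch of that approach; the function name is hypothetical:

```python
import urllib.request

GCS_PREFIX = 'gs://'
GCS_HTTP_PREFIX = 'https://storage.googleapis.com/'

def read_filestore_file(path):
    """Reads |path| without a gsutil dependency: gs:// paths are fetched
    over HTTP from storage.googleapis.com; local paths are opened directly."""
    if path.startswith(GCS_PREFIX):
        url = GCS_HTTP_PREFIX + path[len(GCS_PREFIX):]
        with urllib.request.urlopen(url) as response:
            return response.read()
    with open(path, 'rb') as file_handle:
        return file_handle.read()
```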

Contributor:

> I think Jonathan's original question is more relevant to line 63 of coverage_data_utils.py (filestore_utils.cp(src_file, dst_file)), which copies the JSON file with either gsutil cp or cp. If we want to eliminate generate_report.py's dependency on gsutil (which would be a good idea), what we could do is:
>
>   1. in the case of a gs:// path, get the file via HTTP from storage.googleapis.com (e.g., with urllib), and
>   2. in the case of a local path, simply open the file directly.
>
> wdyt?

Yeah, this can come later, but it's probably a good idea. I sort of wish we had investigated alternatives to gsutil more, since this is another edge case we have to deal with on our own.

@jonathanmetzman (Contributor) left a comment

Just some style nits. This LGTM, but I'll let @inferno-chromium or @lszekeres +1 since they have been more involved in this review.

```python
edgecolor='0.2',
ax=axes)

axes.set(ylabel='Reached unique edges coverage')
```
Member:

edges coverage -> edge coverage (or just coverage)

"""Returns the filestore name of the |fuzzer_name|."""
filestore_path = self._get_experiment_filestore_path(fuzzer_name)
gcs_prefix = 'gs://'
gcs_http_prefix = 'https://storage.googleapis.com/'
Copy link
Member

@lszekeres lszekeres Aug 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Jonathan's original question is more relevant to line 63 of coverage_data_utils.py (filestore_utils.cp(src_file, dst_file)), which copies the JSON file first with gsutil cp or with cp. If we want to eliminate generate_report.py's dependency on gstuil (which would be a good idea), what we could do is:

  1. in case of gs:// path, get the file via http from storage.googleapis.com (eg with urllib)
  2. in case of a local path, simply open the file directly

wdyt?

analysis/coverage_data_utils.py Show resolved Hide resolved
@Shadoom7 (Contributor, Author)

A successful test experiment with 3 benchmarks and 5 fuzzers.

@lszekeres (Member)

> A successful test experiment with 3 benchmarks and 5 fuzzers.

So nice!!

Super minor potential visual improvement: in the pairwise unique coverage matrix, put the fuzzers in the same left-to-right order as in the ranking plot above.
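That could be a one-line reindex, assuming the matrix is a pandas DataFrame and ranking_order holds the fuzzer names in ranking-plot order (both names are hypothetical):

```python
# Reorder rows and columns to match the ranking plot's fuzzer order.
matrix_df = matrix_df.reindex(index=ranking_order, columns=ranking_order)
```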

@jonathanmetzman (Contributor)

> So nice!!
>
> Super minor potential visual improvement: in the pairwise unique coverage matrix, put the fuzzers in the same left-to-right order as in the ranking plot above.

Agreed, this demo is very impressive. Some more visual feedback:

  1. The spacing between fuzzer names in the rows could be improved: in https://storage.googleapis.com/my-fuzzbench-reports/differential-plots-3/bloaty_fuzz_target_pairwise_unique_coverage_plot.svg, mopt and honggfuzz look like one word.
  2. I think we switched the columns from having a color and a dark shadow to being transparent (with borders) with a colored shadow? Was this intentional? I liked the original version better (do others disagree?).

@Shadoom7 (Contributor, Author)

> 1. The spacing between fuzzer names in the rows could be improved: in https://storage.googleapis.com/my-fuzzbench-reports/differential-plots-3/bloaty_fuzz_target_pairwise_unique_coverage_plot.svg, mopt and honggfuzz look like one word.

Yeah, good point. This issue may get worse as more fuzzers are added. I think I can change it to rotate the labels the way we did in the "Ranking by median reached coverage" plot (see the sketch after this reply). Wdyt?

> 2. I think we switched the columns from having a color and a dark shadow to being transparent (with borders) with a colored shadow? Was this intentional? I liked the original version better (do others disagree?).

Yes, it's intentional. I just thought it was clearer, since the unique edge coverage is the focus here. Another reason is that it's too hard to place numbers on the original plot (you can't put them above the colored bar, because the number shows the unique coverage; if you put them above the shadow bar, it's a bit messy with all the different colors in the background).
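A minimal sketch of the label rotation; the bar data is made up:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots()
axes.bar(['aflplusplus_optimal', 'honggfuzz', 'libfuzzer'], [310, 95, 120])
# Rotate the tick labels, as in the median-coverage ranking plot, so that
# long fuzzer names don't run together.
for label in axes.get_xticklabels():
    label.set_rotation(30)
    label.set_horizontalalignment('right')
fig.savefig('pairwise_unique_coverage_plot.svg', bbox_inches='tight')
```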

@inferno-chromium (Collaborator)

More enhancements can come in incremental new CLs; merging this.
