
Differential graphs on report #657

Merged 35 commits on Aug 15, 2020
Conversation

@Shadoom7 (Contributor) commented Aug 10, 2020

This CL implements differential graphs to measure the uniqueness of each fuzzer. Demo report.

The first graph is the normal ranking plot. The shadowed regions show how many unique regions are covered by each fuzzer.

[Image: bloaty_fuzz_target_ranking]

The second graph is a ranking plot showing the unique regions covered by each fuzzer. It is kept beside the first graph in case someone needs to see the ranking based on unique-region coverage or the exact coverage numbers.

[Image: bloaty_fuzz_target_ranking_rare_region]

The third graph is a square matrix where the number in each cell is the number of regions covered by the fuzzer of the column but not by the fuzzer of the row.

[Image: bloaty_fuzz_target_correlation_plot]
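As an illustration of how such a matrix can be computed, here is a minimal pandas sketch; covered_regions and its contents are hypothetical stand-ins for the per-fuzzer region sets:

```python
import pandas as pd

# Hypothetical per-fuzzer sets of covered regions.
covered_regions = {
    'afl': {'r1', 'r2', 'r3'},
    'libfuzzer': {'r2', 'r3', 'r4'},
    'honggfuzz': {'r1', 'r4', 'r5'},
}
fuzzers = sorted(covered_regions)

# Cell (row, column) counts the regions covered by the column fuzzer
# but not by the row fuzzer, so the diagonal is all zeros.
matrix = pd.DataFrame(
    [[len(covered_regions[column] - covered_regions[row])
      for column in fuzzers]
     for row in fuzzers],
    index=fuzzers,
    columns=fuzzers)
print(matrix)
```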

@jonathanmetzman (Contributor)

I didn't realize the coverage reports show every file on the first page. @Dor1s, is there a way to get clang coverage to generate the reports so that they show things at a directory level? I think that would be much more readable.

@Dor1s (Contributor) commented Aug 11, 2020

> @Dor1s, is there a way to get clang coverage to generate the reports so that they show things at a directory level? I think that would be much more readable.

Yes, you would need to post-process the report like this: https://github.com/google/oss-fuzz/blob/0987ddf994e06dd663954e8b65fdb74653b60b2f/infra/base-images/base-runner/coverage#L165

where coverage_helper calls into coverage_utils.py: https://github.com/google/oss-fuzz/blob/0987ddf994e06dd663954e8b65fdb74653b60b2f/infra/base-images/base-runner/coverage_helper#L17

which gets cloned from the Chromium repo like this: https://github.com/google/oss-fuzz/blob/0987ddf994e06dd663954e8b65fdb74653b60b2f/infra/base-images/base-runner/Dockerfile#L39

You do need to grab the full folder from Chromium, as it has the necessary HTML templates. The good thing is that it hasn't changed in a while, so you can grab it once and it should hopefully keep working forever.

We were thinking about upstreaming directory view support to LLVM, but for some reason it didn't happen.

@jonathanmetzman (Contributor)

Does this work with merging experiments?

@jonathanmetzman (Contributor)

There's a bit of a problem here and with coverage reports. Previously, it was not the end of the world if one fuzzer got more trials than another, but for these metrics, fuzzers with more trials will have an unfair advantage and better results. I'm not sure how we should handle this.

@jonathanmetzman (Contributor)

Some thoughts for the future: I think an aggregate of all of these metrics could be useful, so that across all FuzzBench programs we can see a graph of which fuzzers covered the most unique regions. But I don't think that should be part of this PR.

@jonathanmetzman (Contributor)

@lszekeres, @Shadoom7 mentioned to me that you wanted the shadowing on the main graphs. There's no explanation for it, though, which I don't think is good, since it's not intuitive what it means. I think we can either explain the shadowing or get rid of it and keep only the rare-region bar graph. I don't think it makes sense to have both the rare-region bar graph and the shadowing, since they convey the same information.

@lszekeres (Member) commented Aug 11, 2020

> @lszekeres, @Shadoom7 mentioned to me that you wanted the shadowing on the main graphs. There's no explanation for it, though, which I don't think is good, since it's not intuitive what it means. I think we can either explain the shadowing or get rid of it and keep only the rare-region bar graph. I don't think it makes sense to have both the rare-region bar graph and the shadowing, since they convey the same information.

Yes, I suggested the shadows because:

  1. seeing the total coverage is important context for the unique coverage, and
  2. that way we don't need to add yet another plot for something we can show on an existing plot.

So I think a separate "rare bar graph" is not really necessary. Wdyt?

And yes, the plots definitely need a description/explanation of how to read them underneath.

@Shadoom7 (Contributor, Author)

> Yes, I suggested the shadows because:
>
>   1. seeing the total coverage is important context for the unique coverage, and
>   2. that way we don't need to add yet another plot for something we can show on an existing plot.
>
> So I think a separate "rare bar graph" is not really necessary. Wdyt?
>
> And yes, the plots definitely need a description/explanation of how to read them underneath.

Yeah, that's true. However, the shadow has some drawbacks from my perspective:

  1. It doesn't give you the rank or the numbers. I've tried putting numbers above the shadows, but it looks very messy.
  2. The original graph is the median coverage ranking plot, but we can't do the same for unique regions, since measuring unique regions at the per-trial level doesn't make sense (should we compare trial 1 of all fuzzers, or what?).

Wdyt?

@Shadoom7 (Contributor, Author)

> Yes, you would need to post-process the report like this: https://github.com/google/oss-fuzz/blob/0987ddf994e06dd663954e8b65fdb74653b60b2f/infra/base-images/base-runner/coverage#L165
>
> where coverage_helper calls into coverage_utils.py: https://github.com/google/oss-fuzz/blob/0987ddf994e06dd663954e8b65fdb74653b60b2f/infra/base-images/base-runner/coverage_helper#L17
>
> which gets cloned from the Chromium repo like this: https://github.com/google/oss-fuzz/blob/0987ddf994e06dd663954e8b65fdb74653b60b2f/infra/base-images/base-runner/Dockerfile#L39
>
> You do need to grab the full folder from Chromium, as it has the necessary HTML templates. The good thing is that it hasn't changed in a while, so you can grab it once and it should hopefully keep working forever.
>
> We were thinking about upstreaming directory view support to LLVM, but for some reason it didn't happen.

Thank you, Max! I'll try it.

@jonathanmetzman (Contributor)

> Yeah, that's true. However, the shadow has some drawbacks from my perspective:
>
>   1. It doesn't give you the rank or the numbers. I've tried putting numbers above the shadows, but it looks very messy.
>   2. The original graph is the median coverage ranking plot, but we can't do the same for unique regions, since measuring unique regions at the per-trial level doesn't make sense (should we compare trial 1 of all fuzzers, or what?).

Sorry, I don't understand this. Is the shadow the unique coverage of the median?

> Wdyt?

How about we don't do shadows on the main graph (keeping it clean and understandable), but add a second graph that ranks each fuzzer by unique coverage, where each bar contains the total edges (?) and a shadow of the unique edges? This way we put things in context, share all of the info we want to convey, don't show anything redundant, and don't reduce the clarity of the main graph.
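Something like this minimal matplotlib sketch is what that proposal could look like; the fuzzer names, numbers, and styling are all made up:

```python
import matplotlib.pyplot as plt

# Hypothetical totals and unique-edge counts per fuzzer, ranked by
# unique coverage.
fuzzers = ['aflplusplus', 'honggfuzz', 'libfuzzer']
total_edges = [5200, 4800, 4500]
unique_edges = [300, 120, 90]

fig, axes = plt.subplots()
# The full-height bars give the total coverage as context...
axes.bar(fuzzers, total_edges, color='lightgray', edgecolor='0.2',
         label='total edges')
# ...while the overlaid "shadow" bars show the unique coverage.
axes.bar(fuzzers, unique_edges, color='steelblue', edgecolor='0.2',
         label='unique edges')
axes.set(ylabel='Reached edge coverage')
axes.legend()
fig.savefig('unique_coverage_ranking.svg')
```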

@jonathanmetzman (Contributor)

> Thank you, Max! I'll try it.

Let's do that later; this PR is the priority. I should have mentioned that when I raised this issue. Thank you for the advice, Max.

@Shadoom7 (Contributor, Author) commented Aug 11, 2020

> Sorry, I don't understand this. Is the shadow the unique coverage of the median?

Sorry for the confusion. The shadow is the unique coverage of all trials aggregated, but the background graph is the median coverage. That's why I don't think it's a good fit.

> How about we don't do shadows on the main graph (keeping it clean and understandable), but add a second graph that ranks each fuzzer by unique coverage, where each bar contains the total edges (?) and a shadow of the unique edges? This way we put things in context, share all of the info we want to convey, don't show anything redundant, and don't reduce the clarity of the main graph.

Yeah, I think it's a good idea.

@Shadoom7 (Contributor, Author) commented Aug 12, 2020

This patch doesn't do merging right now. Here are my ideas for merging:

  1. At the end of each experiment, save the JSON summary files at the per-fuzzer level. (Currently they are saved at the per-experiment level.)
  2. When generating the report, fetch the JSON summary file inside the BenchmarkResults class. (Though I don't think it's a good idea to give the class access to the bucket, I can't think of another way around it.)
  3. Since experiment_df already drops the old duplicate benchmark-fuzzer pairs when the merge_with_clobber flag is on, we just fetch the JSON summary file according to the new experiment_filestore column in experiment_df for each benchmark-fuzzer pair.
  4. As @jonathanmetzman suggested to me, if the trial counts differ between fuzzers on the same benchmark, we pick the smallest count and truncate the other fuzzers' results (see the sketch below). This also means we need to store the JSON summary files at the per-trial level. The second option is to pick only one trial from each fuzzer-benchmark pair to store and measure.

Wdyt? @jonathanmetzman @lszekeres @inferno-chromium
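A minimal sketch of the truncation in point 4, assuming a per-benchmark dataframe with 'fuzzer' and 'trial_id' columns (both column names are assumptions):

```python
import pandas as pd

def truncate_to_min_trials(benchmark_df):
    """Keeps only the first N trials of each fuzzer on this benchmark,
    where N is the smallest trial count any fuzzer has."""
    min_trials = benchmark_df.groupby('fuzzer')['trial_id'].nunique().min()

    def keep_first_n(fuzzer_df):
        # Truncate each fuzzer's trials to the common minimum count.
        kept = sorted(fuzzer_df.trial_id.unique())[:min_trials]
        return fuzzer_df[fuzzer_df.trial_id.isin(kept)]

    return benchmark_df.groupby('fuzzer', group_keys=False).apply(keep_first_n)
```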

@lszekeres (Member)

> > How about we don't do shadows on the main graph (keeping it clean and understandable), but add a second graph that ranks each fuzzer by unique coverage, where each bar contains the total edges (?) and a shadow of the unique edges?
>
> Yeah, I think it's a good idea.

SGTM!

@lszekeres (Member)

> 2. When generating the report, fetch the JSON summary file inside the BenchmarkResults class. (Though I don't think it's a good idea to give the class access to the bucket, I can't think of another way around it.)

Yes, please consider doing the downloading outside of BenchmarkResults and passing the object in instead.

> 4. As @jonathanmetzman suggested to me, if the trial counts differ between fuzzers on the same benchmark, we pick the smallest count and truncate the other fuzzers' results. This also means we need to store the JSON summary files at the per-trial level. The second option is to pick only one trial from each fuzzer-benchmark pair to store and measure.
>
> Wdyt? @jonathanmetzman @lszekeres @inferno-chromium

I'm not sure it's strictly necessary to add that complexity. How often does it happen that we don't have 20 trials? (Also, why does it happen?) If it happens often, we might instead want to do our best to make sure we do have 20 when possible. And if it still happens sometimes, it's not the end of the world; we already put an "alert" in the report in those cases, no?

@jonathanmetzman (Contributor)

> I'm not sure it's strictly necessary to add that complexity. How often does it happen that we don't have 20 trials? (Also, why does it happen?) If it happens often, we might instead want to do our best to make sure we do have 20 when possible. And if it still happens sometimes, it's not the end of the world; we already put an "alert" in the report in those cases, no?

I think it happens quite often, though it isn't the norm. A quick look at past reports can verify this (example: https://www.fuzzbench.com/reports/2020-07-25/index.html). I don't know why it happens, but I would assume one reason is that some fuzzers OOM. This solution seems much easier than ensuring we have 20 trials for everything.

@Shadoom7 (Contributor, Author)

> Yes, please consider doing the downloading outside of BenchmarkResults and passing the object in instead.

The thing is that we can't load all the JSON files ahead of time; that would cause OOMs. Another way is to copy the JSON files to a local directory first, pass their local paths to the class as a parameter, and let the class load the local JSON files. Do you think we should do it this way?
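A sketch of that flow, reusing get_fuzzer_filestore_path and filestore_utils.cp from this PR; the layout of the summary files inside the filestore is a guess:

```python
import os

def fetch_json_summaries(benchmark_df, fuzzers, dst_dir):
    """Copies each fuzzer's coverage JSON summary to |dst_dir| and returns
    the local paths, so BenchmarkResults never touches the bucket itself."""
    local_paths = {}
    for fuzzer in fuzzers:
        filestore_path = get_fuzzer_filestore_path(benchmark_df, fuzzer)
        # The exact location of the summary file is an assumption.
        src_file = os.path.join(filestore_path, 'coverage', fuzzer + '.json')
        dst_file = os.path.join(dst_dir, fuzzer + '.json')
        filestore_utils.cp(src_file, dst_file)
        local_paths[fuzzer] = dst_file
    return local_paths
```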


@inferno-chromium (Collaborator) left a comment

Very nicely done. Let's finish fixing nits, test on a quick 30-minute experiment, and land.

@inferno-chromium (Collaborator)

Feel free to commit once tested on a small experiment.


```python
def get_fuzzer_filestore_path(benchmark_df, fuzzer):
    """Gets the filestore_path for |fuzzer| in |benchmark_df|."""
    fuzzer_df = benchmark_df[benchmark_df.fuzzer == fuzzer]
```
Contributor:

This is supposed to work with merging experiments, right? Can you confirm that it does? Looking at this function, it intuitively feels like it does not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I'll test it!

Contributor:

Like, I guess this sort of assumes that the benchmarks for a fuzzer all come from the same experiment. I don't think we make that assumption elsewhere, though. I think the assumption is fine to make, though it might break in some rare cases (such as when we add benchmarks). Can you at least document this assumption, please?

Contributor (Author):

I just tested it and it works. The benchmark_df here is the dataframe for one particular benchmark, so we are not assuming that all benchmarks for a fuzzer come from the same experiment. We are only assuming that all trials for a given fuzzer-benchmark pair come from the same experiment, which is what merge_with_clobber guarantees.
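Documenting that assumption could look like the sketch below; everything past the snippet's first three lines is an assumed completion, not necessarily the PR's actual code:

```python
import posixpath

def get_fuzzer_filestore_path(benchmark_df, fuzzer):
    """Gets the filestore_path for |fuzzer| in |benchmark_df|.

    Assumes all trials for a given fuzzer-benchmark pair come from the
    same experiment (which merge_with_clobber guarantees).
    """
    fuzzer_df = benchmark_df[benchmark_df.fuzzer == fuzzer]
    # All rows share one experiment, so taking the first value is safe.
    filestore_path = fuzzer_df.experiment_filestore.unique()[0]
    experiment = fuzzer_df.experiment.unique()[0]
    return posixpath.join(filestore_path, experiment)
```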

"""Returns the filestore name of the |fuzzer_name|."""
filestore_path = self._get_experiment_filestore_path(fuzzer_name)
gcs_prefix = 'gs://'
gcs_http_prefix = 'https://storage.googleapis.com/'
Contributor:

:-( We don't make gsutil a dependency of FuzzBench. This method of downloading the data over HTTP lets us avoid that, but it doesn't support a Unix filesystem as a filestore. @lszekeres, thoughts?

Contributor:

Actually, I think my understanding of what this is used for is wrong. @Shadoom7, is this function only used for displaying a link in a report? Maybe we can implement some of this as a helper function in filestore_utils that just assumes the HTTP link, and refactor later when we actually support local experiments.

@Shadoom7 (Contributor, Author), Aug 14, 2020:

Yes, it's only used to display the link. Do you mean we should add a function to filestore_utils that returns the HTTP link, and just call that function here, ignoring the case where filestore_path is local?

Contributor:

Yes, exactly.
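A sketch of such a helper; the function name and its placement in filestore_utils are hypothetical:

```python
GCS_PREFIX = 'gs://'
GCS_HTTP_PREFIX = 'https://storage.googleapis.com/'

def get_filestore_http_link(filestore_path):
    """Returns the public HTTP link for a gs:// filestore path.

    Local filestore paths are returned unchanged for now; they can be
    handled properly once local experiments are supported.
    """
    if filestore_path.startswith(GCS_PREFIX):
        return GCS_HTTP_PREFIX + filestore_path[len(GCS_PREFIX):]
    return filestore_path
```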

Contributor (Author):

Should we put it in filestore_utils, though? I thought filestore_utils only contained terminal commands related to the filestore.

@lszekeres (Member), Aug 14, 2020:

I think Jonathan's original question is more relevant to line 63 of coverage_data_utils.py (filestore_utils.cp(src_file, dst_file)), which copies the JSON file with either gsutil cp or cp. If we want to eliminate generate_report.py's dependency on gsutil (which would be a good idea), what we could do is:

  1. in the case of a gs:// path, get the file via HTTP from storage.googleapis.com (e.g., with urllib), as sketched below, and
  2. in the case of a local path, simply open the file directly.

wdyt?
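A minimal sketch of that approach; the function name is hypothetical:

```python
import urllib.request

GCS_PREFIX = 'gs://'
GCS_HTTP_PREFIX = 'https://storage.googleapis.com/'

def read_filestore_file(path):
    """Reads |path| without a gsutil dependency: gs:// paths are fetched
    over HTTP from storage.googleapis.com; local paths are opened directly."""
    if path.startswith(GCS_PREFIX):
        url = GCS_HTTP_PREFIX + path[len(GCS_PREFIX):]
        with urllib.request.urlopen(url) as response:
            return response.read()
    with open(path, 'rb') as file_handle:
        return file_handle.read()
```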

Contributor:

> I think Jonathan's original question is more relevant to line 63 of coverage_data_utils.py (filestore_utils.cp(src_file, dst_file)), which copies the JSON file with either gsutil cp or cp. If we want to eliminate generate_report.py's dependency on gsutil (which would be a good idea), what we could do is:
>
>   1. in the case of a gs:// path, get the file via HTTP from storage.googleapis.com (e.g., with urllib), and
>   2. in the case of a local path, simply open the file directly.
>
> wdyt?

Yeah, this can come later, but it's probably a good idea. I sort of wish we had investigated alternatives to gsutil more, since this is another edge case we have to deal with on our own.

@jonathanmetzman (Contributor) left a comment

Just some style nits. This LGTM, but I'll let @inferno-chromium or @lszekeres +1 since they have been more involved in this review.

```python
edgecolor='0.2',
ax=axes)

axes.set(ylabel='Reached unique edges coverage')
```
Member:

edges coverage -> edge coverage (or just coverage)

"""Returns the filestore name of the |fuzzer_name|."""
filestore_path = self._get_experiment_filestore_path(fuzzer_name)
gcs_prefix = 'gs://'
gcs_http_prefix = 'https://storage.googleapis.com/'
Copy link
Member

@lszekeres lszekeres Aug 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Jonathan's original question is more relevant to line 63 of coverage_data_utils.py (filestore_utils.cp(src_file, dst_file)), which copies the JSON file first with gsutil cp or with cp. If we want to eliminate generate_report.py's dependency on gstuil (which would be a good idea), what we could do is:

  1. in case of gs:// path, get the file via http from storage.googleapis.com (eg with urllib)
  2. in case of a local path, simply open the file directly

wdyt?

analysis/coverage_data_utils.py Show resolved Hide resolved
@Shadoom7 (Contributor, Author)

A successful test experiment with 3 benchmarks and 5 fuzzers.

@lszekeres (Member)

> A successful test experiment with 3 benchmarks and 5 fuzzers.

So nice!!

Super minor potential visual improvement: in the pairwise unique coverage matrix, put the fuzzers in the same left-to-right order as in the ranking plot above.
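That could be a one-line reindex, assuming the matrix is a pandas DataFrame and ranking_order holds the fuzzer names in ranking-plot order (both names are hypothetical):

```python
# Reorder rows and columns to match the ranking plot's fuzzer order.
matrix_df = matrix_df.reindex(index=ranking_order, columns=ranking_order)
```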

@jonathanmetzman (Contributor)

> So nice!!
>
> Super minor potential visual improvement: in the pairwise unique coverage matrix, put the fuzzers in the same left-to-right order as in the ranking plot above.

Agreed, this demo is very impressive. Some more visual feedback:

  1. The spacing between fuzzer names in the rows could be improved: in https://storage.googleapis.com/my-fuzzbench-reports/differential-plots-3/bloaty_fuzz_target_pairwise_unique_coverage_plot.svg, mopt and honggfuzz look like one word.
  2. I think we switched the columns from having a color and a dark shadow to being transparent (with borders) with a colored shadow? Was this intentional? I liked the original version better (do others disagree?).

@Shadoom7 (Contributor, Author)

> 1. The spacing between fuzzer names in the rows could be improved: in https://storage.googleapis.com/my-fuzzbench-reports/differential-plots-3/bloaty_fuzz_target_pairwise_unique_coverage_plot.svg, mopt and honggfuzz look like one word.

Yeah, good point. This issue may get worse as more fuzzers are added. I think I can change it to rotate the labels the way we did in the "Ranking by median reached coverage" plot (see the sketch after this reply). Wdyt?

> 2. I think we switched the columns from having a color and a dark shadow to being transparent (with borders) with a colored shadow? Was this intentional? I liked the original version better (do others disagree?).

Yes, it's intentional. I just thought it was clearer, since the unique edge coverage is the focus here. Another reason is that it's too hard to place numbers on the original plot (you can't put them above the colored bar, because the number shows the unique coverage; if you put them above the shadow bar, it's a bit messy with all the different colors in the background).
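A minimal sketch of the label rotation; the bar data is made up:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots()
axes.bar(['aflplusplus_optimal', 'honggfuzz', 'libfuzzer'], [310, 95, 120])
# Rotate the tick labels, as in the median-coverage ranking plot, so that
# long fuzzer names don't run together.
for label in axes.get_xticklabels():
    label.set_rotation(30)
    label.set_horizontalalignment('right')
fig.savefig('pairwise_unique_coverage_plot.svg', bbox_inches='tight')
```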

@inferno-chromium (Collaborator)

More enhancements can come in incremental new CLs; merging this.
