Automated fuzz target filtering #185

Merged: 4 commits merged into main from DonggeLiu-patch-2 on Mar 28, 2024

Conversation

DonggeLiu
Collaborator

No description provided.

@DonggeLiu DonggeLiu added the Experiment-only A PR only to run experiments, do not merge it to main. label Mar 27, 2024
Three types of cases are handled:
- drivers that show no coverage increase
- false positives that crash at INITED or within the first few rounds
- false positives that crash at `LLVMFuzzerTestOneInput`
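
The three cases above could be detected from a libFuzzer log roughly as follows. This is a minimal sketch, not the code merged in this PR: the `FilterReason` labels (apart from `FP_CRASH_NEAR_INIT`, which is named later in this thread), the `EARLY_ROUNDS` threshold, the regular expressions, and the list of runtime frame prefixes are all assumptions made for illustration.

```python
import re
from enum import Enum


class FilterReason(Enum):
    """Hypothetical labels; only FP_CRASH_NEAR_INIT appears in the thread."""
    NONE = "NONE"
    NO_COV_INCREASE = "NO_COV_INCREASE"
    FP_CRASH_NEAR_INIT = "FP_CRASH_NEAR_INIT"
    FP_CRASH_IN_TARGET = "FP_CRASH_IN_TARGET"


# libFuzzer status lines look like "#2<TAB>INITED cov: 12 ft: 13 corp: 1/1b ...".
STATUS_RE = re.compile(r"^#(\d+)\s+(INITED|NEW|REDUCE|pulse|DONE)\s+cov: (\d+)")
# Symbolized stack frames look like "    #0 0x55d1 in func_name /src/file.c:10:3".
FRAME_RE = re.compile(r"^\s*#(\d+) 0x[0-9a-f]+ in (\S+)")
CRASH_RE = re.compile(r"==\d+==\s*ERROR: (?:\w+Sanitizer|libFuzzer)")

EARLY_ROUNDS = 5  # Assumed meaning of "first few rounds".
# Assumed prefixes of sanitizer runtime frames to skip when finding the top frame.
RUNTIME_PREFIXES = ("__asan", "__interceptor", "__sanitizer", "__msan", "__ubsan")


def classify(log: str) -> FilterReason:
    """Classify one fuzzing log into a filter reason (NONE means keep it)."""
    init_round = None
    init_cov = None
    last_round = 0
    max_cov = 0
    crashed = False
    top_frame = None

    for line in log.splitlines():
        status = STATUS_RE.match(line)
        if status and not crashed:
            last_round, cov = int(status.group(1)), int(status.group(3))
            max_cov = max(max_cov, cov)
            if status.group(2) == "INITED":
                init_round, init_cov = last_round, cov
            continue
        if CRASH_RE.search(line):
            crashed = True
            continue
        frame = FRAME_RE.match(line)
        if crashed and frame and top_frame is None:
            name = frame.group(2)
            if not name.startswith(RUNTIME_PREFIXES):
                top_frame = name  # First frame outside the sanitizer runtime.

    if crashed:
        # The last status line only approximates the round that crashed.
        if init_round is None or last_round - init_round <= EARLY_ROUNDS:
            return FilterReason.FP_CRASH_NEAR_INIT
        if top_frame == "LLVMFuzzerTestOneInput":
            return FilterReason.FP_CRASH_IN_TARGET
        return FilterReason.NONE
    # No crash: the driver is suspicious if coverage never grew past INITED.
    if init_cov is None or max_cov <= init_cov:
        return FilterReason.NO_COV_INCREASE
    return FilterReason.NONE
```

A crash that happens many rounds after INITED, with its first non-runtime frame inside the project under test, falls through to `FilterReason.NONE` and would be kept as a candidate bug.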
@DonggeLiu
Collaborator Author

DonggeLiu commented Mar 27, 2024

@happy-qop Started the experiment here.
The report above shows your filter already helped filter out many invalid crashes. Thanks!
(Please do let me know if you cannot access it.)

Several things I wanted to do before merging this to maximize the impact of your filters:

  1. Surface filter categories in the report. The report only shows crash status as a boolean value; it would be more reader-friendly to show why we filtered each crash (e.g., FP_CRASH_NEAR_INIT).
  2. Run a full experiment. Once the above is done, we are ready for a full experiment with benchmarks in the all/ benchmark set.

I can work on both and let you know once the experiment starts.
The following four days are holidays in Australia, so I might be late in replies / commits, hope you won't mind :)

@happy-qop
Contributor

happy-qop commented Mar 28, 2024

Glad to hear that it helps!

No, I cannot access these crash reports~

Surface filter categories in the report.

I would like to do this, but I am not sure which implementation would best fit your downstream workflow, for example how your experiment scripts would consume the new classification data. Besides, this classification information is also closely related to the fix prompt strategies we discussed.
If you can give me a hint on this, I can adapt my existing code, such as the further classification and fix prompts, here in the next few days (perhaps in another PR).

The following four days are holidays in Australia, so I might be late in replies / commits, hope you won't mind :)

Enjoy your holiday!

@DonggeLiu
Collaborator Author

No, I cannot access these crash reports~

What's your preferred email address to access them?
We can test adding you after returning from holiday (the coming Tuesday).

I would like to do this, but I am not sure which implementation would best fit your downstream workflow, for example how your experiment scripts would consume the new classification data.

I am thinking of starting with some very preliminary changes:

  1. Replace the current crash with your is_driver_fuzz_err.
  2. Show the FP reason in the report HTML.
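
The two preliminary changes above could look something like the sketch below, which carries a filter reason next to the boolean crash flag and renders it in both an HTML table row and a JSON summary. Only the name `is_driver_fuzz_err` comes from this thread; the `RunResult` class, the `driver_fuzz_err` field, and both helper functions are hypothetical and do not describe the project's actual report code.

```python
import html
import json
from dataclasses import asdict, dataclass


@dataclass
class RunResult:
    """Hypothetical per-sample result; field names are illustrative."""
    sample: str
    crashes: bool
    is_driver_fuzz_err: bool  # True when the crash is blamed on the driver.
    driver_fuzz_err: str      # Reason label such as "FP_CRASH_NEAR_INIT", or "".


def to_html_row(result: RunResult) -> str:
    """Render one report row: sample, crash flag, and the filter reason."""
    reason = result.driver_fuzz_err if result.is_driver_fuzz_err else "-"
    cells = [result.sample, str(result.crashes), reason]
    # Escape every cell so log-derived text cannot break the page markup.
    return "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in cells) + "</tr>"


def to_json_summary(results: list[RunResult]) -> str:
    """Serialize all results, including the reason, for machine consumption."""
    return json.dumps([asdict(r) for r in results], indent=2)


if __name__ == "__main__":
    result = RunResult("01", True, True, "FP_CRASH_NEAR_INIT")
    print(to_html_row(result))
    print(to_json_summary([result]))
```

Keeping the reason as a separate field, rather than folding it into the crash flag, means existing consumers of the boolean keep working while the report gains the extra column.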

Given you cannot see the report now, it might be quicker for me to do it.

Besides, this classification information is also closely related to the fix prompt strategies we discussed. If you can give me a hint on this, I can adapt my existing code, such as the further classification and fix prompts, in the next few days.

Are there any other more sophisticated changes you'd like to propose?
I'd love to hear more : )

@happy-qop
Contributor

happy-qop commented Mar 28, 2024

What's your preferred email address to access them?

The gmail used for the meeting should be fine.

I am thinking of starting with some very preliminary changes:

  1. Replace the current crash with your is_driver_fuzz_err.
  2. Show the FP reason in the report HTML.

Cool! Please go ahead with this, and I can learn from your changes for further implementation.

Are there any other more sophisticated changes you'd like to propose?

The errors filtered here are driver runtime errors. Identifying them and proposing corresponding fix prompts is part of the workflow for implementing error-type-specific fix prompts (FIX_FUZZ_XXX at here).

@DonggeLiu
Collaborator Author

Pushed the first version here: #191

I may adjust it a bit and will update you once it is ready : )

The errors filtered here are driver runtime errors. Identifying them and proposing corresponding fix prompts is part of the workflow for implementing error-type-specific fix prompts (FIX_FUZZ_XXX at here).

I see, that seems to be a big change indeed.
Let's do that in a separate PR later, then!

Show crash type in HTML reports and JSON summary.
@DonggeLiu DonggeLiu changed the title [DO NOT MERGE] Experiment with automated fuzz target filtering Automated fuzz target filtering Mar 28, 2024
@DonggeLiu DonggeLiu merged commit b4928d2 into main Mar 28, 2024
3 checks passed
@DonggeLiu DonggeLiu deleted the DonggeLiu-patch-2 branch March 28, 2024 12:26
break

else:
# Another error driver case: no cov increase.
Collaborator

note: I believe we have seen cases where the fuzz target is legitimate even when there is no cov increase. This is often when a new target fuzzes a very buggy function that has never been fuzzed before, with a very shallow bug that is instantly triggered.

@DonggeLiu to confirm if this is an issue.

We may want to think about some kind of confidence score here instead of a binary yes/no for filtering crashes/fuzz targets.
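
One way to picture the confidence score suggested above is a simple penalty scheme in which each suspicious signal lowers the score instead of rejecting the crash outright. This is only a sketch: the `CrashSignals` fields, the penalty values, and the 0 to 100 scale are assumptions chosen for illustration and are not tuned on any data.

```python
from dataclasses import dataclass


@dataclass
class CrashSignals:
    """Hypothetical signals extracted from one fuzzing run."""
    cov_increased: bool
    crashed_near_init: bool
    top_frame_in_driver: bool


# Illustrative penalties in percentage points; not derived from experiments.
PENALTY_NO_COV_INCREASE = 20
PENALTY_NEAR_INIT = 30
PENALTY_DRIVER_FRAME = 50


def confidence(signals: CrashSignals) -> int:
    """Return a 0..100 confidence that the crash is a true positive."""
    score = 100
    if not signals.cov_increased:
        # Small penalty only: a shallow bug can crash before coverage grows.
        score -= PENALTY_NO_COV_INCREASE
    if signals.crashed_near_init:
        score -= PENALTY_NEAR_INIT
    if signals.top_frame_in_driver:
        score -= PENALTY_DRIVER_FRAME
    return max(score, 0)
```

Under these assumed weights, a crash whose only warning sign is the lack of coverage increase still scores 80, so a legitimate target with an instantly triggered shallow bug would be ranked lower rather than discarded.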

Collaborator Author

@DonggeLiu DonggeLiu Apr 3, 2024

Sure, I will check against old known bugs.

Contributor

@oliverchang @DonggeLiu Could you also share the example cases with me? I'm interested in them and wonder whether I can help improve this as well.

Collaborator Author

BTW, would it be possible to capture LeakSanitizer: detected memory leaks?
They are usually not security-related and are caused by the fuzz target, so it would be useful to filter them (or give them a low confidence score). E.g.,
https://llm-exp.oss-fuzz.com/Result-reports/ofg-pr/2024-04-03-198-dg-comparison/sample/output-hiredis-rediscommand/01
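
A minimal sketch of how such leak reports might be recognized in a run log is shown below. It assumes that matching the literal message quoted above is enough to identify them; the function name and the decision to treat every match as a leak are illustrative, not the project's implementation.

```python
# The message LeakSanitizer prints when it finds leaks, as quoted above.
LEAK_MARKER = "LeakSanitizer: detected memory leaks"


def is_memory_leak(log: str) -> bool:
    """Return True when the log contains a LeakSanitizer leak report."""
    return LEAK_MARKER in log


if __name__ == "__main__":
    sample = (
        "==42==ERROR: LeakSanitizer: detected memory leaks\n"
        "\n"
        "Direct leak of 24 byte(s) in 1 object(s) allocated from:\n"
    )
    # A caller could filter such runs or assign them a low confidence score.
    print(is_memory_leak(sample))
```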

Contributor

@happy-qop happy-qop Apr 4, 2024

Of course! I had that idea when implementing the initial version of the error-driver filtering, but left it as future work because I was not sure how you expect LEAK cases to be handled (link). I'll handle that as well.

Collaborator Author

That's fantastic, much appreciated!


3 participants