Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guide for fuzzer-users #1645

Closed
workingjubilee opened this issue Mar 14, 2023 · 10 comments · Fixed by #1646
Closed

Guide for fuzzer-users #1645

workingjubilee opened this issue Mar 14, 2023 · 10 comments · Fixed by #1646
Assignees

Comments

@workingjubilee
Copy link

workingjubilee commented Mar 14, 2023

There's quite a lot of documentation on how to build, run, and debug the compiler. That's good! However, there's a notable subsection of compiler contributors that needs special direction:

People who bring in a fuzzer and make rustc crab rave until the next ICE Age.

And as Rust's profile increases, more of them show up, as more and more people use the Rust compiler in their research projects, and fuzzing happens to have a lot of interesting computer science research attached to it. And they open issues. Lots of issues. Our own @matthiaskrgr has opened over 500 issues. And over 400 of them have been fixed!

Almost-inevitably the first few issues include at least one "total whiff": it's code that produces an ICE, sure. However, it was generated with something like a specific compilation flag combined with internally unstable features used to implement the core library, to produce some ICE that we probably will not bother to fix, despite the local enthusiasm for dropping ICE on GitHub. Usually after a bit of feedback they dial in on issues that are a bit "higher value", but having to have that back-and-forth with each person feels like it's going to involve a lot of repetition. If they're mostly-automating bug-finding, it seems reasonable to automate (or at least cache in documentation) some of our responses. This documentation thus may cover more "how to file an issue that is useful" but still.

In particular, for such documentation, we need to capture the differences in how best to go about. "fuzzing the compiler" and "fuzzing the nightly compiler" and "fuzzing the internal features of the compiler". We don't want people to never fuzz things that require internal features, because something like "the MIR validator breaks on trying to validate some unusual MIR" is really useful to find out, even though that's an internal feature. But if the code requires misusing magic attributes that change layout or use an internal intrinsic outside its parameters, we probably will shrug that off. Likewise there's "fuzzing a nightly feature" vs. "fuzzing a nightly feature that had its prototype skeleton impl with 100 todo!()s landed yesternight".

Pinging Matthias because perhaps he may have some thoughts on what he would say to his clone and/or to himself after traveling back in time.

@matthiaskrgr
Copy link
Member

Well, there are different approaches to this.

What we have
(no guarantee of completeness or correctness 🙈 )

There is https://github.com/dwrensha/fuzz-rustc by @dwrensha which builds rustc with instrumentation and libfuzzer and then performs coverage based fuzzing (this is great for finding bugs in the frontend where the initial parsing of code happens.

@jruderman built on top of that and added a rustc specific mutator https://github.com/jruderman/fuzz-rustc that is smarter than just coverage based fuzzing and I also think his version is able to avoid "noise"-ices a bit better by passing in custom compiler flags.

There is https://github.com/matthiaskrgr/icemaker/ which I can talk most about because I wrote it myself.
This is by itself not a fuzzer, but rather a big parallel

for tool  in tools {
 for flags in rustcflags {
   for file in files {
       exec(tool, flags, file).filter_ice()
     }
   }
}

executor.
The downside is that I need some kind of initial code to work with.
The upside is that I do not rely on instrumented code, so I can also run clippy, miri, rustdoc, rustfmt, kani, or cg_clif etc and filter out the crashes and interesting outputs, UB found by miri, applyable diagnostics that actually break code etc...
About 70% of the crashes I found are probably from code inside the rustc repo (I have checked out every modification of every file contained in the rustc repo side by side which totals to around 15 gb of code iirc) which is a great dataset because most code that was once used as a test probably still shouldn't crash rustc.
I can also do things like "let's try to build the entire rustc testsuite with -Zmir-opt-level=3 or the gat-feature before it gets stabilized" to avoid having to track down and minimize bugs when they appear in "real world" code (through crater or whatever). I usually try to run this on a daily basis on a up to date rustc repo and file any new bugs that show up during this. (I had this automatized in CI so I could just check the log-diff each morning but something broke there and my new laptop is faster than github actions ci right now anyway...)

This week I found out about https://github.com/langston-barrett/tree-splicer by @langston-barrett which as far as I understand is a more advanced code generator and it reads and parses rustc files as input and can then splice together new files from it.
I imagine it a bit like reading a rust file, generating a syntax tree, and then attaching and/or replacing random nodes with/to other nodes to generate new code. (Langston can probably go into more detail here 😄 )
I am currently using this to generate input code that is then checked by icemaker.

Common pitfalls
When using in input data set to generate new code, you most likely don't want to use code that you know already crashes the compiler (to avoid duplicates and ice-noise).
I think the instrument-fuzzer based tools used to first compile all their inputs files and only select the files that compiled without crash, or even without syntax error for further processing.

When generating code, there are a couple of patterns you can filter out from the start, such as break rust ,

#![feature(rustc_attrs)]
#[rustc_error(delay_span_bug_from_inside_query)]

#[rustc_symbol_name]
#![no_std] code

and custom_mir / #[custom_mir.. / mir!(... tests.

sometimes internal rustc tests are also marked as "we know this is a bug that is tracked, but we definitely want to know if we fix it by accident and looks somewhat like this:

// build-fail
// known-bug: #95134
// compile-flags: -Copt-level=0
// failure-status: 101
// dont-check-compiler-stderr

Reporting

My personal opinion is that I would rather have 2 duplicate tickets in my bugtracker than have an unknown bug out in the wild, but of course we should try to avoid duplicates.
I usually try to search for the ICE message in the bugtracker, such as const parameter types cannot be generic, tcx.resolutions(()) is not supported for local crate or self.scc_universes[scc] == ty::UniverseIndex::ROOT but sometimes Option::unwrap() on a None value or index out of bounds: the len is 4 but the index is 2147483649 are not that descriptive, then you can still try to search for the rust compiler source file that the ICE occurred in, but sometimes ICE messages get renamed, code is moved, or something changes so that the same file still crashes but in a different part of the compiler so it's a bit tricky. It is also not uncommon to see a single ICE happen in like 200 different input files and wondering "is that all the same bug?"

I have been thinking a bit if we could have some kind of "ice index" so that at least each intentional ICE, such as assert!() or unimplemented!() gets an ID (similar to the rustc error index) or other unique identifier so that we can identify identical ICEs and can simply search for that in the bugtracker. Perhaps @estebank who has been working on ICE templates has some more ideas here or knows if this is doable? :)

It is usually useful to have a minimal code sample you can try to minimize using https://github.com/Nilstrieb/cargo-minimize by @Nilstrieb or https://github.com/langston-barrett/treereduce Langston but when reducing automatically, it may also happen that the reduction degenerates valid code that triggers an ICE into invalid code that triggers an ICE because all the tool cares for is ICE: yes or no?.

Then, last but not least you can try to bisect the minimal code sample using https://github.com/rust-lang/cargo-bisect-rustc to find out which commit caused this, if this is a regression introduced two days ago or if this is a stable ice that we have been carrying around for years (probably not that critical then since nobody noticed until now).

I hope y'all don't mind the pings but since all the tools here are somewhat related and aim for similar goals I thought you also might have some experiences or ideas that you wanted to share :)

@Nilstrieb
Copy link
Member

Thank you for all your work! But I don't think cargo-minimize is useflu for fuzzed ICEs, as its main goal is being very coarse. If you have an ICE in a 40k loc codebase, cargo-minimize is for you. But for tiny changes I wouldn't recommend it and it won't do much.

@langston-barrett
Copy link
Contributor

Thanks for starting this discussion! As someone who has required and benefited from such guidance from maintainers, I agree that this documentation would be very useful.

Here are some points I think such docs could include:

  • Clarify if the rustc maintainers want people to fuzz. I didn't have to test tree-splicer on rustc. I would have been more likely to if I read "The Rust project appreciates and welcomes fuzzed bugs that conform to these guidelines", or correspondingly less likely to if I read "The Rust project doesn't actively encourage fuzzing but allows fuzzed bug reports according to these guidelines."

  • @compiler-errors recently pointed this out to me: ICEs with the same line in rustc but different query stacks are probably unique bugs.

  • Re: minimization. @matthiaskrgr said above that it can change valid programs into invalid programs. This is true in one sense, but one can also create an effective minimization script that forces the reducer to avoid introducing syntax errors (or any other class of error message that can be greped). I can write a guide to doing this with treereduce-rust, if that's helpful. In any case, perhaps the docs should recommend posting both the original file that found the crash and an auto-minimized version? FWIW, in addition to my own tool (treereduce), halfempty is a high-quality (though syntax-unaware) reduction tool.

  • The test case should be rustfmted, if it maintains the ICE.

  • Pointer to instructions on adding ICEs to https://github.com/rust-lang/glacier

  • Guides for effective fuzzing:

    • Finding high-quality corpora. Like @matthiaskrgr said, these should be not-already-ICEing inputs. Candidates include the rustc/Clippy/rust-analyzer test suites, and fixed glacier tests.
    • Choosing different compiler flags/configurations. For example, there's no reason to actually generate machine code unless you also want to fuzz LLVM.
    • This is a bit more niche, but maybe links to guides on building rustc with PGO/BOLT. In fuzzing, executions/sec matter a lot. You may also want -C target-cpu=native.

Perhaps there should be a different issue template for fuzzed bugs? It could add a "fuzzed" label, contain a link to this documentation, and maybe include a checklist for reporters. Alternatively, the ICE bug template could ask reporters to note in the description if the bug is fuzzed, and also link to these docs.

It might be nice if rustc could emit a different message when rust_attrs, no_core, etc. are active and an ICE is found, perhaps pointing to these docs rather than unconditionally encouraging a bug report.

@compiler-errors
Copy link
Member

Clarify if the rustc maintainers want people to fuzz. I didn't have to test tree-splicer on rustc. I would have been more likely to if I read "The Rust project appreciates and welcomes fuzzed bugs that conform to these guidelines", or correspondingly less likely to if I read "The Rust project doesn't actively encourage fuzzing but allows fuzzed bug reports according to these guidelines."

I think you'd find people all across this spectrum in the project, but my feeling is that ICE reports are welcome (except for those categories pointed out above, like rustc_* attrs or no_core or whatever) as long as it's clear that there was care in reporting the bugs. Things like minimization (ideally, attempting to remove spurious other errors would be the best), formatting, cursory check for duplicates, etc.

Perhaps there should be a different issue template for fuzzed bugs? It could add a "fuzzed" label, contain a link to this documentation, and maybe include a checklist for reporters.

This would be nice. I would like fuzzed issues to start out as "P-low", especially if they use nightly features or obviously sequences of tokens that are not likely encountered in the wild.

This is a bit more niche, but maybe links to guides on building rustc with PGO/BOLT. In fuzzing, executions/sec matter a lot. You may also want -C target-cpu=native.

Alternatively, you could probably do a lot of effective fuzzing with the same emit-metadata-only rustc flags that we run something like cargo check for, since the majority of ICEs are coming from the front end of the compiler.

@compiler-errors
Copy link
Member

Oh! Also advising the fuzzer to bisect with cargo-bisect-rustc would also be great. That cuts out a step that usually is critical when it comes to fuzzer ICEs -- knowing what PR to blame cuts out a bunch of debugging usually.

@matthiaskrgr
Copy link
Member

Didn't we have a bot for that at some point? https://github.com/bjorn3/cargo-bisect-rustc-bot

@matthiaskrgr
Copy link
Member

minimization

I think someone already tried halfempty at some point but it was quite difficult to get it to run for some reason.

I've already spent hours trying to reduce crashes in entire crates somehow and it would be lovely if that worked too, somehow.

Being able to reduce code in in cases that don't crash (like codegen regressions, clippy diagnostics bugs etc) will probably be useful all around. :)

except for those categories pointed out above, like rustc_* attrs or no_core or whatever

I wonder if we could actually intercept these ices (I think these cause usually easy-to-tell-errors?) and hide the "report me" link or change the backtrace a bit to make it more obvious that this is a "we don't care" issue.

This would be nice. I would like fuzzed issues to start out as "P-low", especially if they use nightly features or obviously sequences of tokens that are not likely encountered in the wild.

I don't think this is a good idea as it causes these issues to more likely fly under the radar.

I have already seen a couple of cases (for example the forcing query with already existing DepNode ices) where we already did have a mvce from years ago and people still kept hitting it and sunk hours into reducing entire crates to come up with basically the same snippet that had already been reported.
Sometimes you may also find an a bug via fuzzing and someone else reported this as "my crate crashed rustc" years ago, but that issue was closed as "cannot reproduce" or lays stale. IMO this actually makes fuzzed findings more valuable since you don't have to dig through entire crates but already have a kinda small set of code.

Alternatively, you could probably do a lot of effective fuzzing with the same emit-metadata-only rustc flags that we run something like cargo check for, since the majority of ICEs are coming from the front end of the compiler.

I think if you -o/dev/null, rustc will only run until it tries to codegen and then abort. I use this hack quite extensively 😅

@langston-barrett
Copy link
Contributor

@rustbot claim

@workingjubilee
Copy link
Author

It might be nice if rustc could emit a different message when rust_attrs, no_core, etc. are active and an ICE is found, perhaps pointing to these docs rather than unconditionally encouraging a bug report.

We're moving forward on an internal_features lint so we'll have room to hang more diagnostics on that.

@workingjubilee
Copy link
Author

I agree that "fuzzed issues are low priority" is in a broad sense false. If we find a regression on a "yeah, that looks like something a human might actually write" input through fuzzing, and that bug was introduced recently, that's a stable-to-nightly regression that we want to catch right away and we may not want to have that flagged P-low. Even if it is objectively not a big deal, "fresh" bugs are important to nail down, and often get higher priorities, even when they would have been labeled P-low if it had slipped onto stable-to-stable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants