Fuzzing incremental compilation against crates.io packages #33454
Comments
nikomatsakis added the P-medium, E-mentor, T-compiler, and A-incr-comp labels on May 6, 2016
Hi, Matsakis. This looks fun, and I want to work on it. The workflow you described looks clear, but I need more details.
FWIW I've got a script which @jonathandturner has also been working with which is essentially "use cargo to iterate over all crates on crates.io and build them", and that may end up being useful here!
@alexcrichton Hi, could you share a link to the script? It would be helpful!
Certainly! I've pushed it up here -- https://github.com/alexcrichton/cargo-apply It doesn't have much documentation, and it's kinda a big hack, but the gist is:
Great to see so much interest! I want to point out that there are more opportunities for fuzzing, so whatever we do, it might make sense to try and have some common code with multiple modes. Some other examples I've been thinking about:
It seems as though others have expressed interest already, but I'd enjoy helping out if there's space for one more.
@nikomatsakis
@nikomatsakis
Hoping for your advice!
What I meant by "load" was "download the sources and expand them into a directory". You're right that once we've done that once there isn't really a need to do anything more after that.
Hmm. :) Good question! The code looks like it sort of roughly does the right thing, but for incremental mode, it will have to first run the compiler (roughly as you are doing now) with the source from v1 and then run the compiler again with the source from v2 (it seems like you don't currently have the notion of two revisions, just one).
@nikomatsakis Now the issue is:
@mrmiywj neat, I will try to take a look tomorrow. I think that the compiler output ought to be comparable word-by-word, perhaps with some filename normalization... is that not the case? As for comparing the generated binaries, that's a trickier problem. I do not believe they will necessarily be binary identical, because there is some freedom permitted to the linker to reorder symbols and the like. We might be able to dump some of the compiler's internal representation (or the LLVM IR) and compare that, though. I'll have to experiment a bit!
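A minimal sketch of the filename normalization idea, assuming the only difference between two runs is the directory each build ran in. The helper names `normalize_output` and `outputs_match` and the `<BUILD_DIR>` placeholder are hypothetical, not taken from any tool mentioned in this thread.

```rust
/// Replace every occurrence of the build directory with a fixed placeholder,
/// so that output produced in two different checkouts can be compared.
/// (Hypothetical helper; the placeholder text is an arbitrary choice.)
fn normalize_output(output: &str, build_dir: &str) -> String {
    // An empty pattern would match between every character, so skip it.
    if build_dir.is_empty() {
        return output.to_string();
    }
    output.replace(build_dir, "<BUILD_DIR>")
}

/// Compare two compiler outputs after normalizing the directory of each build.
fn outputs_match(out_a: &str, dir_a: &str, out_b: &str, dir_b: &str) -> bool {
    normalize_output(out_a, dir_a) == normalize_output(out_b, dir_b)
}
```

Real compiler output may differ in other ways (timings, ordering of diagnostics), so a plain string comparison like this is only a starting point.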
@mrmiywj I am sorry it took me so long to get back to you! I just did a detailed read through your crate and ran some experiments. It does indeed seem quite close. A few things:
Thinking more on it, let's start with the "just check that compilation succeeds" -- we can scour for ICEs and the like. Then we can add on checking the output, running tests, etc. The main thing then is to add some kind of mode so that we can let this thing run overnight and come back to a big list of results, I guess.
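As a rough sketch of the "just check that compilation succeeds" mode, the result of one build could be sorted into three buckets. The `Outcome` type and `classify` function are hypothetical, and matching on the text "internal compiler error" is an assumption about what the compiler prints when it hits an ICE.

```rust
/// Possible results of one build (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum Outcome {
    Success,
    CompileError,
    Ice,
}

/// Classify a build from its exit status and captured stderr.
/// Assumption: an ICE report contains the text "internal compiler error".
fn classify(succeeded: bool, stderr: &str) -> Outcome {
    if stderr.contains("internal compiler error") {
        Outcome::Ice
    } else if succeeded {
        Outcome::Success
    } else {
        Outcome::CompileError
    }
}
```

Checking for the ICE text first means a crash is reported as such even if the process happens to exit successfully.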
(Ideally, it would log its progress or leave notes so that it can be aborted and restarted without losing its place. I am imagining just a list of tested items, or perhaps generating a file for each crate and rev with some distinctive name, so you can later check for it to know if the work has already been done.)
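One way the "file for each crate and rev" idea might look, as a sketch only. The function names, the `.done` suffix, and the layout of the results directory are all hypothetical choices made for illustration.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Path of the marker file for one (crate, revision) pair.
/// The "<crate>-<rev>.done" naming scheme is an arbitrary, hypothetical choice.
fn marker_path(results_dir: &Path, krate: &str, rev: &str) -> PathBuf {
    results_dir.join(format!("{}-{}.done", krate, rev))
}

/// Returns true if a previous run already recorded a result for this pair.
fn already_tested(results_dir: &Path, krate: &str, rev: &str) -> bool {
    marker_path(results_dir, krate, rev).exists()
}

/// Record the outcome so that an interrupted run can skip this pair later.
fn record_result(results_dir: &Path, krate: &str, rev: &str, outcome: &str) -> io::Result<()> {
    fs::create_dir_all(results_dir)?;
    fs::write(marker_path(results_dir, krate, rev), outcome)
}
```

A driver loop would call `already_tested` before each build and `record_result` after it, so restarting the tool resumes where the previous run stopped.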
arielb1 self-assigned this on May 24, 2016
I am doing this as a university project.
@nikomatsakis
@arielb1 let's talk, I think we should avoid duplicating effort, but there are lots of kinds of fuzzing to pursue. For example, fuzzing based on git commits or maybe some kind of random changes.
Yes.
nikomatsakis added this to the Incremental compilation alpha milestone on Aug 10, 2016
So, I started hacking on a related, but different tool: https://github.com/nikomatsakis/cargo-fozzy/ It basically works now though I plan to make a number of enhancements (you can see the open issues). I'm going to close this issue in favor of enhancing that tool -- if anyone would like to contribute, please check it out!
nikomatsakis closed this on Aug 31, 2016
Ah, let me leave some notes for posterity. I did some investigation into crates.io metadata but found that it was more difficult than I expected to do the kind of "incremental update and rebuild/run-tests" testing. For example, crates.io will rewrite metadata in the repo which means that, in general, you can't necessarily run tests on packages pulled out from crates.io (at least according to @alexcrichton). In addition, git histories seem to offer a more fine-grained history, so I decided to just tackle that first. @mrmiywj I wanted to thank you for your energy in producing the initial script :) please check out cargo fozzy!
nikomatsakis commented May 6, 2016
To really have confidence in incremental compilation, we need to do large-scale testing. One specific way to do this is to fuzz against different versions of crates from crates.io. This corresponds to a very particular, and common, workflow:
cargo update to get new packages
cargo build to build with them
The idea here would be to write a tool which iterates over the last N published crates (that is, the last N versions that have been published to crates.io). For each version V that has been published, we would:
-Z incremental=some-temporary-directory
-Z incremental=some-temporary-directory
As of this writing, we're not yet at the point where incremental compilation actually works, but we are already tracking dependencies throughout the front-end; just tracking those dependencies sometimes causes ICEs, which means this tool would be of immediate use. Plus, we expect to be tracking dependencies soon.
I imagine that this tool would be written in Rust. Rather than placing it in src/test/tools, it might be better if it lived as its own crate (maybe in rust-lang?), so that it can draw on the full crates.io ecosystem for utility functions and so forth. We would eventually just run it on a regular and continuous basis and open issues for any problems that we find.
Since it has relatively few prerequisites, this bug seems like a good "entry vector" for contributing to Rust. I would love to "mentor" someone to work on it.
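The two-build workflow in this post could be sketched roughly as follows, assuming each step builds one version of the crate while pointing at the same incremental directory. The function names are hypothetical, the `-Z incremental=...` flag is copied from the post as written, and passing it through the `RUSTFLAGS` environment variable is an assumption about how one might hand it to cargo (a `-Z` flag would also need a nightly toolchain).

```rust
use std::io;
use std::path::Path;
use std::process::{Command, ExitStatus};

/// Assemble a `cargo build` invocation for the sources in `src_dir`, asking
/// the compiler to keep its incremental state in `incr_dir`.
/// Assumption: the flag is passed via RUSTFLAGS; paths containing spaces
/// would need extra care because RUSTFLAGS is split on whitespace.
fn incremental_build_command(src_dir: &Path, incr_dir: &Path) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("build")
        .current_dir(src_dir)
        .env("RUSTFLAGS", format!("-Z incremental={}", incr_dir.display()));
    cmd
}

/// Build the previous version, then the new version, sharing one incremental
/// directory, and return both exit statuses for later inspection.
fn build_two_versions(
    prev_src: &Path,
    next_src: &Path,
    incr_dir: &Path,
) -> io::Result<(ExitStatus, ExitStatus)> {
    let first = incremental_build_command(prev_src, incr_dir).status()?;
    let second = incremental_build_command(next_src, incr_dir).status()?;
    Ok((first, second))
}
```

Keeping the command construction separate from running it makes it possible to inspect or log the exact invocation without needing cargo installed.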