Enable reuse of `.o` files if nothing has changed #34956

Merged
merged 9 commits into from Jul 29, 2016

Projects

None yet

6 participants

@nikomatsakis
Contributor
nikomatsakis commented Jul 21, 2016 edited

This PR completes a first "spike" for incremental compilation by enabling us to reuse .o files when nothing has changed. When in incr. mode, we will save .o files into the temporary directory, then copy them back out again if they are still valid. The code is still a bit rough but it does seem to work. =)

r? @michaelwoerister

Fixes #34036
Fixes #34037
Fixes #34038

@michaelwoerister
Contributor

Excellent :) I'll take a look at it first thing tomorrow.

@jonas-schievink
Contributor

I just tried to build glium with this and I'm hitting multiple ICEs about local/foreign items:

https://gist.github.com/jonas-schievink/536a1ad7aa065193e4954242c9885cbd

(I've set RUSTFLAGS=-Zincremental=target/INCREMENTAL before running cargo)

@michaelwoerister michaelwoerister commented on an outdated diff Jul 22, 2016
src/librustc/dep_graph/dep_node.rs
@@ -229,3 +239,14 @@ impl<D: Clone + Debug> DepNode<D> {
}
}
}
+
+/// A "work product" corresponds to a `.o` (or other) file that we
+/// save in between runs. These ids do not have a DefId but rather
+/// some independent path or string that persists between runs without
+/// the need to be mapped or unmapped. (This ensures we can serialize
+/// them even in the absence of a tcx.)
+#[derive(Clone, Debug, PartialEq, Eq, Hash, RustcEncodable, RustcDecodable)]
+pub enum WorkProductId {
+ PartitionObjectFile(String), // see (*TransPartition) below
@michaelwoerister
michaelwoerister Jul 22, 2016 Contributor

Nit: Our nomenclature around "partitions", "codegen units", "translation units", etc is a bit inconsistent. We are using these names more or less interchangeably and it would probably be good to decide on the canonical term.

@michaelwoerister michaelwoerister commented on an outdated diff Jul 22, 2016
src/librustc/dep_graph/graph.rs
- data: Rc<DepGraphThreadData>
+ data: Rc<DepGraphData>
+}
+
+struct DepGraphData {
+ /// we send messages to the thread to let it build up the dep-graph
+ /// from the current run
+ thread: DepGraphThreadData,
+
+ /// when we load, there may be `.o` files, cached mir, or other such
+ /// things available to us. If we find that they are not dirty, we
+ /// load the path to the file storing those work-products here into
+ /// this map. We can later look for and extract that data.
+ previous_work_products: RefCell<FnvHashMap<Arc<WorkProductId>, WorkProduct>>,
+
+ /// work-products that we generate in this run
@michaelwoerister
michaelwoerister Jul 22, 2016 Contributor

Nit: These doc-comments should start with an uppercase letter.

@michaelwoerister michaelwoerister commented on an outdated diff Jul 22, 2016
src/librustc/dep_graph/graph.rs
+ /// Access the map of work-products created during this run. Only
+ /// used during saving of the dep-graph.
+ pub fn work_products(&self) -> Ref<FnvHashMap<Arc<WorkProductId>, WorkProduct>> {
+ self.data.work_products.borrow()
+ }
+}
+
+/// A "work product" is an intermediate result that we save into the
+/// incremental directory for later re-use. The primary example are
+/// the object files that we save for each partition at code
+/// generation time.
+///
+/// Each work product is associated with a dep-node, representing the
+/// process that produced the work-product. If that dep-node is found
+/// to be dirty when we load up, then we will delete the work-product
+/// at load time. If the work-product is found to be clean, the we
@michaelwoerister
michaelwoerister Jul 22, 2016 Contributor

s/the/then

@michaelwoerister michaelwoerister commented on an outdated diff Jul 22, 2016
src/librustc_driver/driver.rs
@@ -88,7 +88,7 @@ pub fn compile_input(sess: &Session,
// We need nested scopes here, because the intermediate results can keep
// large chunks of memory alive and we want to free them as soon as
// possible to keep the peak memory usage low
- let (outputs, trans) = {
+ let (outputs, trans, id) = {
@michaelwoerister
michaelwoerister Jul 22, 2016 Contributor

crate_name would be a better variable name than id

@michaelwoerister michaelwoerister commented on an outdated diff Jul 22, 2016
src/librustc_incremental/persist/load.rs
+fn load_dep_graph_if_exists<'a, 'tcx>(tcx: TyCtxt<'a, 'tcx, 'tcx>) {
+ let dep_graph_path = dep_graph_path(tcx).unwrap();
+ let dep_graph_data = match load_data(tcx.sess, &dep_graph_path) {
+ Some(p) => p,
+ None => return // no file
+ };
+
+ let work_products_path = tcx_work_products_path(tcx).unwrap();
+ let work_products_data = match load_data(tcx.sess, &work_products_path) {
+ Some(p) => p,
+ None => return // no file
+ };
+
+ match decode_dep_graph(tcx, &dep_graph_data, &work_products_data) {
+ Ok(()) => return,
+ Err(err) => bug!("decoding error in dep-graph from `{}` and `{}`: {}",
@michaelwoerister
michaelwoerister Jul 22, 2016 Contributor

Is this necessarily a bug? At least it seems like something the compiler can recover from by just ignoring it.

@michaelwoerister michaelwoerister commented on the diff Jul 22, 2016
src/librustc_incremental/persist/load.rs
}
}
-pub fn load_dep_graph_if_exists<'a, 'tcx>(tcx: TyCtxt<'a, 'tcx, 'tcx>, path: &Path) {
+fn load_data(sess: &Session, path: &Path) -> Option<Vec<u8>> {
@michaelwoerister
michaelwoerister Jul 22, 2016 Contributor

At some point we'll want to use memory maps for things like these. (just an observation, not related to this PR specifically)

@michaelwoerister michaelwoerister commented on an outdated diff Jul 22, 2016
src/librustc_incremental/persist/load.rs
}
+/// Decode the dep graph and load the edges/nodes that are still clean
+/// into `tcx.dep_graph`. On success, returns a hashset containing all
+/// the paths of work-products from clean nodes (any work-products not
+/// in this set can be deleted).
@michaelwoerister
michaelwoerister Jul 22, 2016 Contributor

It looks like there is no hashset returned and the function takes care of deleting invalidated work-products itself.

@michaelwoerister michaelwoerister commented on an outdated diff Jul 22, 2016
src/librustc_trans/back/write.rs
@@ -913,13 +929,46 @@ fn build_work_item(sess: &Session,
}
}
+fn link_or_copy<P: AsRef<Path>, Q: AsRef<Path>>(p: P, q: Q) -> io::Result<()> {
@michaelwoerister
michaelwoerister Jul 22, 2016 Contributor

This looks like something that could go into some utils module.

@michaelwoerister michaelwoerister commented on an outdated diff Jul 22, 2016
src/librustc_trans/base.rs
@@ -2279,7 +2283,14 @@ fn internalize_symbols(cx: &CrateContextList, reachable: &HashSet<&str>) {
// Examine each external definition. If the definition is not used in
// any other compilation unit, and is not reachable from other crates,
// then give it internal linkage.
- for ccx in cx.iter() {
+ for ccx in cx.iter_need_trans() {
+ if ccx.sess().opts.debugging_opts.incremental.is_some() {
@michaelwoerister
michaelwoerister Jul 22, 2016 Contributor

Why not just exit at the beginning of the function?

@michaelwoerister michaelwoerister commented on an outdated diff Jul 22, 2016
src/librustc_trans/partitioning.rs
@@ -319,10 +362,8 @@ fn place_root_translation_items<'a, 'tcx, I>(tcx: TyCtxt<'a, 'tcx, 'tcx>,
if codegen_units.is_empty() {
let codegen_unit_name = InternedString::new(FALLBACK_CODEGEN_UNIT);
codegen_units.entry(codegen_unit_name.clone())
- .or_insert_with(|| CodegenUnit {
- name: codegen_unit_name.clone(),
- items: FnvHashMap(),
- });
+ .or_insert_with(|| CodegenUnit::new(codegen_unit_name.clone(),
+ FnvHashMap()));
@michaelwoerister
michaelwoerister Jul 22, 2016 Contributor

Could use CodegenUnit::empty(codegen_unit_name.clone()) here too.

@michaelwoerister michaelwoerister and 1 other commented on an outdated diff Jul 22, 2016
src/librustc_trans/trans_item.rs
@@ -88,6 +99,13 @@ impl<'a, 'tcx> TransItem<'tcx> {
}
}
TransItem::Fn(instance) => {
+ let _task;
+
+ if instance.def.is_local() {
@michaelwoerister
michaelwoerister Jul 22, 2016 Contributor

Why only for local items?

@nikomatsakis
nikomatsakis Jul 22, 2016 Contributor

Why only for local items?

Yeah, hmm, a good question. :) That might be wrong.

@michaelwoerister michaelwoerister commented on the diff Jul 22, 2016
src/librustc_trans/assert_module_sources.rs
+//
+// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
+// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
+// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
+// option. This file may not be copied, modified, or distributed
+// except according to those terms.
+
+//! This pass is only used for UNIT TESTS related to incremental
+//! compilation. It tests whether a particular `.o` file will be re-used
+//! from a previous compilation or whether it must be regenerated.
+//!
+//! The user adds annotations to the crate of the following form:
+//!
+//! ```
+//! #![rustc_partition_reused(module="spike", cfg="rpass2")]
+//! #![rustc_partition_translated(module="spike-x", cfg="rpass2")]
@michaelwoerister
michaelwoerister Jul 22, 2016 Contributor

I personally would find rustc_partition_reused vs rustc_partition_not_reused be clearer than rustc_partition_reused vs rustc_partition_translated.

@michaelwoerister
michaelwoerister Jul 22, 2016 Contributor

I just figured that there is the third case of a partition being neither re-used nor re-translated but removed entirely. If this third deleted case was added to the list then reused vs translated makes more sense.

@nikomatsakis
nikomatsakis Jul 22, 2016 Contributor

I just figured that there is the third case of a partition being neither re-used nor re-translated but removed entirely.

I was anticipating a third-case when I did the names, but I'm not sure now if that makes sense. The third case I had in mind was one where we had saved the LLVM IR pre-optimization -- but in that case, I realize now you might still want Translated. Or at least the backend, optimizing code doesn't care where the LLVM IR came from (though we might use another variant just for unit testing).

I hadn't thought about the case of deleting a partition -- I don't know if we would need that for any reason? Just to cleanup the saved file, are you thinking? I had imagined we would do some sort of "GC" where, on startup--or maybe termination--we just crawl the incr-comp directory and delete random files that don't correspond to known work-products. In fact, I had such code, but I remember now I wound up deleting it because in the newer scheme I didn't need it to get things working.

@michaelwoerister
michaelwoerister Jul 22, 2016 Contributor

I don't know if we would need that for any reason? Just to cleanup the saved file, are you thinking?

Yes that, but also logically it's another case, right? There'd be something wrong if I removed a module from the source code but there'd still be a WorkProduct listed for it.

I had imagined we would do some sort of "GC"

Isn't this covered by reconcile_work_products() anyway?

@nikomatsakis
nikomatsakis Jul 22, 2016 Contributor

Yes that, but also logically it's another case, right?

Hmm, well, logically yes, though (at least now) we don't create a ModuleTranslation for modules that used to exist in some previous compilation but no longer do, since there is no work to do in such a case -- but I guess we could just assert that (indeed) no module exists with that name.

Isn't this covered by reconcile_work_products() anyway?

Yeah, I forgot about that. Indeed, for the most part it is, which is why I delete the other code I had that just scraped up files with "suspicious looking" names.

@michaelwoerister michaelwoerister commented on the diff Jul 22, 2016
src/librustc_trans/assert_module_sources.rs
+//! from a previous compilation or whether it must be regenerated.
+//!
+//! The user adds annotations to the crate of the following form:
+//!
+//! ```
+//! #![rustc_partition_reused(module="spike", cfg="rpass2")]
+//! #![rustc_partition_translated(module="spike-x", cfg="rpass2")]
+//! ```
+//!
+//! The first indicates (in the cfg `rpass2`) that `spike.o` will be
+//! reused, the second that `spike-x.o` will be recreated. If these
+//! annotations are inaccurate, errors are reported.
+//!
+//! The reason that we use `cfg=...` and not `#[cfg_attr]` is so that
+//! the HIR doesn't change as a result of the annotations, which might
+//! perturb the reuse results.
@michaelwoerister
michaelwoerister Jul 22, 2016 Contributor

Good thinking :)

@michaelwoerister michaelwoerister and 1 other commented on an outdated diff Jul 22, 2016
src/test/incremental/spike-neg1.rs
+ pub struct X {
+ x: u32, y: u32,
+ }
+
+ #[cfg(rpass1)]
+ fn make() -> X {
+ X { x: 22, y: 0 }
+ }
+
+ #[cfg(rpass2)]
+ fn make() -> X {
+ X { x: 11, y: 11 }
+ }
+
+ #[rustc_dirty(label="TypeckItemBody", cfg="rpass2")]
+ #[rustc_clean(label="ItemSignature", cfg="rpass2")]
@michaelwoerister
michaelwoerister Jul 22, 2016 Contributor

Does it make sense to have these tests here? If they failed for some reason, they might hide that the rustc_partition_reused framework is broken.

@nikomatsakis
nikomatsakis Jul 22, 2016 Contributor

Does it make sense to have these tests here? If they failed for some reason, they might hide that the rustc_partition_reused framework is broken.

True. Removed.

@michaelwoerister
Contributor

OK, I've done a first pass over this. It's so nice to see all the work from the last 10 months finally coming together! (I mean, not that we are anywhere near being done yet, of course :))

The PR looks very good. The only thing I'm not so sure about is the factoring of the WorkProduct functionality. It seems a bit strange to have the partitioning-hashes in there, something which will only be used for object files (and maybe .bc files), but not for something like MIR. At the same time the WorkProductId does not scale to things with a DefId. It seems to me that there is room for improvement. However, I won't block the PR on this.

Another thing that isn't quite clear to me is the relationship between the "temp" directory and the "incr. comp.". Are they the same when incr. comp. is enabled? Or do we copy back and forth between them? I guess, we'll clarify these questions when we tackle #34957, #34955, and #32754.

@nikomatsakis
Contributor

The only thing I'm not so sure about is the factoring of the WorkProduct functionality. It seems a bit strange to have the partitioning-hashes in there, something which will only be used for object files (and maybe .bc files), but not for something like MIR. At the same time the WorkProductId does not scale to things with a DefId. It seems to me that there is room for improvement.

As I said on IRC, I also felt like the WorkProduct thing wasn't as general as I originally thought it would be. I suspect it might be better to just rip it out and change all that stuff to be specific to object files -- though I guess what is there could work just as well for bc files as it does for o files, so it's at least general enough for that.

(OTOH, it won't work for MIR, since we'd want something with a def-id as key for that; they also probably don't need the auxiliary hash info.)

I'm inclined to leave it now but just expect it to evolve, but that's mostly because I'm lazy. I'd also be happy to rip it out.

@michaelwoerister
Contributor

I'm inclined to leave it now but just expect it to evolve, but that's mostly because I'm lazy. I'd also be happy to rip it out.

I'm fine with leaving it as it is in the PR for now. If we had a clear vision of what it ought to look like, that would be different.

@bors
Contributor
bors commented Jul 22, 2016

☔️ The latest upstream changes (presumably #34917) made this pull request unmergeable. Please resolve the merge conflicts.

@nikomatsakis
Contributor

@bors r=mw

@bors
Contributor
bors commented Jul 22, 2016

📌 Commit 5c56fca has been approved by mw

@bors
Contributor
bors commented Jul 23, 2016

⌛️ Testing commit 5c56fca with merge f62c12b...

@bors bors added a commit that referenced this pull request Jul 23, 2016
@bors bors Auto merge of #34956 - nikomatsakis:incr-comp-o-files, r=mw
Enable reuse of `.o` files if nothing has changed

This PR completes a first "spike" for incremental compilation by enabling us to reuse `.o` files when nothing has changed. When in incr. mode, we will save `.o` files into the temporary directory, then copy them back out again if they are still valid. The code is still a bit rough but it does seem to work. =)

r? @michaelwoerister

Fixes #34036
Fixes #34037
f62c12b
@bors
Contributor
bors commented Jul 24, 2016

💔 Test failed - auto-linux-musl-64-opt

@nikomatsakis
Contributor

@michaelwoerister care to review the final commit?

@michaelwoerister
Contributor

@bors r+ ea13edb

LGTM.

@bors bors added a commit that referenced this pull request Jul 26, 2016
@bors bors Auto merge of #34956 - nikomatsakis:incr-comp-o-files, r=michaelwoeri…
…ster

Enable reuse of `.o` files if nothing has changed

This PR completes a first "spike" for incremental compilation by enabling us to reuse `.o` files when nothing has changed. When in incr. mode, we will save `.o` files into the temporary directory, then copy them back out again if they are still valid. The code is still a bit rough but it does seem to work. =)

r? @michaelwoerister

Fixes #34036
Fixes #34037
223858e
@bors
Contributor
bors commented Jul 26, 2016

⌛️ Testing commit ea13edb with merge 223858e...

@bors
Contributor
bors commented Jul 26, 2016

💔 Test failed - auto-win-msvc-64-opt

@michaelwoerister
Contributor

@bors retry

@bors
Contributor
bors commented Jul 26, 2016

⌛️ Testing commit ea13edb with merge 549c00d...

@bors bors added a commit that referenced this pull request Jul 26, 2016
@bors bors Auto merge of #34956 - nikomatsakis:incr-comp-o-files, r=michaelwoeri…
…ster

Enable reuse of `.o` files if nothing has changed

This PR completes a first "spike" for incremental compilation by enabling us to reuse `.o` files when nothing has changed. When in incr. mode, we will save `.o` files into the temporary directory, then copy them back out again if they are still valid. The code is still a bit rough but it does seem to work. =)

r? @michaelwoerister

Fixes #34036
Fixes #34037
549c00d
@bors
Contributor
bors commented Jul 26, 2016

💔 Test failed - auto-win-msvc-64-opt

@nikomatsakis
Contributor

@bors retry

@bors
Contributor
bors commented Jul 26, 2016

⌛️ Testing commit ea13edb with merge 5adcc49...

@bors bors added a commit that referenced this pull request Jul 26, 2016
@bors bors Auto merge of #34956 - nikomatsakis:incr-comp-o-files, r=michaelwoeri…
…ster

Enable reuse of `.o` files if nothing has changed

This PR completes a first "spike" for incremental compilation by enabling us to reuse `.o` files when nothing has changed. When in incr. mode, we will save `.o` files into the temporary directory, then copy them back out again if they are still valid. The code is still a bit rough but it does seem to work. =)

r? @michaelwoerister

Fixes #34036
Fixes #34037
5adcc49
@bors
Contributor
bors commented Jul 26, 2016

💔 Test failed - auto-win-msvc-64-opt-rustbuild

@alexcrichton
Member

@bors: retry

On Tue, Jul 26, 2016 at 5:23 AM, bors notifications@github.com wrote:

💔 Test failed - auto-win-msvc-64-opt-rustbuild
https://buildbot.rust-lang.org/builders/auto-win-msvc-64-opt-rustbuild/builds/1979


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#34956 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAD95JZkQsxuMpO50qTI2ocLXsY7R55Nks5qZfw2gaJpZM4JSAbm
.

@bors
Contributor
bors commented Jul 28, 2016

☔️ The latest upstream changes (presumably #34907) made this pull request unmergeable. Please resolve the merge conflicts.

nikomatsakis added some commits Jul 21, 2016
@nikomatsakis nikomatsakis Extend DepGraph so it can track "work-products"
A work product right now is just a `.o` file. In the future it probably
includes other kinds of files, such as `.bc` files saving the
unoptimized LLVM IR.

However, because WorkProductIds must be independent of DefIds, so that
they don't need translation, this system may not be suitable *as is* for
storing fine-grained information (such as the MIR for individual defs),
as it was originally intended. We will want to refactor some for that.
2b38c4b
@nikomatsakis nikomatsakis Store `crate_disambiguator` as an `InternedString`
We used to use `Name`, but the session outlives the tokenizer, which
means that attempts to read this field after trans has complete
otherwise panic. All reads want an `InternedString` anyhow.
ffc13b2
@nikomatsakis nikomatsakis Code to save/load the work-products map from disk
Work products are deleted if any of their inputs are dirty.
cca4804
@nikomatsakis nikomatsakis Modify trans to skip generating `.o` files
This checks the `previous_work_products` data from the dep-graph and
tries to simply copy a `.o` file if possible.  We also add new
work-products into the dep-graph, and create edges to/from the dep-node
for a work-product.
58d4b8e
@nikomatsakis nikomatsakis Add a testing mechanism and a simple spike test 08a72d2
@nikomatsakis nikomatsakis Address mw nits ceeb158
@nikomatsakis nikomatsakis Keep multiple files per work-product
In the older version, a `.o` and ` .bc` file were separate
work-products.  This newer version keeps, for each codegen-unit, a set
of files of different kinds. We assume that if any kinds are available
then all the kinds we need are available, since the precise set of
switches will depend on attributes and command-line switches.

Should probably test this: the effect of changing attributes in
particular might not be successfully tracked?
2f9fff2
@nikomatsakis
Contributor

@bors r=mw

@bors
Contributor
bors commented Jul 28, 2016

📌 Commit 2f9fff2 has been approved by mw

@nikomatsakis
Contributor

@bors r-

seeing make check errors locally :(

nikomatsakis added some commits Jul 28, 2016
@nikomatsakis nikomatsakis hash def-path's better
actually we shouldn't even hash nested items at all, but that is
addressed in a followup PR
e359147
@nikomatsakis nikomatsakis make it possible to track where hash diverges
42cd5d4
@nikomatsakis
Contributor

@bors r=mw

@bors
Contributor
bors commented Jul 28, 2016

📌 Commit 42cd5d4 has been approved by mw

@bors
Contributor
bors commented Jul 28, 2016

⌛️ Testing commit 42cd5d4 with merge 54c0dcf...

@bors bors added a commit that referenced this pull request Jul 28, 2016
@bors bors Auto merge of #34956 - nikomatsakis:incr-comp-o-files, r=mw
Enable reuse of `.o` files if nothing has changed

This PR completes a first "spike" for incremental compilation by enabling us to reuse `.o` files when nothing has changed. When in incr. mode, we will save `.o` files into the temporary directory, then copy them back out again if they are still valid. The code is still a bit rough but it does seem to work. =)

r? @michaelwoerister

Fixes #34036
Fixes #34037
Fixes #34038
54c0dcf
@bors bors merged commit 42cd5d4 into rust-lang:master Jul 29, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details
@WiSaGaN
Contributor
WiSaGaN commented Aug 5, 2016

Will this reduce the time of cargo check if only one of several files changes?

@nikomatsakis
Contributor

@WiSaGaN it should, yes, but we are still fixing some known shortcomings there.

@nikomatsakis nikomatsakis deleted the nikomatsakis:incr-comp-o-files branch Oct 3, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment