Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upImprove Cargo's scheduling of builds #5100
Conversation
rust-highfive
assigned
matklad
Mar 1, 2018
This comment has been minimized.
This comment has been minimized.
rust-highfive
commented
Mar 1, 2018
|
r? @matklad (rust_highfive has picked a reviewer for you, use r? to override) |
matklad
reviewed
Mar 1, 2018
|
I don't think this works as described, because the depth does not look like depth :-) Some general thoughts: It would be great if we could unit-test such algorithmic-y bits of Cargo. While I am absolutely in love with our integration tests suite, it's hard to use it to exercise such more internal bits. Another similar piece of computer-sciency stuff is, of course, the dependency resolution algorithm. I wonder if it makes sense to invent some sort of "intermediate language" representation of the dependencies graph, the patch section, features and other resolve related stuff. If we than can dump and load this IR in json, it could help us to write unit tests for stuff like "resolver does not do too many steps for this case", "the optimal order of building these crates is this". It might also be useful for the build-plan work, and could unlock experimentation with SAT solvers for resolver. |
|
|
||
| // add one to the depth of all our dependencies because, well, we're | ||
| // depending on them | ||
| *self.depth.entry(dep.clone()).or_insert(0) += 1; |
This comment has been minimized.
This comment has been minimized.
matklad
Mar 1, 2018
Member
This... doesn't calculate depth at all? It looks more like an in-degree? There must be a max somewhere to calculate the depth.
| } | ||
| self.depth.insert(key.clone(), 0); |
This comment has been minimized.
This comment has been minimized.
matklad
Mar 1, 2018
Member
This might override the number already in hash map. If this can't happen due to the order in which dependencies are processed, let's add an assert here.
This comment has been minimized.
This comment has been minimized.
|
Hmm, I tried this on my cargo-cache crate (it uses codegen-units = 1 and monolithic lto) on a dual core system and build time went from
( cargo 0.26.0-nightly (1d6dfea 2018-01-26))
( release profile compiled from git + this PR). The cargo dependency was still compiled last though (altough there should be room for better parallelism) :/ |
This comment has been minimized.
This comment has been minimized.
Yeah, prioritizing packages which have a lot of direct dependencies should improve scheduling I think, but the true depth heuristic should be even better. |
This comment has been minimized.
This comment has been minimized.
urgh too tired
There's nothing stopping us from using |
alexcrichton
force-pushed the
alexcrichton:better-dep-ordering
branch
from
02c907b
to
65beaed
Mar 1, 2018
This comment has been minimized.
This comment has been minimized.
|
Updated! |
matklad
reviewed
Mar 1, 2018
| depth(dep, map, results, cur_depth + 1); | ||
| } | ||
| } | ||
| } |
This comment has been minimized.
This comment has been minimized.
matklad
Mar 1, 2018
•
Member
I believe this is O(N^2) worst case. Not sure if it matters, but I couldn't resist the temptation of implementing a linear time dynamic programming algorithm :-)
pub fn queue_finished(&mut self) {
for key in self.dep_map.keys() {
depth(key, &self.reverse_dep_map, &mut self.depth);
}
fn depth<K: Hash + Eq + Clone>(
key: &K,
map: &HashMap<K, HashSet<K>>,
results: &mut HashMap<K, usize>,
) -> usize {
const IN_PROGRESS: usize = !0;
if let Some(&depth) = results.get(key) {
assert_ne!(depth, IN_PROGRESS, "cycle in DependencyQueue");
return depth;
}
results.insert(key.clone(), IN_PROGRESS);
let depth = map.get(&key).into_iter().flat_map(|it| it)
.map(|dep| depth(dep, map, results))
.max().unwrap_or(0) + 1;
*results.get_mut(key).unwrap() = depth;
depth
}
}The idea is to use reverse map instead the usual one, calculate the depth as max of in-edges plus one and do a sanity check for absence of cycles just in case.
This comment has been minimized.
This comment has been minimized.
alexcrichton
force-pushed the
alexcrichton:better-dep-ordering
branch
from
65beaed
to
e54b5f8
Mar 1, 2018
This comment has been minimized.
This comment has been minimized.
|
@bors r+ |
This comment has been minimized.
This comment has been minimized.
|
|
bors
added a commit
that referenced
this pull request
Mar 1, 2018
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
|
This comment has been minimized.
This comment has been minimized.
|
@mattgathu it would be interesting to see the numbers with this PR merged! You can install Cargo off the master branch with
|
This comment has been minimized.
This comment has been minimized.
|
@matklad was that for me? |
This comment has been minimized.
This comment has been minimized.
|
cargo 0.26.0-nightly (1d6dfea 2018-01-26)
cargo from git ( 558c7d2 )
so no noticeable improvements here. :/ |
This comment has been minimized.
This comment has been minimized.
|
Hm, in my example cargo is still compiled last and alone (with no ther crate in paralel) even after I added another entirely unneeded dep (tokio) to the crate deps.
and it would still compile tokio and all its deps first and then cargo and then the base crate. edit: cargo tree:
|
alexcrichton commentedMar 1, 2018
Historically Cargo has been pretty naive about scheduling builds, basically just
greedily scheduling as much work as possible. As pointed out in #5014, however,
this isn't guaranteed to always have the best results. If we've got a very deep
dependency tree that would otherwise fill up our CPUs Cargo should ideally
schedule these dependencies first. That way when we reach higher up in the
dependency tree we should have more work available to fill in the cracks if
there's spare cpus.
Closes #5014