Add build profile. #6577
I have some concerns about this, so I wanted to open this for discussion.
With this feature, if a dev build is made, and then a release build is made, all of the build dependencies end up being compiled a second time even though their settings are now identical.
I'm very worried about this causing shared dependencies to now be built multiple times.
I've been doing some analysis on crates.io to try to understand the impact.
3187 of 21098 crates have at least one overlapping dependency (details in a gist). crossgen has a whopping 161 crates in common in the worst case.
I did some tests "without" the build profile and "with" the build profile, at a concurrency of either 12 or 2. All times are in seconds; this is just a rough idea. I ran each attempt multiple times, but this is particular to my hardware and running on macOS.
As you can see, sometimes it is a little faster, but usually it is slower (sometimes much slower).
Here are a few more pieces of data that seemed interesting (out of 21098 crates):
The current defaults may not be the best. Turning off debug improves speed and reduces disk space, but then you lose good backtraces. It's also questionable whether it matters if debug-assertions or overflow-checks are off. Setting opt-level=1 caused a noticeable increase in compile time on the few projects I tried, so I left it at 0.
alexcrichton left a comment
Thanks so much for doing the analysis here!
To make sure I understand this, the PR proposed as-is changes the default build settings for build scripts/procedural macros in both debug/release modes. This means that entire dependency trees rooted in procedural macros and build scripts are now compiled differently, and any previous sharing which happened no longer occurs, accounting for longer build times.
I'm curious if you know if there are some particularly bad "root offenders"? How do crates like
FWIW absolute compile times aren't always the most interesting metric in my opinion. Incremental builds almost always occur because there's previous artifacts and/or build times were already bad enough to motivate tools like
Here is some analysis of root offenders: https://gist.github.com/ehuss/0c9fb074d4b8720316b8ede243006f78. I tried to weight them by how often they are used and how many shared dependencies they tend to have. Maybe not the best weighting strategy.
The top offender is
Here is a detailed look at cargo-crev: https://gist.github.com/ehuss/a15704fc8c9d9a345a0d71739e3db32e. It's interesting because there isn't one bad offender, but a bunch of them (bindgen, clear_on_drop, failure, phf_codegen, cc, rand).
Ok thanks for that analysis! I agree it's pretty hard to draw a trend from that. My main conclusion is largely just that the ecosystem of build dependencies is basically the same as normal dependencies: they are themselves built on a number of crates in the ecosystem, and there are some big ones and some small ones.
When thinking about the build as a whole, as mentioned before this change is basically irrelevant for incremental builds. It's also largely irrelevant for builds using caching solutions like
One aspect of those builds I've often noticed is that for larger projects all hardware parallelism is eaten up during the first half-or-so of the build, but the second half is often more serial as dependencies become chained and all the quick crates are out of the way. The relatively small percentage increase in build times you measured above may be explained by that serial portion starting later, while the otherwise-unused parallelism earlier in the build is spent finishing up build dependencies. Now of course those same dependencies can also push the build back, because the serial chain of crates could depend on everything being finished.
Overall I still personally feel pretty good about this change. Local projects can always reconfigure back to today's configuration if cold builds matter a lot, and otherwise this should provide a general improvement for working with build scripts and procedural macros.
Spot on. Do you have any thoughts about how to organize the artifact directory? To address something like #1774 it would need to change so that dev/release will share the same build artifacts.
My preference would be to remove the debug/release directory separation. I suspect there might be opposition to that, though it could maybe be done in a backwards compatible fashion with links. From a functional standpoint of using the
If that is untenable, a dedicated
Or it could just stay as-is, which allows for sharing, but causes rebuilds when switching dev/release.
Or maybe some other option, like build artifacts are always in the
I definitely think we should solve the rebuilding problem, but I think we could either do that by placing output in a new directory or by hashing more into the filename. I'm actually somewhat surprised that their filenames conflict today; do you know what's not being hashed that would otherwise make the filenames differ and avoid the collision?
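(As a rough illustration of the "hashing more into the filename" option: if the profile settings were folded into the hash that already goes into each artifact's file name, dev and release output could live in one directory without colliding. The type and naming scheme below are made up for the sketch, not Cargo's actual code.)

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative stand-in for the profile settings that would participate in
/// the hash; not Cargo's real profile type.
#[derive(Hash)]
struct ProfileSettings {
    opt_level: u32,
    debug: bool,
    debug_assertions: bool,
    overflow_checks: bool,
}

/// Compute a file stem like `serde-<hash>`. Folding the profile settings into
/// the hash means a dev and a release build of the same crate get distinct
/// file names and can coexist in one shared `deps` directory.
fn file_stem(pkg_name: &str, pkg_version: &str, profile: &ProfileSettings) -> String {
    let mut hasher = DefaultHasher::new();
    pkg_name.hash(&mut hasher);
    pkg_version.hash(&mut hasher);
    profile.hash(&mut hasher);
    format!("{}-{:016x}", pkg_name, hasher.finish())
}

fn main() {
    let dev = ProfileSettings { opt_level: 0, debug: true, debug_assertions: true, overflow_checks: true };
    let release = ProfileSettings { opt_level: 3, debug: false, debug_assertions: false, overflow_checks: false };
    // The two stems differ, so nothing collides even in a unified directory.
    println!("{}", file_stem("serde", "1.0.0", &dev));
    println!("{}", file_stem("serde", "1.0.0", &release));
}
```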
We definitely can't easily remove debug/release folders as they're so widely ingrained today. What I think we could do, however, is move towards a world where those folders only contain final output artifacts rather than intermediate ones. Sort of like how we have
I'm a little confused. I was saying that they don't conflict, so there should be no reason they need to be in separate directories.
Yea, that's what I meant by "backwards compatible fashion with links" — it would keep the debug/release directories and just link final artifacts there for any tools that expect them.
I'll take a look soon at implementing that and see if there are any major drawbacks. I expect there to be a lot of little changes throughout the code, but for it overall to be straightforward. I'd like to do that in a separate PR if that's OK?
Oh sorry, I was misunderstanding the rebuild point. It's not that we're thrashing a cache, but that the same artifacts are cached in two locations. That doesn't happen today since the settings are basically always different, but after this change the build profile for dev/release is the same, so the artifacts are actually the same.
In the long term I think we're going to move to a global build cache for Cargo, so I think it's fine to go ahead and experiment with it ahead of time. I'm thinking something along the lines of "everything stays exactly the same as it is today", but all files are just hard links to a build cache elsewhere. The build cache is just a dump of everything Cargo ever does, completely unorganized.
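(A minimal sketch of that idea, assuming a hypothetical flat cache directory and helper name; this is illustrative, not how Cargo would actually organize it.)

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Store `artifact` in a flat, unorganized cache keyed by `key`, then expose
/// it in the project's target directory as a hard link. The bytes live once
/// on disk, while the target directory layout that tools expect stays intact.
fn cache_and_link(cache: &Path, key: &str, artifact: &Path, target: &Path) -> io::Result<()> {
    fs::create_dir_all(cache)?;
    fs::create_dir_all(target)?;

    let cached = cache.join(key);
    if !cached.exists() {
        // First time this artifact has been produced: move it into the cache.
        fs::rename(artifact, &cached)?;
    }

    let linked = target.join(artifact.file_name().expect("artifact must be a file"));
    if linked.exists() {
        fs::remove_file(&linked)?;
    }
    // Hard link from the cache back into the target directory.
    fs::hard_link(&cached, &linked)
}
```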
I implemented a unified deps directory, but ran into some problems dealing with backwards compatibility. I've been trying a few different approaches, but they all have drawbacks.
If we break very old Windows I think that's fine, I thought that
I don't actually know any systems that don't support hard links on the same filesystem, but have we hit some in the wild we wanted to handle?
I think breaking rustbuild is fine (especially if we see the breakage coming!).
Overall I think we definitely need to preserve backcompat to ensure that the current patterns for finding a test binary work somewhat (although we have broken this before...). Otherwise it should be fine to ignore older Windows and I think it's fine to assume hard links for perf (although I may be forgetting something critical there).
If we only hard link/copy the final binaries, that could perhaps mitigate the impact on systems without hard links, and overall reduce the amount of traffic on the filesystem?
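(Something along these lines would keep the hard-link fast path while degrading to a copy on filesystems that refuse it; the helper below is a sketch, not Cargo's actual code.)

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Publish a final binary at `dst`: try a hard link first (essentially free,
/// since it only adds a directory entry), and fall back to a full copy on
/// filesystems that don't support hard links, such as some network mounts.
fn link_or_copy(src: &Path, dst: &Path) -> io::Result<()> {
    if dst.exists() {
        fs::remove_file(dst)?;
    }
    match fs::hard_link(src, dst) {
        Ok(()) => Ok(()),
        Err(_) => fs::copy(src, dst).map(|_| ()),
    }
}
```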
It is fairly recent. Creating symlinks historically required admin permissions until Windows 10 Creators Update (released mid 2017). The reason you can run on older systems is because
A. I don't think we ever try to link directories on Windows. I can only think of macOS with dSYM.
I believe some network filesystems do not support hard links.
Sometime soonish, unless you have any other feedback, I'll try out the hybrid approach and see how it goes.
Oh sorry right yeah symlinks won't work but I think that directory junctions are supported much further back on Windows, right? (I forget if that's what
Hm, network filesystems are a bummer... I think the hybrid approach would be best there long-term though!