Tracking issue for RFC 2196, "metabuild: semantic build scripts for Cargo" #49803

Open
Centril opened this Issue Apr 9, 2018 · 9 comments

6 participants
@Centril
Contributor

Centril commented Apr 9, 2018

This is a tracking issue for the RFC "metabuild: semantic build scripts for Cargo" (rust-lang/rfcs#2196).

Steps:

Unresolved questions:

None

@withoutboats

Contributor

withoutboats commented Apr 9, 2018

I wanted to register a concern on this RFC but I didn't realize how quickly it was approved.

I know that the goal of moving forward here is to make the manner in which Cargo processes native dependencies more declarative and easier for other build systems to process. I 100% approve of that goal. I'd love to see a future where the difference between a dependency implemented in C and one implemented in Rust is essentially insignificant to the end user.

The RFC states:

At the same time, we don't yet have sufficiently precise information about the needs of such systems to design an ideal set of Cargo metadata on the first try. Rather than attempt to architect the perfect solution from the start, and potentially create an intermediate state that will require long-term support, we propose to allow experimentation with declarative build systems within the crates.io ecosystem, in crates supplying modular components similar to build.rs scripts.

As a nightly-only means of experimenting toward finding a long term solution to native dependencies, I am totally behind this RFC. In contrast, I feel a lot of concern about providing "metabuild" in this form as a stable feature because of the other ways this feature can be used.

I find the idea of declaratively listing crates in Cargo.toml and calling them when you cargo build very opaque. For use cases that are not literally doing what cargo build says on the tin (i.e. building a dependency), I am worried about this being a confusing feature that obscures how the build pipeline for a project works. Other kinds of processing that happen at build time ought, in my opinion, to appear as code somewhere in the project, that is, in the build.rs. For use cases other than building dependencies, I'm more in favor of code gen solutions, which keep that step very discoverable to end users, than a declarative system like this.


Without making any sort of "slippery slope" analogy, I want to share a frustrating experience I had with a Ruby on Rails project because of the multiple layers of opaque "declarative" build/exec processing that have developed in that ecosystem.

The rspec command took around 10 seconds to initialize all of our app's dependencies before running tests for me, and I wanted to reduce that time. My problem was eventually solved by realizing my shell wasn't properly invoking the rails spring binaries, but I was trying to find a cleaner solution in which we just didn't initialize the entire application before running every test.

It took me quite a while to figure out how rspec even figured out it was supposed to load the application; eventually I discovered that the root directory had a .rspec file in it which listed subcommands rspec appended every time it was run. Once I deleted that file, I got the boot time down to about 2 seconds, which was still far too long for not doing any work at all.

Eventually, I discovered that the 2 seconds were because I was using rvm to manage my Ruby versions. rvm dynamically injects some code into your version of rspec, ruby, and so on, so that when you call rspec you actually call bundle exec rspec, and the 2 seconds came from processing our project's Gemfile.

In other words, when I ran rspec, at two different layers (in a dotfile in the project and dotfiles in my home directory), different programs were declaratively injecting behavior into my command, neither of which were designed to be discoverable and obvious.


To recap, I want to draw a clear distinction between building native dependencies and arbitrary build-time processing. I think it's completely correct for the first to be handled declaratively, even implicitly. But when it comes to executing arbitrary code at build time to do anything at all, I think it is important that it be obvious and discoverable what additional behavior is being run at build time. The build script solves this by having literal source code you can read. But having to spelunk into other repositories (if there are even repositories linked from crates.io for your metabuild dependencies) is a real step back in this regard.

@joshtriplett

Member

joshtriplett commented Apr 9, 2018

@withoutboats First, I do want to emphasize that the goal is to experiment in the Cargo ecosystem, not to immediately stabilize it. That was what ultimately led to moving forward with the RFC: the desire to enable that experimentation and development.

I do understand the concern about builds becoming more opaque. On the other hand, if you see a metabuild key pulling in lalrpop-build or similar, you can feel confident that a crate uses the standard lalrpop-build mechanism to build a parser, not something custom or non-standard. And if there's a bug or performance issue, it can be fixed in that one place, not in numerous copy/pasted/tweaked build.rs scripts.

So I do want to see every component of build.rs using metabuild crates, not just dependencies. Those are just the most important target to standardize.

I don't think this obscures the build pipeline or makes it less discoverable, any more than having other functionality factored out into crates obscures the code using those crates.

@raphaelcohn

raphaelcohn commented Apr 11, 2018

To make builds truly reproducible and remove the sorts of issues @withoutboats experienced, one needs a reproducible build chain that is completely independent of the binaries on the host. That requires versioning of even the smallest build dependency - the version of an autoconf m4 macro (not the generated configure) or a hardcoded reference to /bin/sh - and can be extremely challenging. It can be done - as my experimental Libertine Linux shows - but it's very hard indeed. The most important principle in getting it right is to always cross-compile - even the build toolchain.

For a more general build system, features make it even harder. Take the DPDK project or the rdma-core library: they have so many ways of being built that there's no sane way to abstract them that supports more than a very narrow subset of uses.

@withoutboats

Contributor

withoutboats commented Apr 11, 2018

@raphaelcohn I don't see the connection between reproducibility and the issue I was talking about - discoverability.

@raphaelcohn

raphaelcohn commented Apr 11, 2018

"dotfiles in my home directory". Something which is not reproducible is not easily discoverable.

@ehuss

Contributor

ehuss commented Jun 4, 2018

Is anyone working on this? I have some free time and am willing to help.

@aturon

Member

aturon commented Jun 6, 2018

@ehuss not that I know of; it'd be great to see some action here! cc @joshtriplett

@ehuss

Contributor

ehuss commented Jun 11, 2018

Thanks @aturon. I have posted a preliminary PR at rust-lang/cargo#5628.

@ehuss

Contributor

ehuss commented Sep 10, 2018

This is now available on nightly (documentation here). Some things that probably should be decided before stabilizing:

  • Should there be a structured way to pass metadata directly to the script so it doesn't need to parse the manifest?
  • Should there be a more explicit way to indicate errors?
  • How should cargo metadata behave? Currently it includes a metabuild key in the package, but the metabuild target is hidden. This is my preference, but I'm not sure whether having a hidden target will confuse any tools.
  • How should "build plan" behave? Currently it generates the metabuild script if necessary and includes instructions on how to build it. I think this is the best approach, since without that information I think it would be almost impossible to use crates with this feature with external build systems. However, it is a little strange to have a mostly internal implementation detail exposed like this.
  • How should JSON artifact messages behave? Currently the metabuild target is included in the JSON artifact message, along with the path to the internal file (target/.metabuild/metabuild-$PKG-$HASH.rs). I'm not sure what the use cases are for the JSON artifact messages, so I'm not sure if it should be hidden or not.
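
For readers unfamiliar with the "metabuild script" mentioned in the list above, here is a rough sketch of the idea from the RFC: Cargo acts as though the package had a build.rs that calls the `metabuild()` function of each listed crate, in order. The helper name `generate_metabuild_script` and the crate name `foo` are hypothetical, and the exact text Cargo writes to the internal file is an assumption, not a copy of Cargo's output.

```rust
/// Hypothetical helper: build the source text of a script that calls
/// `metabuild()` on each listed crate, in the order given.
/// Hyphens in package names become underscores, as Rust identifiers require.
/// The layout of the generated text is an assumption, not Cargo's real output.
fn generate_metabuild_script(crates: &[&str]) -> String {
    let idents: Vec<String> = crates
        .iter()
        .map(|name| name.replace('-', "_"))
        .collect();

    let mut script = String::new();
    // One `extern crate` line per metabuild crate.
    for ident in &idents {
        script.push_str(&format!("extern crate {};\n", ident));
    }
    // A `main` that invokes each crate's entry point in order.
    script.push_str("\nfn main() {\n");
    for ident in &idents {
        script.push_str(&format!("    {}::metabuild();\n", ident));
    }
    script.push_str("}\n");
    script
}

fn main() {
    // `lalrpop-build` is the example name used earlier in this thread;
    // `foo` is a placeholder.
    print!("{}", generate_metabuild_script(&["lalrpop-build", "foo"]));
}
```

Seen this way, the questions above about cargo metadata, build plans, and JSON artifact messages are all about how much of this generated file should be visible to external tools.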