eRFC: Cargo build system integration #2136

Merged: 12 commits, Feb 1, 2018
151 changes: 80 additions & 71 deletions text/0000-build-systems.md
After extensive discussion with stakeholders, there appear to be two distinct
kinds of use-cases (or "customers") involved here:

- **Mixed build systems**, where building already involves a variety of
language- or project-specific build systems. For this use case, the desire is
to use Cargo as-is, except for some specific concerns. Those concerns take a
variety of shapes: customizing caching, having a local crate registry, custom
handling for native dependencies, and so on. Addressing these concerns well
means adding new points of extensibility or control to Cargo.

- **Homogeneous build systems** like [Bazel], where there is a single prevailing
build system and methodology that works across languages and projects and is
expected to drive all aspects of the build. In such cases the goal of Cargo
integration is largely *interoperability*, including easy use of the crates.io
… functionality into numerous small pieces that can be re-used when integrating
into a larger build system. This finer division is left as a question for
experimentation.

## Specifics for the homogeneous build system case

For homogeneous build systems, there are two kinds of code that must be dealt
with: code originally written using vanilla Cargo and a crate registry, and code
written "natively" in the context of the external build system. Any integration
has to handle the first case (to have access to crates.io or a vendored mirror
thereof).

### Using crates vendored from or managed by a crate registry

Whether using a registry server or a vendored copy, if you're building Rust code
that is written using vanilla Cargo, you will at some level need to use Cargo's
dependency resolution and `Cargo.toml` files. In this case, the external build
system should invoke Cargo for *at least* the dependency resolution and build
configuration steps, and likely the build lowering step as well. In such a
world, Cargo is responsible for *planning* the build (which involves largely
Rust-specific concerns), but the external build system is responsible for
*executing* it.

A typical pattern of usage is to have a whitelist of "root dependencies" from an
external registry which will be permitted as dependencies within the
organization, often pinning to a specific version and set of Cargo
features. This whitelist can be described as a single `Cargo.toml` file, which
can then drive Cargo's dependency resolution just once for the entire registry.
The resulting lockfile can be used to guide vendoring and construction of a
build plan for consumption by the external build system.
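To make this pattern concrete, here is a minimal sketch of what such a
whitelist manifest might look like (the crate names, versions, and feature
choices below are invented for illustration):

```toml
# Hypothetical organization-wide whitelist, expressed as a single Cargo.toml.
# Each entry pins an approved root dependency to an exact version.
[package]
name = "org-whitelist"
version = "0.1.0"

[dependencies]
serde = "=1.0.24"
libc = "=0.2.33"
# A pinned set of Cargo features can be part of the whitelist, too:
image = { version = "=0.17.0", default-features = false }
```

Resolving this manifest once yields a lockfile covering the entire approved
registry slice, which can then drive vendoring and the construction of build
rules on the external build system's side.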

One important concern is: how do you depend on code from other languages, which
is being managed by the external build system? That's a narrow version of a
more general question about native dependencies, which is addressed separately
in a later section.
#### Workflow and interop story

On the external build system side, a rule or plugin will need to be written that
knows how to invoke Cargo to produce a build plan corresponding to a whitelisted
(and potentially vendored) registry, then translate that build plan back into
appropriate rules for the build system. Thus, when doing normal builds, the
external build system drives the entire process, but invokes Cargo for guidance
during the planning stage.

### Using crates managed by the build system

Many organizations want to employ their own strategy for maintaining and
versioning code, and for resolving dependencies. In this case, they may wish to
entirely forgo producing a meaningful Cargo.toml for the code they write,
instead having one that just forwards to a plugin. The description of
dependencies is then written in the external build system's rule format. Here,
Cargo acts primarily as a *workflow and tool orchestrator*, since it is not
involved in either planning or executing the build.

> **Contributor:** nit: "employ a their own strategy"
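For illustration, such a "forwarding" manifest might look something like the
following sketch; the `[plugin]` table is a pure strawman, since the actual
plugin mechanism is left for experimentation:

```toml
# Hypothetical forwarding Cargo.toml: no real dependency information lives
# here -- everything is delegated to the external build system via a plugin.
# The `[plugin]` table is invented for illustration and is not a real Cargo
# feature.
[package]
name = "internal-crate"
version = "0.0.0"

[plugin]
name = "cargo-buck-bridge"  # invented name for an external-build-system plugin
```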
> **Reviewer:** An organization that chooses to have its own versioning and
> dependency system (such as the Facebooks and Googles of the world) is most
> likely not going to use Cargo at all. Instead they will have their own
> tooling that calls rustc directly.
>
> **Member Author:** A significant motivation for this RFC -- which was
> designed in coordination with FB and Google build engineers -- is precisely
> to allow them to manage the build process while still (1) getting access to
> crates.io and (2) integrating with Rust tooling. Indeed, the sentence you
> attached this to is specifically talking about how, in these cases, all that
> Cargo is doing is providing a common way for Rust tooling to get information
> about a Rust project, even if that information is just being provided by an
> external build system.
>
> **Reviewer:** @aturon while I know that that's what "workflow and tool
> orchestrator" means, I'm not sure a casual reader would. It would be useful
> to make this clearer -- specifically that Cargo in this situation will just
> be the API that RLS/rustfmt etc. use.

#### Workflow and interop story

Even though the external build system is entirely handling both dependency
resolution and build execution for the crates under its management, it may still
use Cargo for *lowering*, i.e. to produce the actual `rustc` invocations from a
higher-level configuration. Cargo will provide a way to do this.

When *developing* a crate, it should be possible to invoke Cargo commands as
usual. We do this via a plugin. When invoking, for example, `cargo build`, the
plugin will translate that to a request to the external build system, which will
in turn re-invoke Cargo to request a build plan (the exact mechanics here are
TBD), and then execute the build (possibly re-invoking Cargo for lowering). For
`cargo run`, the same steps are followed by putting the resulting build artifact
in an appropriate location, and then following Cargo's usual logic. And so on.

> **Reviewer:** Would it be possible to invoke the native build system directly
> instead of using `cargo build`?
>
> **Reviewer:** I'm not sure if this is too low-level for this RFC, but it
> might be worth talking a bit about where build artifacts go. With Buck, build
> artifacts always go into `buck-out` in the root of the monorepo. This is
> advantageous for a few reasons:
>
> - it avoids polluting random subdirectories
> - it saves on Watchman file-monitoring resources
>
> Would tools like the RLS know to read build artifacts from the `buck-out`
> directory instead of from `target/rls`?

A similar story plays out when using, for example, the RLS or rustfmt. Ideally,
these tools will have no idea that a Cargo plugin is in play; the information
and artifacts they need can be obtained by using Cargo in a standard way,
transparently -- but the underlying information will be coming from the external
build system, via the plugin. Thus the plugin for the external build system must
be able to translate its dependencies back into something equivalent to a
lockfile, at least.

While the details here are quite hazy, the overall point is that control swaps
back and forth between Cargo and the external build system, depending on the
concerns at play. We set things up so that the Rust-specific pieces (including
Cargo workflows) continue to be handled by Cargo whenever possible.
### The complete picture

In general, any integration with a homogeneous build system needs to be able to
handle (vendored) crate registries, because access to crates.io is a hard constraint.

Usually, you'll want to combine the handling of these external registries with
crates managed purely by the external build system, meaning that there are
effectively *two* modes of building crates at play overall. All that's needed to
do this is a distinction within the external build system between these two
kinds of dependencies, which then drives the plugin interactions accordingly.

## Cross-cutting concern: native dependencies

… in the first place.

Reliably building native dependencies in a cross-platform way
is... challenging. Today, Rust offers some help with this through crates like
[`gcc`] and [`pkgconfig`], which provide building blocks for writing build
scripts that discover or build native dependencies. But still, today, each build
script is a bespoke affair, customizing the use of these crates in arbitrary
ways. It's difficult, error-prone work.

[`gcc`]: https://docs.rs/gcc
[`pkgconfig`]: https://docs.rs/pkg-config

This RFC proposes to start a *long term* effort to provide a more first-class
way of specifying native dependencies. The hope is that we can get coverage of,
…

Needless to say, this approach will need significant experimentation. But if
successful, it would have benefits not just for build system integration, but
for using external dependencies *anywhere*.
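As a strawman, a first-class native dependency might eventually be declared
along these lines (every key shown is invented; the actual design is left to
the experimentation this RFC proposes):

```toml
# Strawman sketch of a declarative native dependency -- none of these keys
# exist in Cargo today.
[native-dependencies.openssl]
version = ">= 1.0.1"
probe = "pkg-config"        # discover a system copy where available
fallback = "vendored-build" # otherwise build from vendored sources
```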

### The story for externally-managed native dependencies

Finally, in the case where the external build system is the one specifying and
providing a native dependency, all we need is for that to result in the
appropriate flags to the lowered `rustc` invocations. If the external build
system is producing those lowered calls itself, it can completely manage this
concern. Otherwise, we will need the plugin interface to provide a way to
plumb this information through to Cargo.
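For example, the plugin might hand Cargo a description of an externally built
native library in some declared form (the shape below is invented), which
Cargo's lowering step could translate into `-L`/`-l` flags on the `rustc`
command line:

```toml
# Invented sketch: native-library information plumbed from the external
# build system through the plugin interface to Cargo's lowering step.
[native-libraries.zlib]
kind = "static"
lib-dir = "/build/out/zlib/lib"  # wherever the external build system put it
```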

## Specifics for the mixed build system case

Switching gears, let's look at mixed build systems. Here, we generally don't
… work. For example:
- **Profiles**. Putting the idea of the "build configuration" step on firmer
footing will require clarifying the precise role of profiles, which today blur
the line somewhat between *workflows* (e.g. `test` vs `bench`) and flags
(e.g. `--release`). Moreover, integration with a homogeneous build system
effectively requires that we can translate profiles on the Cargo side back and
forth to *something* meaningful to the external build system, so that for
example we can make `cargo test` invoke the external build system in a
…
possible to control enough about the `rustc` invocation for at least some
integration cases, and the answer may in part lie in improvements to profiles.

- **Build scripts**. Especially for homogeneous build systems, build scripts can
pose some serious pain, because in general they may depend on numerous
environmental factors invisibly. It may be useful to grow some ways of telling
Cargo the precise inputs and outputs of the build script, declaratively.
… follow-up RFCs after experimentation has concluded.
It's somewhat difficult to state drawbacks for such a high-level plan; they're
more likely to arise through the particulars.

That said, it's plausible that following the plan in this RFC will result
in greater overall complexity for Cargo. The key to managing this complexity
will be ensuring that it's surfaced only on an as-needed basis. That is, uses of
Cargo in the pure crates.io ecosystem should not become more complex -- if