
Incremental compilation RFC #1298

Merged 7 commits into rust-lang:master on Nov 6, 2015

Conversation

nikomatsakis (Contributor) commented Sep 28, 2015

High-level strategy for incremental compilation.

cc @rust-lang/compiler

Rendered

larsbergstrom commented Sep 28, 2015

Awesome!

This is about incremental builds for a single crate, right? If so, it's worth calling that out.

Also, if I'm correct, these caches are not meant to be shared across build machines, right?

aturon (Member) commented Sep 28, 2015

@nikomatsakis The summary talks about debug builds specifically, but IIRC we discussed how this would apply to release builds as well? (I.e., a story a bit like parallel codegen units, where you'd be trading incrementality against optimization potential due to passing LLVM smaller units of code)

nikomatsakis (Contributor, Author) commented Sep 28, 2015

@larsbergstrom actually, I believe incremental builds across crates can be done relatively easily, though I didn't discuss it. I will add a TODO item to summarize how that would work.

@aturon yes I updated the summary, my mistake.

nikomatsakis (Contributor, Author) commented Sep 28, 2015

@larsbergstrom added a brief note about cross-crate dependencies

nikomatsakis (Contributor, Author) commented Sep 28, 2015

@larsbergstrom

Also, if I'm correct, these caches are not meant to be shared across build machines, right?

That is correct.

eefriedman (Contributor) commented Sep 28, 2015

How does member function name (x.foo()) lookup work in this scheme, particularly in the case of autoderef? Presumably a failed lookup has to create a dependency on something, but it's not clear what exactly that "something" is.

nikomatsakis (Contributor, Author) commented Sep 28, 2015

@eefriedman

How does member function name (x.foo()) lookup work in this scheme, particularly in the case of autoderef? Presumably a failed lookup has to create a dependency on something, but it's not clear what exactly that "something" is.

That "something" is the IR tables that indicate what traits are in scope at a given point, as well as those that collect all the impls for a trait (I did not add an exhaustive listing to the RFC). Those will presumably be linked up something like the following:

  • There will be an edge from the containing module/scope to the tables indicating what traits are in scope, such that if a new use statement is added, portions of those tables are invalidated.
  • Method search will add edges from the table of traits in scope to the fns that include method calls.
  • The coherence pass will add edges from each impl to the IR node representing the set of traits of that impl.
  • Trait search will add edges from the set of traits for a given impl to the fn using it.

That is roughly the idea. Make sense?
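As a rough illustration of the idea, here is a minimal sketch of such a dependency graph. The `DepNode` variants and string keys are invented for illustration; rustc's actual dep-graph uses richer node identifiers.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical node identifiers standing in for the IR tables above.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
enum DepNode {
    TraitsInScope(&'static str), // table of traits visible in a module
    ImplsOfTrait(&'static str),  // all impls collected for a trait
    TypeckFn(&'static str),      // type-checking result for a fn body
}

#[derive(Default)]
struct DepGraph {
    edges: HashMap<DepNode, HashSet<DepNode>>, // from -> readers
}

impl DepGraph {
    fn add_edge(&mut self, from: DepNode, to: DepNode) {
        self.edges.entry(from).or_default().insert(to);
    }

    // Everything transitively reachable from `dirty` must be recomputed.
    fn invalidated(&self, dirty: &DepNode) -> HashSet<DepNode> {
        let mut out = HashSet::new();
        let mut stack = vec![dirty.clone()];
        while let Some(n) = stack.pop() {
            if let Some(succ) = self.edges.get(&n) {
                for s in succ {
                    if out.insert(s.clone()) {
                        stack.push(s.clone());
                    }
                }
            }
        }
        out
    }
}

fn main() {
    let mut g = DepGraph::default();
    // Method search on `x.foo()` reads the set of traits in scope...
    g.add_edge(DepNode::TraitsInScope("mod_a"), DepNode::TypeckFn("caller"));
    // ...and the impls of each candidate trait.
    g.add_edge(DepNode::ImplsOfTrait("Foo"), DepNode::TypeckFn("caller"));
    // Adding a `use` statement dirties the traits-in-scope table, which
    // invalidates every fn whose method search consulted it.
    let dirty = g.invalidated(&DepNode::TraitsInScope("mod_a"));
    assert!(dirty.contains(&DepNode::TypeckFn("caller")));
}
```

Note that a *failed* lookup is covered too: the fn still read the traits-in-scope table, so the edge exists and the fn is re-checked when that table changes.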

eefriedman (Contributor) commented Sep 28, 2015

Yes, that makes sense; thanks.

strategies can be used to enable lazy or parallel compilation at later
times. (Eventually, though, it might be nice to restructure the
compiler so that it operates in more of a demand driven style, rather
than a series of sweeping passes.)

bstrie (Contributor) commented Sep 28, 2015

Mind elaborating on what "demand driven style" entails and how it differs from our current approach?

retep998 (Member) commented Sep 28, 2015

As an example of what I think is incremental compilation done right, see MSVC. Not only does it have an incremental compilation + linking mode that works fairly well, but it also has an incremental LTCG mode where it does full link time optimization, just incrementally.

```
impl Type { // Path: <root>::foo::<impl1>
    fn bar() {..} // Path: <root>::foo::<impl1>::bar
}
impl Type { } // Path: <root>::foo::<impl2>
```

shepmaster (Member) commented Sep 28, 2015

Since you don't indicate that every path has a unique integer, this seems to imply that you'd have to know if there are any duplicate children before you start naming, or have some amount of mutability to go back and "fix" the first child when you see the second child.

Is there a possibility to simply leave the first one as <impl> and then mark the second one as <impl2>?

in which they appear. This does mean that reordering impls may cause
spurious recompilations. We can try to mitigate this somewhat by making the
path entry for an impl include some sort of hash for its header or its contents,
but that will be something we can add later.

shepmaster (Member) commented Sep 28, 2015

I can't think of any concrete cases, but it seems unlikely that this would be the only spurious recompilation. If you know of others, it might be good to list those somewhere. Maybe even as a staged list (corresponding to your list above?), stating that when we cache object files, X and Y cause unneeded recompilation, but after that only X does.

The nodes fall into the following categories:

- **HIR nodes.** Represent some portion of the input HIR. For example,
the body of a fn as a HIR node (or, perhaps, HIR node). These are

shepmaster (Member) commented Sep 28, 2015

As a HIR node, or a HIR node?

As you can see, this graph indicates that if the signature of either
function changes, we will need to rebuild the MIR for `foo`. But there
is no path from the body of `bar` to the MIR for foo, so changes there
need not trigger a rebuild.

shepmaster (Member) commented Sep 28, 2015

I believe this is an understood limitation, but it may be worth pointing out again that this wouldn't allow bar to be inlined into foo, otherwise you'd end up with two versions of bar.

Indeed, this is mentioned in optimization below, so a link down there may be all that is needed.

Actually, this may have more nuance, as if the functions are in the same codegen unit, then there should be a link, I believe.

In terms of the dependency graph, we would create one IR node
representing the codegen unit. This would have the object code as an
associated artifact. We would also have edges from each component of
the codegen unit. As today. generic or inlined functions would not

shepmaster (Member) commented Sep 28, 2015

Misplaced . for ,.


# Alternatives

This design is an evolution from a prior RFC.

shepmaster (Member) commented Sep 28, 2015

Should that be linked here?

shepmaster (Member) commented Sep 28, 2015

As I understand it, a large benefit of incremental compilation is speed, but there's no mention of tests that attempt to quantify or ensure that the new world order will be faster. Is there anything more beyond `time cargo build`?

- Object files
- This represents the final result of running LLVM. It may be that
the best strategy is to "cache" compiled code in the form of an
rlib that is progessively patched, or it may be easier to store

frewsxcv (Member) commented Sep 29, 2015

"progessively" → "progressively"

currently indexes into the HIR of the appropriate crate, becomes an
index into the crate's list of paths.

For the most part, these paths match up with user's intutions. So a

frewsxcv (Member) commented Sep 29, 2015

"intutions" → "intuitions"

}
```

Note that the impls were arbitarily assigned indices based on the order

frewsxcv (Member) commented Sep 29, 2015

"arbitarily" → "arbitrarily"

In general, while I described the general case of a stack of procedure
nodes, it may be desirable to try and maintain the invariant that
there is only ever one procedure node on the stack at a
time. Otherwise, failing to push/pop a procdure at the right time

frewsxcv (Member) commented Sep 29, 2015

"procdure" → "procedure"

comex commented Sep 29, 2015

Not a very helpful comment, but: 👍👍👍👍👍

nikomatsakis (Contributor, Author) commented Sep 29, 2015

@bstrie

Mind elaborating on what "demand driven style" entails and how it differs from our current approach?

By demand-driven style, what I meant was that we would build a dependency graph that we use to drive compilation. So, for example, we would begin by saying "we need to trans the main function" (assuming an application), so let's try to do that. But to trans the main function, we must know that it is correct, so that would require us to borrow check main. But to borrow check main, we must know it is type correct, so we would first type check it. This in turn would require knowing what its names refer to, so we would run name resolution. During type-checking, we would collect the signature of each fn that gets called; each of those fns would then get explored as well, since we can't type-check main without knowing their signatures. Once we're done with main, we'd do the same procedure for every other fn in the crate. At the end, we'd have translated everything, but we'd have done so depth-first rather than breadth-first. Make sense?
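A toy sketch of this demand-driven style, with a hypothetical `Compiler` struct whose queries pull in their own prerequisites on demand (all names and structure invented for illustration, not rustc's actual pass pipeline):

```rust
use std::collections::HashMap;

// Each "query" memoizes its result and demands the queries it needs,
// instead of the compiler running whole-crate passes in sequence.
#[derive(Default)]
struct Compiler {
    resolved: HashMap<&'static str, bool>,
    typecked: HashMap<&'static str, bool>,
    translated: Vec<&'static str>,
}

impl Compiler {
    fn resolve(&mut self, f: &'static str) {
        // Memoized: name resolution runs at most once per fn.
        self.resolved.entry(f).or_insert(true);
    }

    fn typeck(&mut self, f: &'static str) {
        if self.typecked.contains_key(f) {
            return;
        }
        self.resolve(f); // type-checking demands name resolution first
        self.typecked.insert(f, true);
    }

    fn trans(&mut self, f: &'static str, callees: &[&'static str]) {
        self.typeck(f); // trans demands a type-correct body...
        for c in callees {
            self.typeck(c); // ...and signatures for everything it calls
        }
        self.translated.push(f);
    }
}

fn main() {
    let mut c = Compiler::default();
    // Start from `main` and let demand pull the rest in depth-first.
    c.trans("main", &["helper"]);
    c.trans("helper", &[]);
    assert_eq!(c.translated, vec!["main", "helper"]);
    // `helper` was already type-checked while translating `main`,
    // so the second call only had to run trans itself.
}
```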

nikomatsakis (Contributor, Author) commented Sep 29, 2015

@shepmaster

Since you don't indicate that every path has a unique integer, this seems to imply that you'd have to know if there are any duplicate children before you start naming, or have some amount of mutability to go back and "fix" the first child when you see the second child.

In the actual implementation, every path element also has a disambiguating integer. This begins as zero, but when we create a new def-id, we check if the parent already has a child with that name and, if so, increment the disambiguating integer as many times as we have to until we get a unique name. I can tweak the RFC to reflect the impl more precisely.
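A minimal sketch of that disambiguation scheme. The `DefPathTable` type and the `'{n}` suffix notation are invented for illustration; only the start-at-zero-and-bump behavior reflects the description above.

```rust
use std::collections::HashMap;

// Tracks how many children with a given name each parent already has,
// so the next child gets the next free disambiguating integer.
#[derive(Default)]
struct DefPathTable {
    taken: HashMap<(String, String), u32>, // (parent, name) -> next index
}

impl DefPathTable {
    fn create_child(&mut self, parent: &str, name: &str) -> String {
        let key = (parent.to_string(), name.to_string());
        let n = self.taken.entry(key).or_insert(0);
        let path = if *n == 0 {
            // Disambiguator zero is left implicit, so the first child
            // never needs to be renamed when a sibling appears later.
            format!("{parent}::{name}")
        } else {
            format!("{parent}::{name}'{n}")
        };
        *n += 1;
        path
    }
}

fn main() {
    let mut t = DefPathTable::default();
    assert_eq!(t.create_child("<root>::foo", "<impl>"), "<root>::foo::<impl>");
    // Second anonymous impl under the same parent gets disambiguator 1.
    assert_eq!(t.create_child("<root>::foo", "<impl>"), "<root>::foo::<impl>'1");
}
```

This answers the retroactive-renaming concern: because the disambiguator is assigned at creation time from a per-parent counter, earlier siblings are never revisited.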

nikomatsakis (Contributor, Author) commented Sep 29, 2015

@shepmaster

I believe this is an understood limitation, but it may be worth pointing out again that this wouldn't allow bar to be inlined into foo, otherwise you'd end up with two versions of bar.

I don't know what you mean here, actually. Do you mean that if we inlined, then the graph would be wrong? Because that is not the case: this graph refers to the front end's view of things, which is before inlining etc will take place. When we actually do codegen, if foo and bar are placed into the same codegen unit, then yes LLVM may choose to do inlining (and that would be reflected in the dependency graph). I don't think I have a good example graph showing how that would work, but it's described textually in the section on optimization.

nikomatsakis (Contributor, Author) commented Oct 16, 2015

Hear ye, hear ye. This RFC is now entering final comment period.

michaelwoerister commented Oct 21, 2015

I think this RFC is good to go. Conceptually it seems sound to me and it contains enough of a concrete outline to start implementing.

Ericson2314 (Contributor) commented Oct 21, 2015

I get that as a mere rust user that has never contributed to rustc, I'm basically pontificating on these design decisions that don't affect any public interface. But might somebody comment on whether the alternative of building the dependency graph explicitly and then processing it (lazily or otherwise) as I wrote earlier was considered?

nikomatsakis (Contributor, Author) commented Oct 21, 2015

But might somebody comment on whether the alternative of building the
dependency graph explicitly and then processing it (lazily or otherwise) as
I wrote earlier was considered?

Sorry, I meant to reply to your comment earlier. I did consider that design
and I suspect that ultimately we will actually do a bit of both ---
however, I very much want to prevent the dependency graph and the code from
falling out of sync. We have definitely had bad experience in this respect.
Simply building a graph a priori can very easily fall into this trap. If we
do build up a graph up-front, I want to try and refactor the code such that
requesting data where there is no graph edge fails (perhaps by asserting
that the graph edge exists, or by restructuring the API in some way that
it's not even possible).
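One way such a "fail on undeclared access" check could look, as a hypothetical sketch (the `TrackedMap` type and the string-keyed nodes are invented; this is not rustc's API):

```rust
use std::collections::{HashMap, HashSet};

// Data lookups go through a map that asserts the a-priori graph already
// contains the edge being exercised, so the declared graph and the
// code's real reads cannot silently diverge.
struct TrackedMap<V> {
    data: HashMap<&'static str, V>,
    declared_edges: HashSet<(&'static str, &'static str)>, // (data, reader)
}

impl<V> TrackedMap<V> {
    fn get(&self, key: &'static str, reader: &'static str) -> &V {
        assert!(
            self.declared_edges.contains(&(key, reader)),
            "no declared dependency edge {key} -> {reader}"
        );
        &self.data[key]
    }
}

fn main() {
    let mut data = HashMap::new();
    data.insert("SIG(bar)", "fn bar() -> i32");
    let mut declared_edges = HashSet::new();
    declared_edges.insert(("SIG(bar)", "TYPECK(foo)"));
    let sigs = TrackedMap { data, declared_edges };

    // Declared read: allowed.
    assert_eq!(*sigs.get("SIG(bar)", "TYPECK(foo)"), "fn bar() -> i32");
    // An undeclared read such as sigs.get("SIG(bar)", "TYPECK(baz)")
    // would panic, surfacing the missing edge during development.
}
```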


Ericson2314 (Contributor) commented Oct 22, 2015

@nikomatsakis Thank you, that is very reassuring. I absolutely agree on the soundness issue; in fact I'd say without refactoring to make sure the graph traversals are correct by construction, there's hardly any point in taking my route. Sounds like your view is the implicit dependency route is a good way to accurately catch all dependencies without forcing the big refactor, but explicit dependencies is a decent end goal?

nikomatsakis (Contributor, Author) commented Oct 22, 2015

I think there will always be some of both. Some dependencies at least
cannot be constructed "up front" but rather must be discovered -- for
example, we have to do method resolution and type-checking to know what
other fns are referenced and hence which dependencies exist.


Ericson2314 (Contributor) commented Oct 22, 2015

Ah. I envisioned stuff like that working by the traversal of one graph creating another.

Ericson2314 (Contributor) commented Oct 22, 2015

To clarify. Suppose we have something like token tree -(macros)-> collection of items -(type-checking and method resolution...)-> collection of MIR -(llvm)-> collection of bitcode.

To really do laziness right with this, not only would the graphs be traversed lazily, but also created lazily. The MIR for each function would be bundled with a thunk to generate the MIR for all referenced functions.

[For any Nix users out there (cough @eddyb cough) this is related to doing things like import (import ./foo.nix).]
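A small sketch of one graph's traversal lazily creating the next: each lowered function carries thunks that, when forced, produce the lowering of the functions it references. The `Mir` type and `lower` function are invented for illustration.

```rust
// Deferred work: each lowered fn bundles closures that compute the
// lowering of its callees on demand, so the MIR "graph" materializes
// only as far as something actually asks for it.
struct Mir {
    name: &'static str,
    callees: Vec<Box<dyn FnOnce() -> Mir>>,
}

fn lower(name: &'static str, refs: &'static [&'static str]) -> Mir {
    Mir {
        name,
        callees: refs
            .iter()
            .map(|&r| Box::new(move || lower(r, &[])) as Box<dyn FnOnce() -> Mir>)
            .collect(),
    }
}

fn main() {
    let main_mir = lower("main", &["helper"]);
    assert_eq!(main_mir.name, "main");
    // Nothing for `helper` has been computed yet; force it on demand.
    let mut forced: Vec<Mir> = main_mir.callees.into_iter().map(|t| t()).collect();
    assert_eq!(forced.pop().unwrap().name, "helper");
}
```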

Ericson2314 (Contributor) commented Oct 22, 2015

Finally, I mentioned earlier I'd love to write some generic library to persist/cache all that. To make that a bit more concrete, I was thinking of something like https://github.com/dmbarbour/haskell-vcache or https://github.com/mirage/irmin along with some infrastructure to serialize thunks.

bkoropoff commented Oct 23, 2015

This looks great to me. The greatest challenge is going to be building a dependency graph that is as precise as possible (to get maximum benefit) without introducing unsoundness. I don't see any silver bullets here; just "be really careful" and "test a lot".

There may be an interesting class of source code changes affecting lifetime or variance inference where typechecking artifacts are invalidated, but it is theoretically possible to avoid invalidating trans artifacts since lifetimes are erased by then. I haven't thought of any concrete examples that would be worth exploiting, however.

michaelwoerister commented Oct 23, 2015

One thing that is not mentioned in the RFC at all yet is monomorphization and the consequences it has.
The general case of a dependency graph with generic items will look more like the following:

```
BODY(foo) ----------------------------> TYPECK(foo) ----------------> MIR(foo)
                                          ^ ^ ^ ^                      |
SIG(foo) ----> COLLECT(foo)               | | | |         +------------+------------+
                 |                        | | | |         |            |            |
                 +--> ITEM_TYPE(foo) -----+ | | |         v            v            v
                 +--> PREDICATES(foo) ------+ | |      LLVM(foo'1)  LLVM(foo'2)  LLVM(foo'3)
                                              | |         |            |            |
SIG(bar) ----> COLLECT(bar)                   | |         v            v            v
                 |                            | |    OBJECT(foo'1) OBJECT(foo'2) OBJECT(foo'3)
                 +--> ITEM_TYPE(bar) ---------+ |
                 +--> PREDICATES(bar) ----------+
```

One complication I can see here is that we can only know after type-checking which monomorphizations are still used, but the proposed algorithm already wants to garbage-collect the on-disk cache right after building the HIR. This has to be accounted for somehow.
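A hedged sketch of what a monomorphization cache keyed by concrete type arguments might look like, with all of a fn's monomorphizations invalidated together when the fn itself changes (all names invented; the string "substs" stand in for real type substitutions):

```rust
use std::collections::HashMap;

// Cached object code for generic fns, keyed by (fn, type arguments).
#[derive(Default)]
struct MonoCache {
    objects: HashMap<(&'static str, &'static str), Vec<u8>>,
}

impl MonoCache {
    fn get_or_codegen(&mut self, f: &'static str, substs: &'static str) -> &Vec<u8> {
        self.objects.entry((f, substs)).or_insert_with(|| {
            // Stand-in for running LLVM on the monomorphized MIR.
            format!("OBJECT({f}<{substs}>)").into_bytes()
        })
    }

    // When `f` changes, every monomorphization of it is invalidated,
    // even ones that may no longer be needed by any caller.
    fn invalidate_fn(&mut self, f: &'static str) {
        self.objects.retain(|(name, _), _| *name != f);
    }
}

fn main() {
    let mut cache = MonoCache::default();
    cache.get_or_codegen("foo", "u32");
    cache.get_or_codegen("foo", "String");
    cache.get_or_codegen("bar", "u32");
    cache.invalidate_fn("foo");
    assert_eq!(cache.objects.len(), 1); // only bar<u32> survives
}
```

Note this retains-until-the-fn-changes policy can keep monomorphizations that no caller still needs; a later GC pass over the on-disk cache would have to account for that, as the comment above points out.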

nikomatsakis (Contributor, Author) commented Oct 23, 2015

True. We don't actually know what monomorphizations we want until trans.
Type-checking doesn't expand things out. I was thinking about
monomorphizations at some point, but I don't remember just what I had in
mind. Regarding GCing of monomorphizations, I think I was originally
thinking that we would just keep all monomorphizations of foo until foo
changed. This does mean though that we might keep some monomorphizations we
no longer need (because they were only being used by bar, and bar
changed). It's also true that the "cache" on disk would have to include the
types in the key, something that the RFC doesn't really discuss explicitly.

I've also been thinking about what it would take to do an early target that
JUST saves LLVM IR and object code. This will require doing a few things
slightly differently, but seems like a good first "spike goal":

  1. We would always recompute the signatures for all items, whether they've
    changed or not. This is because
  2. As you point out, we'll have to type-check the bodies for generic fns
    that are potentially called, as we may need new monomorphizations thereof.
    Probably the easiest way to start would be type-checking all bodies too, or
    at least all generic bodies. I think the easiest way to address this would
    be by saving and re-loading the MIR, which once it is in use ought not to
    be that hard.


arielb1 (Contributor) commented Oct 23, 2015

We already save the type-checked body of monomorphizable fns.

nikomatsakis (Contributor, Author) commented Oct 23, 2015

@arielb1

We already save the body of monomorphizable fns.

Yes, but what we are mostly talking about is preserving the monomorphized
LLVM bitcode.

Well, I guess I was saying that for a first draft, it might not be worth
trying to reuse the type-checked body at first. This is because currently
we save the body as part of the metadata in the final end-product, and it
would be work (however little) to save that data somewhere else. Clearly
eventually we want to. I'm mostly just trying to work out what is the
smallest thing we can get working to start.


arielb1 (Contributor) commented Oct 23, 2015

@nikomatsakis

Maybe convert all translation to use inlining and save the serialized data (we would also need to have some way of stably comparing it for this to work). Using serialized MIR instead of serialized AST may make this easier, but I feel like the issues are orthogonal.

nikomatsakis (Contributor, Author) commented Oct 24, 2015

@arielb1 I'm not clear on what problem you are proposing to solve here? (I
don't even see that there is a problem that needs solving)


michaelwoerister commented Oct 24, 2015

  1. We would always recompute the signatures for all items, whether they've changed or not. This is because
  2. As you point out, we'll have to type-check the bodies for generic fns that are potentially called, as we may need new monomorphizations thereof. Probably the easiest way to start would be type-checking all bodies too, or at least all generic bodies. I think the easiest way to address this would be by saving and re-loading the MIR, which once it is in use ought not to be that hard.

Isn't it proposed anyway that the complete set of items is hashed on every compilation?
I think it should not be a problem to just cache object code for starters. Only the dependency graph must be complete and not produce false negatives.

## Basic usage

The basic usage will be that one enables incremental compilation using
a compiler flag like `-C incremental-compilation=TMPDIR`. The `TMPDIR`

bstrie (Contributor) commented Nov 3, 2015

Do you expect that Cargo will pass this flag by default for all projects?

directory is intended to be an empty directory that the compiler can
use to store intermediate by-products; the compiler will automatically
"GC" this directory, deleting older files that are no longer relevant
and creating new ones.

bstrie (Contributor) commented Nov 3, 2015

When does this GC happen? Will it be sort of like Git where GCs can potentially happen whenever you type any command?

michaelwoerister commented Nov 3, 2015

I would expect it to run on every compiler run using the given directory.


Regardless of whether it is invoked in incremental compilation mode or
not, the compiler will always parse and macro expand the entire crate,
resulting in a HIR tree. Once we have a complete HIR tree, and if we

bstrie (Contributor) commented Nov 3, 2015

Experience suggests that important contributors to compilation time are syntax extensions like regex! and Serde's annotations, which will seemingly be left out in the cold here. Do you have any ideas for incrementalizing/caching things in this area?

eddyb (Member) commented Nov 3, 2015

IME most of the time spent in those cases is on later stages, processing the large amounts of code generated by the syntax extension.

bstrie (Contributor) commented Nov 3, 2015

Ah, I should have profiled instead of assuming that the slowdown was in the expansion phase itself. :P

When we come to the final LLVM stages, we must
[separate the functions into distinct "codegen units"](#optimization)
for the purpose of LLVM code generation. This will build on the
existing "codegen-units" used for parallel code generation. LLVM may

bstrie (Contributor) commented Nov 3, 2015

Does it merely build on the existing codegen-units, or does it replace it entirely? Would seem a little odd to have both exposed.

michaelwoerister commented Nov 3, 2015

I think it doesn't make much sense to allow the user to specify the number of codegen-units when compiling incrementally. The compiler needs more control over what ends up where and, at least in the beginning, I would expect the -Ccodegen-units option to be ignored (with a note to the user) for incremental builds.

nikomatsakis (Contributor, Author) commented Nov 6, 2015

Huzzah! The compiler team has decided to accept this RFC. The expectation is that the actual impl will discover numerous surprises (we've already found a few) that require adjustments, and that we will come back and update the RFC to be more in line with the final design when that has shaken out a bit.

@nikomatsakis nikomatsakis merged commit 59b01f1 into rust-lang:master Nov 6, 2015

matthewhammer commented Dec 6, 2015

There's lots of interesting talk about incremental computation in this thread, which is great!

In case anyone was wondering about PL research literature on this topic, these researchers have also been thinking about incremental, demand-driven compilation / computation:

The first paper is more recent, and specialized to a situation similar to the one described in the discussion above (incremental compilation, using demand-driven, dynamic dependency graphs). The second paper gives a general approach for such incremental, demand-driven computations. There is follow-on work on adapton.org.

White-Oak commented Dec 30, 2015

Any update on the state of implementation?

jonas-schievink (Member) commented Dec 30, 2015

@White-Oak Creation of a dependency graph is being done in rust-lang/rust#30532
