Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

construct MIR for all crates up to and including rustdoc #27893

Merged
merged 12 commits into from Sep 6, 2015

Conversation

Projects
None yet
@nikomatsakis
Copy link
Contributor

nikomatsakis commented Aug 18, 2015

This PR contains a new crate, rustc_mir, which implements the MIR as specified in the RFC (more or less). There are no targeted unit tests at the moment, as I didn't decide what kind of infrastructure would be best and didn't take the time to implement it.

NB: In packaging up this PR, I realized that MIR construction code is not triggering for methods right now, I think it's only for fixed fns. I'll push a fix for this soon. Hopefully it doesn't stop any crates from building. :) Fixed. Everything still seems to work.

However, the MIR construction code (librustc_mir/build) is intentionally quite distinct from the code which munges the compiler's data structures (librustc_mir/tcx). The interface between the two is the HIR trait (librustc_mir/hir). To avoid confusion with @nrc's work, perhaps a better name for this trait is warranted, although ultimately this trait will be connected to the HIR, I imagine, so in a way the name is perfect. Anyway, I'm open to suggestions. The initial motivation for this split was to allow for the MIR construction code to be unit-tested. But while I didn't end up writing unit tests (yet), I did find the split made the code immensely easier to think about, since the messiness of our existing system, with its myriad hashtables, punning, and so forth, is confined to one part, which simply transforms to a more fully explicit AST-like form. I tried to separate out the commits somewhat, but since this mostly new code, it mostly winds up coming in one fell swoop in the MIR commit.

Quick guide to the MIR crate:

  • repr.rs defines the MIR itself; each MIR instance is parameterized by some HIR H
  • build/ is the MIR construction code, parameterized by a particular HIR
  • hir/ is the definition of the HIR interface
  • tcx/ is the impl of the HIR interface for the tcx
  • dump.rs is the minimal compiler pass that invokes the HIR

One open question:

  • In the HIR trait, I used exclusively struct-like variants. I found I like this more, since things have names. Should I convert the repr code?
@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Aug 18, 2015

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented Aug 18, 2015

r? @huonw

(rust_highfive has picked a reviewer for you, use r? to override)

@nikomatsakis nikomatsakis force-pushed the nikomatsakis:mir branch from b32e41a to e2ba889 Aug 18, 2015

@steveklabnik steveklabnik referenced this pull request Aug 18, 2015

Closed

Tracking issue for MIR (RFC #1211) #27840

8 of 16 tasks complete
@@ -479,6 +494,16 @@ pub trait Labeller<'a,N,E> {
}
}

/// Escape tags in such a way that it is suitable for inclusion in a
/// Graphviz HTML label.
pub fn escape_html(s: &str) -> String {

This comment has been minimized.

@steveklabnik

steveklabnik Aug 18, 2015

Member

... this isn't a great escape function, but I guess it's not really untrusted input.

This comment has been minimized.

@nikomatsakis

nikomatsakis Aug 19, 2015

Author Contributor

heh, yes. :)

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Aug 18, 2015

@nikomatsakis

The HIR in this patch is fairly distant from @nrc's HIR: his HIR is very close to the (expanded) syntax, and intended to separate type-checking from the details of expansion/resolution, while the HIR in this patch is more like an elaborated, desugared version of Rust.

Vec { fields: Vec<ExprRef<H>> },
Tuple { fields: Vec<ExprRef<H>> },
Adt { adt_def: H::AdtDef,
variant_index: usize,

This comment has been minimized.

@arielb1

arielb1 Aug 18, 2015

Contributor

why don't you store the variant itself?

This comment has been minimized.

@nikomatsakis

nikomatsakis Aug 19, 2015

Author Contributor

@arielb1

why don't you store the variant itself?

Because it was inconvenient to get from the variant to the containing enum and recompute its index, I think. Maybe I should just add those two bits of information to the VariantDef though.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Aug 19, 2015

@arielb1

The HIR in this patch is fairly distant from @nrc's HIR: his HIR is very close to the (expanded) syntax, and intended to separate type-checking from the details of expansion/resolution, while the HIR in this patch is more like an elaborated, desugared version of Rust.

Yes, sure. It may be that we evolve @nrc's HIR downwards (I hope we do), but in any case I agree they are not identical, though in my head they play pretty much the same role: a relatively high-level, AST-oriented view on Rust. Anyway I didn't see a suggestion for a better name in your comment. :P

@glaebhoerl

This comment has been minimized.

Copy link
Contributor

glaebhoerl commented Aug 19, 2015

MHIR or HMIR? :p

EDIT: Perhaps AIR, for Abstract Intermediate Representation. Or HAIR.

@Diggsey

This comment has been minimized.

Copy link
Contributor

Diggsey commented Aug 19, 2015

EIR: Elevated Intermediate Representation...

@nrc

This comment has been minimized.

Copy link
Member

nrc commented Aug 19, 2015

@glaebhoerl Her Majesty's Intermediate Representation? /British

pub fn escape_html(s: &str) -> String {
s
.replace("\"", "&quot;")
.replace("&", "&amp;")

This comment has been minimized.

@Ms2ger

Ms2ger Aug 19, 2015

Contributor

Always replace & first, to avoid &amp;quot;.

This comment has been minimized.

@nikomatsakis

nikomatsakis Aug 19, 2015

Author Contributor

@Ms2ger

Always replace & first, to avoid &quot;.

heh, yes.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Aug 19, 2015

@glaebhoerl hmm, I do like "HAIR" :)

}

/// Compile `expr`, yielding a compile-time constant. Assumes that
/// `expr` is a valid compile-time constant!

This comment has been minimized.

@oli-obk

oli-obk Aug 20, 2015

Contributor

is the idea that after check_const runs, everything marked as const will get turned into a const-rvalue?

I'd have thought the const-evaluator would be like an llvm-optimization-pass, simply running on the MIR without much more information (global consts + extern consts + const-fn are needed of course).

This comment has been minimized.

@nikomatsakis

nikomatsakis Aug 21, 2015

Author Contributor

@oli-obk I started writing a response and found I had more to say than seemed appropriate for a GitHub comment. See this thread: https://internals.rust-lang.org/t/how-to-handle-constants-in-the-compiler/2543

@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Aug 21, 2015

r? @pnkfelix

after discussing with @huonw, decided to change official reviewer to pnkfelix, but I figure everyone will kibbitz anyhow.

@rust-highfive rust-highfive assigned pnkfelix and unassigned huonw Aug 21, 2015

Use(Lvalue<H>),

// [x; 32]
Repeat(Lvalue<H>, Lvalue<H>),

This comment has been minimized.

@oli-obk

oli-obk Aug 21, 2015

Contributor

so the second value should be a Constant<H>?

@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Aug 21, 2015

@arielb1

Also, no, it's not only primitive operators, but all operators except for explicit lvalue projections (i.e. addr-of, deref, field access, match, the LHS of array indexing) have their arguments copied to temporaries (up to optimizations) rather than accessed directly.

Hmm, yes, I see, I see. Of course I copied this approach from how trans does things, but I agree it's not right. I'm modifying the MIR construction now, but it seems to push me more in favor of introducing an Operand category, so that "by-value" Rvalue things take Operand rather than Lvalue. This has a few advantages: we can inline constants more easily (since Operand can include constants) and also we force ourselves to through as_operand (or something similar) where appropriate, which will ensure we don't accidentally wind up referencing some data in place that was (logically) moved. And then, as you say, we can later strip out intermediates that are not needed to achieve better re-use. I'll do some experiments and see how this works out.

@nikomatsakis nikomatsakis force-pushed the nikomatsakis:mir branch 2 times, most recently from fe02dd8 to 107f822 Aug 24, 2015

@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Aug 24, 2015

OK, I just did a major rebase that folded in all of the WIP comments. In particular I introduced a new concept into the MIR, Operand, that corresponds (at construction time) to a value. It can be either an lvalue (as before) or a constant. This means that when you have an expression like x + y that references lvalues, you'll have temporaries introduced:

tmp0 = x
tmp1 = y
result = tmp0 + tmp1

It also means we do NOT introduce temporaries for constants in expressions like x + 1, those can now be directly folded in.

It seems like these properties are worthy of unit testing, so I may turn some attention to how we can build up a unit test framework. That said, introducing the notion of Operand makes it clear in the type system where something is being used "by value" versus "by reference" (what we have sometimes called an "lvalue context"). This means that the construction code is, at least, known to invoke as_operand for each "by value" use, so so long as we can convince ourselves that it introduces a temporary for lvalues, we're all set.

Also, it should be no problem to purge unnecessary temporaries later, once we have ascertained for sure that there is no mutation to the source of the temporary. As @arielb1 said, I'd rather do that in a separate, dedicated pass (this can also cover the optimizations around pattern matching we now do in trans, I imagine.)

This may be worth updating the RFC for as well, since it marks the first departure, but I'll hold off on that. (And the RFC was clearly intended to evolve anyhow.)

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Aug 25, 2015

☔️ The latest upstream changes (presumably #27943) made this pull request unmergeable. Please resolve the merge conflicts.


#[derive(Clone, PartialEq)]
pub enum Operand<H:HIR> {
Lvalue(Lvalue<H>),

This comment has been minimized.

@eddyb

eddyb Aug 25, 2015

Member

I'd almost want to name this variant Move or Consume.

This comment has been minimized.

@nikomatsakis

nikomatsakis Aug 31, 2015

Author Contributor

@eddyb

I'd almost want to name this variant Move or Consume.

Seems good.

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Aug 25, 2015

@nikomatsakis

this can also cover the optimizations around pattern matching we now do in trans, I imagine.

Of course this won't do the decision-tree optimizations, but the temporary-elimination optimizations we do in trans are basically the reverse of RVO.

We translate code like

    match foo {
        Some(foo_inner) => return Some(f(foo_inner));
        None => return None
    }

into MIR (sugared-assmebly-format, rather than graph-format, temporaries are %name, ordinary variables var)

    %discr: Option::Discr = foo.@discr ; match takes an lvalue
    switch %discr, None=>.none, Some=>.some
.some:
    foo_inner = foo.inner ; this isn't even a temporary - borrowck ensures no trouble
.some_body:
    %f_arg : T = foo_inner
    %f_ret : U = call f, %f_arg, onerr <snip> 
    %ret_val_2 : Option<U> = Some(%f_ret)
    *%ret = %ret_val_2
    jmp exit
.none:
.none_body:
    %ret_val : Option<U> = None
    *%ret = %ret_val
exit:
    ret

Standard RVO would replaces temporaries that are written somewhere with the place they are written to:

    %discr: Option::Discr = foo.@discr ; match takes an lvalue
    switch %discr, None=>.none, Some=>.some
.some:
    foo_inner = foo.inner
.some_body:
    %f_arg : Option<T> = foo_inner
    ; %ret_val_2.as<Some>.0 = %f_ret
    ; (*%ret) = %ret_val_2
    (*%ret).as<Some>.0 = call f, %f_arg, onerr <snip> 
    *%ret = Some((*%ret).as<Some>.0) ; lowering ignores the self-assignment
    jmp exit
.none:
.none_body:
    *%ret = None
exit:
    ret

Match optimization substitutes temporaries with their values

    %discr: Option::Discr = foo.@discr ; match takes an lvalue
    switch %discr, None=>.none, Some=>.some
.some:
.some_body:
    ; foo_inner = foo.inner - not really a temporary, but good enough
    ; %f_arg = foo_inner
    (*%ret).as<Some>.0 = call f, foo.inner, onerr <snip> 
    *%ret = Some((*%ret).as<Some>.0) ; lowering ignores the self-assignment
    jmp exit
.none:
.none_body:
    *%ret = None
exit:
    ret

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Aug 26, 2015

Maybe DVIR (desugared value-level IR) for the "HIR"?

@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Aug 31, 2015

@arielb1

Of course this won't do the decision-tree optimizations, but the temporary-elimination optimizations...

Yes, this is what I meant.

@nikomatsakis nikomatsakis force-pushed the nikomatsakis:mir branch 2 times, most recently from 2cee71c to d44473b Sep 1, 2015

@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Sep 5, 2015

I rebased and implement support for slice patterns, using the strategy described in https://internals.rust-lang.org/t/slice-patterns-in-the-land-of-mir/2525/. This means that rustdoc builds and (hopefully) that Windows will build. I'm not sure whether this merits a re-r? -- I would say yes but that this code is somewhat preliminary anyway. It required some mild restructuring of the pattern matching because now simplification and other steps can produce temporaries (storing the slice).

@nikomatsakis nikomatsakis force-pushed the nikomatsakis:mir branch from af818c0 to 2f00086 Sep 6, 2015

@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Sep 6, 2015

@bors try

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Sep 6, 2015

⌛️ Trying commit 2f00086 with merge ea3892f...

bors added a commit that referenced this pull request Sep 6, 2015

Auto merge of #27893 - nikomatsakis:mir, r=<try>
This PR contains a new crate, `rustc_mir`, which implements the MIR as specified in the RFC (more or less). There are no targeted unit tests at the moment, as I didn't decide what kind of infrastructure would be best and didn't take the time to implement it. 

~~NB: In packaging up this PR, I realized that MIR construction code is not triggering for methods right now, I think it's only for fixed fns. I'll push a fix for this soon. Hopefully it doesn't stop any crates from building. :)~~ Fixed. Everything still seems to work.

However, the MIR construction code (`librustc_mir/build`) is intentionally quite distinct from the code which munges the compiler's data structures (`librustc_mir/tcx`). The interface between the two is the `HIR` trait (`librustc_mir/hir`). To avoid confusion with @nrc's work, perhaps a better name for this trait is warranted, although ultimately this trait *will* be connected to the HIR, I imagine, so in a way the name is perfect. Anyway, I'm open to suggestions. The initial motivation for this split was to allow for the MIR construction code to be unit-tested. But while I didn't end up writing unit tests (yet), I did find the split made the code immensely easier to think about, since the messiness of our existing system, with its myriad hashtables, punning, and so forth, is confined to one part, which simply transforms to a more fully explicit AST-like form. I tried to separate out the commits somewhat, but since this mostly new code, it mostly winds up coming in one fell swoop in the MIR commit.

Quick guide to the MIR crate:

- `repr.rs` defines the MIR itself; each MIR instance is parameterized by some HIR `H`
- `build/` is the MIR construction code, parameterized by a particular HIR
- `hir/` is the definition of the HIR interface
- `tcx/` is the impl of the HIR interface for the tcx
- `dump.rs` is the minimal compiler pass that invokes the HIR

One open question:

- In the HIR trait, I used exclusively struct-like variants. I found I like this more, since things have names. Should I convert the repr code?
@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Sep 6, 2015

@bors r=nrc

(@nrc gave thumbs up over IRC)

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Sep 6, 2015

📌 Commit 2f00086 has been approved by nrc

@nikomatsakis nikomatsakis force-pushed the nikomatsakis:mir branch from 2f00086 to c8a6618 Sep 6, 2015

@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Sep 6, 2015

@bors r=nrc

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Sep 6, 2015

📌 Commit c8a6618 has been approved by nrc

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Sep 6, 2015

⌛️ Testing commit c8a6618 with merge 3dd1a48...

bors added a commit that referenced this pull request Sep 6, 2015

Auto merge of #27893 - nikomatsakis:mir, r=nrc
This PR contains a new crate, `rustc_mir`, which implements the MIR as specified in the RFC (more or less). There are no targeted unit tests at the moment, as I didn't decide what kind of infrastructure would be best and didn't take the time to implement it. 

~~NB: In packaging up this PR, I realized that MIR construction code is not triggering for methods right now, I think it's only for fixed fns. I'll push a fix for this soon. Hopefully it doesn't stop any crates from building. :)~~ Fixed. Everything still seems to work.

However, the MIR construction code (`librustc_mir/build`) is intentionally quite distinct from the code which munges the compiler's data structures (`librustc_mir/tcx`). The interface between the two is the `HIR` trait (`librustc_mir/hir`). To avoid confusion with @nrc's work, perhaps a better name for this trait is warranted, although ultimately this trait *will* be connected to the HIR, I imagine, so in a way the name is perfect. Anyway, I'm open to suggestions. The initial motivation for this split was to allow for the MIR construction code to be unit-tested. But while I didn't end up writing unit tests (yet), I did find the split made the code immensely easier to think about, since the messiness of our existing system, with its myriad hashtables, punning, and so forth, is confined to one part, which simply transforms to a more fully explicit AST-like form. I tried to separate out the commits somewhat, but since this mostly new code, it mostly winds up coming in one fell swoop in the MIR commit.

Quick guide to the MIR crate:

- `repr.rs` defines the MIR itself; each MIR instance is parameterized by some HIR `H`
- `build/` is the MIR construction code, parameterized by a particular HIR
- `hir/` is the definition of the HIR interface
- `tcx/` is the impl of the HIR interface for the tcx
- `dump.rs` is the minimal compiler pass that invokes the HIR

One open question:

- In the HIR trait, I used exclusively struct-like variants. I found I like this more, since things have names. Should I convert the repr code?

@bors bors merged commit c8a6618 into rust-lang:master Sep 6, 2015

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details

@nikomatsakis nikomatsakis deleted the nikomatsakis:mir branch Mar 30, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.