Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move the core MIR datastructures to librustc. #29929

Merged
merged 2 commits into from
Nov 30, 2015

Conversation

michaelwoerister
Copy link
Member

This is done mostly so that we can refer to MIR types in csearch and other metadata related area.

Heads up, @rust-lang/compiler!

r? @nikomatsakis

@michaelwoerister michaelwoerister added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html A-compiler labels Nov 19, 2015
@arielb1
Copy link
Contributor

arielb1 commented Nov 19, 2015

@michaelwoerister

I would prefer not to move stuff to librustc because it is already too big. Maybe split csearch into a trait object and put it in another crate instead?

@michaelwoerister
Copy link
Member Author

@arielb1 csearch and most of the other metadata stuff could probably be split out of librustc. But anything that can be restored from metadata would still have to be in librustc, so that a csearch trait could refer to it.

@michaelwoerister
Copy link
Member Author

Another option would be not to have a centralized facility for loading things from metadata (i.e. csearch) but to separate that into their associated modules: codemap::metadata, hir::metadata, ty::metadata, mir::metadata and what have you. Not the worst option, in my opinion.

@nikomatsakis
Copy link
Contributor

@arielb1 the main point of the rustc crate is to house common type definitions, I think. It seems like the logical place for these to go to me.

@brson
Copy link
Contributor

brson commented Nov 19, 2015

I know I don't have much stake in this but I too find this surprising. I think we want code moving the other direction to dismantle the monolith.

(Of course I understand that metadata is a tangled nexus of interdependencies in rustc).

@arielb1
Copy link
Contributor

arielb1 commented Nov 19, 2015

@michaelwoerister

mir::tcx and mir::repr are small enough I am not worried about adding them. I would prefer to split metadata.

@michaelwoerister
Copy link
Member Author

To me this looks as if the whole metadata infrastructure (including automated serialization) is in need of a refactoring. However, I wouldn't want to rush this. It's something that would profit from putting some thought into and maybe producing an RFC that shows some alternatives and takes stock of all the requirements that we actually have here.

Regarding this PR: As I said above, putting all 'common' types into librustc is not the only option. A more staged approach that shields earlier compilation passes (typechecking, etc) from later ones (e.g. involving MIR) would be possible too. It would mean splitting metadata handling into a common core module that doesn't know anything specific and then having each of the successively later modules providing their specialized implementations for loading codemaps, type information, MIR, etc. If everything follows the same scheme, this could be a clean solution. (Just not something we'd want to do in an ad hoc manner).

@arielb1
Copy link
Contributor

arielb1 commented Nov 19, 2015

automated serialization? I am working on splitting metadata to a trait and a crate that implements it. We can do something like astencode for MIR (but hopefully less horrible).

@michaelwoerister
Copy link
Member Author

automated serialization?

Something in the vain of RustcEncodable but that allows to also serialize/deserialize things that need more context (e.g. ty::Ty<>) in order to get rid of as much boilerplate code as possible.

I am working on splitting metadata to a trait and a crate that implements it.

Sounds good :)

@arielb1
Copy link
Contributor

arielb1 commented Nov 20, 2015

@michaelwoerister

Our serialization includes a non-trivial amount of non-trivial context - the dumb "dump tcx tables" part is relatively small.

@nikomatsakis
Copy link
Contributor

On this specific scenario:

The goal was to get MIR serialization working quickly and painlessly. I think everyone agrees that maintaining astencode.rs (and to a lesser extent tyencode.rs) is an awful pain. In general, our strategy with crates has been that code that needs to shared across multiple crates is moved to librustc, and things that do not remain in their own individual crates, but it's certainly possible to imagine a more elaborate structure as well.

@arielb1 I'm not sure precisely what you are working on but it sounds interesting. Moving metadata to its own crate and having it communicate back via a trait object seems reasonable -- though in general I am wary of introducing too many layers of abstraction and traits, I think it's a sign of "over-crate-ification". How far away are you from having a PR ready for this?

My preference is to keep moving forward, basically. I think my preferred option is to land this PR so that @michaelwoerister can keep making progress and then move the MIR definitions back as part of the (proposed) PR that separates metadata out. This seems reasonable to me.

Some more general thoughts:

I see splitting into crates as a means to an end, not an end in itself. The original motivation for splitting up rustc was primarily reducing memory use. But I think the split is good so long as it helps to keep the code base more approachable. In that sense, having multiple crates helps to lay bare the structure of the codebase, which is nice.

But having lots of crates has costs, too. If you have too many, there stops to be a lot of benefit to the crates over modules in terms of offering a quick overview. They impose acyclicity requirements which, while sometimes helpful, can also be limiting -- e.g. the situation we are discussing right here, where metadata needs to intercommunicate with the ty types but the also other, later parts of the compiler. This can be resolved (as @arielb1 proposed) with a trait, and I actually think this will probably work pretty well in this situation, but it doesn't work very well in all situations, particularly if there are lots of types needed.

In general, the way I have understood crates and Rust is that the purpose of a crate is to be "the unit of code that gets shipped". In that sense, having multiple crates in the compiler is confusing, because it suggests that an end-user might "use" librustc_borrowck (or whatever) independently from the rest of the compiler, which we actively do not want to support, at least not today.

Honestly, if memory use were not a concern, and incremental compilation were working, I am not entirely sure whether I would want to continue having multiple crates. But memory use is a concern, and incremental compilation is not yet working, so obviously we should continue with multiple crates for now at least. But I still don't think we have to be dogmatic about it: we can promote shared types as needed.

@nikomatsakis
Copy link
Contributor

Sorry, not sure why I felt the need to write that gigantic essay. :)

Let me just post back a few notes on serialization. I think we should definitely not hand-write code like astencode.rs for MIR -- maintaining that code is such a pain. I want to be able to #[derive(SomethingOrOther)]. Using TLS is certainly the easiest way to get the required context, but my preference has always been to extend the derive syntax so that we can add more information (like specifying the "extra context" that needs to be passed down), but that is a larger bit of design that hasn't happened. It might be that we could use something like https://crates.io/crates/custom_derive though, which might be very nice indeed.

@michaelwoerister
Copy link
Member Author

I've rebased this and also moved the visitor module to librustc.

@arielb1
Copy link
Contributor

arielb1 commented Nov 24, 2015

@nikomatsakis

I have a PR almost ready.

@bors
Copy link
Contributor

bors commented Nov 25, 2015

☔ The latest upstream changes (presumably #30044) made this pull request unmergeable. Please resolve the merge conflicts.

@nikomatsakis
Copy link
Contributor

OK, I think we should close this in favor of #30043, despite my impassioned defense of monolithic crates. ;) @michaelwoerister do you agree?

@michaelwoerister
Copy link
Member Author

Well, splitting out metadata is mostly orthogonal to make MIR type definitions available in rustc, so I wouldn't close this PR, just rebase it, so I can later add MIR to the new CrateStore trait.

This is done mostly so that we can refer to MIR types in csearch and other metadata related area.
@nikomatsakis
Copy link
Contributor

So now that @arielb1 has landed his PR, I see two options:

  1. Initially I assumed we'd add a second metadata trait, probably in librustc_mir, that extends the original metadata trait with methods for loading MIR metadata. Since this trait object is not known to the tcx, it would have to be threaded to the later passes that need access to it. If we take this to the extreme, we basically wind up with a layered series of contexts -- but it's possible that we'll never need more than two layers.
  2. We continue to move the data structures to rustc, so that we don't need these layered contexts, and instead can take more a "promise-based" approach. But we move the code out to separate crates where possible.

I am actually not sure which option I prefer. I was basically arguing for option 2 earlier, but option 1 has its appeal, in that the dag structure is more clear. But it also introduces more layers and more indirection, which can make the overall structure harder to discern, particularly for newcomers. I think in the end there is no firm rule, it will depend a lot on the particulars. Luckily, we don't have to decide this permanently now, this is basically something we can refactor over time.

Discussing with @michaelwoerister on IRC, we were thinking it probably makes sense to land this PR after all and push forward with incr. comp. work. If we want to refactor to option 1, we can do that too. Note though that process on incr. comp. also affects this decision, since bootstrap time may be affected (though I think it'll be some time before we get a lot of progress there).

EDIT: to be clear, about incr. comp. I mean it'll be some time before it's effective at helping with rustc and the whole stage0/stage1/stage2 situation, though I've not thought about this super hard.

@nikomatsakis
Copy link
Contributor

@bors r+

@bors
Copy link
Contributor

bors commented Nov 30, 2015

📌 Commit f28a4e9 has been approved by nikomatsakis

@bors
Copy link
Contributor

bors commented Nov 30, 2015

⌛ Testing commit f28a4e9 with merge 8bc43ed...

bors added a commit that referenced this pull request Nov 30, 2015
…atsakis

This is done mostly so that we can refer to MIR types in csearch and other metadata related area.

Heads up, @rust-lang/compiler!

r? @nikomatsakis
@bors bors merged commit f28a4e9 into rust-lang:master Nov 30, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants