[MIR] Initial implementation of inlining #36593
rust-highfive
assigned
arielb1
Sep 20, 2016
r? @arielb1 (rust_highfive has picked a reviewer for you, use r? to override)
/cc @pcwalton
I'm currently changing the spans in the inlined MIR to the span of the callsite, which isn't ideal. However, keeping the spans from the original function triggers a multibyte-related assertion. @michaelwoerister any ideas about what might cause that?
Aatch
referenced this pull request
Sep 20, 2016
Open
Reachable symbols should be determined by the MIR #36594
Where can I read about the motivation for this? Why does MIR inlining make sense in the presence of LLVM inlining?
@petrochenkov From what I've been told, the main motivation for any optimization on MIR is to improve generic code before it's copied all over the place as part of monomorphization (similar reasoning applies to (I asked on IRC because, like you, I couldn't find a document describing this. There might not be one.)
One major motivation is that it improves compile times. Not by a huge amount, but I saw about a 10% improvement in codegen+LLVM pass times when compiling libstd.
nagisa
reviewed
Sep 20, 2016
impl fmt::Debug for Location {
    fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
        write!(fmt, "{:?}[{}]", self.block, self.statement_index)
    }
}

impl<'tcx> TypeFoldable<'tcx> for Mir<'tcx> {
let terminator = caller_mir[callsite.bb].terminator.take().unwrap();
match terminator.kind {
    TerminatorKind::Call {
        func: _, args, destination: Some(destination), cleanup } => {
let mut first_block = true;
let mut cost = 0;

for blk in callee_mir.basic_blocks() {
Aatch
Sep 20, 2016
Author
Contributor
The early returns make the direct way a bit easier. It could be a visitor, but I'd rather land this and then refactor once the early returns aren't necessary any more.
nikomatsakis
Sep 27, 2016
Contributor
(I don't really care, but it seems like we could use cost.saturating_add() and just set cost to INT_MAX or something instead of early return.)
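A minimal sketch of the saturating-cost idea suggested here, under stated assumptions: the per-statement costs, the `THRESHOLD` value, and both function names are hypothetical and not taken from the PR. A `None` entry stands for a construct the inliner cannot handle; it contributes `usize::MAX`, so the sum saturates and the final threshold check rejects the callee without any early return.

```rust
/// Hypothetical inlining threshold; the real pass may use a different value.
const THRESHOLD: usize = 50;

/// Sums per-statement costs, saturating at `usize::MAX` instead of overflowing.
/// `None` models a statement the inliner cannot handle.
fn total_cost(costs: &[Option<usize>]) -> usize {
    let mut cost: usize = 0;
    for c in costs {
        // An unsupported statement pushes the cost to the maximum; later
        // additions cannot wrap around because the add saturates.
        cost = cost.saturating_add(c.unwrap_or(usize::MAX));
    }
    cost
}

/// A callee is inlined only if its total cost stays within the threshold.
fn should_inline(costs: &[Option<usize>]) -> bool {
    total_cost(costs) <= THRESHOLD
}
```

The loop body stays uniform, which is what would make a later move to a visitor straightforward.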
}

struct Integrator<'a, 'tcx: 'a> {
    block_idx: usize,
Not off the top of my head, unfortunately.
@Aatch Handling unwinding should be pretty easy as far as I can tell. Namely you have three patterns:
Note how you do not need to handle any of the unwind edges inside the callee or caller other than remembering which block the
@petrochenkov To add to what others have said, it improves other optimizations. For example (and this is what I want it for), if you inline
@nagisa I did have it inlining in more cases, but I ran into issues. I reduced the scope of the pass so I can get something landed first. Handling more cases can be done later.
@nagisa I'm not sure about that third case. If there's no cleanup to be done, then there won't be a
Ah right, you are absolutely right. It's also a problem in the other cases. Ideally we'd want something like the nounwind LLVM attribute but relying on
Anything I can do to help this along?
@pcwalton I'm just about to push a rebased version with a few other improvements (I fixed the unwinding-related problem). I still need to figure out the debuginfo thing. It's not strictly necessary, as the pass is currently opt-in, but I'd like to at least figure out why it's breaking, even if it doesn't get fixed, so I can make a note of it like I did with the reachable symbols thing. So investigating debuginfo would probably be a big help.
Aatch
force-pushed the
Aatch:mir-inlining
branch
from
d16e135
to
312b9df
Sep 22, 2016
@Aatch How do I reproduce this bug? I've tried in various ways on Linux and can't seem to.
@bors: try
bors
added a commit
that referenced
this pull request
Sep 23, 2016
Aatch
changed the title
[WIP] Initial implementation of inlining
Initial implementation of inlining
Sep 24, 2016
I feel confident in saying this is good enough for a first pass, with the understanding that the pass itself is still unstable and may have unknown bugs.
Aatch
changed the title
Initial implementation of inlining
[MIR] Initial implementation of inlining
Sep 24, 2016
pcwalton
reviewed
Sep 24, 2016
This looks great! Only had a few nits. Thanks so much for your work! @nikomatsakis I think you should take a look at this too. |
return_ty: self.return_ty.fold_with(folder),
var_decls: self.var_decls.fold_with(folder),
arg_decls: self.arg_decls.fold_with(folder),
temp_decls: self.temp_decls.fold_with(folder),
let kind = match self.kind {
    Assign(ref lval, ref rval) => Assign(lval.fold_with(folder), rval.fold_with(folder)),
    SetDiscriminant { ref lvalue, variant_index } => SetDiscriminant{
pcwalton
Sep 24, 2016
Contributor
nit: put a space before the { at end of line to be consistent with the pattern
Call { ref func, ref args, ref destination, cleanup } => {
    let dest = if let Some((ref loc, dest)) = *destination {
        Some((loc.fold_with(folder), dest))
    } else { None };
nikomatsakis
Sep 27, 2016
Contributor
oh but then you need destination.as_ref().map() -- but that's ok
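A small sketch of the `destination.as_ref().map()` shape mentioned here, assuming toy types: `fold` is a hypothetical stand-in for `fold_with`, and the `(String, u32)` pair stands in for the real (lvalue, block) destination. It is only meant to show why `as_ref` is needed when the option sits behind a reference.

```rust
/// Hypothetical stand-in for `fold_with`: transforms the location part.
fn fold(loc: &str) -> String {
    loc.to_uppercase()
}

/// Folds the location of an optional call destination and keeps the
/// target block index unchanged.
fn fold_destination(destination: &Option<(String, u32)>) -> Option<(String, u32)> {
    // `as_ref` turns `&Option<T>` into `Option<&T>`, so `map` can borrow the
    // contents instead of trying to move them out of the borrowed option.
    destination.as_ref().map(|(loc, dest)| (fold(loc), *dest))
}
```

This replaces the `if let Some(..) = *destination { .. } else { None }` form with a single expression.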
}

#[inline]
pub fn get_mut<'a>(&'a mut self, index: I) -> Option<&'a mut T> {
    callgraph
}

pub fn scc_iter<'g>(&'g self) -> SCCIterator<'g> {
    false
} else {
    true
})
pcwalton
Sep 24, 2016
Contributor
Maybe we should just have an is_nop() function on Statement. I probably should have written one, sorry about that.
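A hedged sketch of what such an `is_nop()` helper could look like. The `Statement` and `StatementKind` types below are toy stand-ins, not rustc's definitions, and `strip_nops` is a hypothetical caller that shows how the `if .. { false } else { true }` closure could collapse into one call.

```rust
/// Toy statement kinds; the real MIR enum has many more variants.
#[derive(Debug, Clone, PartialEq)]
enum StatementKind {
    Assign(u32, u32),
    Nop,
}

#[derive(Debug, Clone, PartialEq)]
struct Statement {
    kind: StatementKind,
}

impl Statement {
    /// Returns true if this statement does nothing and can be dropped.
    fn is_nop(&self) -> bool {
        matches!(self.kind, StatementKind::Nop)
    }
}

/// Removes all no-op statements, keeping the others in order.
fn strip_nops(statements: &mut Vec<Statement>) {
    statements.retain(|s| !s.is_nop());
}
```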
if Some(*target) == *cleanup {
    *cleanup == None;
} else if !self.in_cleanup_block{
@@ -532,8 +528,58 @@ impl<'a, 'tcx> Inliner<'a, 'tcx> {
let return_block = destination.1;

// Copy the arguments if needed.
let args : Vec<_> = {
let args : Vec<_> = if is_box_free {
pcwalton
Sep 24, 2016
Contributor
Oh, yuck! Shouldn't this be done at HIR→MIR lowering time?
This is fine for now, but I'd factor it out into a separate function, with a big ol' FIXME on top of it :)
highlight_end: h_end,
}
h_end: usize) -> Option<DiagnosticSpanLine> {
fm.get_line(index).map(|text| {
pcwalton
Sep 24, 2016
Contributor
Add a comment here saying what could cause this to return None (as the commit message does).
note_const_eval_err(bcx.tcx(), &err, span, "expression", &mut diag);
diag.emit();
if let Some(span) = self.diag_span(span, terminator.source_info.scope) {
    let err = ConstEvalErr{ span: span, kind: err };
pcwalton
Sep 24, 2016
Contributor
nit: put a space before {. (I know you just copy and pasted code from above, but might as well clean this up while you're here.)
Aatch
added some commits
Sep 20, 2016
nikomatsakis
reviewed
Sep 27, 2016
match self.kind {
    Assign(ref lval, ref rval) => { lval.visit_with(visitor) || rval.visit_with(visitor) }
    SetDiscriminant { ref lvalue, .. } |
nikomatsakis
reviewed
Sep 27, 2016
use mir::repr::TerminatorKind::*;

match self.kind {
    If { ref cond, .. } => cond.visit_with(visitor),
nikomatsakis
Sep 27, 2016
Contributor
Nit: similarly here I'd avoid the .. if we can; it's so easy for these visitors to go wrong, and so hard to track down
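To illustrate the point about avoiding `..`, here is a small sketch with a toy `Terminator` enum (hypothetical, not the MIR type). Because every field is named in the pattern, adding a field to a variant later makes this `match` fail to compile, so a visitor cannot silently skip the new field.

```rust
/// Toy terminator type; field names are illustrative only.
enum Terminator {
    If { cond: bool, targets: (usize, usize) },
    Goto { target: usize },
}

/// Counts the operands a visitor would have to look at.
fn operand_count(t: &Terminator) -> usize {
    match *t {
        // Each field is listed and explicitly ignored with `_` rather than
        // being swallowed by `..`; a new field would be a compile error here.
        Terminator::If { cond: _, targets: _ } => 1,
        Terminator::Goto { target: _ } => 0,
    }
}
```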
nikomatsakis
reviewed
Sep 27, 2016
fn super_fold_with<'gcx: 'tcx, F: TypeFolder<'gcx, 'tcx>>(&self, folder: &mut F) -> Self {
    match self {
        &Lvalue::Projection(ref p) => Lvalue::Projection(p.fold_with(folder)),
        _ => self.clone()
nikomatsakis
reviewed
Sep 27, 2016
kind: &TerminatorKind<'tcx>, _loc: Location) {
    if let TerminatorKind::Call {
        func: Operand::Constant(ref f)
        , .. } = *kind {
nikomatsakis
Sep 27, 2016
Contributor
Nit: this formatting is pretty hard to read... can't we get it on one line? maybe use a match? Anything but this :P
nikomatsakis
reviewed
Sep 27, 2016
    children: graph::AdjacentTargets<'g, DefId, ()>
}

pub struct SCCIterator<'g> {
nikomatsakis
Sep 27, 2016
Contributor
This should be factored into librustc_data_structures, and ideally the graph algorithms portion of that code. But for now just open a FIXME about it.
nikomatsakis
reviewed
Sep 27, 2016
//! MIR-based callgraph.
//!
//! This only considers direct calls
nikomatsakis
Sep 27, 2016
Contributor
This is not an issue, but it's interesting that this is an "optimistic" call graph -- i.e., it is only those things that we know will be called. Often people would assume a pessimistic approach. Maybe worth highlighting just a bit in the comment here.
nikomatsakis
reviewed
Sep 27, 2016
};
let src = MirSource::from_node(self.tcx, id);
if let MirSource::Fn(_) = src {
    let mir = if let Some(m) = map.map.get(&def_id) {
nikomatsakis
Sep 27, 2016
Contributor
is there a reason for this to be absent? I guess cross-crate cases, eh?
nikomatsakis
reviewed
Sep 27, 2016
    continue;
};
let src = MirSource::from_node(self.tcx, id);
if let MirSource::Fn(_) = src {
nikomatsakis
Sep 27, 2016
Contributor
Nit: I'd rather see
match MirSource::from_node(self.tcx, id) {
    MirSource::Fn(_) => continue,
    _ => (),
}
nikomatsakis
reviewed
Sep 27, 2016
let mut changed = false;

loop {
    local_change = false;
nikomatsakis
reviewed
Sep 27, 2016
csi -= 1;
if scc.len() == 1 {
    callsites.swap_remove(csi);
nikomatsakis
Sep 27, 2016
Contributor
Huh, I feel like the ordering tricks here are getting a bit subtle.
First we sorted callsites so that things in the SCC were at the end -- but now we are calling swap_remove, which usually only makes sense if ordering doesn't matter -- and this is part of a loop? So on the next iteration, the ordering is going to be messed up?
Perhaps the scc.len() == 1 condition means this is not true?
Maybe it'd be simpler to just keep a parallel "merged" array of booleans and set merged[csi] = true or something?
EDIT: Oh, and we are growing the array too...
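A rough sketch of the parallel `merged` array idea, with assumptions stated up front: call sites are plain `u32` ids, `can_inline` is a hypothetical predicate, and the function name is invented for illustration. Nothing is removed from `callsites` during the walk, so the initial ordering (SCC members last) is never disturbed.

```rust
/// Walks the call sites from back to front and marks the ones that were
/// inlined in a parallel boolean vector, instead of using `swap_remove`.
/// Returns the call sites that were not inlined, in their original order.
fn remaining_callsites(callsites: &[u32], can_inline: impl Fn(u32) -> bool) -> Vec<u32> {
    let mut merged = vec![false; callsites.len()];
    for csi in (0..callsites.len()).rev() {
        if can_inline(callsites[csi]) {
            // Record the decision without touching the ordering.
            merged[csi] = true;
        }
    }
    callsites
        .iter()
        .zip(&merged)
        .filter(|&(_, &m)| !m)
        .map(|(&c, _)| c)
        .collect()
}
```

If inlining appends new call sites, as the EDIT above points out, `merged` would have to be extended by the same number of `false` entries to keep the two vectors in step.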
Aatch
Sep 28, 2016
Author
Contributor
If scc.len() == 1 then there's only one function in the current SCC, so the order doesn't matter. The order is just between "not in SCC" and "in SCC". The thing is, most SCCs are a single function, so in that case, the order doesn't matter at all. It's just a heuristic, the order doesn't matter for correctness, it just prioritises inlining from outside the SCC first.
nikomatsakis
Sep 28, 2016
Contributor
> If scc.len() == 1 then there's only one function in the current SCC, so the order doesn't matter. The order is just between "not in SCC" and "in SCC". The thing is, most SCCs are a single function, so in that case, the order doesn't matter at all. It's just a heuristic, the order doesn't matter for correctness, it just prioritises inlining from outside the SCC first.
OK, that makes sense. This should at minimum be a comment. We can iterate on the overall structure later.
nikomatsakis
reviewed
Sep 27, 2016
    foreign_mir.as_ref().map(|m| &**m)
};

let callee_mir = if let Some(m) = callee_mir {
Aatch
Sep 28, 2016
Author
Contributor
Calls to functions that don't have a MIR. Foreign functions, for example.
nikomatsakis
reviewed
Sep 27, 2016
let terminator = bb_data.terminator();
if let TerminatorKind::Call {
    func: Operand::Constant(ref f), .. } = terminator.kind {
    if let ty::TyFnDef(callee_def_id, substs, _) = f.ty.sty {
nikomatsakis
Sep 27, 2016
Contributor
This sort of duplicates the logic in the callgraph code, right? Perhaps we can have a helper that enumerates the direct callees of a given MIR block?
nikomatsakis
Sep 27, 2016
Contributor
Also duplicates the logic just a half page up in this same function, right?
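One possible shape for the helper suggested above, sketched with toy types: `Operand`, `TerminatorKind`, `BasicBlock` and `direct_callees` are all hypothetical stand-ins, and a `u32` plays the role of a `DefId`. It yields only calls whose callee is a constant, which matches the "direct calls only" call graph described in the module comment.

```rust
/// Toy operand: either a known function (by id) or a local holding a pointer.
enum Operand {
    Constant(u32),
    Local(usize),
}

/// Toy terminator with just the variants needed for the sketch.
enum TerminatorKind {
    Call { func: Operand },
    Return,
}

struct BasicBlock {
    terminator: TerminatorKind,
}

/// Enumerates `(block index, callee id)` for every direct call in a body.
/// Indirect calls through a local are skipped.
fn direct_callees(blocks: &[BasicBlock]) -> impl Iterator<Item = (usize, u32)> + '_ {
    blocks.iter().enumerate().filter_map(|(bb, data)| match data.terminator {
        TerminatorKind::Call { func: Operand::Constant(def_id) } => Some((bb, def_id)),
        _ => None,
    })
}
```

Both the call graph construction and the inliner's call-site collection could then iterate over the same helper instead of repeating the nested `if let`.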
arielb1
Sep 27, 2016
Contributor
Why is the ordering important? Won't we iterate up to a fixed point anyway?
nikomatsakis
reviewed
Sep 27, 2016
if let TerminatorKind::Call {
    func: Operand::Constant(ref f), .. } = terminator.kind {
    if let ty::TyFnDef(callee_def_id, substs, _) = f.ty.sty {
        // Don't inline the same function multiple times.
nikomatsakis
Sep 27, 2016
Contributor
Well, more specifically this prevents us from inlining recursive calls from the callee -- but these could be indirect and still be a problem, right?
For example:
fn foo() {
    bar();
}
fn bar() {
    baz();
}
fn baz() {
    bar();
}
What prevents us from inlining bar into foo, then inlining baz into foo, then bar again?
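One way to rule out the foo/bar/baz cycle, sketched under assumptions: this is not what the PR does, the call graph is a plain map from function name to callees, and `inline_chain` is a hypothetical name. The idea is to track the chain of functions currently being expanded and refuse to inline a callee that is already on that chain.

```rust
use std::collections::HashMap;

/// Simulates inlining everything reachable from `func`, recording the order
/// in which callees would be inlined. `stack` holds the functions whose
/// bodies are currently being expanded.
fn inline_chain<'a>(
    graph: &HashMap<&'a str, Vec<&'a str>>,
    func: &'a str,
    stack: &mut Vec<&'a str>,
    inlined: &mut Vec<&'a str>,
) {
    stack.push(func);
    if let Some(callees) = graph.get(func) {
        for &callee in callees {
            // Skip callees already on the expansion chain: inlining them
            // again would start an unbounded bar -> baz -> bar sequence.
            if stack.contains(&callee) {
                continue;
            }
            inlined.push(callee);
            inline_chain(graph, callee, stack, inlined);
        }
    }
    stack.pop();
}
```

For the example above, expanding `foo` inlines `bar` and then `baz`, and stops when `baz` tries to call `bar` again.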
eddyb
Sep 27, 2016
Member
I'm not sure it's even cost-effective across the board to inline non-leaves without a more complex analysis.
nikomatsakis
reviewed
Sep 27, 2016
// Attempts to get an appropriate span for diagnostics. We might not have the source
// available if the span is from an inlined crate, but there's usually a decent span
// in an ancestor scope somewhere
fn diag_span(&self, span: Span, scope: mir::VisibilityScope) -> Option<Span> {
 * Iterator over strongly-connected-components using Tarjan's algorithm[1]
 *
 * [1]: https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm
 */
Why is the ordering important? Won't we iterate up to a fixed point anyway? I think the nicest way to do this is:
We might want to keep a cache for I think we need something to avoid being utterly slow with huge SCCs (e.g. not inlining any functions with >30 basic blocks, and not even trying conditional inlining).
I confess I don't quite know. @Aatch wrote this: "It's just a heuristic, the order doesn't matter for correctness, it just prioritises inlining from outside the SCC first." I think I'm sort of lacking the big picture here.
Probably so. That said, I think I'd like to be landing nice patches like this, particularly since it's gated on -O2. I guess there is some question but I feel like iterating in tree (behind a flag) is usually better than spinning on PRs for too long. We do, however, need to build up some good infrastructure for doing crater-like testing that tracks compile time -- we wound up kind of skimping there for MIR and, while I don't regret it, it's a shame. It'd be great to be able to build up a MIR optimization infrastructure and test it before we "throw the switch". I think though that cargo-apply is getting pretty robust between the improvements that @brson and I made. So most of the pieces are probably in place.
I'm not sure SCC inlining is such a good idea - for one, I can't figure out a good way of extending it to a case where we try to inline (non-specializable) trait methods, and for another, I can't find any good "killer app" for it. I would prefer to just never inline functions in the same SCC - this can deal with trait dependencies by executing Tarjan's algorithm online.
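A compact sketch of the "never inline within the same SCC" policy, with assumptions: functions are numbered `0..n`, the call graph is an adjacency list, and `Tarjan`, `scc_ids` and `may_inline` are hypothetical names. This is the ordinary offline, recursive form of Tarjan's algorithm, not the online variant mentioned above.

```rust
/// State for Tarjan's strongly-connected-components algorithm.
struct Tarjan<'g> {
    graph: &'g [Vec<usize>],
    index: Vec<Option<usize>>,
    lowlink: Vec<usize>,
    on_stack: Vec<bool>,
    stack: Vec<usize>,
    next_index: usize,
    scc_of: Vec<usize>,
    scc_count: usize,
}

impl<'g> Tarjan<'g> {
    fn visit(&mut self, v: usize) {
        self.index[v] = Some(self.next_index);
        self.lowlink[v] = self.next_index;
        self.next_index += 1;
        self.stack.push(v);
        self.on_stack[v] = true;
        let graph = self.graph; // copy the reference so `self` stays free
        for &w in &graph[v] {
            match self.index[w] {
                None => {
                    // Tree edge: recurse, then propagate the low link.
                    self.visit(w);
                    self.lowlink[v] = self.lowlink[v].min(self.lowlink[w]);
                }
                Some(w_index) if self.on_stack[w] => {
                    // Back edge into the current component.
                    self.lowlink[v] = self.lowlink[v].min(w_index);
                }
                Some(_) => {} // edge into an already finished component
            }
        }
        if Some(self.lowlink[v]) == self.index[v] {
            // `v` is the root of a component: pop all of its members.
            loop {
                let w = self.stack.pop().expect("stack holds the SCC root");
                self.on_stack[w] = false;
                self.scc_of[w] = self.scc_count;
                if w == v {
                    break;
                }
            }
            self.scc_count += 1;
        }
    }
}

/// Returns, for every node, the id of the component it belongs to.
fn scc_ids(graph: &[Vec<usize>]) -> Vec<usize> {
    let n = graph.len();
    let mut t = Tarjan {
        graph,
        index: vec![None; n],
        lowlink: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        next_index: 0,
        scc_of: vec![0; n],
        scc_count: 0,
    };
    for v in 0..n {
        if t.index[v].is_none() {
            t.visit(v);
        }
    }
    t.scc_of
}

/// The policy from the comment: only inline across component boundaries.
fn may_inline(scc_of: &[usize], caller: usize, callee: usize) -> bool {
    scc_of[caller] != scc_of[callee]
}
```

A directly recursive function forms a component with itself, so this check also rejects self-inlining without a special case.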
@Aatch Let me know if you're going to fix this up soon. If not, then I can do it :)
ping, what's the status of this?
@pcwalton are you still interested in following up on this work? Seems important this keeps moving.
alexg117
commented
Nov 16, 2016
@pcwalton Are you still doing this? If not I'd like to pick it up.
@alexg117 looks like you can pick this up. What say you?
@alexg117 @nikomatsakis @pcwalton @Aatch I will update this. I'll let you know when you can close this PR in lieu of a freshly-rebased PR.
@mrhota great! I think I'll just close it now because of the long period of inactivity, but please do open another PR.
Aatch commented Sep 20, 2016
Implements basic inlining for MIR functions.
Inlining is currently only done if mir-opt-level is set to 2.
Does not handle debuginfo completely correctly at the moment.