New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

const_eval: Predetermine the layout of all locals when pushing a stack frame #57677

Merged
merged 1 commit into from Jan 22, 2019

Conversation

Projects
None yet
8 participants
@dotdash
Copy link
Contributor

dotdash commented Jan 16, 2019

Usually the layout of any locals is required at least three times, once
when it becomes live, once when it is written to, and once it is read
from. By adding a cache for them, we can reduce the number of layout
queries speeding up code that is heavy on const_eval.

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented Jan 16, 2019

r? @michaelwoerister

(rust_highfive has picked a reviewer for you, use r? to override)

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Jan 16, 2019

@bors try

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Jan 16, 2019

⌛️ Trying commit fafeda6 with merge 4f2ef25...

bors added a commit that referenced this pull request Jan 16, 2019

Auto merge of #57677 - dotdash:locals, r=<try>
const_eval: Predetermine the layout of all locals when pushing a stack frame

Usually the layout of any locals is required at least three times, once
when it becomes live, once when it is written to, and once it is read
from. By calculating the layouts at the time a stack frame is calculated
and caching them, we can reduce the number of layout queries speeding up
code that is heavy on const_eval.
@michaelwoerister

This comment has been minimized.

Copy link
Contributor

michaelwoerister commented Jan 17, 2019

Thanks for the PR, @dotdash! Looking forward to the perf results :)

@michaelwoerister

This comment has been minimized.

Copy link
Contributor

michaelwoerister commented Jan 17, 2019

@rust-timer build 4f2ef25
(I think the try build actually finished successfully a while ago)

@rust-timer

This comment has been minimized.

Copy link

rust-timer commented Jan 17, 2019

Success: Queued 4f2ef25 with parent ceb2512, comparison URL.

@rust-timer

This comment has been minimized.

Copy link

rust-timer commented Jan 17, 2019

Finished benchmarking try commit 4f2ef25

@michaelwoerister

This comment has been minimized.

Copy link
Contributor

michaelwoerister commented Jan 17, 2019

This looks pretty good. For the synthetic const-eval heavy tests it looks spectacular even :)

I'll give it a proper review tomorrow. Meanwhile, could you check whether there's a problem with compiling the unicode_normalization crate? perf.rlo shows no numbers for it which might indicate a compilation error.

cc @oli-obk @rust-lang/wg-compiler-performance

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Jan 17, 2019

The numbers are only missing for the parent commit. I suppose that it just predates the addition of that benchmark, which was yesterday.

let local_ty = self.monomorphize(local_ty, frame.instance.substs);
self.layout_of(local_ty)
) -> TyLayout<'tcx> {
frame.local_layouts[local]

This comment has been minimized.

@oli-obk

oli-obk Jan 18, 2019

Contributor

While definitely an improvement for all currently possible const eval code, I think this is suboptimal for code with branching, as a lot of locals will never be touched during evaluation. I think it would be better to make local_layouts be IndexVec<mir::Local, Option<TyLayout<'tcx>>> and only fill it in on demand. To make this really work with reading I presume it would even need to be IndexVec<mir::Local, RefCell<Option<TyLayout<'tcx>>>> at which point we can just move to IndexVec<mir::Local, Once<TyLayout<'tcx>>> or some non-Sync variant of that.

@@ -463,12 +463,10 @@ impl<'a, 'mir, 'tcx, M: Machine<'a, 'mir, 'tcx>> EvalContext<'a, 'mir, 'tcx, M>
&self,

This comment has been minimized.

@oli-obk

oli-obk Jan 18, 2019

Contributor

drive-by fix: access_local doesn't need to be pub anymore

@michaelwoerister

This comment has been minimized.

Copy link
Contributor

michaelwoerister commented Jan 18, 2019

@oli-obk's remark regarding branching code makes sense. Would it be possible to make some or most EvalContext methods &mut self instead of &self so we don't have to use RefCell? It seems like that should be possible in principle, right?

@oli-obk

This comment has been minimized.

Copy link
Contributor

oli-obk commented Jan 18, 2019

Would it be possible to make some or most EvalContext methods &mut self instead of &self so we don't have to use RefCell? It seems like that should be possible in principle, right?

The problem is that we separated reading and writing into &mut self and &self methods. So any method trying to read from the local would need to be &mut self and that would infect everything. The immutability of the read paths has been very helpful in the development of miri. I think it would be fine to have some interior mutability just for memoization. Since TyLayout is Copy, we can even use Cell<Option<TyLayout<'tcx>>> making the entire operation zero cost both in memory and performance.

@michaelwoerister

This comment has been minimized.

Copy link
Contributor

michaelwoerister commented Jan 18, 2019

OK, I'm fine with the Cell approach.

I'd be interested though in how using &self has made things easier.

@oli-obk

This comment has been minimized.

Copy link
Contributor

oli-obk commented Jan 18, 2019

I'd be interested though in how using &self has made things easier.

Mirroring the mutability of the virtual miri memory in the mutability of the memory datastructure itself allows various things like holding references to different parts of the virtual memory at the same time. If we made everything take mutable references, we could only ever access one miri allocation, even if we just wanted to write to one. While we could have worked around this with unsafe code (and we still have to do it in one situation

// This checks relocation edges on the src.
let src_bytes = self.get(src.alloc_id)?
.get_bytes_with_undef_and_ptr(&tcx, src, size)?
.as_ptr();
let dest_bytes = self.get_mut(dest.alloc_id)?
.get_bytes_mut(&tcx, dest, size * length)?
.as_mut_ptr();
// SAFE: The above indexing would have panicked if there weren't at least `size` bytes
// behind `src` and `dest`. Also, we use the overlapping-safe `ptr::copy` if `src` and
// `dest` could possibly overlap.
// The pointers above remain valid even if the `HashMap` table is moved around because they
// point into the `Vec` storing the bytes.
unsafe {
assert_eq!(size.bytes() as usize as u64, size.bytes());
if src.alloc_id == dest.alloc_id {
if nonoverlapping {
if (src.offset <= dest.offset && src.offset + size > dest.offset) ||
(dest.offset <= src.offset && dest.offset + size > src.offset)
{
return err!(Intrinsic(
"copy_nonoverlapping called on overlapping ranges".to_string(),
));
}
}
for i in 0..length {
ptr::copy(src_bytes,
dest_bytes.offset((size.bytes() * i) as isize),
size.bytes() as usize);
}
} else {
for i in 0..length {
ptr::copy_nonoverlapping(src_bytes,
dest_bytes.offset((size.bytes() * i) as isize),
size.bytes() as usize);
}
}
}
), and while two phase borrows could have made many situations safe, we would still have to fight the borrow checker at ever turn.

Also it just "felt right" to make sure we don't accidentally modify our virtual memory in functions that should just read from it.

@michaelwoerister

This comment has been minimized.

Copy link
Contributor

michaelwoerister commented Jan 18, 2019

Yeah, I can see how this can become tricky. Thanks for the explanation, @oli-obk!

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Jan 19, 2019

using Cell gets me lifetime errors, which I don't understand right away.

  --> src/librustc_mir/interpret/snapshot.rs:75:34
   |
46 | impl<'a, 'mir, 'tcx> InfiniteLoopDetector<'a, 'mir, 'tcx>
   |      --  ---- lifetime `'mir` defined here
   |      |
   |      lifetime `'a` defined here
...
75 |         if self.snapshots.insert(EvalSnapshot::new(memory, stack)) {
   |                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ argument requires that `'mir` must outlive `'a`

error: unsatisfied lifetime constraints
   --> src/librustc_mir/interpret/snapshot.rs:410:1
    |
410 |   impl_stable_hash_for!(impl<'tcx, 'b, 'mir> for struct EvalSnapshot<'b, 'mir, 'tcx> {
    |   ^                          ----      ---- lifetime `'mir` defined here
    |   |                          |
    |  _|                          lifetime `'tcx` defined here
    | |
411 | |     // Not hashing memory: Avoid hashing memory all the time during execution
412 | |     memory -> _,
413 | |     stack,
414 | | });
    | |___^ type annotation requires that `'mir` must outlive `'tcx`
    |
    = note: this error originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)

Is there some magic in the stable hash macro that might affect this?

const_eval: Predetermine the layout of all locals when pushing a stac…
…k frame

Usually the layout of any locals is required at least three times, once
when it becomes live, once when it is written to, and once it is read
from. By adding a cache for them, we can reduce the number of layout
queries speeding up code that is heavy on const_eval.

@dotdash dotdash force-pushed the dotdash:locals branch from fafeda6 to 98d4f33 Jan 20, 2019

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Jan 20, 2019

Seems that the constraint on the stable hash impl was just swapped around, asking for 'mir to outlive 'tcx instead of the other way around.

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Jan 20, 2019

@bors try

We should measure this again, my local setup seems a bit funky right now, and showed a massive slowdown for this version which seems unlikely to be right.

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Jan 20, 2019

⌛️ Trying commit 98d4f33 with merge a0b24f2...

bors added a commit that referenced this pull request Jan 20, 2019

Auto merge of #57677 - dotdash:locals, r=<try>
const_eval: Predetermine the layout of all locals when pushing a stack frame

Usually the layout of any locals is required at least three times, once
when it becomes live, once when it is written to, and once it is read
from. By calculating the layouts at the time a stack frame is calculated
and caching them, we can reduce the number of layout queries speeding up
code that is heavy on const_eval.
@bors

This comment has been minimized.

Copy link
Contributor

bors commented Jan 20, 2019

☀️ Test successful - checks-travis
State: approved= try=True

@oli-obk

This comment has been minimized.

Copy link
Contributor

oli-obk commented Jan 21, 2019

@rust-timer

This comment has been minimized.

Copy link

rust-timer commented Jan 21, 2019

Success: Queued a0b24f2 with parent 4db2394, comparison URL.

@rust-timer

This comment has been minimized.

Copy link

rust-timer commented Jan 21, 2019

Finished benchmarking try commit a0b24f2

@oli-obk

This comment has been minimized.

Copy link
Contributor

oli-obk commented Jan 21, 2019

Perf is great (though no effect on unicode-normalization). Impl looks good.

The commit message is still slightly outdated.

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Jan 21, 2019

Ah, I had updated the commit message, but not the message for the PR itself. Or did I still miss something in the commit message as well?

@michaelwoerister

This comment has been minimized.

Copy link
Contributor

michaelwoerister commented Jan 21, 2019

@bors r+

Thanks, @dotdash!

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Jan 21, 2019

📌 Commit 98d4f33 has been approved by michaelwoerister

Centril added a commit to Centril/rust that referenced this pull request Jan 21, 2019

Rollup merge of rust-lang#57677 - dotdash:locals, r=michaelwoerister
const_eval: Predetermine the layout of all locals when pushing a stack frame

Usually the layout of any locals is required at least three times, once
when it becomes live, once when it is written to, and once it is read
from. By adding a cache for them, we can reduce the number of layout
queries speeding up code that is heavy on const_eval.

bors added a commit that referenced this pull request Jan 22, 2019

Auto merge of #57823 - Centril:rollup, r=Centril
Rollup of 8 pull requests

Successful merges:

 - #57294 (When using value after move, point at span of local)
 - #57552 (Default images)
 - #57667 (Fix memory leak in P::filter_map)
 - #57677 (const_eval: Predetermine the layout of all locals when pushing a stack frame)
 - #57730 (Merge visitors in AST validation)
 - #57791 (Add regression test for #54582)
 - #57795 (Use structured suggestion in stead of notes)
 - #57809 (Add powerpc64-unknown-freebsd)

Failed merges:

r? @ghost

Centril added a commit to Centril/rust that referenced this pull request Jan 22, 2019

Rollup merge of rust-lang#57677 - dotdash:locals, r=michaelwoerister
const_eval: Predetermine the layout of all locals when pushing a stack frame

Usually the layout of any locals is required at least three times, once
when it becomes live, once when it is written to, and once it is read
from. By adding a cache for them, we can reduce the number of layout
queries speeding up code that is heavy on const_eval.

bors added a commit that referenced this pull request Jan 22, 2019

Auto merge of #57830 - Centril:rollup, r=Centril
Rollup of 9 pull requests

Successful merges:

 - #57537 (Small perf improvement for fmt)
 - #57552 (Default images)
 - #57604 (Make `str` indexing generic on `SliceIndex`.)
 - #57667 (Fix memory leak in P::filter_map)
 - #57677 (const_eval: Predetermine the layout of all locals when pushing a stack frame)
 - #57791 (Add regression test for #54582)
 - #57798 (Corrected spelling inconsistency)
 - #57809 (Add powerpc64-unknown-freebsd)
 - #57813 (fix validation range printing when encountering undef)

Failed merges:

r? @ghost

@bors bors merged commit 98d4f33 into rust-lang:master Jan 22, 2019

1 check passed

homu Test successful
Details
@@ -510,7 +504,7 @@ impl<'a, 'mir, 'tcx, M: Machine<'a, 'mir, 'tcx>> EvalContext<'a, 'mir, 'tcx, M>
// FIXME: do some more logic on `move` to invalidate the old location
Copy(ref place) |
Move(ref place) =>
self.eval_place_to_op(place, layout)?,
self.eval_place_to_op(place)?,

This comment has been minimized.

@RalfJung

RalfJung Jan 30, 2019

Member

Hm, but this means we might have to compute the layout even if we already know it, right?

This comment has been minimized.

@oli-obk

oli-obk Jan 30, 2019

Contributor

To change this we'd need to pass a precomputed layout to layout_of_local, which probably doesn't do anything, because any initialized local will already have had called layout_of_local on it before. We can still benchmark it and see if it changes anything

@@ -72,6 +72,7 @@ fn mk_eval_cx_inner<'a, 'mir, 'tcx>(
ecx.stack.push(interpret::Frame {
block: mir::START_BLOCK,
locals: IndexVec::new(),
local_layouts: IndexVec::new(),

This comment has been minimized.

@eddyb

eddyb Jan 30, 2019

Member

Can't the information be included in locals itself?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment