Result-based error handling optimizes very poorly for large payloads #42047
Comments
For reference, this is basically a simplified version of what webrender's display-list parsing with bincode looks like.
I expect the bottleneck is that, without the full picture, the indirect and direct versions appear to have different error semantics: direct preserves the old value on Err, while indirect partially overwrites it. In our case we simply don't care what the value is on Err, so the two are semantically identical; however, knowing that requires fairly strong alias information.
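A minimal caller-side sketch of that semantic difference (Huge and the two shapes match the ones benchmarked in this issue; refill_direct and refill_indirect are placeholder names):

struct Huge { data: [u8; 200] }

// With the by-value shape, the caller's old value survives an Err untouched.
fn refill_direct(cur: &mut Huge, parse: impl FnOnce() -> Result<Huge, ()>) -> bool {
    match parse() {
        Ok(new) => { *cur = new; true }
        Err(()) => false, // *cur still holds the previous value
    }
}

// With the out-parameter shape, an Err may leave *cur partially overwritten.
fn refill_indirect(cur: &mut Huge, parse: impl FnOnce(&mut Huge) -> Result<(), ()>) -> bool {
    parse(cur).is_ok() // on Err, *cur is whatever the callee wrote before failing
}

Merging the two shapes is only valid if the caller never reads the old value after an Err, which is exactly the alias information mentioned above.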
sanmai-NL commented May 17, 2017
On what hardware were the benchmark results obtained?
Mark-Simulacrum added the I-slow label Jun 23, 2017
Reverting #31545 improves things for me; after the revert: test bench_direct ... bench: 92,552 ns/iter (+/- 4,365)
jrmuizel referenced this issue Jul 20, 2017: Mark &mut pointers as noalias once LLVM no longer miscompiles them #31681 (closed)
Mark-Simulacrum added the C-enhancement label Jul 26, 2017
arielb1 self-assigned this Aug 6, 2017
Assuming we can write directly into the return destination, I came up with different benchmarks:

#![feature(test, box_syntax)]
extern crate test;

use std::io::{Write, Error, ErrorKind};

const DATA_SIZE: usize = 200;

#[derive(Copy, Clone)]
struct Huge {
    data: [u8; DATA_SIZE],
}

struct HugeIter<'a> {
    cur_val: Huge,
    buf: &'a [u8],
}

impl<'a> HugeIter<'a> {
    fn next(&mut self) -> Option<&Huge> {
        if let Ok(()) = parse_huge(&mut self.buf, &mut self.cur_val) {
            Some(&self.cur_val)
        } else {
            None
        }
    }

    fn next_roundtrip_caller(&mut self) -> Option<&Huge> {
        unsafe {
            let mut val = Huge { data: ::std::mem::uninitialized() };
            let result = match parse_huge(&mut self.buf, &mut val) {
                Ok(()) => Ok(val),
                Err(e) => Err(e)
            };
            if let Ok(ref val) = result {
                self.cur_val = *val;
                Some(&self.cur_val)
            } else {
                None
            }
        }
    }
}

struct HugeResultIter<'a> {
    cur_val: Result<Huge, Error>,
    buf: &'a [u8],
}

impl<'a> HugeResultIter<'a> {
    fn next(&mut self) -> Option<&Huge> {
        parse_huge_result(&mut self.buf, &mut self.cur_val);
        self.cur_val.as_ref().ok()
    }

    fn next_copy(&mut self) -> Option<&Huge> {
        parse_huge_result_copy(&mut self.buf, &mut self.cur_val);
        self.cur_val.as_ref().ok()
    }

    fn next_copy_return(&mut self) -> Option<&Huge> {
        self.cur_val = parse_huge_result_copy_return(&mut self.buf);
        self.cur_val.as_ref().ok()
    }
}

fn parse_huge(src: &mut &[u8], dest: &mut Huge) -> Result<(), Error> {
    if src.len() < DATA_SIZE { return Err(Error::new(ErrorKind::UnexpectedEof, "OH NO")) }
    (&mut dest.data[..]).write_all(&src[..DATA_SIZE])?;
    *src = &src[DATA_SIZE..];
    Ok(())
}

fn parse_huge_result(src: &mut &[u8], dest: &mut Result<Huge, Error>) {
    if src.len() < DATA_SIZE {
        *dest = Err(Error::new(ErrorKind::UnexpectedEof, "OH NO"));
        return;
    }
    let mut result = Ok(());
    if let Ok(ref mut val) = *dest {
        result = (&mut val.data[..]).write_all(&src[..DATA_SIZE]);
        *src = &src[DATA_SIZE..];
    }
    if let Err(e) = result {
        *dest = Err(e);
    }
}

fn parse_huge_result_copy(src: &mut &[u8], dest: &mut Result<Huge, Error>) {
    if src.len() < DATA_SIZE {
        *dest = Err(Error::new(ErrorKind::UnexpectedEof, "OH NO"));
        return;
    }
    unsafe {
        let mut val = Huge { data: ::std::mem::uninitialized() };
        let result = (&mut val.data[..]).write_all(&src[..DATA_SIZE]);
        if let Err(e) = result {
            *dest = Err(e);
        } else {
            *dest = Ok(val);
        }
        *src = &src[DATA_SIZE..];
    }
}

fn parse_huge_result_copy_return(src: &mut &[u8]) -> Result<Huge, Error> {
    if src.len() < DATA_SIZE { return Err(Error::new(ErrorKind::UnexpectedEof, "OH NO")); }
    unsafe {
        let mut val = Huge { data: ::std::mem::uninitialized() };
        (&mut val.data[..]).write_all(&src[..DATA_SIZE])?;
        *src = &src[DATA_SIZE..];
        Ok(val)
    }
}

#[bench]
fn bench_huge(b: &mut test::Bencher) {
    let data = test::black_box(vec![0; 1_000_000]);
    b.iter(|| {
        let mut iter = HugeIter { cur_val: Huge { data: [0; 200] }, buf: &data };
        let mut total: u8 = 0;
        while let Some(val) = iter.next() {
            total += val.data[..].iter().cloned().sum::<u8>();
        }
        total
    });
}

#[bench]
fn bench_huge_roundtrip_caller(b: &mut test::Bencher) {
    let data = test::black_box(vec![0; 1_000_000]);
    b.iter(|| {
        let mut iter = HugeIter { cur_val: Huge { data: [0; 200] }, buf: &data };
        let mut total: u8 = 0;
        while let Some(val) = iter.next_roundtrip_caller() {
            total += val.data[..].iter().cloned().sum::<u8>();
        }
        total
    });
}

#[bench]
fn bench_huge_result(b: &mut test::Bencher) {
    let data = test::black_box(vec![0; 1_000_000]);
    b.iter(|| {
        let mut iter = HugeResultIter { cur_val: Ok(Huge { data: [0; 200] }), buf: &data };
        let mut total: u8 = 0;
        while let Some(val) = iter.next() {
            total += val.data[..].iter().cloned().sum::<u8>();
        }
        total
    });
}

#[bench]
fn bench_huge_result_copy(b: &mut test::Bencher) {
    let data = test::black_box(vec![0; 1_000_000]);
    b.iter(|| {
        let mut iter = HugeResultIter { cur_val: Ok(Huge { data: [0; 200] }), buf: &data };
        let mut total: u8 = 0;
        while let Some(val) = iter.next_copy() {
            total += val.data[..].iter().cloned().sum::<u8>();
        }
        total
    });
}

#[bench]
fn bench_huge_result_copy_return(b: &mut test::Bencher) {
    let data = test::black_box(vec![0; 1_000_000]);
    b.iter(|| {
        let mut iter = HugeResultIter { cur_val: Ok(Huge { data: [0; 200] }), buf: &data };
        let mut total: u8 = 0;
        while let Some(val) = iter.next_copy_return() {
            total += val.data[..].iter().cloned().sum::<u8>();
        }
        total
    });
}
The difference between …
EDIT: For doing the …
EDIT2: I've added …
Yes, that matches what I'd been thinking the solution would be (not sure on the precise MIR details). To be more concrete:

fn callee() -> Result<T, E>;

would lower to (using an &uninit to represent a pointer that doesn't have a destructor)

fn callee_abi(ok: &uninit T, err: &uninit E) -> bool;

so that the callee knows it can freely clobber … However, the hard part isn't the callee but rather the caller: it needs to prove that it's semantically valid to feed its locals into such an ABI. Consider for instance:

fn caller() -> Result<(), E> {
    let mut x = vec![1, 2, 3];
    println!("{:?}", x);
    x = callee()?;
    println!("{:?}", x);
}

Naively this desugars to:

fn caller_abi(err: &uninit E) -> bool {
    let mut x = vec![1, 2, 3];
    println!("{:?}", x);
    let temp_ok;
    if !callee_abi(&uninit temp_ok, err) {
        return false;
    }
    drop(x); // Drop the old value *after* the call!
    x = temp_ok;
    println!("{:?}", x);
    drop(x);
    return true;
}

But what we really want is:

fn caller_abi(err: &uninit E) -> bool {
    let mut x = vec![1, 2, 3];
    println!("{:?}", x);
    drop(x); // Drop the old value *before* the call
    if !callee_abi(&uninit x, err) {
        return false;
    }
    println!("{:?}", x);
    drop(x);
    return true;
}

Specifically, Rust isn't within its rights to arbitrarily reorder destructors between calls, especially when … Currently the real-world use-case that's motivating this issue (webrender's display-list deserializing iterator) has several nice properties that make this transform easier:

…
But the last case isn't something I can fully guarantee in the future. In particular we might add a (rare) case to our top-level enum that has a destructor, making the whole thing have a destructor. It's possible we could do a match to check whether the enum is "currently POD" ("dynamically needs_drop") and then take a fast no-copy path or a slow path with a temporary? But that's a really sketchy, application-specific optimization.
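As a rough sketch of that idea (not proposed compiler output; Item, parse_into, parse_value and refill are made-up names):

enum Item {
    Plain([u8; 200]), // no destructor
    Fancy(Vec<u8>),   // has a destructor
}

// Parses straight into `dest`; when the old value is Plain there is no
// destructor to run, so this can become a plain in-place overwrite.
fn parse_into(dest: &mut Item) -> Result<(), ()> {
    *dest = Item::Plain([0; 200]);
    Ok(())
}

// Parses into a fresh value, forcing it through a temporary.
fn parse_value() -> Result<Item, ()> {
    Ok(Item::Plain([0; 200]))
}

fn refill(slot: &mut Item) -> Result<(), ()> {
    // "dynamically needs_drop": does the value currently in the slot need a destructor?
    if matches!(*slot, Item::Plain(_)) {
        // fast path: nothing to drop, parse directly into the slot
        parse_into(slot)
    } else {
        // slow path: the old value owns memory, so build a complete
        // replacement in a temporary before the old Vec is dropped
        *slot = parse_value()?;
        Ok(())
    }
}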
Oh, you will also need to answer the question of whether, given

fn foo() -> Result<T, E>;

you want

let result = box foo();

to be lowered to

let result = malloc();
let is_ok = foo_abi(&uninit result.repr.ok_payload, &uninit result.repr.err_payload);
result.repr.tag = if is_ok { Ok } else { Err };

On balance I think the answer should be "no" (it would probably pessimize the callee too much?).
Potential steps that we could take to achieve this:

pub fn foo() -> u64 {
    let mut x = 0;
    // Doesn't currently propagate the destination because of the borrow.
    drop(&mut x);
    x
}

pub struct Newtype<T>(T);

pub fn foo() -> Newtype<u64> {
    let mut x = 0;
    drop(&mut x);
    // Does not currently propagate the destination because of the field projection.
    Newtype(x)
}

The last example should generalize to all enums (random note: deaggregation doesn't work on tuples for some reason? maybe it doesn't work on arrays either).

EDIT: wait, I said "doesn't currently", but we don't seem to actually have any kind of destination-propagation MIR optimization.
@Gankro If they can't alias we can't actually use this with a regular …
So I've been working on a …
So here's the best counter-argument I can come up with for aliasing T and E:

fn foo() -> Result<Vec<T>, E> {
    let mut result = vec![];
    for x in values() {
        // I forget how placers work but I want to use them for p e r f
        result.place_back() <- try_process(x)?;
    }
    Ok(result)
}

lowers to:

fn foo_abi(ok: &uninit Vec<T>, err: &uninit E) -> bool {
    *ok = vec![];
    for x in values() {
        let place = (*ok).place_back();
        let temp_err;
        if !try_process_abi(&place, &temp_err, x) {
            // Can't construct in-place because we can't reorder the drop of `ok`
            let temp_err2 = convert_into_other_type(temp_err);
            drop_in_place(ok);
            *err = temp_err2;
            return false;
        }
    }
    return true;
}

In this example we see …
FWIW I hope not to expose the separate locations, which would indeed hamper that optimization. However, that's closer to today than guaranteeing the variants don't overlap, and I don't mind being conservative while trying such a scheme out.
EDIT: I just realized that the measurements in that comment aren't what I remember; that's weird.
Gankro commented May 16, 2017
The overhead seems mostly to be extra memcpys.
Method
I tested three cases:
- direct() -> Result<Huge, ()>
- direct_boxed() -> Result<Box<Huge>, ()>
- indirect(&mut Huge) -> Result<(), ()>
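The benchmark code itself was behind a collapsed section and is not reproduced here. As a rough sketch only (placeholder payload size; the original Huge definition and function bodies differ), the three shapes are:

const N: usize = 1024; // placeholder size

#[derive(Copy, Clone)]
struct Huge { data: [u8; N] }

// 1. Payload returned by value inside the Result.
fn direct() -> Result<Huge, ()> {
    Ok(Huge { data: [0; N] })
}

// 2. Payload heap-allocated, so only a pointer travels through the Result.
fn direct_boxed() -> Result<Box<Huge>, ()> {
    Ok(Box::new(Huge { data: [0; N] }))
}

// 3. Payload written into caller-provided storage; the Result carries no data.
fn indirect(dest: &mut Huge) -> Result<(), ()> {
    dest.data = [0; N];
    Ok(())
}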
Bench Results:
ASM
ASM for bench_direct_boxed's closure
ASM for bench_direct's closure
ASM for bench_indirect's closure