Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduced performance when using question mark operator instead of `try!` #37939

Open
birkenfeld opened this Issue Nov 22, 2016 · 18 comments

Comments

Projects
None yet
@birkenfeld
Copy link
Contributor

birkenfeld commented Nov 22, 2016

This was reported on the users forum , and I don't want it to get lost. Basically, replacing try! by ? resulted in ~20% performance loss in benchmarks:

ratel-rust/ratel-core#48 (comment)

I've reproduced, but not further investigated, these findings. Is that expected right now? It's not a good argument for adopting the question mark :)

@arthurprs

This comment has been minimized.

Copy link
Contributor

arthurprs commented Nov 24, 2016

I didn't read the implementation until now, I though they expanded to the exact same thing as try!()

@sfackler

This comment has been minimized.

Copy link
Member

sfackler commented Nov 24, 2016

Might just need to stick a couple of #[inline]s on Carrier methods.

@StefanoD

This comment has been minimized.

Copy link

StefanoD commented Nov 26, 2016

I'm curious why this issue doesn't even get a label like I-Slow.

@birkenfeld

This comment has been minimized.

Copy link
Contributor Author

birkenfeld commented Dec 3, 2016

@sfackler I tried this (#[inline] on the trait method definitions and the impl methods), but didn't seem to get any different bench numbers.

@dotdash dotdash added the I-slow label Dec 8, 2016

@stjepang

This comment has been minimized.

Copy link
Contributor

stjepang commented Dec 8, 2016

I've found out why this is slow. Suppose you have an expression res? of type Result<U, V>.
In librustc/hir/lowering.rs it will be desugared to:

match Carrier::translate(res) {
    Ok(val) => val,
    Err(err) => return Carrier::from_error(From::from(err)),
}

The real culprit is Carrier::translate. It contains a match expression itself, which doesn't get optimized away, even though it's implemented for Result<U, V> as the identity function (in src/libcore/ops.rs):

impl<U, V> Carrier for Result<U, V> {
    type Success = U;
    type Error = V;

    fn from_success(u: U) -> Result<U, V> {
        Ok(u)
    }

    fn from_error(e: V) -> Result<U, V> {
        Err(e)
    }

    fn translate<T>(self) -> T
        where T: Carrier<Success=U, Error=V>
    {
        match self {
            Ok(u) => T::from_success(u),
            Err(e) => T::from_error(e),
        }
    }
}

In lowering.rs we have this piece of desugaring process:

                    // Carrier::translate(<expr>)
                    let discr = {
                        // expand <expr>
                        let sub_expr = self.lower_expr(sub_expr);

                        let path = &["ops", "Carrier", "translate"];
                        let path = P(self.expr_std_path(unstable_span, path, ThinVec::new()));
                        P(self.expr_call(e.span, path, hir_vec![sub_expr]))
                    };

Suppose we remove translation and implement the piece simply like this:

                    // Carrier::translate(<expr>)
                    let discr = {
                        // expand <expr>
                        P(self.lower_expr(sub_expr))
                    };

If we do that, there will be no slowdown in ratel-core due to question mark operators (I benchmarked it).
Of course, this is a non-solution and the problem is that rustc and LLVM don't optimize away the redundant match statement.

I created a simple example that demonstrates the issue: https://is.gd/Q7PGzm
Compile into ASM in Release mode. Function identity compiles to:

identity:
	.cfi_startproc
	xorl	%eax, %eax
	cmpq	$0, (%rsi)
	movq	8(%rsi), %rcx
	setne	%al
	movq	%rax, (%rdi)
	movq	%rcx, 8(%rdi)
	movq	%rdi, %rax
	retq

You can see there are unncesseary cmpq and setne instructions. The function should ideally (in order to be fast) simply return the input argument using a few move instructions, that's all.

I'm a total beginner at rustc, so no idea how to proceed further. Should we perhaps detect identity functions within a MIR optimization pass?

@bluss

This comment has been minimized.

Copy link
Contributor

bluss commented Dec 11, 2016

@arielb1 I've implemented a test version of the "variant copy prop" by hacking into CopyPropagation. However @eddyb has unfortunately 😉 said that it's much better to make a more general version of that that can propagate copies of aggregates (somehow).

scottlamb added a commit to scottlamb/moonfire-nvr that referenced this issue Feb 27, 2017

improve build_index performance by 5-10%
I just switched a couple inner loop ?s back to try!(...) to work around
rust-lang/rust#37939

frewsxcv added a commit to frewsxcv/rust that referenced this issue Jun 1, 2017

Rollup merge of rust-lang#42275 - scottmcm:try-trait, r=nikomatsakis
Lower `?` to `Try` instead of `Carrier`

The easy parts of rust-lang/rfcs#1859, whose FCP completed without further comments.

Just the trait and the lowering -- neither the error message improvements nor the insta-stable impl for Option nor exhaustive docs.

Based on a [github search](https://github.com/search?l=rust&p=1&q=question_mark_carrier&type=Code&utf8=%E2%9C%93), this will break the following:

- https://github.com/pfpacket/rust-9p/blob/00206e34c680198a0ac7c2f066cc2954187d4fac/src/serialize.rs#L38
- https://github.com/peterdelevoryas/bufparse/blob/b1325898f4fc2c67658049196c12da82548af350/src/result.rs#L50

The other results appear to be files from libcore or its tests.  I could also leave Carrier around after stage0 and `impl<T:Carrier> Try for T` if that would be better.

r? @nikomatsakis

Edit: Oh, and it might accidentally improve perf, based on rust-lang#37939 (comment), since `Try::into_result` for `Result` is an obvious no-op, unlike `Carrier::translate`.

frewsxcv added a commit to frewsxcv/rust that referenced this issue Jun 1, 2017

Rollup merge of rust-lang#42275 - scottmcm:try-trait, r=nikomatsakis
Lower `?` to `Try` instead of `Carrier`

The easy parts of rust-lang/rfcs#1859, whose FCP completed without further comments.

Just the trait and the lowering -- neither the error message improvements nor the insta-stable impl for Option nor exhaustive docs.

Based on a [github search](https://github.com/search?l=rust&p=1&q=question_mark_carrier&type=Code&utf8=%E2%9C%93), this will break the following:

- https://github.com/pfpacket/rust-9p/blob/00206e34c680198a0ac7c2f066cc2954187d4fac/src/serialize.rs#L38
- https://github.com/peterdelevoryas/bufparse/blob/b1325898f4fc2c67658049196c12da82548af350/src/result.rs#L50

The other results appear to be files from libcore or its tests.  I could also leave Carrier around after stage0 and `impl<T:Carrier> Try for T` if that would be better.

r? @nikomatsakis

Edit: Oh, and it might accidentally improve perf, based on rust-lang#37939 (comment), since `Try::into_result` for `Result` is an obvious no-op, unlike `Carrier::translate`.

frewsxcv added a commit to frewsxcv/rust that referenced this issue Jun 1, 2017

Rollup merge of rust-lang#42275 - scottmcm:try-trait, r=nikomatsakis
Lower `?` to `Try` instead of `Carrier`

The easy parts of rust-lang/rfcs#1859, whose FCP completed without further comments.

Just the trait and the lowering -- neither the error message improvements nor the insta-stable impl for Option nor exhaustive docs.

Based on a [github search](https://github.com/search?l=rust&p=1&q=question_mark_carrier&type=Code&utf8=%E2%9C%93), this will break the following:

- https://github.com/pfpacket/rust-9p/blob/00206e34c680198a0ac7c2f066cc2954187d4fac/src/serialize.rs#L38
- https://github.com/peterdelevoryas/bufparse/blob/b1325898f4fc2c67658049196c12da82548af350/src/result.rs#L50

The other results appear to be files from libcore or its tests.  I could also leave Carrier around after stage0 and `impl<T:Carrier> Try for T` if that would be better.

r? @nikomatsakis

Edit: Oh, and it might accidentally improve perf, based on rust-lang#37939 (comment), since `Try::into_result` for `Result` is an obvious no-op, unlike `Carrier::translate`.
@kdy1

This comment has been minimized.

Copy link

kdy1 commented Sep 28, 2017

I think this issue is fixed.

Generated asm is same for try! and ?.

https://play.rust-lang.org/?gist=625d88df305ace951a088e1cee2ec13a

Edit: Typo

@StefanoD

This comment has been minimized.

Copy link

StefanoD commented Sep 28, 2017

Even if this is fixed: Is there a unit test which prevents regression?

@kennytm

This comment has been minimized.

Copy link
Member

kennytm commented Sep 28, 2017

We could have a codegen test for this.

Someone tag this as E-needstest please...

@bluss

This comment has been minimized.

Copy link
Contributor

bluss commented Sep 28, 2017

It looks like they both have the same bad code generation now, needs looking into.

@bluss

This comment has been minimized.

Copy link
Contributor

bluss commented Sep 28, 2017

Example where it is compared with the identity function. Returning Ok(x?) is the same as the identity for the result. playground link

But it seems this example is not a good condensation of the issue, because I can't find any previous Rust version (in rust.godbolt) that has the desired identity function code gen even for the try!() macro.

It doesn't show the difference between try and ?, but it shows something we can fix to improve them both.


Edited: Update to another example code link (configurable Result type).

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Sep 28, 2017

@bluss

I think https://reviews.llvm.org/D37216 should fix this, but it's a little stuck in the LLVM review queue.

@mati865

This comment has been minimized.

Copy link
Contributor

mati865 commented Feb 1, 2018

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Feb 1, 2018

@mati865

I know that. It has a bug that I couldn't quite pin out.

@kennytm kennytm removed the E-needstest label Feb 1, 2018

@dtolnay dtolnay referenced this issue Mar 20, 2018

Merged

Support for Flattening #1179

8 of 8 tasks complete
@kornelski

This comment has been minimized.

Copy link
Contributor

kornelski commented Apr 24, 2018

@kennytm kennytm added the WG-codegen label Apr 24, 2018

@mehcode

This comment has been minimized.

Copy link
Contributor

mehcode commented Aug 31, 2018

This seems to be fixed (in stable). Tried the play link above.

; Function Attrs: noinline norecurse nounwind readnone uwtable
define { i64, i64 } @try_op(i64, i64) unnamed_addr #2 {
  %3 = tail call { i64, i64 } @try_macro(i64 %0, i64 %1) #2
  ret { i64, i64 } %3
}

try_op:
	jmp	try_macro

try_macro:
	xorl	%eax, %eax
	testq	%rdi, %rdi
	setne	%al
	movq	%rsi, %rdx
	retq

@kennytm kennytm added the E-needstest label Aug 31, 2018

@bluss

This comment has been minimized.

Copy link
Contributor

bluss commented Dec 1, 2018

You can still provoke it to make a difference and introduce conditionals or more copies for the ? operator than for the try!() macro if you change the payload types for the Result, for example using (i32, i32) or String.

type T = (i32, i32);
type E = T;
type R = Result<T, E>;

#[no_mangle]
pub fn try_op(a: R) -> R {
    Ok(a?)
}

#[no_mangle]
pub fn try_macro(a: R) -> R {
    Ok(try!(a))
}

@nikic nikic added the A-LLVM label Dec 1, 2018

@mati865

This comment has been minimized.

Copy link
Contributor

mati865 commented Apr 3, 2019

Looks like it regressed: https://godbolt.org/z/awKD_U

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.