Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unfortunate optimization with Option<bool> vs Option<uint> #20149

Closed
alexcrichton opened this issue Dec 23, 2014 · 11 comments
Closed

Unfortunate optimization with Option<bool> vs Option<uint> #20149

alexcrichton opened this issue Dec 23, 2014 · 11 comments
Labels
A-codegen Area: Code generation

Comments

@alexcrichton
Copy link
Member

Currently, given this program:

#[inline(never)]
fn get_bool(slot: &mut Option<bool>) -> &mut bool {
    if slot.is_none() {
        *slot = Some(true);
    }
    slot.as_mut().unwrap()
}

#[inline(never)]
fn get_uint(slot: &mut Option<uint>) -> &mut uint {
    if slot.is_none() {
        *slot = Some(1);
    }
    slot.as_mut().unwrap()
}

fn main() {
    println!("{}", get_bool(&mut None));
    println!("{}", get_uint(&mut None));
}

It generates this IR.

The "unfortunate" part here is that a call to failure is generated for the get_bool function where no failure is generated for the get_uint function.

There's a lot going on here so I'm not sure if it's our codegen or LLVM's codegen, but I would personally expect both cases to optimize down to never unwinding because it's known that neither Option can ever be None.

@alexcrichton
Copy link
Member Author

cc @luqmana, @dotdash, @Aatch

@alexcrichton alexcrichton added the A-codegen Area: Code generation label Dec 23, 2014
@dotdash
Copy link
Contributor

dotdash commented Dec 23, 2014

Seems to happen for all integer types smaller than int/uint

@dotdash
Copy link
Contributor

dotdash commented Dec 23, 2014

Noticable difference is that the uint version does two independent stores, one for the discriminant and one for the value itself, but the versions with the smaller data types have a store for the whole Option value at once:

  store %"enum.core::option::Option<[bool]>[#3]" { i8 1, [0 x i8] undef, [1 x i8] c"\01" }, %"enum.core::option::Option<[bool]>[#3]"* %0, align 1

And I guess the pass responsible for optimizing the icmp away can't handle that.

Since we originally emit independent stores, I guess this should be handled in LLVM upstream.

@dotdash
Copy link
Contributor

dotdash commented Dec 23, 2014

OTOH I can't reproduce the problem with clang. So maybe we could do something about the way unions are represented in LLVM types.

@dotdash
Copy link
Contributor

dotdash commented Dec 23, 2014

OK, it's something completely different. The assignment happens via store for Option<bool> etc. but via memcpy for Option<uint>. Only the memcpy gets optimized.

@Aatch
Copy link
Contributor

Aatch commented Dec 23, 2014

Because Option<bool> is only two bytes, it's an immediate. Hence why we don't use memcpy to copy it. I don't know why LLVM can optimize one and not the other though. All I can think is that the ordering of the passes is such that it combines the stores for the bool case before LLVM realises that the panic case is impossible.

@Aatch
Copy link
Contributor

Aatch commented Dec 23, 2014

There is a comment in expr.rs that seems suspicious. Commenting out the line causes get_bool to optimised as expected. Since it references morestack, I'm curious if the comment is still accurate today, as it suggests that it may be related to segmented stacks.

Maybe whoever wrote the comment could chime in? The library split has made it difficult to find out who though.

@Aatch
Copy link
Contributor

Aatch commented Dec 23, 2014

This commit added the comment in question: ee4ba449

@nikomatsakis can you confirm that it was related to segmented stacks?

@dotdash
Copy link
Contributor

dotdash commented Jan 2, 2015

http://reviews.llvm.org/D6829 -- This teaches LLVM to handle the aggregate and extract the loaded value from it, allowing it to optimize the code even without #20161. It also handles the example I presented there where the deferred load breaks an optimization.

@alexcrichton
Copy link
Member Author

Wow, thanks so much for looking into this @dotdash!

dotdash added a commit to dotdash/rust that referenced this issue Jan 11, 2015
Currently, small aggregates are passed to functions as immediate values
as is. This has two consequences.

One is that aggregates are passed component-wise by LLVM, so e.g. a
struct containing four u8 values (e.g. an RGBA struct) will be passed as
four individual values.

The other is that LLVM isn't very good at optimizing loads/stores of
first class attributes. What clang does is converting the aggregate to
an appropriately sized integer type (e.g. i32 for the four u8 values),
and using that for the function argument. This allows LLVM to create
code that is a lot better.

Fixes rust-lang#20450 rust-lang#20149 rust-lang#16506 rust-lang#13927
bors added a commit that referenced this issue Jan 11, 2015
Currently, small aggregates are passed to functions as immediate values
as is. This has two consequences.
    
One is that aggregates are passed component-wise by LLVM, so e.g. a
struct containing four u8 values (e.g. an RGBA struct) will be passed as
four individual values.
    
The other is that LLVM isn't very good at optimizing loads/stores of
first class attributes. What clang does is converting the aggregate to
an appropriately sized integer type (e.g. i32 for the four u8 values),
and using that for the function argument. This allows LLVM to create
code that is a lot better.
    
Fixes #20450 #20149 #16506 #13927
bors added a commit that referenced this issue Jan 11, 2015
Currently, small aggregates are passed to functions as immediate values
as is. This has two consequences.
    
One is that aggregates are passed component-wise by LLVM, so e.g. a
struct containing four u8 values (e.g. an RGBA struct) will be passed as
four individual values.
    
The other is that LLVM isn't very good at optimizing loads/stores of
first class attributes. What clang does is converting the aggregate to
an appropriately sized integer type (e.g. i32 for the four u8 values),
and using that for the function argument. This allows LLVM to create
code that is a lot better.
    
Fixes #20450 #20149 #16506 #13927
@huonw
Copy link
Member

huonw commented Jan 11, 2015

Closed by #20755.

@huonw huonw closed this as completed Jan 11, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants