Missoptimization of functions with C calling convention #33868

Closed
Florob opened this Issue May 25, 2016 · 14 comments

8 participants
Florob commented May 25, 2016

When looking at how aggregates are passed to functions, I came across the following:
The code presented below erroneously outputs 488447261 instead of 20.
This is using the 1.9.0 preview release (or anything newer) and seems to be a regression from 1.8.0.
The behaviour only occurs when both the C calling convention and optimizations are used; eliminating either factor gives the correct result.

```rust
#[repr(C)]
struct S {
    a: u32,
    b: f32,
    c: u32
}

extern "C" fn test(s: S) -> u32 {
    s.a + s.b as u32 + s.c
}

fn main() {
    // Expected: 12 + 3 + 5 = 20 (`3.4 as u32` truncates to 3).
    println!("{}", test(S { a: 12, b: 3.4, c: 5 }));
}
```
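For comparison, the expected value follows from ordinary Rust arithmetic: an `f32 as u32` cast truncates toward zero, so `3.4` becomes `3` and the sum is `12 + 3 + 5 = 20`. The sketch below computes the same thing through a function with the default Rust ABI, which the report says is unaffected. Only `S` comes from the report; `test_rust` is a hypothetical name used for illustration.

```rust
/// Same layout as the `S` in the report.
#[repr(C)]
pub struct S {
    pub a: u32,
    pub b: f32,
    pub c: u32,
}

/// Same body as the report's `test`, but with the default Rust ABI
/// (hypothetical name, for comparison only).
pub fn test_rust(s: S) -> u32 {
    // An `f32 as u32` cast truncates toward zero, so 3.4 becomes 3.
    s.a + s.b as u32 + s.c
}
```

Calling `test_rust(S { a: 12, b: 3.4, c: 5 })` yields 20, the value the `extern "C"` version should also print.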
retep998 commented May 25, 2016

Which target is this for? There are multiple calling conventions across operating systems and architectures after all.

Florob commented May 25, 2016

This is on x86_64-unknown-linux-gnu.

michaelwoerister commented May 25, 2016

I can confirm this for rustc 1.10.0-nightly (dd6e8d45e 2016-05-23)

nagisa commented May 25, 2016

Works on stable.

nagisa commented May 25, 2016

cc @eddyb: could this be caused by your call trans refactor?

nagisa commented May 25, 2016

It seems we are invoking some undefined behaviour (UB). The function ends up being optimised to:

```llvm
; Function Attrs: noinline norecurse nounwind readnone uwtable
define i32 @test({ i64, i64 }) unnamed_addr #0 {
entry-block:
  ret i32 undef
}
```

nagisa commented May 25, 2016

This is what I managed to reduce the issue to.

arielb1 commented May 25, 2016

This suffices:

```rust
#[no_mangle]
pub extern "C" fn test(s: S) -> u32 {
    s.c
}
```
Bad IR:

```llvm
define i32 @test({ i64, i64 }) unnamed_addr #0 !dbg !176 {
entry-block:
  %s = alloca %S
  %1 = bitcast %S* %s to { i64, i64 }*
  store { i64, i64 } %0, { i64, i64 }* %1, align 4
  call void @llvm.dbg.declare(metadata %S* %s, metadata !202, metadata !203), !dbg !204
  %2 = getelementptr inbounds %S, %S* %s, i32 0, i32 2
  %3 = load i32, i32* %2, !dbg !205
  ret i32 %3, !dbg !207
}
```
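In the bad IR, `getelementptr inbounds %S, %S* %s, i32 0, i32 2` selects field index 2 of `%S`, i.e. `c`. A small sketch of the `#[repr(C)]` layout makes the sizes concrete: `c` sits at byte offset 8 and the whole struct is 12 bytes with 4-byte alignment. It uses `std::mem::offset_of!`, a modern addition (stable since Rust 1.77) that was not available at the time of this issue; `field_offsets` and `size_and_align` are hypothetical helper names.

```rust
use std::mem::{align_of, offset_of, size_of};

/// Same `#[repr(C)]` struct as in the report.
#[allow(dead_code)]
#[repr(C)]
pub struct S {
    pub a: u32,
    pub b: f32,
    pub c: u32,
}

/// Byte offsets of `a`, `b`, `c`. `#[repr(C)]` keeps declaration order, and
/// three 4-byte fields need no padding between them.
pub fn field_offsets() -> [usize; 3] {
    [offset_of!(S, a), offset_of!(S, b), offset_of!(S, c)]
}

/// (size, alignment) of `S` in bytes.
pub fn size_and_align() -> (usize, usize) {
    (size_of::<S>(), align_of::<S>())
}
```

So `%S` spans exactly bytes 0..12, and the `load` of `c` reads bytes 8..12.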
arielb1 commented May 25, 2016

The old, working functionality uses a shim:

```llvm
  %1 = alloca { i64, i64 }
  store { i64, i64 } %0, { i64, i64 }* %1
  %2 = bitcast { i64, i64 }* %1 to %S*
  %3 = call i32 @_ZN4test10__rust_abiE(%S* noalias nocapture dereferenceable(12) %2)
  ret i32 %3
```
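The shim avoids the problem because its temporary has the 16-byte `{ i64, i64 }` type itself; only afterwards is the pointer reinterpreted as `%S*`, so every byte written and read stays inside the allocation. Below is a rough Rust model of that sequence, assuming the two register halves arrive as plain `u64` values. `shim_unpack` and `pack` are hypothetical names for illustration, not rustc internals.

```rust
use std::mem::size_of;
use std::ptr;

/// Same `#[repr(C)]` struct as in the report (12 bytes).
#[repr(C)]
#[derive(Debug, PartialEq)]
pub struct S {
    pub a: u32,
    pub b: f32,
    pub c: u32,
}

/// Callee side of the shim: store both 8-byte halves into a 16-byte slot
/// (like `%1 = alloca { i64, i64 }`), then read `S` from the start of that
/// slot (like the `bitcast` to `%S*`).
pub fn shim_unpack(lo: u64, hi: u64) -> S {
    let mut slot = [0u8; 16];
    slot[..8].copy_from_slice(&lo.to_ne_bytes());
    slot[8..].copy_from_slice(&hi.to_ne_bytes());
    // The slot is at least as large as S, so the read below stays in bounds.
    assert!(size_of::<S>() <= slot.len());
    // SAFETY: `slot` holds at least size_of::<S>() initialized bytes, any bit
    // pattern is a valid u32/f32, and read_unaligned tolerates the byte
    // array's 1-byte alignment.
    unsafe { ptr::read_unaligned(slot.as_ptr() as *const S) }
}

/// Caller side: copy exactly size_of::<S>() bytes of `s` into a zeroed
/// 16-byte slot and split it into two u64 halves.
pub fn pack(s: &S) -> (u64, u64) {
    let mut slot = [0u8; 16];
    // SAFETY: copies 12 bytes from a valid `S` into a 16-byte buffer.
    unsafe {
        ptr::copy_nonoverlapping(s as *const S as *const u8, slot.as_mut_ptr(), size_of::<S>());
    }
    let mut lo = [0u8; 8];
    let mut hi = [0u8; 8];
    lo.copy_from_slice(&slot[..8]);
    hi.copy_from_slice(&slot[8..]);
    (u64::from_ne_bytes(lo), u64::from_ne_bytes(hi))
}
```

A round trip through `pack` and `shim_unpack` returns the original `S`; the last 4 bytes of the slot are simply never interpreted as part of `S`.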
arielb1 commented May 25, 2016

We are storing 16 bytes of data (the `{ i64, i64 }` argument) into a 12-byte bag (the `%S` alloca), and that is killing us.
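Spelled out: `%S` is three 4-byte fields (12 bytes), while the coerced `{ i64, i64 }` argument is 16 bytes, so the `store` writes 4 bytes past the end of the `alloca`. An out-of-bounds store is undefined behaviour in LLVM, which is consistent with the body folding to `ret i32 undef` earlier in the thread. The sketch below just computes that overrun in Rust and shows a bounded copy that writes no more than the destination holds; `overrun_bytes` and `copy_cast_arg` are hypothetical helpers, and this is not claimed to be how the actual fix works.

```rust
use std::cmp::min;
use std::mem::size_of;

/// Same `#[repr(C)]` struct as in the report.
#[allow(dead_code)]
#[repr(C)]
pub struct S {
    pub a: u32,
    pub b: f32,
    pub c: u32,
}

/// Bytes a full-width `{ i64, i64 }` store would write past the end of an
/// `S`-sized slot: 16 - 12.
pub fn overrun_bytes() -> usize {
    size_of::<[u64; 2]>() - size_of::<S>()
}

/// Copies a cast argument into its destination, writing at most `dst.len()`
/// bytes instead of blindly storing all of `src`. Returns the bytes copied.
pub fn copy_cast_arg(src: &[u8], dst: &mut [u8]) -> usize {
    let n = min(src.len(), dst.len());
    dst[..n].copy_from_slice(&src[..n]);
    n
}
```

Copying a 16-byte source into a 12-byte destination this way writes exactly 12 bytes.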

nagisa commented May 25, 2016

For reference, `define i32 @test(i64 %s.coerce0, i32 %s.coerce1) #0 {` is the signature clang generates for an identical function. Alternatively, we could use `{i64, i32}` instead of `{i64, i64}`.
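Why clang ends up with `(i64, i32)`: on x86_64-unknown-linux-gnu, the System V ABI splits a small aggregate into 8-byte chunks ("eightbytes") and classifies each one. The first eightbyte holds `a: u32` (INTEGER) and `b: f32` (SSE), and when INTEGER and SSE share an eightbyte the result is INTEGER. The second holds only `c: u32`, so it is INTEGER as well but covers just 4 bytes, matching clang's `i32` for the second part. The sketch below models only that subset of the classification (no MEMORY/X87 classes, no field straddling an eightbyte boundary, aggregates of at most 16 bytes); `Class`, `merge`, and `classify` are hypothetical names, and this is not rustc's or clang's implementation.

```rust
use std::cmp::min;

/// Register class of an eightbyte (subset of the System V x86-64 classes).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Class {
    NoClass,
    Integer,
    Sse,
}

/// Merge two classes that share an eightbyte (simplified ABI merge rule).
fn merge(a: Class, b: Class) -> Class {
    match (a, b) {
        (x, y) if x == y => x,
        (Class::NoClass, y) => y,
        (x, Class::NoClass) => x,
        (Class::Integer, _) | (_, Class::Integer) => Class::Integer,
        _ => Class::Sse,
    }
}

/// Classify an aggregate of `total` bytes (<= 16) whose fields are given as
/// (byte offset, class). Returns, per eightbyte, its class and how many bytes
/// of the aggregate it covers.
pub fn classify(fields: &[(usize, Class)], total: usize) -> Vec<(Class, usize)> {
    let n = (total + 7) / 8;
    let mut classes = vec![Class::NoClass; n];
    for &(off, class) in fields {
        let i = off / 8;
        classes[i] = merge(classes[i], class);
    }
    classes
        .into_iter()
        .enumerate()
        .map(|(i, c)| (c, min(8, total - 8 * i)))
        .collect()
}
```

For `S`, `classify(&[(0, Class::Integer), (4, Class::Sse), (8, Class::Integer)], 12)` gives two INTEGER eightbytes covering 8 and 4 bytes, i.e. an `i64` followed by an `i32`, whereas `{ i64, i64 }` claims 16 bytes.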

nikomatsakis commented May 25, 2016

@arielb1, what was the signature of the shim function? Never mind.

nikomatsakis added the P-high label May 25, 2016

eddyb commented May 25, 2016

Ah, this is the problem @dotdash was aware of, isn't it?
That is, none of our current system handles cast arguments/returns of different sizes entirely correctly, but some of the old code was long-winded enough to avoid the problem.

EDIT: That was actually more about calls than definitions, and I can't get calls to misbehave (AFAICT we do the same thing there that the old code does). One thing we might need to look into is MIR trans not doing a memcpy for cast return values.

alexcrichton commented May 25, 2016

The core team discussed this issue today and our conclusions were as follows:

  • Definitely seems like something we should fix immediately, hence the P-high.
  • We're very likely to backport the fix to beta (unless it turns out to require a compiler refactor, which seems unlikely)
  • We're moving forward with the 1.9 release tomorrow, as this has been on nightly for at least 6 weeks without being discovered. That's at least a signal that the breakage isn't major enough to warrant blocking or delaying the entire release.
  • Once the fix is backported to beta and has some testing on the beta branch we can consider whether to make a point release (e.g. 1.9.1). If this bug is widespread in practice then we can definitely do that, but if the impact continues to be low we may not do that.

bors closed this in 97bf80f May 26, 2016

alexcrichton added a commit to alexcrichton/rust that referenced this issue May 26, 2016

srinivasreddy added a commit to srinivasreddy/rust that referenced this issue May 28, 2016

srinivasreddy added a commit to srinivasreddy/rust that referenced this issue May 29, 2016
