Suboptimal code for struct of floats #1842

Open
JohanEngelen opened this issue Oct 19, 2016 · 2 comments

@JohanEngelen

struct Vector { float x, y, z; }
float silly(Vector v) { return v.x * 5; }
float better(float x) { return x * 5; }

results in (-O3):

float example.silly(example.Vector):
        movd    %xmm0, %rax
        movd    %eax, %xmm0
        mulss   LCPI2_0(%rip), %xmm0
        retq

float example.better(float):
        mulss   LCPI3_0(%rip), %xmm0
        retq

Is this a missed optimization opportunity, or am I missing something?

@kinke

kinke commented Oct 19, 2016

First of all, this is most likely specific to the x86_64 System V ABI. I tend to blame DMD's toArgTypes() for this: it rewrites the 3 floats as 2 doubles in order to pass the struct in XMM registers. I guess 4 floats would be more adequate and would help LLVM. Unoptimized IR:

define float @_D7current5sillyFS7current6VectorZf({ double, double } %v_arg) #0 comdat {
  %.X86_64_C_struct_rewrite_dump = alloca { double, double }, align 4 ; [#uses = 2, size/byte = 16]
  store { double, double } %v_arg, { double, double }* %.X86_64_C_struct_rewrite_dump
  %v = bitcast { double, double }* %.X86_64_C_struct_rewrite_dump to %current.Vector* ; [#uses = 1]
  %1 = getelementptr inbounds %current.Vector, %current.Vector* %v, i32 0, i32 0 ; [#uses = 1, type = float*]
  %2 = load float, float* %1                      ; [#uses = 1]
  %3 = fmul float %2, 5.000000e+00                ; [#uses = 1]
  ret float %3
}

-O3:

define float @_D7current5sillyFS7current6VectorZf({ double, double } %v_arg) local_unnamed_addr #2 comdat {
  %v_arg.fca.0.extract = extractvalue { double, double } %v_arg, 0 ; [#uses = 1]
  %1 = bitcast double %v_arg.fca.0.extract to i64 ; [#uses = 1]
  %2 = trunc i64 %1 to i32                        ; [#uses = 1]
  %3 = bitcast i32 %2 to float                    ; [#uses = 1]
  %4 = fmul float %3, 5.000000e+00                ; [#uses = 1]
  ret float %4
}
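
The optimized IR above reinterprets the first double of the pair as a 64-bit integer, truncates it to the low 32 bits, and reinterprets those bits as a float, which corresponds to the two movd instructions in the generated assembly. A minimal C sketch of the same bit manipulation follows, assuming a little-endian target such as x86_64. The helper names pack_two_floats and low_float_of are hypothetical and exist only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: place two floats in one 8-byte slot, the way the
   first eightbyte of the rewritten struct holds x and y. */
double pack_two_floats(float lo, float hi) {
    float pair[2] = { lo, hi };
    double slot;
    memcpy(&slot, pair, sizeof slot);
    return slot;
}

/* Mirrors the -O3 IR: bitcast double -> i64, trunc -> i32, bitcast -> float.
   Assumes little-endian layout, so the low 32 bits hold the first float. */
float low_float_of(double slot) {
    uint64_t bits64;
    uint32_t bits32;
    float result;
    memcpy(&bits64, &slot, sizeof bits64);
    bits32 = (uint32_t)bits64;  /* keep only the low 32 bits */
    memcpy(&result, &bits32, sizeof result);
    return result;
}
```

Under these assumptions, low_float_of(pack_two_floats(x, y)) gives back x, so the round trip through the general-purpose register does no useful work when the float already sits in the low lane of xmm0.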

@dnadlinger

Clang rewrites it as (<2 x float> %f.coerce0, float %f.coerce1), completely flattening away the struct.
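
To reproduce the Clang side of the comparison, a C equivalent of the D snippet from the top of the issue can be compiled with `clang -O3 -S -emit-llvm` and the parameter types inspected. The C translation below is an assumption made for illustration, not code taken from the issue.

```c
/* C translation of the D example: a 12-byte struct of three floats. */
struct Vector { float x, y, z; };

/* Struct passed by value; on x86_64 System V it spans two eightbytes. */
float silly(struct Vector v) { return v.x * 5; }

/* Scalar version for reference. */
float better(float x) { return x * 5; }
```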
