New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fastcomp: extractvalue generates unaligned loads #3070

Closed
waywardmonkeys opened this Issue Dec 10, 2014 · 12 comments

Comments

Projects
None yet
4 participants
@waywardmonkeys
Contributor

waywardmonkeys commented Dec 10, 2014

Here's a pattern that pops up in code from embind:

 $method = $method | 0;
 $wireThis = $wireThis | 0;
 var $$01$i$i = 0, $$field = 0, $$field2 = 0, $10 = 0, $14 = 0, $15 = 0, $22 = 0, $24 = 0, $25 = 0, $28 = 0, $34 = 0, $35 = 0, $40 = 0, $42 = 0, $45 = 0, $46 = 0, $51 = 0, $8 = 0, $9 = 0, $x$01$i$i$i$i$i$i$i$i = 0, sp = 0;
 sp = STACKTOP;
 $$field = HEAPU8[$method >> 0] | HEAPU8[$method + 1 >> 0] << 8 | HEAPU8[$method + 2 >> 0] << 16 | HEAPU8[$method + 3 >> 0] << 24;
 $$field2 = HEAPU8[$method + 4 >> 0] | HEAPU8[$method + 5 >> 0] << 8 | HEAPU8[$method + 6 >> 0] << 16 | HEAPU8[$method + 7 >> 0] << 24;
 if (!($$field2 & 1)) $8 = $$field; else $8 = HEAP32[(HEAP32[$wireThis + ($$field2 >> 1) >> 2] | 0) + $$field >> 2] | 0;
 $9 = FUNCTION_TABLE_ii[$8 & 2047]($wireThis + ($$field2 >> 1) | 0) | 0;
 ...

We can see that $$field and $$field2 are unaligned reads.

That's from this C++:

        template<typename MemberPointer,
                 typename ReturnType,
                 typename ThisType,
                 typename... Args>
        struct MethodInvoker {
            static typename internal::BindingType<ReturnType>::WireType invoke(
                const MemberPointer& method,
                typename internal::BindingType<ThisType>::WireType wireThis,
                typename internal::BindingType<Args>::WireType... args
            ) {
                return internal::BindingType<ReturnType>::toWireType(
                    (internal::BindingType<ThisType>::fromWireType(wireThis)->*method)(
                        internal::BindingType<Args>::fromWireType(args)...
                    )
                );
            }
        };

And this is the LLVM IR involved:

  %1 = load { i32, i32 }* %method, align 4, !tbaa !22
  %2 = extractvalue { i32, i32 } %1, 1
  %3 = ashr i32 %2, 1
  %4 = bitcast %"class.northstar::PhysicsSkeletonComponent"* %wireThis to i8*
  %5 = getelementptr inbounds i8* %4, i32 %3
  %6 = bitcast i8* %5 to %"class.northstar::PhysicsSkeletonComponent"*
  %7 = extractvalue { i32, i32 } %1, 0
  %8 = and i32 %2, 1
  %9 = icmp eq i32 %8, 0
  br i1 %9, label %16, label %10

Now, the old (non-fastcomp) compiler pretended everything in an extractvalue was aligned:

              case 'extractvalue': { // XXX we assume 32-bit alignment in extractvalue/insertvalue,
                                     // but in theory they can run on packed structs too (see use getStructuralTypePartBits)

However, in fastcomp, there is the ExpandStructRegs pass in lib/Transforms/NaCl/ExpandStructRegs.cpp and it throws away the alignment in ProcessLoadOrStoreAttrs:

  // Make a pessimistic assumption about alignment.  Preserving
  // alignment information here is tricky and is not really desirable
  // for PNaCl because mistakes here could lead to non-portable
  // behaviour.
  Dest->setAlignment(1);

All of that said ... if I try to preserve the alignment by modifying the above to be:

  Dest->setAlignment(Src->getAlignment());

Then it ends up generating this JS for the $$field and $$field2 loads:

 $$field = HEAP32[$method >> 2] | 0;
 $$field2 = HEAP32[$method + 4 >> 2] | 0;
@waywardmonkeys

This comment has been minimized.

Show comment
Hide comment
@waywardmonkeys
Contributor

waywardmonkeys commented Dec 10, 2014

@waywardmonkeys

This comment has been minimized.

Show comment
Hide comment
@waywardmonkeys

waywardmonkeys Dec 10, 2014

Contributor

The code that emits this stuff in the first place for a member function pointer load is here:

https://github.com/kripken/emscripten-fastcomp-clang/blob/562a45b54cb36f20c87b20fe035a66130d740bf7/lib/CodeGen/ItaniumCXXABI.cpp#L265

Contributor

waywardmonkeys commented Dec 10, 2014

The code that emits this stuff in the first place for a member function pointer load is here:

https://github.com/kripken/emscripten-fastcomp-clang/blob/562a45b54cb36f20c87b20fe035a66130d740bf7/lib/CodeGen/ItaniumCXXABI.cpp#L265

@waywardmonkeys

This comment has been minimized.

Show comment
Hide comment
@waywardmonkeys

waywardmonkeys Dec 10, 2014

Contributor

In the IMVU codebase, this change to use aligned loads appears to result in an 80k size reduction in the minified build and nearly 200k in the -g3 debug build.

Contributor

waywardmonkeys commented Dec 10, 2014

In the IMVU codebase, this change to use aligned loads appears to result in an 80k size reduction in the minified build and nearly 200k in the -g3 debug build.

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Dec 11, 2014

Owner

I don't think we can just assume full alignment, but taking into account the struct's alignment plus the field's alignment inside it should give the right result. I don't remember how to do that in llvm, but there is a method that does this, I am fairly sure. Likely @sunfishcode would know off the top of his head.

Owner

kripken commented Dec 11, 2014

I don't think we can just assume full alignment, but taking into account the struct's alignment plus the field's alignment inside it should give the right result. I don't remember how to do that in llvm, but there is a method that does this, I am fairly sure. Likely @sunfishcode would know off the top of his head.

@chadaustin

This comment has been minimized.

Show comment
Hide comment
@chadaustin

chadaustin Dec 11, 2014

Collaborator

Isn't there an option to assume full alignment? The old compiler had an option like that, iirc. It's undefined behavior in C to rely on unaligned reads so, at least for our use case, I would prefer ALL memory operations have full alignment.

Collaborator

chadaustin commented Dec 11, 2014

Isn't there an option to assume full alignment? The old compiler had an option like that, iirc. It's undefined behavior in C to rely on unaligned reads so, at least for our use case, I would prefer ALL memory operations have full alignment.

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Dec 11, 2014

Owner

There isn't currently an option for force full alignment in fastcomp, sounds ok to me to add one.

Owner

kripken commented Dec 11, 2014

There isn't currently an option for force full alignment in fastcomp, sounds ok to me to add one.

@sunfishcode

This comment has been minimized.

Show comment
Hide comment
@sunfishcode

sunfishcode Dec 11, 2014

Collaborator

Offfhand, I don't think Emscripten on asmjs-unknown-emscripten ever needs to discard alignment information, in contrast to the needs of PNaCl which does. I think we can do "full alignment" by default.

Collaborator

sunfishcode commented Dec 11, 2014

Offfhand, I don't think Emscripten on asmjs-unknown-emscripten ever needs to discard alignment information, in contrast to the needs of PNaCl which does. I think we can do "full alignment" by default.

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Dec 11, 2014

Owner

Yes, we don't need to discard alignment information, as we can depend on JS portability, unlike PNaCl. We should preserve alignment information here, as mentioned above.

However, I don't think we can do full alignment by default. While undefined behavior, it would break many real-world apps, e.g. Cube 2/BananaBread. But again, adding an option for full alignment sounds fine.

Owner

kripken commented Dec 11, 2014

Yes, we don't need to discard alignment information, as we can depend on JS portability, unlike PNaCl. We should preserve alignment information here, as mentioned above.

However, I don't think we can do full alignment by default. While undefined behavior, it would break many real-world apps, e.g. Cube 2/BananaBread. But again, adding an option for full alignment sounds fine.

@waywardmonkeys

This comment has been minimized.

Show comment
Hide comment
@waywardmonkeys

waywardmonkeys Feb 26, 2015

Contributor

Can we talk about this again?

Contributor

waywardmonkeys commented Feb 26, 2015

Can we talk about this again?

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Feb 27, 2015

Owner

Sure. Is there new data or other info? Re-reading back here, I would be interested to see whether Cube 2 does in fact break with full alignment. I'm 99% sure it would, but maybe I am missing something. That could easily change my position here.

Owner

kripken commented Feb 27, 2015

Sure. Is there new data or other info? Re-reading back here, I would be interested to see whether Cube 2 does in fact break with full alignment. I'm 99% sure it would, but maybe I am missing something. That could easily change my position here.

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Mar 20, 2015

Owner

The pnacl pass code looks fixed on the merge-pnacl-mar-13-2015 branches, so we might see this fixed next week or so if those merge to incoming.

Owner

kripken commented Mar 20, 2015

The pnacl pass code looks fixed on the merge-pnacl-mar-13-2015 branches, so we might see this fixed next week or so if those merge to incoming.

@waywardmonkeys

This comment has been minimized.

Show comment
Hide comment
@waywardmonkeys

waywardmonkeys Apr 13, 2015

Contributor

Oddly, while the LLVM IR still has some extractvalues without alignment, it seems like the generated JS is much better:

function __ZN10emscripten8internal13MethodInvokerIMN9northstar6CanvasEFvRKN4gmtl3VecIiLj2EEEEvPS3_JS8_EE6invokeERKSA_SB_PS6_(i3, i1, i2) {
 i3 = i3 | 0;
 i1 = i1 | 0;
 i2 = i2 | 0;
 var i4 = 0;
 i4 = HEAP32[i3 >> 2] | 0;
 i3 = HEAP32[i3 + 4 >> 2] | 0;
 if (i3 & 1) i4 = HEAP32[(HEAP32[i1 + (i3 >> 1) >> 2] | 0) + i4 >> 2] | 0;
 FUNCTION_TABLE_vii[i4 & 1023](i1 + (i3 >> 1) | 0, i2);
 return;
}

Closing.

Contributor

waywardmonkeys commented Apr 13, 2015

Oddly, while the LLVM IR still has some extractvalues without alignment, it seems like the generated JS is much better:

function __ZN10emscripten8internal13MethodInvokerIMN9northstar6CanvasEFvRKN4gmtl3VecIiLj2EEEEvPS3_JS8_EE6invokeERKSA_SB_PS6_(i3, i1, i2) {
 i3 = i3 | 0;
 i1 = i1 | 0;
 i2 = i2 | 0;
 var i4 = 0;
 i4 = HEAP32[i3 >> 2] | 0;
 i3 = HEAP32[i3 + 4 >> 2] | 0;
 if (i3 & 1) i4 = HEAP32[(HEAP32[i1 + (i3 >> 1) >> 2] | 0) + i4 >> 2] | 0;
 FUNCTION_TABLE_vii[i4 & 1023](i1 + (i3 >> 1) | 0, i2);
 return;
}

Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment