Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSE2 jpeg decoding segfaults in mingw 32 bit builds #81

Closed
SpartanJ opened this issue Feb 17, 2015 · 20 comments
Closed

SSE2 jpeg decoding segfaults in mingw 32 bit builds #81

SpartanJ opened this issue Feb 17, 2015 · 20 comments

Comments

@SpartanJ
Copy link

Hi Sean!

I received a bug report on one of my libraries that relies on stb_image to do the image decoding. It took me a while to understand what was happening since i couldn't reproduce it, until the user mentioned that it was building for 32bits with mingw32 ( and i was building 64 bit builds ).

It seems that the jpeg SSE2 decoder is failing strangely, since if i build with VC works just fine.
Here's an image of the callstack when fails:
fail.

Here's a project to test the issue.

I've also tested to cross-compile the project from linux and it also happens. I must say that i know nothing about SSE2 so i really can't help much with the bug. And AFAIK it's a 32bit mingw only problem, no problems so far in 32 or 64 bit builds in linux, and if i remember correctly, no problems in os x neither.

Edit:
I tried compiling with "-msse -msse2" enabled but didn't help.

Regards

@nothings
Copy link
Owner

I haven't looked at your project, since I'm not set up for mingw, but:

Isn't mingw a gcc? GCC is supposed to have SIMD disabled since this version: 5c121a9

Can you try sticking #error's in around that code and see what path is being compiled?

Also, I assume your machine supports SSE2 and yet you are getting this crash in 32-bit?

For the other person watching this repo: Fabian, it doesn't seem like this is the gcc sse-enabled thing, if his machine does support SSE (and the call stack shows he's gonna into the SSE branch of the code post-CPU-detection, if I'm not mistaken). Any idea?

@rygorous
Copy link
Collaborator

I would need to know what the actual exception and actual instruction
faulting is. Source-level doesn't cut it. The only way loading a constant
would fail is due to an alignment fault, but when I'm using the _mm_set
intrinsics, an alignment fault on that data is a compiler bug!

-Fabian

On Mon, Feb 16, 2015 at 7:43 PM, Sean Barrett notifications@github.com
wrote:

I haven't looked at your project, since I'm not set up for mingw, but:

Isn't mingw a gcc? GCC is supposed to have SIMD disabled since this
version: 5c121a9
5c121a9

Can you try sticking #error's in around that code and see what path is
being compiled?

Also, I assume your machine supports SSE2 and yet you are getting this
crash in 32-bit?

For the other person watching this repo: Fabian, it doesn't seem like this
is the gcc sse-enabled thing, if his machine does support SSE (and the call
stack shows he's gonna into the SSE branch of the code post-CPU-detection,
if I'm not mistaken). Any idea?


Reply to this email directly or view it on GitHub
#81 (comment).

@rygorous
Copy link
Collaborator

(also the GCC SSE2 detection logic is broken, that stuff needs to happen
inside the x86/x64 check.)

On Mon, Feb 16, 2015 at 8:13 PM, Fabian Giesen rygorous@gmail.com wrote:

I would need to know what the actual exception and actual instruction
faulting is. Source-level doesn't cut it. The only way loading a constant
would fail is due to an alignment fault, but when I'm using the _mm_set
intrinsics, an alignment fault on that data is a compiler bug!

-Fabian

On Mon, Feb 16, 2015 at 7:43 PM, Sean Barrett notifications@github.com
wrote:

I haven't looked at your project, since I'm not set up for mingw, but:

Isn't mingw a gcc? GCC is supposed to have SIMD disabled since this
version: 5c121a9
5c121a9

Can you try sticking #error's in around that code and see what path is
being compiled?

Also, I assume your machine supports SSE2 and yet you are getting this
crash in 32-bit?

For the other person watching this repo: Fabian, it doesn't seem like
this is the gcc sse-enabled thing, if his machine does support SSE (and the
call stack shows he's gonna into the SSE branch of the code
post-CPU-detection, if I'm not mistaken). Any idea?


Reply to this email directly or view it on GitHub
#81 (comment).

@nothings
Copy link
Owner

Yep, there's already a separate bug report for that.

@SpartanJ
Copy link
Author

Sean:

  1. Yes, the SSE2 code path is being compiled and executed, since the SSS2 preprocessor is predefined, so this doesn't disable the SSE2 codepath.
  2. Yes, that's correct ( tested in 3 machines, i7, q6600, and another one ).
  3. I've found a way to use the SSE2 codepath without segfaulting, with the hints about aligment given by Fabian. If i build with -mpreferred-stack-boundary=2 or build with -mstackrealign the problem disappear. I've googled a little, and it seems a known issue:
    I've found the solution here, and some interesting info here and here.

Hope this helps, let me know if i can provide more useful information.

@nothings
Copy link
Owner

We can't be 100% sure without seeing the assembly to know for sure which instruction is failing what's going on or how to work around it. Since we can't force people to set compiler flags, unless there's a pragma to enable it in code, unless we find a workaround we'll just have to disable it in mingw entirely.

But it seems pretty compiler-broken if that stack-alignment problem only happens with mingw and not with gcc (e.g. compiling gcc -msse2 32-bit on Linux).

Also, you said you also tried "-msse -msse2", but does that mean this happened without setting those flags? In that case I'm not sure why the SSE2 #define is happening.

@nothings
Copy link
Owner

Well, reading https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838 it sounds like -mstackrealign is the fix for this in gcc 4.5 for Linux but it's not on by default so it should just potentially fail in gcc for Linux as well, but only if being called by code that wasn't compiled by gcc (or something) because they declared the Linux ABI as always having 16-byte aligned stack. So I guess in practice it works in Linux because of that (mostly), and it's going to fail in mingw (without -mstackrealign) since it probably doesn't enforce a 16-byte-aligned stack since it's running on Windows.

I guess from this: http://www.peterstock.co.uk/games/mingw_sse/ then maybe putting "attribute((force_align_arg_pointer))" on all the topmost function (or just all the SSE functions) is the mingw workaround.

@rygorous
Copy link
Collaborator

I haven't seen the disassembly, so it's hard to be sure, but I've seen a
variant of this problem before.

Basically, newer GCCs (>=4.5) by default assume (and maintain) 16-byte
alignment for the stack pointer, and VC++ doesn't. GCC will generate code
(such as SSE register spills) assuming this is true, and that code blows up
when it's not. 64-bit is not an issue because the Windows 64-bit ABI does
specify 16-byte alignment for RSP.

MinGW code calling other code compiled with MinGW always works, since ESP
stays 16-byte aligned. MinGW calling VC++ works too; the MinGW code keeps
ESP 16-byte aligned, VC++ doesn't but also doesn't care. VC++ calling
MinGW, whether across DLL boundaries or via callbacks (WindowProc etc.)
does not generally work without a fix-up. Really MinGW would need to
generate a thunk here for any functions that are externally visible or have
their address taken, but it doesn't. _beginthread etc works since their
runtime lib does the fix-up, for other code you're on your own.

Essentially, MinGW is intentionally non-conforming to the Windows ABI, in a
way that breaks if you export DLL functions or use callbacks. Which I think
is pretty shitty, but GCC playing by its own rules in that kind of
situation is pretty much par for the course.

Either way, this isn't even a MinGW problem per se. It will work if you
compile a DLL with MinGW and then only use it from MinGW. It really is a
compiler inter-op problem in a very specific use case, and I'm honestly not
sure that the fix belongs in stb_image! I would argue that it makes more
sense to either compile SOIL2 with -mstackrealign or add the
"force_align_arg_pointer" on SOILs dllexported API functions.

-Fabian

On Mon, Feb 16, 2015 at 11:21 PM, Sean Barrett notifications@github.com
wrote:

We can't be 100% sure without seeing the assembly to know for sure which
instruction is failing what's going on or how to work around it. Since we
can't force people to set compiler flags, unless there's a pragma to enable
it in code, unless we find a workaround we'll just have to disable it in
mingw entirely.

But it seems pretty compiler-broken if that stack-alignment problem only
happens with mingw and not with gcc (e.g. compiling gcc -msse2 32-bit on
Linux).

Also, you said you also tried "-msse -msse2", but does that mean this
happened without setting those flags? In that case I'm not sure why the
SSE2 #define is happening.


Reply to this email directly or view it on GitHub
#81 (comment).

@nothings
Copy link
Owner

It's a compiler interop problem that will ALWAYS happen with mingw & vc++ (or without compiler interop, just with callbacks?) when you put stb_image into a mingw-compiled library, though.

So I don't know if that's "a very specific use case". Being able to drop stb_image.h into code that gets compiled as a library seems a reasonable use case to me. That library always breaking if compiled with mingw seems worth fixing. (I agree it's still mingw's bug, though.)

@rygorous
Copy link
Collaborator

Don't know, honestly. I just have the nagging feeling that even if you add
the force_align attrs everywhere, functions will get inlined (because say
stbi_image_load only has one call site) and stuff breaks again.

You already need to change the compiler flags to get SSE2 on MinGW in the
first place (namely, add "-msse2"). To me it seems a lot saner to either
add "-mstackrealign" to that list, or just disable SSE2 on MinGW.

-Fabian

On Tue, Feb 17, 2015 at 12:01 AM, Sean Barrett notifications@github.com
wrote:

It's a compiler interop problem that will ALWAYS happen with mingw & vc++
(or without compiler interop, just with callbacks?) when you put stb_image
into a mingw-compiled library, though.

So I don't know if that's "a very specific use case". Being able to drop
stb_image.h into code that gets compiled as a library seems a reasonable
use case to me. That library always breaking if compiled with mingw seems
worth fixing. (I agree it's still mingw's bug, though.)


Reply to this email directly or view it on GitHub
#81 (comment).

@nothings
Copy link
Owner

Yeah, unfortunately header file libs inherit their compiler flags from the source file, so it doesn't make sense to tell people they need to add another flag so it won't crash. So disabling may still be the best. (And add a flag to force it on if you know what you're doing.)

@rygorous
Copy link
Collaborator

That pull request has an attempted fix. That said, I don't have a working MinGW setup either so it's untested.

SpartanJ, can you try this and check whether it fixes the problem?

Note that this patch changes stb_image to default to no SSE2 on MinGW 32-bit. You can re-enable SSE2 on MinGW using a #define (see the patch for details), but this is only safe if you also add "-mstackrealign" to your build. Does this work for you?

@SpartanJ
Copy link
Author

rygorous i'll test this after work, but i can see one problem in the patch:

#if defined(__MINGW32__) && defined(STBI_X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)

This it will also disable SIMD for x64 builds, since STBI_X86_TARGET includes x86_64, that's not necessary as far as we know.

I'll try the Qt solution, using __attribute__((force_align_arg_pointer)) as Sean mentioned. I'm more inclined to that solution ( if indeed fixes the problem ), at least for my project, i can force the code to be stack aligned with the -mstackrealign build flag, but the main problem is that people don't usually use the premake file provided by the project to generate the library, so they usually compile the project without any fix that i could provide. I know it's not my problem, since people usually don't read instructions, but i prefer to have something that works out of the box and gives you the best performance possible. If that doesn't work i'll disable it too.

Also, you said you also tried "-msse -msse2", but does that mean this happened without setting those flags? In that case I'm not sure why the SSE2 #define is happening.

I've tried both, and it's the same. The SSE2 #define is happening because it's predefined by most of the compilers i tried, look, this is GCC and Clang on Linux:

prognosys:~ # gcc -dM -E - < /dev/null | grep SS
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSE2__ 1
#define __SSE__ 1

prognosys:~ # gcc -dM -E - < /dev/null | grep SSE
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSE2__ 1
#define __SSE__ 1

The same flags are predefined in MingW on Windows, but they are not defined in MingW on Linux ( the cross compiler, i686-w64-mingw32-gcc ).

Thanks for your help!
Regards

@rygorous
Copy link
Collaborator

x64 always has SSE2 on, and your default CC on Linux is likely to be
64-bit, but 32-bit is different.

I thought MINGW32 was 32-bit specific, since there's also MINGW64.
If it's not, what are the rules?

-Fabian
On Feb 17, 2015 8:46 AM, "SpartanJ" notifications@github.com wrote:

rygorous i'll test this after work, but i can see one problem in the patch:

#if defined(MINGW32) && defined(STBI_X86_TARGET) &&
!defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)

This it will also disable SIMD for x64 builds, since STBI_X86_TARGET
includes x86_64, that's not necessary as far as we know.

I'll try the Qt solution, using attribute((force_align_arg_pointer))
as Sean mentioned. I'm more inclined to that solution ( if indeed fixes the
problem ), at least for my project, i can force the code to be stack
aligned with the -mstackrealign build flag, but the main problem is that
people don't usually use the premake file provided by the project to
generate the library, so they usually compile the project without any fix
that i could provide. I know it's not my problem, since people usually
don't read instructions, but i prefer to have something that works out of
the box and gives you the best performance possible. If that doesn't work
i'll disable it too.

Also, you said you also tried "-msse -msse2", but does that mean this
happened without setting those flags? In that case I'm not sure why the
SSE2 #define is happening.

I've tried both, and it's the same. The SSE2 #define is happening because
it's predefined by most of the compilers i tried, look, this is GCC and
Clang on Linux:

prognosys:~ # gcc -dM -E - < /dev/null | grep SS
#define SSE2_MATH 1
#define SSE_MATH 1
#define SSE2 1
#define SSE 1

prognosys:~ # gcc -dM -E - < /dev/null | grep SSE
#define SSE2_MATH 1
#define SSE_MATH 1
#define SSE2 1
#define SSE 1

The same flags are predefined in MingW on Windows, but they are not
defined in MingW on Linux ( the cross compiler, i686-w64-mingw32-gcc ).

Thanks for your help!
Regards


Reply to this email directly or view it on GitHub
#81 (comment).

@SpartanJ
Copy link
Author

x64 always has SSE2 on, and your default CC on Linux is likely to be 64-bit, but 32-bit is different.

You're right, with -m32 it does not output __SSE2__ in Linux, but, mingw with -m32 is still there, that seems odd.

I thought MINGW32 was 32-bit specific, since there's also MINGW64. If it's not, what are the rules?

It seems that __MINGW32__ is always present in 32 and 64 bits, and __MINGW64__ is defined in some places but not others ( i'm not sure why ), gcc.exe doesn't have it ( but does have __WIN64__ declared for example ), x64_64-w64-mingw32-gcc.exe does have it.

@SpartanJ
Copy link
Author

I tried forcing the align with __attribute__((force_align_arg_pointer)) and worked just fine. I'll keep that approach. See the commit log here: https://bitbucket.org/SpartanJ/soil2/commits/2c1e6f0ec72d
( see that i used !defined(__x86_64__), that's because __i386 is also declared in 64bit builds, it seems to me that's an ugly bug in mingw, it can't be intentional ).

@nothings
Copy link
Owner

This is fixed in a different way in 2.03; search for mingw in the file.

@SpartanJ
Copy link
Author

This commit also disables SSE2 for 64 bits builds and it's not necessary. We should check only for 32bits.

@nothings nothings reopened this Apr 15, 2015
@nothings
Copy link
Owner

Posted 2.04 which attempts to fix this. Let me know.

@nothings
Copy link
Owner

Closing this to make my issues list more readable. Reopen if there's a problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants