-
-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
movaps unaligned address crash on i386 #190
Comments
I think the problem is that you're allocating a structure that contains a vector, without sufficient alignment. If you'd allocate the struct e.g. on the stack instead of with malloc, it should have enough alignment. (If you'd check with some To allocate memory for such vector types, you probably need to use something like |
I think you're right. PR libvips/libvips#2144 fixes this. I was wondering, do you think this could be reproduced on Linux i386 as well? I tried this, which generates similar x86 instructions with no crashes: DetailsFROM i386/debian:bullseye
RUN apt-get update -qq && \
apt-get install -qqy --no-install-recommends clang && \
apt-get clean -y && \
rm -rf /var/lib/apt/lists/* docker build -t clang-i386 .
docker run -v $(pwd):/build -w /build -it clang-i386 \
clang -g -msse2 -mstackrealign test.c -o test-i686 diff --git a/coff-i386 b/elf32-i386
index 1111111..2222222 100644
--- a/coff-i386
+++ b/elf32-i386
@@ -1,22 +1,21 @@
-test-i686.exe: file format coff-i386
+test-i686: file format elf32-i386
Disassembly
-004013f0
+08049180
;
; 13
55 pushl %ebp
89 e5 movl %esp, %ebp
83 e4 f0 andl $-16, %esp
83 ec 40 subl $64, %esp
- e8 32 01 00 00 calll 0x401530 <___main>
c7 44 24 34 00 00 00 00 movl $0, 52(%esp)
; 15
c7 04 24 20 02 00 00 movl $544, (%esp)
- e8 90 0f 00 00 calll 0x4023a2 <_malloc>
- f2 0f 10 05 50 30 40 00 movsd 4206672, %xmm0
+ e8 a3 fe ff ff calll 0x8049040 <malloc@plt>
+ f2 0f 10 05 08 a0 04 08 movsd 134520840, %xmm0
; 14
89 44 24 30 movl %eax, 48(%esp)
; 17
@@ -34,7 +33,7 @@ Disassembly
; 22
c7 44 24 2c 00 00 00 00 movl $0, 44(%esp)
83 7c 24 2c 03 cmpl $3, 44(%esp)
- 0f 8f 44 00 00 00 jg 0x401498 <_main+0xa8>
+ 0f 8f 44 00 00 00 jg 0x8049223 <main+0xa3>
; 24
8b 44 24 30 movl 48(%esp), %eax
8b 4c 24 2c movl 44(%esp), %ecx
@@ -52,19 +51,19 @@ Disassembly
8b 44 24 2c movl 44(%esp), %eax
83 c0 01 addl $1, %eax
89 44 24 2c movl %eax, 44(%esp)
- e9 b1 ff ff ff jmp 0x401449 <_main+0x59>
+ e9 b1 ff ff ff jmp 0x80491d4 <main+0x54>
; 26
8b 44 24 30 movl 48(%esp), %eax
89 04 24 movl %eax, (%esp)
- e8 f8 0e 00 00 calll 0x40239c <_free>
+ e8 01 fe ff ff calll 0x8049030 <free@plt>
31 c0 xorl %eax, %eax
; 28
89 ec movl %ebp, %esp
5d popl %ebp
c3 retl
- cc int3
- cc int3
- cc int3
- cc int3
- cc int3
- cc int3
+ 66 90 nop
+ 66 90 nop
+ 66 90 nop
+ 66 90 nop
+ 66 90 nop
+ 90 nop (that And I had a go with https://github.com/johnsonjh/duma to force |
I also could not reproduce this with https://github.com/AFLplusplus/AFLplusplus/tree/stable/utils/libdislocator on Linux x64. I added some diff --git a/test.c b/test.c
index 1111111..2222222 100644
--- a/test.c
+++ b/test.c
@@ -1,4 +1,6 @@
#include <stdlib.h>
+#include <stdint.h>
+#include <stdio.h>
#define MAX_BANDS (64)
@@ -10,10 +12,15 @@ typedef struct _VipsCompositeBase {
v4f max_band_vec;
} VipsCompositeBase;
+#define is_aligned(POINTER, BYTE_COUNT) \
+ (((uintptr_t)(const void *)(POINTER)) % (BYTE_COUNT) == 0)
+
int main() {
VipsCompositeBase *composite =
(VipsCompositeBase *) malloc(sizeof(VipsCompositeBase));
+ printf("is aligned: %d\n", is_aligned(composite, 16));
+
composite->max_band[0] = 255;
composite->max_band[1] = 255;
composite->max_band[2] = 255; After compiling and running this on my Windows x64 PC (I don't have any i686 hardware to test this on), I noticed this: C:\Users\KleisAuke\Desktop>test-i686.exe
is aligned: 1
C:\Users\KleisAuke\Desktop>echo Exit Code is %errorlevel%
Exit Code is 0
C:\Users\KleisAuke\Desktop>test-i686.exe
is aligned: 0
C:\Users\KleisAuke\Desktop>echo Exit Code is %errorlevel%
Exit Code is -1073741819 So the first run appears to work, but any subsequent runs fails. Could this be an stack alignment bug? fwiw, I tried to remove the |
A possible workaround could be to have LLVM emit a Detailsdiff --git a/test.c b/test.c
index 1111111..2222222 100644
--- a/test.c
+++ b/test.c
@@ -1,19 +1,22 @@
#include <stdlib.h>
+#include <stdio.h>
#define MAX_BANDS (64)
-typedef float v4f __attribute__((vector_size(4 * sizeof(float))));
+typedef float v4f __attribute__((vector_size(4 * sizeof(float)), aligned(8)));
typedef struct _VipsCompositeBase {
double max_band[MAX_BANDS + 1];
v4f max_band_vec;
-} VipsCompositeBase;
+} __attribute__((aligned(8))) VipsCompositeBase;
int main() {
VipsCompositeBase *composite =
(VipsCompositeBase *) malloc(sizeof(VipsCompositeBase));
+ printf("Alignment of VipsCompositeBase is %zu\n", __alignof__(VipsCompositeBase));
+
composite->max_band[0] = 255;
composite->max_band[1] = 255;
composite->max_band[2] = 255; diff --git a/old.asm b/new.asm
index 1111111..2222222 100644
--- a/old.asm
+++ b/new.asm
@@ -14,7 +14,7 @@ Disassembly of section .text:
4013f9: e8 32 01 00 00 calll 0x401530 <___main>
4013fe: c7 44 24 34 00 00 00 00 movl $0, 52(%esp)
; /build/test.c:15
- 401406: c7 04 24 20 02 00 00 movl $544, (%esp)
+ 401406: c7 04 24 18 02 00 00 movl $536, (%esp)
40140d: e8 90 0f 00 00 calll 0x4023a2 <_malloc>
401412: f2 0f 10 05 50 30 40 00 movsd 4206672, %xmm0
; /build/test.c:14
@@ -43,11 +43,11 @@ Disassembly of section .text:
401461: f2 0f 10 04 c8 movsd (%eax,%ecx,8), %xmm0
401466: f2 0f 5a c0 cvtsd2ss %xmm0, %xmm0
; /build/test.c:23
- 40146a: 0f 28 88 10 02 00 00 movaps 528(%eax), %xmm1
+ 40146a: 0f 10 88 08 02 00 00 movups 520(%eax), %xmm1
401471: 0f 29 4c 24 10 movaps %xmm1, 16(%esp)
401476: f3 0f 11 44 94 10 movss %xmm0, 16(%esp,%edx,4)
40147c: 0f 28 44 24 10 movaps 16(%esp), %xmm0
- 401481: 0f 29 80 10 02 00 00 movaps %xmm0, 528(%eax)
+ 401481: 0f 11 80 08 02 00 00 movups %xmm0, 520(%eax)
; /build/test.c:22
401488: 8b 44 24 2c movl 44(%esp), %eax
40148c: 83 c0 01 addl $1, %eax |
Sure, adding an attribute to claim that the struct is misaligned is one way of fixing it. FWIW, most projects that work with SIMD do make sure that all allocations are aligned; most modern OSes provide some sort of aligned allocation function. Unfortunately there's not one that is available on all common platforms these days; have a look at what ffmpeg does: http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavutil/mem.c;h=cfb6d8ab8ffa6c6d51f325d4d8bbe7dc0922871f;hb=1af4885014f7d80abbd28613a2939fbcada94ecd#l84 What you did in your PR also works (and is platform independent), which approach is to be preferred is up to each project I guess. Regarding the testcase, a more telling testcase would be this: #include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include <stdalign.h>
#define MAX_BANDS (64)
typedef float v4f __attribute__((vector_size(4 * sizeof(float))));
typedef struct _VipsCompositeBase {
double max_band[MAX_BANDS + 1];
v4f max_band_vec;
} VipsCompositeBase;
#define is_aligned(POINTER, BYTE_COUNT) \
(((uintptr_t)(const void *)(POINTER)) % (BYTE_COUNT) == 0)
int main() {
printf("alignof %d\n", (int)alignof(VipsCompositeBase));
for (int i = 0; i < 100; i++) {
for (int j = 0; j < 100; j++)
malloc(1 + (rand() % 8000));
VipsCompositeBase *composite =
(VipsCompositeBase *) malloc(sizeof(VipsCompositeBase));
printf("aligned: %d\n", is_aligned(composite, alignof(VipsCompositeBase)));
}
return 0;
} Whether it returns an aligned address or not depends on the layout of the heap, earlier allocations, allocator algorithm, etc. Malloc is only guaranteed to give memory aligned enough for the natural C types, with the largest ones being With a malloc that guarantees 8 byte alignment, the returned addresses will be 16 byte aligned 50% of the time, even though that wasn't requested or guaranteed. On glibc on Ubuntu 20.04, it seems that I always get a 16-byte aligned address, even in i386 mode - so plain malloc would seem to work, even though it's not guaranteed to. But if you'd double the vector size and add |
Thanks for the detailed explanation! Indeed, using an aligned allocation function makes more sense, done with commit libvips/libvips@5ef9c84 in PR libvips/libvips#2144 (which has just landed). It's quite puzzling that in i386 mode on Linux you always get a 16-byte aligned address, which makes this kind of bugs hard to detect. 😅 |
When using GNU vector extensions, it seems that Clang generates a
movaps
instruction with an unaligned parameter, which causes a crash on Windows i386.I was able to create a reduced test case to reproduce this:
(It could possibly be related to the code, I tried several patches to fix this, with no luck)
Here are the objdump's generated with
llvm-objdump -l --disassemble-symbols=_main
for LLVM and GCC:Details
The text was updated successfully, but these errors were encountered: