Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid assembly generated at -O3 #15

Closed
fxcoudert opened this issue Aug 31, 2020 · 5 comments
Closed

Invalid assembly generated at -O3 #15

fxcoudert opened this issue Aug 31, 2020 · 5 comments

Comments

@fxcoudert
Copy link
Contributor

Invalid assembly generated at -O3:

$ cat sound.c 
void toto (char audio[8192]) {
  int i;

  for (i = 0; i < 4095; i++)
    audio[i] = i / 8;
}
$ ./bin/gcc sound.c -c -O2
$ ./bin/gcc sound.c -c -O3
/var/folders/5b/7mrgq6j55b51p7ztj1r3gv400000gq/T//ccPBI4WH.s:36:14: error: immediate must be an integer in range [0, 255].
        movi    v0.8b, 0xfffffffffffffffe
                       ^

Reduced from gcc.c-torture/compile/sound.c
Diff of the generated assembly:

--- O2.s	2020-08-31 13:58:03.000000000 +0200
+++ O3.s	2020-08-31 13:58:12.000000000 +0200
@@ -6,16 +6,48 @@
 _toto:
 LFB0:
 	.cfi_startproc
-	mov	x1, 0
+	adrp	x1, lC0@GOTPAGE
+	add	x1, x1, lC0@PAGEOFF;momd
+	movi	v17.4s, 0x10
+	add	x2, x0, 4080
+	movi	v16.4s, 0x4
+	movi	v7.4s, 0x8
+	movi	v6.4s, 0xc
+	ldr	q2, [x1]
+	mov	x1, x0
 	.p2align 3,,7
 L2:
-	asr	w2, w1, 3
-	strb	w2, [x0, x1]
-	add	x1, x1, 1
-	cmp	x1, 4095
+	mov	v0.16b, v2.16b
+	add	v2.4s, v2.4s, v17.4s
+	add	v5.4s, v0.4s, v16.4s
+	add	v3.4s, v0.4s, v6.4s
+	add	v4.4s, v0.4s, v7.4s
+	xtn	v1.4h, v0.4s
+	xtn2	v1.8h, v5.4s
+	xtn	v0.4h, v4.4s
+	xtn2	v0.8h, v3.4s
+	ushr	v1.8h, v1.8h, 3
+	ushr	v0.8h, v0.8h, 3
+	xtn	v3.8b, v1.8h
+	xtn2	v3.16b, v0.8h
+	str	q3, [x1], 16
+	cmp	x2, x1
 	bne	L2
+	movi	v0.8b, 0xfffffffffffffffe
+	mov	w1, -1
+	str	w1, [x0, 4088]
+	strh	w1, [x0, 4092]
+	strb	w1, [x0, 4094]
+	str	d0, [x0, 4080]
 	ret
 	.cfi_endproc
 LFE0:
+	.const
+	.align	4
+lC0:
+	.word	0
+	.word	1
+	.word	2
+	.word	3
 	.ident	"GCC: (GNU) 11.0.0 20200829 (experimental)"
 	.subsections_via_symbols
@iains
Copy link
Owner

iains commented Aug 31, 2020

Invalid assembly generated at -O3:

$ cat sound.c 
void toto (char audio[8192]) {
  int i;

  for (i = 0; i < 4095; i++)
    audio[i] = i / 8;
}
$ ./bin/gcc sound.c -c -O2
$ ./bin/gcc sound.c -c -O3
/var/folders/5b/7mrgq6j55b51p7ztj1r3gv400000gq/T//ccPBI4WH.s:36:14: error: immediate must be an integer in range [0, 255].
        movi    v0.8b, 0xfffffffffffffffe
                       ^

I think there are a few instances of this in the test suite fails. I wonder if it's because Darwin's char is signed (where many platforms [most?] are unsigned) - so perhaps a masking operation is missing?, dunno, just speculating.

@fxcoudert
Copy link
Contributor Author

Not sure I get what you mean, but changing the char to unsigned char (or signed char) in the source does not fix the issue.

@iains
Copy link
Owner

iains commented Aug 31, 2020

i / 8 is signed

if looks like the output has done something like / equivalent to...

printf ... %d ... (int) ((char) i / 8)

what happens if you do
audio[i] = (unsigned char) i / 8;

?

@fxcoudert
Copy link
Contributor Author

If i is made unsigned char, or cast to that type as you showed, then the error goes away.

@iains
Copy link
Owner

iains commented Sep 2, 2020

should be fixed by
efda199

@iains iains closed this as completed Sep 2, 2020
iains pushed a commit that referenced this issue Jun 18, 2021
The fixed error is:

==21166==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x60300000d900
    #0 0x7367d7 in operator delete(void*, unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:172
    #1 0x3b82e6e in pointer_equiv_analyzer::~pointer_equiv_analyzer() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:161
    #2 0x3b83387 in hybrid_folder::~hybrid_folder() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:517
    #3 0x3b83387 in execute_early_vrp /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:686
    #4 0x1790611 in execute_one_pass(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2567
    #5 0x1792003 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2656
    #6 0x1792029 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2657
    #7 0x179209f in execute_pass_list(function*, opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2667
    #8 0x178a5f3 in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:1773
    #9 0x1792fac in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/plugin.h:191
    #10 0x1792fac in execute_ipa_pass_list(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:3001
    #11 0xc525fc in ipa_passes /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2154
    #12 0xc525fc in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2289
    #13 0xc5a096 in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2269
    #14 0xc5a096 in symbol_table::finalize_compilation_unit() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2537
    #15 0x1a7a17c in compile_file /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:482
    #16 0x69c758 in do_compile /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2210
    #17 0x69c758 in toplev::main(int, char**) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2349
    #18 0x6a932a in main /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/main.c:39
    #19 0x7ffff7820b34 in __libc_start_main ../csu/libc-start.c:332
    #20 0x6aa5fd in _start (/home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/objdir/gcc/cc1+0x6aa5fd)

0x60300000d900 is located 0 bytes inside of 32-byte region [0x60300000d900,0x60300000d920)
allocated by thread T0 here:
    #0 0x735ab7 in operator new[](unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:102
    #1 0x3b82dac in pointer_equiv_analyzer::pointer_equiv_analyzer(gimple_ranger*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:156

gcc/ChangeLog:

	* gimple-ssa-evrp.c (pointer_equiv_analyzer::~pointer_equiv_analyzer): Use delete[].
iains pushed a commit that referenced this issue May 26, 2023
This patch adds support for xstormy16's swap nibbles instruction (swpn).
For the test case:

short foo(short x) {
  return (x&0xff00) | ((x<<4)&0xf0) | ((x>>4)&0x0f);
}

GCC with -O2 currently generates the nine instruction sequence:
foo:    mov r7,r2
        asr r2,#4
        and r2,#15
        mov.w r6,#-256
        and r6,r7
        or r2,r6
        shl r7,#4
        and r7,#255
        or r2,r7
        ret

with this patch, we now generate:
foo:	swpn r2
	ret

To achieve this using combine's four instruction "combinations" requires
a little wizardry.  Firstly, define_insn_and_split are introduced to
treat logical shifts followed by bitwise-AND as macro instructions that
are split after reload.  This is sufficient to recognize a QImode
nibble swap, which can be implemented by swpn followed by either a
zero-extension or a sign-extension from QImode to HImode.  Then finally,
in the correct context, a QImode swap-nibbles pattern can be combined to
preserve the high-byte of a HImode word, matching the xstormy16's swpn
semantics.  The naming of the new code iterators is taken from i386.md.

2023-04-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/stormy16/stormy16.md (any_lshift): New code iterator.
	(any_or_plus): Likewise.
	(any_rotate): Likewise.
	(*<any_lshift>_and_internal): New define_insn_and_split to
	recognize a logical shift followed by an AND, and split it
	again after reload.
	(*swpn): New define_insn matching xstormy16's swpn.
	(*swpn_zext): New define_insn recognizing swpn followed by
	zero_extendqihi2, i.e. with the high byte set to zero.
	(*swpn_sext): Likewise, for swpn followed by cbw.
	(*swpn_sext_2): Likewise, for an alternate RTL form.
	(*swpn_zext_ior): A pre-reload splitter so that an swpn+zext+ior
	sequence is split in the correct place to recognize the *swpn_zext
	followed by any_or_plus (ior, xor or plus) instruction.

gcc/testsuite/ChangeLog
	* gcc.target/xstormy16/swpn-1.c: New QImode test case.
	* gcc.target/xstormy16/swpn-2.c: New zero_extend test case.
	* gcc.target/xstormy16/swpn-3.c: New sign_extend test case.
	* gcc.target/xstormy16/swpn-4.c: New HImode test case.
iains pushed a commit that referenced this issue May 26, 2023
This patch contains some minor tweak to xstormy16's machine description
most significantly providing a pattern for HImode rotate left by a single
bit that requires only two instructions.

unsigned short foo(unsigned short x)
{
  return (x << 1) | (x >> 15);
}

currently with -O2 generates:
foo:    mov r7,r2
        shr r7,#15
        shl r2,#1
        or r2,r7
        ret

with this patch, GCC now generates:
foo:	shl r2,#1 | adc r2,#0
        ret

Additionally neghi2 is converted to a define_insn (so that the RTL
optimizers see the negation semantics), and HImode rotations by
8-bits can now be recognized and implemented using swpb.

2023-04-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/stormy16/stormy16.md (neghi2): Convert from a define_expand
	to a define_insn.
	(*rotatehi_1): New define_insn for efficient 2 insn sequence.
	(*rotatehi_8, *rotaterthi_8): New define_insn to emit a swpb.

gcc/testsuite/ChangeLog
	* gcc.target/xstormy16/neghi2.c: New test case.
	* gcc.target/xstormy16/rotatehi-1.c: Likewise.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants