Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling a function with structs leads to wrong code at -O2 #10

Closed
fxcoudert opened this issue Aug 31, 2020 · 6 comments
Closed

Calling a function with structs leads to wrong code at -O2 #10

fxcoudert opened this issue Aug 31, 2020 · 6 comments

Comments

@fxcoudert
Copy link
Contributor

Taken from gcc.c-torture/execute/20000603-1.c, fails at -O2 and higher:

$ cat 20000603-1.c 
#include <stdlib.h>
#include <stdio.h>

struct s1 { double d; };
struct s2 { double d; };
union u { struct s1 x; struct s2 y; };

double f(struct s1 *a, struct s2 *b)
{
  a->d = 1.0;
  return b->d + 1.0;
}

int main()
{
  union u a;
  a.x.d = 0.0;
  printf("%g\n", f (&a.x, &a.y));
  if (f (&a.x, &a.y) != 2.0)
    abort ();
  return 0;
}
$ ./bin/gcc 20000603-1.c -O1 && ./a.out
2
$ ./bin/gcc 20000603-1.c -O2 && ./a.out
1
@fxcoudert
Copy link
Contributor Author

diff between the generated assembly at -O1 and -O2 (commenting out the printf statement):

--- O1.s	2020-08-31 11:38:20.000000000 +0200
+++ O2.s	2020-08-31 11:38:44.000000000 +0200
@@ -1,19 +1,21 @@
 	.arch armv8-a
 	.text
 	.align	2
+	.p2align 4,,11
 	.globl _f
 _f:
 LFB9:
 	.cfi_startproc
+	ldr	d1, [x1]
 	fmov	d0, 1.0e+0
 	str	d0, [x0]
-	ldr	d1, [x1]
 	fadd	d0, d1, d0
 	ret
 	.cfi_endproc
 LFE9:
 	.section __TEXT,__text_startup,regular,pure_instructions
 	.align	2
+	.p2align 4,,11
 	.globl _main
 _main:
 LFB10:
@@ -23,13 +25,13 @@ LFB10:
 	.cfi_offset 29, -32
 	.cfi_offset 30, -24
 	mov	x29, sp
+	add	x1, sp, 24
+	mov	x0, x1
 	str	xzr, [sp, 24]
-	add	x0, sp, 24
-	mov	x1, x0
 	bl	_f
 	fmov	d1, 2.0e+0
 	fcmp	d0, d1
-	bne	L5
+	bne	L6
 	mov	w0, 0
 	ldp	x29, x30, [sp], 32
 	.cfi_remember_state
@@ -37,7 +39,7 @@ LFB10:
 	.cfi_restore 29
 	.cfi_def_cfa_offset 0
 	ret
-L5:
+L6:
 	.cfi_restore_state
 	bl	_abort
 	.cfi_endproc

@iains
Copy link
Owner

iains commented Aug 31, 2020

diff between the generated assembly at -O1 and -O2 (commenting out the printf statement):

--- O1.s	2020-08-31 11:38:20.000000000 +0200
+++ O2.s	2020-08-31 11:38:44.000000000 +0200
@@ -1,19 +1,21 @@
 	.arch armv8-a
 	.text
 	.align	2
+	.p2align 4,,11
 	.globl _f
 _f:
 LFB9:
 	.cfi_startproc
+	ldr	d1, [x1]
 	fmov	d0, 1.0e+0
 	str	d0, [x0]
-	ldr	d1, [x1]

this is a NOP, it seems

fadd d0, d1, d0
ret
.cfi_endproc
LFE9:
.section __TEXT,__text_startup,regular,pure_instructions
.align 2

  • .p2align 4,,11
    not important, to correctness, only to performance
    .globl _main
    _main:
    LFB10:
    @@ -23,13 +25,13 @@ LFB10:
    .cfi_offset 29, -32
    .cfi_offset 30, -24
    mov x29, sp
  • add x1, sp, 24
  • mov x0, x1
    x1 = x0 = sp + 24
    str xzr, [sp, 24]
  • add x0, sp, 24
  • mov x1, x0
    x1 = x0 = sp +24
    bl _f
    fmov d1, 2.0e+0
    fcmp d0, d1
  • bne L5
  • bne L6
    diference only in the label name.
    mov w0, 0
    ldp x29, x30, [sp], 32
    .cfi_remember_state
    @@ -37,7 +39,7 @@ LFB10:
    .cfi_restore 29
    .cfi_def_cfa_offset 0
    ret
    -L5:
    +L6:
    .cfi_restore_state
    bl _abort
    .cfi_endproc

The two codes seem to be equivalent - so not sure what's happening - does it still abort with the printf commented?
(I would be less surprised to find that printf was messing things up - problems with variadic funcs are expected).

@iains
Copy link
Owner

iains commented Aug 31, 2020

diff between the generated assembly at -O1 and -O2 (commenting out the printf statement):

--- O1.s	2020-08-31 11:38:20.000000000 +0200
+++ O2.s	2020-08-31 11:38:44.000000000 +0200
@@ -1,19 +1,21 @@
 	.arch armv8-a
 	.text
 	.align	2
+	.p2align 4,,11
 	.globl _f
 _f:
 LFB9:
 	.cfi_startproc
+	ldr	d1, [x1]
 	fmov	d0, 1.0e+0
 	str	d0, [x0]
-	ldr	d1, [x1]

this is a NOP, it seems

NO it isn't - X0 aliases X1, I wonder what this does on aarch64-linux?

@fxcoudert
Copy link
Contributor Author

I can make it fail without printf, in a test case guaranteed with 0% variadic functions:

$ cat a.c
struct s1 { double d; };
struct s2 { double d; };
union u { struct s1 x; struct s2 y; };

double f(struct s1 *a, struct s2 *b)
{
  a->d = 1.0;
  return b->d + 1.0;
}

int main()
{
  union u a;
  a.x.d = 0.0;
  return (f (&a.x, &a.y) != 2.0) ? 1 : 0;
}
$ ./bin/gcc a.c -O1 && ./a.out ; echo $?
0
$ ./bin/gcc a.c -O2 && ./a.out ; echo $?
1

@iains
Copy link
Owner

iains commented Aug 31, 2020

see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96875
(this is not an issue specific to the darwin variant)

@iains
Copy link
Owner

iains commented Sep 1, 2020

there doesn't seem to be any reason to keep this open here - if we need to XFAIL the test case, to avoid the noise, then that can be done.

@iains iains closed this as completed Sep 1, 2020
iains pushed a commit that referenced this issue Jun 18, 2021
The fixed error is:

==21166==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x60300000d900
    #0 0x7367d7 in operator delete(void*, unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:172
    #1 0x3b82e6e in pointer_equiv_analyzer::~pointer_equiv_analyzer() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:161
    #2 0x3b83387 in hybrid_folder::~hybrid_folder() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:517
    #3 0x3b83387 in execute_early_vrp /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:686
    #4 0x1790611 in execute_one_pass(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2567
    #5 0x1792003 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2656
    #6 0x1792029 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2657
    #7 0x179209f in execute_pass_list(function*, opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2667
    #8 0x178a5f3 in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:1773
    #9 0x1792fac in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/plugin.h:191
    #10 0x1792fac in execute_ipa_pass_list(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:3001
    #11 0xc525fc in ipa_passes /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2154
    #12 0xc525fc in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2289
    #13 0xc5a096 in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2269
    #14 0xc5a096 in symbol_table::finalize_compilation_unit() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2537
    #15 0x1a7a17c in compile_file /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:482
    #16 0x69c758 in do_compile /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2210
    #17 0x69c758 in toplev::main(int, char**) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2349
    #18 0x6a932a in main /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/main.c:39
    #19 0x7ffff7820b34 in __libc_start_main ../csu/libc-start.c:332
    #20 0x6aa5fd in _start (/home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/objdir/gcc/cc1+0x6aa5fd)

0x60300000d900 is located 0 bytes inside of 32-byte region [0x60300000d900,0x60300000d920)
allocated by thread T0 here:
    #0 0x735ab7 in operator new[](unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:102
    #1 0x3b82dac in pointer_equiv_analyzer::pointer_equiv_analyzer(gimple_ranger*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:156

gcc/ChangeLog:

	* gimple-ssa-evrp.c (pointer_equiv_analyzer::~pointer_equiv_analyzer): Use delete[].
iains pushed a commit that referenced this issue Jan 2, 2022
…imize or target pragmas [PR103012]

The following testcases ICE when an optimize or target pragma
is followed by a long line (4096+ chars).
This is because on such long lines we can't use columns anymore,
but the cpp_define calls performed by c_cpp_builtins_optimize_pragma
or from the backend hooks for target pragma are done on temporary
buffers and expect to get columns from whatever line they appear on
(which happens to be the long line after optimize/target pragma),
and we run into:
 #0  fancy_abort (file=0x3abec67 "../../libcpp/line-map.c", line=502, function=0x3abecfc "linemap_add") at ../../gcc/diagnostic.c:1986
 #1  0x0000000002e7c335 in linemap_add (set=0x7ffff7fca000, reason=LC_RENAME, sysp=0, to_file=0x41287a0 "pr103012.i", to_line=3) at ../../libcpp/line-map.c:502
 #2  0x0000000002e7cc24 in linemap_line_start (set=0x7ffff7fca000, to_line=3, max_column_hint=128) at ../../libcpp/line-map.c:827
 #3  0x0000000002e7ce2b in linemap_position_for_column (set=0x7ffff7fca000, to_column=1) at ../../libcpp/line-map.c:898
 #4  0x0000000002e771f9 in _cpp_lex_direct (pfile=0x40c3b60) at ../../libcpp/lex.c:3592
 #5  0x0000000002e76c3e in _cpp_lex_token (pfile=0x40c3b60) at ../../libcpp/lex.c:3394
 #6  0x0000000002e610ef in lex_macro_node (pfile=0x40c3b60, is_def_or_undef=true) at ../../libcpp/directives.c:601
 #7  0x0000000002e61226 in do_define (pfile=0x40c3b60) at ../../libcpp/directives.c:639
 #8  0x0000000002e610b2 in run_directive (pfile=0x40c3b60, dir_no=0, buf=0x7fffffffd430 "__OPTIMIZE__ 1\n", count=14) at ../../libcpp/directives.c:589
 #9  0x0000000002e650c1 in cpp_define (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2513
 #10 0x0000000002e65100 in cpp_define_unused (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2522
 #11 0x0000000000f50685 in c_cpp_builtins_optimize_pragma (pfile=0x40c3b60, prev_tree=<optimization_node 0x7fffea042000>, cur_tree=<optimization_node 0x7fffea042020>)
     at ../../gcc/c-family/c-cppbuiltin.c:600
assertion that LC_RENAME doesn't happen first.

I think the right fix is emit those predefined macros upon
optimize/target pragmas with BUILTINS_LOCATION, like we already do
for those macros at the start of the TU, they don't appear in columns
of the next line after it.  Another possibility would be to force them
at the location of the pragma.

2021-12-30  Jakub Jelinek  <jakub@redhat.com>

	PR c++/103012
gcc/
	* config/i386/i386-c.c (ix86_pragma_target_parse): Perform
	cpp_define/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
	* config/arm/arm-c.c (arm_pragma_target_parse): Likewise.
	* config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Likewise.
	* config/s390/s390-c.c (s390_pragma_target_parse): Likewise.
gcc/c-family/
	* c-cppbuiltin.c (c_cpp_builtins_optimize_pragma): Perform
	cpp_define_unused/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
gcc/testsuite/
	PR c++/103012
	* g++.dg/cpp/pr103012.C: New test.
	* g++.target/i386/pr103012.C: New test.
iains pushed a commit that referenced this issue May 26, 2023
This patch fixes PR rtl-optimization/109476, which is a code quality
regression affecting AVR.  The cause is that the lower-subreg pass is
sometimes overly aggressive, lowering the LSHIFTRT below:

(insn 7 4 8 2 (set (reg:HI 51)
        (lshiftrt:HI (reg/v:HI 49 [ b ])
            (const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
     (nil))

into a pair of QImode SUBREG assignments:

(insn 19 4 20 2 (set (subreg:QI (reg:HI 51) 0)
        (reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
     (nil))
(insn 20 19 8 2 (set (subreg:QI (reg:HI 51) 1)
        (const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
     (nil))

but this idiom, SETs of SUBREGs, interferes with combine's ability
to associate/fuse instructions.  The solution, on targets that
have a suitable ZERO_EXTEND (i.e. where the lower-subreg pass
wouldn't itself split a ZERO_EXTEND, so "splitting_zext" is false),
is to split/lower LSHIFTRT to a ZERO_EXTEND.

To answer Richard's question in comment #10 of the bugzilla PR,
the function resolve_shift_zext is called with one of four RTX
codes, ASHIFTRT, LSHIFTRT, ZERO_EXTEND and ASHIFT, but only with
LSHIFTRT can the setting of low_part and high_part SUBREGs be
replaced by a ZERO_EXTEND.  For ASHIFTRT, we require a sign
extension, so don't set the high_part to zero; if we're splitting
a ZERO_EXTEND then it doesn't make sense to replace it with a
ZERO_EXTEND, and for ASHIFT we've played games to swap the
high_part and low_part SUBREGs, so that we assign the low_part
to zero (for double word shifts by greater than word size bits).

2023-04-28  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR rtl-optimization/109476
	* lower-subreg.cc: Include explow.h for force_reg.
	(find_decomposable_shift_zext): Pass an additional SPEED_P argument.
	If decomposing a suitable LSHIFTRT and we're not splitting
	ZERO_EXTEND (based on the current SPEED_P), then use a ZERO_EXTEND
	instead of setting a high part SUBREG to zero, which helps combine.
	(decompose_multiword_subregs): Update call to resolve_shift_zext.

gcc/testsuite/ChangeLog
	PR rtl-optimization/109476
	* gcc.target/avr/mmcu/pr109476.c: New test case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants