Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assembler/relocation that produces a malformed object file. #3

Open
iains opened this issue Aug 30, 2020 · 3 comments
Open

Assembler/relocation that produces a malformed object file. #3

iains opened this issue Aug 30, 2020 · 3 comments

Comments

@iains
Copy link
Owner

iains commented Aug 30, 2020

adrp  x0, foo@PAGE-1
add  x0, x0, foo@PAGEOFF-1

is a reasonably common thing for GCC to emit (corresponding to 'one position before' something).
The assembler accepts the syntax but produces a malformed object file.

Right now, it's not clear if negative addends are accepted.
The documentation neither says that they are signed or unsigned - and historically Darwin has accepted relocations of the form A - B ± C so long as A is a symbol defined in the current TU.

Worked around by 0f0f43d for now.

@iains
Copy link
Owner Author

iains commented Aug 31, 2020

this is confirmed as an LLVM back-end bug, there's a radar (68073350);
signed offsets should work.

I guess, we might take time out to fix the LLVM backend, but that takes time to percolate to the Xcode assembler.
Another solution is to encode the offset such that it can be added.

so we could do something like

 foo@PAGE+(offset & 0xffffffU) 

which will truncate the negative number so that it fits into the 24b added field in the reloc.

@iains
Copy link
Owner Author

iains commented Aug 31, 2020

we can even check for the value to be in range, and switch to a different form if it's not (essentially I already have code to do that - it's just invoked unconditionally for negative offsets). Better than an assembler error (when that is fixed).

@iains
Copy link
Owner Author

iains commented Sep 2, 2020

unfortunately, fixing this in the manner described breaks ld64 - when using it to strip binaries:

ld -S -r -no_uuid <object.o> -o <output.o>

I'll have to file an issue/radar

iains pushed a commit that referenced this issue Sep 7, 2020
Given the following C function:

double *f(double *p, unsigned x)
{
    return p + x;
}

prior to this patch, GCC at -O2 would generate:

f:
        add     x0, x0, x1, uxtw 3
        ret

but this add instruction uses architecturally-invalid syntax: the width
of the third operand conflicts with the width of the extension
specifier. The third operand is only permitted to be an x register when
the extension specifier is (u|s)xtx.

This instruction, and analogous insns for adds, sub, subs, and cmp, are
rejected by clang, but accepted by binutils. Assembling and
disassembling such an insn with binutils gives the architecturally-valid
version in the disassembly:

   0:   8b214c00        add     x0, x0, w1, uxtw #3

This patch fixes several patterns in the AArch64 backend to use the
standard syntax as specified in the Arm ARM such that GCC's output can
be assembled by assemblers other than GAS.

---

gcc/ChangeLog:

	* config/aarch64/aarch64.md
	(*adds_<optab><ALLX:mode>_<GPI:mode>): Ensure extended operand
	agrees with width of extension specifier.
	(*subs_<optab><ALLX:mode>_<GPI:mode>): Likewise.
	(*adds_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
	(*subs_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
	(*add_<optab><ALLX:mode>_<GPI:mode>): Likewise.
	(*add_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
	(*add_uxt<mode>_shift2): Likewise.
	(*sub_<optab><ALLX:mode>_<GPI:mode>): Likewise.
	(*sub_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
	(*sub_uxt<mode>_shift2): Likewise.
	(*cmp_swp_<optab><ALLX:mode>_reg<GPI:mode>): Likewise.
	(*cmp_swp_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/adds3.c: Fix test w.r.t. new syntax.
	* gcc.target/aarch64/cmp.c: Likewise.
	* gcc.target/aarch64/subs3.c: Likewise.
	* gcc.target/aarch64/subsp.c: Likewise.
	* gcc.target/aarch64/extend-syntax.c: New test.
iains pushed a commit that referenced this issue Nov 7, 2020
Enable thumb1_gen_const_int to generate RTL or asm depending on the
context, so that we avoid duplicating code to handle constants in
Thumb-1 with -mpure-code.

Use a template so that the algorithm is effectively shared, and
rely on two classes to handle the actual emission as RTL or asm.

The generated sequence is improved to handle right-shiftable and small
values with less instructions. We now generate:

128:
        movs    r0, r0, #128
264:
        movs    r3, #33
        lsls    r3, #3
510:
        movs    r3, #255
        lsls    r3, #1
512:
        movs    r3, #1
        lsls    r3, #9
764:
        movs    r3, #191
        lsls    r3, #2
65536:
        movs    r3, #1
        lsls    r3, #16
0x123456:
        movs    r3, #18 ;0x12
        lsls    r3, #8
        adds    r3, #52 ;0x34
        lsls    r3, #8
        adds    r3, #86 ;0x56
0x1123456:
        movs    r3, #137 ;0x89
        lsls    r3, #8
        adds    r3, #26 ;0x1a
        lsls    r3, #8
        adds    r3, #43 ;0x2b
        lsls    r3, #1
0x1000010:
        movs    r3, #16
        lsls    r3, #16
        adds    r3, #1
        lsls    r3, #4
0x1000011:
        movs    r3, #1
        lsls    r3, #24
        adds    r3, #17
-8192:
	movs	r3, #1
	lsls	r3, #13
	rsbs	r3, #0

The patch adds a testcase which does not fully exercise
thumb1_gen_const_int, as other existing patterns already catch small
constants.  These parts of thumb1_gen_const_int are used by
arm_thumb1_mi_thunk.

2020-11-02  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/arm.c (thumb1_const_rtl, thumb1_const_print): New
	classes.
	(thumb1_gen_const_int): Rename to ...
	(thumb1_gen_const_int_1): ... New helper function. Add capability
	to emit either RTL or asm, improve generated code.
	(thumb1_gen_const_int_rtl): New function.
	* config/arm/arm-protos.h (thumb1_gen_const_int): Rename to
	thumb1_gen_const_int_rtl.
	* config/arm/thumb1.md: Call thumb1_gen_const_int_rtl instead
	of thumb1_gen_const_int.

	gcc/testsuite/
	* gcc.target/arm/pure-code/no-literal-pool-m0.c: New.
iains pushed a commit that referenced this issue Feb 28, 2021
/home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77:25: runtime error: left shift of 0x0000000000000000fffffffffffffffb by 96 places cannot be represented in type '__int128'
    #0 0x7ffff754edfe in __ubsan::Value::getSIntValue() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77
    #1 0x7ffff7548719 in __ubsan::Value::isNegative() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.h:190
    #2 0x7ffff7542a34 in handleShiftOutOfBoundsImpl /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:338
    #3 0x7ffff75431b7 in __ubsan_handle_shift_out_of_bounds /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:370
    #4 0x40067f in main (/home/marxin/Programming/testcases/a.out+0x40067f)
    #5 0x7ffff72c8b24 in __libc_start_main (/lib64/libc.so.6+0x27b24)
    #6 0x4005bd in _start (/home/marxin/Programming/testcases/a.out+0x4005bd)

Differential Revision: https://reviews.llvm.org/D97263

Cherry-pick from 16ede0956cb1f4b692dfa619ccfa6ab1de28e19b.
iains pushed a commit that referenced this issue Jun 18, 2021
The insn has extraneous operand #3 that is aliased in RTL to operand #0
with a constraint.  The operands specify a single-bit field in memory
that the machine instruction produced boths reads for the purpose of
determining whether to branch or not and either clears or sets according
to the machine operation selected with the `ccss' iterator.  The caller
of the insn is supposed to supply the same rtx for both operands.

This odd arrangement happens to work with old reload, but breaks with
libatomic if LRA is used instead:

.../libatomic/flag.c: In function 'atomic_flag_test_and_set':
.../libatomic/flag.c:36:1: error: unable to generate reloads for:
   36 | }
      | ^
(jump_insn 7 6 19 2 (unspec_volatile [
            (set (pc)
                (if_then_else (eq (zero_extract:SI (mem/v:QI (reg:SI 27) [-1  S1 A8])
                            (const_int 1 [0x1])
                            (const_int 0 [0]))
                        (const_int 1 [0x1]))
                    (label_ref:SI 25)
                    (pc)))
            (set (zero_extract:SI (mem/v:QI (reg:SI 28) [-1  S1 A8])
                    (const_int 1 [0x1])
                    (const_int 0 [0]))
                (const_int 1 [0x1]))
        ] 100) ".../libatomic/flag.c":35:10 669 {jbbssiqi}
     (nil)
 -> 25)
during RTL pass: reload
.../libatomic/flag.c:36:1: internal compiler error: in curr_insn_transform, at lra-constraints.c:4098
0x1112c587 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
	.../gcc/rtl-error.c:108
0x10ee6563 curr_insn_transform
	.../gcc/lra-constraints.c:4098
0x10eeaf87 lra_constraints(bool)
	.../gcc/lra-constraints.c:5133
0x10ec97e3 lra(_IO_FILE*)
	.../gcc/lra.c:2336
0x10e4633f do_reload
	.../gcc/ira.c:5827
0x10e46b27 execute
	.../gcc/ira.c:6013
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

Switch to using `match_dup' as expected then for a machine instruction
that in its encoding only has one actual operand in for the single-bit
field.

	gcc/
	* config/vax/builtins.md (jbb<ccss>i<mode>): Remove operand #3.
	(sync_lock_test_and_set<mode>): Adjust accordingly.
	(sync_lock_release<mode>): Likewise.
iains pushed a commit that referenced this issue Jun 18, 2021
gcc/ada/

	* atree.h (Slots_Ptr): Change pointed-to type to any_slot.
	* fe.h (Get_RT_Exception_Name): Change type of parameter.
	* namet.ads (Name_Entry): Mark non-boolean components as aliased,
	reorder the boolean components and add an explicit Spare component.
	* namet.adb (Name_Enter): Adjust aggregate accordingly.
	(Name_Find): Likewise.
	(Reinitialize): Likewise.
	* namet.h (struct Name_Entry): Adjust accordingly.
	(Names_Ptr): Use correct type.
	(Name_Chars_Ptr): Likewise.
	(Get_Name_String): Fix declaration and adjust to above changes.
	* types.ads (RT_Exception_Code): Add pragma Convention C.
	* types.h (Column_Number_Type): Fix original type.
	(slot): Rename union type to...
	(any_slot): ...this and adjust assertion accordingly.
	(RT_Exception_Code): New enumeration type.
	* uintp.ads (Uint_Entry): Mark components as aliased.
	* uintp.h (Uints_Ptr):  Use correct type.
	(Udigits_Ptr): Likewise.
	* gcc-interface/gigi.h (gigi): Adjust name and type of parameter.
	* gcc-interface/cuintp.c (UI_To_gnu): Adjust references to Uints_Ptr
	and Udigits_Ptr.
	* gcc-interface/trans.c (Slots_Ptr): Adjust pointed-to type.
	(gigi): Adjust type of parameter.
	(build_raise_check): Add cast in call to Get_RT_Exception_Name.
iains pushed a commit that referenced this issue Jun 18, 2021
The fixed error is:

==21166==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x60300000d900
    #0 0x7367d7 in operator delete(void*, unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:172
    #1 0x3b82e6e in pointer_equiv_analyzer::~pointer_equiv_analyzer() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:161
    #2 0x3b83387 in hybrid_folder::~hybrid_folder() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:517
    #3 0x3b83387 in execute_early_vrp /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:686
    #4 0x1790611 in execute_one_pass(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2567
    #5 0x1792003 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2656
    #6 0x1792029 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2657
    #7 0x179209f in execute_pass_list(function*, opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2667
    #8 0x178a5f3 in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:1773
    #9 0x1792fac in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/plugin.h:191
    #10 0x1792fac in execute_ipa_pass_list(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:3001
    #11 0xc525fc in ipa_passes /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2154
    #12 0xc525fc in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2289
    #13 0xc5a096 in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2269
    #14 0xc5a096 in symbol_table::finalize_compilation_unit() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2537
    #15 0x1a7a17c in compile_file /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:482
    #16 0x69c758 in do_compile /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2210
    #17 0x69c758 in toplev::main(int, char**) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2349
    #18 0x6a932a in main /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/main.c:39
    #19 0x7ffff7820b34 in __libc_start_main ../csu/libc-start.c:332
    #20 0x6aa5fd in _start (/home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/objdir/gcc/cc1+0x6aa5fd)

0x60300000d900 is located 0 bytes inside of 32-byte region [0x60300000d900,0x60300000d920)
allocated by thread T0 here:
    #0 0x735ab7 in operator new[](unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:102
    #1 0x3b82dac in pointer_equiv_analyzer::pointer_equiv_analyzer(gimple_ranger*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:156

gcc/ChangeLog:

	* gimple-ssa-evrp.c (pointer_equiv_analyzer::~pointer_equiv_analyzer): Use delete[].
iains pushed a commit that referenced this issue Nov 28, 2021
Fixes:

==129444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00000666ca5c at pc 0x000000ef094b bp 0x7fffffff8180 sp 0x7fffffff8178
READ of size 4 at 0x00000666ca5c thread T0
    #0 0xef094a in parse_optimize_options ../../gcc/d/d-attribs.cc:855
    #1 0xef0d36 in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:916
    #2 0xef107e in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:887
    #3 0xff85b1 in decl_attributes(tree_node**, tree_node*, int, tree_node*) ../../gcc/attribs.c:829
    #4 0xef2a91 in apply_user_attributes(Dsymbol*, tree_node*) ../../gcc/d/d-attribs.cc:427
    #5 0xf7b7f3 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:1346
    #6 0xf87bc7 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:967
    #7 0xf87bc7 in DeclVisitor::visit(FuncDeclaration*) ../../gcc/d/decl.cc:808
    #8 0xf83db5 in DeclVisitor::build_dsymbol(Dsymbol*) ../../gcc/d/decl.cc:146

for the following test-case: gcc/testsuite/gdc.dg/attr_optimize1.d.

gcc/d/ChangeLog:

	* d-attribs.cc (parse_optimize_options): Check index before
	accessing cl_options.
iains pushed a commit that referenced this issue Jan 2, 2022
…imize or target pragmas [PR103012]

The following testcases ICE when an optimize or target pragma
is followed by a long line (4096+ chars).
This is because on such long lines we can't use columns anymore,
but the cpp_define calls performed by c_cpp_builtins_optimize_pragma
or from the backend hooks for target pragma are done on temporary
buffers and expect to get columns from whatever line they appear on
(which happens to be the long line after optimize/target pragma),
and we run into:
 #0  fancy_abort (file=0x3abec67 "../../libcpp/line-map.c", line=502, function=0x3abecfc "linemap_add") at ../../gcc/diagnostic.c:1986
 #1  0x0000000002e7c335 in linemap_add (set=0x7ffff7fca000, reason=LC_RENAME, sysp=0, to_file=0x41287a0 "pr103012.i", to_line=3) at ../../libcpp/line-map.c:502
 #2  0x0000000002e7cc24 in linemap_line_start (set=0x7ffff7fca000, to_line=3, max_column_hint=128) at ../../libcpp/line-map.c:827
 #3  0x0000000002e7ce2b in linemap_position_for_column (set=0x7ffff7fca000, to_column=1) at ../../libcpp/line-map.c:898
 #4  0x0000000002e771f9 in _cpp_lex_direct (pfile=0x40c3b60) at ../../libcpp/lex.c:3592
 #5  0x0000000002e76c3e in _cpp_lex_token (pfile=0x40c3b60) at ../../libcpp/lex.c:3394
 #6  0x0000000002e610ef in lex_macro_node (pfile=0x40c3b60, is_def_or_undef=true) at ../../libcpp/directives.c:601
 #7  0x0000000002e61226 in do_define (pfile=0x40c3b60) at ../../libcpp/directives.c:639
 #8  0x0000000002e610b2 in run_directive (pfile=0x40c3b60, dir_no=0, buf=0x7fffffffd430 "__OPTIMIZE__ 1\n", count=14) at ../../libcpp/directives.c:589
 #9  0x0000000002e650c1 in cpp_define (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2513
 #10 0x0000000002e65100 in cpp_define_unused (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2522
 #11 0x0000000000f50685 in c_cpp_builtins_optimize_pragma (pfile=0x40c3b60, prev_tree=<optimization_node 0x7fffea042000>, cur_tree=<optimization_node 0x7fffea042020>)
     at ../../gcc/c-family/c-cppbuiltin.c:600
assertion that LC_RENAME doesn't happen first.

I think the right fix is emit those predefined macros upon
optimize/target pragmas with BUILTINS_LOCATION, like we already do
for those macros at the start of the TU, they don't appear in columns
of the next line after it.  Another possibility would be to force them
at the location of the pragma.

2021-12-30  Jakub Jelinek  <jakub@redhat.com>

	PR c++/103012
gcc/
	* config/i386/i386-c.c (ix86_pragma_target_parse): Perform
	cpp_define/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
	* config/arm/arm-c.c (arm_pragma_target_parse): Likewise.
	* config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Likewise.
	* config/s390/s390-c.c (s390_pragma_target_parse): Likewise.
gcc/c-family/
	* c-cppbuiltin.c (c_cpp_builtins_optimize_pragma): Perform
	cpp_define_unused/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
gcc/testsuite/
	PR c++/103012
	* g++.dg/cpp/pr103012.C: New test.
	* g++.target/i386/pr103012.C: New test.
iains pushed a commit that referenced this issue Jan 22, 2022
This is a "canonical types differ for identical types" ICE, which started
with r11-4682.  It's a bit tricky to explain.  Consider:

  template <typename T> struct S {
    S<T> bar() noexcept(T::value);  // #1
    S<T> foo() noexcept(T::value);  // #2
  };

  template <typename T> S<T> S<T>::foo() noexcept(T::value) {}  // #3

We ICE because #3 and #2 have the same type, but their canonical types
differ: TYPE_CANONICAL (#3) == #2 but TYPE_CANONICAL (#2) == #1.

The member functions #1 and #2 have the same type.  However, since their
noexcept-specifier is deferred, when parsing them, we create a variant for
both of them, because DEFERRED_PARSE cannot be compared.  In other words,
build_cp_fntype_variant's

  tree v = TYPE_MAIN_VARIANT (type);
  for (; v; v = TYPE_NEXT_VARIANT (v))
    if (cp_check_qualified_type (v, type, type_quals, rqual, raises, late))
      return v;

will *not* find an existing variant when creating a method_type for #2, so we
have to create a new one.

But then we perform delayed parsing and call fixup_deferred_exception_variants
for #1 and #2.  f_d_e_v will replace TYPE_RAISES_EXCEPTIONS with the newly
parsed noexcept-specifier.  It also sets TYPE_CANONICAL (#2) to #1.  Both
noexcepts turned out to be the same, so now we have two equivalent variants in
the list!  I.e.,

+-----------------+      +-----------------+      +-----------------+
|      main       |      |      #2         |      |      #1         |
| S S::<T379>(S*) |----->| S S::<T37c>(S*) |----->| S S::<T37a>(S*) |----->NULL
|    -            |      |  noex(T::value) |      |  noex(T::value) |
+-----------------+      +-----------------+      +-----------------+

Then we get to #3.  As for #1 and #2, grokdeclarator calls build_memfn_type,
which ends up calling build_cp_fntype_variant, which will use the loop
above to look for an existing variant.  The first one that matches
cp_check_qualified_type will be used, so we use #2 rather than #1, and the
TYPE_CANONICAL mismatch follows.  Hopefully that makes sense.

As for the fix, I didn't think I could rewrite the method_type #2 with #1
because the type may have escaped via decltype.  So my approach is to
elide #2 from the list, so when looking for a matching variant, we always
find #1 (#2 remains live though, which admittedly sounds sort of dodgy).

	PR c++/101715

gcc/cp/ChangeLog:

	* tree.cc (fixup_deferred_exception_variants): Remove duplicate
	variants after parsing the exception specifications.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/noexcept72.C: New test.
	* g++.dg/cpp0x/noexcept73.C: New test.
iains pushed a commit that referenced this issue Feb 26, 2022
…04617]

On
 #define A(n) int foo1##n(void) { return 1##n; }
 #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
 #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
 #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
 #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
 E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
 B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
testcase with
./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto  -O0 -o foo1.o foo1.c -ffunction-sections
./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
/tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
(testcase too slow to be included into testsuite).
The problem is clearly reported by readelf:
readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.
because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and
sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE
inclusive.  Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE
range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym
where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be
used instead and .symtab_shndx section should contain the real section
index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >=
SHN_LORESERVE value is needed it should put those into
Shdr[0].sh_{size,link}.  But, sh_{link,info} are 32-bit fields which can
contain any section index.

Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before
2011) used to mishandle the > 63.75K sections case and assumed there is a
hole in between the sections, but what
simple_object_elf_copy_lto_debug_sections does wouldn't help in that case
for the debug temp object creation, we'd need to detect the case also in
that routine and take it into account in the remapping etc.  I think
it is not worth it given that it is over 10 years, if somebody needs
63.75K or more sections, better use more recent binutils.

2022-02-22  Jakub Jelinek  <jakub@redhat.com>

	PR lto/104617
	* simple-object-elf.c (simple_object_elf_match): Fix up URL
	in comment.
	(simple_object_elf_copy_lto_debug_sections): Remap sh_info and
	sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
	range (inclusive).
iains pushed a commit that referenced this issue May 26, 2023
I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type.  Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop.  It's unlikely to make a difference on any real code, but the
simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.

  #define NC(X)					\
    template <class U> struct X##1;		\
    template <class U> struct X##2;		\
    template <class U> struct X##3;		\
    template <class U> struct X##4;		\
    template <class U> struct X##5;		\
    template <class U> struct X##6;
  #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
  #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
  template <int I> struct A
  {
    NC3(am)
  };
  template <class...Ts> void sink(Ts...);
  template <int...Is> void g()
  {
    sink(A<Is>()...);
  }
  template <int I> void f()
  {
    g<__integer_pack(I)...>();
  }
  int main()
  {
    f<1000>();
  }

gcc/cp/ChangeLog:

	* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
	of a class template.
	(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
iains pushed a commit that referenced this issue Aug 13, 2023
Hi, Richard and Richi.

Base on the suggestions from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html

This patch choose (1) approach that Richard provided, meaning:

RVV implements cond_* optabs as expanders.  RVV therefore supports
both IFN_COND_ADD and IFN_COND_LEN_ADD.  No dummy length arguments
are needed at the gimple level.

Such approach can make codes much cleaner and reasonable.

Consider this following case:
void foo (float * __restrict a, float * __restrict b, int * __restrict cond, int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      a[i] = b[i] + a[i];
}

Output of RISC-V (32-bits) gcc (trunk) (Compiler #3)
<source>:5:21: missed: couldn't vectorize loop
<source>:5:21: missed: not vectorized: control flow in loop.

ARM SVE:

...
mask__27.10_51 = vect__4.9_49 != { 0, ... };
...
vec_mask_and_55 = loop_mask_49 & mask__27.10_51;
...
vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, vect__6.13_56);

For RVV, we want IR as follows:

...
_68 = .SELECT_VL (ivtmp_66, POLY_INT_CST [4, 4]);
...
mask__27.10_51 = vect__4.9_49 != { 0, ... };
...
vect__9.17_60 = .COND_LEN_ADD (mask__27.10_51, vect__6.13_55, vect__8.16_59, vect__6.13_55, _68, 0);
...

Both len and mask of COND_LEN_ADD are real not dummy.

This patch has been fully tested in RISC-V port with supporting both COND_* and COND_LEN_*.

And also, Bootstrap and Regression on X86 passed.

OK for trunk?

gcc/ChangeLog:

	* internal-fn.cc (get_len_internal_fn): New function.
	(DEF_INTERNAL_COND_FN): Ditto.
	(DEF_INTERNAL_SIGNED_COND_FN): Ditto.
	* internal-fn.h (get_len_internal_fn): Ditto.
	* tree-vect-stmts.cc (vectorizable_call): Add CALL auto-vectorization.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant