Assembler/relocation that produces a malformed object file. #3

iains · 2020-08-30T14:58:49Z

adrp  x0, foo@PAGE-1
add  x0, x0, foo@PAGEOFF-1

is a reasonably common thing for GCC to emit (corresponding to 'one position before' something).
The assembler accepts the syntax but produces a malformed object file.

Right now, it's not clear if negative addends are accepted.
The documentation neither says that they are signed or unsigned - and historically Darwin has accepted relocations of the form A - B ± C so long as A is a symbol defined in the current TU.

Worked around by 0f0f43d for now.

The text was updated successfully, but these errors were encountered:

iains · 2020-08-31T19:38:04Z

this is confirmed as an LLVM back-end bug, there's a radar (68073350);
signed offsets should work.

I guess, we might take time out to fix the LLVM backend, but that takes time to percolate to the Xcode assembler.
Another solution is to encode the offset such that it can be added.

so we could do something like

 foo@PAGE+(offset & 0xffffffU)

which will truncate the negative number so that it fits into the 24b added field in the reloc.

iains · 2020-08-31T19:57:41Z

we can even check for the value to be in range, and switch to a different form if it's not (essentially I already have code to do that - it's just invoked unconditionally for negative offsets). Better than an assembler error (when that is fixed).

iains · 2020-09-02T19:57:45Z

unfortunately, fixing this in the manner described breaks ld64 - when using it to strip binaries:

ld -S -r -no_uuid <object.o> -o <output.o>

I'll have to file an issue/radar

Given the following C function: double *f(double *p, unsigned x) { return p + x; } prior to this patch, GCC at -O2 would generate: f: add x0, x0, x1, uxtw 3 ret but this add instruction uses architecturally-invalid syntax: the width of the third operand conflicts with the width of the extension specifier. The third operand is only permitted to be an x register when the extension specifier is (u|s)xtx. This instruction, and analogous insns for adds, sub, subs, and cmp, are rejected by clang, but accepted by binutils. Assembling and disassembling such an insn with binutils gives the architecturally-valid version in the disassembly: 0: 8b214c00 add x0, x0, w1, uxtw #3 This patch fixes several patterns in the AArch64 backend to use the standard syntax as specified in the Arm ARM such that GCC's output can be assembled by assemblers other than GAS. --- gcc/ChangeLog: * config/aarch64/aarch64.md (*adds_<optab><ALLX:mode>_<GPI:mode>): Ensure extended operand agrees with width of extension specifier. (*subs_<optab><ALLX:mode>_<GPI:mode>): Likewise. (*adds_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise. (*subs_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise. (*add_<optab><ALLX:mode>_<GPI:mode>): Likewise. (*add_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise. (*add_uxt<mode>_shift2): Likewise. (*sub_<optab><ALLX:mode>_<GPI:mode>): Likewise. (*sub_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise. (*sub_uxt<mode>_shift2): Likewise. (*cmp_swp_<optab><ALLX:mode>_reg<GPI:mode>): Likewise. (*cmp_swp_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise. gcc/testsuite/ChangeLog: * gcc.target/aarch64/adds3.c: Fix test w.r.t. new syntax. * gcc.target/aarch64/cmp.c: Likewise. * gcc.target/aarch64/subs3.c: Likewise. * gcc.target/aarch64/subsp.c: Likewise. * gcc.target/aarch64/extend-syntax.c: New test.

Enable thumb1_gen_const_int to generate RTL or asm depending on the context, so that we avoid duplicating code to handle constants in Thumb-1 with -mpure-code. Use a template so that the algorithm is effectively shared, and rely on two classes to handle the actual emission as RTL or asm. The generated sequence is improved to handle right-shiftable and small values with less instructions. We now generate: 128: movs r0, r0, #128 264: movs r3, #33 lsls r3, #3 510: movs r3, #255 lsls r3, #1 512: movs r3, #1 lsls r3, #9 764: movs r3, #191 lsls r3, #2 65536: movs r3, #1 lsls r3, #16 0x123456: movs r3, #18 ;0x12 lsls r3, #8 adds r3, #52 ;0x34 lsls r3, #8 adds r3, #86 ;0x56 0x1123456: movs r3, #137 ;0x89 lsls r3, #8 adds r3, #26 ;0x1a lsls r3, #8 adds r3, #43 ;0x2b lsls r3, #1 0x1000010: movs r3, #16 lsls r3, #16 adds r3, #1 lsls r3, #4 0x1000011: movs r3, #1 lsls r3, #24 adds r3, #17 -8192: movs r3, #1 lsls r3, #13 rsbs r3, #0 The patch adds a testcase which does not fully exercise thumb1_gen_const_int, as other existing patterns already catch small constants. These parts of thumb1_gen_const_int are used by arm_thumb1_mi_thunk. 2020-11-02 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/arm.c (thumb1_const_rtl, thumb1_const_print): New classes. (thumb1_gen_const_int): Rename to ... (thumb1_gen_const_int_1): ... New helper function. Add capability to emit either RTL or asm, improve generated code. (thumb1_gen_const_int_rtl): New function. * config/arm/arm-protos.h (thumb1_gen_const_int): Rename to thumb1_gen_const_int_rtl. * config/arm/thumb1.md: Call thumb1_gen_const_int_rtl instead of thumb1_gen_const_int. gcc/testsuite/ * gcc.target/arm/pure-code/no-literal-pool-m0.c: New.

/home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77:25: runtime error: left shift of 0x0000000000000000fffffffffffffffb by 96 places cannot be represented in type '__int128' #0 0x7ffff754edfe in __ubsan::Value::getSIntValue() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77 #1 0x7ffff7548719 in __ubsan::Value::isNegative() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.h:190 #2 0x7ffff7542a34 in handleShiftOutOfBoundsImpl /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:338 #3 0x7ffff75431b7 in __ubsan_handle_shift_out_of_bounds /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:370 #4 0x40067f in main (/home/marxin/Programming/testcases/a.out+0x40067f) #5 0x7ffff72c8b24 in __libc_start_main (/lib64/libc.so.6+0x27b24) #6 0x4005bd in _start (/home/marxin/Programming/testcases/a.out+0x4005bd) Differential Revision: https://reviews.llvm.org/D97263 Cherry-pick from 16ede0956cb1f4b692dfa619ccfa6ab1de28e19b.

The insn has extraneous operand #3 that is aliased in RTL to operand #0 with a constraint. The operands specify a single-bit field in memory that the machine instruction produced boths reads for the purpose of determining whether to branch or not and either clears or sets according to the machine operation selected with the `ccss' iterator. The caller of the insn is supposed to supply the same rtx for both operands. This odd arrangement happens to work with old reload, but breaks with libatomic if LRA is used instead: .../libatomic/flag.c: In function 'atomic_flag_test_and_set': .../libatomic/flag.c:36:1: error: unable to generate reloads for: 36 | } | ^ (jump_insn 7 6 19 2 (unspec_volatile [ (set (pc) (if_then_else (eq (zero_extract:SI (mem/v:QI (reg:SI 27) [-1 S1 A8]) (const_int 1 [0x1]) (const_int 0 [0])) (const_int 1 [0x1])) (label_ref:SI 25) (pc))) (set (zero_extract:SI (mem/v:QI (reg:SI 28) [-1 S1 A8]) (const_int 1 [0x1]) (const_int 0 [0])) (const_int 1 [0x1])) ] 100) ".../libatomic/flag.c":35:10 669 {jbbssiqi} (nil) -> 25) during RTL pass: reload .../libatomic/flag.c:36:1: internal compiler error: in curr_insn_transform, at lra-constraints.c:4098 0x1112c587 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*) .../gcc/rtl-error.c:108 0x10ee6563 curr_insn_transform .../gcc/lra-constraints.c:4098 0x10eeaf87 lra_constraints(bool) .../gcc/lra-constraints.c:5133 0x10ec97e3 lra(_IO_FILE*) .../gcc/lra.c:2336 0x10e4633f do_reload .../gcc/ira.c:5827 0x10e46b27 execute .../gcc/ira.c:6013 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions. Switch to using `match_dup' as expected then for a machine instruction that in its encoding only has one actual operand in for the single-bit field. gcc/ * config/vax/builtins.md (jbb<ccss>i<mode>): Remove operand #3. (sync_lock_test_and_set<mode>): Adjust accordingly. (sync_lock_release<mode>): Likewise.

gcc/ada/ * atree.h (Slots_Ptr): Change pointed-to type to any_slot. * fe.h (Get_RT_Exception_Name): Change type of parameter. * namet.ads (Name_Entry): Mark non-boolean components as aliased, reorder the boolean components and add an explicit Spare component. * namet.adb (Name_Enter): Adjust aggregate accordingly. (Name_Find): Likewise. (Reinitialize): Likewise. * namet.h (struct Name_Entry): Adjust accordingly. (Names_Ptr): Use correct type. (Name_Chars_Ptr): Likewise. (Get_Name_String): Fix declaration and adjust to above changes. * types.ads (RT_Exception_Code): Add pragma Convention C. * types.h (Column_Number_Type): Fix original type. (slot): Rename union type to... (any_slot): ...this and adjust assertion accordingly. (RT_Exception_Code): New enumeration type. * uintp.ads (Uint_Entry): Mark components as aliased. * uintp.h (Uints_Ptr): Use correct type. (Udigits_Ptr): Likewise. * gcc-interface/gigi.h (gigi): Adjust name and type of parameter. * gcc-interface/cuintp.c (UI_To_gnu): Adjust references to Uints_Ptr and Udigits_Ptr. * gcc-interface/trans.c (Slots_Ptr): Adjust pointed-to type. (gigi): Adjust type of parameter. (build_raise_check): Add cast in call to Get_RT_Exception_Name.

The fixed error is: ==21166==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x60300000d900 #0 0x7367d7 in operator delete(void*, unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:172 #1 0x3b82e6e in pointer_equiv_analyzer::~pointer_equiv_analyzer() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:161 #2 0x3b83387 in hybrid_folder::~hybrid_folder() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:517 #3 0x3b83387 in execute_early_vrp /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:686 #4 0x1790611 in execute_one_pass(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2567 #5 0x1792003 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2656 #6 0x1792029 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2657 #7 0x179209f in execute_pass_list(function*, opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2667 #8 0x178a5f3 in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:1773 #9 0x1792fac in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/plugin.h:191 #10 0x1792fac in execute_ipa_pass_list(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:3001 #11 0xc525fc in ipa_passes /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2154 #12 0xc525fc in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2289 #13 0xc5a096 in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2269 #14 0xc5a096 in symbol_table::finalize_compilation_unit() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2537 #15 0x1a7a17c in compile_file /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:482 #16 0x69c758 in do_compile /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2210 #17 0x69c758 in toplev::main(int, char**) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2349 #18 0x6a932a in main /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/main.c:39 #19 0x7ffff7820b34 in __libc_start_main ../csu/libc-start.c:332 #20 0x6aa5fd in _start (/home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/objdir/gcc/cc1+0x6aa5fd) 0x60300000d900 is located 0 bytes inside of 32-byte region [0x60300000d900,0x60300000d920) allocated by thread T0 here: #0 0x735ab7 in operator new[](unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:102 #1 0x3b82dac in pointer_equiv_analyzer::pointer_equiv_analyzer(gimple_ranger*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:156 gcc/ChangeLog: * gimple-ssa-evrp.c (pointer_equiv_analyzer::~pointer_equiv_analyzer): Use delete[].

Fixes: ==129444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00000666ca5c at pc 0x000000ef094b bp 0x7fffffff8180 sp 0x7fffffff8178 READ of size 4 at 0x00000666ca5c thread T0 #0 0xef094a in parse_optimize_options ../../gcc/d/d-attribs.cc:855 #1 0xef0d36 in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:916 #2 0xef107e in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:887 #3 0xff85b1 in decl_attributes(tree_node**, tree_node*, int, tree_node*) ../../gcc/attribs.c:829 #4 0xef2a91 in apply_user_attributes(Dsymbol*, tree_node*) ../../gcc/d/d-attribs.cc:427 #5 0xf7b7f3 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:1346 #6 0xf87bc7 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:967 #7 0xf87bc7 in DeclVisitor::visit(FuncDeclaration*) ../../gcc/d/decl.cc:808 #8 0xf83db5 in DeclVisitor::build_dsymbol(Dsymbol*) ../../gcc/d/decl.cc:146 for the following test-case: gcc/testsuite/gdc.dg/attr_optimize1.d. gcc/d/ChangeLog: * d-attribs.cc (parse_optimize_options): Check index before accessing cl_options.

…imize or target pragmas [PR103012] The following testcases ICE when an optimize or target pragma is followed by a long line (4096+ chars). This is because on such long lines we can't use columns anymore, but the cpp_define calls performed by c_cpp_builtins_optimize_pragma or from the backend hooks for target pragma are done on temporary buffers and expect to get columns from whatever line they appear on (which happens to be the long line after optimize/target pragma), and we run into: #0 fancy_abort (file=0x3abec67 "../../libcpp/line-map.c", line=502, function=0x3abecfc "linemap_add") at ../../gcc/diagnostic.c:1986 #1 0x0000000002e7c335 in linemap_add (set=0x7ffff7fca000, reason=LC_RENAME, sysp=0, to_file=0x41287a0 "pr103012.i", to_line=3) at ../../libcpp/line-map.c:502 #2 0x0000000002e7cc24 in linemap_line_start (set=0x7ffff7fca000, to_line=3, max_column_hint=128) at ../../libcpp/line-map.c:827 #3 0x0000000002e7ce2b in linemap_position_for_column (set=0x7ffff7fca000, to_column=1) at ../../libcpp/line-map.c:898 #4 0x0000000002e771f9 in _cpp_lex_direct (pfile=0x40c3b60) at ../../libcpp/lex.c:3592 #5 0x0000000002e76c3e in _cpp_lex_token (pfile=0x40c3b60) at ../../libcpp/lex.c:3394 #6 0x0000000002e610ef in lex_macro_node (pfile=0x40c3b60, is_def_or_undef=true) at ../../libcpp/directives.c:601 #7 0x0000000002e61226 in do_define (pfile=0x40c3b60) at ../../libcpp/directives.c:639 #8 0x0000000002e610b2 in run_directive (pfile=0x40c3b60, dir_no=0, buf=0x7fffffffd430 "__OPTIMIZE__ 1\n", count=14) at ../../libcpp/directives.c:589 #9 0x0000000002e650c1 in cpp_define (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2513 #10 0x0000000002e65100 in cpp_define_unused (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2522 #11 0x0000000000f50685 in c_cpp_builtins_optimize_pragma (pfile=0x40c3b60, prev_tree=<optimization_node 0x7fffea042000>, cur_tree=<optimization_node 0x7fffea042020>) at ../../gcc/c-family/c-cppbuiltin.c:600 assertion that LC_RENAME doesn't happen first. I think the right fix is emit those predefined macros upon optimize/target pragmas with BUILTINS_LOCATION, like we already do for those macros at the start of the TU, they don't appear in columns of the next line after it. Another possibility would be to force them at the location of the pragma. 2021-12-30 Jakub Jelinek <jakub@redhat.com> PR c++/103012 gcc/ * config/i386/i386-c.c (ix86_pragma_target_parse): Perform cpp_define/cpp_undef calls with forced token locations BUILTINS_LOCATION. * config/arm/arm-c.c (arm_pragma_target_parse): Likewise. * config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Likewise. * config/s390/s390-c.c (s390_pragma_target_parse): Likewise. gcc/c-family/ * c-cppbuiltin.c (c_cpp_builtins_optimize_pragma): Perform cpp_define_unused/cpp_undef calls with forced token locations BUILTINS_LOCATION. gcc/testsuite/ PR c++/103012 * g++.dg/cpp/pr103012.C: New test. * g++.target/i386/pr103012.C: New test.

This is a "canonical types differ for identical types" ICE, which started with r11-4682. It's a bit tricky to explain. Consider: template <typename T> struct S { S<T> bar() noexcept(T::value); // #1 S<T> foo() noexcept(T::value); // #2 }; template <typename T> S<T> S<T>::foo() noexcept(T::value) {} // #3 We ICE because #3 and #2 have the same type, but their canonical types differ: TYPE_CANONICAL (#3) == #2 but TYPE_CANONICAL (#2) == #1. The member functions #1 and #2 have the same type. However, since their noexcept-specifier is deferred, when parsing them, we create a variant for both of them, because DEFERRED_PARSE cannot be compared. In other words, build_cp_fntype_variant's tree v = TYPE_MAIN_VARIANT (type); for (; v; v = TYPE_NEXT_VARIANT (v)) if (cp_check_qualified_type (v, type, type_quals, rqual, raises, late)) return v; will *not* find an existing variant when creating a method_type for #2, so we have to create a new one. But then we perform delayed parsing and call fixup_deferred_exception_variants for #1 and #2. f_d_e_v will replace TYPE_RAISES_EXCEPTIONS with the newly parsed noexcept-specifier. It also sets TYPE_CANONICAL (#2) to #1. Both noexcepts turned out to be the same, so now we have two equivalent variants in the list! I.e., +-----------------+ +-----------------+ +-----------------+ | main | | #2 | | #1 | | S S::<T379>(S*) |----->| S S::<T37c>(S*) |----->| S S::<T37a>(S*) |----->NULL | - | | noex(T::value) | | noex(T::value) | +-----------------+ +-----------------+ +-----------------+ Then we get to #3. As for #1 and #2, grokdeclarator calls build_memfn_type, which ends up calling build_cp_fntype_variant, which will use the loop above to look for an existing variant. The first one that matches cp_check_qualified_type will be used, so we use #2 rather than #1, and the TYPE_CANONICAL mismatch follows. Hopefully that makes sense. As for the fix, I didn't think I could rewrite the method_type #2 with #1 because the type may have escaped via decltype. So my approach is to elide #2 from the list, so when looking for a matching variant, we always find #1 (#2 remains live though, which admittedly sounds sort of dodgy). PR c++/101715 gcc/cp/ChangeLog: * tree.cc (fixup_deferred_exception_variants): Remove duplicate variants after parsing the exception specifications. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/noexcept72.C: New test. * g++.dg/cpp0x/noexcept73.C: New test.

…04617] On #define A(n) int foo1##n(void) { return 1##n; } #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9) #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9) #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9) #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9) E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325) B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642) testcase with ./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto -O0 -o foo1.o foo1.c -ffunction-sections ./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o /tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized (testcase too slow to be included into testsuite). The problem is clearly reported by readelf: readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321 readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321 readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323 readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section. readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section. readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section. because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE inclusive. Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be used instead and .symtab_shndx section should contain the real section index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >= SHN_LORESERVE value is needed it should put those into Shdr[0].sh_{size,link}. But, sh_{link,info} are 32-bit fields which can contain any section index. Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before 2011) used to mishandle the > 63.75K sections case and assumed there is a hole in between the sections, but what simple_object_elf_copy_lto_debug_sections does wouldn't help in that case for the debug temp object creation, we'd need to detect the case also in that routine and take it into account in the remapping etc. I think it is not worth it given that it is over 10 years, if somebody needs 63.75K or more sections, better use more recent binutils. 2022-02-22 Jakub Jelinek <jakub@redhat.com> PR lto/104617 * simple-object-elf.c (simple_object_elf_match): Fix up URL in comment. (simple_object_elf_copy_lto_debug_sections): Remap sh_info and sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE range (inclusive).

I noticed that for member class templates of a class template we were unnecessarily substituting both the template and its type. Avoiding that duplication speeds compilation of this silly testcase from ~12s to ~9s on my laptop. It's unlikely to make a difference on any real code, but the simplification is also nice. We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation of the template class, but it makes more sense to do that in tsubst_template_decl anyway. #define NC(X) \ template <class U> struct X##1; \ template <class U> struct X##2; \ template <class U> struct X##3; \ template <class U> struct X##4; \ template <class U> struct X##5; \ template <class U> struct X##6; #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f) #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E) template <int I> struct A { NC3(am) }; template <class...Ts> void sink(Ts...); template <int...Is> void g() { sink(A<Is>()...); } template <int I> void f() { g<__integer_pack(I)...>(); } int main() { f<1000>(); } gcc/cp/ChangeLog: * pt.cc (instantiate_class_template): Skip the RECORD_TYPE of a class template. (tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.

Hi, Richard and Richi. Base on the suggestions from Richard: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html This patch choose (1) approach that Richard provided, meaning: RVV implements cond_* optabs as expanders. RVV therefore supports both IFN_COND_ADD and IFN_COND_LEN_ADD. No dummy length arguments are needed at the gimple level. Such approach can make codes much cleaner and reasonable. Consider this following case: void foo (float * __restrict a, float * __restrict b, int * __restrict cond, int n) { for (int i = 0; i < n; i++) if (cond[i]) a[i] = b[i] + a[i]; } Output of RISC-V (32-bits) gcc (trunk) (Compiler #3) <source>:5:21: missed: couldn't vectorize loop <source>:5:21: missed: not vectorized: control flow in loop. ARM SVE: ... mask__27.10_51 = vect__4.9_49 != { 0, ... }; ... vec_mask_and_55 = loop_mask_49 & mask__27.10_51; ... vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, vect__6.13_56); For RVV, we want IR as follows: ... _68 = .SELECT_VL (ivtmp_66, POLY_INT_CST [4, 4]); ... mask__27.10_51 = vect__4.9_49 != { 0, ... }; ... vect__9.17_60 = .COND_LEN_ADD (mask__27.10_51, vect__6.13_55, vect__8.16_59, vect__6.13_55, _68, 0); ... Both len and mask of COND_LEN_ADD are real not dummy. This patch has been fully tested in RISC-V port with supporting both COND_* and COND_LEN_*. And also, Bootstrap and Regression on X86 passed. OK for trunk? gcc/ChangeLog: * internal-fn.cc (get_len_internal_fn): New function. (DEF_INTERNAL_COND_FN): Ditto. (DEF_INTERNAL_SIGNED_COND_FN): Ditto. * internal-fn.h (get_len_internal_fn): Ditto. * tree-vect-stmts.cc (vectorizable_call): Add CALL auto-vectorization.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Assembler/relocation that produces a malformed object file. #3

Assembler/relocation that produces a malformed object file. #3

iains commented Aug 30, 2020

iains commented Aug 31, 2020

iains commented Aug 31, 2020

iains commented Sep 2, 2020

Assembler/relocation that produces a malformed object file. #3

Assembler/relocation that produces a malformed object file. #3

Comments

iains commented Aug 30, 2020

iains commented Aug 31, 2020

iains commented Aug 31, 2020

iains commented Sep 2, 2020