forked from gcc-mirror/gcc
LoongArch: Add vector calling convention support #113
Open. ChenghuiPan wants to merge 3,623 commits into loongson:dev/vecarg from ChenghuiPan:dev/vecarg
ad049ba
to
84e74e7
Compare
This patch fixes a bug where the compiler could crash on a postcondition on a subprogram body (i.e. a body that "acts as spec"), if the postcondition contains 'Old attributes that use the Ada 2022 feature that allows certain conditionals (see RM-6.1.1). The main bug fix here is in exp_attr.adb to set Ins_Node properly in the Acts_As_Spec case. Otherwise, the initialization of the 'Old temp would occur before the declaration, which gigi does not like. gcc/ada/ * exp_attr.adb (Attribute_Old): The 'Old attribute we are processing here is in a postcondition, which cannot be inside the "Wrapped_Statements" of the subprogram with that postcondition. So remove the loop labeled "Climb the parent chain looking for subprogram _Wrapped_Statements". The only way this loop could find a Subp is if we are nested inside a subprogram that also has a postcondition, and in that case we would find the wrong (outer) one. In any case, Subp is set to Empty after the loop, so all subsequent tests for Present (Subp) are necessarily False; remove them and the corresponding code. Set Ins_Node unconditionally (to the right thing). Remove obsolete comments. * sem_util.adb (Determining_Expressions): Fix assertion; Pragma_Test_Case was missing. (Eligible_For_Conditional_Evaluation): Fix assert that could fail in case of errors. * libgnat/s-valspe.ads: Remove pragma Unevaluated_Use_Of_Old; there are no uses of 'Old in this package.
gcc/ada/ * sinfo.ads: Fix typo.
It is now totally unused by the front-end and dependent tools. gcc/ada/ * einfo.ads (Postconditions_Proc): Delete. * gen_il-fields.ads (Opt_Field_Enum): Remove Postconditions_Proc. * gen_il-gen-gen_entities.adb (E_Function): Likewise. (E_Procedure): Likewise. (E_Entry): Likewise. (E_Entry_Family): Likewise.
GNATprove raised warnings about unspecified Global contracts when using functions from an instance of Ada.Numerics.Generic_Elementary_Functions. This patch adds null Global contracts to all subprograms. gcc/ada/ * libgnat/a-ngelfu.ads (Sqrt): Add Global contracts. (Log): Likewise. (Exp): Likewise. ("**"): Likewise. (Sin): Likewise. (Cos): Likewise. (Tan): Likewise. (Cot): Likewise. (Arcsin): Likewise. (Arccos): Likewise. (Arctan): Likewise. (Arccot): Likewise. (Sinh): Likewise. (Cosh): Likewise. (Tanh): Likewise. (Coth): Likewise. (Arcsinh): Likewise. (Arccosh): Likewise. (Arctanh): Likewise. (Arccoth): Likewise.
…BIP protocol Dynamically-allocated objects that require finalization are attached to a finalization master, which is of a (limited) controlled type declared in the System.Finalization_Masters unit. Now there are two kinds of them: homogeneous and heterogeneous; for the former, all the objects attached to the master share the same Finalize_Address primitive whereas, for the latter, they may have different Finalize_Address primitives. There is a problem in this scheme with the BIP protocol, because this protocol forwards the finalization master from callers to callees and it does so even if the result types are distinct, so it is possible for a homogeneous finalization master to end up containing objects with different Finalize_Address primitives; in that case, the object attached last wins and sets the common Finalize_Address, which is then used to finalize other objects with unpredictable outcome (and very loud valgrind report). Therefore, this change gets rid of homogeneous finalization masters and also streamlines the implementation of heterogeneous ones by storing the Finalize_Address primitive on a per object basis in the FM_Node record. gcc/ada/ * einfo.ads (Pending_Access_Types): Delete. * exp_ch3.adb (Freeze_Type.Process_Pending_Access_Types): Likewise. (Freeze_Type): Do not call Process_Pending_Access_Types. * exp_ch7.ads (Make_Set_Finalize_Address_Call): Delete. * exp_ch7.adb (Build_Finalization_Master.Add_Pending_Access_Type): Delete. (Build_Finalization_Master): Do not set Finalize_Address on the master or call Add_Pending_Access_Type. (Make_Set_Finalize_Address_Call): Delete. * gen_il-fields.ads (Opt_Field_Enum): Remove Pending_Access_Types. * gen_il-gen-gen_entities.adb (Type_Kind): Likewise. * rtsfind.ads (RE_Id): Remove RE_Set_Finalize_Address. (RE_Unit_Table): Likewise. * sem_ch3.adb (Analyze_Full_Type_Declaration): Do not deal with pending access types. * libgnat/s-finmas.ads (Attach_Unprotected): Add Finalize_Address second parameter. 
(Delete_Finalize_Address_Unprotected): Delete. (Finalize_Address): Likewise. (Finalize_Address_Unprotected): Likewise. (Is_Homogeneous): Likewise. (Set_Finalize_Address): Likewise. (Set_Finalize_Address_Unprotected): Likewise. (Set_Heterogeneous_Finalize_Address_Unprotected): Likewise. (Set_Is_Heterogeneous): Likewise. (FM_Node): Add Finalize_Address component. (Finalization_Master): Remove Is_Homogeneous and Finalize_Address components. * libgnat/s-finmas.adb: Remove with & use clauses for System.HTable. (Finalize_Address_Table): Delete. (Attach_Unprotected): Add Finalize_Address second parameter and save its value in the Finalize_Address field of the node. (Delete_Finalize_Address_Unprotected): Delete. (Finalize): Call Finalize_Address saved in the nodes. (Finalize_Address): Delete. (Finalize_Address_Unprotected): Likewise. (Hash): Likewise. (Is_Homogeneous): Likewise. (Print_Master): Adjust. (Set_Finalize_Address): Delete. (Set_Finalize_Address_Unprotected): Likewise. (Set_Heterogeneous_Finalize_Address_Unprotected): Likewise. (Set_Is_Heterogeneous): Likewise. * libgnat/s-stposu.adb (Finalize_Address_Table_In_Use): Likewise. (Allocate_Any_Controlled): Pass Fin_Address to Attach_Unprotected and remove obsolete processing. (Deallocate_Any_Controlled): Remove obsolete processing. (Set_Pool_Of_Subpool): Do not call Set_Is_Heterogeneous.
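The per-object scheme described above can be illustrated with a small model. This is only a sketch in Python, not the GNAT runtime: the names FMNode, FinalizationMaster, attach and finalize are hypothetical, and finalizing in reverse order of attachment is an arbitrary choice made for the example. The point is that each node carries its own finalization routine, so objects of different types can share one master safely.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FMNode:
    # Hypothetical stand-in for a master node: the object plus the
    # finalization routine that belongs to this particular object.
    obj: object
    finalize_address: Callable[[object], None]


@dataclass
class FinalizationMaster:
    nodes: List[FMNode] = field(default_factory=list)

    def attach(self, obj, finalize_address):
        # The routine is stored per node, not once per master.
        self.nodes.append(FMNode(obj, finalize_address))

    def finalize(self):
        # Each object is finalized with its own routine
        # (reverse attachment order is an assumption of this sketch).
        while self.nodes:
            node = self.nodes.pop()
            node.finalize_address(node.obj)


log = []
master = FinalizationMaster()
master.attach("file", lambda o: log.append(("close", o)))
master.attach("lock", lambda o: log.append(("release", o)))
master.finalize()
```

With a single routine stored on the master, the second attach would have overwritten the first and "file" would have been finalized with the wrong routine.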
Routine Contains_POC (where POC means "per-object constraint") was failing to detect expressions of the form "Current_Type'Access", because it was comparing prefix (typically an N_Identifier) with a scope (typically an N_Definining_Entity). This was harmless, because these expressions are detected anyway in Analyze_Access_Attribute, together with uses of 'Unconstrained_Access and 'Unchecked_Access. Also, this routine was failing to detect the use of discriminants in array types with constrained subtype indication, e.g.: type T (D : Integer) is record C : array (Integer range 1 .. D); end record; It is simpler to just reuse Has_Discriminant_Dependent_Constraint and leave detection of access attributes to Analyze_Access_Attribute. gcc/ada/ * sem_attr.adb (Analyze_Access_Attribute): Prevent search from going too far. * sem_ch3.adb (Analyze_Component_Declaration): Remove Contains_POC; reuse Has_Discriminant_Dependent_Constraint.
Code cleanup; semantics is unaffected. gcc/ada/ * sem_attr.adb (Analyze_Access_Attribute): Move code to IF branch where its result is used.
Code cleanup; behaviour is unaffected. gcc/ada/ * sem_attr.adb (Analyze_Access_Attribute): Replace loop with Current_Scope_No_Loops.
In GNATprove mode we didn't inline subprograms whose formal parameters were of a record type with constraints depending on discriminants. Now this is extended to formal parameters with per-object constraints, regardless of whether they come from references to discriminants or from attributes prefixed by the current type instance. gcc/ada/ * inline.adb (Has_Formal_With_Per_Object_Constrained_Component): Use flag Has_Per_Object_Constraint which is set by analysis; rename for consistency.
Code cleanup. gcc/ada/ * exp_ch4.adb (Useful): Remove redundant check for empty list, because iteration with First works also for empty list; rename local variable from L to Action.
Code cleanup. gcc/ada/ * inline.adb (Has_Single_Return): Remove redundant check for empty list, because First works also for empty list.
Code cleanup. gcc/ada/ * exp_aggr.ads (Static_Array_Aggregate): Fix typo in comment.
Code cleanup; semantics is unaffected. gcc/ada/ * exp_ch3.adb (Count_Default_Sized_Task_Stacks): Do not look for tasks inside record discriminants; avoid repeated call to Has_Task that happened for record components. (Expand_N_Object_Declaration): Use high-level routine to detect array types and subtypes; remove unused initial values.
Negative numbers of stack counts have no meaning. gcc/ada/ * lib.ads, lib.adb (Primary_Stack_Count, Sec_Stack_Count, Increment_Primary_Stack_Count, Increment_Sec_Stack_Count, Unit_Record): Stack counts are never negative. * ali.ads (Unit_Record): Likewise. * bindgen.adb (Num_Primary_Stacks, Num_Sec_Stacks): Likewise. * exp_ch3.adb (Count_Default_Sized_Task_Stacks): Likewise. * sem_util.ads, sem_util.adb (Number_Of_Elements_In_Array): Likewise.
Fix handling of null arrays when calculating the secondary stack size for the binder. gcc/ada/ * sem_util.adb (Number_Of_Elements_In_Array): Fix counting of elements in null arrays; remove redundant parenthesis; avoid run-time conversion of 1 to universal integer.
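As an illustration of what counting elements of null arrays means, here is a minimal sketch in Python. It is not the code of Number_Of_Elements_In_Array; the function name and the representation of bounds as (low, high) pairs are assumptions made for the example. A dimension whose high bound is below its low bound has length zero, which makes the whole array empty.

```python
def number_of_elements(bounds):
    """Count elements of an array given (low, high) bounds per dimension.

    Hypothetical helper: a dimension with high < low is a null range
    and contributes a length of zero, never a negative length.
    """
    total = 1
    for low, high in bounds:
        length = max(0, high - low + 1)
        total *= length
    return total
```

For example, bounds of (1, 10) give 10 elements, while (1, 0) or (5, 1) give 0 rather than a negative count.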
This reverts commit 109f1b2.
Support for Solaris 11.3 had already been obsoleted in GCC 13. However, since the only Solaris system in the cfarm was running 11.3, I've kept it in tree until now when both Solaris 11.4/SPARC and x86 systems have been added. This patch actually removes the Solaris 11.3 support. Apart from several minor simplifications, there are two more widespread changes: * In Solaris 11.4, libsocket and libnsl were folded into libc, so there's no longer a need to link them explicitly. * Since Solaris 11.4, Solaris includes all crts needed by gcc (like crt1.o and gcrt1.o) with the base system. All workarounds to provide fallbacks can thus go. Bootstrapped without regressions on i386-pc-solaris2.11 and sparc-sun-solaris2.11 (as/ld, gas/ld, and gas/gld) as well as Solaris 11.3/x86 to ascertain that version is actually rejected. 2024-04-30 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> c++tools: * configure.ac (ax_lib_socket_nsl.m4): Don't sinclude. (AX_LIB_SOCKET_NSL): Don't call. (NETLIBS): Remove. * configure: Regenerate. * Makefile.in (NETLIBS): Remove. (g++-mapper-server$(exeext)): Remove $(NETLIBS). gcc: * config.gcc: Move *-*-solaris2.11.[0-3]* to unsupported list. <*-*-solaris2*> (default_use_cxa_atexit): Set unconditionally. * configure.ac (AX_LIB_SOCKET_NSL): Don't call. (NETLIBS): Remove. (gcc_cv_ld_aligned_shf_merge): Remove. (hidden_linkonce) <i?86-*-solaris2* | x86_64-*-solaris2*>: Remove. (gcc_cv_target_dl_iterate_phdr) <*-*-solaris2*>: Always set to yes. * Makefile.in (NETLIBS): Remove. * configure, config.in, aclocal.m4: Regenerate. * config/sol2.h: Don't check HAVE_SOLARIS_CRTS. (STARTFILE_SPEC): Remove !HAVE_SOLARIS_CRTS case. [USE_GLD] (LINK_EH_SPEC): Remove TARGET_DL_ITERATE_PHDR guard. * config/i386/i386.cc (USE_HIDDEN_LINKONCE): Remove guard. * varasm.cc (mergeable_string_section): Remove HAVE_LD_ALIGNED_SHF_MERGE handling. (mergeable_constant_section): Likewise. * doc/install.texi (Specific,i?86-*-solaris2*): Reference Solaris 11.4 only. 
(Specific, *-*-solaris2*): Document Solaris 11.3 removal. Remove 11.3 references and caveats. Update for 11.4. gcc/cp: * Make-lang.in (cc1plus$(exeext)): Remove $(NETLIBS). gcc/objcp: * Make-lang.in (cc1objplus$(exeext)): Remove $(NETLIBS). gcc/testsuite: * lib/target-supports.exp (check_effective_target_pie): Always enable on *-*-solaris2*. libgcc: * configure.ac <*-*-solaris2*> (libgcc_cv_solaris_crts): Remove. * config.host <*-*-solaris2*>: Remove !libgcc_cv_solaris_crts support. * configure, config.in: Regenerate. * config/sol2/gmon.c (internal_mcount) [!HAVE_SOLARIS_CRTS]: Remove. * config/i386/sol2-c1.S, config/sparc/sol2-c1.S: Remove. * config/sol2/t-sol2 (crt1.o, gcrt1.o): Remove. libstdc++-v3: * testsuite/lib/dg-options.exp (add_options_for_net_ts) <*-*-solaris2*>: Don't link with -lsocket -lnsl.
The polymorphic Value_Range object takes a tree type at construction so it can determine what type of range to use (currently irange or frange). It seems a few of the types are slightly off. This isn't a problem now, because IPA only cares about integers and pointers, which can both live in an irange. However, with prange coming about, we need to get the type right, because you can't store an integer in a pointer range or vice versa. Also, in preparation for prange, the irange::supports_p() idiom will become: irange::supports_p () || prange::supports_p() To avoid changing all these places, I've added an inline function we can later change and change everything at once. Finally, there's a Value_Range::supports_type_p() && irange::supports_p() in the code. The latter is a subset of the former, so there's no need to check both. gcc/ChangeLog: * ipa-cp.cc (ipa_vr_operation_and_type_effects): Use ipa_supports_p. (ipa_value_range_from_jfunc): Change Value_Range type. (propagate_vr_across_jump_function): Same. * ipa-cp.h (ipa_supports_p): New. * ipa-fnsummary.cc (evaluate_conditions_for_known_args): Change Value_Range type. * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Use ipa_supports_p. (ipcp_get_parm_bits): Same.
TYPE_STRUCTURAL_EQUALITY_P is part of our type system so we have to make sure to include that into the type unification done via type_hash_canon. This requires the flag to be set before querying the hash which is the biggest part of the patch. PR middle-end/114931 gcc/ * tree.cc (type_hash_canon_hash): Hash TYPE_STRUCTURAL_EQUALITY_P. (type_cache_hasher::equal): Compare TYPE_STRUCTURAL_EQUALITY_P. (build_array_type_1): Set TYPE_STRUCTURAL_EQUALITY_P before probing with type_hash_canon. (build_function_type): Likewise. (build_method_type_directly): Likewise. (build_offset_type): Likewise. (build_complex_type): Likewise. * attribs.cc (build_type_attribute_qual_variant): Likewise. gcc/c-family/ * c-common.cc (complete_array_type): Set TYPE_STRUCTURAL_EQUALITY_P before probing with type_hash_canon. gcc/testsuite/ * gcc.dg/pr114931.c: New testcase.
The recent move of libgfortran object files to subdirs and the resulting breakage of libgfortran.so symbol exports demonstrated how fragile deriving object and archive names from their libtool counterparts in the Makefiles is. Therefore, this patch moves that step into make_sunver.pl, considerably simplifying the Makefile rules to create the version scripts. Bootstrapped without regressions on i386-pc-solaris2.11 and sparc-sun-solaris2.11, verifying that the version scripts are identical except for the input filenames. 2024-05-06 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> contrib: * make_sunver.pl: Use File::Basename; Skip -lLIB args. Convert libtool object/archive names to underlying objects/archives. libatomic: * Makefile.am [LIBAT_BUILD_VERSIONED_SHLIB_SUN] (libatomic.map-sun): Pass $(libatomic_la_OBJECTS), $(libatomic_la_LIBADD) to make_sunver.pl unmodified. * Makefile.in: Regenerate. libffi: * Makefile.am [LIBFFI_BUILD_VERSIONED_SHLIB_SUN] (libffi.map-sun): Pass $(libffi_la_OBJECTS), $(libffi_la_LIBADD) to make_sunver.pl unmodified. * Makefile.in: Regenerate. libgfortran: * Makefile.am [LIBGFOR_USE_SYMVER_SUN] (gfortran.ver-sun): Pass $(libgfortran_la_OBJECTS), $(libgfortran_la_LIBADD) to make_sunver.pl unmodified. * Makefile.in: Regenerate. libgomp: * Makefile.am [LIBGOMP_BUILD_VERSIONED_SHLIB_SUN] (libgomp.ver-sun): Pass $(libgomp_la_OBJECTS), $(libgomp_la_LIBADD) to make_sunver.pl unmodified. * Makefile.in: Regenerate. libitm: * Makefile.am [LIBITM_BUILD_VERSIONED_SHLIB_SUN] (libitm.map-sun): Pass $(libitm_la_OBJECTS), $(libitm_la_LIBADD) to make_sunver.pl unmodified. * Makefile.in: Regenerate. libquadmath: * Makefile.am [LIBQUAD_USE_SYMVER_SUN] (quadmath.map-sun): Pass $(libquadmath_la_OBJECTS), $(libquadmath_la_LIBADD) to make_sunver.pl unmodified. * Makefile.in: Regenerate. libssp: * Makefile.am [LIBSSP_USE_SYMVER_SUN] (ssp.map-sun): Pass $(libssp_la_OBJECTS), $(libssp_la_LIBADD) to make_sunver.pl unmodified. * Makefile.in: Regenerate. 
libstdc++-v3: * src/Makefile.am [ENABLE_SYMVERS_SUN] (libstdc++-symbols.ver-sun): Pass $(libstdc___la_OBJECTS), $(libstdc___la_LIBADD) to make_sunver.pl unmodified. * src/Makefile.in: Regenerate.
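The name conversion that moves into make_sunver.pl can be sketched as follows. This is Python rather than the Perl of the actual script, and the details are assumptions: it presumes the usual libtool layout in which foo.lo and libfoo.la have their real object and archive in a .libs directory next to them, and the function name underlying_inputs is made up for the example.

```python
import posixpath


def underlying_inputs(args):
    """Map libtool names to the objects/archives assumed to back them.

    Assumptions of this sketch: dir/foo.lo -> dir/.libs/foo.o,
    dir/libfoo.la -> dir/.libs/libfoo.a, and -lLIB arguments are
    skipped because they do not name a file to scan.
    """
    result = []
    for arg in args:
        if arg.startswith("-l"):
            continue
        directory, name = posixpath.split(arg)
        stem, ext = posixpath.splitext(name)
        if ext in (".lo", ".la"):
            name = stem + (".o" if ext == ".lo" else ".a")
            arg = posixpath.join(directory, ".libs", name)
        result.append(arg)
    return result
```

Doing this in one place means a change such as moving objects into subdirectories only has to be handled once, rather than in each Makefile rule.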
We're currently using size_t but at the same time storing them into bitmaps which only support unsigned int index. The following makes it unsigned int throughout, saving memory as well. * cfgexpand.cc (stack_var::representative): Use 'unsigned' for stack var indexes instead of 'size_t'. (stack_var::next): Likewise. (EOC): Likewise. (stack_vars_alloc): Likewise. (stack_vars_num): Likewise. (decl_to_stack_part): Likewise. (stack_vars_sorted): Likewise. (add_stack_var): Likewise. (add_stack_var_conflict): Likewise. (stack_var_conflict_p): Likewise. (visit_op): Likewise. (visit_conflict): Likewise. (add_scope_conflicts_1): Likewise. (stack_var_cmp): Likewise. (part_hashmap): Likewise. (update_alias_info_with_stack_vars): Likewise. (union_stack_vars): Likewise. (partition_stack_vars): Likewise. (dump_stack_var_partition): Likewise. (expand_stack_vars): Likewise. (account_stack_vars): Likewise. (stack_protect_decl_phase_1): Likewise. (stack_protect_decl_phase_2): Likewise. (asan_decl_phase_3): Likewise. (init_vars_expansion): Likewise. (estimated_stack_frame_size): Likewise.
Bitcount operations popcount, clz, and ctz are emulated for narrow modes in case an operation is only supported for wider modes. Beside that ctz may be emulated via clz in expand_ctz. Reflect this in expression_expensive_p. I considered the emulation of ctz via clz as not expensive since this basically reduces to ctz (x) = c - (clz (x & -x)) where c is the mode precision minus 1 which should be faster than a loop. gcc/ChangeLog: PR tree-optimization/110490 * tree-scalar-evolution.cc (expression_expensive_p): Also consider mode widening for popcount, clz, and ctz.
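The identity behind emulating ctz via clz can be checked with a short model. The sketch below is Python rather than GCC internals; it assumes the usual formulation in which x & -x isolates the lowest set bit, and the helper names clz and ctz_via_clz are made up for illustration. The input must be nonzero, since both operations are undefined or target-specific at zero.

```python
def clz(x, precision):
    """Count leading zero bits of a nonzero x in a precision-bit mode."""
    assert 0 < x < (1 << precision)
    return precision - x.bit_length()


def ctz_via_clz(x, precision):
    """Count trailing zeros as (precision - 1) - clz(lowest set bit)."""
    mask = (1 << precision) - 1
    # Two's-complement negation within the mode, then isolate the
    # lowest set bit of x.
    lowest = x & (-x & mask)
    return (precision - 1) - clz(lowest, precision)
```

For instance, with a 32-bit mode and x = 8, the lowest set bit is 8, clz is 28, and 31 - 28 gives 3 trailing zeros.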
operand_equal_p already has checking code to verify the hash is equal, avoid doing that again in gimplify_hasher::equal. * gimplify.cc (gimplify_hasher::equal): Remove redundant checking.
This avoids a temporary when gimplifying reg = a ? b : c, re-using the LHS of an assignment if that's a register. PR middle-end/27800 * gimplify.cc (gimplify_modify_expr_rhs): For a COND_EXPR avoid a temporary from gimplify_cond_expr when the LHS is a register by pushing the assignment into the COND_EXPR arms. * gcc.dg/pr27800.c: New testcase.
…ions If we update the list of "active" symbols versions now, rather than when adding a new symbol version, we will notice if new symbols get added to the wrong version (as in PR 114692). libstdc++-v3/ChangeLog: * testsuite/util/testsuite_abi.cc: Update latest versions to new versions that should be used in future.
libstdc++-v3/ChangeLog: * include/backward/auto_ptr.h: Use https for URL in comment. * include/bits/basic_ios.h: Likewise. * include/std/iostream: Likewise.
…iant Implement the changes from P2944R3 which add constraints to the comparison operators of std::pair, std::tuple, and std::variant. The paper also changes std::optional, but we already constrain its comparisons using SFINAE on the return type. However, we need some additional constraints on the [optional.comp.with.t] operators that compare an optional with a value. The paper doesn't say to do that, but I think it's needed because otherwise when the comparison for two optional objects fails its constraints, the two overloads that are supposed to be for comparing to a non-optional become the best overload candidates, but are ambiguous (and we don't even get as far as checking the constraints for satisfaction). I reported LWG 4072 for this. The paper does not change std::expected, but probably should have done. I'll submit an LWG issue about that and implement it separately. Also add [[nodiscard]] to all these comparison operators. libstdc++-v3/ChangeLog: * include/bits/stl_pair.h (operator==): Add constraint. * include/bits/version.def (constrained_equality): Define. * include/bits/version.h: Regenerate. * include/std/optional: Define feature test macro. (__optional_rep_op_t): Use is_convertible_v instead of is_convertible. * include/std/tuple: Define feature test macro. (operator==, __tuple_cmp, operator<=>): Reimplement C++20 comparisons using lambdas. Add constraints. * include/std/utility: Define feature test macro. * include/std/variant: Define feature test macro. (_VARIANT_RELATION_FUNCTION_TEMPLATE): Add constraints. (variant): Remove unnecessary friend declarations for comparison operators. * testsuite/20_util/optional/relops/constrained.cc: New test. * testsuite/20_util/pair/comparison_operators/constrained.cc: New test. * testsuite/20_util/tuple/comparison_operators/constrained.cc: New test. * testsuite/20_util/variant/relops/constrained.cc: New test. * testsuite/20_util/tuple/comparison_operators/overloaded.cc: Disable for C++20 and later. 
* testsuite/20_util/tuple/comparison_operators/overloaded2.cc: Remove dg-error line for target c++20.
The call to Build_Allocate_Deallocate_Proc must occur before the special accessibility check for class-wide allocation is generated, because this check comes with cleanup code. gcc/ada/ * exp_ch4.adb (Expand_Allocator_Expression): Move the first call to Build_Allocate_Deallocate_Proc up to before the accessibility check.
gcc/ada/ * exp_ch7.adb (Finalization Management): Add a short description of the implementation of finalization chains.
gcc/ada/ * sem_util.adb: Typo fix in comment. * exp_aggr.adb: Likewise.
This patch fixes a crash when the compiler emits a warning about an unchecked conversion and -gnatdJ is enabled. gcc/ada/ * sem_ch13.adb (Validate_Unchecked_Conversions): Add node parameters to Error_Msg calls.
The implementation of User_Aspect_Definition uses subtype Boolean_Aspects to decide which existing aspects can be used to define new aspects. This subtype didn't include many of the SPARK aspects, notably Always_Terminates. gcc/ada/ * aspects.ads (Aspect_Id, Boolean_Aspect): Change categorization of Boolean-valued SPARK aspects. * sem_ch13.adb (Analyze_Aspect_Specification): Adapt CASE statements to new classification of Boolean-valued SPARK aspects.
In the MAINTAINERS file, names and emails are separated by tabs. One of the entries recently added used spaces. This patch corrects this. The check-MAINTAINERS.py script breaks a bit when this happens. This patch also adds warning about this situation into the script. ChangeLog: * MAINTAINERS: Use tabs between name and email. contrib/ChangeLog: * check-MAINTAINERS.py: Add warning about not using tabs. Signed-off-by: Filip Kastl <fkastl@suse.cz>
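A check of this kind might look roughly like the sketch below. It is not the code added to check-MAINTAINERS.py; the function name, the regular expression and the sample entries are all invented for illustration, and it simply flags lines that end in an email address but contain no tab at all.

```python
import re

# Hypothetical pattern: an <address> at the end of the line.
EMAIL_AT_END = re.compile(r"<[^<>\s]+@[^<>\s]+>\s*$")


def lines_without_tabs(lines):
    """Return 1-based numbers of entry lines that do not use tabs."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        if EMAIL_AT_END.search(line) and "\t" not in line:
            flagged.append(number)
    return flagged


sample = [
    "Jane Doe\t\t<jane@example.org>",    # made-up entry, tab separated
    "John Roe    <john@example.org>",    # made-up entry, space separated
    "Some section heading",
]
```

Here lines_without_tabs(sample) reports only the second line, which is the situation the new warning is meant to point out.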
This patch enables overlapped by-piece operations by defining TARGET_OVERLAP_OP_BY_PIECES_P to true. On rs6000, default move/set/clear ratio is 2. So the overlap is only enabled with compare by-pieces. gcc/ * config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define. gcc/testsuite/ * gcc.target/powerpc/block-cmp-9.c: New.
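What overlapping buys for by-pieces operations can be shown with a small planner. This is a Python sketch under stated assumptions, not GCC's by-pieces infrastructure: plan_pieces is a hypothetical function, piece sizes are powers of two up to max_piece, and overlap is only used for the final piece.

```python
def plan_pieces(length, max_piece=8, overlap=False):
    """Return (offset, size) pairs covering the byte range [0, length).

    Without overlap the tail is covered by ever smaller pieces; with
    overlap one final piece is allowed to reach back over bytes that
    were already handled.
    """
    pieces = []
    offset = 0
    size = max_piece
    while offset < length:
        remaining = length - offset
        if remaining >= size:
            pieces.append((offset, size))
            offset += size
            continue
        if overlap:
            # Smallest power-of-two piece that covers the tail.
            tail = 1
            while tail < remaining:
                tail *= 2
            if tail <= length:
                pieces.append((length - tail, tail))
                break
        size //= 2
    return pieces
```

For a 15-byte block this gives four pieces (8, 4, 2, 1 bytes) without overlap and two 8-byte pieces at offsets 0 and 7 with overlap. Re-reading or re-comparing the overlapped byte is harmless for a comparison, which fits the note above that the overlap is only enabled with compare by-pieces.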
libstdc++-v3/ChangeLog: PR libstdc++/115063 * include/std/stacktrace (basic_stacktrace::max_size): Fix typo in reference to _M_alloc member. * testsuite/19_diagnostics/stacktrace/stacktrace.cc: Check max_size() compiles.
This patch tames down the inliner on (multiply) self-recursive always_inline functions. While we already have caps on recursive inlining, the testcase combines early inliner and late inliner to get a very wide recursive inlining tree. The basic idea is to ignore DISREGARD_INLINE_LIMITS when deciding on inlining self recursive functions (so we cut on function being large) and clear the flag once it is detected. I did not include the testcase since it still produces a lot of code and would slow down testing. It also outputs many inlining failed messages, which is not very nice, but it is hard to detect self recursion cycles in full generality when indirect calls and other tricks may happen. gcc/ChangeLog: PR ipa/113291 * ipa-inline.cc (enum can_inline_edge_by_limits_flags): New enum. (can_inline_edge_by_limits_p): Take flags instead of multiple bools; add flag for forcing inline limits. (can_early_inline_edge_p): Update. (want_inline_self_recursive_call_p): Update; use FORCE_LIMITS mode. (check_callers): Update. (update_caller_keys): Update. (update_callee_keys): Update. (recursive_inlining): Update. (add_new_edges_to_heap): Update. (speculation_useful_p): Update. (inline_small_functions): Clear DECL_DISREGARD_INLINE_LIMITS on self recursion. (flatten_function): Update. (inline_to_all_callers_1): Update.
Consider a hello world, compiled with -gsplit-dwarf and dwarf version 4, and -g3:
...
$ gcc -gdwarf-4 -gsplit-dwarf /data/vries/hello.c -g3 -save-temps -dA
...
In section .debug_macro.dwo, we have:
...
.Ldebug_macro0:
        .value   0x4     # DWARF macro version number
        .byte    0x2     # Flags: 32-bit, lineptr present
        .long    .Lskeleton_debug_line0
        .byte    0x3     # Start new file
        .uleb128 0       # Included from line number 0
        .uleb128 0x1     # file /data/vries/hello.c
        .byte    0x5     # Define macro strp
        .uleb128 0       # At line number 0
        .uleb128 0x1d0   # The macro: "__STDC__ 1"
...
Given that we use a DW_MACRO_define_strp, we'd expect 0x1d0 to be an offset into a .debug_str.dwo section.

But in fact, 0x1d0 is an index into the string offset table in section .debug_str_offsets.dwo:
...
        .long    0x34f0  # indexed string 0x1d0: __STDC__ 1
...
Add asserts that catch this inconsistency, and fix this by using DW_MACRO_define_strx instead.

Tested on x86_64.

gcc/ChangeLog:

2024-05-14  Tom de Vries  <tdevries@suse.de>

        PR debug/115066
        * dwarf2out.cc (output_macinfo_op): Fix DW_MACRO_define_strx/strp choice for v4 .debug_macro.dwo. Add asserts to check that choice.

gcc/testsuite/ChangeLog:

2024-05-14  Tom de Vries  <tdevries@suse.de>

        PR debug/115066
        * gcc.dg/pr115066.c: New test.
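The difference between the two operand kinds can be modelled in a few lines. This is a simplified Python sketch, not a DWARF reader: the section contents are invented, the string offsets table is a plain list rather than an array of fixed-size entries, and the function names are hypothetical. A strp operand is a byte offset into the string section, while a strx operand is an index into the offsets table, which in turn yields the byte offset.

```python
def read_cstring(section, offset):
    """Read a NUL-terminated string starting at a byte offset."""
    end = section.index(b"\0", offset)
    return section[offset:end].decode("ascii")


def resolve_strp(debug_str, operand):
    # strp form: the operand is itself an offset into the string section.
    return read_cstring(debug_str, operand)


def resolve_strx(debug_str, str_offsets, operand):
    # strx form: the operand indexes the offsets table first.
    return read_cstring(debug_str, str_offsets[operand])


# Invented section contents for the example.
debug_str = b"hello.c\0__STDC__ 1\0"
str_offsets = [0, 8]
```

Reading the index 1 as if it were an offset gives resolve_strp(debug_str, 1), which lands in the middle of the first string, whereas resolve_strx(debug_str, str_offsets, 1) finds the macro string. That is the kind of mismatch the added asserts are there to catch.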
This section can be misread to say that shrink_to_fit is available from GCC 3.4, but it was added later. libstdc++-v3/ChangeLog: * doc/xml/manual/strings.xml: Clarify that GCC 4.5 added std::string::shrink_to_fit. * doc/html/manual/strings.html: Regenerate.
Do not use dynamic_cast unconditionally, in case libstdc++ is built with -fno-rtti. libstdc++-v3/ChangeLog: PR libstdc++/115015 * src/c++23/print.cc (__open_terminal(streambuf*)) [!__cpp_rtti]: Do not use dynamic_cast.
…Solaris [PR107750] gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c currently FAILs on Solaris: FAIL: gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c (test for excess errors) Excess errors: /vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c:91:3: error: implicit declaration of function 'memset' [-Wimplicit-function-declaration] Solaris <sys/select.h> has no declaration of memset. While one can argue that this should be fixed, it's easy enough to just include <string.h> instead, which is what this patch does. Tested on i386-pc-solaris2.11 and i686-pc-linux-gnu. 2024-05-14 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> gcc/testsuite: PR analyzer/107750 * gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c: Include <string.h>.
On aarch64, I get this failure:
...
FAIL: gcc.dg/pr115066.c scan-assembler \\.byte\\t0xb\\t# Define macro strx
...
This happens because we expect to match:
...
        .byte   0xb     # Define macro strx
...
but instead we get:
...
        .byte   0xb     // Define macro strx
...
Fix this by not explicitly matching the comment marker.

Tested on aarch64 and x86_64.

gcc/testsuite/ChangeLog:

2024-05-14  Tom de Vries  <tdevries@suse.de>

        * gcc.dg/pr115066.c: Don't match comment marker.
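The effect of dropping the comment marker from the pattern can be reproduced with ordinary regular expressions. The sketch uses Python's re module rather than the Tcl matching done by scan-assembler, and the relaxed pattern is an assumption about the shape of the fix, not a copy of the directive in the test.

```python
import re

# Assembly lines as emitted with '#' and '//' comment markers.
x86_line = "\t.byte\t0xb\t# Define macro strx"
aarch64_line = "\t.byte\t0xb\t// Define macro strx"

# Pattern that hard-codes the '#' comment marker.
strict = re.compile(r"\.byte\t0xb\t# Define macro strx")

# Assumed relaxed form: accept any comment marker before the text.
relaxed = re.compile(r"\.byte\t0xb\t.* Define macro strx")
```

The strict pattern only matches the x86_64 line, while the relaxed one matches both, so the test no longer depends on the target's comment syntax.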
Make clear_by_pieces() available to other parts of the compiler, similar to store_by_pieces(). gcc/ChangeLog: * expr.cc (clear_by_pieces): Remove static from clear_by_pieces. * expr.h (clear_by_pieces): Add prototype for clear_by_pieces.
Let's add '\t' to the instruction match pattern to avoid false positive matches when compiling with -flto. gcc/testsuite/ChangeLog: * gcc.target/riscv/cmo-zicbom-1.c: Add \t to test pattern. * gcc.target/riscv/cmo-zicbom-2.c: Likewise. * gcc.target/riscv/cmo-zicbop-1.c: Likewise. * gcc.target/riscv/cmo-zicbop-2.c: Likewise. * gcc.target/riscv/cmo-zicboz-1.c: Likewise. * gcc.target/riscv/cmo-zicboz-2.c: Likewise.
The Zicboz extension offers the cbo.zero instruction, which can be used to zero a memory region corresponding to a cache block. The Zic64b extension defines the cache block size to be 64 bytes. If both extensions are available, it is possible to use cbo.zero to clear memory, if the alignment and size constraints are met. This patch implements this. gcc/ChangeLog: * config/riscv/riscv-protos.h (riscv_expand_block_clear): New prototype. * config/riscv/riscv-string.cc (riscv_expand_block_clear_zicboz_zic64b): New function to expand a block-clear with cbo.zero. (riscv_expand_block_clear): New RISC-V block-clear expansion function. * config/riscv/riscv.md (setmem<mode>): New setmem expansion.
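The kind of alignment and size check involved can be sketched as below. This is a Python illustration only: the exact conditions used by riscv_expand_block_clear_zicboz_zic64b are not spelled out in this message, so the sketch assumes the simplest case, a region that starts on a 64-byte boundary and consists of whole cache blocks. The function name is hypothetical.

```python
CACHE_BLOCK = 64  # block size in bytes, as fixed by Zic64b


def cbo_zero_offsets(size, align):
    """Return the offsets at which cbo.zero would be issued, or None.

    Assumption of this sketch: the clear is only expanded this way if
    the region is block-aligned and its size is a nonzero multiple of
    the block size; anything else falls back to another strategy.
    """
    if size <= 0 or align < CACHE_BLOCK or size % CACHE_BLOCK != 0:
        return None
    return list(range(0, size, CACHE_BLOCK))
```

Under these assumptions a 256-byte, 64-byte-aligned clear needs four cbo.zero instructions, while a 100-byte clear or an 8-byte-aligned one is not eligible.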
The following revisits the fix for PR99954 which was observed as causing missed memcpy recognition and instead using memmove for non-aliasing copies. While the original fix mitigated bogus recognition of memcpy the root cause was not properly identified. The root cause is dr_analyze_indices "failing" to handle union references and leaving the DRs indices in a state that's not correctly handled by dr_may_alias. The following mitigates this there appropriately, restoring memcpy recognition for non-aliasing copies. This makes us run into a latent issue in ptr_deref_may_alias_decl_p when the pointer is something like &MEM[0].a in which case we fail to handle non-SSA name pointers. Add code similar to what we have in ptr_derefs_may_alias_p. PR tree-optimization/99954 * tree-data-ref.cc (dr_may_alias_p): For bases that are not completely analyzed fall back to TBAA and points-to. * tree-loop-distribution.cc (loop_distribution::classify_builtin_ldst): When there is no dependence again classify as memcpy. * tree-ssa-alias.cc (ptr_deref_may_alias_decl_p): Verify the pointer is an SSA name. * gcc.dg/tree-ssa/ldist-40.c: New testcase.
... if the constant can be represented as sum of two S12 values. The two S12 values could instead be fused with subsequent ADD insn. The helps - avoid an additional LUI insn - side benefits of not clobbering a reg e.g. w/o patch w/ patch long | | plus(unsigned long i) | li a5,4096 | { | addi a5,a5,-2032 | addi a0, a0, 2047 return i + 2064; | add a0,a0,a5 | addi a0, a0, 17 } | ret | ret NOTE: In theory not having const in a standalone reg might seem less CSE friendly, but for workloads in consideration these mat are from very late LRA reloads and follow on GCSE is not doing much currently. The real benefit however is seen in base+offset computation for array accesses and especially for stack accesses which are finalized late in optim pipeline, during LRA register allocation. Often the finalized offsets trigger LRA reloads resulting in mind boggling repetition of exact same insn sequence including LUI based constant materialization. This shaves off 290 billion dynamic instrustions (QEMU icounts) in SPEC 2017 Cactu benchmark which is over 10% of workload. In the rest of suite, there additional 10 billion shaved, with both gains and losses in indiv workloads as is usual with compiler changes. 
Benchmark         |         w/o patch |          w/ patch |
500.perlbench_r-0 | 1,214,534,029,025 | 1,212,887,959,387 |
500.perlbench_r-1 |   740,383,419,739 |   739,280,308,163 |
500.perlbench_r-2 |   692,074,638,817 |   691,118,734,547 |
502.gcc_r-0       |   190,820,141,435 |   190,857,065,988 |
502.gcc_r-1       |   225,747,660,839 |   225,809,444,357 | <- -0.02%
502.gcc_r-2       |   220,370,089,641 |   220,406,367,876 | <- -0.03%
502.gcc_r-3       |   179,111,460,458 |   179,135,609,723 | <- -0.02%
502.gcc_r-4       |   219,301,546,340 |   219,320,416,956 | <- -0.01%
503.bwaves_r-0    |   278,733,324,691 |   278,733,323,575 | <- -0.01%
503.bwaves_r-1    |   442,397,521,282 |   442,397,519,616 |
503.bwaves_r-2    |   344,112,218,206 |   344,112,216,760 |
503.bwaves_r-3    |   417,561,469,153 |   417,561,467,597 |
505.mcf_r         |   669,319,257,525 |   669,318,763,084 |
507.cactuBSSN_r   | 2,852,767,394,456 | 2,564,736,063,742 | <+ 10.10%
508.namd_r        | 1,855,884,342,110 | 1,855,881,110,934 |
510.parest_r      | 1,654,525,521,053 | 1,654,402,859,174 |
511.povray_r      | 2,990,146,655,619 | 2,990,060,324,589 |
519.lbm_r         | 1,158,337,294,525 | 1,158,337,294,529 |
520.omnetpp_r     | 1,021,765,791,283 | 1,026,165,661,394 |
521.wrf_r         | 1,715,955,652,503 | 1,714,352,737,385 |
523.xalancbmk_r   |   849,846,008,075 |   849,836,851,752 |
525.x264_r-0      |   277,801,762,763 |   277,488,776,427 |
525.x264_r-1      |   927,281,789,540 |   926,751,516,742 |
525.x264_r-2      |   915,352,631,375 |   914,667,785,953 |
526.blender_r     | 1,652,839,180,887 | 1,653,260,825,512 |
527.cam4_r        | 1,487,053,494,925 | 1,484,526,670,770 |
531.deepsjeng_r   | 1,641,969,526,837 | 1,642,126,598,866 |
538.imagick_r     | 2,098,016,546,691 | 2,097,997,929,125 |
541.leela_r       | 1,983,557,323,877 | 1,983,531,314,526 |
544.nab_r         | 1,516,061,611,233 | 1,516,061,407,715 |
548.exchange2_r   | 2,072,594,330,215 | 2,072,591,648,318 |
549.fotonik3d_r   | 1,001,499,307,366 | 1,001,478,944,189 |
554.roms_r        | 1,028,799,739,111 | 1,028,780,904,061 |
557.xz_r-0        |   363,827,039,684 |   363,057,014,260 |
557.xz_r-1        |   906,649,112,601 |   905,928,888,732 |
557.xz_r-2        |   509,023,898,187 |   508,140,356,932 |
997.specrand_fr   |       402,535,577 |       403,052,561 |
999.specrand_ir   |       402,535,577 |       403,052,561 |

This should still be considered damage control, as the real/deeper fix would be to reduce the number of LRA reloads or CSE/anchor those during LRA constraint sub-pass (re)runs (that's a different PR/114729).

Implementation Details (for posterity)
--------------------------------------
 - The basic idea is to have a splitter selected via a new predicate for a constant being a possible sum of two S12 and provide the transform. This is however a 2 -> 2 transform which combine can't handle. So we specify it using a define_insn_and_split.
 - The initial loose "i" constraint caused LRA to accept invalid insns, thus needing a tighter new constraint as well.
 - An additional fallback alternative with the catch-all "r" register constraint is also needed to allow any "reloads" that LRA might require for ADDI with a const larger than S12.

Testing
--------
This is testsuite clean (rv64 only). I'll rely on the post-commit CI multilib run for any possible fallout for other setups such as rv32.

|                                              | gcc     | g++   | gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/ lp64d/ medlow | 41 / 17 | 8 / 3 | 7 / 2    |
| rv64imafdc_zba_zbb_zbs_zicond/ lp64d/ medlow | 41 / 17 | 8 / 3 | 7 / 2    |

I also threw this into a buildroot run; it obviously boots Linux to userspace. bloat-o-meter on glibc and kernel shows an overall decrease in static instruction counts with some minor spot increases. These are generally in the category of
 - LUI + ADDI are 2 bytes each vs. two ADDs being 4 bytes each.
 - Knock-on effects due to inlining changes.
 - Sometimes the slightly shorter 2-insn seq in a multi-exit function can cause in-place epilogue duplication (vs. a jump back). This is slightly larger but more efficient in execution.
In summary nothing to fret about.
| linux/scripts/bloat-o-meter build-gcc-240131/target/lib/libc.so.6 \
        build-gcc-240131-new-splitter-1-variant/target/lib/libc.so.6
|
| add/remove: 0/0 grow/shrink: 21/49 up/down: 520/-3056 (-2536)
| Function                              old     new   delta
| getnameinfo                          2756    2892    +136
...
| tempnam                               136     144      +8
| padzero                               276     284      +8
...
| __GI___register_printf_specifier      284     280      -4
| __EI_xdr_array                        468     464      -4
| try_file_lock                         268     260      -8
| pthread_create@GLIBC_2               3520    3508     -12
| __pthread_create_2_1                 3520    3508     -12
...
| _nss_files_setnetgrent                932     904     -28
| _nss_dns_gethostbyaddr2_r            1524    1480     -44
| build_trtable                        3312    3208    -104
| printf_positional                   25000   22580   -2420
| Total: Before=2107999, After=2105463, chg -0.12%

Caveat:
------
Jeff noted during v2 review that the operand0 constraint !riscv_reg_frame_related could potentially cause issues with hard reg cprop in future. If that trips things up we will have to loosen the constraint while dialing down the const range to (-2048 to 2032) as opposed to the full S12 range of (-2048 to 2047) to keep stack regs aligned.

gcc/ChangeLog:

	* config/riscv/riscv.h: New macros to check for sum of two S12 range.
	* config/riscv/constraints.md: New constraint.
	* config/riscv/predicates.md: New Predicate.
	* config/riscv/riscv.md: New splitter.
	* config/riscv/riscv.cc (riscv_reg_frame_related): New helper.
	* config/riscv/riscv-protos.h: New helper prototype.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/sum-of-two-s12-const-1.c: New test: checks for new patterns output.
	* gcc.target/riscv/sum-of-two-s12-const-2.c: Ditto.
	* gcc.target/riscv/sum-of-two-s12-const-3.c: New test: should not ICE.

Tested-by: Edwin Lu <ewlu@rivosinc.com> # pre-commit-CI #1520
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
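The sum-of-two-S12 idea in the RISC-V commit above can be sketched in plain C. This is an illustration only: the helper and macro names below are hypothetical and are not the macros added to riscv.h, and the sketch ignores the narrower (-2048 to 2032) range mentioned in the caveat. It pins the first addend at the S12 limit and puts the remainder in the second, which reproduces the 2064 = 2047 + 17 split from the commit's example.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not taken from the patch.  */
#define S12_MIN (-2048)
#define S12_MAX 2047

/* True if v does not fit in a signed 12-bit immediate but can be written
   as the sum of two signed 12-bit immediates.  */
bool sum_of_two_s12_p (long v)
{
  return (v > S12_MAX && v <= 2 * S12_MAX)
         || (v < S12_MIN && v >= 2 * S12_MIN);
}

/* Split v into two S12 addends: the first is pinned at the range limit,
   the second is the remainder.  Returns false when v does not qualify.  */
bool split_sum_of_two_s12 (long v, long *first, long *second)
{
  if (!sum_of_two_s12_p (v))
    return false;
  *first = v > 0 ? S12_MAX : S12_MIN;
  *second = v - *first;
  return true;
}
```

With such a split, `i + 2064` needs two ADDI instructions and no scratch register, instead of LUI/ADDI to build the constant followed by an ADD.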
Commit r15-436-g44e7855e did not fix PR115013 for PRU because SMALL_REGISTER_CLASS_P is not returning an accurate value for the PRU backend.

Word mode for PRU backend is defined as 8-bit, yet all ALU operations are preferred in 32-bit mode. Thus checking whether a register class contains a single word_mode register would not classify the actually single SImode register classes as small. This affected the multiplication source and destination register classes.

Fix by implementing TARGET_CLASS_LIKELY_SPILLED_P to treat all register classes with SImode or smaller size as likely spilled. This in turn corrects the behaviour of SMALL_REGISTER_CLASS_P for PRU.

	PR rtl-optimization/115013

gcc/ChangeLog:

	* config/pru/pru.cc (pru_class_likely_spilled_p): Implement to mark classes containing one SImode register as likely spilled.
	(TARGET_CLASS_LIKELY_SPILLED_P): Define.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
gcc/cp/ChangeLog:

	* decl.cc (wrap_cleanups_r): Clarify comment.
	* init.cc (build_vec_init): Update comment.
We currently ICE upon the following invalid snippet because we fail to properly handle tsubst_arg_types returning error_mark_node in build_deduction_guide.

== cut ==
template<class... Ts, class>
struct A { A(Ts...); };
A a;
== cut ==

This patch fixes this, and has been successfully tested on x86_64-pc-linux-gnu.

	PR c++/105760

gcc/cp/ChangeLog:

	* pt.cc (build_deduction_guide): Check for error_mark_node result from tsubst_arg_types.

gcc/testsuite/ChangeLog:

	* g++.dg/parse/error66.C: New test.
So this patch allows us to eliminate a redundant AND in some shift-add style sequences. I think the testcase was reduced from xz by the RAU team, but I'm not highly confident of that.

Specifically the AND is masking off the upper 32 bits of the un-shifted value and there's an outer SIGN_EXTEND from SI to DI. However in the RTL it's working on the post-shifted value, so the constant is left shifted, so we have to account for that in the pattern's condition.

We can just drop the AND in this case. So instead we do a 64-bit shift, then a sign-extending ADD utilizing the low part of that 64-bit shift result.

This has run through Ventana's CI as well as my own. I'll wait for it to run through the larger CI system before pushing.

Jeff

gcc/
	* config/riscv/riscv.md: Add pattern for sign extended shift-add sequence with a masked input.

gcc/testsuite
	* gcc.target/riscv/shift-add-2.c: New test.
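Why the AND is redundant can be checked with a small C model of the two sequences. This is a sketch under stated assumptions, not the shift-add-2.c test: the function names are hypothetical, the shift count is assumed to be small (well below 32), and the narrowing cast to `int32_t` relies on GCC's wrap-around behaviour for out-of-range values.

```c
#include <stdint.h>

/* Hypothetical model of the original sequence: mask off the upper 32 bits,
   shift, add, then sign-extend the low 32 bits of the sum.  */
int64_t shadd_masked (uint64_t x, uint64_t y, unsigned n)
{
  uint64_t shifted = (x & 0xffffffffULL) << n;
  return (int64_t) (int32_t) (uint32_t) (shifted + y);
}

/* Same computation without the AND.  The bits removed by the mask end up
   at bit 32 + n or higher after the shift, so they can never reach the
   low 32 bits that the sign extension looks at.  */
int64_t shadd_unmasked (uint64_t x, uint64_t y, unsigned n)
{
  uint64_t shifted = x << n;
  return (int64_t) (int32_t) (uint32_t) (shifted + y);
}
```

Both functions return the same value for every input with a shift count in that range, which is the property the new pattern relies on.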
This patch adds proof-of-concept LoongArch vector calling convention support, which can be enabled with __attribute__ ((vecarg)) or the -mvecarg option. The details and discussion can be found in the issue list of the GitHub loongson/gcc repo.

gcc/ChangeLog:

	* config/loongarch/genopts/loongarch.opt.in: Add vector calling convention support.
	* config/loongarch/loongarch-protos.h (loongarch_init_cumulative_args): Ditto.
	* config/loongarch/loongarch.cc (loongarch_simd_abi): Ditto.
	(loongarch_fntype_abi): Ditto.
	(loongarch_comp_type_attributes): Ditto.
	(loongarch_init_cumulative_args): Ditto.
	(loongarch_insn_callee_abi): Ditto.
	(loongarch_flatten_aggregate_field): Ditto.
	(loongarch_flatten_aggregate_argument): Ditto.
	(loongarch_pass_aggregate_num_fpr): Ditto.
	(loongarch_pass_aggregate_in_fpr_and_gpr_p): Ditto.
	(loongarch_get_arg_info): Ditto.
	(loongarch_function_arg): Ditto.
	(loongarch_function_value_1): Ditto.
	(loongarch_return_in_memory): Ditto.
	(loongarch_call_tls_get_addr): Ditto.
	(loongarch_output_mi_thunk): Ditto.
	(TARGET_FNTYPE_ABI): Ditto.
	(TARGET_COMP_TYPE_ATTRIBUTES): Ditto.
	(TARGET_INSN_CALLEE_ABI): Ditto.
	* config/loongarch/loongarch.h (enum loongarch_pcs): Ditto.
	(INIT_CUMULATIVE_ARGS): Ditto.
	* config/loongarch/loongarch.md: Ditto.
	* config/loongarch/loongarch.opt: Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vector/lasx/vect-abi-pass-struct-1.c: New test.
	* gcc.target/loongarch/vector/lasx/vect-abi-pass-struct-2.c: New test.
	* gcc.target/loongarch/vector/lasx/vect-abi-pass-struct-3.c: New test.
	* gcc.target/loongarch/vector/lasx/vect-abi-pass-struct-4.c: New test.
	* gcc.target/loongarch/vector/lasx/vect-abi-pass-struct-5.c: New test.
	* gcc.target/loongarch/vector/lasx/vect-abi-pass-struct-6.c: New test.
	* gcc.target/loongarch/vector/lasx/vect-abi-ret-struct-1.c: New test.
	* gcc.target/loongarch/vector/lasx/vect-abi-ret-struct-2.c: New test.
	* gcc.target/loongarch/vector/lasx/vect-abi-ret-struct-3.c: New test.
	* gcc.target/loongarch/vector/lasx/vect-abi-ret-struct-4.c: New test.
	* gcc.target/loongarch/vector/lasx/vect-abi-ret-struct-5.c: New test.
	* gcc.target/loongarch/vector/lasx/vect-abi-ret-struct-6.c: New test.
	* gcc.target/loongarch/vector/lsx/vect-abi-pass-1.c: New test.
	* gcc.target/loongarch/vector/lsx/vect-abi-pass-2.c: New test.
	* gcc.target/loongarch/vector/lsx/vect-abi-pass-stdarg-1.c: New test.
	* gcc.target/loongarch/vector/lsx/vect-abi-pass-stdarg-2.c: New test.
	* gcc.target/loongarch/vector/lsx/vect-abi-pass-struct-1.c: New test.
	* gcc.target/loongarch/vector/lsx/vect-abi-pass-struct-2.c: New test.
	* gcc.target/loongarch/vector/lsx/vect-abi-pass-struct-3.c: New test.
	* gcc.target/loongarch/vector/lsx/vect-abi-pass-struct-4.c: New test.
	* gcc.target/loongarch/vector/lsx/vect-abi-pass-struct-5.c: New test.
	* gcc.target/loongarch/vector/lsx/vect-abi-pass-struct-6.c: New test.
	* gcc.target/loongarch/vector/lsx/vect-abi-ret-struct-1.c: New test.
	* gcc.target/loongarch/vector/lsx/vect-abi-ret-struct-2.c: New test.
	* gcc.target/loongarch/vector/lsx/vect-abi-ret-struct-3.c: New test.
	* gcc.target/loongarch/vector/lsx/vect-abi-ret-struct-4.c: New test.
	* gcc.target/loongarch/vector/lsx/vect-abi-ret-struct-5.c: New test.
	* gcc.target/loongarch/vector/lsx/vect-abi-ret-struct-6.c: New test.
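For orientation, this is roughly how the attribute from the commit above might be applied in user code. It is a sketch, not one of the new tests: the guard macro `HAVE_VECARG_ATTRIBUTE`, the `VECARG` wrapper and the function name are hypothetical, and a generic GCC vector type stands in for a 128-bit LSX vector so the example also builds on compilers without this proof-of-concept patch.

```c
/* Generic GCC vector type standing in for a 128-bit LSX vector.  */
typedef int v4i32 __attribute__ ((vector_size (16)));

/* Hypothetical guard: only use the proof-of-concept attribute when
   building for LoongArch with a compiler known to carry this patch.  */
#if defined (__loongarch__) && defined (HAVE_VECARG_ATTRIBUTE)
# define VECARG __attribute__ ((vecarg))
#else
# define VECARG
#endif

/* With the attribute in effect, the vector arguments and the return value
   would follow the vector calling convention rather than the default one.  */
VECARG v4i32 add_v4i32 (v4i32 a, v4i32 b)
{
  return a + b;
}
```

According to the commit message, the -mvecarg option is the other way to enable the convention, in which case no per-function attribute is needed.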
…ent.

gcc/ChangeLog:

	* config/loongarch/loongarch.cc (loongarch_fntype_abi): Fix up loongarch_fntype_abi () and the TARGET_VECARG check.
	(loongarch_flatten_aggregate_field): Ditto.
	(loongarch_flatten_aggregate_argument): Ditto.
…ember.

This patch fixes the misplaced condition inside the loongarch_flatten_aggregate_field () function, which caused the ICE.

gcc/ChangeLog:

	* config/loongarch/loongarch.cc (loongarch_flatten_aggregate_field): Fix ICE when returning a struct with more than 2 vector members.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vector/lasx/vect-abi-ret-struct-7.c: New test.
One call to gen_call_value_internal () was missing the pcs argument. This patch corrects it.

gcc/ChangeLog:

	* config/loongarch/loongarch.cc (loongarch_call_tls_get_addr): Add missing pcs argument for gen_call_value_internal ().
vect-abi-pass-1.c checks for an instruction with $vr0 as its operand instead of $vr8, which does not match the expected instruction sequence.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vector/lsx/vect-abi-pass-1.c: Change vr0 to vr8 in checking sequence.
This patch adds Proof-of-Concept LoongArch vector calling convention support, and can be enabled by __attribute__ ((vecarg)) or the -mvecarg option. The details and discussion can be found at GitHub loongson/gcc repo's issue list.