tschwinge

Merging in several stages, towards #2802 and further.

This must of course not be rebased by GitHub merge queue, but has to become a proper Git merge. (I'll handle that, once ready.)

Robin Dapp and others added 30 commits October 9, 2023 16:21
This adds a pipeline description for a generic out-of-order core.
Latency and units are not based on any real processor but more or less
educated guesses as to what such a processor would look like.

In order to account for latency scaling by LMUL != 1, sched_adjust_cost
is implemented.  It will scale an instruction's latency by its LMUL
so an LMUL == 8 instruction will take 8 times the number of cycles
the same instruction with LMUL == 1 would take.
As this potentially causes very high latencies which, in turn, might
lead to scheduling anomalies and a higher number of vsetvls emitted,
this feature is only enabled when specifying -madjust-lmul-cost.
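
A minimal sketch of the scaling rule described above. It is an illustration
only, not the code in riscv.cc: the function name scale_latency_by_lmul and
its parameters are hypothetical, and fractional LMUL is assumed to leave the
latency unchanged.

```cpp
// Hypothetical helper, not the actual riscv_sched_adjust_cost hook.
// base_cost:      latency in cycles of the instruction at LMUL == 1.
// lmul:           integer register group multiplier (1, 2, 4 or 8).
// adjust_enabled: models whether -madjust-lmul-cost was specified.
int scale_latency_by_lmul(int base_cost, int lmul, bool adjust_enabled)
{
  // Without the option, or for LMUL <= 1, the latency is left alone.
  if (!adjust_enabled || lmul <= 1)
    return base_cost;
  // Otherwise an LMUL == 8 instruction costs 8 times its LMUL == 1 latency.
  return base_cost * lmul;
}
```

With these assumptions, a 3-cycle instruction at LMUL == 8 is costed at 24
cycles when the option is on and at 3 cycles when it is off.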

Additionally, in order to easily recognize pre-RA vsetvls this patch
introduces an insn type vsetvl_pre which is used in sched_adjust_cost.

In the future we might also want a latency adjustment similar to lmul
for reductions, i.e. make the latency dependent on the type and its
number of units.

gcc/ChangeLog:

	* config/riscv/riscv-cores.def (RISCV_TUNE): Add parameter.
	* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
	Add generic_ooo.
	* config/riscv/riscv.cc (riscv_sched_adjust_cost): Implement
	scheduler hook.
	(TARGET_SCHED_ADJUST_COST): Define.
	* config/riscv/riscv.md (no,yes"): Include generic-ooo.md
	* config/riscv/riscv.opt: Add -madjust-lmul-cost.
	* config/riscv/generic-ooo.md: New file.
	* config/riscv/vector.md: Add vsetvl_pre.
Like ARM SVE, RVV is vectorizing these 2 cases in the same way.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/slp-23.c: Add RVV like ARM SVE.
	* gcc.dg/vect/slp-perm-10.c: Ditto.
RVV vectorizes this case with stride8 load_lanes.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/slp-reduc-4.c: Adapt test for stride8 load_lanes.
This case is vectorized by stride8 load_lanes.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/slp-12a.c: Adapt for stride 8 load_lanes.
These cases are vectorized by vec_load_lanes with stride = 8 instead of SLP
with -fno-vect-cost-model.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/pr97832-2.c: Adapt dump check for target supports load_lanes with stride = 8.
	* gcc.dg/vect/pr97832-3.c: Ditto.
	* gcc.dg/vect/pr97832-4.c: Ditto.
RVV vectorizes it with stride5 load_lanes.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/slp-perm-4.c: Adapt test for stride5 load_lanes.
Turns out we didn't need this as there are no unordered relations
managed by the oracle.

	* gimple-range-gori.cc (gori_compute::compute_operand1_range): Do
	not call get_identity_relation.
	(gori_compute::compute_operand2_range): Ditto.
	* value-relation.cc (get_identity_relation): Remove.
	* value-relation.h (get_identity_relation): Remove prototype.
A floating point equivalence may not properly reflect both signs of
zero, so be pessimistic and ensure both signs are included.
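
The underlying issue can be illustrated outside of GCC. The sketch below is
not the adjust_equivalence_range implementation; FloatRange and
include_both_zeros are hypothetical names used only to show why an equality
on floating point values says nothing about the sign of zero.

```cpp
#include <cmath>

// +0.0 and -0.0 compare equal, yet they carry different sign bits.
bool zeros_compare_equal() { return 0.0 == -0.0; }
bool zeros_have_same_sign() { return std::signbit(0.0) == std::signbit(-0.0); }

// Hypothetical closed interval [lo, hi] standing in for a value range.
struct FloatRange { double lo; double hi; };

// If an endpoint is a zero of either sign, widen it so that the range
// covers both -0.0 and +0.0.
FloatRange include_both_zeros(FloatRange r)
{
  if (r.lo == 0.0)
    r.lo = -0.0;  // lower bound becomes negative zero
  if (r.hi == 0.0)
    r.hi = 0.0;   // upper bound becomes positive zero
  return r;
}
```

Knowing only that x == y and that y is +0.0 therefore does not rule out x
being -0.0, which is why the range is widened rather than copied.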

	PR tree-optimization/111694
	gcc/
	* gimple-range-cache.cc (ranger_cache::fill_block_cache): Adjust
	equivalence range.
	* value-relation.cc (adjust_equivalence_range): New.
	* value-relation.h (adjust_equivalence_range): New prototype.

	gcc/testsuite/
	* gcc.dg/pr111694.c: New.
gcc/analyzer/ChangeLog:
	* access-diagram.cc (boundaries::add): Explicitly state
	"boundaries::" scope for "kind" enum.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Verifier checks have recently been strengthened to check that
all counts and probabilities are initialized. The checks fired
during autoprofiledbootstrap build and this patch fixes it.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
	* auto-profile.cc (afdo_calculate_branch_prob): Fix count comparisons
	* tree-vect-loop-manip.cc (vect_do_peeling): Guard against zero count
	when scaling loop profile
For RVV, we have VLS modes enabled according to TARGET_MIN_VLEN
from M1 to M8.

For example, when TARGET_MIN_VLEN = 128 bits, we enable
128/256/512/1024 bits VLS modes.
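
The arithmetic behind that example can be sketched as follows. This is an
illustration of the M1 to M8 scaling only; vls_mode_bits is a hypothetical
helper and does not correspond to any function in the RISC-V backend.

```cpp
#include <vector>

// Hypothetical helper: vector sizes in bits obtained by scaling a minimum
// vector length by LMUL = 1, 2, 4 and 8 (M1 to M8).
std::vector<int> vls_mode_bits(int min_vlen_bits)
{
  std::vector<int> sizes;
  for (int lmul = 1; lmul <= 8; lmul *= 2)
    sizes.push_back(min_vlen_bits * lmul);
  return sizes;
}
```

For a minimum vector length of 128 bits this yields 128, 256, 512 and 1024,
matching the sizes listed above.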

This patch fixes the following FAILs:
FAIL: gcc.dg/vect/bb-slp-subgroups-2.c -flto -ffat-lto-objects  scan-tree-dump-times slp2 "optimized: basic block" 2
FAIL: gcc.dg/vect/bb-slp-subgroups-2.c scan-tree-dump-times slp2 "optimized: basic block" 2

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp: Add 256/512/1024
Refurbish add compare patterns: use 'r' constraint, fix indentation,
and fix pattern to match 'if (a+b) { ... }' constructions.

gcc/

	* config/arc/arc.cc (arc_select_cc_mode): Match NEG code with
	the first operand.
	* config/arc/arc.md (addsi_compare): Make pattern canonical.
	(addsi_compare_2): Fix indentation, constraint letters.
	(addsi_compare_3): Likewise.

gcc/testsuite/

	* gcc.target/arc/add_f-combine.c: New test.

Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>
Here is the reference comparing dump IR between ARM SVE and RVV.

https://godbolt.org/z/zqess8Gss

We can see RVV has one more dump IR:
optimized: basic block part vectorized using 128 byte vectors
since RVV has 1024 bit vectors.

The codegen is reasonably good.

However, I saw that GCN also has 1024 bit vectors.
Might this patch cause this case to FAIL in the GCN port?

Hi, GCN folks, could you check this patch in the GCN port for me?

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/bb-slp-pr65935.c: Add vect1024 variant.
	* lib/target-supports.exp: Ditto.
The following fixes fallout of r10-7145-g1dc00a8ec9aeba which made
us cautious about CSEing a load to an object that has padding bits.
The added check also triggers for BLKmode entities like STRING_CSTs
but by definition a BLKmode entity does not have padding bits.

	PR tree-optimization/111751
	* tree-ssa-sccvn.cc (visit_reference_op_load): Exempt
	BLKmode result from the padding bits check.
Add testcase for PR111751 which has been fixed:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632474.html

	PR target/111751

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/pr111751.c: New test.
…ning

gcc/ada/

	* sem_attr.adb (Analyze_Attribute): Protect the frontend against
	replacing 'Size by its static value if 'Size is not known at
	compile time and we are processing pragmas Compile_Time_Warning or
	Compile_Time_Errors.
The concept of extended nodes was retired at the same time Gen_IL
was introduced, but there was a reference to that concept left over
in a comment. This patch removes that reference.

Also, the description of the field Comes_From_Check_Or_Contract was
incorrectly placed in a section for fields present in all nodes in
sinfo.ads. This patch fixes this.

gcc/ada/

	* atree.ads, nlists.ads, types.ads: Remove references to extended
	nodes. Fix typo.
	* sinfo.ads: Likewise and fix position of
	Comes_From_Check_Or_Contract description.
This patch fixes the behavior of Ada.Directories.Search when being
requested to filter out regular files or directories. One of the
configurations in which that behavior was incorrect was that when the
caller requested only the regular and special files but not the
directories, the directories would still be returned.

gcc/ada/

	* libgnat/a-direct.adb: Fix filesystem entry filtering.
This occurs when one of the types has an incomplete declaration in addition
to its full declaration in its package. In this case AI05-129 says that the
incomplete type is not part of the limited view of the package, i.e. only
the full view is. Now, in the GNAT implementation, it's the opposite in the
regular view of the package, i.e. the incomplete type is the visible one.

That's why the implementation needs to also swap the types on the visibility
chain while it is swapping the views when the clauses are either installed
or removed. This works correctly for the installation, but does not for the
removal, so this change rewrites the code doing the latter.

gcc/ada/
	PR ada/111434
	* sem_ch10.adb (Replace): New procedure to replace an entity with
	another on the homonym chain.
	(Install_Limited_With_Clause): Rename Non_Lim_View to Typ for the
	sake of consistency.  Call Replace to do the replacements and split
	the code into the regular and the special cases.  Add debugging
	output controlled by -gnatdi.
	(Install_With_Clause): Print the Parent_With and Implicit_With flags
	in the debugging output controlled by -gnatdi.
	(Remove_Limited_With_Unit.Restore_Chain_For_Shadow (Shadow)): Rewrite
	using a direct replacement of E4 by E2.   Call Replace to do the
	replacements.  Add debugging output controlled by -gnatdi.
This happens when the conditional expression is immediately returned, for
example in an expression function.

gcc/ada/

	* exp_aggr.adb (Is_Build_In_Place_Aggregate_Return): Return true
	if the aggregate is a dependent expression of a conditional
	expression being returned from a build-in-place function.
It is only called once.

gcc/ada/

	* sem_util.ads (Set_Scope_Is_Transient): Delete.
	* sem_util.adb (Set_Scope_Is_Transient): Likewise.
	* exp_ch7.adb (Create_Transient_Scope): Set Is_Transient directly.
The purpose of this patch is to work around false-positive warnings
emitted by GNAT SAS (also known as CodePeer). It does not change
the behavior of the modified subprogram.

gcc/ada/

	* libgnat/a-direct.adb (Start_Search_Internal): Tweak subprogram
	body.
…component

This is a small bug present on strict-alignment platforms for questionable
representation clauses.

gcc/ada/

	* gcc-interface/decl.cc (inline_status_for_subprog): Minor tweak.
	(gnat_to_gnu_field): Try harder to get a packable form of the type
	for a bitfield.
…etation

The following ups the limit in fold_view_convert_expr to handle
1024bit vectors as used by GCN and RVV.  It also robustifies
the handling in visit_reference_op_load to properly give up when
constants cannot be re-interpreted.
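
The buffer size follows from 1024 bits / 8 bits per byte = 128 bytes. The
sketch below shows the general idea of re-interpreting a value through a
bounded byte buffer and giving up when that is not possible. It is not the
fold_view_convert_expr code; kMaxBytes and reinterpret_bytes are
hypothetical names.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical limit mirroring the 128-byte buffer mentioned above.
constexpr std::size_t kMaxBytes = 1024 / 8;

// Copy the object representation of src into dst through a bounded buffer.
// Returns false (gives up) if the sizes differ or exceed the buffer.
template <typename To, typename From>
bool reinterpret_bytes(const From &src, To &dst)
{
  if (sizeof(From) != sizeof(To) || sizeof(From) > kMaxBytes)
    return false;
  unsigned char buf[kMaxBytes];
  std::memcpy(buf, &src, sizeof(From));  // encode into the byte buffer
  std::memcpy(&dst, buf, sizeof(To));    // decode as the target type
  return true;
}
```

A caller checks the return value and keeps the original form of the value
when the helper reports failure.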

	PR tree-optimization/111751
	* fold-const.cc (fold_view_convert_expr): Up the buffer size
	to 128 bytes.
	* tree-ssa-sccvn.cc (visit_reference_op_load): Special case
	constants, giving up when re-interpretation to the target type
	fails.
Richard's patch resolves PR111751: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=7c76c876e917a1f20a788f602cc78fff7d0a2a65

but it causes ICEs in the RISC-V regression tests:

FAIL: gcc.dg/torture/pr53144.c   -O2  (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328)
FAIL: gcc.dg/torture/pr53144.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/pr53144.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328)
FAIL: gcc.dg/torture/pr53144.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/pr53144.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328)
FAIL: gcc.dg/torture/pr53144.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gcc.dg/torture/pr53144.c   -O3 -g  (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328)
FAIL: gcc.dg/torture/pr53144.c   -O3 -g  (test for excess errors)

A vcond_mask pattern for VLS BOOL modes is needed to fix this ICE regression.

More details: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111751

Tested and Committed.

	PR target/111751

gcc/ChangeLog:

	* config/riscv/autovec.md: Add VLS BOOL modes.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

ChangeLog:

	* MAINTAINERS: Add myself.
This test is testing the fold_extract_last pattern, so it's more reasonable to
use vect_fold_extract_last instead of specifying targets.

This is the vect_fold_extract_last property:
proc check_effective_target_vect_fold_extract_last { } {
    return [expr { [check_effective_target_aarch64_sve]
		   || [istarget amdgcn*-*-*]
		   || [check_effective_target_riscv_v] }]
}

This covers ARM SVE, GCN and RVV.

It matches exactly what we want, and it is more reasonable and easier to maintain.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/pr65947-8.c: Use vect_fold_extract_last.
Like GCN, add -fno-tree-vectorize.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/predcom-2.c: Add riscv.
This patch fixes the following 2 FAILs in RVV regression since the check is not accurate.

It's inspired by Robin's previous patch:
https://patchwork.sourceware.org/project/gcc/patch/dde89b9e-49a0-d70b-0906-fb3022cac11b@gmail.com/

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/no-scevccp-outer-7.c: Adjust regex pattern.
	* gcc.dg/vect/no-scevccp-vect-iv-3.c: Ditto.
tschwinge and others added 29 commits March 10, 2024 23:42
Before the r5-3834 commit for PR63362, GCC 4.8-4.9 refuses to compile
cse.cc which contains a variable with rtx_def type, because rtx_def
contains a union with poly_uint16 element.  poly_int template has
defaulted default constructor and a variadic template constructor which
could have empty parameter pack. GCC < 5 treated it as non-trivially
constructible class and deleted rtunion and rtx_def default constructors.

For the cse_insn purposes, all we need is a variable with size and alignment
of rtx_def, not necessarily rtx_def itself, which we then memset to 0 and
fill in like rtx is normally allocated from heap, so this patch for
GCC_VERSION < 5000 uses an unsigned char array of the right size/alignment.
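
The technique can be sketched in isolation as follows. This is not the
cse.cc change itself: Node, NodeStorage and init_node are hypothetical
stand-ins, and the sketch assumes a simple aggregate type that is valid when
all of its bytes are zero.

```cpp
#include <cstring>

// Hypothetical stand-in for a type whose default constructor an old
// compiler might treat as deleted.
struct Node { int code; void *operand; };

// Raw storage with the size and alignment of Node, without ever declaring
// a variable of type Node.
struct NodeStorage {
  alignas(Node) unsigned char bytes[sizeof(Node)];
};

// Zero the storage and hand it out as a Node pointer, in the same way a
// heap-allocated object would be cleared and then filled in.
Node *init_node(NodeStorage &storage)
{
  std::memset(storage.bytes, 0, sizeof(storage.bytes));
  return reinterpret_cast<Node *>(storage.bytes);
}
```

The caller then fills in the fields through the returned pointer. Because no
Node object is default-constructed, the question of whether the default
constructor is deleted never arises.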

2023-10-18  Jakub Jelinek  <jakub@redhat.com>

	PR bootstrap/111852
	* cse.cc (cse_insn): Add workaround for GCC 4.8-4.9, instead of
	using rtx_def type for memory_extend_buf, use unsigned char
	array with size of rtx_def and its alignment.

(cherry picked from commit bc4bd69)
Reformat the upstream GCC commit f4a2ae2
"Change MODE_BITSIZE to MODE_PRECISION for MODE_VECTOR_BOOL" change to
'gcc/rust/backend/rust-tree.cc' to clang-format's liking.

	gcc/rust/
	* backend/rust-tree.cc (c_common_type_for_mode): Placate clang-format.
@tschwinge tschwinge merged commit 444d1a9 into master Mar 12, 2024