Use subscalar mode to move struct block for parameter
Hi,

As mentioned in the previous version of this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
suboptimal code is generated for "assigning from parameter" and
"assigning to return value".
This patch improves assignment from parameters, as in the cases
below:
/////case1.c
typedef struct SA {double a[3];long l; } A;
A ret_arg (A a) {return a;}
void st_arg (A a, A *p) {*p = a;}

////case2.c
typedef struct SA {double a[3];} A;
A ret_arg (A a) {return a;}
void st_arg (A a, A *p) {*p = a;}
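
To compare the generated code for both layouts in one compile, the two cases could be merged into a single file along the following lines. This is only a sketch: the names A1/A2, the numbered function names and check_copies are hypothetical, introduced here so both structs can live in one translation unit. Compiling it with "gcc -O2 -S" shows the assembly for each copy.

```c
#include <assert.h>

/* case1: three doubles plus a long.  */
typedef struct SA1 { double a[3]; long l; } A1;
/* case2: three doubles only.  */
typedef struct SA2 { double a[3]; } A2;

A1 ret_arg1 (A1 a) { return a; }
void st_arg1 (A1 a, A1 *p) { *p = a; }

A2 ret_arg2 (A2 a) { return a; }
void st_arg2 (A2 a, A2 *p) { *p = a; }

/* Hypothetical helper: aborts if any copy changed a field.  */
void
check_copies (void)
{
  A1 x = { { 1.0, 2.0, 3.0 }, 4 }, px;
  A2 y = { { 5.0, 6.0, 7.0 } }, py;

  A1 rx = ret_arg1 (x);
  A2 ry = ret_arg2 (y);
  st_arg1 (x, &px);
  st_arg2 (y, &py);

  assert (rx.a[2] == 3.0 && rx.l == 4);
  assert (px.a[0] == 1.0 && px.l == 4);
  assert (ry.a[1] == 6.0);
  assert (py.a[2] == 7.0);
}
```

The helper only checks that the copies preserve values; whether the moves go through registers or through memory has to be read from the assembly.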

For this patch, bootstrap and regtest pass on ppc64{,le}
and x86_64.
* Besides asking for a review of this patch, I would like to
ask for comments on improving the "assigning to return value" case.

On some targets (ppc64), for the case below:
////case3.c
typedef struct SA {double a[3]; long l; } A;
A ret_arg_pt (A *a) {return *a;}

The optimized GIMPLE code looks like:
  <retval> = *a_2(D);
  return <retval>;
Here, <retval> (a RESULT_DECL) is a MEM, and "aggregate_value_p"
returns true for <retval>.

* For the case below, however, the generated code is still suboptimal.
////case4.c
typedef struct SA {double a[3];} A;
A ret_arg_pt (A *a) {return *a;}

The optimized GIMPLE code looks like:
  D.3951 = *a_2(D);
  return D.3951;
The "return/assign" stmts use D.3951 (a VAR_DECL) instead of
<retval> (the RESULT_DECL).  The mode of D.3951/<retval> is BLK.
The RTL of D.3951 is a MEM, and the RTL of <retval> is a PARALLEL.
For a PARALLEL, aggregate_value_p returns false.

In function expand_assignment, there is code:
  if (TREE_CODE (to) == RESULT_DECL
      && (REG_P (to_rtx) || GET_CODE (to_rtx) == PARALLEL))
This code can handle "<retval>", but cannot handle "D.3951".

One way I am considering to handle this issue is to update the
GIMPLE sequence to: "<retval> = *a_2(D); return <retval>;"
Another is to collect the VARs that are used by return stmts and,
for assignments to those VARs, use a sub-scalar mode for the block
move.

Thanks for any comments and suggestions!

BR,
Jeff (Jiufu)
Jiufu Guo authored and ouuleilei-bot committed Nov 17, 2022
1 parent 2b2f2ee commit 6596ecf
Showing 1 changed file with 40 additions and 0 deletions.
40 changes: 40 additions & 0 deletions gcc/expr.cc
@@ -6045,6 +6045,46 @@ expand_assignment (tree to, tree from, bool nontemporal)
      return;
    }

  if (TREE_CODE (from) == PARM_DECL && DECL_INCOMING_RTL (from)
      && TYPE_MODE (TREE_TYPE (from)) == BLKmode
      && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL
          || REG_P (DECL_INCOMING_RTL (from))))
    {
      rtx parm = DECL_INCOMING_RTL (from);

      push_temp_slots ();
      machine_mode mode;
      mode = GET_CODE (parm) == PARALLEL
               ? GET_MODE (XEXP (XVECEXP (parm, 0, 0), 0))
               : word_mode;
      int mode_size = GET_MODE_SIZE (mode).to_constant ();
      int size = INTVAL (expr_size (from));

      /* Whether and how the parameter is moved in a submode depends on
         its size and position.  A heuristic limit is used here.  */
      int hurstc_num = 8;
      if (size < mode_size || (size % mode_size) != 0
          || size > (mode_size * hurstc_num))
        result = store_expr (from, to_rtx, 0, nontemporal, false);
      else
        {
          rtx from_rtx
            = expand_expr (from, NULL_RTX, GET_MODE (to_rtx), EXPAND_NORMAL);
          for (int i = 0; i < size / mode_size; i++)
            {
              rtx temp = gen_reg_rtx (mode);
              rtx src = adjust_address (from_rtx, mode, mode_size * i);
              rtx dest = adjust_address (to_rtx, mode, mode_size * i);
              emit_move_insn (temp, src);
              emit_move_insn (dest, temp);
            }
          result = to_rtx;
        }
      preserve_temp_slots (result);
      pop_temp_slots ();
      return;
    }

  /* Compute FROM and store the value in the rtx we got.  */

  push_temp_slots ();
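
The size test in the hunk above can be tried outside the compiler with a small stand-alone model. This is a rough sketch, not GCC code: MAX_PIECES stands in for hurstc_num, piece_size for mode_size, and memcpy replaces both store_expr and the per-piece emit_move_insn pairs. The names use_piecewise_move and move_struct are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for hurstc_num in the patch.  */
enum { MAX_PIECES = 8 };

/* Mirrors the patch's condition: piecewise moves are used only when SIZE
   is a whole number of pieces and needs at most MAX_PIECES of them.  */
bool
use_piecewise_move (size_t size, size_t piece_size)
{
  assert (piece_size > 0);
  return !(size < piece_size
           || size % piece_size != 0
           || size > piece_size * MAX_PIECES);
}

/* Copies SIZE bytes from SRC to DEST.  Returns the number of pieces moved
   one by one, or 0 when it fell back to a single block copy.  */
size_t
move_struct (void *dest, const void *src, size_t size, size_t piece_size)
{
  if (!use_piecewise_move (size, piece_size))
    {
      memcpy (dest, src, size);   /* Models the store_expr fallback.  */
      return 0;
    }

  unsigned char temp[64];         /* Models the temporary pseudo register.  */
  assert (piece_size <= sizeof temp);

  size_t pieces = size / piece_size;
  for (size_t i = 0; i < pieces; i++)
    {
      memcpy (temp, (const unsigned char *) src + i * piece_size, piece_size);
      memcpy ((unsigned char *) dest + i * piece_size, temp, piece_size);
    }
  return pieces;
}
```

With an 8-byte piece, the 24-byte struct from case2 and the 32-byte struct from case1 (on an LP64 target) both take the piecewise path, while anything larger than 64 bytes or not a multiple of 8 falls back to the block copy.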