Skip to content

Commit

Permalink
Fine tune STV register conversion costs for -Os.
Browse files Browse the repository at this point in the history
The eagle-eyed may have spotted that my recent testcases for DImode shifts
on x86_64 included -mno-stv in the dg-options.  This is because the
Scalar-To-Vector (STV) pass currently transforms these shifts to use
SSE vector operations, producing larger code even with -Os.  The issue
is that the compute_convert_gain currently underestimates the size of
instructions required for interunit moves, which is corrected with the
patch below.

For the simple test case:

unsigned long long shl1(unsigned long long x) { return x << 1; }

without this patch, GCC -m32 -Os -mavx2 currently generates:

shl1:   push   %ebp              // 1 byte
        mov    %esp,%ebp         // 2 bytes
        vmovq  0x8(%ebp),%xmm0   // 5 bytes
        pop    %ebp              // 1 byte
        vpaddq %xmm0,%xmm0,%xmm0 // 4 bytes
        vmovd  %xmm0,%eax        // 4 bytes
        vpextrd $0x1,%xmm0,%edx  // 6 bytes
        ret                      // 1 byte  = 24 bytes total

with this patch, we now generate the shorter

shl1:   push   %ebp             // 1 byte
        mov    %esp,%ebp        // 2 bytes
        mov    0x8(%ebp),%eax   // 3 bytes
        mov    0xc(%ebp),%edx   // 3 bytes
        pop    %ebp             // 1 byte
        add    %eax,%eax        // 2 bytes
        adc    %edx,%edx        // 2 bytes
        ret                     // 1 byte  = 15 bytes total

Benchmarking using CSiBE, shows that this patch saves 1361 bytes
when compiling with -m32 -Os, and saves 172 bytes when compiling
with -Os.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?

2023-10-23  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/i386/i386-features.cc (compute_convert_gain): Provide
        more accurate values (sizes) for inter-unit moves with -Os.

Thanks in advance,
Roger
  • Loading branch information
rogersayle authored and ouuleilei-bot committed Oct 23, 2023
1 parent c2d62cd commit f7edc42
Showing 1 changed file with 27 additions and 5 deletions.
32 changes: 27 additions & 5 deletions gcc/config/i386/i386-features.cc
Original file line number Diff line number Diff line change
Expand Up @@ -705,11 +705,33 @@ general_scalar_chain::compute_convert_gain ()
fprintf (dump_file, " Instruction conversion gain: %d\n", gain);

/* Cost the integer to sse and sse to integer moves. */
cost += n_sse_to_integer * ix86_cost->sse_to_integer;
/* ??? integer_to_sse but we only have that in the RA cost table.
Assume sse_to_integer/integer_to_sse are the same which they
are at the moment. */
cost += n_integer_to_sse * ix86_cost->sse_to_integer;
if (!optimize_function_for_size_p (cfun))
{
cost += n_sse_to_integer * ix86_cost->sse_to_integer;
/* ??? integer_to_sse but we only have that in the RA cost table.
Assume sse_to_integer/integer_to_sse are the same which they
are at the moment. */
cost += n_integer_to_sse * ix86_cost->sse_to_integer;
}
else if (TARGET_64BIT || smode == SImode)
{
cost += n_sse_to_integer * COSTS_N_BYTES (4);
cost += n_integer_to_sse * COSTS_N_BYTES (4);
}
else if (TARGET_SSE4_1)
{
/* vmovd (4 bytes) + vpextrd (6 bytes). */
cost += n_sse_to_integer * COSTS_N_BYTES (10);
/* vmovd (4 bytes) + vpinsrd (6 bytes). */
cost += n_integer_to_sse * COSTS_N_BYTES (10);
}
else
{
/* movd (4 bytes) + psrlq (5 bytes) + movd (4 bytes). */
cost += n_sse_to_integer * COSTS_N_BYTES (13);
/* movd (4 bytes) + movd (4 bytes) + unpckldq (4 bytes). */
cost += n_integer_to_sse * COSTS_N_BYTES (12);
}

if (dump_file)
fprintf (dump_file, " Registers conversion cost: %d\n", cost);
Expand Down

0 comments on commit f7edc42

Please sign in to comment.