[X86][Atom] Fix vector fadd/fcmp/fmul resource/throughputs
Match what's documented in the Intel AOM: fadd/fcmp use Port1 and fmul uses Port0, but in many cases BOTH ports are required. This was being incorrectly modelled as EITHER port.

Discovered while investigating the correct fptoui costs to fix the regressions in D101555.

Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
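
To illustrate the EITHER/BOTH distinction, here is a minimal Python sketch of a simplified throughput model. It is not llvm-mca's implementation. It assumes that a resource group such as `AtomPort01` spreads its cycles evenly over its member ports, while a list such as `[AtomPort0, AtomPort1]` holds every listed port for its own cycle count. The names `rthroughput` and `GROUPS` are hypothetical.

```python
from fractions import Fraction

# Assumption: a group means "any ONE member port", so its cycles are shared
# across the members. Listing ports separately means ALL of them are held.
GROUPS = {"AtomPort01": ["AtomPort0", "AtomPort1"]}


def rthroughput(ports, cycles, groups=GROUPS):
    """Reciprocal throughput: the busiest resource bounds the issue rate."""
    worst = Fraction(0)
    for name, held in zip(ports, cycles):
        units = len(groups.get(name, [name]))  # a plain port counts as 1 unit
        worst = max(worst, Fraction(held, units))
    return worst


# EITHER port: 9 cycles on the two-port group.
either = rthroughput(["AtomPort01"], [9])
# BOTH ports: 5 cycles on each port, as in the new WriteFCmpX register form.
both = rthroughput(["AtomPort0", "AtomPort1"], [5, 5])

print(float(either), float(both))  # 4.5 5.0
```

Under these assumptions the EITHER form of a 9-cycle op gives 4.5, in line with the 4.50 reported for `comiss` in the test output, while the BOTH form with `[5,5]` gives 5.0, in line with the updated `cmpeqps` row.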
RKSimon committed May 20, 2021
1 parent 68d5235 commit a26288e
Showing 5 changed files with 185 additions and 184 deletions.
25 changes: 13 additions & 12 deletions llvm/lib/Target/X86/X86ScheduleAtom.td
@@ -37,6 +37,7 @@ def AtomPort0 : ProcResource<1>; // ALU: ALU0, shift/rotate, load/store
 def AtomPort1 : ProcResource<1>; // ALU: ALU1, bit processing, jump, and LEA
 // SIMD/FP: SIMD ALU, FP Adder

+// NOTE: This is for ops that can use EITHER port, not for ops that require BOTH ports.
 def AtomPort01 : ProcResGroup<[AtomPort0, AtomPort1]>;

 // Loads are 3 cycles, so ReadAfterLd registers needn't be available until 3
@@ -223,30 +224,30 @@ defm : X86WriteResUnsupported<WriteFMoveY>;

 defm : X86WriteRes<WriteEMMS, [AtomPort01], 5, [5], 1>;

-defm : AtomWriteResPair<WriteFAdd, [AtomPort0], [AtomPort0], 5, 5, [5], [5]>;
-defm : AtomWriteResPair<WriteFAddX, [AtomPort0], [AtomPort0], 5, 5, [5], [5]>;
+defm : AtomWriteResPair<WriteFAdd, [AtomPort1], [AtomPort0,AtomPort1], 5, 5, [1], [1,1]>;
+defm : AtomWriteResPair<WriteFAddX, [AtomPort1], [AtomPort0,AtomPort1], 5, 5, [1], [1,1]>;
 defm : X86WriteResPairUnsupported<WriteFAddY>;
 defm : X86WriteResPairUnsupported<WriteFAddZ>;
-defm : AtomWriteResPair<WriteFAdd64, [AtomPort0], [AtomPort0], 5, 5, [5], [5]>;
-defm : AtomWriteResPair<WriteFAdd64X, [AtomPort01], [AtomPort01], 6, 7, [6], [7]>;
+defm : AtomWriteResPair<WriteFAdd64, [AtomPort1], [AtomPort0,AtomPort1], 5, 5, [1], [1,1]>;
+defm : AtomWriteResPair<WriteFAdd64X, [AtomPort0,AtomPort1], [AtomPort0,AtomPort1], 6, 7, [5,5], [6,6]>;
 defm : X86WriteResPairUnsupported<WriteFAdd64Y>;
 defm : X86WriteResPairUnsupported<WriteFAdd64Z>;
-defm : AtomWriteResPair<WriteFCmp, [AtomPort0], [AtomPort0], 5, 5, [5], [5]>;
-defm : AtomWriteResPair<WriteFCmpX, [AtomPort0], [AtomPort0], 5, 5, [5], [5]>;
+defm : AtomWriteResPair<WriteFCmp, [AtomPort1], [AtomPort0,AtomPort1], 5, 5, [1], [1,1]>;
+defm : AtomWriteResPair<WriteFCmpX, [AtomPort0,AtomPort1], [AtomPort0,AtomPort1], 6, 7, [5,5], [6,6]>;
 defm : X86WriteResPairUnsupported<WriteFCmpY>;
 defm : X86WriteResPairUnsupported<WriteFCmpZ>;
-defm : AtomWriteResPair<WriteFCmp64, [AtomPort0], [AtomPort0], 5, 5, [5], [5]>;
-defm : AtomWriteResPair<WriteFCmp64X, [AtomPort01], [AtomPort01], 6, 7, [6], [7]>;
+defm : AtomWriteResPair<WriteFCmp64, [AtomPort1], [AtomPort0,AtomPort1], 5, 5, [1], [1,1]>;
+defm : AtomWriteResPair<WriteFCmp64X, [AtomPort0,AtomPort1], [AtomPort0,AtomPort1], 6, 7, [5,5], [6,6]>;
 defm : X86WriteResPairUnsupported<WriteFCmp64Y>;
 defm : X86WriteResPairUnsupported<WriteFCmp64Z>;
 defm : AtomWriteResPair<WriteFCom, [AtomPort0], [AtomPort0], 5, 5, [5], [5]>;
 defm : AtomWriteResPair<WriteFComX, [AtomPort0], [AtomPort0], 5, 5, [5], [5]>;
-defm : AtomWriteResPair<WriteFMul, [AtomPort0], [AtomPort0], 4, 4, [4], [4]>;
-defm : AtomWriteResPair<WriteFMulX, [AtomPort0], [AtomPort0], 5, 5, [5], [5]>;
+defm : AtomWriteResPair<WriteFMul, [AtomPort0], [AtomPort0], 4, 4, [2], [2]>;
+defm : AtomWriteResPair<WriteFMulX, [AtomPort0], [AtomPort0], 5, 5, [2], [2]>;
 defm : X86WriteResPairUnsupported<WriteFMulY>;
 defm : X86WriteResPairUnsupported<WriteFMulZ>;
-defm : AtomWriteResPair<WriteFMul64, [AtomPort0], [AtomPort0], 5, 5, [5], [5]>;
-defm : AtomWriteResPair<WriteFMul64X, [AtomPort01], [AtomPort01], 9, 10, [9], [10]>;
+defm : AtomWriteResPair<WriteFMul64, [AtomPort0], [AtomPort0], 5, 5, [2], [2]>;
+defm : AtomWriteResPair<WriteFMul64X, [AtomPort0,AtomPort1], [AtomPort0,AtomPort1], 9, 10, [9,9], [10,10]>;
 defm : X86WriteResPairUnsupported<WriteFMul64Y>;
 defm : X86WriteResPairUnsupported<WriteFMul64Z>;
 defm : AtomWriteResPair<WriteFRcp, [AtomPort0], [AtomPort0], 4, 4, [4], [4]>;
98 changes: 49 additions & 49 deletions llvm/test/tools/llvm-mca/X86/Atom/resources-sse1.s
@@ -194,18 +194,18 @@ xorps (%rax), %xmm2
 # CHECK-NEXT: [6]: HasSideEffects (U)

 # CHECK: [1] [2] [3] [4] [5] [6] Instructions:
-# CHECK-NEXT: 1 5 5.00 addps %xmm0, %xmm2
-# CHECK-NEXT: 1 5 5.00 * addps (%rax), %xmm2
-# CHECK-NEXT: 1 5 5.00 addss %xmm0, %xmm2
-# CHECK-NEXT: 1 5 5.00 * addss (%rax), %xmm2
+# CHECK-NEXT: 1 5 1.00 addps %xmm0, %xmm2
+# CHECK-NEXT: 1 5 1.00 * addps (%rax), %xmm2
+# CHECK-NEXT: 1 5 1.00 addss %xmm0, %xmm2
+# CHECK-NEXT: 1 5 1.00 * addss (%rax), %xmm2
 # CHECK-NEXT: 1 1 0.50 andnps %xmm0, %xmm2
 # CHECK-NEXT: 1 1 1.00 * andnps (%rax), %xmm2
 # CHECK-NEXT: 1 1 0.50 andps %xmm0, %xmm2
 # CHECK-NEXT: 1 1 1.00 * andps (%rax), %xmm2
-# CHECK-NEXT: 1 5 5.00 cmpeqps %xmm0, %xmm2
-# CHECK-NEXT: 1 5 5.00 * cmpeqps (%rax), %xmm2
-# CHECK-NEXT: 1 5 5.00 cmpeqss %xmm0, %xmm2
-# CHECK-NEXT: 1 5 5.00 * cmpeqss (%rax), %xmm2
+# CHECK-NEXT: 1 6 5.00 cmpeqps %xmm0, %xmm2
+# CHECK-NEXT: 1 7 6.00 * cmpeqps (%rax), %xmm2
+# CHECK-NEXT: 1 5 1.00 cmpeqss %xmm0, %xmm2
+# CHECK-NEXT: 1 5 1.00 * cmpeqss (%rax), %xmm2
 # CHECK-NEXT: 1 9 4.50 comiss %xmm0, %xmm1
 # CHECK-NEXT: 1 10 5.00 * comiss (%rax), %xmm1
 # CHECK-NEXT: 1 5 5.00 cvtpi2ps %mm0, %xmm2
@@ -232,14 +232,14 @@ xorps (%rax), %xmm2
 # CHECK-NEXT: 1 34 17.00 * divss (%rax), %xmm2
 # CHECK-NEXT: 1 5 2.50 * * U ldmxcsr (%rax)
 # CHECK-NEXT: 1 1 1.00 * * U maskmovq %mm0, %mm1
-# CHECK-NEXT: 1 5 5.00 maxps %xmm0, %xmm2
-# CHECK-NEXT: 1 5 5.00 * maxps (%rax), %xmm2
-# CHECK-NEXT: 1 5 5.00 maxss %xmm0, %xmm2
-# CHECK-NEXT: 1 5 5.00 * maxss (%rax), %xmm2
-# CHECK-NEXT: 1 5 5.00 minps %xmm0, %xmm2
-# CHECK-NEXT: 1 5 5.00 * minps (%rax), %xmm2
-# CHECK-NEXT: 1 5 5.00 minss %xmm0, %xmm2
-# CHECK-NEXT: 1 5 5.00 * minss (%rax), %xmm2
+# CHECK-NEXT: 1 6 5.00 maxps %xmm0, %xmm2
+# CHECK-NEXT: 1 7 6.00 * maxps (%rax), %xmm2
+# CHECK-NEXT: 1 5 1.00 maxss %xmm0, %xmm2
+# CHECK-NEXT: 1 5 1.00 * maxss (%rax), %xmm2
+# CHECK-NEXT: 1 6 5.00 minps %xmm0, %xmm2
+# CHECK-NEXT: 1 7 6.00 * minps (%rax), %xmm2
+# CHECK-NEXT: 1 5 1.00 minss %xmm0, %xmm2
+# CHECK-NEXT: 1 5 1.00 * minss (%rax), %xmm2
 # CHECK-NEXT: 1 1 0.50 movaps %xmm0, %xmm2
 # CHECK-NEXT: 1 1 1.00 * movaps %xmm0, (%rax)
 # CHECK-NEXT: 1 1 1.00 * movaps (%rax), %xmm2
@@ -258,10 +258,10 @@ xorps (%rax), %xmm2
 # CHECK-NEXT: 1 1 0.50 movups %xmm0, %xmm2
 # CHECK-NEXT: 1 2 1.00 * movups %xmm0, (%rax)
 # CHECK-NEXT: 1 3 1.50 * movups (%rax), %xmm2
-# CHECK-NEXT: 1 5 5.00 mulps %xmm0, %xmm2
-# CHECK-NEXT: 1 5 5.00 * mulps (%rax), %xmm2
-# CHECK-NEXT: 1 4 4.00 mulss %xmm0, %xmm2
-# CHECK-NEXT: 1 4 4.00 * mulss (%rax), %xmm2
+# CHECK-NEXT: 1 5 2.00 mulps %xmm0, %xmm2
+# CHECK-NEXT: 1 5 2.00 * mulps (%rax), %xmm2
+# CHECK-NEXT: 1 4 2.00 mulss %xmm0, %xmm2
+# CHECK-NEXT: 1 4 2.00 * mulss (%rax), %xmm2
 # CHECK-NEXT: 1 1 0.50 orps %xmm0, %xmm2
 # CHECK-NEXT: 1 1 1.00 * orps (%rax), %xmm2
 # CHECK-NEXT: 1 1 0.50 pavgb %mm0, %mm2
@@ -306,10 +306,10 @@ xorps (%rax), %xmm2
 # CHECK-NEXT: 1 34 17.00 sqrtss %xmm0, %xmm2
 # CHECK-NEXT: 1 34 17.00 * sqrtss (%rax), %xmm2
 # CHECK-NEXT: 1 15 7.50 * U stmxcsr (%rax)
-# CHECK-NEXT: 1 5 5.00 subps %xmm0, %xmm2
-# CHECK-NEXT: 1 5 5.00 * subps (%rax), %xmm2
-# CHECK-NEXT: 1 5 5.00 subss %xmm0, %xmm2
-# CHECK-NEXT: 1 5 5.00 * subss (%rax), %xmm2
+# CHECK-NEXT: 1 5 1.00 subps %xmm0, %xmm2
+# CHECK-NEXT: 1 5 1.00 * subps (%rax), %xmm2
+# CHECK-NEXT: 1 5 1.00 subss %xmm0, %xmm2
+# CHECK-NEXT: 1 5 1.00 * subss (%rax), %xmm2
 # CHECK-NEXT: 1 9 4.50 ucomiss %xmm0, %xmm1
 # CHECK-NEXT: 1 10 5.00 * ucomiss (%rax), %xmm1
 # CHECK-NEXT: 1 1 1.00 unpckhps %xmm0, %xmm2
@@ -325,22 +325,22 @@ xorps (%rax), %xmm2

 # CHECK: Resource pressure per iteration:
 # CHECK-NEXT: [0] [1]
-# CHECK-NEXT: 508.00 346.00
+# CHECK-NEXT: 438.00 393.00

 # CHECK: Resource pressure by instruction:
 # CHECK-NEXT: [0] [1] Instructions:
-# CHECK-NEXT: 5.00 - addps %xmm0, %xmm2
-# CHECK-NEXT: 5.00 - addps (%rax), %xmm2
-# CHECK-NEXT: 5.00 - addss %xmm0, %xmm2
-# CHECK-NEXT: 5.00 - addss (%rax), %xmm2
+# CHECK-NEXT: - 1.00 addps %xmm0, %xmm2
+# CHECK-NEXT: 1.00 1.00 addps (%rax), %xmm2
+# CHECK-NEXT: - 1.00 addss %xmm0, %xmm2
+# CHECK-NEXT: 1.00 1.00 addss (%rax), %xmm2
 # CHECK-NEXT: 0.50 0.50 andnps %xmm0, %xmm2
 # CHECK-NEXT: 1.00 - andnps (%rax), %xmm2
 # CHECK-NEXT: 0.50 0.50 andps %xmm0, %xmm2
 # CHECK-NEXT: 1.00 - andps (%rax), %xmm2
-# CHECK-NEXT: 5.00 - cmpeqps %xmm0, %xmm2
-# CHECK-NEXT: 5.00 - cmpeqps (%rax), %xmm2
-# CHECK-NEXT: 5.00 - cmpeqss %xmm0, %xmm2
-# CHECK-NEXT: 5.00 - cmpeqss (%rax), %xmm2
+# CHECK-NEXT: 5.00 5.00 cmpeqps %xmm0, %xmm2
+# CHECK-NEXT: 6.00 6.00 cmpeqps (%rax), %xmm2
+# CHECK-NEXT: - 1.00 cmpeqss %xmm0, %xmm2
+# CHECK-NEXT: 1.00 1.00 cmpeqss (%rax), %xmm2
 # CHECK-NEXT: 4.50 4.50 comiss %xmm0, %xmm1
 # CHECK-NEXT: 5.00 5.00 comiss (%rax), %xmm1
 # CHECK-NEXT: - 5.00 cvtpi2ps %mm0, %xmm2
@@ -367,14 +367,14 @@ xorps (%rax), %xmm2
 # CHECK-NEXT: 17.00 17.00 divss (%rax), %xmm2
 # CHECK-NEXT: 2.50 2.50 ldmxcsr (%rax)
 # CHECK-NEXT: 1.00 - maskmovq %mm0, %mm1
-# CHECK-NEXT: 5.00 - maxps %xmm0, %xmm2
-# CHECK-NEXT: 5.00 - maxps (%rax), %xmm2
-# CHECK-NEXT: 5.00 - maxss %xmm0, %xmm2
-# CHECK-NEXT: 5.00 - maxss (%rax), %xmm2
-# CHECK-NEXT: 5.00 - minps %xmm0, %xmm2
-# CHECK-NEXT: 5.00 - minps (%rax), %xmm2
-# CHECK-NEXT: 5.00 - minss %xmm0, %xmm2
-# CHECK-NEXT: 5.00 - minss (%rax), %xmm2
+# CHECK-NEXT: 5.00 5.00 maxps %xmm0, %xmm2
+# CHECK-NEXT: 6.00 6.00 maxps (%rax), %xmm2
+# CHECK-NEXT: - 1.00 maxss %xmm0, %xmm2
+# CHECK-NEXT: 1.00 1.00 maxss (%rax), %xmm2
+# CHECK-NEXT: 5.00 5.00 minps %xmm0, %xmm2
+# CHECK-NEXT: 6.00 6.00 minps (%rax), %xmm2
+# CHECK-NEXT: - 1.00 minss %xmm0, %xmm2
+# CHECK-NEXT: 1.00 1.00 minss (%rax), %xmm2
 # CHECK-NEXT: 0.50 0.50 movaps %xmm0, %xmm2
 # CHECK-NEXT: 1.00 - movaps %xmm0, (%rax)
 # CHECK-NEXT: 1.00 - movaps (%rax), %xmm2
@@ -393,10 +393,10 @@ xorps (%rax), %xmm2
 # CHECK-NEXT: 0.50 0.50 movups %xmm0, %xmm2
 # CHECK-NEXT: 1.00 1.00 movups %xmm0, (%rax)
 # CHECK-NEXT: 1.50 1.50 movups (%rax), %xmm2
-# CHECK-NEXT: 5.00 - mulps %xmm0, %xmm2
-# CHECK-NEXT: 5.00 - mulps (%rax), %xmm2
-# CHECK-NEXT: 4.00 - mulss %xmm0, %xmm2
-# CHECK-NEXT: 4.00 - mulss (%rax), %xmm2
+# CHECK-NEXT: 2.00 - mulps %xmm0, %xmm2
+# CHECK-NEXT: 2.00 - mulps (%rax), %xmm2
+# CHECK-NEXT: 2.00 - mulss %xmm0, %xmm2
+# CHECK-NEXT: 2.00 - mulss (%rax), %xmm2
 # CHECK-NEXT: 0.50 0.50 orps %xmm0, %xmm2
 # CHECK-NEXT: 1.00 - orps (%rax), %xmm2
 # CHECK-NEXT: 0.50 0.50 pavgb %mm0, %mm2
@@ -441,10 +441,10 @@ xorps (%rax), %xmm2
 # CHECK-NEXT: 17.00 17.00 sqrtss %xmm0, %xmm2
 # CHECK-NEXT: 17.00 17.00 sqrtss (%rax), %xmm2
 # CHECK-NEXT: 7.50 7.50 stmxcsr (%rax)
-# CHECK-NEXT: 5.00 - subps %xmm0, %xmm2
-# CHECK-NEXT: 5.00 - subps (%rax), %xmm2
-# CHECK-NEXT: 5.00 - subss %xmm0, %xmm2
-# CHECK-NEXT: 5.00 - subss (%rax), %xmm2
+# CHECK-NEXT: - 1.00 subps %xmm0, %xmm2
+# CHECK-NEXT: 1.00 1.00 subps (%rax), %xmm2
+# CHECK-NEXT: - 1.00 subss %xmm0, %xmm2
+# CHECK-NEXT: 1.00 1.00 subss (%rax), %xmm2
 # CHECK-NEXT: 4.50 4.50 ucomiss %xmm0, %xmm1
 # CHECK-NEXT: 5.00 5.00 ucomiss (%rax), %xmm1
 # CHECK-NEXT: 1.00 - unpckhps %xmm0, %xmm2
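
As a sanity check on the updated "Resource pressure per iteration" totals in resources-sse1.s, the per-instruction rows changed in this file can be summed and compared against the old and new totals. The sketch below is only a cross-check: the values are transcribed by hand from the CHECK lines in this diff, `-` is read as 0, and the names `OLD`, `NEW` and `FORMS` are illustrative.

```python
# Per-instruction (Port0, Port1) pressure, keyed by (mnemonic, operand form).
# "rr" is the register form, "rm" the form with a (%rax) memory operand.
OLD, NEW = {}, {}
FORMS = ("rr", "rm")

# Ops that moved from 5 cycles on Port0 to 1 cycle on Port1 (plus Port0 for the load).
for op in ("addps", "addss", "subps", "subss", "cmpeqss", "maxss", "minss"):
    for form, new in zip(FORMS, ((0, 1), (1, 1))):
        OLD[op, form] = (5, 0)
        NEW[op, form] = new

# Packed compares/min/max now hold BOTH ports for 5 (rr) or 6 (rm) cycles.
for op in ("cmpeqps", "maxps", "minps"):
    for form, new in zip(FORMS, ((5, 5), (6, 6))):
        OLD[op, form] = (5, 0)
        NEW[op, form] = new

# Multiplies stay on Port0 but drop to 2 cycles.
for op, old_cycles in (("mulps", 5), ("mulss", 4)):
    for form in FORMS:
        OLD[op, form] = (old_cycles, 0)
        NEW[op, form] = (2, 0)

old_total = (508, 346)  # totals before the change, from the removed CHECK line
delta = tuple(
    sum(NEW[k][port] - OLD[k][port] for k in OLD) for port in (0, 1)
)
new_total = (old_total[0] + delta[0], old_total[1] + delta[1])

print(delta, new_total)  # (-70, 47) (438, 393)
```

The result agrees with the new totals of 438.00 and 393.00: Port0 loses 70 cycles of pressure and Port1 gains 47.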