[tblgen][llvm-mca] Add the ability to describe move elimination candidates via tablegen.

This patch adds the ability to identify instructions that are "move elimination
candidates". It also allows scheduling models to describe processor register
files that allow move elimination.

A move elimination candidate is an instruction that can be eliminated at the
register renaming stage.
Each subtarget can specify which instructions are move elimination candidates
with the help of tablegen class "IsOptimizableRegisterMove" (see
llvm/Target/TargetInstrPredicate.td).

For example, on X86, BtVer2 allows both GPR and MMX/SSE moves to be eliminated.
The definition of 'IsOptimizableRegisterMove' for BtVer2 looks like this:

```
def : IsOptimizableRegisterMove<[
  InstructionEquivalenceClass<[
    // GPR variants.
    MOV32rr, MOV64rr,

    // MMX variants.
    MMX_MOVQ64rr,

    // SSE variants.
    MOVAPSrr, MOVUPSrr,
    MOVAPDrr, MOVUPDrr,
    MOVDQArr, MOVDQUrr,

    // AVX variants.
    VMOVAPSrr, VMOVUPSrr,
    VMOVAPDrr, VMOVUPDrr,
    VMOVDQArr, VMOVDQUrr
  ], CheckNot<CheckSameRegOperand<0, 1>> >
]>;
```

Definitions of IsOptimizableRegisterMove from the processor models of the same
target are processed by the SubtargetEmitter to auto-generate a target-specific
override for each of the following predicate methods:

```
bool TargetSubtargetInfo::isOptimizableRegisterMove(const MachineInstr *MI) const;
bool MCInstrAnalysis::isOptimizableRegisterMove(const MCInst &MI, unsigned CPUID) const;
```

By default, those methods return false (i.e. conservatively assume that there
are no move elimination candidates).
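
The relationship between the conservative default and a generated override can be sketched in plain C++. This is only an illustration: `Opcode`, `Inst`, `InstrAnalysisBase` and `BtVer2LikeAnalysis` are hypothetical stand-ins, not LLVM types, and the real overrides are emitted by the SubtargetEmitter. The override below mirrors the example predicate shown earlier (opcode in the equivalence class, operands 0 and 1 name different registers).

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not LLVM types.
enum class Opcode { MOV64rr, MOVAPSrr, ADD64rr };

struct Inst {
  Opcode Op;
  std::vector<unsigned> Regs; // Register operands; index 0 is the destination.
};

// Base class: conservatively assumes there are no move elimination candidates.
struct InstrAnalysisBase {
  virtual ~InstrAnalysisBase() = default;
  virtual bool isOptimizableRegisterMove(const Inst & /*I*/) const {
    return false;
  }
};

// Subtarget-specific override: a register move whose source and destination
// differ is reported as a candidate.
struct BtVer2LikeAnalysis : InstrAnalysisBase {
  bool isOptimizableRegisterMove(const Inst &I) const override {
    bool IsMove = I.Op == Opcode::MOV64rr || I.Op == Opcode::MOVAPSrr;
    if (!IsMove)
      return false;
    assert(I.Regs.size() >= 2 && "a register move needs two operands");
    return I.Regs[0] != I.Regs[1];
  }
};
```

With these definitions, the base class rejects every instruction, while the override accepts a move between two distinct registers and rejects both a self-move and a non-move instruction.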

Tablegen class RegisterFile has been extended with the following information:
 - The set of register classes that allow move elimination.
 - The maximum number of moves that can be eliminated every cycle.
 - Whether move elimination is restricted to moves from registers that are
   known to be zero.
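
As a rough picture of the data this adds, the following C++ sketch uses simplified mirrors of the register file descriptors touched by this patch. The field names `RegisterClassID`, `Cost`, `AllowMoveElimination`, `MaxMovesEliminatedPerCycle` and `AllowZeroMoveEliminationOnly` come from the patch; the struct layout, the class ID values, and the helpers `makeJFpuLikeFile` and `classAllowsMoveElimination` are assumptions made for illustration. The values follow the BtVer2 `JFpuPRF` definition in this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified mirrors of the MC descriptors; the layout is an assumption.
struct RegisterCostEntry {
  unsigned RegisterClassID;
  unsigned Cost;
  bool AllowMoveElimination;
};

struct RegisterFileDesc {
  uint16_t NumPhysRegs;
  std::vector<RegisterCostEntry> Costs;
  uint16_t MaxMovesEliminatedPerCycle; // Zero means "no limit".
  bool AllowZeroMoveEliminationOnly;
};

// Hypothetical register class IDs, chosen only for this example.
enum : unsigned { VR64 = 0, VR128 = 1, VR256 = 2 };

// Values taken from the JFpuPRF definition: 72 registers, costs [1, 1, 2],
// move elimination allowed for VR64 and VR128 only, no per-cycle limit,
// and elimination restricted to moves from known-zero registers.
inline RegisterFileDesc makeJFpuLikeFile() {
  return {72, {{VR64, 1, true}, {VR128, 1, true}, {VR256, 2, false}}, 0, true};
}

// Returns true if the given register class allows move elimination.
inline bool classAllowsMoveElimination(const RegisterFileDesc &RF,
                                       unsigned ClassID) {
  assert(ClassID <= VR256 && "unknown register class ID");
  for (const RegisterCostEntry &E : RF.Costs)
    if (E.RegisterClassID == ClassID)
      return E.AllowMoveElimination;
  return false;
}
```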

This patch is structured in three parts:

A first part (which is mostly boilerplate) adds the new
'isOptimizableRegisterMove' target hooks, and extends existing register file
descriptors in MC by introducing new fields to describe properties related to
move elimination.

A second part uses the new tablegen constructs to describe move elimination in
the BtVer2 scheduling model.

A third part teaches llvm-mca how to query the new 'isOptimizableRegisterMove'
hook to mark instructions that are candidates for move elimination. It also
teaches class RegisterFile how to describe constraints on move elimination at
PRF granularity.
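
The constraints described above (candidate instruction, register class that allows elimination, optional zero-only restriction, optional per-cycle limit) could be combined roughly as follows. This is a minimal sketch under stated assumptions, not the llvm-mca implementation: `MoveElimPolicy` and `MoveEliminator` are hypothetical names, and the caller is assumed to already know whether the instruction is a candidate and whether the source register is known to be zero.

```cpp
#include <cassert>

// Hypothetical per-register-file policy; field meanings follow the patch.
struct MoveElimPolicy {
  bool ClassAllowsMoveElimination;     // Property of the destination class.
  unsigned MaxMovesEliminatedPerCycle; // Zero means "no limit".
  bool AllowZeroMoveEliminationOnly;   // Only moves from known-zero registers.
};

class MoveEliminator {
public:
  explicit MoveEliminator(MoveElimPolicy P) : Policy(P) {}

  // Resets the per-cycle budget; call once at the start of every cycle.
  void beginCycle() { EliminatedThisCycle = 0; }

  // Returns true if the move is eliminated under the current policy.
  bool tryEliminate(bool IsCandidate, bool SourceKnownZero) {
    if (!IsCandidate || !Policy.ClassAllowsMoveElimination)
      return false;
    if (Policy.AllowZeroMoveEliminationOnly && !SourceKnownZero)
      return false;
    const unsigned Max = Policy.MaxMovesEliminatedPerCycle;
    if (Max != 0 && EliminatedThisCycle >= Max)
      return false;
    ++EliminatedThisCycle;
    assert((Max == 0 || EliminatedThisCycle <= Max) && "budget exceeded");
    return true;
  }

private:
  MoveElimPolicy Policy;
  unsigned EliminatedThisCycle = 0;
};
```

With a zero-only policy, a candidate move from a known-zero register is eliminated while the same move from an arbitrary register is not. With a limit of one move per cycle, a second candidate in the same cycle is rejected until `beginCycle()` is called again.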

llvm-mca tests for btver2 show differences before/after this patch.

Differential Revision: https://reviews.llvm.org/D53134

llvm-svn: 344334
Andrea Di Biagio authored and Andrea Di Biagio committed Oct 12, 2018
1 parent e02d09d commit 6eebbe0
Showing 16 changed files with 315 additions and 195 deletions.
13 changes: 13 additions & 0 deletions llvm/include/llvm/CodeGen/TargetSubtargetInfo.h
@@ -169,6 +169,19 @@ class TargetSubtargetInfo : public MCSubtargetInfo {
return isZeroIdiom(MI, Mask);
}

/// Returns true if MI is a candidate for move elimination.
///
/// A candidate for move elimination may be optimized out at register renaming
/// stage. Subtargets can specify the set of optimizable moves by
/// instantiating tablegen class `IsOptimizableRegisterMove` (see
/// llvm/Target/TargetInstrPredicate.td).
///
/// SubtargetEmitter is responsible for processing all the definitions of class
/// IsOptimizableRegisterMove, and auto-generate an override for this method.
virtual bool isOptimizableRegisterMove(const MachineInstr *MI) const {
return false;
}

/// True if the subtarget should run MachineScheduler after aggressive
/// coalescing.
///
11 changes: 11 additions & 0 deletions llvm/include/llvm/MC/MCInstrAnalysis.h
@@ -136,6 +136,17 @@ class MCInstrAnalysis {
return isZeroIdiom(MI, Mask, CPUID);
}

/// Returns true if MI is a candidate for move elimination.
///
/// Different subtargets may apply different constraints to optimizable
/// register moves. For example, on most X86 subtargets, a candidate for move
/// elimination cannot specify the same register for both source and
/// destination.
virtual bool isOptimizableRegisterMove(const MCInst &MI,
unsigned CPUID) const {
return false;
}

/// Given a branch instruction try to get the address the branch
/// targets. Return true on success, and the address in Target.
virtual bool
7 changes: 7 additions & 0 deletions llvm/include/llvm/MC/MCSchedule.h
@@ -142,6 +142,7 @@ struct MCSchedClassDesc {
struct MCRegisterCostEntry {
unsigned RegisterClassID;
unsigned Cost;
bool AllowMoveElimination;
};

/// A register file descriptor.
@@ -159,6 +160,12 @@ struct MCRegisterFileDesc {
uint16_t NumRegisterCostEntries;
// Index of the first cost entry in MCExtraProcessorInfo::RegisterCostTable.
uint16_t RegisterCostEntryIdx;
// A value of zero means: there is no limit in the number of moves that can be
// eliminated every cycle.
uint16_t MaxMovesEliminatedPerCycle;
// True if this register file only knows how to optimize register moves from
// known zero registers.
bool AllowZeroMoveEliminationOnly;
};

/// Provide extra details about the machine processor.
8 changes: 7 additions & 1 deletion llvm/include/llvm/Target/TargetInstrPredicate.td
@@ -313,7 +313,7 @@ class STIPredicate<STIPredicateDecl declaration,
}

// Convenience classes and definitions used by processor scheduling models to
// describe dependency breaking instructions.
// describe dependency breaking instructions and move elimination candidates.
let UpdatesOpcodeMask = 1 in {

def IsZeroIdiomDecl : STIPredicateDecl<"isZeroIdiom">;
@@ -323,8 +323,14 @@ def IsDepBreakingDecl : STIPredicateDecl<"isDependencyBreaking">;

} // UpdatesOpcodeMask

def IsOptimizableRegisterMoveDecl
: STIPredicateDecl<"isOptimizableRegisterMove">;

class IsZeroIdiomFunction<list<DepBreakingClass> classes>
: STIPredicate<IsZeroIdiomDecl, classes>;

class IsDepBreakingFunction<list<DepBreakingClass> classes>
: STIPredicate<IsDepBreakingDecl, classes>;

class IsOptimizableRegisterMove<list<InstructionEquivalenceClass> classes>
: STIPredicate<IsOptimizableRegisterMoveDecl, classes>;
30 changes: 29 additions & 1 deletion llvm/include/llvm/Target/TargetSchedule.td
@@ -460,6 +460,10 @@ class SchedAlias<SchedReadWrite match, SchedReadWrite alias> {
// - The number of physical registers which can be used for register renaming
// purpose.
// - The cost of a register rename.
// - The set of registers that allow move elimination.
// - The maximum number of moves that can be eliminated every cycle.
// - Whether move elimination is limited to register moves whose input
// is known to be zero.
//
// The cost of a rename is the number of physical registers allocated by the
// register alias table to map the new definition. By default, register can be
@@ -506,11 +510,35 @@ class SchedAlias<SchedReadWrite match, SchedReadWrite alias> {
// partial write is combined with the previous super-register definition. We
// should add support for these cases, and correctly model merge problems with
// partial register accesses.
//
// Field MaxMovesEliminatedPerCycle specifies how many moves can be eliminated
// every cycle. A default value of zero for that field means: there is no limit
// to the number of moves that can be eliminated by this register file.
//
// An instruction MI is a candidate for move elimination if a call to
// method TargetSubtargetInfo::isOptimizableRegisterMove(MI) returns true (see
// llvm/CodeGen/TargetSubtargetInfo.h, and llvm/MC/MCInstrAnalysis.h).
//
// Subtargets can instantiate tablegen class IsOptimizableRegisterMove (see
// llvm/Target/TargetInstrPredicate.td) to customize the set of move elimination
// candidates. By default, no instruction is a valid move elimination candidate.
//
// A register move MI is eliminated only if:
// - MI is a move elimination candidate.
// - The destination register is from a register class that allows move
// elimination (see field `AllowMoveElimination` below).
// - Constraints on the move kind, and the maximum number of moves that can be
// eliminated per cycle are all met.

class RegisterFile<int numPhysRegs, list<RegisterClass> Classes = [],
list<int> Costs = []> {
list<int> Costs = [], list<bit> AllowMoveElim = [],
int MaxMoveElimPerCy = 0, bit AllowZeroMoveElimOnly = 0> {
list<RegisterClass> RegClasses = Classes;
list<int> RegCosts = Costs;
list<bit> AllowMoveElimination = AllowMoveElim;
int NumPhysRegs = numPhysRegs;
int MaxMovesEliminatedPerCycle = MaxMoveElimPerCy;
bit AllowZeroMoveEliminationOnly = AllowZeroMoveElimOnly;
SchedMachineModel SchedModel = ?;
}

34 changes: 32 additions & 2 deletions llvm/lib/Target/X86/X86ScheduleBtVer2.td
@@ -48,12 +48,22 @@ def JFPU1 : ProcResource<1>; // Vector/FPU Pipe1: VALU1/STC/FPM
// part of it.
// Reference: Section 21.10 "AMD Bobcat and Jaguar pipeline: Partial register
// access" - Agner Fog's "microarchitecture.pdf".
def JIntegerPRF : RegisterFile<64, [GR64, CCR]>;
def JIntegerPRF : RegisterFile<64, [GR64, CCR], [1, 1], [1, 0],
0, // Max moves that can be eliminated per cycle.
1>; // Restrict move elimination to zero regs.

// The Jaguar FP Retire Queue renames SIMD and FP uOps onto a pool of 72 SSE
// registers. Operations on 256-bit data types are cracked into two COPs.
// Reference: www.realworldtech.com/jaguar/4/
def JFpuPRF: RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>;

// The PRF in the floating point unit can eliminate a move from a MMX or SSE
// register that is known to be zero (i.e. it has been zeroed using a zero-idiom
// dependency breaking instruction, or via VZEROALL).
// Reference: Section 21.8 "AMD Bobcat and Jaguar pipeline: Dependency-breaking
// instructions" - Agner Fog's "microarchitecture.pdf"
def JFpuPRF: RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2], [1, 1, 0],
0, // Max moves that can be eliminated per cycle.
1>; // Restrict move elimination to zero regs.

// The retire control unit (RCU) can track up to 64 macro-ops in-flight. It can
// retire up to two macro-ops per cycle.
@@ -805,4 +815,24 @@ def : IsDepBreakingFunction<[
], ZeroIdiomPredicate>
]>;

def : IsOptimizableRegisterMove<[
InstructionEquivalenceClass<[
// GPR variants.
MOV32rr, MOV64rr,

// MMX variants.
MMX_MOVQ64rr,

// SSE variants.
MOVAPSrr, MOVUPSrr,
MOVAPDrr, MOVUPDrr,
MOVDQArr, MOVDQUrr,

// AVX variants.
VMOVAPSrr, VMOVUPSrr,
VMOVAPDrr, VMOVUPDrr,
VMOVDQArr, VMOVDQUrr
], TruePred >
]>;

} // SchedModel
24 changes: 12 additions & 12 deletions llvm/test/tools/llvm-mca/X86/BtVer2/reg-move-elimination-1.s
@@ -32,13 +32,13 @@ vaddps %xmm1, %xmm1, %xmm2
# CHECK-NEXT: 1 3 1.00 vaddps %xmm1, %xmm1, %xmm2

# CHECK: Register File statistics:
# CHECK-NEXT: Total number of mappings created: 6
# CHECK-NEXT: Max number of mappings used: 5
# CHECK-NEXT: Total number of mappings created: 3
# CHECK-NEXT: Max number of mappings used: 3

# CHECK: * Register File #1 -- JFpuPRF:
# CHECK-NEXT: Number of physical registers: 72
# CHECK-NEXT: Total number of mappings created: 6
# CHECK-NEXT: Max number of mappings used: 5
# CHECK-NEXT: Total number of mappings created: 3
# CHECK-NEXT: Max number of mappings used: 3

# CHECK: * Register File #2 -- JIntegerPRF:
# CHECK-NEXT: Number of physical registers: 64
@@ -63,25 +63,25 @@ vaddps %xmm1, %xmm1, %xmm2

# CHECK: Resource pressure per iteration:
# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
# CHECK-NEXT: - - - 1.00 1.00 1.00 1.00 - - - - - - -
# CHECK-NEXT: - - - 1.00 - 1.00 - - - - - - - -

# CHECK: Resource pressure by instruction:
# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
# CHECK-NEXT: - - - - - - - - - - - - - - vxorps %xmm0, %xmm0, %xmm0
# CHECK-NEXT: - - - - 1.00 - 1.00 - - - - - - - vmovaps %xmm0, %xmm1
# CHECK-NEXT: - - - - - - - - - - - - - - vmovaps %xmm0, %xmm1
# CHECK-NEXT: - - - 1.00 - 1.00 - - - - - - - - vaddps %xmm1, %xmm1, %xmm2

# CHECK: Timeline view:
# CHECK-NEXT: Index 0123456789

# CHECK: [0,0] DR . . vxorps %xmm0, %xmm0, %xmm0
# CHECK-NEXT: [0,1] DeER . . vmovaps %xmm0, %xmm1
# CHECK-NEXT: [0,1] DR . . vmovaps %xmm0, %xmm1
# CHECK-NEXT: [0,2] .DeeeER . vaddps %xmm1, %xmm1, %xmm2
# CHECK-NEXT: [1,0] .D----R . vxorps %xmm0, %xmm0, %xmm0
# CHECK-NEXT: [1,1] . DeE--R . vmovaps %xmm0, %xmm1
# CHECK-NEXT: [1,2] . D=eeeER. vaddps %xmm1, %xmm1, %xmm2
# CHECK-NEXT: [1,1] . D----R . vmovaps %xmm0, %xmm1
# CHECK-NEXT: [1,2] . DeeeER . vaddps %xmm1, %xmm1, %xmm2
# CHECK-NEXT: [2,0] . D----R. vxorps %xmm0, %xmm0, %xmm0
# CHECK-NEXT: [2,1] . DeE---R vmovaps %xmm0, %xmm1
# CHECK-NEXT: [2,1] . D----R. vmovaps %xmm0, %xmm1
# CHECK-NEXT: [2,2] . DeeeER vaddps %xmm1, %xmm1, %xmm2

# CHECK: Average Wait times (based on the timeline view):
@@ -92,5 +92,5 @@ vaddps %xmm1, %xmm1, %xmm2

# CHECK: [0] [1] [2] [3]
# CHECK-NEXT: 0. 3 0.0 0.0 2.7 vxorps %xmm0, %xmm0, %xmm0
# CHECK-NEXT: 1. 3 1.0 1.0 1.7 vmovaps %xmm0, %xmm1
# CHECK-NEXT: 2. 3 1.3 0.0 0.0 vaddps %xmm1, %xmm1, %xmm2
# CHECK-NEXT: 1. 3 0.0 0.0 2.7 vmovaps %xmm0, %xmm1
# CHECK-NEXT: 2. 3 1.0 1.0 0.0 vaddps %xmm1, %xmm1, %xmm2
104 changes: 52 additions & 52 deletions llvm/test/tools/llvm-mca/X86/BtVer2/reg-move-elimination-2.s
@@ -14,12 +14,12 @@ movdqu %xmm5, %xmm0

# CHECK: Iterations: 3
# CHECK-NEXT: Instructions: 27
# CHECK-NEXT: Total Cycles: 19
# CHECK-NEXT: Total Cycles: 15
# CHECK-NEXT: Total uOps: 27

# CHECK: Dispatch Width: 2
# CHECK-NEXT: uOps Per Cycle: 1.42
# CHECK-NEXT: IPC: 1.42
# CHECK-NEXT: uOps Per Cycle: 1.80
# CHECK-NEXT: IPC: 1.80
# CHECK-NEXT: Block RThroughput: 4.5

# CHECK: Instruction Info:
@@ -42,13 +42,13 @@ movdqu %xmm5, %xmm0
# CHECK-NEXT: 1 1 0.50 movdqu %xmm5, %xmm0

# CHECK: Register File statistics:
# CHECK-NEXT: Total number of mappings created: 21
# CHECK-NEXT: Max number of mappings used: 8
# CHECK-NEXT: Total number of mappings created: 0
# CHECK-NEXT: Max number of mappings used: 0

# CHECK: * Register File #1 -- JFpuPRF:
# CHECK-NEXT: Number of physical registers: 72
# CHECK-NEXT: Total number of mappings created: 21
# CHECK-NEXT: Max number of mappings used: 8
# CHECK-NEXT: Total number of mappings created: 0
# CHECK-NEXT: Max number of mappings used: 0

# CHECK: * Register File #2 -- JIntegerPRF:
# CHECK-NEXT: Number of physical registers: 64
@@ -73,51 +73,51 @@ movdqu %xmm5, %xmm0

# CHECK: Resource pressure per iteration:
# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
# CHECK-NEXT: - - - 2.00 2.00 3.33 3.67 - - - - 1.33 1.67 -
# CHECK-NEXT: - - - - - - - - - - - - - -

# CHECK: Resource pressure by instruction:
# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
# CHECK-NEXT: - - - - - - - - - - - - - - pxor %mm0, %mm0
# CHECK-NEXT: - - - - - - 1.00 - - - - - 1.00 - movq %mm0, %mm1
# CHECK-NEXT: - - - - - - - - - - - - - - movq %mm0, %mm1
# CHECK-NEXT: - - - - - - - - - - - - - - xorps %xmm0, %xmm0
# CHECK-NEXT: - - - - 1.00 0.33 0.67 - - - - - - - movaps %xmm0, %xmm1
# CHECK-NEXT: - - - 1.00 - 0.33 0.67 - - - - - - - movups %xmm1, %xmm2
# CHECK-NEXT: - - - - 1.00 0.67 0.33 - - - - - - - movapd %xmm2, %xmm3
# CHECK-NEXT: - - - 1.00 - 0.33 0.67 - - - - - - - movupd %xmm3, %xmm4
# CHECK-NEXT: - - - - - 1.00 - - - - - 1.00 - - movdqa %xmm4, %xmm5
# CHECK-NEXT: - - - - - 0.67 0.33 - - - - 0.33 0.67 - movdqu %xmm5, %xmm0
# CHECK-NEXT: - - - - - - - - - - - - - - movaps %xmm0, %xmm1
# CHECK-NEXT: - - - - - - - - - - - - - - movups %xmm1, %xmm2
# CHECK-NEXT: - - - - - - - - - - - - - - movapd %xmm2, %xmm3
# CHECK-NEXT: - - - - - - - - - - - - - - movupd %xmm3, %xmm4
# CHECK-NEXT: - - - - - - - - - - - - - - movdqa %xmm4, %xmm5
# CHECK-NEXT: - - - - - - - - - - - - - - movdqu %xmm5, %xmm0

# CHECK: Timeline view:
# CHECK-NEXT: 012345678
# CHECK-NEXT: 01234
# CHECK-NEXT: Index 0123456789

# CHECK: [0,0] DR . . . . pxor %mm0, %mm0
# CHECK-NEXT: [0,1] DeER . . . . movq %mm0, %mm1
# CHECK-NEXT: [0,2] .D-R . . . . xorps %xmm0, %xmm0
# CHECK-NEXT: [0,3] .DeER. . . . movaps %xmm0, %xmm1
# CHECK-NEXT: [0,4] . DeER . . . movups %xmm1, %xmm2
# CHECK-NEXT: [0,5] . D=eER . . . movapd %xmm2, %xmm3
# CHECK-NEXT: [0,6] . D=eER . . . movupd %xmm3, %xmm4
# CHECK-NEXT: [0,7] . D==eER . . . movdqa %xmm4, %xmm5
# CHECK-NEXT: [0,8] . D==eER. . . movdqu %xmm5, %xmm0
# CHECK-NEXT: [1,0] . D----R. . . pxor %mm0, %mm0
# CHECK-NEXT: [1,1] . DeE--R . . movq %mm0, %mm1
# CHECK-NEXT: [1,2] . D----R . . xorps %xmm0, %xmm0
# CHECK-NEXT: [1,3] . .DeE--R . . movaps %xmm0, %xmm1
# CHECK-NEXT: [1,4] . .D=eE-R . . movups %xmm1, %xmm2
# CHECK-NEXT: [1,5] . . D=eE-R . . movapd %xmm2, %xmm3
# CHECK-NEXT: [1,6] . . D==eER . . movupd %xmm3, %xmm4
# CHECK-NEXT: [1,7] . . D==eER . . movdqa %xmm4, %xmm5
# CHECK-NEXT: [1,8] . . D===eER. . movdqu %xmm5, %xmm0
# CHECK-NEXT: [2,0] . . D----R. . pxor %mm0, %mm0
# CHECK-NEXT: [2,1] . . DeE---R . movq %mm0, %mm1
# CHECK-NEXT: [2,2] . . D----R . xorps %xmm0, %xmm0
# CHECK-NEXT: [2,3] . . DeE---R . movaps %xmm0, %xmm1
# CHECK-NEXT: [2,4] . . .DeE--R . movups %xmm1, %xmm2
# CHECK-NEXT: [2,5] . . .D=eE--R. movapd %xmm2, %xmm3
# CHECK-NEXT: [2,6] . . . D=eE-R. movupd %xmm3, %xmm4
# CHECK-NEXT: [2,7] . . . D==eE-R movdqa %xmm4, %xmm5
# CHECK-NEXT: [2,8] . . . D==eER movdqu %xmm5, %xmm0
# CHECK: [0,0] DR . . . pxor %mm0, %mm0
# CHECK-NEXT: [0,1] DR . . . movq %mm0, %mm1
# CHECK-NEXT: [0,2] .DR . . . xorps %xmm0, %xmm0
# CHECK-NEXT: [0,3] .DR . . . movaps %xmm0, %xmm1
# CHECK-NEXT: [0,4] . DR . . . movups %xmm1, %xmm2
# CHECK-NEXT: [0,5] . DR . . . movapd %xmm2, %xmm3
# CHECK-NEXT: [0,6] . DR. . . movupd %xmm3, %xmm4
# CHECK-NEXT: [0,7] . DR. . . movdqa %xmm4, %xmm5
# CHECK-NEXT: [0,8] . DR . . movdqu %xmm5, %xmm0
# CHECK-NEXT: [1,0] . DR . . pxor %mm0, %mm0
# CHECK-NEXT: [1,1] . DR . . movq %mm0, %mm1
# CHECK-NEXT: [1,2] . DR . . xorps %xmm0, %xmm0
# CHECK-NEXT: [1,3] . .DR . . movaps %xmm0, %xmm1
# CHECK-NEXT: [1,4] . .DR . . movups %xmm1, %xmm2
# CHECK-NEXT: [1,5] . . DR . . movapd %xmm2, %xmm3
# CHECK-NEXT: [1,6] . . DR . . movupd %xmm3, %xmm4
# CHECK-NEXT: [1,7] . . DR. . movdqa %xmm4, %xmm5
# CHECK-NEXT: [1,8] . . DR. . movdqu %xmm5, %xmm0
# CHECK-NEXT: [2,0] . . DR . pxor %mm0, %mm0
# CHECK-NEXT: [2,1] . . DR . movq %mm0, %mm1
# CHECK-NEXT: [2,2] . . DR . xorps %xmm0, %xmm0
# CHECK-NEXT: [2,3] . . DR . movaps %xmm0, %xmm1
# CHECK-NEXT: [2,4] . . .DR . movups %xmm1, %xmm2
# CHECK-NEXT: [2,5] . . .DR . movapd %xmm2, %xmm3
# CHECK-NEXT: [2,6] . . . DR. movupd %xmm3, %xmm4
# CHECK-NEXT: [2,7] . . . DR. movdqa %xmm4, %xmm5
# CHECK-NEXT: [2,8] . . . DR movdqu %xmm5, %xmm0

# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -126,12 +126,12 @@ movdqu %xmm5, %xmm0
# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

# CHECK: [0] [1] [2] [3]
# CHECK-NEXT: 0. 3 0.0 0.0 2.7 pxor %mm0, %mm0
# CHECK-NEXT: 1. 3 1.0 1.0 1.7 movq %mm0, %mm1
# CHECK-NEXT: 2. 3 0.0 0.0 3.0 xorps %xmm0, %xmm0
# CHECK-NEXT: 3. 3 1.0 1.0 1.7 movaps %xmm0, %xmm1
# CHECK-NEXT: 4. 3 1.3 0.0 1.0 movups %xmm1, %xmm2
# CHECK-NEXT: 5. 3 2.0 0.0 1.0 movapd %xmm2, %xmm3
# CHECK-NEXT: 6. 3 2.3 0.0 0.3 movupd %xmm3, %xmm4
# CHECK-NEXT: 7. 3 3.0 0.0 0.3 movdqa %xmm4, %xmm5
# CHECK-NEXT: 8. 3 3.3 0.0 0.0 movdqu %xmm5, %xmm0
# CHECK-NEXT: 0. 3 0.0 0.0 0.0 pxor %mm0, %mm0
# CHECK-NEXT: 1. 3 0.0 0.0 0.0 movq %mm0, %mm1
# CHECK-NEXT: 2. 3 0.0 0.0 0.0 xorps %xmm0, %xmm0
# CHECK-NEXT: 3. 3 0.0 0.0 0.0 movaps %xmm0, %xmm1
# CHECK-NEXT: 4. 3 0.0 0.0 0.0 movups %xmm1, %xmm2
# CHECK-NEXT: 5. 3 0.0 0.0 0.0 movapd %xmm2, %xmm3
# CHECK-NEXT: 6. 3 0.0 0.0 0.0 movupd %xmm3, %xmm4
# CHECK-NEXT: 7. 3 0.0 0.0 0.0 movdqa %xmm4, %xmm5
# CHECK-NEXT: 8. 3 0.0 0.0 0.0 movdqu %xmm5, %xmm0
