[AArch64] Set the latency of Cortex-A55 stores to 1
This sets the latency of stores to 1 in the Cortex-A55 scheduling model,
to better match the values given in the software optimization guide.

The latency of a store in normal llvm scheduling does not appear to have
a lot of uses. If the store has no outputs then the latency is somewhat
meaningless (and pre/post increment update operands use the WriteAdr
write for those operands instead). The one place it does alter things is
the latency between a store and the end of the scheduling region, which
can in turn have an effect on the critical path length. As a result a
latency of 1 is more correct and offers ever-so-slightly better
scheduling of instructions near the end of the block.
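
To illustrate the effect on the critical path, here is a minimal Python sketch that is not part of this patch. It assumes a toy dependency graph in which the end of the region waits for every instruction to complete. The `critical_path` helper and the add -> subs dependency are hypothetical; the madd/add/subs latencies mirror the llvm-mca test output, and the store latencies 4 and 1 are the old and new values from this change.

```python
from functools import lru_cache


def critical_path(latency, deps):
    """Longest latency-weighted path from region start to region end (toy model)."""

    @lru_cache(maxsize=None)
    def ready(node):
        # A node can start once all of its producers have completed.
        return max((ready(p) + latency[p] for p in deps.get(node, ())), default=0)

    # The region end is modelled as waiting for every instruction to complete,
    # so a store's latency matters here even though it has no register output.
    return max(ready(n) + latency[n] for n in latency)


# Hypothetical block: str consumes madd; subs consumes add (invented edge).
DEPS = {"str": ("madd",), "subs": ("add",)}
OLD = {"madd": 4, "add": 3, "subs": 3, "str": 4}  # store latency before this change
NEW = {**OLD, "str": 1}                           # store latency after this change

print(critical_path(OLD, DEPS))  # 8: madd -> str dominates
print(critical_path(NEW, DEPS))  # 6: add -> subs is now the longest chain
```

In this toy block the store no longer sits on the critical path once its latency drops to 1, which is the kind of end-of-region difference described above.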

They are marked as RetireOOO to keep llvm-mca from introducing
stalls where none would exist.

Differential Revision: https://reviews.llvm.org/D105541
davemgreen committed Jul 12, 2021
1 parent 0c6fad2 commit f73334c
Showing 7 changed files with 189 additions and 183 deletions.
8 changes: 5 additions & 3 deletions llvm/lib/Target/AArch64/AArch64SchedA55.td
@@ -109,9 +109,11 @@ def CortexA55WriteVLD8 : SchedWriteRes<[CortexA55UnitLd]> { let Latency = 11;
 def : WriteRes<WriteAdr, []> { let Latency = 0; }

 // Store
-def : WriteRes<WriteST, [CortexA55UnitSt]> { let Latency = 4; }
-def : WriteRes<WriteSTP, [CortexA55UnitSt]> { let Latency = 4; }
-def : WriteRes<WriteSTIdx, [CortexA55UnitSt]> { let Latency = 4; }
+let RetireOOO = 1 in {
+def : WriteRes<WriteST, [CortexA55UnitSt]> { let Latency = 1; }
+def : WriteRes<WriteSTP, [CortexA55UnitSt]> { let Latency = 1; }
+def : WriteRes<WriteSTIdx, [CortexA55UnitSt]> { let Latency = 1; }
+}
 def : WriteRes<WriteSTX, [CortexA55UnitSt]> { let Latency = 4; }

 // Vector Store - Similar to vector loads, can take 1-3 cycles to issue.
23 changes: 12 additions & 11 deletions llvm/test/tools/llvm-mca/AArch64/Cortex/A55-all-stats.s
@@ -10,12 +10,12 @@ str w0, [x21, x18, lsl #2]

 # CHECK: Iterations: 2
 # CHECK-NEXT: Instructions: 12
-# CHECK-NEXT: Total Cycles: 23
+# CHECK-NEXT: Total Cycles: 17
 # CHECK-NEXT: Total uOps: 14

 # CHECK: Dispatch Width: 2
-# CHECK-NEXT: uOps Per Cycle: 0.61
-# CHECK-NEXT: IPC: 0.52
+# CHECK-NEXT: uOps Per Cycle: 0.82
+# CHECK-NEXT: IPC: 0.71
 # CHECK-NEXT: Block RThroughput: 3.5

 # CHECK: Instruction Info:
@@ -32,27 +32,28 @@ str w0, [x21, x18, lsl #2]
 # CHECK-NEXT: 1 4 1.00 madd w0, w5, w4, w0
 # CHECK-NEXT: 1 3 0.50 add x3, x3, x13
 # CHECK-NEXT: 1 3 0.50 subs x1, x1, #1
-# CHECK-NEXT: 1 4 1.00 * str w0, [x21, x18, lsl #2]
+# CHECK-NEXT: 1 1 1.00 * str w0, [x21, x18, lsl #2]

 # CHECK: Dynamic Dispatch Stall Cycles:
-# CHECK-NEXT: RAT - Register unavailable: 8 (34.8%)
+# CHECK-NEXT: RAT - Register unavailable: 8 (47.1%)
 # CHECK-NEXT: RCU - Retire tokens unavailable: 0
 # CHECK-NEXT: SCHEDQ - Scheduler full: 0
 # CHECK-NEXT: LQ - Load queue full: 0
 # CHECK-NEXT: SQ - Store queue full: 0
 # CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0
+# CHECK-NEXT: USH - Uncategorised Structural Hazard: 0

 # CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
 # CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 13 (56.5%)
-# CHECK-NEXT: 1, 6 (26.1%)
-# CHECK-NEXT: 2, 4 (17.4%)
+# CHECK-NEXT: 0, 7 (41.2%)
+# CHECK-NEXT: 1, 6 (35.3%)
+# CHECK-NEXT: 2, 4 (23.5%)

 # CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
 # CHECK-NEXT: [# issued], [# cycles]
-# CHECK-NEXT: 0, 13 (56.5%)
-# CHECK-NEXT: 1, 6 (26.1%)
-# CHECK-NEXT: 2, 4 (17.4%)
+# CHECK-NEXT: 0, 7 (41.2%)
+# CHECK-NEXT: 1, 6 (35.3%)
+# CHECK-NEXT: 2, 4 (23.5%)

 # CHECK: Scheduler's queue usage:
 # CHECK-NEXT: No scheduler resources used.
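
The updated summary figures in this test follow directly from the new cycle count. The Python sketch below re-derives them as a sanity check. The helper names are hypothetical, and the rounding is assumed to be ordinary round-to-nearest, which may not be exactly how llvm-mca formats its output.

```python
def mca_summary(instructions, uops, cycles):
    """Return (uOps per cycle, IPC), each rounded to two decimals."""
    return round(uops / cycles, 2), round(instructions / cycles, 2)


def share(count, cycles):
    """Percentage of total cycles, rounded to one decimal."""
    return round(100.0 * count / cycles, 1)


# 12 instructions and 14 uOps over two iterations, as reported by the test.
before = mca_summary(12, 14, 23)  # (0.61, 0.52) with store latency 4
after = mca_summary(12, 14, 17)   # (0.82, 0.71) with store latency 1

# The stall and dispatch counts are shares of the total cycle count.
rat_before, rat_after = share(8, 23), share(8, 17)    # 34.8, 47.1
idle_before, idle_after = share(13, 23), share(7, 17)  # 56.5, 41.2
```

The 8 RAT stall cycles are unchanged in absolute terms; their share rises only because the total drops from 23 to 17 cycles, while the number of cycles with nothing dispatched falls from 13 to 7.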
53 changes: 27 additions & 26 deletions llvm/test/tools/llvm-mca/AArch64/Cortex/A55-all-views.s
@@ -10,12 +10,12 @@ str w0, [x21, x18, lsl #2]

 # CHECK: Iterations: 2
 # CHECK-NEXT: Instructions: 12
-# CHECK-NEXT: Total Cycles: 23
+# CHECK-NEXT: Total Cycles: 17
 # CHECK-NEXT: Total uOps: 14

 # CHECK: Dispatch Width: 2
-# CHECK-NEXT: uOps Per Cycle: 0.61
-# CHECK-NEXT: IPC: 0.52
+# CHECK-NEXT: uOps Per Cycle: 0.82
+# CHECK-NEXT: IPC: 0.71
 # CHECK-NEXT: Block RThroughput: 3.5

 # CHECK: Instruction Info:
@@ -32,27 +32,28 @@ str w0, [x21, x18, lsl #2]
 # CHECK-NEXT: 1 4 1.00 madd w0, w5, w4, w0
 # CHECK-NEXT: 1 3 0.50 add x3, x3, x13
 # CHECK-NEXT: 1 3 0.50 subs x1, x1, #1
-# CHECK-NEXT: 1 4 1.00 * str w0, [x21, x18, lsl #2]
+# CHECK-NEXT: 1 1 1.00 * str w0, [x21, x18, lsl #2]

 # CHECK: Dynamic Dispatch Stall Cycles:
-# CHECK-NEXT: RAT - Register unavailable: 8 (34.8%)
+# CHECK-NEXT: RAT - Register unavailable: 8 (47.1%)
 # CHECK-NEXT: RCU - Retire tokens unavailable: 0
 # CHECK-NEXT: SCHEDQ - Scheduler full: 0
 # CHECK-NEXT: LQ - Load queue full: 0
 # CHECK-NEXT: SQ - Store queue full: 0
 # CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0
+# CHECK-NEXT: USH - Uncategorised Structural Hazard: 0

 # CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
 # CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 13 (56.5%)
-# CHECK-NEXT: 1, 6 (26.1%)
-# CHECK-NEXT: 2, 4 (17.4%)
+# CHECK-NEXT: 0, 7 (41.2%)
+# CHECK-NEXT: 1, 6 (35.3%)
+# CHECK-NEXT: 2, 4 (23.5%)

 # CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
 # CHECK-NEXT: [# issued], [# cycles]
-# CHECK-NEXT: 0, 13 (56.5%)
-# CHECK-NEXT: 1, 6 (26.1%)
-# CHECK-NEXT: 2, 4 (17.4%)
+# CHECK-NEXT: 0, 7 (41.2%)
+# CHECK-NEXT: 1, 6 (35.3%)
+# CHECK-NEXT: 2, 4 (23.5%)

 # CHECK: Scheduler's queue usage:
 # CHECK-NEXT: No scheduler resources used.
@@ -89,21 +90,21 @@ str w0, [x21, x18, lsl #2]
 # CHECK-NEXT: - - - - - - - - - - - 1.00 str w0, [x21, x18, lsl #2]

 # CHECK: Timeline view:
-# CHECK-NEXT: 0123456789
-# CHECK-NEXT: Index 0123456789 012
-
-# CHECK: [0,0] DeeE . . . . . ldr w4, [x2], #4
-# CHECK-NEXT: [0,1] .DeeE. . . . . ldr w5, [x3]
-# CHECK-NEXT: [0,2] . DeeeE . . . . madd w0, w5, w4, w0
-# CHECK-NEXT: [0,3] . DeeE . . . . add x3, x3, x13
-# CHECK-NEXT: [0,4] . DeeE . . . . subs x1, x1, #1
-# CHECK-NEXT: [0,5] . . DeeeE . . . str w0, [x21, x18, lsl #2]
-# CHECK-NEXT: [1,0] . . .DeeE. . . ldr w4, [x2], #4
-# CHECK-NEXT: [1,1] . . . DeeE . . ldr w5, [x3]
-# CHECK-NEXT: [1,2] . . . DeeeE. . madd w0, w5, w4, w0
-# CHECK-NEXT: [1,3] . . . .DeeE. . add x3, x3, x13
-# CHECK-NEXT: [1,4] . . . .DeeE. . subs x1, x1, #1
-# CHECK-NEXT: [1,5] . . . . DeeeE str w0, [x21, x18, lsl #2]
+# CHECK-NEXT: 0123456
+# CHECK-NEXT: Index 0123456789
+
+# CHECK: [0,0] DeeE . . .. ldr w4, [x2], #4
+# CHECK-NEXT: [0,1] .DeeE. . .. ldr w5, [x3]
+# CHECK-NEXT: [0,2] . DeeeE . .. madd w0, w5, w4, w0
+# CHECK-NEXT: [0,3] . DeeE . .. add x3, x3, x13
+# CHECK-NEXT: [0,4] . DeeE . .. subs x1, x1, #1
+# CHECK-NEXT: [0,5] . . DE . .. str w0, [x21, x18, lsl #2]
+# CHECK-NEXT: [1,0] . . DeeE .. ldr w4, [x2], #4
+# CHECK-NEXT: [1,1] . . DeeE .. ldr w5, [x3]
+# CHECK-NEXT: [1,2] . . . DeeeE madd w0, w5, w4, w0
+# CHECK-NEXT: [1,3] . . . DeeE add x3, x3, x13
+# CHECK-NEXT: [1,4] . . . DeeE subs x1, x1, #1
+# CHECK-NEXT: [1,5] . . . DE str w0, [x21, x18, lsl #2]

 # CHECK: Average Wait times (based on the timeline view):
 # CHECK-NEXT: [0]: Executions