[MachineCombiner] Add up latencies of all instructions in new pattern.
Summary:
When calculating the RootLatency, we add up all the latencies of the
deleted instructions. But for NewRootLatency we only add the latency of
the new root instructions, ignoring the latencies of the other
instructions inserted. This leads the combiner to underestimate the cost
of patterns which add multiple instructions. This patch fixes that by
summing up the latencies of all new instructions. For NewRoot, the
more complex getLatency function is used.
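
To make the underestimate concrete, here is a minimal standalone sketch. It is not LLVM code: the helper names and latency values are hypothetical, the new root is assumed to be the last inserted instruction, and the root's getLatency result is simplified to a plain per-instruction latency.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: sum of per-instruction latencies, in cycles.
unsigned sumLatencies(const std::vector<unsigned> &Latencies) {
  return std::accumulate(Latencies.begin(), Latencies.end(), 0u);
}

// Pre-patch estimate: only the new root is counted. The new root is
// assumed to be the last inserted instruction.
unsigned oldNewRootEstimate(const std::vector<unsigned> &Inserted) {
  assert(!Inserted.empty() && "a pattern inserts at least one instruction");
  return Inserted.back();
}

// Post-patch estimate: every inserted instruction is counted.
unsigned patchedNewRootEstimate(const std::vector<unsigned> &Inserted) {
  assert(!Inserted.empty() && "a pattern inserts at least one instruction");
  return sumLatencies(Inserted);
}
```

With deleted latencies {4, 3} (sum 7) and inserted latencies {3, 3, 2}, the old estimate for the new pattern is 2, which looks cheaper than 7 on latency alone, while the patched estimate is 8, which does not.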

Note that we could be slightly more precise than just summing up
all latencies. For example, consider a pattern like

    r1 = INS1 ..
    r2 = INS2 ..
    r3 = INS3 r1, r2

I think in some other places, the total latency of the pattern would be
estimated as lat(INS3) + max(lat(INS1), lat(INS2)). If you consider
that worth changing, I think it would be best to do in a follow-up
patch.
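
The difference between the two estimates for the three-instruction pattern above can be sketched as follows. This is an illustration only: the struct and function names are hypothetical, and it assumes INS1 and INS2 are independent of each other and can execute in parallel.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical latencies, in cycles, for INS1, INS2 and INS3.
struct PatternLatencies {
  unsigned Ins1;
  unsigned Ins2;
  unsigned Ins3;
};

// What this patch computes: treat the pattern as one dependent chain.
unsigned summedLatency(const PatternLatencies &P) {
  return P.Ins1 + P.Ins2 + P.Ins3;
}

// The more precise estimate: INS3 waits only for the slower of its two
// independent operands.
unsigned criticalPathLatency(const PatternLatencies &P) {
  return P.Ins3 + std::max(P.Ins1, P.Ins2);
}

// How much the plain sum overshoots; equals min(Ins1, Ins2).
unsigned overestimate(const PatternLatencies &P) {
  assert(summedLatency(P) >= criticalPathLatency(P));
  return summedLatency(P) - criticalPathLatency(P);
}
```

For latencies 2, 3 and 4, the sum is 9 and the critical-path estimate is 7, so summing is conservative by 2 cycles.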

Reviewers: Gerolf, sebpop, spop, fhahn

Reviewed By: fhahn

Subscribers: evandro, llvm-commits

Differential Revision: https://reviews.llvm.org/D40307

llvm-svn: 319951
fhahn committed Dec 6, 2017
1 parent 9e776fb commit 001c3dd
Showing 1 changed file with 9 additions and 2 deletions.
11 changes: 9 additions & 2 deletions llvm/lib/CodeGen/MachineCombiner.cpp
@@ -282,9 +282,16 @@ bool MachineCombiner::improvesCriticalPathLen(
   // of the original code sequence. This may allow the transform to proceed
   // even if the instruction depths (data dependency cycles) become worse.

-  unsigned NewRootLatency = getLatency(Root, NewRoot, BlockTrace);
-  unsigned RootLatency = 0;
+  // Account for the latency of the inserted and deleted instructions by
+  // adding up their latencies. This assumes that the inserted and deleted
+  // instructions are dependent instruction chains, which might not hold
+  // in all cases.
+  unsigned NewRootLatency = 0;
+  for (unsigned i = 0; i < InsInstrs.size() - 1; i++)
+    NewRootLatency += TSchedModel.computeInstrLatency(InsInstrs[i]);
+  NewRootLatency += getLatency(Root, NewRoot, BlockTrace);

+  unsigned RootLatency = 0;
   for (auto I : DelInstrs)
     RootLatency += TSchedModel.computeInstrLatency(I);
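
The shape of the new loop can be modelled outside LLVM as below. This is a sketch under stated assumptions, not the code in MachineCombiner.cpp: `Instr`, `newRootLatency` and the `RootLatency` callback are hypothetical stand-ins for MachineInstr, the patched computation and getLatency, and the new root is assumed to be the last element of the inserted list. Because the bound `InsInstrs.size() - 1` is unsigned, it would wrap around for an empty vector; the sketch makes the non-empty precondition explicit and writes the bound as `I + 1 < size()`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for an instruction; only its latency matters here.
struct Instr {
  unsigned Latency;
};

// Sum the plain latency of every inserted instruction except the last,
// then add the (possibly more elaborate) latency of the new root.
unsigned newRootLatency(
    const std::vector<Instr> &InsInstrs,
    const std::function<unsigned(const Instr &)> &RootLatency) {
  // Assumed precondition: the pattern inserts at least one instruction.
  assert(!InsInstrs.empty() && "expected at least one inserted instruction");
  unsigned Total = 0;
  // I + 1 < size() cannot wrap around, unlike size() - 1 on an empty vector.
  for (std::size_t I = 0; I + 1 < InsInstrs.size(); ++I)
    Total += InsInstrs[I].Latency;
  // The new root is assumed to be the last inserted instruction.
  Total += RootLatency(InsInstrs.back());
  return Total;
}
```

Passing the root's latency through a callback mirrors the split in the patch: a simple per-instruction latency for the leading instructions and a separate function for the root.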
