[MachinePipeliner] Fix store-store dependences (#72508) #72575

bcahoon · 2023-11-16T22:00:11Z

The pipeliner needs to mark store-store order dependences as
loop carried dependences. Otherwise, the stores may be scheduled
further apart than the MII. The order dependences implies that
the first instance of the dependent store is scheduled before the
second instance of the source store instruction.

The pipeliner needs to mark store-store order dependences as loop carried dependences. Otherwise, the stores may be scheduled further apart than the MII. The order dependences implies that the first instance of the dependent store is scheduled before the second instance of the source store instruction.

mmarjieh · 2023-11-19T10:58:40Z

llvm/lib/CodeGen/MachinePipeliner.cpp

-  // Only chain dependences between a load and store can be loop carried.
+  // Dependences between stores are loop carried to ensure that the dependent
+  // store is not scheduled after the source store on the next iteration.
+  if (Dep.isNormalMemory() && DI->mayStore() && SI->mayStore())


If two stores do not alias, Might there be a dependency of Order/Output between them?
If the answer is yes, then maybe we are too conservative.

The isNormalMemory function checks MayAliasMem or MustAliasMem. Is a different check more precise?

bool isNormalMemory() const { return getKind() == Order && (Contents.OrdKind == MayAliasMem || Contents.OrdKind == MustAliasMem); }

Seems enough for me, thanks

ytmukai · 2023-11-22T05:44:27Z

I don't fully understand, but it appears that adding edges for loop carried dependencies must be done in addLoopCarriedDependences. Then, isLoopCarriedDep is used to determine if the added order dependence is loop carried.
For example, the following code has no dependency edges between stores, so fixing isLoopCarriedDep is not enough to solve the problem: https://godbolt.org/z/xMdYT9Tfj

  ; The second store must be executed before the first store in the next iteration.
  store double 1.000000e+00, ptr %10, align 8 ; a[i] = 1
  store double %x1, ptr %11, align 8          ; a[i+1] = i

addLoopCarriedDependences does not add store-store dependency edges:

SU(5):   STD %16:g8rc, 8, %2:g8rc_and_g8rc_nox0 :: (store (s64) into %ir.10)
  # preds left       : 1
  # succs left       : 0
  # rdefs left       : 0
  Latency            : 1
  Depth              : 0
  Height             : 0
  Predecessors:
    SU(1): Data Latency=0 Reg=%2
SU(6):   DFSTOREf64 %14:vsfrc, 16, %2:g8rc_and_g8rc_nox0 :: (store (s64) into %ir.11)
  # preds left       : 2
  # succs left       : 0
  # rdefs left       : 0
  Latency            : 1
  Depth              : 9
  Height             : 0
  Predecessors:
    SU(4): Data Latency=7 Reg=%14
    SU(1): Data Latency=0 Reg=%2

The second store is scheduled illegally after the first store in the next iteration:

Inst (6)   DFSTOREf64 %14:vsfrc, 16, %2:g8rc_and_g8rc_nox0 :: (store (s64) into %ir.11)

	es:        9 ls: 7fffffff me: 7fffffff ms: 80000000
Trying to insert node between 9 and 11 II: 3
	insert at cycle 9   DFSTOREf64 %14:vsfrc, 16, %2:g8rc_and_g8rc_nox0 :: (store (s64) into %ir.11)

Inst (5)   STD %16:g8rc, 8, %2:g8rc_and_g8rc_nox0 :: (store (s64) into %ir.10)

	es:        0 ls: 7fffffff me: 7fffffff ms: 80000000
Trying to insert node between 0 and 2 II: 3
	failed to insert at cycle 0   STD %16:g8rc, 8, %2:g8rc_and_g8rc_nox0 :: (store (s64) into %ir.10)
	insert at cycle 1   STD %16:g8rc, 8, %2:g8rc_and_g8rc_nox0 :: (store (s64) into %ir.10)

bcahoon · 2023-11-23T02:29:52Z

Hi @ytmukai, thanks for providing an example. The case you provide is an interesting one, but an additional case to consider. The original case already has an order dependence, so we need to make sure that the second store isn't scheduled too far way from the first (such that the overlap when the final pipelined scheduled is formed). I also don't fully understand why the order dependence is needed since I would have thought that the first store is dead. I just assumed it's a valid case and it made sense to make sure that the stores aren't scheduled more than MII cycles apart.

ytmukai · 2023-11-24T08:24:59Z

The original case already has an order dependence, so we need to make sure that the second store isn't scheduled too far way from the first.

I believe that a dependence recognized by ScheduleDAGInstrs::buildSchedGraph() is loop independent, and it tells nothing about loop carried dependence.

There should be cases where there are loop independent dependencies but no loop carried ones. In those cases, this modification would be excessive.

Here's an example of each case.

only loop independent dependence (between stores)

for (int i=0; i<n; i++) {
  /* S0 */ a[i] = 1;
  /* S1 */ int tmp = a[x];  // To prevent optimization
  /* S2 */ a[i] = tmp;
}

There is a dependence from S0 to S2 but not from S2 to S0. Therefore, S2 can be scheduled after S0 at the next iteration. This modification may prevent it.

only loop carried dependence (between stores)

for (int i=0; i<n; i++) {
  /* S0 */ a[i] = 1;
  /* S1 */ int tmp = a[x];  // To prevent optimization
  /* S2 */ a[i+1] = tmp;
}

There is a dependence from S2 to S0 but not from S0 to S2. Therefore, S2 must be scheduled before S0 at the next iteration. The dependence is not recognized by SwingSchedulerDAG::addLoopCarriedDependences() currently.

both loop independent and loop carried dependence (between stores)

for (int i=0; i<n; i++) {
  // a and idx0/idx1 are disjoint
  /* S0 */ a[idx0[i]] = 1;
  /* S1 */ a[idx1[i]] = 2;
}

There is a loop independent dependence from S0 to S1 and a loop carried dependence from S1 to S0. (The loop carried dependence from S0 to S1 need not be considered since there is an loop independent one.) The former is recognized by ScheduleDAGInstrs::buildSchedGraph(). This modification would consider that when there is such a dependence, there is a loop carried dependence in the opposite direction. In this case, that would seem to be correct.

bcahoon · 2023-11-27T01:48:05Z

The first case is most similar to the one in the bug report (though, without the load), #72508

for (int i=0; i<n; i++) {
  /* S0 */ a[i] = 1;
  /* S1 */ int tmp = a[x];  // To prevent optimization
  /* S2 */ a[i] = tmp;
}

Perhaps the bug report is missing some details about why the second store cannot be scheduled in a later stage.

Edit: a comment left #72508 shows that the ticket is for the third case - a loop independent dependence that needs a loop carried dependence to be added.

Previously we handled the case when there is a store-load loop independence dependence. This change handles all memory-memory dependences. Unless it can be proven otherwise, the function assumes that the dependence is loop carried.

bcahoon · 2023-11-28T03:49:42Z

I updated the code to treat stores like loads in isLoopCarriedDep. This handles the case when there is a loop independent dependence between two memory operations. The function conservatively assumes there is a loop carried dependence unless it can be proven otherwise.
The case of loop carried dependence without an existing loop independent dependence will be handled in a subsequent patch, in a different place.

ytmukai · 2023-12-01T15:16:48Z

Thanks for the updates. Unfortunately, getIncrementValue() is not implemented for PowerPC, so isLoopCarriedDep() always returns true. The first case would therefore increase scheduling constraints unnecessarily. But it needs to be resolved by implementing getIncrementValue(), and I think this modification is correct.

The case of loop carried dependence without an existing loop independent dependence will be handled in a subsequent patch, in a different place.

Do you have any thoughts on how it should be modified? I think the following two modifications are needed.

Fix addLoopCarriedDependences() to verify dependencies for all pairs of memory accesses where at least one is a store.
Fix the graph so it does not need to be a DAG and represent upward dependencies in a straightforward manner.

bcahoon · 2023-12-04T21:15:26Z

Do you have any thoughts on how it should be modified? I think the following two modifications are needed.

Fix addLoopCarriedDependences() to verify dependencies for all pairs of memory accesses where at least one is a store.

Fix the graph so it does not need to be a DAG and represent upward dependencies in a straightforward manner.

That's a good question. I think the near term solution is to change addLoopCarriedDependences as you describe to fix the correctness issue. Longer term is to create a graph with the dependences. I think that type of change will take some non-trivial effort.

komalon1 · 2023-12-11T10:41:10Z

This LGTM and fixes the original issue I had in #72508
If it's ok for you, we can push it

bcahoon · 2023-12-11T19:10:07Z

Hi @ytmukai, are you ok with merging this. Your previous comment raises a concern about getIncrementValue(). I think the two changes are independent - this change is for a correctness issue and the implementation of getIncrementValue is to improve performance. Thoughts?

ytmukai · 2023-12-12T01:37:43Z

@bcahoon Sorry for the late reply. I agree with your descriptions.

ytmukai · 2024-01-25T01:00:14Z

Hi @bcahoon, my colleague @kasuga-fj has posted a proposal about this issue below. Could you please give us your opinion?
https://discourse.llvm.org/t/machinepipeliner-replace-swingschedulerdag-with-directed-graph-that-allows-cycles/76465

bcahoon requested a review from komalon1 November 16, 2023 22:46

bcahoon mentioned this pull request Nov 16, 2023

[MachinePipeliner] Store-After-Store order dependency is not considered #72508

Open

bcahoon requested review from dpenry and ytmukai November 17, 2023 01:56

mmarjieh reviewed Nov 19, 2023

View reviewed changes

ytmukai approved these changes Dec 12, 2023

View reviewed changes

bcahoon merged commit a19c7c4 into llvm:main Dec 12, 2023
3 checks passed

bcahoon deleted the brcahoon/pipeliner-72508 branch December 12, 2023 03:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MachinePipeliner] Fix store-store dependences (#72508) #72575

[MachinePipeliner] Fix store-store dependences (#72508) #72575

bcahoon commented Nov 16, 2023

mmarjieh Nov 19, 2023

bcahoon Nov 20, 2023

mmarjieh Nov 21, 2023

ytmukai commented Nov 22, 2023

bcahoon commented Nov 23, 2023

ytmukai commented Nov 24, 2023

bcahoon commented Nov 27, 2023 •

edited

bcahoon commented Nov 28, 2023

ytmukai commented Dec 1, 2023

bcahoon commented Dec 4, 2023

komalon1 commented Dec 11, 2023 •

edited

bcahoon commented Dec 11, 2023

ytmukai commented Dec 12, 2023

ytmukai commented Jan 25, 2024

[MachinePipeliner] Fix store-store dependences (#72508) #72575

[MachinePipeliner] Fix store-store dependences (#72508) #72575

Conversation

bcahoon commented Nov 16, 2023

mmarjieh Nov 19, 2023

Choose a reason for hiding this comment

bcahoon Nov 20, 2023

Choose a reason for hiding this comment

mmarjieh Nov 21, 2023

Choose a reason for hiding this comment

ytmukai commented Nov 22, 2023

bcahoon commented Nov 23, 2023

ytmukai commented Nov 24, 2023

bcahoon commented Nov 27, 2023 • edited

bcahoon commented Nov 28, 2023

ytmukai commented Dec 1, 2023

bcahoon commented Dec 4, 2023

komalon1 commented Dec 11, 2023 • edited

bcahoon commented Dec 11, 2023

ytmukai commented Dec 12, 2023

ytmukai commented Jan 25, 2024

bcahoon commented Nov 27, 2023 •

edited

komalon1 commented Dec 11, 2023 •

edited