Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MachinePipeliner] Fix store-store dependences (#72508) #72575

Merged
merged 2 commits into from
Dec 12, 2023

Conversation

bcahoon
Copy link
Contributor

@bcahoon bcahoon commented Nov 16, 2023

The pipeliner needs to mark store-store order dependences as
loop carried dependences. Otherwise, the stores may be scheduled
further apart than the MII. The order dependences implies that
the first instance of the dependent store is scheduled before the
second instance of the source store instruction.

The pipeliner needs to mark store-store order dependences as
loop carried dependences. Otherwise, the stores may be scheduled
further apart than the MII. The order dependences implies that
the first instance of the dependent store is scheduled before the
second instance of the source store instruction.
// Only chain dependences between a load and store can be loop carried.
// Dependences between stores are loop carried to ensure that the dependent
// store is not scheduled after the source store on the next iteration.
if (Dep.isNormalMemory() && DI->mayStore() && SI->mayStore())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If two stores do not alias, Might there be a dependency of Order/Output between them?
If the answer is yes, then maybe we are too conservative.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The isNormalMemory function checks MayAliasMem or MustAliasMem. Is a different check more precise?

bool isNormalMemory() const {
  return getKind() == Order && (Contents.OrdKind == MayAliasMem
                                || Contents.OrdKind == MustAliasMem);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems enough for me, thanks

@ytmukai
Copy link
Contributor

ytmukai commented Nov 22, 2023

I don't fully understand, but it appears that adding edges for loop carried dependencies must be done in addLoopCarriedDependences. Then, isLoopCarriedDep is used to determine if the added order dependence is loop carried.
For example, the following code has no dependency edges between stores, so fixing isLoopCarriedDep is not enough to solve the problem: https://godbolt.org/z/xMdYT9Tfj

  ; The second store must be executed before the first store in the next iteration.
  store double 1.000000e+00, ptr %10, align 8 ; a[i] = 1
  store double %x1, ptr %11, align 8          ; a[i+1] = i

addLoopCarriedDependences does not add store-store dependency edges:

SU(5):   STD %16:g8rc, 8, %2:g8rc_and_g8rc_nox0 :: (store (s64) into %ir.10)
  # preds left       : 1
  # succs left       : 0
  # rdefs left       : 0
  Latency            : 1
  Depth              : 0
  Height             : 0
  Predecessors:
    SU(1): Data Latency=0 Reg=%2
SU(6):   DFSTOREf64 %14:vsfrc, 16, %2:g8rc_and_g8rc_nox0 :: (store (s64) into %ir.11)
  # preds left       : 2
  # succs left       : 0
  # rdefs left       : 0
  Latency            : 1
  Depth              : 9
  Height             : 0
  Predecessors:
    SU(4): Data Latency=7 Reg=%14
    SU(1): Data Latency=0 Reg=%2

The second store is scheduled illegally after the first store in the next iteration:

Inst (6)   DFSTOREf64 %14:vsfrc, 16, %2:g8rc_and_g8rc_nox0 :: (store (s64) into %ir.11)

	es:        9 ls: 7fffffff me: 7fffffff ms: 80000000
Trying to insert node between 9 and 11 II: 3
	insert at cycle 9   DFSTOREf64 %14:vsfrc, 16, %2:g8rc_and_g8rc_nox0 :: (store (s64) into %ir.11)

Inst (5)   STD %16:g8rc, 8, %2:g8rc_and_g8rc_nox0 :: (store (s64) into %ir.10)

	es:        0 ls: 7fffffff me: 7fffffff ms: 80000000
Trying to insert node between 0 and 2 II: 3
	failed to insert at cycle 0   STD %16:g8rc, 8, %2:g8rc_and_g8rc_nox0 :: (store (s64) into %ir.10)
	insert at cycle 1   STD %16:g8rc, 8, %2:g8rc_and_g8rc_nox0 :: (store (s64) into %ir.10)

@bcahoon
Copy link
Contributor Author

bcahoon commented Nov 23, 2023

Hi @ytmukai, thanks for providing an example. The case you provide is an interesting one, but an additional case to consider. The original case already has an order dependence, so we need to make sure that the second store isn't scheduled too far way from the first (such that the overlap when the final pipelined scheduled is formed). I also don't fully understand why the order dependence is needed since I would have thought that the first store is dead. I just assumed it's a valid case and it made sense to make sure that the stores aren't scheduled more than MII cycles apart.

@ytmukai
Copy link
Contributor

ytmukai commented Nov 24, 2023

The original case already has an order dependence, so we need to make sure that the second store isn't scheduled too far way from the first.

I believe that a dependence recognized by ScheduleDAGInstrs::buildSchedGraph() is loop independent, and it tells nothing about loop carried dependence.

There should be cases where there are loop independent dependencies but no loop carried ones. In those cases, this modification would be excessive.

Here's an example of each case.

  • only loop independent dependence (between stores)
for (int i=0; i<n; i++) {
  /* S0 */ a[i] = 1;
  /* S1 */ int tmp = a[x];  // To prevent optimization
  /* S2 */ a[i] = tmp;
}

There is a dependence from S0 to S2 but not from S2 to S0. Therefore, S2 can be scheduled after S0 at the next iteration. This modification may prevent it.

  • only loop carried dependence (between stores)
for (int i=0; i<n; i++) {
  /* S0 */ a[i] = 1;
  /* S1 */ int tmp = a[x];  // To prevent optimization
  /* S2 */ a[i+1] = tmp;
}

There is a dependence from S2 to S0 but not from S0 to S2. Therefore, S2 must be scheduled before S0 at the next iteration. The dependence is not recognized by SwingSchedulerDAG::addLoopCarriedDependences() currently.

  • both loop independent and loop carried dependence (between stores)
for (int i=0; i<n; i++) {
  // a and idx0/idx1 are disjoint
  /* S0 */ a[idx0[i]] = 1;
  /* S1 */ a[idx1[i]] = 2;
}

There is a loop independent dependence from S0 to S1 and a loop carried dependence from S1 to S0. (The loop carried dependence from S0 to S1 need not be considered since there is an loop independent one.) The former is recognized by ScheduleDAGInstrs::buildSchedGraph(). This modification would consider that when there is such a dependence, there is a loop carried dependence in the opposite direction. In this case, that would seem to be correct.

@bcahoon
Copy link
Contributor Author

bcahoon commented Nov 27, 2023

The first case is most similar to the one in the bug report (though, without the load), #72508

for (int i=0; i<n; i++) {
  /* S0 */ a[i] = 1;
  /* S1 */ int tmp = a[x];  // To prevent optimization
  /* S2 */ a[i] = tmp;
}

Perhaps the bug report is missing some details about why the second store cannot be scheduled in a later stage.

Edit: a comment left #72508 shows that the ticket is for the third case - a loop independent dependence that needs a loop carried dependence to be added.

Previously we handled the case when there is a store-load loop
independence dependence. This change handles all memory-memory
dependences. Unless it can be proven otherwise, the function assumes
that the dependence is loop carried.
@bcahoon
Copy link
Contributor Author

bcahoon commented Nov 28, 2023

I updated the code to treat stores like loads in isLoopCarriedDep. This handles the case when there is a loop independent dependence between two memory operations. The function conservatively assumes there is a loop carried dependence unless it can be proven otherwise.
The case of loop carried dependence without an existing loop independent dependence will be handled in a subsequent patch, in a different place.

@ytmukai
Copy link
Contributor

ytmukai commented Dec 1, 2023

Thanks for the updates. Unfortunately, getIncrementValue() is not implemented for PowerPC, so isLoopCarriedDep() always returns true. The first case would therefore increase scheduling constraints unnecessarily. But it needs to be resolved by implementing getIncrementValue(), and I think this modification is correct.

The case of loop carried dependence without an existing loop independent dependence will be handled in a subsequent patch, in a different place.

Do you have any thoughts on how it should be modified? I think the following two modifications are needed.

  • Fix addLoopCarriedDependences() to verify dependencies for all pairs of memory accesses where at least one is a store.
  • Fix the graph so it does not need to be a DAG and represent upward dependencies in a straightforward manner.

@bcahoon
Copy link
Contributor Author

bcahoon commented Dec 4, 2023

Do you have any thoughts on how it should be modified? I think the following two modifications are needed.

  • Fix addLoopCarriedDependences() to verify dependencies for all pairs of memory accesses where at least one is a store.
  • Fix the graph so it does not need to be a DAG and represent upward dependencies in a straightforward manner.

That's a good question. I think the near term solution is to change addLoopCarriedDependences as you describe to fix the correctness issue. Longer term is to create a graph with the dependences. I think that type of change will take some non-trivial effort.

@komalon1
Copy link
Contributor

komalon1 commented Dec 11, 2023

This LGTM and fixes the original issue I had in #72508
If it's ok for you, we can push it

@bcahoon
Copy link
Contributor Author

bcahoon commented Dec 11, 2023

Hi @ytmukai, are you ok with merging this. Your previous comment raises a concern about getIncrementValue(). I think the two changes are independent - this change is for a correctness issue and the implementation of getIncrementValue is to improve performance. Thoughts?

@ytmukai
Copy link
Contributor

ytmukai commented Dec 12, 2023

@bcahoon Sorry for the late reply. I agree with your descriptions.

@bcahoon bcahoon merged commit a19c7c4 into llvm:main Dec 12, 2023
3 checks passed
@bcahoon bcahoon deleted the brcahoon/pipeliner-72508 branch December 12, 2023 03:10
@ytmukai
Copy link
Contributor

ytmukai commented Jan 25, 2024

Hi @bcahoon, my colleague @kasuga-fj has posted a proposal about this issue below. Could you please give us your opinion?
https://discourse.llvm.org/t/machinepipeliner-replace-swingschedulerdag-with-directed-graph-that-allows-cycles/76465

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants