-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[MLIR][LLVM] Add Continuous Loop Peeling transform to SCF #77328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This patch adds continuous loop peeling to scf loop transforms in the MLIR backend. This transforms the target loop into a chain of loops, with step sizes that are powers of two and decrease exponetially across subsequent loops. Originally authored by Litu Zhou litu.zhou@huawei.com.
1. The transformation has been moved to mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp 2. Test in mlir/test/Dialect/SCF/loop-continuous-peel.mlir simplified using scf.for 3. Added -scf-for-loop-continuous-peeling pass for applying this transform independently on scf.for loops. This commit should be squashed into the original.
Add case for step size 1 This commit should be squashed into the original.
Removed redundant TODO comment. This commit should be squashed into the original.
Renamed test to conform to other similar tests. This commit should be squashed into the original.
1. Added comment for splitLoopHelper 2. Added comment for convertSingleIterFor. 3. Some minor changes as per review suggestion. This commit should be squashed into the original.
Use inlining function instead of manually cloning operations from replaced for loop. This commit should be squashed into the original.
…SCF" This reverts commit 1cfe41741ba0eb92e24861ba8bfd239059554540.
Clean up the rewriting of the affine op in the partial iterations. This required changing the return type of rewritePeeledMinMaxOp from LogicalResult to FailureOr<AffineApplyOp> to pass on the newly created affine op. This commit should be squashed into the original.
Use block inlining for moving operations inside a for op to the newly created target if op when converting single iteration for ops to if ops. This commit should be squashed into the original.
Some minor fixes as per reviewer comments. This commit should be squashed into the original.
Some minor reordering in the check. This commit should be squashed into the original.
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write If you have received no comments on your PR for a week, you can request a review If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
Changes as per reviewer comments. Additional test for the case when lower loop bound is not divisible by step. This commit should be squashed into the original.
✅ With the latest revision this PR passed the C/C++ code formatter. |
Simplified correction of step size in newly created partial iterations loop by modifying step size at loop creation. This commit should be squashed into the original.
@matthias-springer I had to move the pull request #71555 here due to an erroneous force push. I have addressed all the comments you had pointed out in the original review. Most importantly
|
Can you re-open the old PR and force-push the contents of this PR to the old PR? Ideally, we'd keep using the old PR, so that we don't loose the review comments. |
// The new for loop for the remaining iterations has half the step size | ||
// as continuous peeling requires the step size to diminish exponentially | ||
// across subsequent loops. | ||
map.map(forOp.getStep(), constStepOp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this won't work. The SSA value of forOp.getStep()
could be used in different ways inside of the loop and you don't want to change that.
E.g.:
scf.for ... step %c16 {
// This op should not be changed as part of loop peeling.
"test.foo"(%c16) : () -> ()
}
What's the purpose of this map.map
? Is it meant to canonicalize affine.min/max
ops, taking into account the fact that the loop was peeled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the purpose is to canonicalize min/max ops inside the new loop body. This requires the affine.min/max ops to have the correct step given the step size has changed (halved) for the partialIteraton loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally speaking, for whatever modification you make to the loop body, you have to be sure that the loop body is still computing the same thing as before. Just blanket replacing all occurrences of the old step size (even if it's just in affine.min/max
ops) with the new step size may change the semantics of the loop.
The only safe way of canonicalizing the loop body that we have at the moment is rewritePeeledMinMaxOp
. This function will look for affine.min/max
ops and select one of the provided options if it can prove that doing so would always be correct in the peeled or partial loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original transform was intended for applying tiling repeatedly by continuously halving the step size until 1. This doesn't work as a generic loop transform and the right place to implement this is in Linalg/TransformOps. The patch will be moved there and submitted in a separate PR.
This patch adds continuous loop peeling to scf loop transforms in the MLIR backend. This transforms the target loop into a chain of loops, with step sizes that are powers of two and decrease exponentially across subsequent loops.
The transform is similar to loop.peel in the effect that it creates a loop with a step (that is power of 2) to divide the range evenly, with the difference that the remaining iterations are spread across similar loops with exponentially decreasing step sizes, with the last loop with step size of 2^0 = 1.
Originally authored by Litu Zhou litu.zhou@huawei.com.
Original pull-request was here #71555
Was closed because of an erroneous force push.