Conversation

muneebkhan85
Contributor

This patch adds continuous loop peeling to scf loop transforms in the MLIR backend. This transforms the target loop into a chain of loops, with step sizes that are powers of two and decrease exponentially across subsequent loops.

The transform is similar to loop.peel in that it creates a loop whose step size (a power of 2) divides the range evenly; the difference is that the remaining iterations are spread across a chain of similar loops with exponentially decreasing step sizes, the last of which has a step size of 2^0 = 1.
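
For illustration, here is a rough sketch of the intended effect on a simple loop (the constants and the test.foo op are made up for this example, not taken from the patch or its tests). A loop with upper bound 23 and a requested step of 8 conceptually becomes a chain of loops with steps 8, 4, 2 and 1:

// Before: %c0, %c23 and %c8 are index constants.
scf.for %iv = %c0 to %c23 step %c8 {
  "test.foo"(%iv) : (index) -> ()
}

// After continuous peeling (conceptually):
scf.for %iv = %c0 to %c16 step %c8 {   // covers [0, 16)
  "test.foo"(%iv) : (index) -> ()
}
scf.for %iv = %c16 to %c20 step %c4 {  // covers [16, 20)
  "test.foo"(%iv) : (index) -> ()
}
scf.for %iv = %c20 to %c22 step %c2 {  // covers [20, 22)
  "test.foo"(%iv) : (index) -> ()
}
scf.for %iv = %c22 to %c23 step %c1 {  // covers [22, 23)
  "test.foo"(%iv) : (index) -> ()
}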

Originally authored by Litu Zhou litu.zhou@huawei.com.

The original pull request was #71555; it was closed because of an erroneous force push.

muneebkhan85 and others added 13 commits January 8, 2024 23:54
This patch adds continuous loop peeling to scf loop transforms
in the MLIR backend. This transforms the target loop into a
chain of loops, with step sizes that are powers of two and
decrease exponentially across subsequent loops. Originally
authored by Litu Zhou litu.zhou@huawei.com.
1. The transformation has been moved to
   mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp

2. Test in mlir/test/Dialect/SCF/loop-continuous-peel.mlir
   simplified using scf.for

3. Added -scf-for-loop-continuous-peeling pass for applying
   this transform independently on scf.for loops.

This commit should be squashed into the original.
Add case for step size 1

This commit should be squashed into the original.
Removed redundant TODO comment.

This commit should be squashed into the original.
Renamed test to conform to other similar tests.

This commit should be squashed into the original.
1. Added comment for splitLoopHelper
2. Added comment for convertSingleIterFor.
3. Some minor changes as per review suggestions.

This commit should be squashed into the original.
Use an inlining function instead of manually cloning operations from
the replaced for loop.

This commit should be squashed into the original.
…SCF"

This reverts commit 1cfe41741ba0eb92e24861ba8bfd239059554540.
Clean up the rewriting of the affine op in the partial iterations.
This required changing the return type of rewritePeeledMinMaxOp
from LogicalResult to FailureOr<AffineApplyOp> to pass on the
newly created affine op.

This commit should be squashed into the original.
Use block inlining to move the operations inside a for op into the
newly created target if op when converting single-iteration for
ops to if ops.

This commit should be squashed into the original.
Some minor fixes as per reviewer comments.

This commit should be squashed into the original.
Some minor reordering in the check.

This commit should be squashed into the original.

github-actions bot commented Jan 8, 2024

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository, in which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

Changes as per reviewer comments. Added a test for the
case when the lower loop bound is not divisible by the step.

This commit should be squashed into the original.

github-actions bot commented Jan 9, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Simplified the correction of the step size in the newly created
partial-iterations loop by modifying the step size at loop creation.

This commit should be squashed into the original.
@muneebkhan85
Contributor Author

@matthias-springer I had to move the pull request #71555 here due to an erroneous force push.

I have addressed all the comments you pointed out in the original review. Most importantly:

  1. I have rewritten the logic for the comment [MLIR][LLVM] Add Continuous Loop Peeling transform to SCF #71555 (comment)
    so that instead of rewriting ops after running rewritePeeledMinMaxOp, I correct the loop step in the map at the time the new loop is created (by cloning) inside splitLoopHelper. This also means that there is no need to change the return type of rewritePeeledMinMaxOp as I had committed earlier. If this looks good to you, I can revert the changes to that return type.

  2. I have added a new test case, so that both usePowerSplit = false and usePowerSplit = true are tested [MLIR][LLVM] Add Continuous Loop Peeling transform to SCF #71555 (comment)

@matthias-springer
Member

Can you re-open the old PR and force-push the contents of this PR to the old PR? Ideally, we'd keep using the old PR, so that we don't lose the review comments.

// The new for loop for the remaining iterations has half the step size
// as continuous peeling requires the step size to diminish exponentially
// across subsequent loops.
map.map(forOp.getStep(), constStepOp);
Member

I think this won't work. The SSA value of forOp.getStep() could be used in different ways inside the loop, and you don't want to change that.

E.g.:

scf.for ... step %c16 {
  // This op should not be changed as part of loop peeling.
  "test.foo"(%c16) : () -> ()
}

What's the purpose of this map.map? Is it meant to canonicalize affine.min/max ops, taking into account the fact that the loop was peeled?

Contributor Author

Yes, the purpose is to canonicalize min/max ops inside the new loop body. This requires the affine.min/max ops to have the correct step, given that the step size has changed (halved) for the partialIteration loop.

Member

Generally speaking, for whatever modification you make to the loop body, you have to be sure that the loop body is still computing the same thing as before. Just blanket replacing all occurrences of the old step size (even if it's just in affine.min/max ops) with the new step size may change the semantics of the loop.

The only safe way of canonicalizing the loop body that we have at the moment is rewritePeeledMinMaxOp. This function will look for affine.min/max ops and select one of the provided options if it can prove that doing so would always be correct in the peeled or partial loop.
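
As an illustration of the kind of pattern this targets (the constants and ops below are made up for this example, not taken from the patch): in a tiled loop, the per-iteration tile size is often computed with an affine.min, and only in the full-step (peeled) loop can it be proven to always equal the step.

// Hypothetical input; %c0 and %c8 are index constants, %ub is the upper bound.
scf.for %iv = %c0 to %ub step %c8 {
  // Current tile size: min(8, number of remaining iterations).
  %sz = affine.min affine_map<(d0)[s0] -> (8, s0 - d0)>(%iv)[%ub]
  "test.use"(%sz) : (index) -> ()
}

// In the full-step loop produced by peeling, the remaining trip count is
// provably >= 8, so %sz can safely be replaced by the constant 8; in the
// partial loop(s) the affine.min must be kept (or re-proven against the
// smaller step).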

Contributor Author

muneebkhan85 commented Jan 15, 2024

The original transform was intended for applying tiling repeatedly by continuously halving the step size until it reaches 1. This doesn't work as a generic loop transform, and the right place to implement it is in Linalg/TransformOps. The patch will be moved there and submitted in a separate PR.
