[NNC] Adding API to distribute loops #53865
CI failures summary and remediations
As of commit fd1e571 (more details on the Dr. CI page): Looks good so far! There are no failures yet.
This comment was automatically generated by Dr. CI.
Codecov Report
@@ Coverage Diff @@
## master #53865 +/- ##
==========================================
- Coverage 77.46% 77.46% -0.01%
==========================================
Files 1887 1887
Lines 184578 184610 +32
==========================================
+ Hits 142987 143007 +20
- Misses 41591 41603 +12
An important case that's not covered in tests and documentation (comments) is a multi-level loop. If I understand the implementation correctly, it currently always distributes the top level loop. I think in some cases we might want to only apply distribution to inner loops, e.g.
*** Before ***
for (int i = ...) {
for (int j = ...) {
a[i,j] = i + j; # store_a
b[i,j] = i * j; # store_b
}
}
*** After distribute(store_b) /* current implementation */ ***
for (int i1 = ...) { # we now have two separate loops for 'i'
for (int j1 = ...) {
a[i1,j1] = i1 + j1; # store_a
}
}
for (int i2 = ...) {
for (int j2 = ...) {
b[i2,j2] = i2 * j2; # store_b
}
}
*** After distribute(store_b) /* potentially interesting result unachievable now */ ***
for (int i = ...) { # the loop for 'i' stays un-distributed
for (int j1 = ...) {
a[i,j1] = i + j1; # store_a
}
for (int j2 = ...) {
b[i,j2] = i * j2; # store_b
}
}
I wonder what you think about that, @bertmaher, @navahgar? We could achieve that result by passing another parameter that specifies the scope in which the distribution should be performed.
If we decide not to do it, I think we still should add tests covering these scenarios to properly define behavior in such cases.
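As a rough illustration of what such tests would need to pin down, the three loop structures above can be written as plain Python and checked against each other. This is only a sketch: the function names are hypothetical, concrete bounds `n` and `m` are assumed in place of the elided `...`, and none of it uses the NNC API.

```python
# Hypothetical stand-ins for the three loop nests above; not NNC code.


def fused(n, m):
    """Original nest: both stores share the i and j loops."""
    a = [[0] * m for _ in range(n)]
    b = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            a[i][j] = i + j  # store_a
            b[i][j] = i * j  # store_b
    return a, b


def distributed_at_root(n, m):
    """Distribution applied at the top level: two separate i loops."""
    a = [[0] * m for _ in range(n)]
    b = [[0] * m for _ in range(n)]
    for i1 in range(n):
        for j1 in range(m):
            a[i1][j1] = i1 + j1  # store_a
    for i2 in range(n):
        for j2 in range(m):
            b[i2][j2] = i2 * j2  # store_b
    return a, b


def distributed_inner_only(n, m):
    """Distribution applied to the j loop only: the i loop stays shared."""
    a = [[0] * m for _ in range(n)]
    b = [[0] * m for _ in range(n)]
    for i in range(n):
        for j1 in range(m):
            a[i][j1] = i + j1  # store_a
        for j2 in range(m):
            b[i][j2] = i * j2  # store_b
    return a, b
```

All three variants compute the same `a` and `b`; they differ only in loop structure, which is what a scope parameter would select.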
I think a more intuitive API would distribute a loop over its body.
so, if
loop =
  For (i..
    For (j
    For (k
Then I'd like to call distribute(loop)
and get:
  For (i
    For (j
  For (i
    For (k
I'd also echo @ZolotukhinM's point about distributing an inner loop; I'd rather not split all the way to the root if possible.
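A minimal sketch of "distribute a loop over its body" on a toy loop IR may make the proposed semantics concrete. The `For` class and `distribute` function here are hypothetical illustrations, not the NNC classes or the API added in this PR; loop bounds are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class For:
    """Toy loop node (hypothetical): a loop variable and a list of body statements."""
    var: str
    body: List["Stmt"] = field(default_factory=list)


# A statement is either a nested loop or an opaque leaf statement (a string).
Stmt = Union[For, str]


def distribute(loop: For) -> List[For]:
    """Return one copy of `loop` per statement in its body."""
    return [For(loop.var, [stmt]) for stmt in loop.body]


# Distributing the outer loop: i{ j{..}, k{..} } becomes i{ j{..} }, i{ k{..} }.
outer = For("i", [For("j", ["store_a"]), For("k", ["store_b"])])
split_outer = distribute(outer)

# Distributing only an inner loop: the caller picks the loop, so the
# enclosing i loop is left intact.
nest = For("i", [For("j", ["store_a", "store_b"])])
nest.body = distribute(nest.body[0])
```

Because the caller chooses which loop to pass in, the same function covers both the top-level and the inner-loop case without splitting all the way to the root.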
Good points regarding the API. I agree that the API @bertmaher suggested is more intuitive in general. However, I see a couple of issues with that:
cannot be transformed to:
I am assuming this is a useful case as well. For this case, we need to specify one or more statements after which the split could happen (like what is done currently in this PR).
Yeah, this is my general frustration with "eager" lowering of everything, in that handling of initializers & stuff kind of gets in the way of describing the loop structure you want. It might actually be OK to distribute over the initializer, though. The way I was thinking it could be handled is to add another pass, a "fuse" pass (not like fuse from TVM, sigh), that would merge adjacent loops with identical bounds. It would essentially be the opposite of distribute. So after distributing a loop, you could then "fuse" to re-merge the initializers. (You'd want to distribute and then simplify so that you can specialize the loop bounds to remove conditionals.)
Yeah I'd prefer to explicitly distribute all the levels I want distributed, rather than automatically distribute everything.
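A rough sketch of such a "fuse" pass on a toy IR, merging neighboring loops whose variable and bounds match exactly. The `For` class and `fuse_adjacent` are hypothetical names for illustration, not NNC code, and the sketch deliberately performs no legality (dependency) checking.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class For:
    """Toy loop node (hypothetical): variable, half-open bounds, body."""
    var: str
    start: int
    stop: int
    body: List["Stmt"] = field(default_factory=list)


# A statement is either a loop or an opaque leaf statement (a string).
Stmt = Union[For, str]


def fuse_adjacent(stmts: List[Stmt]) -> List[Stmt]:
    """Merge neighboring loops that have the same variable and bounds.

    Assumption: merging is always legal here; a real pass would have to
    verify that no dependencies are violated.
    """
    fused: List[Stmt] = []
    for stmt in stmts:
        prev = fused[-1] if fused else None
        same_header = (
            isinstance(stmt, For)
            and isinstance(prev, For)
            and (prev.var, prev.start, prev.stop) == (stmt.var, stmt.start, stmt.stop)
        )
        if same_header:
            # Build a new node so the input loops are not mutated.
            fused[-1] = For(prev.var, prev.start, prev.stop, prev.body + stmt.body)
        else:
            fused.append(stmt)
    return fused


stmts = [
    For("i", 0, 8, ["init_a"]),
    For("i", 0, 8, ["store_a"]),
    For("i", 0, 4, ["store_b"]),  # different bounds: left alone
]
result = fuse_adjacent(stmts)
```

Here the first two loops are re-merged while the third, with different bounds, stays separate.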
This is the LoopFusion I have been planning to work on for a while now. But again, I haven't found any pressing use-case for that yet. One additional complexity with LoopFusion is that we have to check that no dependencies are violated after Fusion. So, it is not going to be as straightforward as LoopDistribution.
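To illustrate why the dependency check matters, here is a small plain-Python example (hypothetical, not NNC code) where fusing two loops with identical bounds is safe for one access pattern and changes the result for another. The `offset` parameter is an assumption introduced purely for the demonstration: the second loop reads `a[i + offset]`.

```python
def run_separate(n, offset):
    """Two loops with identical bounds: write all of a[0..n), then read it."""
    a = [0] * (n + offset)
    b = [0] * n
    for i in range(n):
        a[i] = i + 1
    for i in range(n):
        b[i] = a[i + offset]
    return b


def run_fused(n, offset):
    """The same two statements merged into a single loop."""
    a = [0] * (n + offset)
    b = [0] * n
    for i in range(n):
        a[i] = i + 1
        b[i] = a[i + offset]  # with offset > 0 this reads a not-yet-written element
    return b


# offset 0: each iteration reads what it just wrote, so fusion is safe.
safe = run_separate(4, 0) == run_fused(4, 0)

# offset 1: the fused loop reads a[i + 1] before it has been written,
# so the results differ and fusion would be illegal.
unsafe = run_separate(4, 1) != run_fused(4, 1)
```

Distribution of a loop whose body statements do not depend on later iterations has no such hazard, which is one way to read the remark that fusion is the harder direction.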
Conv-Relu is pretty much the fusion use case.
Updated the PR to add the APIs as we discussed. PTAL.
@navahgar has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Fixes #53864
This PR adds the following APIs that perform loop distribution to LoopNest:
... For stmt in its body.