Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NNC] Adding API to distribute loops #53865

Closed
wants to merge 1 commit into from

Conversation

navahgar
Copy link
Contributor

@navahgar navahgar commented Mar 12, 2021

Fixes #53864

This PR adds the following APIs that perform loop distribution to LoopNest:

static std::vector<For*> distributeLoop(For* loop, const std::unordered_set<Stmt*>& pivots);
static std::vector<For*> distributeLoop(For* loop);
static std::vector<For*> distributeLoopOverInnerLoops(For* loop);
  • The first method distributes the given loop over its body by splitting after every given pivot stmt.
  • The second method distributes the given loop over every stmt in its body.
  • The last method distributes the given loop over its body by splitting after every For stmt in its body.

@facebook-github-bot facebook-github-bot added oncall: jit Add this issue/PR to JIT oncall triage queue cla signed labels Mar 12, 2021
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Mar 12, 2021

💊 CI failures summary and remediations

As of commit fd1e571 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@codecov
Copy link

codecov bot commented Mar 12, 2021

Codecov Report

Merging #53865 (fd1e571) into master (2eb3917) will decrease coverage by 0.00%.
The diff coverage is 90.62%.

@@            Coverage Diff             @@
##           master   #53865      +/-   ##
==========================================
- Coverage   77.46%   77.46%   -0.01%     
==========================================
  Files        1887     1887              
  Lines      184578   184610      +32     
==========================================
+ Hits       142987   143007      +20     
- Misses      41591    41603      +12     

Copy link

@ZolotukhinM ZolotukhinM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An important case that's not covered in tests and documentation (comments) is a multi-level loop. If I understand the implementation correctly, it currently always distributes the top level loop. I think in some cases we might want to only apply distribution to inner loops, e.g.

*** Before ***
for (int i = ...) {
  for (int j = ...) {
    a[i,j] = i + j; # store_a
    b[i,j] = i * j; # store_b
  }
}
*** After distribute(store_b) /* current implementation */ ***
for (int i1 = ...) { # we now have two separate loops for 'i'
  for (int j1 = ...) {
    a[i1,j1] = i1 + j1; # store_a
  }
}
for (int i2 = ...) {
  for (int j2 = ...) {
    b[i2,j2] = i2 * j2; # store_b
  }
}
*** After distribute(store_b) /* potentially interesting result unachievable now */ ***
for (int i = ...) { # the loop for 'i' stays un-distributed
  for (int j1 = ...) {
    a[i,j1] = i + j1; # store_a
  }
  for (int j2 = ...) {
    b[i,j2] = i * j2; # store_b
  }
}

I wonder what do you guys think about that, @bertmaher, @navahgar? We could achieve that result by passing another parameter that would specify the scope in which the distribution needs to be performed.

If we decide not to do it, I think we still should add tests covering these scenarios to properly define behavior in such cases.

test/cpp/tensorexpr/test_loopnest.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@bertmaher bertmaher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a more intuitive API would distribute a loop over its body.
so, if

loop = 
  For (i..
    For (j
    For (k

Then I'd like to call distribute(loop) and get:

For (i
  For (j
For (i
  For (k

I'd also echo @ZolotukhinM's point about distributing an inner loop; I'd rather not split all the way to the root if possible.

@navahgar
Copy link
Contributor Author

Good points regarding the API.

I agree that the API @bertmaher suggested is more intuitive in general. However, I see a couple of issues with that:

  1. This API will distribute a loop over all its statements. So, there would be no way to group an initializer and its loop together. For example:
For (i..
    B = 
    For (j
    C =
    For (k

cannot be transformed to:

For (i..
    B = 
    For (j
For (i..
    C =
    For (k

I am assuming this is a useful case as well. For this case, we need to specify one or more statements after which the split could happen (like what is done currently in this PR).

  1. This applies nicely to distributing one loop. How about distributing multiple loops? Are we okay with constraining this to one loop at a time? If we restrict it to one loop, that also addresses the problem that @ZolotukhinM pointed out.

@bertmaher
Copy link
Contributor

  1. This API will distribute a loop over all its statements. So, there would be no way to group an initializer and its loop together. For example:

Yeah this is my general frustration with "eager" lowering of everything, in that handling of initializers & stuff kind of gets in the way of describing the loop structure you want. It might actually be OK to distribute over the initializer, though. The way I was thinking it could be handled is to add another pass I was thinking of was a "fuse" pass (not like fuse from TVM, sigh) that would merge adjacent loops with identical bounds. It would essentially be the opposite of distribute. So after distributing a loop, you could then "fuse" to re-merge the initializers. (You'd want to distribute and then simplify so that you can specialize the loop bounds to remove conditionals).

  1. This applies nicely to distributing one loop. How about distributing multiple loops? Are we okay with constraining this to one loop at a time? If we restrict it to one loop, that also addresses the problem that @ZolotukhinM pointed out.

Yeah I'd prefer to explicitly distribute all the levels I want distributed, rather than automatically distribute everything.

@navahgar
Copy link
Contributor Author

The way I was thinking it could be handled is to add another pass I was thinking of was a "fuse" pass (not like fuse from TVM, sigh) that would merge adjacent loops with identical bounds. It would essentially be the opposite of distribute.

This the LoopFusion I have been planning to work on for a while now. But again, I haven't found any pressing use-case for that yet.

One additional complexity with LoopFusion is that we have to check that no dependencies are violated after Fusion. So, it is not going to be as straight-forward as LoopDistribution.

@bertmaher
Copy link
Contributor

The way I was thinking it could be handled is to add another pass I was thinking of was a "fuse" pass (not like fuse from TVM, sigh) that would merge adjacent loops with identical bounds. It would essentially be the opposite of distribute.

This the LoopFusion I have been planning to work on for a while now. But again, I haven't found any pressing use-case for that yet.

Conv-Relu is pretty much the fusion use case.

@navahgar
Copy link
Contributor Author

Updated the PR to add the APIs as we discussed. PTAL.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@navahgar has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@navahgar has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@navahgar has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@navahgar merged this pull request in 4b2abc4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla signed Merged oncall: jit Add this issue/PR to JIT oncall triage queue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[nnc] Need an API for loop distribution
4 participants