-
Notifications
You must be signed in to change notification settings - Fork 25.2k
[NNC] New APIs to get loops corresponding to a Buf #53778
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
💊 CI failures summary and remediationsAs of commit 342ad4d (more details on the Dr. CI page): 💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions to the (internal) Dr. CI Users group. |
Codecov Report
@@ Coverage Diff @@
## master #53778 +/- ##
==========================================
- Coverage 77.29% 77.29% -0.01%
==========================================
Files 1888 1888
Lines 183504 183528 +24
==========================================
+ Hits 141838 141852 +14
- Misses 41666 41676 +10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this out yesterday, and it's definitely better than not having it :-D.
That said, I'm very worried about the overall approach, because it's very difficult to keep track of the loop structure of a program when you're writing a schedule (and I suspect that would make autotuning messy as well, although maybe it could be overcome by throwing a ton of cycles at it).
As an example, I wanted to perform a conceptually simple optimization to special-case the 2-pixel padding on a 5x5 stride=1 convolution: peel the first and and last 2 iterations. The code for this looks like this impenetrable thicket:
For *head, *tail;
auto loops = nest.getAllLoopNestsWritingToBuf(conv->buf());
nest.sliceHead(loops[1][3], 1, &head, &tail);
loops = nest.getAllLoopNestsWritingToBuf(conv->buf());
nest.sliceHead(loops[3][3], 1, &head, &tail);
loops = nest.getAllLoopNestsWritingToBuf(conv->buf());
nest.sliceTail(loops[5][3], 1, &head, &tail);
loops = nest.getAllLoopNestsWritingToBuf(conv->buf());
nest.sliceTail(loops[5][3], 1, &head, &tail);
loops = nest.getAllLoopNestsWritingToBuf(conv->buf());
nest.sliceHead(loops[5][2], 1, &head, &tail);
loops = nest.getAllLoopNestsWritingToBuf(conv->buf());
nest.sliceHead(loops[15][2], 1, &head, &tail);
loops = nest.getAllLoopNestsWritingToBuf(conv->buf());
nest.sliceTail(loops[25][2], 1, &head, &tail);
loops = nest.getAllLoopNestsWritingToBuf(conv->buf());
nest.sliceTail(loops[25][2], 1, &head, &tail);
I might be able to do a cleaner job here by using head
and tail
more wisely to keep track of the "main loop" that I'm operating on (this schedule ends up being a loser anyways so I don't know if I'll put more effort into it). But I wish it didn't feel like an intellectually Herculean task to express "peel the first and last two iterations".
torch/csrc/jit/tensorexpr/stmt.h
Outdated
// Returns the For stmt that is immediately enclosing the given stmt. | ||
static For* getParentLoop(const Stmt* st) { | ||
if (st == nullptr) { | ||
return nullptr; | ||
} | ||
auto par = st->get_parent(); | ||
if (auto f = dynamic_cast<For*>(par)) { | ||
return f; | ||
} | ||
return For::getParentLoop(par); | ||
} | ||
|
||
// Returns the list of For stmts corresponding to the loopnest that is | ||
// enclosing the given stmt. | ||
static std::vector<For*> getEnclosingLoopNest(const Stmt* st) { | ||
std::vector<For*> loops; | ||
auto f = For::getParentLoop(st); | ||
while (f) { | ||
loops.push_back(f); | ||
f = For::getParentLoop(f); | ||
} | ||
std::reverse(loops.begin(), loops.end()); | ||
return loops; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any compelling reason to make these functions static methods of For
? I think they should be static free functions inside loopnest.cpp instead, to minimize clutter in the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can move them to LoopNest
, but why do you want them to be non-static? There is no use of any state here.
@ZolotukhinM and I have been leaning towards static methods in the hope that we can get rid of the state in LoopNest
eventually. That would make the code more clear IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved them to LoopNest
but retained as static
. Let me know if you have any reservations with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry , static
is overloaded. I meant a static
free function (not part of a class), as opposed to a class member. But a class member is fine too if they should be exposed to the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aah okay. Ideally, we should have them as standalone function. That's probably my end goal as well, once we make all these transforms as static
in LoopNest
. For now, since we have several other static
functions (and no standalone functions), lets keep it that way for uniformity.
To be honest, that seems like a wrong usage, and not an issue with the API. Here is the code that'd peel first and last iterations, and it looks perfectly readable and reasonable to me:
|
Okay, so I'm an idiot. I tried out your way, and it's pretty nice:
|
I think the reason I wrote the first version is that I "want" to think in terms of the structure of the whole computation, so I keep doing "get all the loops". But that's not the best way to express this; you want to hang on to a handle to the main loop and navigate from there. My perf results are still bad, but that's LLVM's fault :-p |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@navahgar has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@navahgar has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Fixes pytorch#53092 This PR adds the following APIs to NNC. ``` // In For: static For* getParentLoop(const Stmt* st); static std::vector<For*> getEnclosingLoopNest(const Stmt* st); // In LoopNest: std::vector<const Stmt*> getAllWritesToBuf(const Buf*) const; std::vector<For*> getAllInnermostLoopsWritingToBuf(const Buf*) const; std::vector<std::vector<For*>> getAllLoopNestsWritingToBuf(const Buf*) const; ``` These APIs are required for some usecases that involve multiple transformations like `splitWithTail` followed by `reorder` as shown in pytorch#53092 Pull Request resolved: pytorch#53778 Reviewed By: albanD Differential Revision: D26987013 Pulled By: navahgar fbshipit-source-id: 491459eddfff045132d2358631ad069bbcc520df
Fixes #53092
This PR adds the following APIs to NNC.
These APIs are required for some usecases that involve multiple transformations like
splitWithTail
followed byreorder
as shown in #53092