Acyclic partition patch #86511
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86511
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit a80075b: This comment was automatically generated by Dr. CI and updates every 15 minutes.
Now I regret not having this stacked with #86452 😿
# check if merge would create cyclic dependency.
for node in merged_nodes:
    for user_node in node.users:
        if user_node not in merged_nodes and dfs_find_cycle(user_node):
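The `dfs_find_cycle` helper isn't part of this hunk; a minimal sketch of what such a recursive check might look like, assuming nodes expose a `users` list (the signature and `Node` shape here are illustrative, not the actual partitioner API):

```python
# Hypothetical sketch: if any path of user edges from `start` leads back into
# `merged_nodes`, merging those nodes into one partition would create a cycle.
def dfs_find_cycle(start, merged_nodes, visited=None):
    if visited is None:
        visited = set()
    if start in merged_nodes:
        return True  # found a path back into the merged partition
    if start in visited:
        return False  # this subgraph was already explored
    visited.add(start)
    return any(dfs_find_cycle(u, merged_nodes, visited) for u in start.users)
```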
ah... this is a lot of DFS... and duplicated computation...
is performance a concern here?
I am wondering if it's better to use a dynamically maintained dependency_map to keep track of this... the lookup complexity for dependency_map would be O(1)...
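For reference, such a dependency map could be built in one reverse-topological pass; this is only a sketch, assuming `nodes` is topologically sorted and each node exposes a `users` list (the names are illustrative, not the real Partitioner internals):

```python
from collections import defaultdict

def build_dependency_map(nodes):
    # dependency_map[n] = all nodes transitively reachable from n via user
    # edges; with this cached, "would a merge create a cycle?" becomes a
    # set lookup instead of a fresh DFS.
    dependency_map = defaultdict(set)
    for node in reversed(nodes):  # users appear later in topological order
        for user in node.users:
            dependency_map[node].add(user)
            dependency_map[node] |= dependency_map[user]
    return dependency_map
```

The cost of this approach is that every successful merge has to propagate updates back through the cached map.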
this is a lot of DFS... and duplicated computation...
I am wondering if it's better to use a dynamically maintained dependency_map to keep track of this
I don't think there is a lot of duplicated computation. We are traversing each node strictly once for each attempt to merge two nodes, since we record all visited nodes.

If we were to cache a dependency_map, then every time a merge succeeds we would need to traverse the graph and update the dependency_map for every node, so that each node reflects the updated dependencies.

So between those two approaches, I don't think one is obviously faster than the other. Not caching the dependency map definitely looks much simpler on the implementation side, and we don't have to constantly deal with modifying a nested table. If I had to guess, I'd say the non-cached version runs faster in more cases 😆
I actually ran into an error where the max recursion depth was reached 🥲
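For what it's worth, that recursion-depth failure can be sidestepped by driving the same traversal with an explicit stack; a rough sketch, with the same assumed node interface as above:

```python
def find_cycle_iterative(start, merged_nodes):
    # Same check as the recursive DFS, but with an explicit stack, so the
    # traversal depth is not bounded by Python's default ~1000-frame
    # recursion limit.
    stack, visited = [start], set()
    while stack:
        node = stack.pop()
        if node in merged_nodes:
            return True  # a user path leads back into the merged partition
        if node in visited:
            continue
        visited.add(node)
        stack.extend(node.users)
    return False
```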
Ha... That's... a pleasant surprise to see that we actually run into graphs that big for fusion
I should have some bandwidth tomorrow to refactor this. Should be pretty straightforward. I'll cc you in the PR once I get it to work.
Also if you don't hear from me in a few days, feel free to ping and yell at me to get me back on it 😆
Here we go: #91042
Principally, I am fine with this change. The only concern is the performance issue of doing a lot of DFS. :) But I guess compilation time is not too big of a concern at this moment.
cc @wschin as you are also using this Partitioner.
@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@davidberard98 I saw you imported this PR. I'm wondering if it's OK for me to merge via the bot. Since this is not an nvfuser code-bump PR, I don't see the need to go through the internal workflow for extra safety 🦺
@jjsjann123 yes, this PR is fine to merge via the bot. I just wanted to trigger some internal tests since I know this is used internally. Better to catch any issues now than have to wait for a revert & reland.
Thanks a lot for keeping an eye out for us here.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…91042) Follow-up on PR #86511. Python's 1000-frame recursion depth limit is not practical for running the cyclic check on larger graphs; this refactor avoids that issue. Pull Request resolved: #91042. Approved by: https://github.com/kit1980
Fixes #86159 and #86108
Refactored graph partitioning to check for cyclic dependencies on each partition merge, instead of relying on a pre-baked dependency map.

The previous implementation did not update dependencies on existing partitions: when a fusion happens, the updated dependency map needs to be propagated to every node in the graph so that each node in a partition shares an identical dependency set. Because it failed to do this, the previous implementation did not identify the cyclic dependency in issue #86159.

The updated implementation runs a cyclic check on the partitioned graph before attempting to merge two partitions.
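As a rough illustration of the merge flow described in this PR (the `Partition` shape and function names here are assumptions for the sketch, not the real torch.fx partitioner types):

```python
class Partition:
    def __init__(self, nodes):
        self.nodes = set(nodes)

def maybe_merge(p1, p2, would_create_cycle):
    # Tentatively union the two partitions' node sets and run the cyclic
    # check on the combined set; only commit the merge if no cycle is found.
    merged = p1.nodes | p2.nodes
    if would_create_cycle(merged):
        return None  # reject: merging would make the partition graph cyclic
    return Partition(merged)
```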
TestFXGraphPasses.forward12