@chaekit (Contributor) commented Oct 13, 2023

Summary:
Without the `all` in the fix:

```
node.kwargs.get("beta", 1.0) == 1.0
node.kwargs.get("alpha", 1.0) == 1.0
and len(input_shape) == 2
and len(weight_shape) == 2
and all(x % 2 == 0 for x in input_shape + weight_shape)
and shape <= MAX_FUSE_TENSOR_SIZE_GROUP_LINEAR # <----- HERE
for shape in input_shape + weight_shape
```

the trailing `for` clause turns the whole condition into a generator expression. A generator object is always truthy, so the check always passes. One consequence is that the shapes can be odd, which forces GMM to load element by element rather than using vectorized loads. In the VDDv3 torchbench example (posted in the test plan), you can see a 37 ms GMM call that swamps any gain from fusion. Overall this change makes the GMM fusion 24% faster.
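The failure mode is easy to reproduce in isolation. A minimal sketch follows; the `MAX_FUSE_TENSOR_SIZE_GROUP_LINEAR` value and the shapes are made up for illustration and are not the real values from the PyTorch source:

```python
# Made-up constant and shapes for illustration only.
MAX_FUSE_TENSOR_SIZE_GROUP_LINEAR = 4096

input_shape = [1023, 8192]   # odd dim AND oversized dim: fusion should be rejected
weight_shape = [8192, 1024]

# Buggy form: the trailing `for` clause makes the WHOLE conjunction the body of
# a generator expression. The generator is never consumed, and any generator
# object is truthy, so this "check" always passes.
buggy = bool(
    len(input_shape) == 2
    and len(weight_shape) == 2
    and all(x % 2 == 0 for x in input_shape + weight_shape)
    and shape <= MAX_FUSE_TENSOR_SIZE_GROUP_LINEAR
    for shape in input_shape + weight_shape
)
print(buggy)  # True, even though the shapes violate both size checks

# Fixed form: wrapping the size comparison in all() actually evaluates it,
# and the evenness check is no longer swallowed by the generator.
fixed = (
    len(input_shape) == 2
    and len(weight_shape) == 2
    and all(x % 2 == 0 for x in input_shape + weight_shape)
    and all(
        shape <= MAX_FUSE_TENSOR_SIZE_GROUP_LINEAR
        for shape in input_shape + weight_shape
    )
)
print(fixed)  # False: odd/oversized shapes now correctly block fusion
```

Note that the bug disables every clause of the conjunction, not just the size check: the entire `A and B and ... and shape <= MAX` expression becomes the element expression of the generator, which is why odd shapes could slip past the `x % 2 == 0` check too.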

Differential Revision: D48696572

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler

@pytorch-bot bot commented Oct 13, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/111174

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 228cbdf with merge base 898482f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D48696572


…1174)

Reviewed By: davidberard98

Differential Revision: D48696572

@facebook-github-bot (Contributor) commented:

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 13, 2023
@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@facebook-github-bot facebook-github-bot deleted the export-D48696572 branch October 17, 2023 14:24