SumKernel (BFloat16): use float as accumulation type #55217
Conversation
💊 CI failures summary and remediations: as of commit 5b0a4c6 (more details on the Dr. CI page and at hud.pytorch.org/pr/55217): 💚 Looks good so far! There are no failures yet. 💚 This comment was automatically generated by Dr. CI.
Use float as the accumulation type when the input is BFloat16. Since this PR is not related to the parallelization feature, only single-core performance is tested (a sketch of the accumulation pattern follows the results below):
 ### before
sum size: 100000, fp32: 0.005 ms; bf16: 0.023 ms
sum size: 1000000, fp32: 0.153 ms; bf16: 0.217 ms
sum size: 10000000, fp32: 2.272 ms; bf16: 2.130 ms
### after
sum size: 100000, fp32: 0.006 ms; bf16: 0.007 ms
sum size: 1000000, fp32: 0.153 ms; bf16: 0.084 ms
sum size: 10000000, fp32: 2.365 ms; bf16: 0.840 ms
 ### before
sum size: 100000, fp32: 0.025 ms; bf16: 0.042 ms
sum size: 1000000, fp32: 0.149 ms; bf16: 0.343 ms
sum size: 10000000, fp32: 3.494 ms; bf16: 3.293 ms
### after
sum size: 100000, fp32: 0.024 ms; bf16: 0.016 ms
sum size: 1000000, fp32: 0.135 ms; bf16: 0.093 ms
sum size: 10000000, fp32: 3.229 ms; bf16: 1.078 ms
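To make the change concrete, here is a minimal, self-contained sketch of the accumulation-type idea. It is not the actual SumKernel code, and the `bf16`, `bf16_to_float`, `float_to_bf16`, and `sum_bf16` names are hypothetical stand-ins for PyTorch's `at::BFloat16` machinery. BFloat16 has only 8 bits of significand precision (7 stored), so a bf16 running total soon stops absorbing small addends; widening each element to float and accumulating in float avoids that loss.

```cpp
#include <cstdint>
#include <cstring>
#include <iostream>
#include <vector>

// Minimal stand-in for at::BFloat16: the upper 16 bits of a float32.
struct bf16 { uint16_t bits; };

static float bf16_to_float(bf16 v) {
  uint32_t u = static_cast<uint32_t>(v.bits) << 16;
  float f;
  std::memcpy(&f, &u, sizeof(f));
  return f;
}

static bf16 float_to_bf16(float f) {  // truncating conversion, for brevity
  uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return bf16{static_cast<uint16_t>(u >> 16)};
}

// Accumulate in float, convert back to bf16 only at the end.
// Accumulating directly in bf16 would lose low-order bits as soon as
// the running total outgrows the bf16 significand.
static bf16 sum_bf16(const std::vector<bf16>& data) {
  float acc = 0.0f;  // float accumulator, not bf16
  for (bf16 v : data) {
    acc += bf16_to_float(v);
  }
  return float_to_bf16(acc);
}

int main() {
  std::vector<bf16> xs(100000, float_to_bf16(1.0f));
  // Prints ~100000 (exact in the float accumulator, then rounded
  // once to bf16 precision at the end).
  std::cout << bf16_to_float(sum_bf16(xs)) << "\n";
  return 0;
}
```

With a bf16 accumulator, the same loop would plateau around 256, the point where `acc + 1` rounds back to `acc` in bf16.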
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Just noticed this PR, but as of #60387 BFloat16 is accumulated in floats.
@VitalyFedyunin merged this pull request in 4f5c688.
@mingfeima can you please take a look at whether we should revert, as per @peterbell10's comment?
Oh well, too late I guess. It's probably easier for me to revert it myself as I'm working on other sum PRs that might conflict.
A similar concept was implemented in gh-60387, which made this dead code (scalar_t in multi_row_sum will never be BFloat16 or Half).
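To see why the BFloat16 path in `multi_row_sum` becomes unreachable, here is a hypothetical illustration, not PyTorch's actual code; the `acc_t` alias and the placeholder types are assumptions. Once the outer reduction maps reduced-precision inputs to a float accumulation type (as gh-60387 does) and widens elements before calling the inner kernel, the inner kernel's `scalar_t` is always the accumulation type:

```cpp
#include <type_traits>

struct BFloat16 {};  // placeholder for at::BFloat16
struct Half {};      // placeholder for at::Half

// Hypothetical mapping in the spirit of gh-60387: reduced-precision
// inputs are accumulated in float, everything else in its own type.
template <typename T>
using acc_t = std::conditional_t<
    std::is_same_v<T, BFloat16> || std::is_same_v<T, Half>, float, T>;

// If the outer reduction widens each element to acc_t<T> before
// calling the inner row-sum kernel, the kernel's scalar_t is always
// an accumulation type -- never BFloat16 or Half:
static_assert(std::is_same_v<acc_t<BFloat16>, float>);
static_assert(std::is_same_v<acc_t<Half>, float>);
static_assert(std::is_same_v<acc_t<double>, double>);

int main() { return 0; }
```

Any BFloat16-specific handling left inside the inner kernel is then dead code, which is what this revert removes.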
@peterbell10 @VitalyFedyunin feel free to revert this one. #60387 is enough to do the job :)
Reverted. Please rebase (and update) the rest of the stack.
This pull request has been reverted by cb7d813.
Stack from ghstack:
Differential Revision: [D28836794](https://our.internmc.facebook.com/intern/diff/D28836794)