Conversation

@mingfeima (Collaborator) commented Apr 2, 2021:

Stack from ghstack:

Differential Revision: D28836794

@facebook-github-bot (Contributor) commented Apr 2, 2021:

💊 CI failures summary and remediations

As of commit 5b0a4c6 (more details on the Dr. CI page and at hud.pytorch.org/pr/55217):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

mingfeima added a commit that referenced this pull request Apr 2, 2021
ghstack-source-id: a3e967e
Pull Request resolved: #55217
@mingfeima (Collaborator, Author) commented:

Use float as the accumulation type when the input is BFloat16.

Since this PR is not related to the parallelization feature, only single-core performance is tested:

  • performance update on avx512 machine: Xeon(R) Gold 6248 CPU @ 2.50GHz
### before
sum size: 100000, fp32: 0.005 ms; bf16: 0.023 ms
sum size: 1000000, fp32: 0.153 ms; bf16: 0.217 ms
sum size: 10000000, fp32: 2.272 ms; bf16: 2.130 ms

### after
sum size: 100000, fp32: 0.006 ms; bf16: 0.007 ms
sum size: 1000000, fp32: 0.153 ms; bf16: 0.084 ms
sum size: 10000000, fp32: 2.365 ms; bf16: 0.840 ms

  • performance update on avx2 machine: Xeon(R) CPU E5-2680 v3 @ 2.50GHz
### before
sum size: 100000, fp32: 0.025 ms; bf16: 0.042 ms
sum size: 1000000, fp32: 0.149 ms; bf16: 0.343 ms
sum size: 10000000, fp32: 3.494 ms; bf16: 3.293 ms

### after
sum size: 100000, fp32: 0.024 ms; bf16: 0.016 ms
sum size: 1000000, fp32: 0.135 ms; bf16: 0.093 ms
sum size: 10000000, fp32: 3.229 ms; bf16: 1.078 ms
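
A minimal sketch of how such single-core timings could be reproduced; the actual benchmark script is not included in this PR, so the timing harness and iteration count below are assumptions:

```python
# Sketch of a single-core sum benchmark over the sizes reported above.
# The harness (torch.utils.benchmark) and iteration count are assumptions,
# not the script actually used for the numbers in this comment.
import torch
from torch.utils.benchmark import Timer

torch.set_num_threads(1)  # single core, matching the reported results

for n in (100000, 1000000, 10000000):
    fp32 = torch.randn(n)                 # float32 input
    bf16 = fp32.to(torch.bfloat16)        # BFloat16 input
    t32 = Timer("x.sum()", globals={"x": fp32}).timeit(100).mean
    t16 = Timer("x.sum()", globals={"x": bf16}).timeit(100).mean
    print(f"sum size: {n}, fp32: {t32 * 1e3:.3f} ms; bf16: {t16 * 1e3:.3f} ms")
```

At the Python level, the float-accumulation behavior roughly corresponds to `x.float().sum().to(torch.bfloat16)`: the reduction runs with a float accumulator and only the final result is rounded back to BFloat16.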

mingfeima added a commit to mingfeima/pytorch that referenced this pull request Apr 28, 2021
dgl-intel pushed a commit to dgl-intel/pytorch that referenced this pull request May 14, 2021
@VitalyFedyunin (Contributor) commented:

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@peterbell10 (Collaborator) left a comment:

Just noticed this PR, but as of #60387 BFloat16 is accumulated in floats.
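
For intuition, a small example of what float accumulation buys (an illustrative sketch, assuming a PyTorch build where gh-60387's behavior applies):

```python
import torch

# Summing 100k ones in BFloat16: with naive sequential bf16 accumulation the
# running total would stall around 256 (bf16 keeps ~8 significant bits, so
# 256 + 1 rounds back to 256). With a float accumulator, the true sum 100000
# is computed exactly and only the final cast rounds it to the nearest
# representable bf16 value.
x = torch.ones(100000, dtype=torch.bfloat16)
print(x.sum())  # tensor(99840., dtype=torch.bfloat16)
```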

@facebook-github-bot (Contributor) commented:
@VitalyFedyunin merged this pull request in 4f5c688.

@VitalyFedyunin (Contributor) commented:
@mingfeima, can you please take a look at whether we should revert, per @peterbell10's comment?

@peterbell10 (Collaborator) commented:
Oh well, too late I guess. It's probably easier for me to revert it myself as I'm working on other sum PRs that might conflict.

peterbell10 added a commit that referenced this pull request Jun 30, 2021
A similar concept was implemented in gh-60387, which made this dead code (scalar_t in multi_row_sum will never be BFloat16 or Half).

[ghstack-poisoned]
@mingfeima (Collaborator, Author) commented:

> Oh well, too late I guess. It's probably easier for me to revert it myself as I'm working on other sum PRs that might conflict.

@peterbell10 @VitalyFedyunin feel free to revert this one. #60387 is enough to do the job :)

@VitalyFedyunin (Contributor) commented:
Reverted. Please rebase (and update) the rest of the stack.

@facebook-github-bot (Contributor) commented:
This pull request has been reverted by cb7d813.

@facebook-github-bot deleted the gh/mingfeima/18/head branch July 4, 2021 14:17