
enable bf16 emb #94163

Closed

wants to merge 4 commits into from

Conversation

zhuhaozhe (Collaborator) commented Feb 6, 2023

pytorch-bot bot commented Feb 6, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94163

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 178a574:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

zhuhaozhe (Collaborator, Author):

Hi @jianyuh, I do not know why FBGEMM was changed in #93895; I just updated it again in this PR.

jianyuh (Member) commented Feb 6, 2023

Re "why FBGEMM is changed in #93895":

I guess @malfet is fixing some MacOS OpenMP issue. It should be safe to keep the latest version of FBGEMM.

zhuhaozhe added this to the 2.0.0 milestone Feb 6, 2023
zhuhaozhe added the intel and module: cpu labels Feb 6, 2023
malfet (Contributor) commented Feb 6, 2023

Re "why FBGEMM is changed in #93895":

I guess @malfet is fixing some MacOS OpenMP issue. It should be safe to keep the latest version of FBGEMM.

Sorry, that was a mistake. I didn't need to make any submodule update, but they are often hard to catch. I'll prioritize working on a lint check that prevents accidental updates.

zhuhaozhe (Collaborator, Author):

@pytorchbot merge

pytorch-bot added the ciflow/trunk label Feb 6, 2023
pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

huydhn (Contributor) commented Feb 7, 2023

@pytorchbot revert -m 'Sorry for reverting your PR. But I suspect that it causes flaky SIGSEGV failure for linux-bionic-py3.8-clang9 / test (crossref) job in trunk. For example, https://hud.pytorch.org/pytorch/pytorch/commit/05397b12505f4fd1bc98af562e103f4162993c1a' -c weird

huydhn reopened this Feb 7, 2023
pytorchmergebot (Collaborator):

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot (Collaborator):

@zhuhaozhe your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Feb 7, 2023
This reverts commit f3bf46e.

Reverted #94163 on behalf of https://github.com/huydhn due to Sorry for reverting your PR. But I suspect that it causes flaky SIGSEGV failure for linux-bionic-py3.8-clang9 / test (crossref) job in trunk.  For example, https://hud.pytorch.org/pytorch/pytorch/commit/05397b12505f4fd1bc98af562e103f4162993c1a
Skylion007 (Collaborator):

Re "why FBGEMM is changed in #93895":
I guess @malfet is fixing some MacOS OpenMP issue. It should be safe to keep the latest version of FBGEMM.

Sorry, that was a mistake. I didn't need to make any submodule update, but they are often hard to catch. I'll prioritize working on lint check that prevents accidental updates

@malfet pre-commit has a check for that if you need a reference: https://github.com/pre-commit/pre-commit-hooks

malfet (Contributor) commented Feb 7, 2023

@malfet pre-commit has a check for that if you need a reference: https://github.com/pre-commit/pre-commit-hooks

We have lintrunner, but I guess a bit of a challenge is connecting it to the PR message (rather than to a commit message title).

facebook-github-bot pushed a commit to pytorch/FBGEMM that referenced this pull request Feb 10, 2023
Summary:
There is a random failure (https://hud.pytorch.org/pytorch/pytorch/commit/05397b12505f4fd1bc98af562e103f4162993c1a) and pytorch/pytorch#94163 was reverted.

The random failure is caused by re-using the `lengths` address (`lengths` is the `offset` buffer from the PyTorch EmbeddingBag).

For the FP32->BF16 conversion, we need to add a vector of `2^15` and then right-shift by 16 to round to nearest:
https://github.com/pytorch/FBGEMM/blob/main/src/FbgemmBfloat16ConvertAvx2.cc#L18-L21
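
As a rough scalar illustration of this rounding trick (a sketch for clarity, not the FBGEMM code; the helper name is made up):

```
#include <cstdint>
#include <cstring>

// Truncating an FP32 to its high 16 bits yields BF16; adding 2^15 first
// rounds to the nearest value instead of always truncating the magnitude.
static uint16_t fp32_to_bf16_round_nearest(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));  // reinterpret the float's bits
  return static_cast<uint16_t>((bits + (1u << 15)) >> 16);
}
```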
The first version I used was:
```
          a->mov(scratchReg2_, 1 << 15);
          a->vpbroadcastd(ones_vreg, scratchReg2_);
```
This would not cause the random failure, but it cannot work on AVX2, since `asmjit` does not support broadcasting from a `GP` register (`scratchReg2_`) to a `VEC` register (`ones_vreg`):
https://github.com/asmjit/asmjit/blob/996deae3273073bf75fbd6ddeac038dff5fdb6eb/src/asmjit/x86/x86emitter.h#L2794-L2796

As the `asmjit` headers show, we can broadcast from `mem` to `VEC`. So I re-used `lengths` (the pointer to `offset` from PyTorch EmbeddingBag): first `mov` the content of `lengths` to `scratchReg2_`, then `mov` `2^15` to the `lengths` address and broadcast from it to the `VEC`, and finally restore the `lengths` content from `scratchReg2_`:
```
          // Cannot find a broadcast instruction for int from GP to VEC with
          // AVX2. We use lengths address to perform the broadcast and
          // write it back
          auto temp_addr = x86::dword_ptr(lengths, 0);
          a->mov(scratchReg2_, temp_addr);
          a->mov(temp_addr, 1 << 15);
          a->vpbroadcastd(ones_vreg, temp_addr);
          a->mov(temp_addr, scratchReg2_);
```
This temporary use of the `lengths` pointer caused the random failure (a data race under multithreading). For example, with threads `t1` and `t2`: if `t2` reads this address after `t1` writes to it (`a->mov(temp_addr, 1 << 15)`) but before `t1` restores it (`a->mov(temp_addr, scratchReg2_)`), `t2` sees a corrupted value and fails. The write may even touch more than 32 bits of the `lengths` entry, since the failure occurs randomly even when both `indices` and `offset` are int64_t.
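
As a minimal, hypothetical C++ sketch of this bug class (illustrative only, not the FBGEMM kernel): one thread temporarily overwrites a shared location and restores it, while another thread can observe the clobbered value in between.

```
#include <cstdint>
#include <thread>

// Stands in for the shared `lengths`/`offset` buffer.
static int32_t shared_length = 42;

void t1_kernel() {
  for (int i = 0; i < 1000000; ++i) {
    int32_t saved = shared_length;  // a->mov(scratchReg2_, temp_addr)
    shared_length = 1 << 15;        // a->mov(temp_addr, 1 << 15)
    // ... the vpbroadcastd from this address would happen here ...
    shared_length = saved;          // a->mov(temp_addr, scratchReg2_)
  }
}

void t2_kernel() {
  for (int i = 0; i < 1000000; ++i) {
    // May observe 1 << 15 instead of 42: a data race.
    volatile int32_t v = shared_length;
    (void)v;
  }
}

int main() {
  std::thread t1(t1_kernel), t2(t2_kernel);
  t1.join();
  t2.join();
  return 0;
}
```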

I found another way to fill the `VEC` with `2^15`. This way we no longer perform unsafe reads/writes on the given memory address, which resolves the random failure:
```
          a->mov(scratchReg2_, 1 << 15);
          a->vpinsrd(ones_vreg.xmm(), ones_vreg.xmm(), scratchReg2_, 0);
          a->vpbroadcastd(ones_vreg, ones_vreg.xmm());
```
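
(This works because `vpinsrd` can insert the `GP` value into lane 0 of the `xmm` register, and `vpbroadcastd` from an `xmm` source is supported on AVX2, so no round-trip through memory is needed.)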

Pull Request resolved: #1583

Reviewed By: brad-mengchi, jiecaoyu, jiawenliu64

Differential Revision: D43112022

Pulled By: jianyuh

fbshipit-source-id: 54616eac9fb0277674de98143fde0491d0e78deb
zhuhaozhe (Collaborator, Author):

@pytorchbot merge

pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

pytorchmergebot (Collaborator):

Merge failed

Reason: 1 mandatory check(s) failed (Rule superuser).

Dig deeper by viewing the failures on hud.

zhuhaozhe (Collaborator, Author):

@pytorchbot rebase

pytorchmergebot (Collaborator):

@pytorchbot successfully started a rebase job. Check the current status here.

pytorchmergebot (Collaborator):

Successfully rebased bf16_emb onto refs/remotes/origin/viable/strict; please pull locally before adding more changes (for example, via git checkout bf16_emb && git pull --rebase).

zhuhaozhe (Collaborator, Author):

@pytorchbot merge

pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

zhuhaozhe (Collaborator, Author):

@pytorchbot rebase

pytorchmergebot (Collaborator):

@pytorchbot successfully started a rebase job. Check the current status here.

pytorchmergebot (Collaborator):

Successfully rebased bf16_emb onto refs/remotes/origin/viable/strict; please pull locally before adding more changes (for example, via git checkout bf16_emb && git pull --rebase).

zhuhaozhe (Collaborator, Author):

@pytorchbot merge

pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

Labels: ciflow/trunk, intel, Merged, module: cpu, open source, Reverted
Projects: Status: Done
8 participants