
Conversation

@pruthvistony (Collaborator) commented Jun 28, 2022

`torch.cuda.is_bf16_supported()` returns False on ROCm, which is incorrect, since BF16 is supported on all AMD GPU architectures: gfx906, gfx908, and gfx90a.

cc @jithunnair-amd
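
A minimal repro sketch of the reported behavior, assuming a ROCm build of PyTorch with a visible AMD GPU (the printed values describe the report, not guaranteed output):

```python
import torch

# On a ROCm build, torch.version.hip is set (e.g. "5.1.0") and
# torch.version.cuda is None.
print(torch.version.hip)

# Per this report, the check below incorrectly printed False on ROCm,
# even on BF16-capable hardware (gfx906, gfx908, gfx90a).
print(torch.cuda.is_bf16_supported())
```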

@pytorch-bot pytorch-bot bot added the module: rocm AMD GPU support for Pytorch label Jun 28, 2022
@facebook-github-bot (Contributor) commented Jun 28, 2022

✅ No Failures (0 Pending)

As of commit 390f75c (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

This comment was automatically generated by Dr. CI.

@pruthvistony pruthvistony added ciflow/trunk Trigger trunk jobs on your pull request ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR labels Jun 28, 2022
@bdhirsh bdhirsh added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jun 28, 2022
@pruthvistony (Collaborator, Author) commented Jun 28, 2022

This is NOT good :( I don't know how I missed it. Thanks @jeffdaily.

@pruthvistony (Collaborator, Author) commented

This issue was reported by an internal user who was running an FP32 model, converting it to BF16, and encountered an error using the above API. I didn't find any UT actively using this API for the ROCm case; it is used in the TEST_CUDA scenario.
However, this API needs an update for the ROCm backend.
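
A hedged sketch of the workflow described above (the model and shapes are illustrative, not from the report): a BF16 cast gated on the support check, so the incorrect False on ROCm silently keeps the model in FP32.

```python
import torch

# Hypothetical FP32 model standing in for the internal user's model.
model = torch.nn.Linear(64, 64).cuda()

if torch.cuda.is_bf16_supported():
    # Never reached on ROCm before this fix, despite BF16-capable hardware.
    model = model.to(torch.bfloat16)
    x = torch.randn(8, 64, device="cuda", dtype=torch.bfloat16)
else:
    x = torch.randn(8, 64, device="cuda")  # unintended FP32 fallback

out = model(x)
```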

@jeffdaily jeffdaily changed the title from "Updated bf16 check for ROCm" to "[ROCm] torch.cuda.is_bf16_supported() returns True" Aug 1, 2022
@jeffdaily (Collaborator) left a comment

LGTM. CI failures are not related to this change. Rebase requested to aid merging.

@jeffdaily (Collaborator) commented

@pytorchbot rebase

@pytorch-bot bot commented Aug 1, 2022

You don't have permissions to rebase this PR, only the PR author and pytorch organization members may rebase this PR.

@pruthvistony (Collaborator, Author) commented

@pytorchbot rebase

@pytorchmergebot (Collaborator) commented

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot (Collaborator) commented

Rebase failed due to Command `git -C /home/runner/work/pytorch/pytorch push -f https://github.com/ROCmSoftwarePlatform/pytorch.git pull/80410/head:rocm_bf16_check` returned non-zero exit code 128

remote: Permission to ROCmSoftwarePlatform/pytorch.git denied to pytorchmergebot.
fatal: unable to access 'https://github.com/ROCmSoftwarePlatform/pytorch.git/': The requested URL returned error: 403

Raised by https://github.com/pytorch/pytorch/actions/runs/2778258239

@pruthvistony (Collaborator, Author) commented

@malfet,
Can you please review and help merge this PR?

@malfet (Contributor) commented Aug 3, 2022

@pytorchbot merge

@pytorchmergebot (Collaborator) commented

@pytorchbot successfully started a merge job. Check the current status here

@github-actions bot (Contributor) commented Aug 3, 2022

Hey @pruthvistony.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

facebook-github-bot pushed a commit that referenced this pull request Aug 4, 2022
Summary:
`torch.cuda.is_bf16_supported()` returns False on ROCm, which is incorrect, since BF16 is supported on all AMD GPU architectures: gfx906, gfx908, and gfx90a.

cc jithunnair-amd

Pull Request resolved: #80410
Approved by: https://github.com/jeffdaily, https://github.com/malfet

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/b57188760be857a9d4c49b5dfa2efd1f78c06af8

Reviewed By: kit1980

Differential Revision: D38394982

fbshipit-source-id: 036dbaa9eb1b3e62ca3dcaf0b61127dc4d981f32
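
For context, a sketch of the general shape of such a fix, not necessarily the exact merged patch: short-circuit to True when running under ROCm (where `torch.version.hip` is set), and keep the CUDA compute-capability check otherwise.

```python
import torch

def is_bf16_supported():
    # ROCm builds set torch.version.hip; BF16 is supported across
    # gfx906, gfx908, and gfx90a, so report True for the ROCm backend.
    if torch.version.hip is not None:
        return True
    # CUDA path (assumed here): BF16 needs CUDA 11+ and compute capability >= 8.0.
    cu_vers = torch.version.cuda
    if cu_vers is None:
        return False
    cuda_maj_ok = int(cu_vers.split(".")[0]) >= 11
    props = torch.cuda.get_device_properties(torch.cuda.current_device())
    return props.major >= 8 and cuda_maj_ok
```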
jeffdaily pushed a commit to ROCm/pytorch that referenced this pull request Sep 13, 2022
`torch.cuda.is_bf16_supported()` returns False on ROCm, which is incorrect, since BF16 is supported on all AMD GPU architectures: gfx906, gfx908, and gfx90a.

cc @jithunnair-amd
Pull Request resolved: pytorch#80410
Approved by: https://github.com/jeffdaily, https://github.com/malfet
Labels
ciflow/periodic · ciflow/trunk · cla signed · Merged · module: rocm · open source · triaged