Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ROCm] enable nvfuser #82498

Closed
wants to merge 34 commits into from
Closed

Conversation

jeffdaily
Copy link
Collaborator

Description

The nvfuser is enabled for ROCm.

Testing

CI label ciflow/trunk covers the newly enabled ROCm functionality as well as any CUDA regressions caused by these changes.

@pytorch-bot pytorch-bot bot added the module: rocm AMD GPU support for Pytorch label Jul 29, 2022
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Jul 29, 2022

🔗 Helpful links

✅ No Failures (0 Pending)

As of commit 6bbe7a7 (more details on the Dr. CI page):

Expand to see more

💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Jul 29, 2022
@jeffdaily jeffdaily added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 29, 2022
@jeffdaily jeffdaily requested a review from ngimel as a code owner August 29, 2022 20:18
@jeffdaily
Copy link
Collaborator Author

@davidberard98 added the ciflow/periodic label since this is where we run the slow tests for ROCm. It revealed an unexpected success.

2022-08-29T17:48:25.7390645Z   test_nvfuser_extremal_values_unique_consecutive_cuda_float16 (__main__.TestCudaFuserOpInfoCUDA) ... unexpected success (6.622s)
2022-08-29T17:48:32.3703269Z   test_nvfuser_extremal_values_unique_consecutive_cuda_float32 (__main__.TestCudaFuserOpInfoCUDA) ... unexpected success (6.631s)
2022-08-29T17:48:38.9635277Z   test_nvfuser_extremal_values_unique_consecutive_cuda_float64 (__main__.TestCudaFuserOpInfoCUDA) ... unexpected success (6.593s)
2022-08-29T17:48:54.6667431Z   test_nvfuser_extremal_values_unique_cuda_float16 (__main__.TestCudaFuserOpInfoCUDA) ... unexpected success (15.703s)
2022-08-29T17:49:10.5367634Z   test_nvfuser_extremal_values_unique_cuda_float32 (__main__.TestCudaFuserOpInfoCUDA) ... unexpected success (15.870s)
2022-08-29T17:49:26.3813820Z   test_nvfuser_extremal_values_unique_cuda_float64 (__main__.TestCudaFuserOpInfoCUDA) ... unexpected success (15.845s)

Should be fixed by 6bbe7a7.

Unskipped the few NVFuser tests you found in the logs. See 47da32a. They passed. So now just waiting on another round of CI.

@facebook-github-bot
Copy link
Contributor

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@bdhirsh bdhirsh added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Aug 29, 2022
@jeffdaily
Copy link
Collaborator Author

@davidberard98 I hope the phabricator build is going well. All CI is green, including rocm slow.

Copy link
Collaborator

@jjsjann123 jjsjann123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly just mechanical changes. went through most files except the rocm specific runtime headers.

A nitpick on the cmake file to guard rocm header under USE_ROCM. not sure if there's similar thing we can do with build_variables.bzl

torch/csrc/jit/codegen/cuda/nvfuser.cmake Show resolved Hide resolved
@davidberard98
Copy link
Contributor

@jeffdaily looks good, let me know if you want to wait on anything before landing - otherwise I'll land later today (or feel free to land it yourself).

@ngimel
Copy link
Collaborator

ngimel commented Aug 30, 2022

@jjsjann123 are there more instances of nv-specific asm you guys are using? What about bfloat16/fp16 conversions that also use inline asm?
Edit: bfloat16 conversions are covered

@jeffdaily
Copy link
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered without a flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@github-actions
Copy link

Hey @jeffdaily.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

pytorchmergebot pushed a commit that referenced this pull request Aug 31, 2022
facebook-github-bot pushed a commit that referenced this pull request Sep 1, 2022
Summary:
### Description
The nvfuser is enabled for ROCm.

### Testing
CI label ciflow/trunk covers the newly enabled ROCm functionality as well as any CUDA regressions caused by these changes.

Pull Request resolved: #82498
Approved by: https://github.com/jjsjann123, https://github.com/davidberard98

Test Plan:
contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/d09486ab233284e9f298e45a43977fed8f075fe4

Original Phabricator Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Reviewed By: mehtanirav

Differential Revision: D39073112

Pulled By: mehtanirav

fbshipit-source-id: c12e759b77589c7191ed2fe4cc16bbcdc1ae86bd
@jithunnair-amd jithunnair-amd deleted the rocm_enable_nvfuser branch September 1, 2022 19:26
facebook-github-bot pushed a commit that referenced this pull request Sep 2, 2022
Summary:
Addresses comment in #82498 as a follow-up PR.

#82498 (comment)

Pull Request resolved: #84312
Approved by: https://github.com/jjsjann123

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/6efadf7e7e6655b543b5a9819b6e2eac2d76f09c

Reviewed By: mehtanirav

Differential Revision: D39213490

fbshipit-source-id: c6158d4880c0614289c23f5af4823773a9545dd1
@jeffdaily jeffdaily added release notes: rocm mandatorylabel topic: new features topic category labels Sep 12, 2022
jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this pull request Oct 29, 2022
### Description
The nvfuser is enabled for ROCm.

### Testing
CI label ciflow/trunk covers the newly enabled ROCm functionality as well as any CUDA regressions caused by these changes.

Pull Request resolved: pytorch/pytorch#82498
Approved by: https://github.com/jjsjann123, https://github.com/davidberard98
jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this pull request Oct 29, 2022
jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this pull request Nov 10, 2022
### Description
The nvfuser is enabled for ROCm.

### Testing
CI label ciflow/trunk covers the newly enabled ROCm functionality as well as any CUDA regressions caused by these changes.

Pull Request resolved: pytorch/pytorch#82498
Approved by: https://github.com/jjsjann123, https://github.com/davidberard98
jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this pull request Nov 10, 2022
pytorchmergebot pushed a commit that referenced this pull request Jan 16, 2023
In preparation for #89621.

Partial reverts of #82498 and #86369.

Pull Request resolved: #92182
Approved by: https://github.com/davidberard98
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR ciflow/trunk Trigger trunk jobs on your pull request cla signed Merged module: rocm AMD GPU support for Pytorch oncall: jit Add this issue/PR to JIT oncall triage queue open source release notes: rocm mandatorylabel topic: new features topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants