Infrastructure for a new CUDA Fuser #34785
Closed
Conversation
Commits (titles truncated):
- …ange tests to print just the fusion.
- …fix mutator test.
- …. Clean up replaceAll test.
- …sform replay/compute_at.
- … Tensor* functions.
- …be handled before rhs. Fixed now.
- …ency chain of arithmetic operations will be broken.
- …ced after reorder ops. Unrelated changes in dependency test.
davidberard98 added a commit that referenced this pull request on Mar 3, 2022: These tests have been disabled in OSS CI since #34785. Differential Revision: [D34436844](https://our.internmc.facebook.com/intern/diff/D34436844) [ghstack-poisoned]
davidberard98 added a commit that referenced this pull request on Mar 7, 2022: These tests have been disabled in OSS CI since #34785. Differential Revision: [D34436844](https://our.internmc.facebook.com/intern/diff/D34436844) [ghstack-poisoned]
davidberard98 added a commit that referenced this pull request on Mar 8, 2022: These tests have been disabled in OSS CI since #34785. Differential Revision: [D34436844](https://our.internmc.facebook.com/intern/diff/D34436844) [ghstack-poisoned]
davidberard98 added a commit that referenced this pull request on Mar 31, 2022: These tests have been disabled in OSS CI since #34785. This disables the Windows tests, which currently aren't passing. Differential Revision: [D34436844](https://our.internmc.facebook.com/intern/diff/D34436844) [ghstack-poisoned]
davidberard98 added a commit that referenced this pull request on Apr 1, 2022: These tests have been disabled in OSS CI since #34785. This disables the Windows tests, which currently aren't passing. Differential Revision: [D34436844](https://our.internmc.facebook.com/intern/diff/D34436844) [ghstack-poisoned]
jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this pull request on Oct 29, 2022:

Summary: This PR contains the infrastructure of a new CUDA fuser. This CUDA fuser is based on many of the same principles as TensorExpressions and Halide; however, the implementation is written from the ground up. The fusion pass itself is similar to the default CUDA fuser's, but it has undergone some refactoring and uses the new code-generation infrastructure. For those interested in how the code generation in this PR works, I would recommend reviewing _test/cpp/jit/test_gpu_fusion.cpp_ as well as the long comment section at the beginning of _torch/csrc/jit/codegen/cuda/transform_replay.h_. One of the largest differences between our approach and that of TVM/Halide is the concept of the "TensorView". A TensorView should be thought of much like a Tensor in PyTorch: an N-D object that can undergo transformations which change its dimensionality. Dimensionality changes are done through the operations split/merge/reorder/computeAt. These transformations are similar to TVM's split/fuse/reorder/compute_at; they modify how a tensor is iterated over to generate GPU code. Notably, in our scheme these transformations are applied to individual tensors and only affect how that tensor is generated.

**Warning:** This PR is purposefully not feature-complete with the current fuser. We wanted to separate the infrastructure from the fusion capabilities. Once this lands, smaller incremental PRs will be submitted to expand the fuser's capabilities.

**Short-term goals:** parity with the current CUDA fuser (including performance):
- Dynamic shapes (no recompilation)
- Implicit handling of broadcast (broadcasted tensors are treated as tensors of the broadcasted size in the generated code)
- Dropout

**Mid-term goals:**
- Transposes fused with pointwise operations, where the transpose involves only 2 axes (across the fused operation)
- 1-D reductions fused with pointwise operations

Pull Request resolved: pytorch/pytorch#34785 Reviewed By: ZolotukhinM Differential Revision: D20650977 Pulled By: soumith fbshipit-source-id: ee39c95a880e1b9822e874ed4cc180971572bf63
jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this pull request on Oct 29, 2022: Summary: Build fix stemming from pytorch/pytorch#34785. Pull Request resolved: pytorch/pytorch#35917 Differential Revision: D20829353 Pulled By: soumith fbshipit-source-id: 4ba84ecedd354efbc9ac47c9b0f0e3871b404f13
Labels: Merged · oncall: jit (add this issue/PR to the JIT oncall triage queue) · open source · triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Summary: This PR contains the infrastructure of a new CUDA fuser. This CUDA fuser is based on many of the same principles as TensorExpressions and Halide; however, the implementation is written from the ground up. The fusion pass itself is similar to the default CUDA fuser's, but it has undergone some refactoring and uses the new code-generation infrastructure. For those interested in how the code generation in this PR works, I would recommend reviewing test/cpp/jit/test_gpu_fusion.cpp as well as the long comment section at the beginning of torch/csrc/jit/codegen/cuda/transform_replay.h. One of the largest differences between our approach and that of TVM/Halide is the concept of the "TensorView". A TensorView should be thought of much like a Tensor in PyTorch: an N-D object that can undergo transformations which change its dimensionality. Dimensionality changes are done through the operations split/merge/reorder/computeAt. These transformations are similar to TVM's split/fuse/reorder/compute_at; they modify how a tensor is iterated over to generate GPU code. Notably, in our scheme these transformations are applied to individual tensors and only affect how that tensor is generated.
Warning: This PR is purposefully not feature-complete with the current fuser. We wanted to separate the infrastructure from the fusion capabilities. Once this lands, smaller incremental PRs will be submitted to expand the fuser's capabilities.
Short-term goals: parity with the current CUDA fuser (including performance):
- Dynamic shapes (no recompilation)
- Implicit handling of broadcast (broadcasted tensors are treated as tensors of the broadcasted size in the generated code)
- Dropout

Mid-term goals:
- Transposes fused with pointwise operations, where the transpose involves only 2 axes (across the fused operation)
- 1-D reductions fused with pointwise operations
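To make the split/merge/reorder scheme concrete, here is a minimal sketch (not the actual nvFuser/TensorView API; all function names here are hypothetical) of how a `split` transformation rewrites a single iteration axis into an outer/inner pair without changing the computed result — the outer and inner loops are what would later be bound to `blockIdx.x` and `threadIdx.x` in generated CUDA code:

```python
def pointwise_plain(a):
    # Original iteration: a single loop over the one axis.
    return [x * 2.0 for x in a]

def pointwise_split(a, factor):
    # After split(axis=0, factor): the axis becomes an (outer, inner) pair,
    # analogous to mapping outer -> blockIdx.x and inner -> threadIdx.x.
    n = len(a)
    out = [0.0] * n
    outer_extent = -(-n // factor)  # ceil-division: number of tiles
    for io in range(outer_extent):
        for ii in range(factor):
            i = io * factor + ii
            if i < n:  # predicate guarding the partial last tile
                out[i] = a[i] * 2.0
    return out

a = [float(i) for i in range(1000)]
# The transformation changes only the loop structure, never the values.
assert pointwise_plain(a) == pointwise_split(a, 128)
```

This mirrors the point in the summary that the transformations are per-tensor and purely about iteration order: `pointwise_split` produces bit-identical output while exposing the tiled structure a GPU kernel needs.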