@csarofeen csarofeen commented Mar 15, 2020

Summary: This PR contains the infrastructure of a new CUDA fuser. The fuser is based on many of the same principles as TensorExpressions and Halide, but the implementation is written from the ground up. The fusion pass itself is similar to the default CUDA fuser; however, it has undergone some refactoring and uses the new code generation infrastructure. For those interested in how the code generation in this PR works, I would recommend reviewing test/cpp/jit/test_gpu_fusion.cpp as well as the long comment section at the beginning of torch/csrc/jit/codegen/cuda/transform_replay.h.

One of the largest differences between our approach and that of TVM/Halide is the concept of "TensorView". From a high level, a TensorView should be thought of similarly to how we think of working with Tensors in PyTorch: it is an N-D object that can undergo transformations that change its dimensionality. Dimensionality changes are done through the operations split/merge/reorder/computeAt. These transformations are similar to split/fuse/reorder/compute_at in TVM: they modify how a tensor is iterated over to generate GPU code. Interestingly, in our scheme these transformations are applied to tensors and only impact how that tensor is generated.
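To make the split/merge/reorder idea concrete, here is a minimal Python sketch of how such transformations can change the loop structure used to visit a tensor without changing which elements are visited. `ToyView` and its methods are hypothetical names invented for this illustration; they are not the TensorView API in this PR, and computeAt is not modeled.

```python
from itertools import product


class ToyView:
    """Hypothetical stand-in for a TensorView: tracks loop extents and how
    loop indices map back to indices of the original N-D tensor."""

    def __init__(self, extents, to_root=None):
        self.extents = list(extents)
        # Maps a tuple of loop indices to original indices (None = out of bounds).
        self.to_root = to_root or (lambda idx: tuple(idx))

    def split(self, axis, factor):
        """Replace `axis` with an outer axis (ceil(extent / factor)) and an inner axis (factor)."""
        extent = self.extents[axis]
        outer = -(-extent // factor)  # ceiling division
        extents = self.extents[:axis] + [outer, factor] + self.extents[axis + 1:]
        prev = self.to_root

        def to_root(idx):
            joined = idx[axis] * factor + idx[axis + 1]
            if joined >= extent:  # predicate needed when the split is uneven
                return None
            return prev(idx[:axis] + (joined,) + idx[axis + 2:])

        return ToyView(extents, to_root)

    def merge(self, axis):
        """Fuse `axis` and `axis + 1` into a single axis."""
        inner = self.extents[axis + 1]
        extents = self.extents[:axis] + [self.extents[axis] * inner] + self.extents[axis + 2:]
        prev = self.to_root

        def to_root(idx):
            return prev(idx[:axis] + (idx[axis] // inner, idx[axis] % inner) + idx[axis + 1:])

        return ToyView(extents, to_root)

    def reorder(self, order):
        """Permute axes; order[i] is the old position of the axis that becomes axis i."""
        extents = [self.extents[old] for old in order]
        prev = self.to_root

        def to_root(idx):
            old_idx = [0] * len(order)
            for new_pos, old_pos in enumerate(order):
                old_idx[old_pos] = idx[new_pos]
            return prev(tuple(old_idx))

        return ToyView(extents, to_root)

    def visit(self):
        """Yield original-tensor indices in the order the transformed loops reach them."""
        for idx in product(*(range(e) for e in self.extents)):
            root = self.to_root(idx)
            if root is not None:
                yield root


# A 4x6 tensor: merge both axes, split the result by 5, then swap the two loops.
scheduled = ToyView([4, 6]).merge(0).split(0, 5).reorder([1, 0])
visited = list(scheduled.visit())
```

The loop nest changes from 4x6 to 5x5 with a bounds predicate, yet every element of the original 4x6 tensor is still visited exactly once, only in a different order.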

Warning: This PR is purposefully not feature complete with the current fuser. We wanted to separate out the infrastructure from the fusion capabilities. Once in, smaller incremental PRs will be submitted to expand capabilities of the fuser.

Short term goals:

Parity with current CUDA fuser (including performance):

  • Dynamic shapes (no recompilation)
  • Implicit handling of broadcast (broadcasted tensors are treated as tensors of the broadcasted size in the generated code)
  • Dropout
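As an illustration of the implicit broadcast item above, the following sketch indexes a smaller operand as if it already had the output's shape by giving broadcast dimensions a stride of 0. This is one common way to realize the idea and is an assumption here; the PR does not state how its generated code does it. All function names are hypothetical.

```python
from itertools import product


def contiguous_strides(shape):
    """Row-major strides for a densely packed tensor of the given shape."""
    strides, running = [], 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def broadcast_strides(shape, strides, out_shape):
    """Strides for reading `shape` as if it had `out_shape`; broadcast dims get stride 0."""
    pad = len(out_shape) - len(shape)
    shape = (1,) * pad + tuple(shape)
    strides = (0,) * pad + tuple(strides)
    result = []
    for dim, stride, out_dim in zip(shape, strides, out_shape):
        if dim == out_dim:
            result.append(stride)
        elif dim == 1:
            result.append(0)  # every output index along this dim reads the same element
        else:
            raise ValueError(f"cannot broadcast {shape} to {out_shape}")
    return tuple(result)


def pointwise_add(a, a_shape, b, b_shape, out_shape):
    """Add two flat buffers, treating both as tensors of `out_shape`."""
    a_str = broadcast_strides(a_shape, contiguous_strides(a_shape), out_shape)
    b_str = broadcast_strides(b_shape, contiguous_strides(b_shape), out_shape)
    out = []
    for idx in product(*(range(d) for d in out_shape)):
        a_off = sum(i * s for i, s in zip(idx, a_str))
        b_off = sum(i * s for i, s in zip(idx, b_str))
        out.append(a[a_off] + b[b_off])
    return out


a = [1, 2, 3, 4, 5, 6]  # shape (2, 3)
b = [10, 20, 30]        # shape (3,), broadcast along the first output dimension
result = pointwise_add(a, (2, 3), b, (3,), (2, 3))
```

The loop body never branches on whether an operand is broadcast; the difference is carried entirely by the strides.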

Mid-term goals:

  • Transposes fused with pointwise operations where transpose involves only 2 axes (across the fused operation).
  • 1-D reductions fused with pointwise operations
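For readers unfamiliar with what fusing a reduction with pointwise operations buys, here is a small plain-Python sketch. It is only an analogy for the mid-term goal above, not code from this PR: each list comprehension stands in for a separate kernel that writes an intermediate buffer, while the fused version does the same work in one pass. The function names and the choice of add followed by ReLU are assumptions for illustration.

```python
def unfused(x, y):
    """Three passes, each materializing or consuming an intermediate buffer."""
    added = [a + b for a, b in zip(x, y)]    # pass 1: pointwise add
    relu = [max(t, 0.0) for t in added]      # pass 2: pointwise ReLU
    return sum(relu)                         # pass 3: 1-D reduction


def fused(x, y):
    """One pass: the pointwise ops are applied as each element is reduced."""
    acc = 0.0
    for a, b in zip(x, y):
        acc += max(a + b, 0.0)
    return acc


x = [1.0, -2.0, 3.0]
y = [0.5, 0.5, -4.0]
```

Both functions return the same value; the fused form simply avoids writing and re-reading the intermediate results.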

csarofeen and others added 30 commits March 14, 2020 17:31
…ency chain of arithmetic operations will be broken.
…ced after reorder ops. Unrelated changes in dependency test.
davidberard98 added a commit that referenced this pull request Mar 3, 2022
These tests have been disabled in OSS CI since #34785.

Differential Revision: [D34436844](https://our.internmc.facebook.com/intern/diff/D34436844)

[ghstack-poisoned]
davidberard98 added a commit that referenced this pull request Mar 3, 2022
1) remove test_jit_cuda_fuser from list of disabled tests
2) make the tests run on cpu (skip the tests instead of erroring)

These tests have been disabled in OSS CI since #34785.

ghstack-source-id: d54af41
Pull Request resolved: #73322
davidberard98 added a commit that referenced this pull request Mar 3, 2022
1) remove test_jit_cuda_fuser from list of disabled tests
2) make the tests run on cpu (skip the tests instead of erroring)

These tests have been disabled in OSS CI since #34785.

ghstack-source-id: 39ce824
Pull Request resolved: #73322
davidberard98 added a commit that referenced this pull request Mar 7, 2022
These tests have been disabled in OSS CI since #34785.

Differential Revision: [D34436844](https://our.internmc.facebook.com/intern/diff/D34436844)

[ghstack-poisoned]
davidberard98 added a commit that referenced this pull request Mar 8, 2022
These tests have been disabled in OSS CI since #34785.

Differential Revision: [D34436844](https://our.internmc.facebook.com/intern/diff/D34436844)

[ghstack-poisoned]
davidberard98 added a commit that referenced this pull request Mar 31, 2022
These tests have been disabled in OSS CI since #34785.

This disables the windows tests, which currently aren't passing.

Differential Revision: [D34436844](https://our.internmc.facebook.com/intern/diff/D34436844)

[ghstack-poisoned]
davidberard98 added a commit that referenced this pull request Apr 1, 2022
These tests have been disabled in OSS CI since #34785.

This disables the windows tests, which currently aren't passing.

Differential Revision: [D34436844](https://our.internmc.facebook.com/intern/diff/D34436844)

[ghstack-poisoned]
facebook-github-bot pushed a commit that referenced this pull request Apr 1, 2022
Summary:
Pull Request resolved: #73322

These tests have been disabled in OSS CI since #34785.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D34436844

Pulled By: davidberard98

fbshipit-source-id: c5b14b33e7f369a6fa1e9cfbcb484a30dffc659e
pytorchmergebot pushed a commit that referenced this pull request Apr 1, 2022
Summary:
Pull Request resolved: #73322

These tests have been disabled in OSS CI since #34785.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D34436844

Pulled By: davidberard98

fbshipit-source-id: c5b14b33e7f369a6fa1e9cfbcb484a30dffc659e
(cherry picked from commit b08f515)
jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this pull request Oct 29, 2022
Summary:
**Summary:** This PR contains the infrastructure of a new CUDA fuser. The fuser is based on many of the same principles as TensorExpressions and Halide, but the implementation is written from the ground up. The fusion pass itself is similar to the default CUDA fuser; however, it has undergone some refactoring and uses the new code generation infrastructure. For those interested in how the code generation in this PR works, I would recommend reviewing _test/cpp/jit/test_gpu_fusion.cpp_ as well as the long comment section at the beginning of _torch/csrc/jit/codegen/cuda/transform_replay.h_.

One of the largest differences between our approach and that of TVM/Halide is the concept of "TensorView". From a high level, a TensorView should be thought of similarly to how we think of working with Tensors in PyTorch: it is an N-D object that can undergo transformations that change its dimensionality. Dimensionality changes are done through the operations split/merge/reorder/computeAt. These transformations are similar to split/fuse/reorder/compute_at in TVM: they modify how a tensor is iterated over to generate GPU code. Interestingly, in our scheme these transformations are applied to tensors and only impact how that tensor is generated.

**Warning:** This PR is purposefully not feature complete with the current fuser. We wanted to separate out the infrastructure from the fusion capabilities. Once in, smaller incremental PRs will be submitted to expand capabilities of the fuser.

**Short term goals:**

Parity with current CUDA fuser (including performance):
- Dynamic shapes (no recompilation)
- Implicit handling of broadcast (broadcasted tensors are treated as tensors of the broadcasted size in the generated code)
- Dropout

**Mid-term goals:**

- Transposes fused with pointwise operations where transpose involves only 2 axes (across the fused operation).
- 1-D reductions fused with pointwise operations
Pull Request resolved: pytorch/pytorch#34785

Reviewed By: ZolotukhinM

Differential Revision: D20650977

Pulled By: soumith

fbshipit-source-id: ee39c95a880e1b9822e874ed4cc180971572bf63
jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this pull request Oct 29, 2022
Summary:
Build fix stemming from pytorch/pytorch#34785
Pull Request resolved: pytorch/pytorch#35917

Differential Revision: D20829353

Pulled By: soumith

fbshipit-source-id: 4ba84ecedd354efbc9ac47c9b0f0e3871b404f13

Labels

Merged, oncall: jit, open source, triaged