[FX] fuse permute021 linear pass for trt lowering #66362
This pull request was exported from Phabricator. Differential Revision: D31525307
Summary:
Pull Request resolved: pytorch#66362

In general, we cannot rely on Permute021Linear being kept as is before the lowering phase, because our transformations could already have traced through this module. An acc-based FX pass is a more reliable way to recover the performance.

Test Plan:
```
buck run mode/opt -c python.package_style=inplace -c fbcode.nvcc_arch=a100 //hpc/new/models/ads/benchmarks:ads_dense_benchmark -- over-arch --model-version=23x_3tb --batch-size=2048
OverArch, PyTorch, FP16, BS: 2048, TFLOP/s: 53.22, Time per iter: 14.46ms, QPS: 141629.45
OverArch, TensorRT, FP16, BS: 2048, TFLOP/s: 92.20, Time per iter: 8.35ms, QPS: 245354.15
```

Unittest:
```
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_linear_trt
```

Reviewed By: jianyuh, wushirong, 842974287

Differential Revision: D31525307

fbshipit-source-id: 6a67991125792110c1aefd054bff1658a78016a0
…intermediate node (pytorch#66472)

Summary:
Pull Request resolved: pytorch#66472

A follow-up of pytorch#66362. Same fix.

Test Plan:
```
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_matmul_trt
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_linear_trt
```

Reviewed By: wushirong, 842974287

Differential Revision: D31567662

fbshipit-source-id: 43c366dea412346b4cecc7b17ba0f85ca98ac34a
…intermediate node (#66472)

Summary:
Pull Request resolved: #66472

A follow-up of #66362. Same fix.

Test Plan:
```
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_matmul_trt
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_linear_trt
```

Reviewed By: wushirong, 842974287

Differential Revision: D31567662

fbshipit-source-id: 2c9e6a138fc31996d790fd4d79e0bf931507fc99
Summary:
Pull Request resolved: #66362

In general, we cannot rely on Permute021Linear being kept as is before the lowering phase, because our transformations could already have traced through this module. An acc-based FX pass is a more reliable way to recover the performance.

Test Plan:
```
buck run mode/opt -c python.package_style=inplace -c fbcode.nvcc_arch=a100 //hpc/new/models/ads/benchmarks:ads_dense_benchmark -- over-arch --model-version=23x_3tb --batch-size=2048
OverArch, PyTorch, FP16, BS: 2048, TFLOP/s: 53.22, Time per iter: 14.46ms, QPS: 141629.45
OverArch, TensorRT, FP16, BS: 2048, TFLOP/s: 92.20, Time per iter: 8.35ms, QPS: 245354.15
```

Unittest:
```
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_linear_trt
```

Reviewed By: jianyuh, wushirong, 842974287

Differential Revision: D31525307

fbshipit-source-id: b472a8c277aa4d156d933d6a5abec091133f22c5
Summary: In general, we cannot rely on Permute021Linear being kept as is before the lowering phase, because our transformations could already have traced through this module. An acc-based FX pass is a more reliable way to recover the performance.
Test Plan:
Unittest:
Differential Revision: D31525307
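
For readers unfamiliar with this kind of graph pass, here is a minimal sketch of the general idea: find a `linear` node whose input is a `permute` with dims `(0, 2, 1)` and rewrite the pair into a single fused node. It uses a toy graph IR in plain Python, not `torch.fx` or the acc ops this PR targets. The `Node` class, the `permute021_linear` op name, and the `fuse_permute021_linear` function are hypothetical names chosen for illustration and are not taken from the PR's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A node in a toy dataflow graph (hypothetical, not torch.fx.Node)."""
    name: str
    op: str                                            # e.g. "placeholder", "permute", "linear"
    inputs: List[str] = field(default_factory=list)    # names of producer nodes
    attrs: Dict[str, object] = field(default_factory=dict)


def fuse_permute021_linear(nodes: List[Node]) -> List[Node]:
    """Return a new node list with permute(0, 2, 1) -> linear pairs fused.

    `nodes` is assumed to be in topological order (producers before users).
    """
    by_name = {n.name: n for n in nodes}

    # Count how many users each node has, so a permute is only dropped
    # once nothing else depends on it.
    users = {n.name: 0 for n in nodes}
    for n in nodes:
        for inp in n.inputs:
            users[inp] += 1

    rewritten: List[Node] = []
    dead = set()
    for n in nodes:
        if n.op == "linear" and n.inputs:
            src = by_name[n.inputs[0]]
            if src.op == "permute" and tuple(src.attrs.get("dims", ())) == (0, 2, 1):
                # Read directly from the permute's input and record the
                # layout change in the fused op itself.
                rewritten.append(
                    Node(n.name, "permute021_linear",
                         [src.inputs[0]] + n.inputs[1:], dict(n.attrs))
                )
                users[src.name] -= 1
                if users[src.name] == 0:
                    dead.add(src.name)
                continue
        rewritten.append(n)

    # Drop permute nodes that no longer have any users.
    return [n for n in rewritten if n.name not in dead]


if __name__ == "__main__":
    graph = [
        Node("x", "placeholder"),
        Node("p", "permute", ["x"], {"dims": (0, 2, 1)}),
        Node("y", "linear", ["p"], {"weight": "w", "bias": "b"}),
        Node("out", "output", ["y"]),
    ]
    for node in fuse_permute021_linear(graph):
        print(node.name, node.op, node.inputs)
```

Because the sketch matches on the graph nodes rather than on a wrapping module, it still applies after earlier transformations have traced through that module, which is the motivation given in the summary above.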