
Conversation

@yinghai (Contributor) commented Oct 9, 2021

Summary: In general we cannot rely on the Permute021Linear module still being intact by the lowering phase, since tracing may already have traced through (inlined) the module before our transformation runs. An acc-op-based FX pass is a more reliable way to recover the performance.

Test Plan:

```
buck run mode/opt -c python.package_style=inplace -c fbcode.nvcc_arch=a100 //hpc/new/models/ads/benchmarks:ads_dense_benchmark -- over-arch --model-version=23x_3tb --batch-size=2048

OverArch, PyTorch, FP16, BS: 2048, TFLOP/s: 53.22, Time per iter: 14.46ms, QPS: 141629.45
OverArch, TensorRT, FP16, BS: 2048, TFLOP/s: 92.20, Time per iter: 8.35ms, QPS: 245354.15
```
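That is roughly a 1.73× speedup for the TensorRT path over the PyTorch baseline (92.20 vs. 53.22 TFLOP/s; 8.35 ms vs. 14.46 ms per iteration).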

Unittest:

```
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_linear_trt
```

Differential Revision: D31525307
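For context, here is the general shape of such a fusion pass. This is not the code from this diff (which matches acc_ops nodes after acc_tracer), but a minimal torch.fx sketch of the idea: find a permute(0, 2, 1) that feeds a linear and rewrite the pair into one fused call. `fused_permute021_linear` is a hypothetical stand-in for the fused op a TensorRT converter would target, and matching plain torch ops instead of acc_ops is also an assumption made to keep the sketch self-contained.

```python
# Minimal sketch of a permute(0,2,1)+linear fusion pass, assuming plain
# torch ops; the actual pass in this diff matches acc_ops nodes produced
# by acc_tracer, and fused_permute021_linear is a hypothetical stand-in
# for the fused op that a TensorRT converter would lower efficiently.
import torch
import torch.fx


def fused_permute021_linear(x, weight, bias=None):
    # Reference semantics for the fused op: permute the last two dims
    # of a 3D tensor, then apply a linear layer.
    return torch.nn.functional.linear(x.permute(0, 2, 1), weight, bias)


def fuse_permute_linear(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    for node in gm.graph.nodes:
        # Match linear(...) whose input comes from x.permute(0, 2, 1).
        if node.op != "call_function" or node.target is not torch.nn.functional.linear:
            continue
        inp = node.args[0]
        if not (isinstance(inp, torch.fx.Node) and inp.op == "call_method" and inp.target == "permute"):
            continue
        dims = inp.args[1:]
        if len(dims) == 1 and isinstance(dims[0], (tuple, list)):
            dims = tuple(dims[0])  # handle x.permute((0, 2, 1)) spelling
        if tuple(dims) != (0, 2, 1):
            continue
        # Rewrite: route the permute's input straight into the fused op.
        with gm.graph.inserting_after(node):
            fused = gm.graph.call_function(
                fused_permute021_linear,
                args=(inp.args[0], *node.args[1:]),
                kwargs=node.kwargs,
            )
        node.replace_all_uses_with(fused)
        gm.graph.erase_node(node)
        if len(inp.users) == 0:  # permute may still feed other consumers
            gm.graph.erase_node(inp)
    gm.graph.lint()
    gm.recompile()
    return gm
```

The point of doing this as a graph pass rather than a module swap is that it matches the traced pattern itself, so it still fires after tracing has inlined Permute021Linear.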

@pytorch-probot (bot) commented Oct 9, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/yinghai/pytorch/blob/5c5fee8cad1c7fc84a9998ac95369006f9804083/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

| Workflows | Labels (bold enabled) | Status |
| --- | --- | --- |
| **Triggered Workflows** | | |
| linux-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux, ciflow/noarch, ciflow/xla | ✅ triggered |
| linux-vulkan-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux, ciflow/vulkan | ✅ triggered |
| linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, **ciflow/default**, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-clang7-asan | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux, ciflow/sanitizers | ✅ triggered |
| linux-xenial-py3.6-clang7-onnx | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux, ciflow/onnx | ✅ triggered |
| linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc7-bazel-test | ciflow/all, ciflow/bazel, ciflow/cpu, **ciflow/default**, ciflow/linux | ✅ triggered |
| win-vs2019-cpu-py3 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/win | ✅ triggered |
| win-vs2019-cuda11.3-py3 | ciflow/all, ciflow/cuda, **ciflow/default**, ciflow/win | ✅ triggered |
| **Skipped Workflows** | | |
| libtorch-linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| libtorch-linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| linux-bionic-cuda10.2-py3.9-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | 🚫 skipped |
| linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | 🚫 skipped |
| parallelnative-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck | 🚫 skipped |
| periodic-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-win-vs2019-cuda11.1-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | 🚫 skipped |
| puretorch-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |

You can add a comment to the PR and tag @pytorchbot with the following commands:

```
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow
```

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot (Contributor) commented Oct 9, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 5c5fee8 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI. Please report bugs/suggestions to the (internal) Dr. CI Users group.

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D31525307


Summary:
Pull Request resolved: pytorch#66362

In general we cannot rely on the Permute021Linear module still being intact by the lowering phase, since tracing may already have traced through (inlined) the module before our transformation runs. An acc-op-based FX pass is a more reliable way to recover the performance.

Test Plan:
```
buck run mode/opt -c python.package_style=inplace -c fbcode.nvcc_arch=a100 //hpc/new/models/ads/benchmarks:ads_dense_benchmark -- over-arch --model-version=23x_3tb --batch-size=2048

OverArch, PyTorch, FP16, BS: 2048, TFLOP/s: 53.22, Time per iter: 14.46ms, QPS: 141629.45
OverArch, TensorRT, FP16, BS: 2048, TFLOP/s: 92.20, Time per iter: 8.35ms, QPS: 245354.15
```

Unittest:
```
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_linear_trt
```

Reviewed By: jianyuh, wushirong, 842974287

Differential Revision: D31525307

fbshipit-source-id: 6a67991125792110c1aefd054bff1658a78016a0

yinghai pushed a commit to yinghai/pytorch that referenced this pull request Oct 13, 2021
…intermediate node (pytorch#66472)

Summary:
Pull Request resolved: pytorch#66472

A follow-up of pytorch#66362, applying the same fix.

Test Plan:
```
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_matmul_trt
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_linear_trt

```

Reviewed By: wushirong, 842974287

Differential Revision: D31567662

fbshipit-source-id: 43c366dea412346b4cecc7b17ba0f85ca98ac34a
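For intuition, the matmul analogue referenced by this follow-up can be sketched the same way as the linear case above: fold a permute(0, 2, 1) on either operand into one fused call that a converter can lower to a transposed batched matmul. `fused_permute021_matmul` below is a hypothetical stand-in for illustration, not the acc op used in the diff; the graph rewrite would mirror `fuse_permute_linear` above, matching torch.matmul whose operands come from a permute(0, 2, 1).

```python
# Hypothetical reference semantics for the permute(0,2,1)+matmul fusion.
import torch


def fused_permute021_matmul(lhs, rhs, trans_lhs=False, trans_rhs=False):
    # A real backend would set transpose flags on its batched-matmul
    # kernel instead of materializing the permuted tensors.
    if trans_lhs:
        lhs = lhs.permute(0, 2, 1)
    if trans_rhs:
        rhs = rhs.permute(0, 2, 1)
    return torch.matmul(lhs, rhs)
```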
facebook-github-bot pushed a commit that referenced this pull request Oct 13, 2021
…intermediate node (#66472)

Summary:
Pull Request resolved: #66472

A follow-up of #66362, applying the same fix.

Test Plan:
```
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_matmul_trt
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_linear_trt

```

Reviewed By: wushirong, 842974287

Differential Revision: D31567662

fbshipit-source-id: 2c9e6a138fc31996d790fd4d79e0bf931507fc99
wconstab pushed a commit that referenced this pull request Oct 20, 2021
Summary:
Pull Request resolved: #66362

In general we cannot rely on the Permute021Linear module still being intact by the lowering phase, since tracing may already have traced through (inlined) the module before our transformation runs. An acc-op-based FX pass is a more reliable way to recover the performance.

Test Plan:
```
buck run mode/opt -c python.package_style=inplace -c fbcode.nvcc_arch=a100 //hpc/new/models/ads/benchmarks:ads_dense_benchmark -- over-arch --model-version=23x_3tb --batch-size=2048

OverArch, PyTorch, FP16, BS: 2048, TFLOP/s: 53.22, Time per iter: 14.46ms, QPS: 141629.45
OverArch, TensorRT, FP16, BS: 2048, TFLOP/s: 92.20, Time per iter: 8.35ms, QPS: 245354.15
```

Unittest:
```
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_linear_trt
```

Reviewed By: jianyuh, wushirong, 842974287

Differential Revision: D31525307

fbshipit-source-id: b472a8c277aa4d156d933d6a5abec091133f22c5
wconstab pushed a commit that referenced this pull request Oct 20, 2021
…intermediate node (#66472)

Summary:
Pull Request resolved: #66472

A follow-up of #66362, applying the same fix.

Test Plan:
```
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_matmul_trt
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_linear_trt

```

Reviewed By: wushirong, 842974287

Differential Revision: D31567662

fbshipit-source-id: 2c9e6a138fc31996d790fd4d79e0bf931507fc99