Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AOTI] support freezing for MKLDNN #124350

Closed
wants to merge 36 commits into from

Conversation

chunyuan-w
Copy link
Collaborator

@chunyuan-w chunyuan-w commented Apr 18, 2024

Stack from ghstack (oldest at bottom):

Description

Fixes #114450. This PR builds upon the work from @imzhuhl done in #114451.

This PR requires #126137 to land firstly.

We leverage the serialization and deserialization API from oneDNN v3.4.1 to save the opaque MKLDNN tensor during the compilation and restore the opaque tensor when loading the compiled .so.
ideep version is updated so that we won't break any pipeline even if third_party/ideep is not updated at the same time.

Test plan:

python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_conv_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_deconv_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_linear_freezing_non_abi_compatible_cpu

TODOs in follow-up PRs

  1. We found that using AOTI_TORCH_CHECK will cause performance drop on several models (DistillGPT2, MBartForConditionalGeneration, T5ForConditionalGeneration, T5Small) compared with JIT Inductor which uses TORCH_CHECK. This may need further discussion how to address (AOTI_TORCH_CHECK is introduced in
    [aot_inductor] replace TORCH_CHECK with AOTI_CHECK in the generate cpp code #119220).
  2. Freezing in non-ABI compatible mode will work with the support in this PR. While for ABI compatible mode, we need to firstly address this issue: AssertionError: None, i.e. optional output is not supported.
    def extract_output_name(out):
    assert out is not None, "None, i.e. optional output is not supported"

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @amjames @desertfire @chauhang

Copy link

pytorch-bot bot commented Apr 18, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/124350

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 39ed894 with merge base 5f15110 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/inductor module: cpu CPU specific problem (e.g., perf, algorithm) module: inductor labels Apr 18, 2024
chunyuan-w added a commit that referenced this pull request Apr 18, 2024
ghstack-source-id: cbc0a48c92b62c560599982d419623b9294155b9
Pull Request resolved: #124350
@chunyuan-w chunyuan-w marked this pull request as draft April 18, 2024 02:31
This PR builds upon the work done in #114451.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
chunyuan-w added a commit that referenced this pull request Apr 18, 2024
ghstack-source-id: cdb82433fe67fe8d7aa1f88721937d11780ddd78
Pull Request resolved: #124350
@chunyuan-w chunyuan-w added the topic: not user facing topic category label Apr 18, 2024
This PR builds upon the work done in #114451.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
chunyuan-w added a commit that referenced this pull request Apr 18, 2024
ghstack-source-id: b82b43c230819515d783de47bc17cba2ab086b4a
Pull Request resolved: #124350
This PR builds upon the work done in #114451.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
chunyuan-w added a commit that referenced this pull request Apr 18, 2024
ghstack-source-id: 64c3a9d6e3d2ca4f6a1209384ddfe98180480e3f
Pull Request resolved: #124350
This PR builds upon the work done in #114451.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
chunyuan-w added a commit that referenced this pull request Apr 18, 2024
ghstack-source-id: 124aaa2cfc3d39e470601921aa0a272fc1ed90cb
Pull Request resolved: #124350
This PR builds upon the work done in #114451.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
chunyuan-w added a commit that referenced this pull request Apr 18, 2024
ghstack-source-id: c4ba2e648e9a524ff8541d9547fbe34aa1393851
Pull Request resolved: #124350
This PR builds upon the work done in #114451.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
chunyuan-w added a commit that referenced this pull request Apr 18, 2024
ghstack-source-id: 8a41ca1388c3ce680d144423d5441a7054e15f12
Pull Request resolved: #124350
This PR builds upon the work done in #114451.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
chunyuan-w added a commit that referenced this pull request Apr 18, 2024
ghstack-source-id: 32e7ba5d3536cab9441f0cad5e46928a62f52ea8
Pull Request resolved: #124350
This PR builds upon the work done in #114451.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
chunyuan-w added a commit that referenced this pull request Apr 18, 2024
ghstack-source-id: 0d17984ddcd2b6c3a8d46e3ae2669b9046ef89dd
Pull Request resolved: #124350
This PR builds upon the work done in #114451.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
chunyuan-w added a commit that referenced this pull request Apr 18, 2024
ghstack-source-id: b099da9376aa54a2021221d6fcadfb201f67c8b8
Pull Request resolved: #124350
This PR builds upon the work done in #114451.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
@clee2000
Copy link
Contributor

@pytorchbot revert -m "Seems to have broken inductor/test_aot_inductor.py::AOTInductorTestNonABICompatibleCpu::test_freezing_non_abi_compatible_cpu https://hud.pytorch.org/pytorch/pytorch/commit/654afb6f3ae3ddbd926a753f9af95a6f6e22131c https://github.com/pytorch/pytorch/actions/runs/9224838183/job/25382780192" -c landrace

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request May 24, 2024
This reverts commit 654afb6.

Reverted #124350 on behalf of https://github.com/clee2000 due to Seems to have broken inductor/test_aot_inductor.py::AOTInductorTestNonABICompatibleCpu::test_freezing_non_abi_compatible_cpu https://hud.pytorch.org/pytorch/pytorch/commit/654afb6f3ae3ddbd926a753f9af95a6f6e22131c https://github.com/pytorch/pytorch/actions/runs/9224838183/job/25382780192 ([comment](#124350 (comment)))
@pytorchmergebot
Copy link
Collaborator

@chunyuan-w your PR has been successfully reverted.

## Description
Fixes #114450. This PR builds upon the work from imzhuhl done in #114451.

This PR requires #122472 to land firstly.

We leverage the serialization and deserialization API from oneDNN v3.4.1 to save the opaque MKLDNN tensor during the compilation and restore the opaque tensor when loading the compiled .so.
ideep version is updated so that we won't break any pipeline even if third_party/ideep is not updated at the same time.

### Test plan:
```sh
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_conv_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_deconv_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_linear_freezing_non_abi_compatible_cpu
```

### TODOs in follow-up PRs
1. We found that using `AOTI_TORCH_CHECK` will cause performance drop on several models (`DistillGPT2`, `MBartForConditionalGeneration`, `T5ForConditionalGeneration`, `T5Small`) compared with JIT Inductor which uses `TORCH_CHECK`. This may need further discussion how to address (`AOTI_TORCH_CHECK` is introduced in 
 #119220).
2. Freezing in non-ABI compatible mode will work with the support in this PR. While for ABI compatible mode, we need to firstly address this issue: `AssertionError: None, i.e. optional output is not supported`.
https://github.com/pytorch/pytorch/blob/6c4f43f82675b5fcfe8cf3e5983d0c0f326408aa/torch/_inductor/codegen/cpp_wrapper_cpu.py#L2023-L2024

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
chunyuan-w added a commit that referenced this pull request May 25, 2024
ghstack-source-id: 5578a1ffa9441044f7fbe07364bb548ed06e9dd4
Pull Request resolved: #124350
## Description
Fixes #114450. This PR builds upon the work from imzhuhl done in #114451.

This PR requires #122472 to land firstly.

We leverage the serialization and deserialization API from oneDNN v3.4.1 to save the opaque MKLDNN tensor during the compilation and restore the opaque tensor when loading the compiled .so.
ideep version is updated so that we won't break any pipeline even if third_party/ideep is not updated at the same time.

### Test plan:
```sh
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_conv_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_deconv_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_linear_freezing_non_abi_compatible_cpu
```

### TODOs in follow-up PRs
1. We found that using `AOTI_TORCH_CHECK` will cause performance drop on several models (`DistillGPT2`, `MBartForConditionalGeneration`, `T5ForConditionalGeneration`, `T5Small`) compared with JIT Inductor which uses `TORCH_CHECK`. This may need further discussion how to address (`AOTI_TORCH_CHECK` is introduced in 
 #119220).
2. Freezing in non-ABI compatible mode will work with the support in this PR. While for ABI compatible mode, we need to firstly address this issue: `AssertionError: None, i.e. optional output is not supported`.
https://github.com/pytorch/pytorch/blob/6c4f43f82675b5fcfe8cf3e5983d0c0f326408aa/torch/_inductor/codegen/cpp_wrapper_cpu.py#L2023-L2024

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
chunyuan-w added a commit that referenced this pull request May 25, 2024
ghstack-source-id: 74ed9d89450c8dfa6320b347d13d3d7bbdb1c300
Pull Request resolved: #124350
@chunyuan-w
Copy link
Collaborator Author

@pytorchbot revert -m "Seems to have broken inductor/test_aot_inductor.py::AOTInductorTestNonABICompatibleCpu::test_freezing_non_abi_compatible_cpu https://hud.pytorch.org/pytorch/pytorch/commit/654afb6f3ae3ddbd926a753f9af95a6f6e22131c https://github.com/pytorch/pytorch/actions/runs/9224838183/job/25382780192" -c landrace

This should be caused by conflicts with #126068 which also landed yesterday at similar time. When CI was executed on this PR yesterday, the change from #126068 is not yet included in this PR so this issue is not exposed at that moment.

I fixed this issue in this commit 39ed894.

@chunyuan-w
Copy link
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

titaiwangms pushed a commit to titaiwangms/pytorch that referenced this pull request May 28, 2024
## Description
Fixes pytorch#114450. This PR builds upon the work from @imzhuhl done in pytorch#114451.

This PR requires pytorch#122472 to land firstly.

We leverage the serialization and deserialization API from oneDNN v3.4.1 to save the opaque MKLDNN tensor during the compilation and restore the opaque tensor when loading the compiled .so.
ideep version is updated so that we won't break any pipeline even if third_party/ideep is not updated at the same time.

### Test plan:
```sh
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_conv_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_deconv_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_linear_freezing_non_abi_compatible_cpu
```

### TODOs in follow-up PRs
1. We found that using `AOTI_TORCH_CHECK` will cause performance drop on several models (`DistillGPT2`, `MBartForConditionalGeneration`, `T5ForConditionalGeneration`, `T5Small`) compared with JIT Inductor which uses `TORCH_CHECK`. This may need further discussion how to address (`AOTI_TORCH_CHECK` is introduced in
 pytorch#119220).
2. Freezing in non-ABI compatible mode will work with the support in this PR. While for ABI compatible mode, we need to firstly address this issue: `AssertionError: None, i.e. optional output is not supported`.
https://github.com/pytorch/pytorch/blob/6c4f43f82675b5fcfe8cf3e5983d0c0f326408aa/torch/_inductor/codegen/cpp_wrapper_cpu.py#L2023-L2024

Pull Request resolved: pytorch#124350
Approved by: https://github.com/jgong5, https://github.com/desertfire
titaiwangms pushed a commit to titaiwangms/pytorch that referenced this pull request May 28, 2024
titaiwangms pushed a commit to titaiwangms/pytorch that referenced this pull request May 28, 2024
## Description
Fixes pytorch#114450. This PR builds upon the work from @imzhuhl done in pytorch#114451.

This PR requires pytorch#122472 to land firstly.

We leverage the serialization and deserialization API from oneDNN v3.4.1 to save the opaque MKLDNN tensor during the compilation and restore the opaque tensor when loading the compiled .so.
ideep version is updated so that we won't break any pipeline even if third_party/ideep is not updated at the same time.

### Test plan:
```sh
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_conv_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_deconv_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_linear_freezing_non_abi_compatible_cpu
```

### TODOs in follow-up PRs
1. We found that using `AOTI_TORCH_CHECK` will cause performance drop on several models (`DistillGPT2`, `MBartForConditionalGeneration`, `T5ForConditionalGeneration`, `T5Small`) compared with JIT Inductor which uses `TORCH_CHECK`. This may need further discussion how to address (`AOTI_TORCH_CHECK` is introduced in
 pytorch#119220).
2. Freezing in non-ABI compatible mode will work with the support in this PR. While for ABI compatible mode, we need to firstly address this issue: `AssertionError: None, i.e. optional output is not supported`.
https://github.com/pytorch/pytorch/blob/6c4f43f82675b5fcfe8cf3e5983d0c0f326408aa/torch/_inductor/codegen/cpp_wrapper_cpu.py#L2023-L2024

Pull Request resolved: pytorch#124350
Approved by: https://github.com/jgong5, https://github.com/desertfire
pytorchmergebot pushed a commit that referenced this pull request Jun 7, 2024
With #124350 landed, it is now suggested in AOTI to turn on freezing on CPU to get better performance.

Pull Request resolved: #128010
Approved by: https://github.com/desertfire
TharinduRusira pushed a commit to TharinduRusira/pytorch that referenced this pull request Jun 14, 2024
With pytorch#124350 landed, it is now suggested in AOTI to turn on freezing on CPU to get better performance.

Pull Request resolved: pytorch#128010
Approved by: https://github.com/desertfire
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged module: cpu CPU specific problem (e.g., perf, algorithm) module: inductor open source Reverted topic: not user facing topic category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants