refactor autocast python APIs #124479

guangyey · 2024-04-19T14:10:55Z

Stack from ghstack (oldest at bottom):

Motivation

Refactor autocast usage scenario in torch/amp/autocast_mode.py and torch/utils/checkpoint.py to fix the bug - convention conflict between torch.xxx.get_autocast_xxx_dtype defined in autocast_mode.py and torch.xxx.get_autocast_dtype defined in checkpoint.py.

Solution

Use device-agnostic APIs like torch.get_autocast_dtype, ..., instead.

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k @mcarilli @ptrblck @leslie-fang-intel @jgong5 @voznesenskym @EikanWang @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng

pytorch-bot · 2024-04-19T14:10:59Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/124479

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit cab72c6 with merge base 7e095be ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: a0d1efdc93a59082a910d564691c1e220bc87985 Pull Request resolved: #124479

[ghstack-poisoned]

ghstack-source-id: 2c674cffb03326d4fdaf063c3319cedff4140786 Pull Request resolved: #124479

ghstack-source-id: 3449eab20931996afece69220ce9911a0b4f33e5 Pull Request resolved: #124479

cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang d4l3k mcarilli ptrblck leslie-fang-intel jgong5 voznesenskym EikanWang Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng [ghstack-poisoned]

ghstack-source-id: a7999bea331954a717a161a9e7f4af55d7eab8eb Pull Request resolved: #124479

ghstack-source-id: 0b9883094b4639c3147fe43cc1807e0965cdcd28 Pull Request resolved: #124479

ghstack-source-id: e31b17087efc139e43c2969abc56a816b9326907 Pull Request resolved: #124479

pytorchmergebot · 2024-04-25T14:33:14Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

# Motivation Refactor autocast usage scenario in `torch/amp/autocast_mode.py` and `torch/utils/checkpoint.py` to fix the bug - convention conflict between `torch.xxx.get_autocast_xxx_dtype` defined in `autocast_mode.py` and `torch.xxx.get_autocast_dtype` defined in `checkpoint.py`. # Solution Use device-agnostic APIs like `torch.get_autocast_dtype`, ..., instead. Pull Request resolved: pytorch#124479 Approved by: https://github.com/jgong5, https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/albanD ghstack dependencies: pytorch#124359

# Motivation As discussed in [#124479](#124479), `torch.amp.autocast` can NOT be completely equivalent to `torch.cuda.amp.autocast` and `torch.cpu.amp.autocast` since `torch.amp.autocast` has NOT the default `dtype` for CPU (`torch.bfloat16` by default) and CUDA (`torch.float16` by default) respectively. We would like `torch.amp.autocast` to be more generic to help the developer/customer write the device-agnostic code. Because there are not enough reasons to add device-specific autocast `torch.xxx.amp.autocast` for each device backend. # Solution When `None` is passed to `dtype`, we should use `torch.get_autocast_dtype` to get the related dtype for each backend. Meanwhile, `torch.get_autocast_dtype` is necessary to be supported in JIT path for BC. # Additional Context With this PR, `torch.amp.autocast(device_type='cuda')` is equivalent to `torch.cuda.amp.autocast`. cc mcarilli ptrblck leslie-fang-intel jgong5 ezyang msaroufim bdhirsh anijain2305 chauhang voznesenskym penguinwu EikanWang Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng [ghstack-poisoned]

# Motivation As discussed in [#124479](#124479), `torch.amp.autocast` can NOT be completely equivalent to `torch.cuda.amp.autocast` and `torch.cpu.amp.autocast` since `torch.amp.autocast` has NOT the default `dtype` for CPU (`torch.bfloat16` by default) and CUDA (`torch.float16` by default) respectively. We would like `torch.amp.autocast` to be more generic to help the developer/customer write the device-agnostic code. Because there are not enough reasons to add device-specific autocast `torch.xxx.amp.autocast` for each device backend. # Solution When `None` is passed to `dtype`, we should use `torch.get_autocast_dtype` to get the related dtype for each backend. Meanwhile, `torch.get_autocast_dtype` is necessary to be supported in JIT path for BC. # Additional Context With this PR, `torch.amp.autocast(device_type='cuda')` is equivalent to `torch.cuda.amp.autocast`. Add two new UTs to cover this change in eager and jit path respectively. cc mcarilli ptrblck leslie-fang-intel jgong5 ezyang msaroufim bdhirsh anijain2305 chauhang voznesenskym penguinwu EikanWang Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng [ghstack-poisoned]

# Motivation As discussed in [#124479](#124479), `torch.amp.autocast` can NOT be completely equivalent to `torch.cuda.amp.autocast` and `torch.cpu.amp.autocast` since `torch.amp.autocast` has NOT the default `dtype` for CPU (`torch.bfloat16` by default) and CUDA (`torch.float16` by default) respectively. We would like `torch.amp.autocast` to be more generic to help the developer/customer write the device-agnostic code. Because there are not enough reasons to add device-specific autocast `torch.xxx.amp.autocast` for each device backend. # Solution When `None` is passed to `dtype`, we should use `torch.get_autocast_dtype` to get the related dtype for each backend. Meanwhile, `torch.get_autocast_dtype` is necessary to be supported in JIT path for BC. # Additional Context With this PR, `torch.amp.autocast(device_type='cuda')` is equivalent to `torch.cuda.amp.autocast`. Add two new UTs to cover this change in eager and jit path respectively. Pull Request resolved: #125103 Approved by: https://github.com/albanD, https://github.com/jgong5, https://github.com/gujinghui

Summary: # Motivation As discussed in [#124479](pytorch/pytorch#124479), `torch.amp.autocast` can NOT be completely equivalent to `torch.cuda.amp.autocast` and `torch.cpu.amp.autocast` since `torch.amp.autocast` has NOT the default `dtype` for CPU (`torch.bfloat16` by default) and CUDA (`torch.float16` by default) respectively. We would like `torch.amp.autocast` to be more generic to help the developer/customer write the device-agnostic code. Because there are not enough reasons to add device-specific autocast `torch.xxx.amp.autocast` for each device backend. # Solution When `None` is passed to `dtype`, we should use `torch.get_autocast_dtype` to get the related dtype for each backend. Meanwhile, `torch.get_autocast_dtype` is necessary to be supported in JIT path for BC. # Additional Context With this PR, `torch.amp.autocast(device_type='cuda')` is equivalent to `torch.cuda.amp.autocast`. Add two new UTs to cover this change in eager and jit path respectively. X-link: pytorch/pytorch#125103 Approved by: https://github.com/albanD, https://github.com/jgong5, https://github.com/gujinghui Reviewed By: izaitsevfb Differential Revision: D57138276 fbshipit-source-id: 17f883924e43f68dd6836d99b06fe8a47cfccbf6

Refactor autocast usage scenario in `torch/amp/autocast_mode.py` and `torch/utils/checkpoint.py` to fix the bug - convention conflict between `torch.xxx.get_autocast_xxx_dtype` defined in `autocast_mode.py` and `torch.xxx.get_autocast_dtype` defined in `checkpoint.py`. Use device-agnostic APIs like `torch.get_autocast_dtype`, ..., instead. Pull Request resolved: #124479 Approved by: https://github.com/jgong5, https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/albanD ghstack dependencies: #124359

This reverts commit 495b0c9.

* Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154) By using `Py_NewRef` Also, wrap `THPDtype_to_real`/`THPDtype_to_complex` calls with `HANDLE_TH_ERRORS` Add regression test for the above issues, by calling to_complex for integral dtypes, that raises an exception and by preserving reference count to the same to_complex/to_real call to detect if leak is happeneing. Replace ```cpp auto dtype = (PyObject*)torch::getTHPDtype(current_dtype); Py_INCREF(dtype); return dtype; ``` with a more compact/streamlined equivalent ```cpp return Py_NewRef(torch::getTHPDtype(current_dtype)); ``` Fixes #124868 Pull Request resolved: #125154 Approved by: https://github.com/Skylion007, https://github.com/albanD (cherry picked from commit 744f341) * Revert "Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154)" This reverts commit a1b04d8. * Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154) By using `Py_NewRef` Also, wrap `THPDtype_to_real`/`THPDtype_to_complex` calls with `HANDLE_TH_ERRORS` Add regression test for the above issues, by calling to_complex for integral dtypes, that raises an exception and by preserving reference count to the same to_complex/to_real call to detect if leak is happeneing. Replace ```cpp auto dtype = (PyObject*)torch::getTHPDtype(current_dtype); Py_INCREF(dtype); return dtype; ``` with a more compact/streamlined equivalent ```cpp return Py_NewRef(torch::getTHPDtype(current_dtype)); ``` Fixes #124868 Pull Request resolved: #125154 Approved by: https://github.com/Skylion007, https://github.com/albanD (cherry picked from commit 744f341) * Revert "Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154)" This reverts commit 5a28bad. * Refactor autocast C++ APIs to be device-agnostic (#124359) # Motivation This PR aims to refactor autocast **C++** APIs to be device-agnostic and deprecate the device-specific autocast **C++** APIs. In C++ side, - `is_enabled()` -> `is_enabled(device_type)`. - `set_enabled(new_enabled)` -> `set_enabled(device_type, new_enabled)`. - `get_autocast_dtype()` -> `get_autocast_dtype(device_type)` - `set_autocast_dtype(dtype)` -> `set_autocast_dtype(device_type, dtype)` These following C++ APIs are deprecated and should be removed in PyTorch 2.5 - `is_cpu_enabled` - `set_cpu_enabled` - `get_autocast_cpu_dtype` - `set_autocast_cpu_dtype` - `is_xpu_enabled` - `set_xpu_enabled` - `get_autocast_xpu_dtype` - `set_autocast_xpu_dtype` - `is_ipu_enabled` - `set_ipu_enabled` - `get_autocast_ipu_dtype` - `set_autocast_ipu_dtype` - `is_hpu_enabled` - `set_hpu_enabled` - `get_autocast_hpu_dtype` - `set_autocast_hpu_dtype` - `is_xla_enabled` - `set_xla_enabled` - `get_autocast_xla_dtype` - `set_autocast_xla_dtype` - `is_privateuseone_enabled` - `set_privateuseone_enabled` - `get_autocast_privateuseone_dtype` - `set_autocast_privateuseone_dtype` In Python side, provide 4 generic autocast APIs: - `torch.is_autocast_enabled(device_type)` - `torch.set_autocast_enabled(device_type, new_enabled)` - `torch.get_autocast_dtype(device_type)` - `torch.set_autocast_dtype(device_type, dtype)` # Additional Context We will submit another PR to refactor autocast **Python** APIs based on this PR. Pull Request resolved: #124359 Approved by: https://github.com/jgong5, https://github.com/albanD * refactor autocast python APIs (#124479) Refactor autocast usage scenario in `torch/amp/autocast_mode.py` and `torch/utils/checkpoint.py` to fix the bug - convention conflict between `torch.xxx.get_autocast_xxx_dtype` defined in `autocast_mode.py` and `torch.xxx.get_autocast_dtype` defined in `checkpoint.py`. Use device-agnostic APIs like `torch.get_autocast_dtype`, ..., instead. Pull Request resolved: #124479 Approved by: https://github.com/jgong5, https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/albanD ghstack dependencies: #124359 * Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154) By using `Py_NewRef` Also, wrap `THPDtype_to_real`/`THPDtype_to_complex` calls with `HANDLE_TH_ERRORS` Add regression test for the above issues, by calling to_complex for integral dtypes, that raises an exception and by preserving reference count to the same to_complex/to_real call to detect if leak is happeneing. Replace ```cpp auto dtype = (PyObject*)torch::getTHPDtype(current_dtype); Py_INCREF(dtype); return dtype; ``` with a more compact/streamlined equivalent ```cpp return Py_NewRef(torch::getTHPDtype(current_dtype)); ``` Fixes #124868 Pull Request resolved: #125154 Approved by: https://github.com/Skylion007, https://github.com/albanD * Revert "refactor autocast python APIs (#124479)" This reverts commit 495b0c9. * Revert "Refactor autocast C++ APIs to be device-agnostic (#124359)" This reverts commit 83106b7. --------- Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> Co-authored-by: Huy Do <huydhn@gmail.com> Co-authored-by: Yu, Guangye <guangye.yu@intel.com>

guangyey requested review from BowenBao, thiagocrepaldi, wschin, albanD, jbschlosser and mikaylagawarecki as code owners April 19, 2024 14:10

guangyey mentioned this pull request Apr 19, 2024

Refactor autocast C++ APIs to be device-agnostic #124359

Closed

pytorch-bot bot added ciflow/inductor module: amp (automated mixed precision) autocast module: dynamo oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: onnx torch.onnx related changes that should show up in the release notes labels Apr 19, 2024

guangyey added a commit that referenced this pull request Apr 19, 2024

refactor autocast python APIs

a8ba209

ghstack-source-id: a0d1efdc93a59082a910d564691c1e220bc87985 Pull Request resolved: #124479

guangyey changed the title ~~refactor autocast python APIs~~ [WIP] refactor autocast python APIs Apr 19, 2024

guangyey marked this pull request as draft April 19, 2024 14:11

pytorchbot added the open source label Apr 19, 2024

refactor autocast python APIs

e0e0e4f

[ghstack-poisoned]

guangyey added a commit that referenced this pull request Apr 20, 2024

refactor autocast python APIs

f93cabf

ghstack-source-id: 2c674cffb03326d4fdaf063c3319cedff4140786 Pull Request resolved: #124479

guangyey added a commit that referenced this pull request Apr 20, 2024

refactor autocast python APIs

0f49d23

ghstack-source-id: 3449eab20931996afece69220ce9911a0b4f33e5 Pull Request resolved: #124479

guangyey added 2 commits April 20, 2024 23:38

guangyey added a commit that referenced this pull request Apr 21, 2024

refactor autocast python APIs

7c1e5c5

ghstack-source-id: a7999bea331954a717a161a9e7f4af55d7eab8eb Pull Request resolved: #124479

guangyey added a commit that referenced this pull request Apr 21, 2024

refactor autocast python APIs

061c456

ghstack-source-id: 0b9883094b4639c3147fe43cc1807e0965cdcd28 Pull Request resolved: #124479

guangyey added a commit that referenced this pull request Apr 21, 2024

refactor autocast python APIs

34ffe21

ghstack-source-id: e31b17087efc139e43c2969abc56a816b9326907 Pull Request resolved: #124479

guangyey marked this pull request as ready for review April 21, 2024 15:52

guangyey changed the title ~~[WIP] refactor autocast python APIs~~ refactor autocast python APIs Apr 21, 2024

guangyey requested review from gujinghui, EikanWang and jgong5 April 21, 2024 15:53

pytorchmergebot added the Merged label Apr 25, 2024

pytorchmergebot closed this in cdc66e9 Apr 25, 2024

pytorchmergebot removed the merging label Apr 25, 2024

guangyey mentioned this pull request Apr 25, 2024

add new API torch.amp.is_autocast_available #124938

Closed

albanD mentioned this pull request May 3, 2024

Migrate PyTorch/XLA's gradient checkpointing to upstream one pytorch/xla#7024

Open

guangyey mentioned this pull request May 6, 2024

make torch.amp.autocast more generic #125103

Closed

huydhn added a commit that referenced this pull request May 14, 2024

Revert "refactor autocast python APIs (#124479)"

f6fbda8

This reverts commit 495b0c9.

This was referenced May 14, 2024

Fix ref leak in dtype.to_complex()/to_real() #126101

Merged

[v2.3.1] Release Tracker #125425

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor autocast python APIs #124479

refactor autocast python APIs #124479

guangyey commented Apr 19, 2024 •

edited

pytorch-bot bot commented Apr 19, 2024 •

edited

pytorchmergebot commented Apr 25, 2024

refactor autocast python APIs #124479

refactor autocast python APIs #124479

Conversation

guangyey commented Apr 19, 2024 • edited

Motivation

Solution

pytorch-bot bot commented Apr 19, 2024 • edited

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/124479

✅ No Failures

pytorchmergebot commented Apr 25, 2024

Merge started

guangyey commented Apr 19, 2024 •

edited

pytorch-bot bot commented Apr 19, 2024 •

edited