[FSDP] Re-support model dtype change after FSDP init #91192
Closed
Conversation
awgu requested review from mrshenli, pritamdamania87, zhaojuanmao, rohan-varma, H-Huang, kwen2501, and wanchaol as code owners · December 20, 2022 20:03
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91192
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit f2ea4a2.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
pytorch-bot bot added the release notes: distributed (fsdp) label · Dec 20, 2022
awgu added a commit to awgu/pytorch that referenced this pull request · Dec 20, 2022
ghstack-source-id: 3815e5ae8eac082490112724bbc3e847161b4397
Pull Request resolved: pytorch#91192
awgu added a commit to awgu/pytorch that referenced this pull request · Dec 20, 2022
ghstack-source-id: c4fb1436b9925f203bbd8fcf84891e2e7d2048c0
Pull Request resolved: pytorch#91192
awgu added a commit to awgu/pytorch that referenced this pull request · Dec 21, 2022
ghstack-source-id: 177caa7b10d34939c3979d6f212db5fceb283e44
Pull Request resolved: pytorch#91192
This was referenced Jan 5, 2023
awgu added a commit to awgu/pytorch that referenced this pull request · Jan 10, 2023
ghstack-source-id: 177caa7b10d34939c3979d6f212db5fceb283e44
Pull Request resolved: pytorch#91192
awgu added a commit to awgu/pytorch that referenced this pull request · Jan 10, 2023
ghstack-source-id: cdc64afe5bbb7f6c441958dfaff6afcb70bc308c
Pull Request resolved: pytorch#91192
awgu added a commit to awgu/pytorch that referenced this pull request · Jan 11, 2023
ghstack-source-id: cd846eca268ff9277ef68c15424e78de76e233f6
Pull Request resolved: pytorch#91192
This was referenced Jan 11, 2023
zhaojuanmao approved these changes · Jan 11, 2023
Labels
ciflow/trunk · Merged · release notes: distributed (fsdp) · topic: improvements
Stack from ghstack:

- #92035 [FSDP][RFC] Enforce rank `r`'s current device is `cuda:r`
- #92031 [FSDP][BE] Improve `device_id` + CPU offload test
- #92028 [FSDP][BE] Rename `prefixed_param_names` -> `fqns` for consolidation
- #91767 [FSDP] Do not clean FQNs even for `use_orig_params=True`
- #91193 [FSDP] Test `use_orig_params=True`, `no_sync()`, mixed precision
- #91974 [FSDP] Clarify `MixedPrecision` docs
- -> #91192 [FSDP] Re-support model dtype change after FSDP init

Closes #90838.

To make mixed precision precise internally, #90660 changed the implementation to save `_orig_param_dtype`, `_low_prec_param_dtype`, and `_reduce_dtype` explicitly. However, these are computed at FSDP construction time, so they do not allow the user to change the model dtype after FSDP construction but before lazy initialization. This PR recomputes those dtype attributes as needed if the model dtype changes in that window.

Note that any mixed precision settings specified by the user take precedence over the model dtype.
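For concreteness, here is a minimal sketch of the pattern this PR re-enables. The single-rank process-group setup, the placeholder rendezvous address, and the toy `nn.Linear` module are illustrative assumptions, not taken from the PR:

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Illustrative single-rank setup; the init_method address is a placeholder.
dist.init_process_group("nccl", init_method="tcp://localhost:29500",
                        rank=0, world_size=1)
torch.cuda.set_device(0)

# Construct FSDP while the module is still in float32.
model = FSDP(nn.Linear(8, 8).cuda())

# The window this PR targets: after FSDP construction but before lazy
# initialization (the first forward pass). FSDP now recomputes its saved
# dtype attributes to reflect this cast instead of keeping the float32
# values recorded at construction time.
model = model.half()

# The first forward triggers lazy initialization with the updated dtypes.
out = model(torch.randn(4, 8, device="cuda", dtype=torch.float16))
```

Per the precedence note above, if a `MixedPrecision` config is passed to the FSDP constructor, its `param_dtype`/`reduce_dtype` settings override whatever dtype the module was cast to.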