Fix num_batches_tracked of BatchNorm when load_state_dict #110850

FFFrog · 2023-10-09T07:34:16Z

Fixes #110361

As the title says.

pytorch-bot · 2023-10-09T07:34:20Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110850

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0dddef8 with merge base 73170b2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

mikaylagawarecki

Thanks!

mikaylagawarecki · 2023-10-24T01:14:16Z

@pytorchbot merge -r

pytorchmergebot · 2023-10-24T01:16:06Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2023-10-24T01:16:13Z

Successfully rebased `mrl_fix_batchnorm` onto `refs/remotes/origin/viable/strict`, please pull locally before adding more changes (for example, via `git checkout mrl_fix_batchnorm && git pull --rebase`)

pytorchmergebot · 2023-10-24T01:17:29Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

…0850)

Fixes pytorch#110361

As the title says.

Pull Request resolved: pytorch#110850
Approved by: https://github.com/mikaylagawarecki

I approved #110850, which did the following:

Previously: `num_batches_tracked` not in `state_dict` when doing `m.load_state_dict(state_dict)` --> always overwrite the module's `num_batches_tracked` in `_load_from_state_dict` with a 0 cpu tensor

Now: `num_batches_tracked` not in the `state_dict` loaded when doing `m.load_state_dict(state_dict)` --> only overwrite the module's `num_batches_tracked` in `_load_from_state_dict` with a 0 cpu tensor if the module does not have `num_batches_tracked`

This causes the following issue:

```
with torch.device('meta'):
    m = BatchNorm(...)
m.load_state_dict(state_dict, assign=True)
```

If `num_batches_tracked` is not in `state_dict`, the module's `num_batches_tracked` is already present on the meta device, so it is not overwritten with a 0 cpu tensor. When compiling, this error is raised:

```
AssertionError: Does not support mixing cuda+meta
```

I am not sure whether the explicit check for the meta device makes sense as a fix; I will add testing if this fix is ok.

Pull Request resolved: #115285
Approved by: https://github.com/albanD
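The three behaviours described in this commit message can be compared side by side with a small torch-free model. This is only a sketch of the decision rule as stated in the message, not the actual PyTorch implementation: `Buf`, `ZERO_CPU` and the three `fill_*` functions are hypothetical names introduced for illustration, and the fallback to a 0 cpu tensor for a meta-device counter is an assumption based on the "explicit check for meta device" mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Buf:
    """Hypothetical stand-in for a tensor: only a value and a device name."""
    value: int
    device: str


# Models the "0 cpu tensor" used as the default counter.
ZERO_CPU = Buf(0, "cpu")


def fill_before_110850(module_buf: Optional[Buf]) -> Buf:
    # Key missing from state_dict: always substitute a fresh 0 cpu tensor,
    # discarding whatever counter the module already holds.
    return ZERO_CPU


def fill_after_110850(module_buf: Optional[Buf]) -> Buf:
    # Key missing from state_dict: keep the module's own counter if it has one.
    return module_buf if module_buf is not None else ZERO_CPU


def fill_after_115285(module_buf: Optional[Buf]) -> Buf:
    # Assumed shape of the follow-up fix: a counter on the meta device is
    # treated like a missing one, so it is replaced by the 0 cpu tensor.
    if module_buf is not None and module_buf.device != "meta":
        return module_buf
    return ZERO_CPU


trained = Buf(5, "cuda")  # a module that has already tracked 5 batches
on_meta = Buf(0, "meta")  # a module constructed under torch.device('meta')

for name, rule in [
    ("before #110850", fill_before_110850),
    ("after #110850", fill_after_110850),
    ("after #115285", fill_after_115285),
]:
    print(name, rule(trained), rule(on_meta))
```

In this model, only the middle rule leaves the meta-device counter in place, which corresponds to the cuda+meta mix reported above, while the last rule still preserves the counter of an already trained module.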

FFFrog requested review from albanD, jbschlosser and mikaylagawarecki as code owners October 9, 2023 07:34

pytorch-bot bot added the release notes: nn label Oct 9, 2023

pytorchbot added the open source label Oct 9, 2023

albanD removed their request for review October 9, 2023 15:37

mikaylagawarecki added the triaged label Oct 9, 2023

mikaylagawarecki approved these changes Oct 20, 2023

pytorch-bot bot added the ciflow/trunk label Oct 24, 2023

fix the issue described by pytorch#110361

0dddef8

pytorchmergebot force-pushed the mrl_fix_batchnorm branch from 3efee14 to 0dddef8 Compare October 24, 2023 01:16

pytorchmergebot added the merging label Oct 24, 2023

pytorchmergebot added Merged and removed merging labels Oct 24, 2023

pytorchmergebot closed this in 0e0f6a2 Oct 24, 2023

FFFrog deleted the mrl_fix_batchnorm branch October 30, 2023 06:48

mikaylagawarecki mentioned this pull request Dec 6, 2023

Fix _load_from_state_dict for num_batches_tracked in batchnorm #115285

Closed
