Fix _load_from_state_dict for num_batches_tracked in batchnorm #115285

mikaylagawarecki · 2023-12-06T19:32:17Z

I approved #110850, which changed the following:

Previously:
If `num_batches_tracked` was not in `state_dict` when calling `m.load_state_dict(state_dict)`, `_load_from_state_dict` always overwrote the module's `num_batches_tracked` with a 0 CPU tensor.

Now:
If `num_batches_tracked` is not in `state_dict` when calling `m.load_state_dict(state_dict)`, `_load_from_state_dict` overwrites the module's `num_batches_tracked` with a 0 CPU tensor only if the module does not have `num_batches_tracked`.
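The two rules can be compared with a small torch-free model. This is only a sketch under stated assumptions: `Buf`, `fill_missing_old`, and `fill_missing_new` are hypothetical names invented for illustration (not PyTorch's code), a buffer is reduced to a value plus a device string, and "module does not have `num_batches_tracked`" is modelled as `None`.

```python
from dataclasses import dataclass
from typing import Dict, Optional

KEY = "num_batches_tracked"


@dataclass(frozen=True)
class Buf:
    """Hypothetical stand-in for a tensor: just a value and a device name."""
    value: int
    device: str


def fill_missing_old(state_dict: Dict[str, Buf], module_buf: Optional[Buf]) -> None:
    """Rule before #110850: a missing key is always filled with a 0 CPU buffer."""
    if KEY not in state_dict:
        state_dict[KEY] = Buf(0, "cpu")


def fill_missing_new(state_dict: Dict[str, Buf], module_buf: Optional[Buf]) -> None:
    """Rule after #110850: fall back to 0 on CPU only if the module has no buffer."""
    if KEY not in state_dict:
        state_dict[KEY] = module_buf if module_buf is not None else Buf(0, "cpu")


# A module that already counted 7 batches loads a checkpoint without the key.
trained = Buf(7, "cpu")
old_sd: Dict[str, Buf] = {}
new_sd: Dict[str, Buf] = {}
fill_missing_old(old_sd, trained)
fill_missing_new(new_sd, trained)
print(old_sd[KEY])  # the old rule resets the count to 0
print(new_sd[KEY])  # the new rule keeps the module's count of 7
```

In this model the only difference between the two rules is which buffer ends up in `state_dict` when the key is missing and the module already has one.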

This causes the following issue:

with torch.device('meta'):
     m = BatchNorm(...)
m.load_state_dict(state_dict, assign=True)

If `num_batches_tracked` is not in `state_dict`, the module's `num_batches_tracked` already exists on the meta device, so it is not overwritten with a 0 CPU tensor. When compiling, this error is raised:

AssertionError: Does not support mixing cuda+meta

I am not sure whether the explicit check for the meta device makes sense as a fix. I will add testing if this fix is OK.
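A minimal, torch-free sketch of what an explicit meta-device check could look like. The names `Buf`, `fill_missing_after_110850`, and `fill_missing_with_meta_check` are hypothetical, and the exact condition used in this PR's diff is not shown in the thread, so treat this as one possible reading rather than the actual change.

```python
from dataclasses import dataclass
from typing import Dict, Optional

KEY = "num_batches_tracked"


@dataclass(frozen=True)
class Buf:
    """Hypothetical stand-in for a tensor: just a value and a device name."""
    value: int
    device: str


def fill_missing_after_110850(state_dict: Dict[str, Buf], module_buf: Optional[Buf]) -> None:
    """Behaviour described above: reuse the module's buffer whenever it exists."""
    if KEY not in state_dict:
        state_dict[KEY] = module_buf if module_buf is not None else Buf(0, "cpu")


def fill_missing_with_meta_check(state_dict: Dict[str, Buf], module_buf: Optional[Buf]) -> None:
    """Assumed fix: a buffer on the meta device holds no data, so do not reuse it."""
    if KEY in state_dict:
        return
    usable = module_buf is not None and module_buf.device != "meta"
    state_dict[KEY] = module_buf if usable else Buf(0, "cpu")


# Module constructed under the meta device, checkpoint without the key.
leaky_sd: Dict[str, Buf] = {}
fill_missing_after_110850(leaky_sd, Buf(0, "meta"))
fixed_sd: Dict[str, Buf] = {}
fill_missing_with_meta_check(fixed_sd, Buf(0, "meta"))
print(leaky_sd[KEY].device)  # meta: the placeholder buffer survives loading
print(fixed_sd[KEY].device)  # cpu: replaced by a real 0 buffer

# A module with a real buffer still keeps its own count.
real_sd: Dict[str, Buf] = {}
fill_missing_with_meta_check(real_sd, Buf(7, "cuda"))
print(real_sd[KEY])
```

The check only changes the meta case. A module that already holds a real buffer keeps it, which preserves the behaviour introduced by #110850.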

Stack from ghstack (oldest at bottom):

-> Fix _load_from_state_dict for num_batches_tracked in batchnorm #115285

[ghstack-poisoned]

pytorch-bot · 2023-12-06T19:32:21Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/115285

✅ You can merge normally! (2 Unrelated Failures)

As of commit f08f7a1 with merge base 0ced55e:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / linux-focal-py3_8-clang9-xla / test (xla, 1, 1, linux.12xlarge)

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

trunk / linux-focal-rocm5.7-py3.8 / test (default, 1, 1, linux.rocm.gpu, unstable)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

albanD

test? Also I guess the condition should be flipped?

albanD

Thanks!

mikaylagawarecki · 2023-12-07T19:39:02Z

@pytorchbot merge

pytorchmergebot · 2023-12-07T19:42:08Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

…ch#115285)

Pull Request resolved: pytorch#115285
Approved by: https://github.com/albanD

Fix _load_from_state_dict for num_batches_tracked in batchnorm

25affad

[ghstack-poisoned]

mikaylagawarecki mentioned this pull request Dec 6, 2023

Run inference in an Executor #115286

Closed

mikaylagawarecki requested a review from albanD December 6, 2023 19:57

mikaylagawarecki marked this pull request as ready for review December 6, 2023 19:58

mikaylagawarecki requested a review from jbschlosser as a code owner December 6, 2023 19:58

albanD reviewed Dec 6, 2023

mikaylagawarecki added a commit that referenced this pull request Dec 7, 2023

Fix _load_from_state_dict for num_batches_tracked in batchnorm

8637acb

ghstack-source-id: d2d6fd2897ea9c7180c25b097e80e1f3437d3ab8 Pull Request resolved: #115285

albanD approved these changes Dec 7, 2023

pytorch-bot bot added the ciflow/trunk label Dec 7, 2023

mikaylagawarecki added the topic: not user facing label Dec 7, 2023

pytorchmergebot added the merging label Dec 7, 2023

pytorchmergebot added the Merged label Dec 7, 2023

pytorchmergebot closed this in f591933 Dec 7, 2023

pytorchmergebot removed the merging label Dec 7, 2023

facebook-github-bot deleted the gh/mikaylagawarecki/163/head branch December 11, 2023 15:29
