Reland "[JIT] during freezing, cast optional bias to half if weight is half" #77617
Conversation
…s half" Original PR: #77295 Original commit message: On GPU, conv errors if not all its inputs have the same dtype. In the case of autocasting during freezing, what we see is: 1) inputs to conv are casted to half 2) inputs to batchnorm are not casted, so many are still floats 3) we try to fold conv + batchnorm, by finding different weight and bias such that conv(input, new_weight, new_bias) is equivalent to the original conv -> batchnorm. If conv previously had an optional bias, then during freezing we will temporarily create a zero-valued bias as a placeholder for conv_bias. We want to construct it to have the same dtype as the weight input to conv, to avoid errors on GPU. Reland changes: There's a memory leak from cuda caching allocator that is a side effect of this fix. The memory leak causes the test to fail, though for some reason it didn't fail on CI in the last PR. This skips the tests for now. [ghstack-poisoned]
✅ No Failures (1 Pending) as of commit dc27594 (more details on the Dr. CI page). 💚 Looks good so far! There are no failures yet. 💚 This comment was automatically generated by Dr. CI.
…s half" Original PR: #77295 Original commit message: On GPU, conv errors if not all its inputs have the same dtype. In the case of autocasting during freezing, what we see is: 1) inputs to conv are casted to half 2) inputs to batchnorm are not casted, so many are still floats 3) we try to fold conv + batchnorm, by finding different weight and bias such that conv(input, new_weight, new_bias) is equivalent to the original conv -> batchnorm. If conv previously had an optional bias, then during freezing we will temporarily create a zero-valued bias as a placeholder for conv_bias. We want to construct it to have the same dtype as the weight input to conv, to avoid errors on GPU. Reland changes: There's a memory leak from cuda caching allocator that is a side effect of this fix. The memory leak causes the test to fail, though for some reason it didn't fail on CI in the last PR. This skips the tests for now. ghstack-source-id: 5c401ff Pull Request resolved: #77617
I took a look at the failing test. I noticed that the reported caching allocator leak is roughly the same magnitude as the number of parameters in the model. Looking at the frozen graph, we are converting properties to prim::Constant in the graph. It looks like disabling that resolves the caching allocator failure for me locally, by commenting out this line:
Not sure how that would only break with the caching allocator, though 😕
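For context on the prim::Constant observation above, here is a rough repro sketch (my own illustration, not code from the PR): freezing bakes module parameters into the graph as prim::Constant nodes, which keeps their storage alive, and you can count those constants directly. The module and sizes below are made up.

```python
import torch

# Minimal sketch (not from the PR): freeze a tiny conv+bn module and count how
# many prim::Constant nodes the frozen graph holds. Parameters folded into the
# graph as constants are what the comment above is pointing at.
class ConvBN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(8, 8, 3, bias=False)
        self.bn = torch.nn.BatchNorm2d(8)

    def forward(self, x):
        return self.bn(self.conv(x))

frozen = torch.jit.freeze(torch.jit.script(ConvBN().eval()))
num_constants = sum(1 for n in frozen.graph.nodes() if n.kind() == "prim::Constant")
print("prim::Constant nodes in frozen graph:", num_constants)
```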
@pytorchbot merge this please
Hey @davidberard98.
…s half" (#77617) Summary: Original PR: #77295 Original commit message: On GPU, conv errors if not all its inputs have the same dtype. In the case of autocasting during freezing, what we see is: 1) inputs to conv are casted to half 2) inputs to batchnorm are not casted, so many are still floats 3) we try to fold conv + batchnorm, by finding different weight and bias such that conv(input, new_weight, new_bias) is equivalent to the original conv -> batchnorm. If conv previously had an optional bias, then during freezing we will temporarily create a zero-valued bias as a placeholder for conv_bias. We want to construct it to have the same dtype as the weight input to conv, to avoid errors on GPU. Reland changes: There's a memory leak from cuda caching allocator that is a side effect of this fix. The memory leak causes the test to fail, though for some reason it didn't fail on CI in the last PR. This skips the tests for now. Pull Request resolved: #77617 Approved by: https://github.com/eellison Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/d0dc7cb7743825eeb4eff804b22ec37cc1d78cb7 Reviewed By: atalman Differential Revision: D36445937 Pulled By: davidberard98 fbshipit-source-id: 9af6869b948697c0464da2d8ad6ba0a40a42fd55
Stack from ghstack:
Original PR: #77295
Original commit message:
On GPU, conv errors if not all its inputs have the same dtype.
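As a small illustration of that failure mode (my own repro sketch, not code from the PR), mixing a half-precision weight with a float bias in a GPU conv raises a RuntimeError:

```python
import torch
import torch.nn.functional as F

# Illustrative repro (requires a CUDA device): conv complains when the weight
# is half but the bias is float, which is exactly the mismatch described above.
if torch.cuda.is_available():
    x = torch.randn(1, 3, 8, 8, device="cuda", dtype=torch.half)
    w = torch.randn(4, 3, 3, 3, device="cuda", dtype=torch.half)
    b = torch.randn(4, device="cuda", dtype=torch.float)  # mismatched dtype
    try:
        F.conv2d(x, w, b)
    except RuntimeError as e:
        print("conv2d raised:", e)
```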
In the case of autocasting during freezing, what we see is:
1) inputs to conv are cast to half
2) inputs to batchnorm are not cast, so many are still float
3) we try to fold conv + batchnorm, by finding a different weight and bias such that conv(input, new_weight, new_bias) is equivalent to the original conv -> batchnorm
If conv previously had an optional bias, then during freezing we will temporarily create a zero-valued bias as a placeholder for conv_bias. We want to construct it to have the same dtype as the weight input to conv, to avoid errors on GPU.
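To make the folding step concrete, here is a minimal Python sketch of the conv -> batchnorm folding arithmetic and of where the zero-valued placeholder bias comes in. The actual pass is C++ inside the JIT freezing code; the function name, eps default, and final dtype cast below are illustrative assumptions.

```python
import torch

def fold_conv_bn(conv_w, conv_b, bn_rm, bn_rv, bn_w, bn_b, eps=1e-5):
    # Sketch of the algebra: bn(conv(x, W, b)) == conv(x, new_w, new_b).
    if conv_b is None:
        # The fix described above: the placeholder bias is created with the
        # *weight's* dtype so conv doesn't see mixed dtypes on GPU.
        conv_b = torch.zeros(bn_rm.shape, dtype=conv_w.dtype, device=conv_w.device)
    bn_scale = bn_w * torch.rsqrt(bn_rv + eps)
    new_w = (conv_w * bn_scale.reshape(-1, *([1] * (conv_w.dim() - 1)))).to(conv_w.dtype)
    new_b = ((conv_b - bn_rm) * bn_scale + bn_b).to(conv_w.dtype)
    return new_w, new_b
```

With these outputs, conv(x, new_w, new_b) should match bn(conv(x, conv_w, conv_b)) up to numerical precision.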
Reland changes:
There's a memory leak from the CUDA caching allocator that is a side effect of this fix. The memory leak causes the test to fail, though for some reason it didn't fail on CI in the last PR. This skips the tests for now.
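For what "skips the tests for now" looks like in practice, here is a rough unittest-style sketch; the class name, test name, and skip reason are hypothetical placeholders, not the exact ones changed in this PR.

```python
import unittest

class TestFrozenOptimizations(unittest.TestCase):
    # Hypothetical placeholder for the affected test (the real test exercises
    # conv + batchnorm folding under autocast on CUDA during freezing).
    @unittest.skip("skipped for now: CUDA caching allocator leak, see reland notes")
    def test_conv_bn_folding_autocast_cuda(self):
        ...
```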