Reland "[JIT] during freezing, cast optional bias to half if weight is half" #77617
Conversation
…s half" Original PR: #77295 Original commit message: On GPU, conv errors if not all its inputs have the same dtype. In the case of autocasting during freezing, what we see is: 1) inputs to conv are casted to half 2) inputs to batchnorm are not casted, so many are still floats 3) we try to fold conv + batchnorm, by finding different weight and bias such that conv(input, new_weight, new_bias) is equivalent to the original conv -> batchnorm. If conv previously had an optional bias, then during freezing we will temporarily create a zero-valued bias as a placeholder for conv_bias. We want to construct it to have the same dtype as the weight input to conv, to avoid errors on GPU. Reland changes: There's a memory leak from cuda caching allocator that is a side effect of this fix. The memory leak causes the test to fail, though for some reason it didn't fail on CI in the last PR. This skips the tests for now. [ghstack-poisoned]
✅ No Failures (1 Pending) as of commit dc27594 (more details on the Dr. CI page). 💚 Looks good so far! There are no failures yet. 💚 This comment was automatically generated by Dr. CI.
…s half" Original PR: #77295 Original commit message: On GPU, conv errors if not all its inputs have the same dtype. In the case of autocasting during freezing, what we see is: 1) inputs to conv are casted to half 2) inputs to batchnorm are not casted, so many are still floats 3) we try to fold conv + batchnorm, by finding different weight and bias such that conv(input, new_weight, new_bias) is equivalent to the original conv -> batchnorm. If conv previously had an optional bias, then during freezing we will temporarily create a zero-valued bias as a placeholder for conv_bias. We want to construct it to have the same dtype as the weight input to conv, to avoid errors on GPU. Reland changes: There's a memory leak from cuda caching allocator that is a side effect of this fix. The memory leak causes the test to fail, though for some reason it didn't fail on CI in the last PR. This skips the tests for now. ghstack-source-id: 5c401ff Pull Request resolved: #77617
I took a look at the failing test. I noticed that the reported caching allocator leak is roughly the same magnitude as the number of parameters in the model. Looking at the frozen graph, we are converting properties to prim::Constant in the graph. It looks like disabling that resolves the caching allocator failure for me locally, by commenting out this line:
Not sure how that would only break with the caching allocator, though 😕
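For context on the prim::Constant observation above, here is a rough repro sketch (my own illustration, not code from the PR): freezing bakes module parameters into the graph as prim::Constant nodes, which keeps their storage alive, and you can count those constants directly. The module and sizes below are made up.

```python
import torch

# Minimal sketch (not from the PR): freeze a tiny conv+bn module and count how
# many prim::Constant nodes the frozen graph holds. Parameters folded into the
# graph as constants are what the comment above is pointing at.
class ConvBN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(8, 8, 3, bias=False)
        self.bn = torch.nn.BatchNorm2d(8)

    def forward(self, x):
        return self.bn(self.conv(x))

frozen = torch.jit.freeze(torch.jit.script(ConvBN().eval()))
num_constants = sum(1 for n in frozen.graph.nodes() if n.kind() == "prim::Constant")
print("prim::Constant nodes in frozen graph:", num_constants)
```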
@pytorchbot merge this please
Hey @davidberard98.
…s half" (#77617) Summary: Original PR: #77295 Original commit message: On GPU, conv errors if not all its inputs have the same dtype. In the case of autocasting during freezing, what we see is: 1) inputs to conv are casted to half 2) inputs to batchnorm are not casted, so many are still floats 3) we try to fold conv + batchnorm, by finding different weight and bias such that conv(input, new_weight, new_bias) is equivalent to the original conv -> batchnorm. If conv previously had an optional bias, then during freezing we will temporarily create a zero-valued bias as a placeholder for conv_bias. We want to construct it to have the same dtype as the weight input to conv, to avoid errors on GPU. Reland changes: There's a memory leak from cuda caching allocator that is a side effect of this fix. The memory leak causes the test to fail, though for some reason it didn't fail on CI in the last PR. This skips the tests for now. Pull Request resolved: #77617 Approved by: https://github.com/eellison Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/d0dc7cb7743825eeb4eff804b22ec37cc1d78cb7 Reviewed By: atalman Differential Revision: D36445937 Pulled By: davidberard98 fbshipit-source-id: 9af6869b948697c0464da2d8ad6ba0a40a42fd55
Stack from ghstack:
Original PR: #77295
Original commit message:
On GPU, conv errors if not all its inputs have the same dtype.
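As a small illustration of that failure mode (my own repro sketch, not code from the PR), mixing a half-precision weight with a float bias in a GPU conv raises a RuntimeError:

```python
import torch
import torch.nn.functional as F

# Illustrative repro (requires a CUDA device): conv complains when the weight
# is half but the bias is float, which is exactly the mismatch described above.
if torch.cuda.is_available():
    x = torch.randn(1, 3, 8, 8, device="cuda", dtype=torch.half)
    w = torch.randn(4, 3, 3, 3, device="cuda", dtype=torch.half)
    b = torch.randn(4, device="cuda", dtype=torch.float)  # mismatched dtype
    try:
        F.conv2d(x, w, b)
    except RuntimeError as e:
        print("conv2d raised:", e)
```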
In the case of autocasting during freezing, what we see is:
1) inputs to conv are cast to half
2) inputs to batchnorm are not cast, so many are still float
3) we try to fold conv + batchnorm, by finding a different weight and bias such that conv(input, new_weight, new_bias) is equivalent to the original conv -> batchnorm
If conv previously had an optional bias, then during freezing we will temporarily create a zero-valued bias as a placeholder for conv_bias. We want to construct it to have the same dtype as the weight input to conv, to avoid errors on GPU.
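To make the folding step concrete, here is a minimal Python sketch of the conv -> batchnorm folding arithmetic and of where the zero-valued placeholder bias comes in. The actual pass is C++ inside the JIT freezing code; the function name, eps default, and final dtype cast below are illustrative assumptions.

```python
import torch

def fold_conv_bn(conv_w, conv_b, bn_rm, bn_rv, bn_w, bn_b, eps=1e-5):
    # Sketch of the algebra: bn(conv(x, W, b)) == conv(x, new_w, new_b).
    if conv_b is None:
        # The fix described above: the placeholder bias is created with the
        # *weight's* dtype so conv doesn't see mixed dtypes on GPU.
        conv_b = torch.zeros(bn_rm.shape, dtype=conv_w.dtype, device=conv_w.device)
    bn_scale = bn_w * torch.rsqrt(bn_rv + eps)
    new_w = (conv_w * bn_scale.reshape(-1, *([1] * (conv_w.dim() - 1)))).to(conv_w.dtype)
    new_b = ((conv_b - bn_rm) * bn_scale + bn_b).to(conv_w.dtype)
    return new_w, new_b
```

With these outputs, conv(x, new_w, new_b) should match bn(conv(x, conv_w, conv_b)) up to numerical precision.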
Reland changes:
There's a memory leak from the CUDA caching allocator that is a side effect of this fix. The memory leak causes the test to fail, though for some reason it didn't fail on CI in the last PR. This skips the tests for now.
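For what "skips the tests for now" looks like in practice, here is a rough unittest-style sketch; the class name, test name, and skip reason are hypothetical placeholders, not the exact ones changed in this PR.

```python
import unittest

class TestFrozenOptimizations(unittest.TestCase):
    # Hypothetical placeholder for the affected test (the real test exercises
    # conv + batchnorm folding under autocast on CUDA during freezing).
    @unittest.skip("skipped for now: CUDA caching allocator leak, see reland notes")
    def test_conv_bn_folding_autocast_cuda(self):
        ...
```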