Batchnorms force set to training mode on torch.onnx.export when running stats are None #75252
Thanks @8scarlet8, unfortunately I couldn't think of an easy fix. It turns out PyTorch "considers" batchnorm to be in training mode when both running stats are `None`:

pytorch/torch/nn/modules/batchnorm.py, line 161 in 839109f
pytorch/torch/nn/modules/batchnorm.py, lines 168 to 180 in 839109f

So at the kernel level, what the ONNX exporter observes is a batchnorm with `training=True`. There is no extra information that can help differentiate it from real training.
To quickly unblock the issue, one possible solution is to post-process the exported ONNX graph to remove the unused outputs from BatchNorm.
Thank you for your reply, @BowenBao. I've managed to solve the problem by initializing the model with all batchnorms in `track_running_stats=True` mode. Your solution seems plausible as well, in case we need to fix an already converted model, for example. What bothers me is that this behavior seems counterintuitive to me. Please correct me if I'm wrong, but the existence of …
The same issue also applies to InstanceNorm.
+1 on @aweinmann's comment. It also happens to InstanceNorm.

+1
We’ve gone ahead and closed this issue because it is stale. Thanks!
I managed to remove this warning using this (inelegant) method: loop over the network modules and explicitly set `training` to `False` before exporting to ONNX. This removed the warning when exporting to ONNX. Hope this helps,
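The loop described above probably looked something like this (a reconstruction, not the commenter's exact code):

```python
import torch.nn as nn

def force_eval(model: nn.Module) -> nn.Module:
    """Put the model in eval mode and explicitly flip every submodule's
    `training` flag to False before calling torch.onnx.export.

    Note: when a norm layer's running stats are None, the exporter still
    treats it as training regardless of this flag, so this only silences
    the warning in the cases where the stats buffers actually exist.
    """
    model.eval()
    for module in model.modules():
        module.training = False
    return model
```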
I'm still getting this warning. Can I ignore it?
@8scarlet8 could you please provide an example of the code? It should be done after the `model.eval()` call, right? In particular, I would like to know how you managed to do that.

@BowenBao could you suggest a way to do it?
Hi folks :) What is the solution to this problem? @gedeon1310's hack did not work for me.
Hi, everyone! Did anyone actually solve this problem? I added this code after I loaded the state_dict, and the warning still appeared, as this fellow showed, when exporting to an ONNX model. Though it could successfully export to ONNX, another cuDNN error happened which showed that the InstanceNorm was still in training mode when I transferred it to a TensorRT model for subsequent work. So if anyone has a better solution, please give us a hand to get rid of this problem. Thx
A workaround (as already mentioned by @8scarlet8) is to initialize the respective operator explicitly with `track_running_stats=True`. The operator in this case will track the running mean and variance, as described in the documentation: https://pytorch.org/docs/stable/generated/torch.nn.InstanceNorm2d.html
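A minimal sketch of the `track_running_stats=True` workaround for InstanceNorm (the layer sizes are illustrative):

```python
import torch.nn as nn

# Problematic construction: the running-stats buffers are None, so the
# ONNX exporter treats the layer as training even in eval mode.
bad = nn.InstanceNorm2d(8, affine=True, track_running_stats=False)
assert bad.running_mean is None

# Workaround: construct with track_running_stats=True so the
# running_mean / running_var buffers exist.
good = nn.InstanceNorm2d(8, affine=True, track_running_stats=True)
good.eval()
assert good.running_mean is not None
```

Caveat: per the linked documentation, an InstanceNorm layer built with `track_running_stats=True` normalizes with the accumulated running statistics in eval mode rather than per-sample statistics, so its outputs can differ from the stats-free layer.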
None of the workarounds are working for me. No matter what I do, it's still exporting in training mode and I have no clue what to do.
How to solve it? The warning reads: "ONNX export mode is set to inference mode, but operator instance_norm is set to training mode. The operators will be exported in training, as specified by the functional operator."
Actually I don't care about this problem anymore. I was converting YoloV7 to TensorRT, and this warning does not hurt my final inference result at all.
🐛 Describe the bug
When converting a PyTorch model to `.onnx`, the exporter assumes that batchnorm layers are in training mode if `track_running_stats=False`, even though the layers clearly have the `training` attribute set to `False`. We can reproduce this by setting `module.running_var = None` and `module.running_mean = None`, or by creating a new model with `nn.BatchNorm2d(channels[0], track_running_stats=False)`. Here I will provide a basic conversion example with resnet50, where I forcibly set the running stats to `None`. If needed, I can provide an example where the model has batchnorms initialized with `track_running_stats=False`.

This causes the converter to wrongly assume that the layers are in training mode, which prevents further loading with the OpenVINO backend. The same thing happens when we convert the model to TorchScript with tracing or scripting in advance. This happened to me on PyTorch 1.11.0 in my local testing environment and with PyTorch 1.10.0 on Google Colab. If needed, I can reproduce it on Colab with PyTorch 1.11.0.

Is this intentional, so that batchnorms should always have `running_stats`, or is this a bug?

Here is a conversion example where `track_running_stats=True` and conversion goes smoothly, loading with the OpenVINO backend. Here is a conversion example where the batchnorms have running stats set to `None` (as happens with `track_running_stats=False`). During conversion it gives the following warning:
And throws an error when attempting to load in the OpenVINO backend:
Versions
PyTorch version: 1.10.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.12.0
Libc version: glibc-2.26
Python version: 3.7.13 (default, Mar 16 2022, 17:37:17) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.144+-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: False
CUDA runtime version: 11.1.105
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.10.0+cu111
[pip3] torchaudio==0.10.0+cu111
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.11.0
[pip3] torchvision==0.11.1+cu111
[pip3] openvino==2022.1.0
[conda] Could not collect