Import ONNX LSTM converted from PyTorch #21118

Closed
SamHSlva opened this issue Nov 24, 2021 · 13 comments
Labels: bug, category: dnn (onnx), category: dnn, confirmed

@SamHSlva

Ubuntu 18, Python 3.6.13, OpenCV 4.5.4
Detailed description

I am trying to import a simple LSTM network converted from PyTorch to ONNX. The model loads and runs correctly in both PyTorch and ONNX Runtime. When I try to import it into OpenCV, I get an error:

Node [LSTM]:(81) parse error: OpenCV(4.5.4) /tmp/pip-req-build-w88qv8vs/opencv/modules/dnn/src/onnx/onnx_importer.cpp:463: error: (-5:Bad argument) Blob 79 not found in const blobs in function 'getBlob'

I submitted this question on the OpenCV forum, and a moderator replied suggesting that I post it here.

Steps to reproduce
import torch
import torch.nn as nn
import cv2


class LayerLSTM(nn.Module):
    def __init__(self):
        super(LayerLSTM, self).__init__()
        # Single-layer LSTM: 156 input features, 512 hidden units
        self.rnns = nn.LSTM(156, 512, 1, batch_first=False)

    def forward(self, x, hx, cx):
        x, (hx, cx) = self.rnns(x, (hx, cx))
        return x

model = LayerLSTM()

with torch.no_grad():
    x = torch.randn(1, 1, 156)
    hx = torch.randn(1, 1, 512)
    cx = torch.randn(1, 1, 512)
    out = model(x, hx, cx)
torch.onnx.export(model, (x, hx, cx), '/home/user/DVS_Original/deep-stabilization/dvs/onnx_model/sample_lstm.onnx', verbose=True, input_names=['x', 'hx', 'cx'], output_names=['output'])
# Optional check against ONNX Runtime (requires `import onnxruntime`):
# sess = onnxruntime.InferenceSession('/home/user/DVS_Original/deep-stabilization/dvs/onnx_model/sample_lstm.onnx')
# out_on = sess.run(None, {'x': x.cpu().numpy(), 'hx': hx.numpy(), 'cx': cx.numpy()})
# print(out.numpy() - out_on[0])

# This call raises the error quoted above
net = cv2.dnn.readNetFromONNX('/home/user/DVS_Original/deep-stabilization/dvs/onnx_model/sample_lstm.onnx')
Issue submission checklist
  • [x] I report the issue, it's not a question
  • [x] I checked the problem with documentation, FAQ, open issues,
    forum.opencv.org, Stack Overflow, etc and have not found solution
  • [x] I updated to latest OpenCV version and the issue is still there
  • [x] There is reproducer code and related data files: videos, images, onnx, etc
@asmorkalov
Contributor

@SamHSlva thanks for the report. Please provide more information about your OpenCV build, your system setup, and the model you are trying to load, including the output of cv.getBuildInformation(). The ONNX tensors can be filled with random values or zeros. See https://github.com/opencv/opencv/wiki/OpenCV-Debugging-Facilities for advanced options.

@SamHSlva
Author

Hi, thank you for the reply.
I created this toy example to simplify my larger problem; the error is essentially the same.
The full torch model and the conversion are as follows:

import torch
import torch.nn as nn
import cv2


class LayerLSTM(nn.Module):
    def __init__(self):
        super(LayerLSTM, self).__init__()
        self.rnns = nn.LSTM(156, 512, 1, batch_first=False)

    def forward(self, x, hx, cx):
        x, (hx, cx) = self.rnns(x, (hx, cx))
        return x

model = LayerLSTM()

with torch.no_grad():
    x = torch.randn(1, 1, 156)
    hx = torch.randn(1, 1, 512)
    cx = torch.randn(1, 1, 512)
    out = model(x, hx, cx)
torch.onnx.export(model, (x, hx, cx), 'toy_lstm.onnx', verbose=True, input_names=['x', 'hx', 'cx'], output_names=['output'])
# sess = onnxruntime.InferenceSession('toy_lstm.onnx')
# out_on = sess.run(None, {'x': x.cpu().numpy(), 'hx': hx.numpy(), 'cx': cx.numpy()})
# print(out.numpy() - out_on[0])

net = cv2.dnn.readNetFromONNX('toy_lstm.onnx')

The output of this script is the following:

/home/user/anaconda3/envs/DFS_2/bin/python /home/user/DVS_Original/deep-stabilization/pytorch_LSTM_toy.py
/home/user/anaconda3/envs/DFS_2/lib/python3.6/site-packages/torch/onnx/symbolic_opset9.py:2174: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with LSTM can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model. 
  "or define the initial states (h0/c0) as inputs of the model. ")
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
graph(%x : Float(1, 1, 156, strides=[156, 156, 1], requires_grad=0, device=cpu),
      %hx : Float(1, 1, 512, strides=[512, 512, 1], requires_grad=0, device=cpu),
      %cx : Float(1, 1, 512, strides=[512, 512, 1], requires_grad=0, device=cpu),
      %49 : Float(1, 2048, 156, strides=[319488, 156, 1], requires_grad=0, device=cpu),
      %50 : Float(1, 2048, 512, strides=[1048576, 512, 1], requires_grad=0, device=cpu),
      %51 : Float(1, 4096, strides=[4096, 1], requires_grad=0, device=cpu)):
  %7 : Tensor? = prim::Constant() # /home/user/anaconda3/envs/DFS_2/lib/python3.6/site-packages/torch/nn/modules/rnn.py:710:0
  %28 : Float(1, 1, 1, 512, strides=[512, 512, 512, 1], device=cpu), %29 : Float(1, 1, 512, strides=[512, 512, 1], requires_grad=1, device=cpu), %30 : Float(1, 1, 512, strides=[512, 512, 1], requires_grad=1, device=cpu) = onnx::LSTM[hidden_size=512](%x, %49, %50, %51, %7, %hx, %cx) # /home/user/anaconda3/envs/DFS_2/lib/python3.6/site-packages/torch/nn/modules/rnn.py:710:0
  %output : Float(1, 1, 512, strides=[512, 512, 1], requires_grad=1, device=cpu) = onnx::Squeeze[axes=[1]](%28) # /home/user/anaconda3/envs/DFS_2/lib/python3.6/site-packages/torch/nn/modules/rnn.py:710:0
  return (%output)

[ERROR:0] global /tmp/pip-req-build-w88qv8vs/opencv/modules/dnn/src/onnx/onnx_importer.cpp (718) handleNode DNN/ONNX: ERROR during processing node with 7 inputs and 3 outputs: [LSTM]:(28)
Traceback (most recent call last):
  File "/home/user/DVS_Original/deep-stabilization/pytorch_LSTM_toy.py", line 27, in <module>
    net = cv2.dnn.readNetFromONNX('toy_lstm.onnx')
cv2.error: OpenCV(4.5.4) /tmp/pip-req-build-w88qv8vs/opencv/modules/dnn/src/onnx/onnx_importer.cpp:739: error: (-2:Unspecified error) in function 'handleNode'
> Node [LSTM]:(28) parse error: OpenCV(4.5.4) /tmp/pip-req-build-w88qv8vs/opencv/modules/dnn/src/onnx/onnx_importer.cpp:463: error: (-5:Bad argument) Blob hx not found in const blobs in function 'getBlob'


This is the output of the requested command:

General configuration for OpenCV 4.5.4 =====================================
  Version control:               4.5.4-dirty

  Platform:
    Timestamp:                   2021-11-19T16:25:28Z
    Host:                        Linux 5.11.0-1021-azure x86_64
    CMake:                       3.22.0
    CMake generator:             Unix Makefiles
    CMake build tool:            /bin/gmake
    Configuration:               Release

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (15 files):         + SSSE3 SSE4_1
      SSE4_2 (1 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (0 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (4 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (30 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
      AVX512_SKX (5 files):      + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

  C/C++:
    Built as dynamic libs?:      NO
    C++ standard:                11
    C++ Compiler:                /usr/lib/ccache/compilers/c++  (ver 10.2.1)
    C++ flags (Release):         -Wl,-strip-all   -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG  -DNDEBUG
    C++ flags (Debug):           -Wl,-strip-all   -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -g  -O0 -DDEBUG -D_DEBUG
    C Compiler:                  /usr/lib/ccache/compilers/cc
    C flags (Release):           -Wl,-strip-all   -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -O3 -DNDEBUG  -DNDEBUG
    C flags (Debug):             -Wl,-strip-all   -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -g  -O0 -DDEBUG -D_DEBUG
    Linker flags (Release):      -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a -L/root/ffmpeg_build/lib  -Wl,--gc-sections -Wl,--as-needed  
    Linker flags (Debug):        -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a -L/root/ffmpeg_build/lib  -Wl,--gc-sections -Wl,--as-needed  
    ccache:                      YES
    Precompiled headers:         NO
    Extra dependencies:          /lib64/libopenblas.so Qt5::Core Qt5::Gui Qt5::Widgets Qt5::Test Qt5::Concurrent /lib64/libpng.so /lib64/libz.so dl m pthread rt
    3rdparty dependencies:       libprotobuf ade ittnotify libjpeg-turbo libwebp libtiff libopenjp2 IlmImf quirc ippiw ippicv

  OpenCV modules:
    To be built:                 calib3d core dnn features2d flann gapi highgui imgcodecs imgproc ml objdetect photo python3 stitching video videoio
    Disabled:                    world
    Disabled by dependency:      -
    Unavailable:                 java python2 ts
    Applications:                -
    Documentation:               NO
    Non-free algorithms:         NO

  GUI:                           QT5
    QT:                          YES (ver 5.15.0 )
      QT OpenGL support:         NO
    GTK+:                        NO
    VTK support:                 NO

  Media I/O: 
    ZLib:                        /lib64/libz.so (ver 1.2.7)
    JPEG:                        libjpeg-turbo (ver 2.1.0-62)
    WEBP:                        build (ver encoder: 0x020f)
    PNG:                         /lib64/libpng.so (ver 1.5.13)
    TIFF:                        build (ver 42 - 4.2.0)
    JPEG 2000:                   build (ver 2.4.0)
    OpenEXR:                     build (ver 2.3.0)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    DC1394:                      NO
    FFMPEG:                      YES
      avcodec:                   YES (58.91.100)
      avformat:                  YES (58.45.100)
      avutil:                    YES (56.51.100)
      swscale:                   YES (5.7.100)
      avresample:                NO
    GStreamer:                   NO
    v4l/v4l2:                    YES (linux/videodev2.h)

  Parallel framework:            pthreads

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Intel IPP:                   2020.0.0 Gold [2020.0.0]
           at:                   /tmp/pip-req-build-w88qv8vs/_skbuild/linux-x86_64-3.6/cmake-build/3rdparty/ippicv/ippicv_lnx/icv
    Intel IPP IW:                sources (2020.0.0)
              at:                /tmp/pip-req-build-w88qv8vs/_skbuild/linux-x86_64-3.6/cmake-build/3rdparty/ippicv/ippicv_lnx/iw
    VA:                          NO
    Lapack:                      YES (/lib64/libopenblas.so)
    Eigen:                       NO
    Custom HAL:                  NO
    Protobuf:                    build (3.5.1)

  OpenCL:                        YES (no extra features)
    Include path:                /tmp/pip-req-build-w88qv8vs/opencv/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python 3:
    Interpreter:                 /opt/_internal/cpython-3.6.15/bin/python (ver 3.6.15)
    Libraries:                   libpython3.6m.a (ver 3.6.15)
    numpy:                       /tmp/pip-build-env-ut15bkrp/overlay/lib/python3.6/site-packages/numpy/core/include (ver 1.13.3)
    install path:                python/cv2/python-3.6

  Python (for build):            /bin/python2.7

  Java:                          
    ant:                         NO
    JNI:                         NO
    Java wrappers:               NO
    Java tests:                  NO

  Install to:                    /tmp/pip-req-build-w88qv8vs/_skbuild/linux-x86_64-3.6/cmake-install
-----------------------------------------------------------------

@asmorkalov
Contributor

asmorkalov commented Nov 30, 2021

Model diagnostic tool output:

./opencv_model_diagnostics -m=./toy_lstm.onnx 
[ERROR:0] global /home/alexander/Projects/opencv/modules/dnn/src/onnx/onnx_importer.cpp (700) handleNode DNN/ONNX: Potential problem during processing node with 7 inputs and 3 outputs: [LSTM]:(28)
OpenCV(4.5.4-dev) /home/alexander/Projects/opencv/modules/dnn/src/onnx/onnx_importer.cpp:463: error: (-5:Bad argument) Blob hx not found in const blobs in function 'getBlob'

@asmorkalov
Contributor

The issue is reproduced with OpenCV 4.5.4 on Ubuntu 18.04. OpenCV expects hx to be a constant tensor, not an input tensor.
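
To make the constant-versus-input distinction concrete, here is a minimal sketch that labels where each LSTM input comes from. The slot order is the one fixed by the ONNX LSTM operator; the helper names `classify_lstm_inputs` and `inspect_onnx_lstm` are hypothetical, and the example call assumes that the weights `49`, `50`, `51` were stored as initializers and that the `None` constant `%7` from the export log became an empty input name.

```python
# Input slots of the ONNX LSTM operator, in the order fixed by the ONNX spec.
LSTM_SLOTS = ["X", "W", "R", "B", "sequence_lens", "initial_h", "initial_c", "P"]


def classify_lstm_inputs(node_inputs, initializer_names, graph_input_names):
    """Map each LSTM input slot to (tensor name, where the tensor comes from)."""
    report = {}
    for slot, name in zip(LSTM_SLOTS, node_inputs):
        if name == "":
            origin = "omitted"        # optional input left empty
        elif name in initializer_names:
            origin = "initializer"    # constant stored in the model file
        elif name in graph_input_names:
            origin = "graph input"    # has to be fed at run time
        else:
            origin = "intermediate"   # produced by another node
        report[slot] = (name, origin)
    return report


def inspect_onnx_lstm(path):
    """Hypothetical helper: classify the inputs of every LSTM node in a file."""
    import onnx  # imported lazily so the pure-Python part runs without it

    graph = onnx.load(path).graph
    initializers = {t.name for t in graph.initializer}
    # Some exports list initializers among the graph inputs as well.
    inputs = {i.name for i in graph.input} - initializers
    return [
        classify_lstm_inputs(list(n.input), initializers, inputs)
        for n in graph.node
        if n.op_type == "LSTM"
    ]


# Names taken from the verbose export log in this thread (assumptions noted above).
report = classify_lstm_inputs(
    ["x", "49", "50", "51", "", "hx", "cx"],
    initializer_names={"49", "50", "51"},
    graph_input_names={"x", "hx", "cx"},
)
for slot, (name, origin) in report.items():
    print(f"{slot:14s} {name!r:6s} {origin}")
```

Under these assumptions `initial_h` (`hx`) and `initial_c` (`cx`) are reported as graph inputs, which is the case the importer rejected here. Running `inspect_onnx_lstm('toy_lstm.onnx')` on a real file requires the `onnx` package.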

@asmorkalov asmorkalov added bug confirmed There is stable reproducer / investigation complete labels Nov 30, 2021
@asmorkalov
Contributor

toy_lstm.onnx.zip

@leochan2009

Any progress? I have the same issue with an LSTM model converted from PyTorch to ONNX.

@asmorkalov
Contributor

Status for current 4.x branch:

./bin/opencv_model_diagnostics -m=toy_lstm.onnx 
[ERROR:0@0.017] global /mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/src/onnx/onnx_importer.cpp (1033) handleNode DNN/ONNX: Potential problem during processing node with 7 inputs and 3 outputs: [LSTM]:(onnx_node_output_0!28) from domain='ai.onnx'
OpenCV(4.6.0-dev) /mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/src/onnx/onnx_importer.cpp:1582: error: (-215:Assertion failed) shape(blob) == blobShape in function 'lstm_extractConsts'

[ERROR:0@0.018] global /mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/src/onnx/onnx_importer.cpp (1045) handleNode DNN/ONNX: Layer of type LSTM(LSTM) cannot be created with parameters depth : 5
has_dynamic_shapes : 0
hidden_size : 512
. Error: OpenCV(4.6.0-dev) /mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/src/layers/recurrent_layers.cpp:162: error: (-215:Assertion failed) blobs.size() >= 3 in function 'LSTMLayerImpl'
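
The first assertion above concerns blob shapes. As a reference point only, the following sketch lists the shapes that the ONNX LSTM operator specifies for its weight and state inputs and compares them with the shapes printed in the verbose export log of the toy model. It does not reproduce OpenCV's internal check in `lstm_extractConsts`; the function name `onnx_lstm_expected_shapes` is hypothetical.

```python
def onnx_lstm_expected_shapes(input_size, hidden_size, batch_size=1, num_directions=1):
    """Shapes the ONNX LSTM operator specifies for its weight and state inputs."""
    return {
        "W": (num_directions, 4 * hidden_size, input_size),
        "R": (num_directions, 4 * hidden_size, hidden_size),
        "B": (num_directions, 8 * hidden_size),
        "initial_h": (num_directions, batch_size, hidden_size),
        "initial_c": (num_directions, batch_size, hidden_size),
    }


# Shapes printed by the verbose export of the toy model, nn.LSTM(156, 512, 1).
exported = {
    "W": (1, 2048, 156),
    "R": (1, 2048, 512),
    "B": (1, 4096),
    "initial_h": (1, 1, 512),
    "initial_c": (1, 1, 512),
}

expected = onnx_lstm_expected_shapes(input_size=156, hidden_size=512)
mismatches = {
    name: (exported[name], expected[name])
    for name in expected
    if exported[name] != expected[name]
}
print(mismatches or "all shapes match the ONNX LSTM specification")
```

Since the exported shapes agree with the specification, the comparison suggests looking at how the importer obtains each blob rather than at the tensor sizes in the model.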

@asmorkalov asmorkalov assigned Abdurrahheem and unassigned rogday Mar 13, 2023
asmorkalov pushed a commit that referenced this issue Apr 24, 2023
Fix ONNX parser for single-layer LSTM hidden and cell states #23475

### Fix ONNX parser for single-layer LSTM hidden and cell states

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake


This PR addresses #21118 [issue](#21118). The problem is that the ONNX parser is unable to read the hidden state and cell state for single-layer LSTMs. This PR fixes the issue by updating the parser to correctly read hidden and cell states.
@asmorkalov
Contributor

@Abdurrahheem The parser issue is still there:

./opencv_model_diagnostics -m=toy_lstm.onnx 
[ERROR:0@0.016] global onnx_importer.cpp:1039 handleNode DNN/ONNX: Potential problem during processing node with 7 inputs and 3 outputs: [LSTM]:(onnx_node_output_0!28) from domain='ai.onnx'
OpenCV(4.7.0-dev) /home/alexander/Projects/OpenCV/opencv-master/modules/dnn/src/onnx/onnx_importer.cpp:1623: error: (-215:Assertion failed) shape(blob) == blobShape in function 'lstm_extractConsts'

[ERROR:0@0.016] global onnx_importer.cpp:1051 handleNode DNN/ONNX: Layer of type LSTM(LSTM) cannot be created with parameters depth : 5
has_dynamic_shapes : 0
hidden_size : 512
. Error: OpenCV(4.7.0-dev) /home/alexander/Projects/OpenCV/opencv-master/modules/dnn/src/layers/recurrent_layers.cpp:162: error: (-215:Assertion failed) blobs.size() >= 3 in function 'LSTMLayerImpl'

@asmorkalov asmorkalov added this to the 4.8.0 milestone Apr 24, 2023
@Abdurrahheem
Contributor

Abdurrahheem commented Apr 24, 2023

I am not able to reproduce this locally on the lstm_fix_initialization branch. Are you sure you ran the command on that branch and not on 4.x?

@Abdurrahheem
Contributor

The reason you get this error might be the toy_lstm.onnx file you are using. Here is its ONNX graph:

image

And here is the graph generated by the Python script mentioned above:

image

@Abdurrahheem
Contributor

Abdurrahheem commented Apr 25, 2023

This type of ONNX LSTM graph (where the weight matrices are defined as inputs to the LSTM layer) is currently not supported by OpenCV for this reason.

By contrast, this type of graph (where the hidden-state matrices are defined as inputs to the LSTM layer) is fully supported; support was added in this PR. Please use export_params=True when exporting your model in torch or onnx to avoid the first scenario.
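
A minimal sketch of such an export, reusing the toy model from this thread, might look as follows. The names `EXPORT_KWARGS` and `export_toy_lstm` are hypothetical. To my knowledge `export_params` already defaults to `True` in `torch.onnx.export`, so passing it explicitly mainly documents the intent and guards against a script that turned it off.

```python
# Keyword arguments for torch.onnx.export. export_params=True asks the exporter
# to store the trained parameters in the file.
EXPORT_KWARGS = dict(
    export_params=True,
    input_names=["x", "hx", "cx"],
    output_names=["output"],
)


def export_toy_lstm(path="toy_lstm.onnx"):
    """Export the toy LSTM with hidden and cell state as graph inputs."""
    import torch  # imported lazily; requires PyTorch to be installed
    import torch.nn as nn

    class LayerLSTM(nn.Module):
        def __init__(self):
            super().__init__()
            self.rnns = nn.LSTM(156, 512, 1, batch_first=False)

        def forward(self, x, hx, cx):
            out, _ = self.rnns(x, (hx, cx))
            return out

    model = LayerLSTM().eval()
    x = torch.randn(1, 1, 156)   # (seq_len, batch, input_size)
    hx = torch.randn(1, 1, 512)  # (num_layers, batch, hidden_size)
    cx = torch.randn(1, 1, 512)
    torch.onnx.export(model, (x, hx, cx), path, **EXPORT_KWARGS)
    return path
```

Calling `export_toy_lstm()` writes `toy_lstm.onnx` to the current directory; the resulting file can then be passed to `cv2.dnn.readNetFromONNX` on a build that includes the fix.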

@asmorkalov
Contributor

Test for the mentioned case: #23545

thewoz pushed a commit to thewoz/opencv that referenced this issue Jan 4, 2024
Fix ONNX parser for single-layer LSTM hidden and cell states opencv#23475
thewoz pushed a commit to thewoz/opencv that referenced this issue May 29, 2024
Fix ONNX parser for single-layer LSTM hidden and cell states opencv#23475