
LSTM from pytorch to tensorflow: "Squeeze" messes up rank #1383

Open

rumschuettel opened this issue Sep 5, 2018 · 6 comments

@rumschuettel commented Sep 5, 2018
Hi! I'm trying to export a PyTorch LSTM (PyTorch installed from the master branch) to ONNX (1.3.0), and then to import it into TensorFlow (tf-nightly).

Exporting works. The relevant graph bit looks like

  %68 : Dynamic = onnx::Slice[axes=[0], ends=[2], starts=[1]](%1), scope: foo/LSTM[lstm]
  %69 : Dynamic = onnx::Slice[axes=[0], ends=[2], starts=[1]](%2), scope: foo/LSTM[lstm]
  %70 : Dynamic, %71 : Dynamic, %72 : Dynamic = onnx::LSTM[hidden_size=200](%47, %65, %66, %67, %21, %68, %69), scope: foo/LSTM[lstm]
  %73 : Dynamic = onnx::Squeeze[axes=[1]](%70), scope: foo/LSTM[lstm]
  %74 : Dynamic = onnx::Slice[axes=[0], ends=[200], starts=[0]](%11), scope: foo/LSTM[lstm]
  %75 : Dynamic = onnx::Slice[axes=[0], ends=[800], starts=[600]](%11), scope: foo/LSTM[lstm]

Inspecting said file after onnx.load gives me an identical output, so I assume the export from PyTorch and the import into ONNX both work:

  %68 = Slice[axes = [0], ends = [2], starts = [1]](%1)
  %69 = Slice[axes = [0], ends = [2], starts = [1]](%2)
  %70, %71, %72 = LSTM[hidden_size = 200](%47, %65, %66, %67, %, %68, %69)
  %73 = Squeeze[axes = [1]](%70)
  %74 = Slice[axes = [0], ends = [200], starts = [0]](%11)
  %75 = Slice[axes = [0], ends = [800], starts = [600]](%11)

So far so good. Unfortunately, importing with onnx_tf.backend.prepare yields the following error:

File "./onnx2tf.py", line 28, in <module>
    tf_rep = prepare(model)
  File "/home/foo/opt/anaconda5/envs/nlp-onnx/lib/python3.6/site-packages/onnx_tf/backend.py", line 348, in prepare
    model.graph, opset=model.opset_import[0].version))
  File "/home/foo/opt/anaconda5/envs/nlp-onnx/lib/python3.6/site-packages/onnx_tf/backend.py", line 324, in onnx_graph_to_tensorflow_net
    node, tensor_dict, opset=opset)
  File "/home/foo/opt/anaconda5/envs/nlp-onnx/lib/python3.6/site-packages/onnx_tf/backend.py", line 407, in _onnx_node_to_tensorflow_op
    return method_to_call(node, input_dict)
  File "/home/foo/opt/anaconda5/envs/nlp-onnx/lib/python3.6/site-packages/onnx_tf/backends/backend_v1.py", line 713, in handle_l_s_t_m
    cell, input_dict[node.inputs[0]], time_major=True, dtype=tf.float32)
  File "/home/foo/opt/anaconda5/envs/nlp-onnx/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py", line 664, in dynamic_rnn
    dtype=dtype)
  File "/home/foo/opt/anaconda5/envs/nlp-onnx/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py", line 727, in _dynamic_rnn_loop
    for input_ in flat_input)
  File "/home/foo/opt/anaconda5/envs/nlp-onnx/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py", line 727, in <genexpr>
    for input_ in flat_input)
  File "/home/foo/opt/anaconda5/envs/nlp-onnx/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 765, in with_rank_at_least
    raise ValueError("Shape %s must have rank at least %d" % (self, rank))
ValueError: Shape (100, 200) must have rank at least 3

Indeed, the Squeeze call seems to eliminate the batch dimension: exporting from PyTorch with batch_size > 1 raises an error, because Squeeze cannot eliminate a dimension whose size is not equal to 1.
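For illustration, here is a rough numpy sketch of my suspicion (assuming the standard ONNX LSTM output layout for Y, [seq_length, num_directions, batch_size, hidden_size], with the shapes from my model):

```python
import numpy as np

# ONNX LSTM output Y: [seq_length, num_directions, batch_size, hidden_size]
y = np.zeros((100, 1, 1, 200))

# The exported Squeeze[axes=[1]] removes num_directions, leaving the
# rank-3 [seq, batch, hidden] tensor that dynamic_rnn expects:
y3 = np.squeeze(y, axis=1)
print(y3.shape)  # (100, 1, 200)

# But if the size-1 batch dimension is also squeezed away somewhere,
# we end up with exactly the rank-2 shape from the traceback:
y2 = np.squeeze(y3)
print(y2.shape)  # (100, 200)
```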

Any idea what's going on here?
Let me know if you need more information.

Thanks a bunch!
/ J

@houseroad (Member) commented Sep 5, 2018

@tjingrant @fumihwh would you like to take a look?

@fumihwh (Contributor) commented Sep 6, 2018

@rumschuettel
It seems you are using an old version of onnx-tf.
Could you try the master branch?

@rumschuettel (Author) commented Sep 6, 2018

Ok, I installed (in a fresh conda environment) tf-nightly, onnx from source, and onnx-tf from source; the graph after onnx.load looks identical to the one printed above, but the error now occurs somewhere else:

Fail to get since_version of Expand in domain `` with max_inclusive_version=7. Set to 1.
Traceback (most recent call last):
  File "/home/foo/opt/anaconda5/envs/nlp-onnx/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1627, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 2 but is rank 3 for 'MatMul' (op: 'MatMul') with input shapes: [100,1,200], [200,27].

During handling of the above exception, another exception occurred:

  File "./onnx2tf.py", line 28, in <module>
    tf_rep = prepare(model)
  File "/home/foo/opt/onnx-tensorflow/onnx_tf/backend.py", line 76, in prepare
    return cls.onnx_model_to_tensorflow_rep(model, strict)
  File "/home/foo/opt/onnx-tensorflow/onnx_tf/backend.py", line 87, in onnx_model_to_tensorflow_rep
    return cls._onnx_graph_to_tensorflow_rep(model.graph, model.opset_import, strict)
  File "/home/foo/opt/onnx-tensorflow/onnx_tf/backend.py", line 141, in _onnx_graph_to_tensorflow_rep
    onnx_node, tensor_dict, handlers, opset=opset, strict=strict)
  File "/home/foo/opt/onnx-tensorflow/onnx_tf/backend.py", line 236, in _onnx_node_to_tensorflow_op
    return handler.handle(node, tensor_dict=tensor_dict, strict=strict)
  File "/home/foo/opt/onnx-tensorflow/onnx_tf/handlers/handler.py", line 60, in handle
    return ver_handle(node, **kwargs)
  File "/home/foo/opt/onnx-tensorflow/onnx_tf/handlers/backend/mat_mul.py", line 14, in version_1
    return [cls.make_tensor_from_onnx_node(node, **kwargs)]
  File "/home/foo/opt/onnx-tensorflow/onnx_tf/handlers/backend_handler.py", line 111, in make_tensor_from_onnx_node
    return cls._run_tf_func(tf_func, inputs, attrs)
  File "/home/foo/opt/onnx-tensorflow/onnx_tf/handlers/backend_handler.py", line 180, in _run_tf_func
    **dict([(p, attrs[p]) for p in params if p in attrs]))
  File "/home/foo/opt/anaconda5/envs/nlp-onnx/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 2053, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/home/foo/opt/anaconda5/envs/nlp-onnx/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4560, in mat_mul
    name=name)
  File "/home/foo/opt/anaconda5/envs/nlp-onnx/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/foo/opt/anaconda5/envs/nlp-onnx/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/foo/opt/anaconda5/envs/nlp-onnx/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3273, in create_op
    op_def=op_def)
  File "/home/foo/opt/anaconda5/envs/nlp-onnx/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1791, in __init__
    control_input_ops)
  File "/home/foo/opt/anaconda5/envs/nlp-onnx/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1630, in _create_c_op
    raise ValueError(str(e))
ValueError: Shape must be rank 2 but is rank 3 for 'MatMul' (op: 'MatMul') with input shapes: [100,1,200], [200,27].

The only MatMul layer in the graph is towards the end, where we have

  %121 = Slice[axes = [0], ends = [4], starts = [3]](%2)
  %122, %123, %124 = LSTM[hidden_size = 200](%99, %117, %118, %119, %, %120, %121)
  %125 = Squeeze[axes = [1]](%122)
  %126 = Concat[axis = 0](%45, %71, %97, %123)
  %127 = Concat[axis = 0](%46, %72, %98, %124)
  %128 = Transpose[perm = [1, 0]](%19)
  %129 = MatMul(%125, %128)
  %130 = Add(%129, %20)
  %131 = LogSoftmax[axis = 2](%130)
  return %131, %126, %127
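For what it's worth, the rank-3 × rank-2 product here is legal in ONNX terms, since ONNX MatMul behaves like numpy.matmul and broadcasts the rank-2 operand over the leading dimension; TensorFlow's low-level MatMul op, by contrast, only accepts rank-2 inputs. A numpy sketch with the shapes from the traceback:

```python
import numpy as np

a = np.zeros((100, 1, 200))   # LSTM output after Squeeze[axes=[1]]
b = np.zeros((200, 27))       # transposed weight matrix from the traceback

# ONNX MatMul follows numpy.matmul semantics, which broadcast
# the rank-2 operand over the leading dimension of the rank-3 one:
c = np.matmul(a, b)
print(c.shape)  # (100, 1, 27)
```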

So I dug a little deeper: the master branch of onnx-tf still says it only works with onnx@1.1.2, so I installed that version with pip install "onnx==1.1.2"; but that raised the same exception, plus some additional UserWarnings about unknown operations, so that's no improvement.

Any thoughts? Seems like we almost got to the root of the problem :) and I appreciate your help, thanks a lot!

/ J

@rumschuettel (Author) commented Sep 6, 2018

Ok, I fixed this the following way:

Instead of calling my model with a batch size of 1, I don't give it a batch at all, just a single vector. Before feeding it into the LSTM, I unsqueeze(1) a fake batch dimension in; after the LSTM, and before the following linear layers, I squeeze() the dimension away again.

That works.
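A minimal sketch of the workaround (the model here is hypothetical, with dimensions inferred from the traceback, not my actual code):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    # Hypothetical model mirroring the shapes in the issue:
    # hidden_size=200, output dim 27.
    def __init__(self, in_dim=27, hidden=200):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden)
        self.fc = nn.Linear(hidden, in_dim)

    def forward(self, x):        # x: (seq_len, in_dim) -- no batch dim at all
        x = x.unsqueeze(1)       # (seq_len, 1, in_dim): fake batch of size 1
        out, _ = self.lstm(x)    # (seq_len, 1, hidden)
        out = out.squeeze(1)     # (seq_len, hidden): drop the fake batch again
        return self.fc(out)

net = Net()
y = net(torch.zeros(100, 27))
print(y.shape)  # torch.Size([100, 27])
```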

Thanks a lot for your help anyways! If this is a bug/something to be improved in ONNX let me know if I can be of help.

/ J

@rumschuettel (Author) commented Sep 6, 2018

Just as a comment: the error

Fail to get since_version of Expand in domain `` with max_inclusive_version=7. Set to 1.

still occurs, in case you want to check that bit.

rumschuettel reopened this Sep 6, 2018

@fumihwh (Contributor) commented Sep 6, 2018

@rumschuettel
It's not an error (or at least it will not affect the backend, onnx -> tensorflow).
The reason you get this is that your ONNX model's opset is 7. We cannot find the Expand schema because Expand was only added in opset 8.
