Conv2DTranspose - error due to Slice operation with None shape #801

carlosgalvezp · 2020-02-12T14:02:52Z

Hi!

I ran into an issue when converting a frozen network to ONNX. The problem comes from Conv2DTranspose (by the way, is it officially supported? I can't find it in the list). Here is the error, running from this PR: #797

2020-02-12 14:58:02,409 - INFO - Using tensorflow=1.14.0, onnx=1.6.0, tf2onnx=1.6.0/82f805
2020-02-12 14:58:02,409 - INFO - Using opset <onnx, 11>
2020-02-12 14:58:03,438 - ERROR - Failed to convert node conv_transposed/BiasAdd
OP=Add
Name=conv_transposed/BiasAdd
Inputs:
	Slice__1179:0=Slice, None, 1
	conv_transposed/bias:0=Const, [16], 1
Outpus:
	conv_transposed/BiasAdd:0=[1, 16, 152, 240], 1
Traceback (most recent call last):
  File "tf2onnx/tfonnx.py", line 354, in tensorflow_onnx_mapping
    func(g, node, **kwargs)
  File "tf2onnx/onnx_opset/nn.py", line 422, in version_7
    new_broadcast_shape = [shape1[0]] + [1] * (len(shape0) - 2)
TypeError: object of type 'NoneType' has no len()

The problem comes from Slice__1179:0 having None shape. This node is not part of the frozen model; instead it is inserted by tf2onnx (I believe) here:

https://github.com/onnx/tensorflow-onnx/blob/master/tf2onnx/onnx_opset/nn.py#L288

That slice_node variable has shape None when created.

Why is that happening? Let me know if I can provide more information to help figure this out.
Thanks!


jignparm · 2020-02-12T22:35:24Z

PR #797 should not impact the conv_transposed/BiasAdd node.

Can you share the model? Also, are you able to reproduce the error without the changes in the PR?

jignparm · 2020-02-12T22:59:40Z

I see the comment from issue #790 -- looks like you are not able to share the model :/

Can you re-create a smaller model with similar behavior and upload?

It looks like the shape information for input[0] is missing (quite odd). You should be able to set a breakpoint in TF2ONNX and inspect the ctx object to see if the input exists or not. It seems like the input node does not exist (and hence no shape), but would require a debugger to verify.

            shape0 = ctx.get_shape(node.input[0])  #<-- here
            shape1 = ctx.get_shape(node.input[1])
            if node.inputs[1].type == 'Const' and len(shape1) == 1:
                new_broadcast_shape = [shape1[0]] + [1] * (len(shape0) - 2)
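To make the failure mode concrete, here is a minimal standalone sketch of the broadcast-shape computation from the snippet above with a None guard added. The function name bias_broadcast_shape is hypothetical and this is not the actual tf2onnx code or its eventual fix; it only shows why len(shape0) raises when shape inference returned None for input[0].

```python
from typing import List, Optional


def bias_broadcast_shape(shape0: Optional[List[int]],
                         shape1: List[int]) -> Optional[List[int]]:
    """Hypothetical helper: shape a 1-D bias must be reshaped to for NCHW broadcasting.

    Returns None when the shape of input 0 is unknown instead of calling
    len() on None, which is what triggers the TypeError in the traceback.
    """
    if shape0 is None:
        # Shape inference failed upstream; the caller has to handle this case.
        return None
    return [shape1[0]] + [1] * (len(shape0) - 2)


print(bias_broadcast_shape([1, 16, 152, 240], [16]))  # [16, 1, 1]
print(bias_broadcast_shape(None, [16]))               # None
```

With the shapes from the log, a known input shape of [1, 16, 152, 240] gives [16, 1, 1], while the None shape reported for the Slice output is returned as None rather than crashing.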

carlosgalvezp · 2020-02-13T09:13:29Z

Hi,
I will try to recreate a smaller model reproducing the behavior. Meanwhile I can share the debugging process.

First of all, here's the log with more verbose output:

2020-02-13 10:01:24,499 - DEBUG - tf2onnx.graph: Making node: Name=conv_transposed/conv2d_transpose_const_one_two__1178, OP=Const
2020-02-13 10:01:24,500 - DEBUG - tf2onnx.graph: Made node: conv_transposed/conv2d_transpose_const_one_two__1178
OP=Const
Name=conv_transposed/conv2d_transpose_const_one_two__1178
Outpus:
        conv_transposed/conv2d_transpose_const_one_two__1178=None, 7
2020-02-13 10:01:24,500 - DEBUG - tf2onnx.graph: Making node: Name=Slice__1179, OP=Slice
2020-02-13 10:01:24,500 - DEBUG - tf2onnx.graph: Infer shape and dtype for [Slice__1179]
2020-02-13 10:01:24,501 - DEBUG - tf2onnx.graph: Set dtype of [Slice__1179:0] to 1
2020-02-13 10:01:24,501 - DEBUG - tf2onnx.graph: Inferred shape for [Slice__1179, type: Slice] is None, SKIP
2020-02-13 10:01:24,501 - DEBUG - tf2onnx.graph: Made node: Slice__1179
OP=Slice
Name=Slice__1179
Inputs:
        conv_transposed/conv2d_transpose:0=ConvTranspose, [1, 16, 152, 240], 1
        Concat__1176:0=Concat, [2], 7
        Concat__1177:0=Concat, [2], 7
        conv_transposed/conv2d_transpose_const_one_two__1178=Const, [2], 7
Outpus:
        Slice__1179:0=None, 1
2020-02-13 10:01:24,529 - DEBUG - tf2onnx.tfonnx: Process node: conv_transposed/BiasAdd
OP=BiasAdd
Name=conv_transposed/BiasAdd
Inputs:
        Slice__1179:0=Slice, None, 1
        conv_transposed/bias:0=Const, [16], 1
Outpus:
        conv_transposed/BiasAdd:0=[1, 16, 152, 240], 1
2020-02-13 10:01:24,529 - ERROR - tf2onnx.tfonnx: Failed to convert node conv_transposed/BiasAdd
OP=Add
Name=conv_transposed/BiasAdd
Inputs:
        Slice__1179:0=Slice, None, 1
        conv_transposed/bias:0=Const, [16], 1
Outpus:
        conv_transposed/BiasAdd:0=[1, 16, 152, 240], 1
Traceback (most recent call last):
  File "tf2onnx/tfonnx.py", line 354, in tensorflow_onnx_mapping
    func(g, node, **kwargs)
  File "tf2onnx/onnx_opset/nn.py", line 422, in version_7
    new_broadcast_shape = [shape1[0]] + [1] * (len(shape0) - 2)
TypeError: object of type 'NoneType' has no len()

I notice two things:

Outpus:
        conv_transposed/conv2d_transpose_const_one_two__1178=None, 7

2020-02-13 10:01:24,501 - DEBUG - tf2onnx.graph: Inferred shape for [Slice__1179, type: Slice] is None, SKIP

Why does the Slice node get None shape?!

Now, the step-by-step debugging:

input[0] is not a node that we have in the TensorFlow graph. It's a node (Slice) that was inserted by tf2onnx, as I mentioned in the previous comment: https://github.com/onnx/tensorflow-onnx/blob/master/tf2onnx/onnx_opset/nn.py#L288:

            slice_node = ctx.make_node("Slice",
                                       [node.output[0], starts.output[0], ends.output[0], const_one_two.output[0]])

With the following parameters:

node.output[0] = conv_transposed/conv2d_transpose:0 (from TensorFlow model)
starts.output[0] = Concat__1176:0 (presumably inserted by tf2onnx)
ends.output[0] = Concat__1177:0 (presumably inserted by tf2onnx)
const_one_two.output[0] = conv_transposed/conv2d_transpose_const_one_two__1178 (presumably inserted by tf2onnx)

From make_node we go here: https://github.com/onnx/tensorflow-onnx/blob/master/tf2onnx/graph.py#L545

        if (not shapes or not dtypes) and infer_shape_dtype:
            self.update_node_shape_dtype(node, override=False)

In this case, both shapes and dtypes are None, and infer_shape_dtype=True.

Getting inside update_node_shape_dtype, we go to: https://github.com/onnx/tensorflow-onnx/blob/master/tf2onnx/graph.py#L657

        shapes, dtypes = infer_onnx_shape_dtype(node, self._opset, input_shapes, input_dtypes, initializers)
        if not shapes or not dtypes:
            return

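This function returns shapes = [None], dtypes = [1]. Since [None] is a non-empty list, it is truthy, so "not shapes" is False, the if condition evaluates to False, and the function doesn't return early even though the only shape in the list is unknown.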
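The truthiness behavior described here is easy to check in plain Python. The sketch below is independent of tf2onnx; has_known_shapes is a hypothetical helper name used only to show a stricter check that also looks at the list's elements.

```python
from typing import List, Optional, Sequence


def has_known_shapes(shapes: Optional[Sequence[Optional[List[int]]]]) -> bool:
    """Hypothetical stricter check: True only if there is at least one shape
    and none of the shapes is None."""
    return bool(shapes) and all(s is not None for s in shapes)


shapes = [None]  # what the inference call returned in this case

# A non-empty list is truthy even if its only element is None,
# so a guard written as "if not shapes" does not trigger.
print(not shapes)                              # False
print(has_known_shapes(shapes))                # False
print(has_known_shapes([[1, 16, 152, 240]]))   # True
```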

Hope this extra debugging information helps figure out the issue. Otherwise I'll try to come up with a reproducible sample :)

Thanks a lot!

carlosgalvezp · 2020-02-13T11:53:58Z

Hi again!

Got a reproducible model for you :) It simply contains the Conv2DTranspose. I attach the script to generate it as well as the protobuf. The error is the same as before:

python -m tf2onnx.convert --input model_tf_test.pb --inputs Placeholder:0 --outputs conv_transposed/BiasAdd:0 --opset 11

2020-02-13 12:49:21,351 - INFO - Using tensorflow=1.14.0, onnx=1.6.0, tf2onnx=1.6.0/82f805
2020-02-13 12:49:21,351 - INFO - Using opset <onnx, 11>
2020-02-13 12:49:21,374 - ERROR - Failed to convert node conv_transposed/BiasAdd
OP=Add
Name=conv_transposed/BiasAdd
Inputs:
	Slice__50:0=Slice, None, 1
	conv_transposed/bias:0=Const, [16], 1
Outpus:
	conv_transposed/BiasAdd_raw_output___3:0=[1, 16, 152, 240], 1
Traceback (most recent call last):
  File "tf2onnx/tfonnx.py", line 354, in tensorflow_onnx_mapping
    func(g, node, **kwargs)
  File "tf2onnx/onnx_opset/nn.py", line 422, in version_7
    new_broadcast_shape = [shape1[0]] + [1] * (len(shape0) - 2)
TypeError: object of type 'NoneType' has no len()

conv2d_transpose_bug.zip

Thanks!

jignparm · 2020-02-14T01:47:21Z

What version of TF are you using?

Using your script to generate a TF model (call it Model B), I was able to convert the pb model to onnx successfully, using TF 1.14, using the master branch of TF2ONNX.

However, the original model you shared (call it Model A) shows the same error that you posted above.

Model A looks different from Model B (see below). The same script generated 2 different looking models ( ? ) -- are you using TF 2.0? Can you try switching to TF 1.14?

EDIT -- ideally, the TF version should not matter, since TF2ONNX operates on the graph directly. However, the different graphs seem to indicate 2 different implementations of the Conv2DTranspose operator. It's possible that one of the implementations is not correct and was patched.

Model A

Model B

Conversion log for Model B

$ python -m tf2onnx.convert --input model_tf_test.pb --inputs Placeholder:0 --outputs conv_transposed/BiasAdd:0 --opset 11 --output converted.onnx
...
...
2020-02-14 01:40:44,580 - INFO - Using tensorflow=1.14.0, onnx=1.6.0, tf2onnx=1.6.0/82f805
2020-02-14 01:40:44,580 - INFO - Using opset <onnx, 11>
2020-02-14 01:40:44,636 - INFO - Optimizing ONNX model
2020-02-14 01:40:44,730 - INFO - After optimization: Const -53 (65->12), Gather +1 (0->1), Identity -1 (1->0), Reshape -1 (1->0), Squeeze -5 (7->2), Transpose -2 (4->2), Unsqueeze -6 (8->2)
2020-02-14 01:40:44,733 - INFO -
2020-02-14 01:40:44,733 - INFO - Successfully converted TensorFlow model model_tf_test.pb to ONNX
2020-02-14 01:40:44,734 - INFO - ONNX model is saved at converted.onnx

carlosgalvezp · 2020-02-14T08:09:37Z

Wow, that's very strange! Here's my TensorFlow version:

$ pip freeze | grep tensorflow 
tensorflow-estimator==1.14.0
tensorflow-gpu==1.14.0

Are you using the CPU or GPU version?

I'm verifying checksums but unfortunately the models seem to contain some time information, since the checksum varies every time I generate the model. I'll check the models with your tools as well.
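Since checksums differ between runs, one alternative is to compare the two frozen graphs by how often each op type occurs. The sketch below assumes the op types have already been extracted from each GraphDef (for example with [n.op for n in graph_def.node]); the helper name diff_op_counts and the two op lists are made up for illustration and are not taken from the attached models.

```python
from collections import Counter
from typing import Dict, Iterable


def diff_op_counts(ops_a: Iterable[str], ops_b: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: per-op-type count difference (graph B minus graph A).

    Op types with the same count in both graphs are omitted.
    """
    counts = Counter(ops_b)
    counts.subtract(Counter(ops_a))
    return {op: n for op, n in counts.items() if n != 0}


# Made-up op lists standing in for two frozen graphs built from the same script.
ops_a = ["Placeholder", "Conv2DBackpropInput", "BiasAdd"]
ops_b = ["Placeholder", "Shape", "StridedSlice", "Conv2DBackpropInput", "BiasAdd"]

print(diff_op_counts(ops_a, ops_b))  # {'Shape': 1, 'StridedSlice': 1}
```

A non-empty result means the two graphs differ structurally, regardless of any timestamps embedded in the files.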

My current commands are:

python3 conv2d_transpose_bug.py 
python3 -m tf2onnx.convert --input model_tf_test.pb --inputs Placeholder:0 --outputs conv_transposed/BiasAdd:0 --opset 11 --output converted.onnx

This is tested on latest tf2onnx master (3b4f375)

Just in case, I'm also using Python 3.6.9.

carlosgalvezp · 2020-02-14T08:20:07Z

Bingo, I uninstalled the GPU version and installed the CPU one, and it works now!

Not sure what to make of this though :) We do have some layers in our network that only have a GPU implementation, so I guess we still want to freeze our graph using the GPU version of TF?

jignparm · 2020-02-14T18:01:25Z

Good to know the difference is from CPU vs GPU implementation. A graph should be able to run on any hardware however. I'll investigate to see why the conversion is failing for the GPU version of the graph.

guschmue · 2020-02-14T18:09:57Z

We do remove all device information from the model so in theory gpu should not have an impact (and I never saw an issue). But I can take a closer look how we could get here.

Using the gpu (which really doesn't help us) has the disadvantage that another app might be using the memory, in which case tf2onnx would fail without good reason. I have a bug to bind tf2onnx to cpu here: #606 which I should fix.

jignparm · 2020-02-14T23:26:36Z

so in theory gpu should not have an impact

Thanks @guschmue . The Onnx graphs are agnostic of the hardware, so every graph runs on every hardware to produce exactly the same results. TF graphs should be similar (I have not seen any counter-example so far).

It seems bug #606 is running unit tests and pre-trained model tests, where tf2onnx should be forced to run on cpu.

In this case though, the graph generated by the TF-GPU package is different from the one generated by the TF-CPU package when using the exact same Python code. In theory, it seems like both should be convertible to Onnx (but only 1 converts successfully).

guschmue · 2020-02-14T23:37:06Z

#606 is for example: you have a model in a jupyter notebook and you convert the model in the same notebook ... depending on how you write the notebook a session might be still open and the gpu memory could be allocated to the model ... tf2onnx would fail.

In tensorflow - the model is aware of the device various nodes run on so it is possible that some optimizer kicks in and applies device specific optimizations.
It's relatively easy to fix by just adding with tf.device("/device:CPU:0") when we open the session in convert.py. For unit tests, the tf2 branch has a single function tf_session() and we can do that in there so it applies to all places where we open a session.
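For anyone who wants to keep the conversion off the GPU without changing tf2onnx itself, a different technique from the one proposed above is to hide the CUDA devices from the process before TensorFlow is imported. This is a minimal sketch; hide_gpus is a hypothetical helper name, and the assumption is that setting CUDA_VISIBLE_DEVICES to "-1" before the import leaves TensorFlow with only the CPU.

```python
import os


def hide_gpus() -> None:
    """Hypothetical helper: hide all CUDA devices from this process.

    Must run before TensorFlow (or any other CUDA library) is imported,
    because the variable is read when the CUDA runtime initializes.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


hide_gpus()
# Import TensorFlow only after the GPUs have been hidden, e.g.:
# import tensorflow as tf
print(os.environ["CUDA_VISIBLE_DEVICES"])  # -1
```

The same effect can be had from the shell by setting the variable on the command line that runs python -m tf2onnx.convert.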

carlosgalvezp · 2020-02-17T16:21:28Z

A graph should be able to run on any hardware however.

That's not necessarily true, as there are some layers with some configurations that only have GPU implementation, right? For example we have a layer with "channels_first" configuration that only has GPU implementation, whereas "channels_last" has both implementations. Since it doesn't have CPU implementation it cannot be deployed into a protobuf because the model cannot be instantiated. I don't remember exactly what layer it was, will come back with that info.

jignparm · 2020-02-17T23:35:35Z

For example we have a layer with "channels_first" configuration that only has GPU implementation, whereas "channels_last" has both implementations.
... will come back with that info.

Thanks! -- an example in TF would be great (i.e. a graph that is valid only for GPU but not for CPU).

In ONNX, a graph is always valid for every type of hardware, if it consists of the standard set of operators. The operators will always have a CPU implementation, but may additionally contain a GPU implementation for acceleration (depending on the runtime). When a graph is run in a session, only the operators with a GPU implementation will execute on GPU, while the remaining operators execute on a CPU.

carlosgalvezp · 2020-02-18T08:35:12Z

Hi! Here's an example similar to the one I sent before, but using "Conv2D" instead of "Conv2DTranspose". When running on TF-CPU, I get the following error:

tensorflow.python.framework.errors_impl.UnimplementedError: Generic conv implementation only supports NHWC tensor format for now.
	 [[{{node conv_transposed/Conv2D}}]]

conv2d_gpu_only.zip

jignparm · 2020-02-18T09:31:44Z

Thanks for the sample script!

According to the documentation https://www.tensorflow.org/api_docs/python/tf/compat/v1/layers/conv2d, NCHW tensors (i.e. the option data_format='channels_first' in the attached script) should be supported, but the error message seems to indicate that the CPU implementation is missing for now (i.e. had it been implemented, the rest of the model graph would run fine).

This seems to be a rare case of an operator having a GPU implementation before its CPU implementation is finished, rather than a graph that is only valid for GPU. If the operator is updated to support NCHW tensors, the graph would run fine on CPU as well (i.e. the graph is valid, but the implementation of the operator on CPU is lacking as of this time).

In ONNX there is no such case (i.e. a CPU implementation is always present for every operator). It seems that in TF there is no such guarantee.

It seems like the GPU graph that you generated should convert to ONNX as well. I'll investigate further to see why it does not.

carlosgalvezp · 2020-02-18T14:29:04Z

Thanks for the help! Let me know if you need anything :)

carlosgalvezp · 2020-03-09T14:14:07Z

Hi! Did you make any progress on this one? Anything I can help with?

jignparm · 2020-03-12T00:02:12Z

@carlosgalvezp -- sorry for the delay. There's a shape inference error which requires a bit more debugging. Hopefully will resolve this shortly.

jignparm · 2020-03-24T20:00:09Z

@carlosgalvezp , can you use the master branch of tf2onnx to convert this model?

You should now be able to convert the model generated from the TensorFlow-GPU 1.14 package successfully, due to a combination of fixes. Also, I tested that the model loads in OnnxRuntime without any shape inference errors.

Feel free to close out this issue if the solution works for you.

carlosgalvezp · 2020-03-25T13:38:53Z

Works like a charm on latest master, thanks a lot!! I had to choose opset 11 for it to work, not sure if it's expected.

Closing the issue :)

Hyrtsi · 2022-03-16T13:35:26Z

I had the same issue. Downgrading tensorflow-gpu from 1.15 to 1.14 and tensorflow from 1.15 to 1.14 solved it for me.

carlosgalvezp changed the title ~~Conv2D transpose - error due to Slice operation with None shape~~ Conv2DTranspose - error due to Slice operation with None shape Feb 12, 2020

rmccorm4 mentioned this issue Mar 16, 2020

SSDMobilenet2ONNX Tutorial #842

Closed

carlosgalvezp closed this as completed Mar 25, 2020

carlosgalvezp mentioned this issue Apr 16, 2020

Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32 #883

Closed
