
cannot convert tf savedmodel to onnx #1287

Closed
zhaohb opened this issue Jan 22, 2021 · 32 comments
Labels
pending on user response Waiting for more information or validation from user

Comments

@zhaohb

zhaohb commented Jan 22, 2021

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): tf-nightly-gpu 2.5.0.dev20210119
Python version: 3.6 (Anaconda)
Tensorflow-onnx version: 1.8.0, built from source

My command line:

python -m tf2onnx.convert --saved-model ./model.savedmodel --output fea.onnx --custom-ops Bucketize,AsString,StringToHashBucketFast --signature_def serving_default --tag serve --opset 12 

But I got the following error:

......
2021-01-21 11:29:41,413 - ERROR - Could not find table resource to replace placeholder unknown_172
2021-01-21 11:29:41,415 - ERROR - Could not find table resource to replace placeholder unknown_174
2021-01-21 11:29:41,416 - ERROR - Could not find table resource to replace placeholder unknown_176
2021-01-21 11:29:41,417 - ERROR - Could not find table resource to replace placeholder unknown_178
2021-01-21 11:29:41,418 - ERROR - Could not find table resource to replace placeholder unknown_180
2021-01-21 11:29:41,418 - ERROR - Could not find table resource to replace placeholder unknown_183
2021-01-21 11:29:41,418 - ERROR - Could not find table resource to replace placeholder unknown_185
2021-01-21 11:29:41,418 - ERROR - Could not find table resource to replace placeholder unknown_187
2021-01-21 11:29:41,418 - ERROR - Could not find table resource to replace placeholder unknown_189
2021-01-21 11:29:41,418 - ERROR - Could not find table resource to replace placeholder unknown_193
2021-01-21 11:29:41,418 - ERROR - Could not find table resource to replace placeholder unknown_195
2021-01-21 11:29:41,419 - ERROR - Could not find table resource to replace placeholder unknown_197
......
tensorflow.python.framework.errors_impl.InvalidArgumentError: 'func' argument to TF_GraphCopyFunction cannot be null
Exception ignored in: <bound method CapturableResourceDeleter.__del__ of <tensorflow.python.training.tracking.tracking.CapturableResourceDeleter object at 0x7f70486cbcf8>>
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/tf2.2-n/lib/python3.6/site-packages/tensorflow/python/training/tracking/tracking.py", line 208, in __del__
    self._destroy_resource()
  File "/usr/local/anaconda3/envs/tf2.2-n/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 797, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/anaconda3/envs/tf2.2-n/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 841, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/usr/local/anaconda3/envs/tf2.2-n/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 695, in _initialize
    *args, **kwds))
  File "/usr/local/anaconda3/envs/tf2.2-n/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2981, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/usr/local/anaconda3/envs/tf2.2-n/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 3373, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/usr/local/anaconda3/envs/tf2.2-n/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 3218, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/usr/local/anaconda3/envs/tf2.2-n/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 998, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/usr/local/anaconda3/envs/tf2.2-n/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 603, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/usr/local/anaconda3/envs/tf2.2-n/lib/python3.6/site-packages/tensorflow/python/saved_model/function_deserialization.py", line 257, in restored_function_body
    return _call_concrete_function(function, inputs)
  File "/usr/local/anaconda3/envs/tf2.2-n/lib/python3.6/site-packages/tensorflow/python/saved_model/function_deserialization.py", line 75, in _call_concrete_function
    result = function._call_flat(tensor_inputs, function._captured_inputs)  # pylint: disable=protected-access
  File "/usr/local/anaconda3/envs/tf2.2-n/lib/python3.6/site-packages/tensorflow/python/saved_model/load.py", line 116, in _call_flat
    cancellation_manager)
  File "/usr/local/anaconda3/envs/tf2.2-n/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1944, in _call_flat
    flat_outputs = forward_function.call(ctx, args_with_tangents)
  File "/usr/local/anaconda3/envs/tf2.2-n/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 590, in call
    executor_type=executor_type)
  File "/usr/local/anaconda3/envs/tf2.2-n/lib/python3.6/site-packages/tensorflow/python/ops/functional_ops.py", line 1206, in partitioned_call
    f.add_to_graph(graph)
  File "/usr/local/anaconda3/envs/tf2.2-n/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 506, in add_to_graph
    g._add_function(self)
  File "/usr/local/anaconda3/envs/tf2.2-n/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3403, in _add_function
    gradient)

I want to get the ONNX model and am desperate for some advice. Thank you very much!

@TomWildenhain-Microsoft
Contributor

Hi @zhaohb, can you please upload a copy of the saved model?

@zhaohb
Author

zhaohb commented Jan 23, 2021

Hi @TomWildenhain-Microsoft, here is the Colab link: https://colab.research.google.com/drive/1wxu8piPR9qyAC8EjtDd6-STZqek77BO7?usp=sharing
I can also send the model file to you by email. Is that OK?

@TomWildenhain-Microsoft
Contributor

I have requested access to the Colab.

@zhaohb
Author

zhaohb commented Jan 28, 2021

@TomWildenhain-Microsoft, sorry for the late reply. I have added you to the user group.

@TomWildenhain-Microsoft
Contributor

Shoot, it looks like this Colab requires a tar file I don't have. Can you please zip the saved model directory you are trying to convert and upload it to Google Drive?

@zhaohb
Author

zhaohb commented Jan 29, 2021

Here is the model file link: https://drive.google.com/file/d/1OmfoxcalmJMpW3QyOXnFWkUygn58CTZe/view (I have added you to the user group).

@zhaohb
Author

zhaohb commented Jan 29, 2021

@TomWildenhain-Microsoft Were you able to reproduce the problem? I'm waiting for your reply.

@TomWildenhain-Microsoft
Contributor

Taking a look now.

@TomWildenhain-Microsoft
Contributor

The error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: 'func' argument to TF_GraphCopyFunction cannot be null
occurs during resource destruction and can be safely ignored. The bigger problem is the errors of the form:
2021-01-21 11:29:41,413 - ERROR - Could not find table resource to replace placeholder unknown_172

I'm not sure why it isn't finding the tables, but interestingly they are int64 to int64 tables, not the usual string to int64 tables, so currently we won't be able to convert them anyway.

Do you know why this model has int64 -> int64 hash tables? What type of model is this?

@TomWildenhain-Microsoft
Contributor

From the values, it looks like these tables are storing some sort of permutations. Can you change the model to use gather ops instead of tables?

LookupTableExportV2(keys=<tf.Tensor: shape=(31,), dtype=int64, numpy=
array([21,  2, 14, 26,  7, 19,  0, 12, 24,  5, 17, 29, 10, 22,  3, 15, 27,
        8, 20,  1, 13, 25,  6, 18, 30, 23, 11,  4, 28, 16,  9],
      dtype=int64)>, values=<tf.Tensor: shape=(31,), dtype=int64, numpy=
array([28,  4, 11, 29,  0, 10,  8, 15, 23,  1, 18, 26, 13, 20,  2, 17, 27,
        9, 21,  6, 12, 30,  7, 16, 25, 24, 14,  5, 22, 19,  3],
      dtype=int64)>)
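
For concreteness, here is a minimal sketch of the Gather-based replacement (hypothetical code; it relies on the keys above being a dense permutation of 0..30):

import numpy as np
import tensorflow as tf

# Keys/values taken from the LookupTableExportV2 dump above.
keys = np.array([21, 2, 14, 26, 7, 19, 0, 12, 24, 5, 17, 29, 10, 22, 3, 15,
                 27, 8, 20, 1, 13, 25, 6, 18, 30, 23, 11, 4, 28, 16, 9],
                dtype=np.int64)
values = np.array([28, 4, 11, 29, 0, 10, 8, 15, 23, 1, 18, 26, 13, 20, 2, 17,
                   27, 9, 21, 6, 12, 30, 7, 16, 25, 24, 14, 5, 22, 19, 3],
                  dtype=np.int64)

# Build a dense array with table[key] == value, which is valid because the
# keys cover 0..30 exactly once.
table = np.empty_like(values)
table[keys] = values

# The table lookup then becomes a plain Gather, which ONNX supports natively.
queries = tf.constant([0, 7, 30], dtype=tf.int64)
result = tf.gather(tf.constant(table), queries)  # -> [8, 0, 25]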

@zhaohb
Author

zhaohb commented Jan 30, 2021

Thank you for your reply. I also think the most important error is:

 ERROR - Could not find table resource to replace placeholder unknown_172

But I can't change the model now, so should we add a feature to tf2onnx to fix this bug? The table resource probably can't be found because the resource type is used. I think this bug is going to be very common.

@TomWildenhain-Microsoft
Contributor

@MoFHeka Yes, I think that could work, though the hard part is getting the data out of the tables first. I don't know how to read the tables if I don't know their key/value types.

@TomWildenhain-Microsoft
Contributor

Normally we can find the initializer for the table by looking at the imported saved model, but I can't find it in this particular instance. I'm not sure if it is in the Python object, or if TensorFlow has destroyed the data after loading it.

@TomWildenhain-Microsoft
Contributor

I'm able to get the data out of the table if I know the type in advance, which I do in this case. However, if I guess the type incorrectly, TensorFlow aborts.
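
A minimal sketch of that extraction (hypothetical; `table_handle` stands for the table's captured resource tensor, and the int64/int64 dtypes are the guess that has to be right):

import tensorflow as tf

# Export the table via the raw op; Tkeys/Tvalues must match the table's
# actual dtypes, otherwise TensorFlow aborts the process.
keys, values = tf.raw_ops.LookupTableExportV2(
    table_handle=table_handle, Tkeys=tf.int64, Tvalues=tf.int64)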

@TomWildenhain-Microsoft
Contributor

Worst case, we could dig into the protobuf of the saved model directly to find the initializers, but it would be much cleaner if we could get them out of the imported SavedModel Python object.
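
A sketch of that worst-case route, parsing saved_model.pb directly (assuming the table initializers show up as LookupTableImportV2 nodes in the function library, which is how tf.lookup tables are typically restored):

from tensorflow.core.protobuf import saved_model_pb2

sm = saved_model_pb2.SavedModel()
with open("model.savedmodel/saved_model.pb", "rb") as f:
    sm.ParseFromString(f.read())

# Table-import nodes reference the Const nodes holding the keys and values.
for func in sm.meta_graphs[0].graph_def.library.function:
    for node in func.node_def:
        if node.op == "LookupTableImportV2":
            print(func.signature.name, node.name, list(node.input))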

@zhaohb
Author

zhaohb commented Jan 30, 2021

@TomWildenhain-Microsoft https://drive.google.com/file/d/1dNhMOn9h7FtcuLSMXFK9AhBRdU1NitPg/view?usp=sharing
This is a Keras model; we can verify whether we can get the tables from the H5 model.

@TomWildenhain-Microsoft
Contributor

I used a bit of a hacky method, but it should work for this model. I'm not merging it yet, but try converting the model using this branch: #1310

@zhaohb
Author

zhaohb commented Feb 2, 2021

@TomWildenhain-Microsoft OK, thank you very much. I will close this issue.

@TomWildenhain-Microsoft
Contributor

Don't close it just yet. I got a model, but I don't have data to test it. Let me know if you can convert it, and once you do, whether the results are correct and fast enough. I converted the tables by casting the int keys to strings, which might cause a slowdown. If so, I can make a better conversion using Gather ops.

@zhaohb zhaohb closed this as completed Feb 2, 2021
@zhaohb zhaohb reopened this Feb 2, 2021
@zhaohb
Author

zhaohb commented Feb 2, 2021

OK, I have reopened it and will test it as soon as possible.

@zhaohb
Author

zhaohb commented Feb 3, 2021

@TomWildenhain-Microsoft I have tested that branch, and now I can get the ONNX model, which is great. But when I went to implement the Bucketize op (onnxruntime does not implement Bucketize, so it must be provided as a custom op), I found that the dtypes of the Bucketize inputs are abnormal. Some are float32, but others are None:

import onnx_graphsurgeon as gs
import numpy as np
import onnx

graph = gs.import_onnx(onnx.load("fea.onnx"))

# Collect every Bucketize node and print the dtype of its first input.
bucketizes = [node for node in graph.nodes if node.op == "Bucketize"]
for item in bucketizes:
    item_type = item.inputs[0].dtype
    print(item.op, " : ", item_type)

output:

Bucketize  :  float32
Bucketize  :  float32
......
Bucketize  :  float32
Bucketize  :  float32
Bucketize  :  float32
Bucketize  :  float32
Bucketize  :  float32
Bucketize  :  float32
Bucketize  :  float32
Bucketize  :  float32
Bucketize  :  float32
Bucketize  :  float32
Bucketize  :  None
Bucketize  :  float32
Bucketize  :  None
Bucketize  :  None
Bucketize  :  float32
Bucketize  :  None
Bucketize  :  None
Bucketize  :  None
Bucketize  :  float32
Bucketize  :  None
Bucketize  :  float32
Bucketize  :  float32
Bucketize  :  float32
Bucketize  :  float32
Bucketize  :  None
Bucketize  :  None
Bucketize  :  float32
......

@TomWildenhain-Microsoft
Contributor

I think that's just because graph surgeon doesn't know how to do type inference for some of the custom ops in the graph, so it thinks the type is unknown, but really it is float32. The ONNX file only stores types for the input and output tensors; everything else is inferred.
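
You can see the same effect with ONNX's own shape inference, which propagates types until it hits an op it doesn't recognize. A minimal sketch (using the same fea.onnx):

import onnx
from onnx import shape_inference

model = onnx.load("fea.onnx")
# infer_shapes annotates intermediate tensors it can reason about; custom
# ops like Bucketize break the chain, leaving downstream dtypes unknown.
inferred = shape_inference.infer_shapes(model)
print(len(inferred.graph.value_info), "intermediate tensors annotated")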

@TomWildenhain-Microsoft
Contributor

For those Bucketize nodes, are the buckets constant or are they passed in as an input? If they are constant, I might be able to make a conversion for them.

@zhaohb
Author

zhaohb commented Feb 3, 2021

I've solved the Bucketize op problem and it works, but it's very slow, about 10x slower than TensorFlow itself. How would you optimize it? I also found that it cannot be executed in parallel.
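
For reference, TF's Bucketize assigns each element the number of boundaries less than or equal to it, which np.searchsorted reproduces directly. A minimal sketch of the kernel logic (independent of whichever custom-op registration mechanism is actually used):

import numpy as np

def bucketize(x, boundaries):
    # Bucket id = count of boundaries <= x, matching tf.raw_ops.Bucketize.
    return np.searchsorted(boundaries, x, side="right").astype(np.int32)

print(bucketize(np.array([1.0, 5.5, 11.0], dtype=np.float32),
                np.array([0.0, 3.0, 8.0, 11.0])))
# -> [1 2 4]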

@zhaohb
Author

zhaohb commented Feb 3, 2021

I've set up a test environment on Colab: tensorflow-onnx includes the modifications from your branch, onnxruntime has been recompiled, and the custom operators have been implemented in the ort-customops project. You can test the ONNX model in this environment.
The link is as follows:
https://colab.research.google.com/drive/1wxu8piPR9qyAC8EjtDd6-STZqek77BO7?usp=sharing

@zhaohb
Author

zhaohb commented Feb 3, 2021

@TomWildenhain-Microsoft The ONNX model used in the tests was simpler, but it was also converted from a SavedModel based on your branch, and the tests showed that the ONNX model was much slower than the saved model.

@TomWildenhain-Microsoft
Contributor

I'm not too surprised the perf is really bad, since this model seems to use a ton of ops that normally aren't very common and that we haven't optimized for (normally models have 1 or 2 table lookups; this model has hundreds). We can change those table lookups into Gather ops, which should be faster. Also, you can run ORT with profiling turned on so we can see where the slowdown is.
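
A minimal sketch of enabling ORT profiling (hypothetical names; `feed` stands for your input dict, and this ignores the custom-op library registration this particular model also needs):

import onnxruntime as ort

so = ort.SessionOptions()
so.enable_profiling = True  # emit a Chrome-trace JSON when profiling ends
sess = ort.InferenceSession("fea.onnx", so)
outputs = sess.run(None, feed)  # `feed`: input-name -> numpy array
print(sess.end_profiling())  # path of the generated profile JSON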

Out of curiosity, what does this model do and why are you converting it to onnx?

@zhaohb
Author

zhaohb commented Feb 4, 2021

@TomWildenhain-Microsoft I have turned on the profiling option and generated the corresponding JSON files, from which we can see that I/O still accounts for a large proportion of the time. The corresponding model is new_coarse.onnx, which I have shared with you.
JSON link:
https://drive.google.com/file/d/1CBXm6wPXHNRxUhM_OZeKqmGloGpX9_-V/view?usp=sharing
JSON with parallelism enabled:
https://drive.google.com/file/d/1CA-MPsMOzKjK4HfxDlUZlawbkzBlpBtN/view?usp=sharing
ONNX model link:
https://drive.google.com/file/d/1plOQS-aFPukrfeTFw2rBS3-64zU0A8h4/view?usp=sharing

This is a recommendation model; we did this conversion to speed up the model, but right now it's not working well.

@zhaohb
Author

zhaohb commented Feb 7, 2021

You mentioned we can make a better conversion using Gather ops. Will you support this change?

@TomWildenhain-Microsoft
Contributor

If you make the conversion and add tests for it, we will merge it into master and maintain it.

@guschmue
Collaborator

guschmue commented Apr 7, 2021

Assuming this is resolved.

@guschmue guschmue closed this as completed Apr 7, 2021