Cannot convert TF SavedModel to ONNX #1287
Comments
Hi @zhaohb, can you please upload a copy of the saved model?

Hi @TomWildenhain-Microsoft, Colab link: https://colab.research.google.com/drive/1wxu8piPR9qyAC8EjtDd6-STZqek77BO7?usp=sharing

I have requested access to the Colab.

@TomWildenhain-Microsoft Sorry for the late reply, I have added you to the user group.

Shoot, it looks like this colab requires a tar file I don't have. Can you please zip the saved model directory you are trying to convert and upload it to Google Drive?

Model file link: https://drive.google.com/file/d/1OmfoxcalmJMpW3QyOXnFWkUygn58CTZe/view. I have added you to the user group.

@TomWildenhain-Microsoft Have you had a chance to try the conversion? I'm waiting for your reply.

Taking a look now.
The error: I'm not sure why it isn't finding the tables, but interestingly they are int64-to-int64 tables, not the usual string-to-int64 tables, so currently we won't be able to convert them anyway. Do you know why this model has int64 -> int64 hash tables? What type of model is this?
From the values, it looks like these tables are storing some sort of permutations. Can you change the model to use Gather ops instead of tables?
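(For illustration, a minimal numpy sketch of that idea, assuming the table keys are a contiguous 0..n-1 range; the values below are made up. A dense lookup array indexed by the query tensor is exactly what the ONNX Gather op computes.)

```python
import numpy as np

# Hypothetical int64 -> int64 table storing a permutation (keys are 0..n-1).
keys = np.array([0, 1, 2, 3], dtype=np.int64)
values = np.array([2, 0, 3, 1], dtype=np.int64)

# Build a dense array so that dense[k] == the value stored under key k.
dense = np.empty_like(values)
dense[keys] = values

# A table lookup then becomes plain indexing, i.e. Gather(dense, queries).
queries = np.array([3, 1, 1], dtype=np.int64)
print(dense[queries])  # -> [1 0 0]
```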
Thank you for your reply. I also think the most important error is: `ERROR - Could not find table resource to replace placeholder unknown_172`. But I can't change the model right now, so should we add a feature to tf2onnx to fix this bug? The table resource probably can't be found because the resource type is used. I think this bug is going to be very common.
@MoFHeka Yes, I think that could work, though the hard part is getting the data out of the tables first. I don't know how to get the tables if I don't know their key/value types.
Normally we can find the initializer for the table by looking at the imported saved model, but I can't find it in this particular instance. I'm not sure if it is in the Python object, or if TensorFlow has destroyed the data after loading it.
I'm able to get the data out of the table if I know the type in advance, which I do in this case. However, if I guess the type incorrectly, TensorFlow aborts.
Worst case, we could jump into the protobuf of the saved model directly to find the initializers, but it would be much cleaner if we can get it out of the imported SavedModel Python object.
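(A rough sketch of that worst-case approach, assuming the tables are initialized by the standard LookupTableImportV2/InitializeTableV2 ops; the path is a placeholder. In TF2 saved models the initialization nodes may also live inside functions in `graph_def.library` rather than the main graph.)

```python
from tensorflow.core.protobuf import saved_model_pb2

# Parse saved_model.pb directly, bypassing tf.saved_model.load.
sm = saved_model_pb2.SavedModel()
with open("saved_model_dir/saved_model.pb", "rb") as f:  # placeholder path
    sm.ParseFromString(f.read())

# Table-initialization ops take the key/value constants as inputs,
# so their input names point at the data we want to extract.
graph_def = sm.meta_graphs[0].graph_def
for node in graph_def.node:
    if node.op in ("LookupTableImportV2", "InitializeTableV2"):
        print(node.name, "<-", list(node.input))
```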
@TomWildenhain-Microsoft https://drive.google.com/file/d/1dNhMOn9h7FtcuLSMXFK9AhBRdU1NitPg/view?usp=sharing
I used a bit of a hacky method, but it should work for this model. I'm not merging it, but try converting the model using this branch: #1310
@TomWildenhain-Microsoft OK, thank you very much, I will close this issue.
Don't close it just yet. I got a model, but I don't have data to test it. Let me know if you can convert it, and once you do, whether the results are correct and fast enough. I converted the tables by casting the int keys to strings, which might cause a slowdown. If so, I can make a better conversion using Gather ops.
OK, I have reopened it and will test it as soon as possible.
@TomWildenhain-Microsoft I have tested that branch, and now I can get an ONNX model, which is great. But when I want to implement the Bucketize op (onnxruntime does not implement Bucketize, so it must be a custom op), I find that the dtypes of the Bucketize inputs are abnormal. Some are float32, but others are None:

```python
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("fea.onnx"))
bucketizes = [node for node in graph.nodes if node.op == "Bucketize"]
for item in bucketizes:
    item_type = item.inputs[0].dtype
    print(item.op, " : ", item_type)
```

Output:

```
Bucketize : float32
Bucketize : float32
......
Bucketize : float32
Bucketize : float32
Bucketize : float32
Bucketize : float32
Bucketize : float32
Bucketize : float32
Bucketize : float32
Bucketize : float32
Bucketize : float32
Bucketize : float32
Bucketize : None
Bucketize : float32
Bucketize : None
Bucketize : None
Bucketize : float32
Bucketize : None
Bucketize : None
Bucketize : None
Bucketize : float32
Bucketize : None
Bucketize : float32
Bucketize : float32
Bucketize : float32
Bucketize : float32
Bucketize : None
Bucketize : None
Bucketize : float32
......
```
I think that's just because graph surgeon doesn't know how to do type inference for some of the custom ops in the graph, so it thinks the type is unknown, but really it is float32. The ONNX file only stores types for the graph's input and output tensors; everything else is inferred.
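(As a quick sanity check, ONNX's own type inference shows the same behavior, since it also cannot see through custom ops; a minimal sketch:)

```python
import onnx
from onnx import shape_inference

model = onnx.load("fea.onnx")
# Propagates types/shapes through ops ONNX knows about; it stops at custom
# ops like Bucketize, so tensors downstream of them are left untyped.
inferred = shape_inference.infer_shapes(model)

known = {vi.name for vi in inferred.graph.value_info
         if vi.type.tensor_type.elem_type != 0}  # 0 == UNDEFINED
print(len(known), "intermediate tensors have an inferred type")
```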
For those Bucketize nodes, are the buckets constant or are they passed in as an input? If they are constant, I might be able to make a conversion for them.
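(If they turn out to be constant, one standard way to lower Bucketize to built-in ops is the comparison-and-sum trick: the bucket index is the number of boundaries that are <= the input. A numpy sketch of the math, matching tf.raw_ops.Bucketize semantics as I understand them; in ONNX this maps to a broadcast GreaterOrEqual followed by ReduceSum.)

```python
import numpy as np

def bucketize(x, boundaries):
    # Bucket index = number of boundaries <= x,
    # i.e. np.searchsorted(boundaries, x, side="right").
    b = np.asarray(boundaries, dtype=x.dtype)
    return (x[..., None] >= b).sum(axis=-1).astype(np.int64)

x = np.array([-5.0, 10.0, 150.0], dtype=np.float32)
print(bucketize(x, [0.0, 10.0, 100.0]))  # -> [0 2 3]
```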
I've solved the Bucketize op problem, and it works, but it's very slow, about 10x slower than TensorFlow itself. How do you optimize it? I also found that it cannot be executed in parallel.
I've set up a test environment on Colab: tensorflow-onnx includes the modifications from your branch, onnxruntime has been recompiled, and the custom operators have been implemented in the ort-customops project. You can test the ONNX model in this environment.
@TomWildenhain-Microsoft The ONNX model used in the tests was simpler, but it was also converted from a SavedModel using your branch, and the tests showed that the ONNX model was much slower than the saved model.
I'm not too surprised the perf is really bad, since this model uses a ton of ops that normally aren't very common and that we haven't optimized for (normally models have 1 or 2 table lookups; this model has hundreds). We can change those table lookups into Gather ops, which should be faster. Also, you can run ORT with profiling turned on and we can see where the slowdown is. Out of curiosity, what does this model do, and why are you converting it to ONNX?
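(For reference, a minimal sketch of enabling ORT profiling; the model path is from this thread, and the zero-filled feeds are a placeholder that assumes float32 inputs, so substitute real data:)

```python
import numpy as np
import onnxruntime as ort

so = ort.SessionOptions()
so.enable_profiling = True              # emit a JSON trace for this session
# If the model uses custom ops, also call
# so.register_custom_ops_library(path_to_custom_ops_lib).
sess = ort.InferenceSession("new_coarse.onnx", sess_options=so)

# Placeholder feeds: zero-filled arrays, symbolic dims replaced with 1.
feeds = {
    i.name: np.zeros([d if isinstance(d, int) else 1 for d in i.shape],
                     dtype=np.float32)  # assumption: float32 inputs
    for i in sess.get_inputs()
}
sess.run(None, feeds)

trace = sess.end_profiling()            # writes onnxruntime_profile_*.json
print("Open", trace, "in chrome://tracing to see per-op timings.")
```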
@TomWildenhain-Microsoft I have turned on the profiling option and generated the corresponding JSON file, from which we can see that IO still accounts for a large proportion. The corresponding model is new_coarse.onnx, which I have shared with you. This is a recommendation model; we did this conversion to speed up the model, but right now it's not working well.
> we can make a better conversion using Gather ops.

Will this change be supported?
If you make the conversion and add tests for it, we will merge it into master and maintain it.
To debug the poor performance, try using this script:
Assume this is resolved.
System information

- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): tf-nightly-gpu 2.5.0.dev20210119
- Python version: 3.6 (Anaconda)
- Tensorflow-onnx version: 1.8.0, built from source
My command line:
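(The actual command was not captured here. For reference, a typical tf2onnx invocation for a SavedModel looks like the following; the paths and opset are placeholders:)

```bash
python -m tf2onnx.convert --saved-model ./my_saved_model --output model.onnx --opset 13
```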
But I got the following error:

I want to get the ONNX model and am desperate for some advice! Thank you very much.