
[ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Error in Node:model/multi_category_encoding/AsString : No Op registered for AsString with domain_version of 9 #1645

Closed
hanzigs opened this issue Aug 5, 2021 · 92 comments · Fixed by #1654

Comments

hanzigs commented Aug 5, 2021

The code below works perfectly when run in a plain python file (python==3.9.5, tensorflow==2.5.0, keras2onnx==1.7.0, onnxruntime==1.8.0, keras==2.4.3, tf2onnx==1.9.1):

autoKeras_model = StructuredDataClassifier(max_trials=MaxTrials)
autoKeras_model.fit(x=X_train, y=y_train, validation_data=(X_valid, y_valid), epochs=Epochs, verbose=1)
ExportedautoKeras_model = autoKeras_model.export_model()

onnx_model, _ = tf2onnx.convert.from_keras(ExportedautoKeras_model)
content = onnx_model.SerializeToString()
sess = onnxruntime.InferenceSession(content)

The same code inside a Flask app fails: InferenceSession throws this error:

sess = onnxruntime.InferenceSession(content)

  File "C:\Users\plg\Anaconda3\envs\automl04augpy395elk7120\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 283, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Users\plg\Anaconda3\envs\automl04augpy395elk7120\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 312, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Error in Node:model/multi_category_encoding/AsString : No Op registered for AsString with domain_version of 9
I mainly need the input_name.

If this is a converter bug, how do I find the correct opset? (I have tried opsets 9 through 13; all throw errors.) And why is the error not raised in a standalone run?

Any help is appreciated, thanks.

guschmue (Collaborator) commented Aug 5, 2021

Looks like we don't support the AsString() op. Let me check if we can handle this in the converter.

hanzigs (Author) commented Aug 5, 2021

Is there a workaround, like a custom op, until we get the converter update? Thanks.

guschmue (Collaborator) commented Aug 5, 2021

I have some code that maps AsString to the ONNX Cast op, which kind of works but doesn't honor all the attributes AsString has. But maybe it's good enough for autokeras; if it works for autokeras I'll send a PR.

guschmue (Collaborator) commented Aug 6, 2021

It worked for me for the structured_classifier example, so we merged a PR:
#1648

You can try with

pip install git+https://github.com/onnx/tensorflow-onnx

hanzigs (Author) commented Aug 6, 2021

@guschmue Thank you very much for the quick response.

Now I am getting a new error: No Op registered for LookupTableFindV2 with domain_version of 9.
I created a fresh env and installed pip install git+https://github.com/onnx/tensorflow-onnx
(python==3.9.5, tensorflow==2.5.0, tf2onnx==1.10.0, onnxruntime==1.8.0)

sess = onnxruntime.InferenceSession(content)

  File "C:\Users\plg\Anaconda3\envs\automl07augpy395elk7120\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 283, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)

  File "C:\Users\plg\Anaconda3\envs\automl07augpy395elk7120\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 312, in _create_inference_session

    sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Error in Node:model/multi_category_encoding/string_lookup_1/None_lookup_table_find/LookupTableFindV2 : No Op registered for LookupTableFindV2 with domain_version of 9

As before, it works normally in a python file but throws the error in the flask app.
I found a similar issue in #1228.

TomWildenhain-Microsoft (Contributor) commented:

What are the shapes and dtypes of X_train, y_train, X_valid, y_valid? Can you upload a zipped saved model of the keras model? Sorry, I've never used autokeras before.

hanzigs (Author) commented Aug 6, 2021

X_train, y_train, X_valid, y_valid have shapes (1056, 16), (1056,), (191, 16), (191,) respectively; all are numpy.ndarray.
(python==3.9.5, tensorflow==2.5.0, tf2onnx==1.10.0, onnxruntime==1.8.0)
Creating the model is simple. I can attach pickles of X_train, y_train, X_valid, y_valid; may I know where, please?

pip install autokeras==1.0.15
from autokeras import StructuredDataClassifier
akmodel = StructuredDataClassifier(max_trials=10)
akmodel.fit(x=X_train, y=y_train, validation_data=(X_valid, y_valid), epochs=100)
autoKeras_model = akmodel.export_model()

onnx_model, _ = tf2onnx.convert.from_keras(autoKeras_model)
content = onnx_model.SerializeToString()
sess = onnxruntime.InferenceSession(content)
input_name = sess.get_inputs()[0].name
label_name = sess.get_outputs()[0].name

TomWildenhain-Microsoft (Contributor) commented:

Can you upload the pickles to OneDrive/GoogleDrive/Dropbox and post a link? Are they all np.int32 or np.float32?

hanzigs (Author) commented Aug 6, 2021

Yes, they are.

https://drive.google.com/drive/folders/1HfB00dOuk-awSmIrSg92hmJFYzTpQNCr?usp=sharing

I attached them on Google Drive; you can open them with:

import pickle
with open('filename', 'rb') as f:
    arrayname1 = pickle.load(f)

TomWildenhain-Microsoft (Contributor) commented:

Great, I just requested access to the drive link.

TomWildenhain-Microsoft (Contributor) commented:

I just ran the conversion and it works for me. The resulting model runs in ORT and produces results. However, my model does not contain AsString; maybe I'm using a different version of autokeras. My converted onnx model looks like this:

[screenshot: converted ONNX model graph]

hanzigs (Author) commented Aug 6, 2021

Actually, as said before, the model is created successfully in a python file, and the InferenceSession is also created successfully there.

InferenceSession throws the error in the flask app.

TomWildenhain-Microsoft (Contributor) commented:

Ah, so sorry, I didn't catch that. What version of onnxruntime does the flask application use?

hanzigs (Author) commented Aug 6, 2021

Same kind of issue as in #1228

hanzigs (Author) commented Aug 6, 2021

All the same versions.

TomWildenhain-Microsoft (Contributor) commented:

Same kind of issue as in #1228

Can you please elaborate on this? Are you getting a "Default value of table lookup must be const." error? Are you running the conversion code within flask too, or just onnxruntime? You can save both the keras saved model and the onnx model with:

ExportedautoKeras_model.save("autokerasmodel")
onnx_model, _ = tf2onnx.convert.from_keras(ExportedautoKeras_model, output_path="autokeras.onnx")

I find it very surprising that you get different results in flask. Is your flask app running from a different virtualenv? Are you sure your autokeras version is the same?

hanzigs (Author) commented Aug 6, 2021

Yes, I am using the same model for conversion to onnx.

Inside flask, I create the model, convert it, and get the session results, all at once.

TomWildenhain-Microsoft (Contributor) commented:

I think it is very likely that the keras models you get in flask and in the plain python script are different. Can you please add this line:
ExportedautoKeras_model.save("autokerasmodel")
and zip the results from the python and flask scripts?

hanzigs (Author) commented Aug 6, 2021

Inside the Flask app I have two functions: one creates the model and passes it to the onnx converter function. I'm not sure if that is an issue; I will now try putting both in the same function.

TomWildenhain-Microsoft (Contributor) commented:

That should not be an issue. Again, to confirm: are you using the same virtualenv for flask as for the python script?

hanzigs (Author) commented Aug 6, 2021

Yes, python and flask are in the same env.

TomWildenhain-Microsoft (Contributor) commented:

Is the training data you are using (X_train, y_train, X_valid, y_valid) the same values for both?

hanzigs (Author) commented Aug 7, 2021

Also, in a normal python file the onnx conversion and InferenceSession work, but when I do a prediction from the onnx model it throws an error, like:

content = ONNXModel.SerializeToString()
sess = onnxruntime.InferenceSession(content)
input_name = sess.get_inputs()[0].name 
label_name = sess.get_outputs()[0].name
pred_onnx = sess.run([label_name], {input_name: test_record})[0]
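
One common cause of a sess.run failure is an input whose dtype or shape doesn't match what the graph declares. As a hedged sketch (assuming the model takes a single float32 input of shape [None, 16], matching the 16-feature training data in this thread; the feature values are made up), the test record can be shaped before calling sess.run:

```python
import numpy as np

# Hypothetical single test record with 16 features (values made up)
test_record = [0.1] * 16

# ONNX Runtime is strict about dtype and rank: the fed array must match
# the graph input, assumed here to be float32 with shape [None, 16]
x = np.asarray(test_record, dtype=np.float32).reshape(1, -1)
print(x.shape, x.dtype)  # (1, 16) float32

# then, with the session from above:
# pred_onnx = sess.run([label_name], {input_name: x})[0]
```

Feeding a plain python list or a float64 array to a float32 input raises an error before any prediction runs, so checking sess.get_inputs()[0].shape and .type first can save a round of guessing.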

TomWildenhain-Microsoft (Contributor) commented:

Are you able to capture the keras saved model from flask?

hanzigs (Author) commented Aug 7, 2021

Can you please confirm the prediction test?

hanzigs (Author) commented Aug 7, 2021

Yes, I can.

TomWildenhain-Microsoft (Contributor) commented:

Can you please confirm the prediction test?

I am able to successfully run predictions using the onnx model I generated. The model has been uploaded to the shared drive folder as autokeras_tw.onnx.

Yes, I can.

Awesome. Please capture the keras saved models and converted onnx models from both flask and the python script and upload them to the Google Drive folder as autokeras_flask.zip, autokeras_flask.onnx, autokeras_python.zip, autokeras_python.onnx. If I have those, I may be able to reproduce the issue. So far, I can't reproduce it at all.

hanzigs (Author) commented Aug 7, 2021

I have uploaded "ONNXmodel.onnx" and "creditloan_prediction_20210806T210907" to the drive. Can you please try to create a session from either of the two? Both were created in flask.

TomWildenhain-Microsoft (Contributor) commented:

Both models give me this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\tomwi\AppData\Local\Programs\Python\Python39\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 283, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Users\tomwi\AppData\Local\Programs\Python\Python39\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 310, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from C:\Users\tomwi\Downloads\ONNXModel.onnx failed:This is an invalid model. Error in Node:model/multi_category_encoding/string_lookup_1/None_lookup_table_find/LookupTableFindV2 : No Op registered for LookupTableFindV2 with domain_version of 9

But I will need a saved model to diagnose the cause of the conversion failure. If you are not able to upload a saved model due to privacy/security concerns, I can try to walk through the debugging on your end, or we can wait for @guschmue who might have better luck reproducing the issue with autokeras.

hanzigs (Author) commented Aug 7, 2021

Yes, that's the saved model from flask, and that's the error.
Regarding the files, I will have to create a separate one, because that one has many links to other files; I will send it once created.
And yes, I'm OK with the walk-through; let me know how.

hanzigs (Author) commented Aug 9, 2021

May I know whether these warnings affect anything?

INFO:tf2onnx.tfonnx:Using tensorflow=2.5.0, onnx=1.10.0, tf2onnx=1.10.0/32d758
INFO:tf2onnx.tfonnx:Using opset <onnx, 9>
WARNING:tf2onnx.shape_inference:Cannot infer shape for model/multi_category_encoding/string_lookup_1/None_lookup_table_find/LookupTableFindV2: model/multi_category_encoding/string_lookup_1/None_lookup_table_find/LookupTableFindV2:0
WARNING:tf2onnx.shape_inference:Cannot infer shape for model/multi_category_encoding/Cast_1: model/multi_category_encoding/Cast_1:0
INFO:tf2onnx.tf_utils:Computed 0 values for constant folding
WARNING:tf2onnx.onnx_opset.tensor:ONNX does not support precision, scientific and fill attributes for AsString
INFO:tf2onnx.optimizer:Optimizing ONNX model
INFO:tf2onnx.optimizer:After optimization: Const -20 (29->9), Identity -2 (2->0)

Sorry, there is a difference in the prediction results between the python and onnx models from Flask, but in a python file both produce the same results.
The above are the only warnings; no errors occur.

TomWildenhain-Microsoft (Contributor) commented:

Awww, I thought we had fixed this. The models from python and flask are different. Keep in mind that autokeras can choose very different model architectures depending on the data it is given. I think it is likely you are training the python and flask models on different data.

The "does not support precision, scientific and fill attributes for AsString" warning might or might not matter, depending on how the lookup table is formatted. Can you upload the converted onnx model from flask again?

guschmue (Collaborator) commented Aug 9, 2021

About the warning: we can't handle all the attributes of AsString(), i.e. instead of float 123. onnx would have float 123.000000. Not sure if it hurts in this case; it might if a category mapper is behind it, because the lookup table would have the tf representation.
int32 and int64 should be ok; float might run into issues.
Not sure when autokeras started using AsString(); for the examples I tried, it always used String_To_Number().

hanzigs (Author) commented Aug 9, 2021

Thanks for that; I uploaded the converted onnx model to the drive.

Regarding the different results, I meant that the python model in flask and the same model converted to onnx in flask give different prediction results.

But if I build the model in a python file and convert it to onnx, the prediction results are the same. I'm very confused why that is.

TomWildenhain-Microsoft (Contributor) commented:

Ah, do you get the same results between the flask keras model and the flask onnx model?

I think you are almost certainly getting different models in flask and python. I'm not sure why, but I suspect you are giving autokeras different input data or different args.

hanzigs (Author) commented Aug 9, 2021

Yeah. Here:

onnx_model, _ = tf2onnx.convert.from_keras(model)

This is inside Flask. model is the python object; its prediction result is 0.60987216, while the onnx_model result is 0.5559953 for the same test data.

hanzigs (Author) commented Aug 9, 2021

Not sure whether those warnings make any difference

TomWildenhain-Microsoft (Contributor) commented:

The category mapper in the model looks like this:

[screenshot: CategoryMapper node attributes]

That's likely not going to work. That said, I find the whole thing a little strange, since the result is immediately cast back to float; it seems highly unlikely that this lookup table is useful. @hanzigs, are you using real testing data on this? Do you find that the TF model produces useful results on non-training data?

Also, are you certain you are running the python script with the same data, args, and virtual environment as the flask app?

hanzigs (Author) commented Aug 9, 2021

Regarding the same data, args, and virtual environment: yes, I'm sure about that, because this flask app has 7 models (keras sequential, lgbm, xgb, randomforest, extratrees, decisiontree and autokeras). The other 6 models all work perfectly, and the data and args are passed the same way, so I'm sure those are correct in the flask app.

Regarding the cast back to float, I'm not sure what that is, but the testing data is correct.

May I understand what the problem with the above would be, please?

hanzigs (Author) commented Aug 9, 2021

There is no complex code in the autokeras part:

    akmodel = StructuredDataClassifier(max_trials=AK_Hyperparameters['max_trials'])
    akmodel.fit(x=X_train, y=y_train, validation_data=(X_valid, y_valid), epochs=AK_Hyperparameters['epochs'])
    autoKeras_model = akmodel.export_model()

hanzigs (Author) commented Aug 9, 2021

Category fields are normalized using a WoE transformation and numeric fields using MinMaxScaler, separately.
Is this an issue?

hanzigs (Author) commented Aug 9, 2021

The testing data transformation follows the same normalization steps for prediction.

hanzigs (Author) commented Aug 9, 2021

The Flask app is a bit complex to cut down to a miniature version because it is linked to an elasticsearch database at every step; that's why I can't send the flask app code.

TomWildenhain-Microsoft (Contributor) commented:

The issue is that the input to the CategoryMapper (lookup table) comes from an AsString op in TF, which converts a number to a string. There is no corresponding op in ONNX, so we convert it to a Cast, but that won't necessarily use the same precision: 0.0 becomes "0.0", not "0.000000". The lookup for 0.0 will then return the default value (say 1) instead of the table value (say 5), and the results may differ. If we were converting int to string it would be consistent, but float to string is more problematic.
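
The precision mismatch can be illustrated in plain Python (a hedged sketch, not the converter's actual code; the table keys, default value, and six-decimal formatting follow the example in the comments above):

```python
# TF's AsString writes floats with fixed decimals (here "%.6f"), while a
# plain string cast produces the shortest representation, so string-keyed
# lookups built from TF output can miss after conversion to a Cast.
def tf_as_string(x: float) -> str:
    return "%.6f" % x  # fixed six-decimal formatting, as TF would emit

lookup_table = {"0.000000": 5}  # hypothetical table keyed by TF strings
default_value = 1

key_tf = tf_as_string(0.0)  # "0.000000" -> hits the table
key_cast = str(0.0)         # "0.0"      -> misses, falls back to default

print(lookup_table.get(key_tf, default_value))    # 5
print(lookup_table.get(key_cast, default_value))  # 1
```

The same float thus maps to two different table entries depending on which component produced the string, which matches the differing predictions seen in this thread.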

TomWildenhain-Microsoft (Contributor) commented:

Flask app is bit complex to take out a miniature version, because it is linked the database which is elasticsearch on each and every step, thats why I can't send the flask app code.

Is the data from the database used to train the autokeras model?

hanzigs (Author) commented Aug 9, 2021

Yes, it is.

TomWildenhain-Microsoft (Contributor) commented:

How does the python script get the data then? What data does it use?

hanzigs (Author) commented Aug 9, 2021

The python elasticsearch client pulls the data.

TomWildenhain-Microsoft (Contributor) commented:

Can you pickle the data from each and compare that they are identical?
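
A hedged sketch of that comparison (the file names are hypothetical, and X_train stands in for any of the four arrays):

```python
import pickle

import numpy as np

def dump(arr, path):
    """Pickle an array to disk."""
    with open(path, "wb") as f:
        pickle.dump(arr, f)

def load(path):
    """Load a pickled array from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)

def identical(a, b):
    """True if two arrays match in shape, dtype, and every value."""
    a, b = np.asarray(a), np.asarray(b)
    return a.shape == b.shape and a.dtype == b.dtype and np.array_equal(a, b)

# In the python script:  dump(X_train, "X_train_python.pkl")
# In the flask app:      dump(X_train, "X_train_flask.pkl")
# Then compare:
# print(identical(load("X_train_python.pkl"), load("X_train_flask.pkl")))
```

If identical() returns False for any of the four arrays, the two environments are training autokeras on different data, which would explain the different architectures.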

hanzigs (Author) commented Aug 9, 2021

Yes, I can.

hanzigs (Author) commented Aug 9, 2021

The prediction testing happens from Postman.

hanzigs (Author) commented Aug 9, 2021

The issue is that the input to the CategoryMapper (lookup table) comes from an AsString op in TF, which converts a number to a string. There is no corresponding op in ONNX, so we convert it to a Cast, but that won't necessarily use the same precision: 0.0 becomes "0.0", not "0.000000". The lookup for 0.0 will then return the default value instead of the table value, and the results may differ. If we were converting int to string it would be consistent, but float to string is more problematic.

But if I build the model step by step in a python file, calling only functions of the flask app, and test it, it works fine.

hanzigs (Author) commented Aug 9, 2021

Anyway, I will check that. Thanks for the support, much appreciated. You can close this ticket.

hanzigs (Author) commented Aug 12, 2021

Hi @TomWildenhain-Microsoft,
I have uploaded 4 models to the drive. Three of them are named tf2onnx....
Is it possible to check whether the CategoryMapper with the cast back to float is in those models? All of them give perfect results, but they were built from a python file using functions from the Flask app.

The 4th model, which has the category mapper, was built from the flask app and does not give correct results.
Thanks

hanzigs (Author) commented Aug 12, 2021

I visualized the models in Netron and couldn't find a CategoryMapper in the 3 models:
[screenshot: model graph without CategoryMapper]
Not sure why it's not there.

The model created with flask has it:
[screenshot: model graph with CategoryMapper]
[screenshot: CategoryMapper node attributes]

Not sure what difference the Flask app makes, or why the CategoryMapper shows up in the flask model but not in the python-file model.
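
The Netron check can also be done programmatically by counting op types in the converted graph. A hedged sketch (the onnx.load lines are commented out because they need the onnx package and a model file; the op list at the bottom is a made-up illustration):

```python
from collections import Counter

def count_ops(op_types):
    """Count how many times each op type appears in a graph."""
    return Counter(op_types)

# With the onnx package and a converted model on disk:
# import onnx
# model = onnx.load("ONNXmodel.onnx")
# counts = count_ops(n.op_type for n in model.graph.node)
# print(counts.get("CategoryMapper", 0))  # 0 means no CategoryMapper node

# Illustration on a hand-written op list:
print(count_ops(["Cast", "CategoryMapper", "Cast", "MatMul"]))
```

Running this against each of the four uploaded models would show at a glance which conversions kept the CategoryMapper, without opening each file in Netron.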

hanzigs (Author) commented Aug 17, 2021

Hi @TomWildenhain-Microsoft,
Is the 'tom/keras_hash_tables' branch not available for installation?
Thanks

hanzigs (Author) commented Aug 23, 2021

Hi,
I added a Colab notebook with the data and the autokeras model building that shows the prediction difference (shared in the drive):
https://colab.research.google.com/drive/1DqlJgGZuKf5nev9G6Do7DYEMEU4aAQhy

https://drive.google.com/drive/folders/1HfB00dOuk-awSmIrSg92hmJFYzTpQNCr?usp=sharing

Let me know whether it's possible to convert. Thanks
