
"Could not resolve inputs at top-level" issue loading ONNX file #141

Closed
psiphi75 opened this issue Aug 14, 2019 · 13 comments

psiphi75 commented Aug 14, 2019

The following code:

  use std::path::Path;
  use tract_onnx::prelude::*;

  let path = Path::new("GRU128KeywordSpotter.onnx");
  let mut model = tract_onnx::onnx().model_for_path(path)?;

fails with the following error message:

Error: TractError(Msg("Could not resolve inputs at top-level: [\"\"]"), State { next_error: None, backtrace: InternalBacktrace { backtrace: None } })

The ONNX file appears to be valid since it can be used otherwise. The offending ONNX file can be found here: https://psiphi75.github.io/workingfiles/GRU128KeywordSpotter.onnx

kali (Collaborator) commented Aug 15, 2019

Thanks for the report! I'll have a look.

kali commented Aug 15, 2019

So, a first thing to note: the graph uses a "ConstantFill" operator (nodes 20 and 54). This operator is not part of ONNX, so you may need to tweak your PyTorch network somehow.

The error message is bad, though. I'll try to figure out why it gets reported like this.

This is not documented anywhere (except with --help), but tract has an auditing command-line tool that can help investigate these things: cargo install tract will install the utility, then tract GRU128KeywordSpotter.onnx --pass analyse dump will dump the network. The two ConstantFill nodes show up in the output.

kali commented Aug 16, 2019

OK, I found out what the other problem is. I have issues dealing with optional inputs and the way they are encoded in ONNX. ONNX uses an empty string as an input specifier to denote a missing input when it needs to skip one, and I have some problems modelling this in tract. Here is a dump of the problematic node:

input: "input.1"
input: "39"
input: "40"
input: "41"
input: ""
input: "20"
output: "42"
output: "43"
op_type: "GRU"
attribute {
  name: "hidden_size"
  type: INT
  i: 128
}
attribute {
  name: "linear_before_reset"
  type: INT
  i: 1
}
doc_string: "/home/ubuntu/miniconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/rnn.py(179): forward\n/home/ubuntu/miniconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py(477): _slow_forward\n/home/ubuntu/miniconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py(487): __call__\n/home/ubuntu/ELL/ELL/tools/utilities/pythonlibs/audio/training/train_classifier.py(316): forward\n/home/ubuntu/miniconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py(477): _slow_forward\n/home/ubuntu/miniconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py(487): __call__\n/home/ubuntu/miniconda3/envs/py36/lib/python3.6/site-packages/torch/jit/__init__.py(252): forward\n/home/ubuntu/miniconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py(489): __call__\n/home/ubuntu/miniconda3/envs/py36/lib/python3.6/site-packages/torch/jit/__init__.py(197): get_trace_graph\n/home/ubuntu/miniconda3/envs/py36/lib/python3.6/site-packages/torch/onnx/utils.py(192): _trace_and_get_graph_from_model\n/home/ubuntu/miniconda3/envs/py36/lib/python3.6/site-packages/torch/onnx/utils.py(224): _model_to_graph\n/home/ubuntu/miniconda3/envs/py36/lib/python3.6/site-packages/torch/onnx/utils.py(281): _export\n/home/ubuntu/miniconda3/envs/py36/lib/python3.6/site-packages/torch/onnx/utils.py(104): export\n/home/ubuntu/miniconda3/envs/py36/lib/python3.6/site-packages/torch/onnx/__init__.py(27): export\n/home/ubuntu/ELL/ELL/tools/utilities/pythonlibs/audio/training/train_classifier.py(96): export\n/home/ubuntu/ELL/ELL/tools/utilities/pythonlibs/audio/training/train_classifier.py(561): train\n/home/ubuntu/ELL/ELL/tools/utilities/pythonlibs/audio/training/train_classifier.py(652): <module>\n"

So this GRU is missing its sequence_lens input. As an immediate workaround, you may actually be able to provide this input to the node, depending on your pytorch code, while I figure out a real fix for this recurring issue.
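The empty-string convention described above can be modelled by mapping each input name to an Option, treating "" as a missing optional input. A standalone illustration in plain Rust (a hypothetical helper, not tract's actual internals):

```rust
// ONNX encodes a skipped optional input as the empty string "".
// Resolve a node's input names, turning "" into None.
fn resolve_inputs(names: &[&str]) -> Vec<Option<String>> {
    names
        .iter()
        .map(|n| if n.is_empty() { None } else { Some((*n).to_string()) })
        .collect()
}

fn main() {
    // Input list of the GRU node dumped above.
    let inputs = ["input.1", "39", "40", "41", "", "20"];
    let resolved = resolve_inputs(&inputs);
    // Slot 4 (sequence_lens) is absent.
    assert!(resolved[4].is_none());
    assert_eq!(resolved[0].as_deref(), Some("input.1"));
}
```

A loader taking this shape can then treat the None slot by substituting the operator's documented default, instead of trying to resolve "" as a tensor name at the top level.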

Note: I'm very interested in helping you get tract to work, as you're obviously doing voice and you're not a colleague :) But tract may not work great out of the box. Support for recurrent operators is relatively recent, with a lot of ongoing work through the Kaldi support. I have to warn you that, as it stands, the ONNX GRU operator implementation is just "passing the ONNX tests" (and, as you've just discovered, they do not cover everything). I have never used it in real situations. The good news is that the current strategy is to re-express the complicated recurrent operators (LSTM, GRU, ...) for the three frameworks (TensorFlow, ONNX, and Kaldi) in terms of simpler ops (Scan, MatMatMul, Sigmoid, etc.) implemented in tract core. So as soon as the GRU translation is implemented, you will benefit from all the work we are currently doing to implement and optimize Kaldi inference.

So to summarize:

  • in any case, you need to get rid of ConstantFill; as far as I can tell, it is not part of the ONNX spec.
  • I will have a look at optional input support; it may take a while (days) if I don't find something simple I can do in a few hours.
  • you may be able to work around that issue by providing the sequence_lens input to the GRU operator that does not get it.

This should get us to a network that loads, and we'll take it from there :)
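The workaround in the last bullet amounts to replacing the empty-string placeholder in the node's input list with the name of a real sequence_lens tensor added to the graph. A standalone sketch (the slot index and tensor name are illustrative, and this is not tract's API):

```rust
// Replace an empty-string placeholder at `slot` with a concrete
// input name, e.g. a constant sequence_lens tensor added to the graph.
fn fill_optional_input(inputs: &mut [String], slot: usize, name: &str) {
    if inputs.get(slot).map_or(false, |s| s.is_empty()) {
        inputs[slot] = name.to_string();
    }
}

fn main() {
    // Input list of the GRU node from the dump; slot 4 is sequence_lens.
    let mut gru_inputs: Vec<String> = ["input.1", "39", "40", "41", "", "20"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    fill_optional_input(&mut gru_inputs, 4, "seq_lens");
    assert_eq!(gru_inputs[4], "seq_lens");
    // Non-empty slots are left untouched.
    assert_eq!(gru_inputs[0], "input.1");
}
```

Doing the equivalent edit on the exported model (for example, from the PyTorch side before export) would sidestep the optional-input handling entirely.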

kali commented Aug 16, 2019

This may help with ConstantFill: onnx/onnx#1434

kali commented Aug 16, 2019

#142

psiphi75 (Author)

Wow, thanks for the investigation and support. Yes, I had realised that the ConstantFill and GRU ops might not be supported. It appears ConstantFill was only ever an experimental feature in ONNX. I'll have to re-export the ONNX from PyTorch.

My other option was to use ELL, but I prefer a Rust implementation. My ultimate goal is to get one of these running on an ARM processor of some sort.

I'll investigate your comments more tomorrow.

kali commented Aug 16, 2019

Ha, real-time voice on ARM should be the sweet spot indeed. Happy to see somebody from outside the office give it a shot :)

kali commented Aug 17, 2019

For your information, I have a POC for the ONNX GRU translation here: #143

psiphi75 (Author)

Thanks. With the latest tract code changes it's getting further. I've also updated my ONNX model and replaced ConstantFill with ConstantOfShape, and replaced the GRU with an LSTM. I'll keep chipping away at the other issues I have.

kali commented Aug 19, 2019

Aha, LSTM :) I have not implemented the LSTM translation to core ops yet, just the GRU one, but I will try to do that soon. It should work anyway, just be relatively inefficient. I'll keep you posted.

psiphi75 (Author)

OK, that makes sense. The LSTM took longer than 10 minutes in debug mode, so I cancelled it. In release mode it crashed my computer; seriously, I had to hold the power button for six seconds. But that's not an issue.

kali commented Aug 20, 2019

That sounds a bit extreme... I'll be happy to have a try at running your LSTM model if you can share it.

psiphi75 (Author)

I found out that the optimisation step was taking a long time. I'll raise a bug soon, hopefully. I'm currently busy with other things. Thank you.
