Translation Model in ONNX: Choosable Output Formats #9784

Open
oborchers opened this issue Jan 25, 2021 · 7 comments
Labels
Feature request

Comments

@oborchers

oborchers commented Jan 25, 2021

🚀 Feature request

I am requesting an option to specify the output format when exporting translation_xx_to_yy pipelines to ONNX. Currently, the output of convert_graph_to_onnx.convert provides the raw tensors (working prototype code under #9722).

Motivation

When putting the models into production, it would be great if one could choose whether the actual tensors or the output tokens are returned when exporting a translation pipeline to ONNX. That way, one is not forced to write a custom re-implementation of the model.generate function that uses the ONNX model instead of the torch one.

As of now, the part which could be replaced by an ONNX inference session lives under the model.generate function. Using this in production means keeping a TranslationPipeline object with all corresponding model information and config, plus an ONNX inference session.
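
For context, a minimal sketch of what the current export path looks like (the model name and output path are placeholders; whether the stock convert handles a translation pipeline end-to-end is exactly what the prototype in #9722 addresses):

```python
# Minimal sketch (hypothetical paths/model name): export a translation pipeline
# and run it with onnxruntime. The session returns raw tensors, not tokens.
from pathlib import Path

import onnxruntime as ort
from transformers import AutoTokenizer
from transformers.convert_graph_to_onnx import convert

model_name = "Helsinki-NLP/opus-mt-en-de"    # assumption: any Marian translation model
onnx_path = Path("onnx/opus-mt-en-de.onnx")  # hypothetical output location

# Export the graph (this is the step the prototype in #9722 enables for translation).
convert(
    framework="pt",
    model=model_name,
    output=onnx_path,
    opset=12,
    pipeline_name="translation_en_to_de",
)

# Inference: the outputs are plain numpy tensors, so the decoding loop and
# tokenizer.decode still have to happen outside the ONNX graph.
tokenizer = AutoTokenizer.from_pretrained(model_name)
session = ort.InferenceSession(str(onnx_path))
inputs = tokenizer("Hello world!", return_tensors="np")
onnx_input_names = {i.name for i in session.get_inputs()}
feed = {k: v.astype("int64") for k, v in inputs.items() if k in onnx_input_names}
outputs = session.run(None, feed)
print([o.shape for o in outputs])  # raw tensors, no translated text
```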

Your contribution

There may be multiple solutions to this problem:

  1. A user-specific re-implementation of model.generate (this is what I'll try to accomplish in the future; a rough sketch is included after this list)

  2. Is it possible to rewrite the code under model.generate in pure torch? Then it should be possible to create a custom model for all translation models that just places this "generate layer" on top. I have provided an example here which adds a simple pooling layer on top of an existing transformers model. (That would require more study on my side to develop a prototype and follows from step 1.)

  3. Provide support for the ort-customops library by Microsoft. Essentially, this enables ONNX to handle strings (but introduces a dependency on a very experimental extension). For example, that way one can export the Universal Sentence Encoder (including the tokenizer) to ONNX. Example here. I cannot contribute anything useful here.
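
As a rough sketch of option 1, a hand-rolled greedy decoding loop over an ONNX InferenceSession could look roughly like the following (the input/output names, model and exported file are assumptions; no beam search and no past_key_values caching):

```python
# Rough sketch of option 1: a greedy decoding loop that replaces model.generate
# with calls to an ONNX InferenceSession. Assumes the seq2seq model was exported
# with inputs "input_ids", "attention_mask", "decoder_input_ids" and a "logits"
# output -- these names depend on how the graph was exported.
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

model_name = "Helsinki-NLP/opus-mt-en-de"              # assumption
tokenizer = AutoTokenizer.from_pretrained(model_name)
session = ort.InferenceSession("opus-mt-en-de.onnx")   # hypothetical export

def onnx_greedy_generate(text: str, max_length: int = 64) -> str:
    enc = tokenizer(text, return_tensors="np")
    # Marian starts decoding from the pad token; other models use a BOS token.
    decoder_ids = np.array([[tokenizer.pad_token_id]], dtype=np.int64)
    for _ in range(max_length):
        logits = session.run(
            ["logits"],
            {
                "input_ids": enc["input_ids"].astype(np.int64),
                "attention_mask": enc["attention_mask"].astype(np.int64),
                "decoder_input_ids": decoder_ids,
            },
        )[0]
        next_token = int(logits[0, -1].argmax())  # greedy pick
        decoder_ids = np.concatenate(
            [decoder_ids, [[next_token]]], axis=1
        ).astype(np.int64)
        if next_token == tokenizer.eos_token_id:
            break
    return tokenizer.decode(decoder_ids[0], skip_special_tokens=True)

print(onnx_greedy_generate("Hello world!"))
```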

@ierezell
Contributor

Hello, thanks for your work on that @oborchers! I also saw the notebook on SentenceTransformers and it helped a lot!

Any status on this feature? I also need to run models in ONNX, but most of them need to call a .generate function, which is not supported for now... (I could replicate all the generate code in Node.js, but I'm sure there is a nicer solution.)
Is there any fix, status update or hack?

Thanks a lot in advance,
Have a great day.

@oborchers
Author

oborchers commented Apr 3, 2021

Hi @ierezell! Welcome! Glad the notebook helped.

Not from my side, unfortunately. I had postponed the issue on our side because we had more pressing things to work on. But lately, the need for this feature has been growing at our company as well. Does your solution of porting the .generate function work by placing it on top of the ONNX version?

Edit:
Just out of curiosity, I went through the .generate code, and it should be possible to place the existing .generate code on top of a CausalLMOutput model, very similar to what is done in the notebook. This requires an extension of the forward method.

In an initial implementation, it should be perfectly sufficient to port just the sample section and see if it works. However, this does not necessarily apply to beam_search, which I haven't yet figured out. The raw implementation shouldn't be too complex, because one could strip away a number of the convenience functions/arguments.

The downside is that there needs to be some way of defining the arguments of .generate at runtime for inference. For example, the min_length, max_length and eos_token_id parameters should be included in the forward method arguments, because otherwise they would be static, fixed by the configuration at export time. That may be sensible for some applications, but it requires re-exporting the model every time those change, which isn't really a nice way of doing this. Or at least that is my understanding, unless I missed something completely. A rough sketch of such a wrapper follows below.
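
To make the idea concrete, here is a rough, untested sketch of such a "generate layer" wrapper with the length/EOS parameters as call-time arguments (the model name is a placeholder; greedy decoding only):

```python
# Hedged sketch of the "generate layer" idea (not a tested export recipe): a
# wrapper whose forward runs a greedy decoding loop, with max_length and
# eos_token_id passed at call time rather than baked in from the config.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

class GreedyGenerationWrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model
        self.start_id = model.config.decoder_start_token_id

    def forward(self, input_ids, attention_mask, max_length: int, eos_token_id: int):
        decoder_ids = torch.full(
            (input_ids.shape[0], 1), self.start_id, dtype=torch.long
        )
        for _ in range(max_length):
            logits = self.model(
                input_ids=input_ids,
                attention_mask=attention_mask,
                decoder_input_ids=decoder_ids,
            ).logits
            next_tokens = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            decoder_ids = torch.cat([decoder_ids, next_tokens], dim=-1)
            if bool((next_tokens == eos_token_id).all()):
                break
        return decoder_ids

name = "Helsinki-NLP/opus-mt-en-de"  # assumption: any Marian translation model
tokenizer = AutoTokenizer.from_pretrained(name)
wrapper = GreedyGenerationWrapper(AutoModelForSeq2SeqLM.from_pretrained(name).eval())

enc = tokenizer("Hello world!", return_tensors="pt")
ids = wrapper(enc["input_ids"], enc["attention_mask"],
              max_length=64, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(ids[0], skip_special_tokens=True))

# NOTE: a plain torch.onnx.export of this wrapper would trace (and thus unroll)
# the Python loop, making max_length static again -- the downside described
# above. Keeping it dynamic would require scripting the loop or an ONNX Loop op.
```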

Best regards and have a nice Easter

@ierezell
Copy link
Contributor

ierezell commented Apr 7, 2021

Hi @oborchers,

I still haven't implemented "my" solution, as I wanted to know whether there was any option other than writing all the logic again.
I would rather not, and exporting the logic into the forward method (and thus into the ONNX model) seems to be the best solution.

For the x_length arguments, that is a downside; would passing them as optional arguments to the forward method do?

I need to focus on other things right now, but I'll definitely keep an eye open for this!

Have a great day

@ierezell
Contributor

Hi, any update on how to export full pipelines to ONNX?

For now, we're still obliged to keep custom / Hugging Face library code to handle the "post output embeddings" logic...

Thanks in advance,
Have a great day

@oborchers
Author

oborchers commented Sep 16, 2021

Hi @ierezell!

Sorry for not coming back to the issue. To be honest, for our use case there are quite a few problems we've encountered in exporting full pipelines to ONNX:

  • How to best deal with caching (past_key_values)
  • Less than optimal performance when used with some generative models (GPT-Neo: Torch CUDA 2x faster than ONNX CUDA, microsoft/onnxruntime#7238)
  • The problem of batching requests on inference servers, which is very difficult due to the dynamic dimensions of past_key_values (a sketch of why is included at the end of this comment)
  • Similar gains in inference time can be had by using custom kernels (e.g. DeepSpeed inference) + regular PyTorch

This blog post from Microsoft may help though:
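
To illustrate the past_key_values point above: every cached key/value tensor needs its own dynamic batch and sequence axes declared at export time, and an inference server has to pad and align all of them when batching. A rough sketch of that bookkeeping (the input/output names are hypothetical and depend on the export script):

```python
# Illustration of the past_key_values pain point when exporting a decoder with
# cache enabled: one dynamic-axes entry per layer and per key/value tensor.
num_layers = 12  # e.g. a 12-layer decoder

dynamic_axes = {
    "input_ids": {0: "batch", 1: "sequence"},
    "attention_mask": {0: "batch", 1: "past_sequence + sequence"},
    "logits": {0: "batch", 1: "sequence"},
}
for i in range(num_layers):
    for kv in ("key", "value"):
        dynamic_axes[f"past_key_values.{i}.{kv}"] = {0: "batch", 2: "past_sequence"}
        dynamic_axes[f"present.{i}.{kv}"] = {0: "batch", 2: "past_sequence + sequence"}

# These would then be passed to the export call, roughly:
# torch.onnx.export(model, example_inputs, "decoder_with_past.onnx",
#                   input_names=[...], output_names=[...],
#                   dynamic_axes=dynamic_axes, opset_version=13)
```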

@ierezell
Contributor

Hi @oborchers, thanks a lot for the feedback!

ONNX is nice for me to be able to change stacks (JavaScript, etc.), but in light of what you're saying, it will be better to keep my GPU inference server.

Thanks a lot,
Have a great day !

@johncookds

Hi,

Is there an alternative to ONNX that you'd recommend? The ability to keep and manipulate past_key_values is the most crucial part, and it is something I cannot find in many inference optimizations.

Thank you!
