Export finetuned PEFT / LoRA model to ONNX #670
Comments
Hi @ingo-m, which optimum version did you try with? Could you try with optimum installed from source? You may have been hit by this bug specific to bloom, fixed since then but not yet in a release: huggingface/optimum#1152
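For reference, installing optimum from source is typically a one-liner; the sketch below assumes the fix is on the default branch of huggingface/optimum (the leading `!` is Colab/Jupyter notebook syntax):

```python
# Notebook / Colab syntax; from a shell, drop the leading "!"
!pip install --upgrade git+https://github.com/huggingface/optimum.git
```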
@fxmarty thanks
Yes, I can try installing optimum from source. By the way, I used the bloom base model without any modifications at all, just applying standard PEFT / LoRA functions: https://colab.research.google.com/drive/1ImNLTJ11JBeaSn-76eAejf5Pjk0ahPTY?usp=sharing
@fxmarty as suggested, I installed optimum from source, but the problem persists. This is the updated minimal example (colab notebook): https://colab.research.google.com/drive/1ImNLTJ11JBeaSn-76eAejf5Pjk0ahPTY?usp=sharing I will report this issue in the optimum repo.
Sorry, I had overlooked something in my original bug report: this issue is not specific to PEFT / LoRA models. Here's a minimal example where a vanilla-flavor "bigscience/bloom-560m" model, without any modifications, generates degraded predictions after conversion to ONNX with optimum. So since it's not PEFT related, I suppose this issue doesn't belong here, but in the optimum repo: huggingface/optimum#1171
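For readers skimming this thread, the comparison in the notebook is roughly of this shape. This is a sketch, not the notebook's exact code: it assumes ORTModelForCausalLM.from_pretrained(..., export=True) is used for the on-the-fly export, and the prompt and generation settings are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model_id = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Vanilla PyTorch model, no PEFT / LoRA involved
pt_model = AutoModelForCausalLM.from_pretrained(model_id)

# Same checkpoint exported to ONNX on the fly via optimum
ort_model = ORTModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
pt_out = pt_model.generate(**inputs, max_new_tokens=20, do_sample=False)
ort_out = ort_model.generate(**inputs, max_new_tokens=20, do_sample=False)

print("PyTorch:", tokenizer.decode(pt_out[0], skip_special_tokens=True))
print("ONNX:   ", tokenizer.decode(ort_out[0], skip_special_tokens=True))
# Expectation: the two continuations match; in the reported runs the ONNX output is degraded.
```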
System Info
peft.__version__: 0.3.0
accelerate.__version__: 0.20.3
transformers.__version__: 4.30.2
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder
Reproduction
I'm trying to export a model finetuned with PEFT / LoRA to ONNX. The base model is bigscience/bloom-560m.

Basically, I merge the LoRA weights into the base model after finetuning, and then try to convert the resulting merged model to ONNX. When I'm using optimum.onnxruntime.ORTModelForCausalLM, the export works, and I can run inference with the ONNX model, but the model outputs are degraded. Alternatively, using the lower-level torch.onnx.export() approach, I get an error.

Here's a minimal example showing both approaches (optimum.onnxruntime.ORTModelForCausalLM and torch.onnx.export()): https://colab.research.google.com/drive/1ImNLTJ11JBeaSn-76eAejf5Pjk0ahPTY?usp=sharing

(The colab example can run on a free T4 instance.)
Is model export to ONNX after PEFT / LoRA finetuning already supposed to work? I found this issue #118 but I'm not quite sure.
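To make the merge-then-export flow above concrete without opening the notebook, here is a rough sketch of what is meant. The adapter and output paths are placeholders, and the optimum export call (export=True) is an assumption about the notebook rather than a copy of it.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

base_id = "bigscience/bloom-560m"
adapter_dir = "path/to/lora-adapter"   # placeholder: output directory of PEFT finetuning
merged_dir = "path/to/merged-model"    # placeholder

# Load the base model and attach the finetuned LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(base_id)
peft_model = PeftModel.from_pretrained(base_model, adapter_dir)

# Fold the LoRA weights into the base weights and save a plain transformers checkpoint
merged_model = peft_model.merge_and_unload()
merged_model.save_pretrained(merged_dir)
AutoTokenizer.from_pretrained(base_id).save_pretrained(merged_dir)

# Export the merged checkpoint to ONNX via optimum
ort_model = ORTModelForCausalLM.from_pretrained(merged_dir, export=True)
ort_model.save_pretrained("path/to/onnx-model")  # placeholder
```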
Expected behavior
I expect the model outputs to be the same before & after conversion to ONNX. In the minimal example (colab notebook) the difference might look small (nonsensical output either way), but in a real-life use case (involving finetuning) a model that performs very well on a given task can have completely degraded performance after ONNX conversion. I also observed this with a larger model (bigscience/bloom-3b), but that doesn't work in free-tier Google Colab.
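As a concrete reading of "outputs should be the same before & after conversion", a minimal numeric check could look like the sketch below. It is not the notebook's exact code; it assumes the pt_model / ort_model objects from the earlier sketch and that optimum returns torch tensors when given torch inputs.

```python
import torch

# pt_model, ort_model, and tokenizer as loaded in the sketch above
inputs = tokenizer("The quick brown fox", return_tensors="pt")

with torch.no_grad():
    pt_logits = pt_model(**inputs).logits

ort_logits = ort_model(**inputs).logits  # assumed to come back as a torch tensor

max_diff = (pt_logits - ort_logits).abs().max().item()
print(f"max abs difference in next-token logits: {max_diff:.4f}")
# Small numerical drift is expected after export; large differences point to a broken conversion.
```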