Export finetuned PEFT / LoRA model to ONNX #670
Comments
Hi @ingo-m, which optimum version did you try with? Could you try with optimum installed from source? You may have been hit by this bug specific to bloom, fixed since then but not yet in a release: huggingface/optimum#1152
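For reference, installing optimum from source is typically a one-liner; the sketch below assumes the fix is on the default branch of huggingface/optimum (the leading `!` is Colab/Jupyter notebook syntax):

```python
# Notebook / Colab syntax; from a shell, drop the leading "!"
!pip install --upgrade git+https://github.com/huggingface/optimum.git
```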
@fxmarty thanks
Yes, I can try installing optimum from source. By the way, I used the bloom base model without any modifications at all, just applying standard PEFT / LoRA functions: https://colab.research.google.com/drive/1ImNLTJ11JBeaSn-76eAejf5Pjk0ahPTY?usp=sharing
@fxmarty as suggested, I installed optimum from source, but the problem persists. This is the updated minimal example (colab notebook): https://colab.research.google.com/drive/1ImNLTJ11JBeaSn-76eAejf5Pjk0ahPTY?usp=sharing I will report this issue in the optimum repo.
Sorry, I had overlooked something in my original bug report: this issue is not specific to PEFT / LoRA models. Here's a minimal example where a vanilla-flavor "bigscience/bloom-560m" model, without any modifications, generates degraded predictions after conversion to ONNX with optimum. So since it's not PEFT related, I suppose this issue doesn't belong here, but in the optimum repo: huggingface/optimum#1171
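For readers skimming this thread, the comparison in the notebook is roughly of this shape. This is a sketch, not the notebook's exact code: it assumes ORTModelForCausalLM.from_pretrained(..., export=True) is used for the on-the-fly export, and the prompt and generation settings are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model_id = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Vanilla PyTorch model, no PEFT / LoRA involved
pt_model = AutoModelForCausalLM.from_pretrained(model_id)

# Same checkpoint exported to ONNX on the fly via optimum
ort_model = ORTModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
pt_out = pt_model.generate(**inputs, max_new_tokens=20, do_sample=False)
ort_out = ort_model.generate(**inputs, max_new_tokens=20, do_sample=False)

print("PyTorch:", tokenizer.decode(pt_out[0], skip_special_tokens=True))
print("ONNX:   ", tokenizer.decode(ort_out[0], skip_special_tokens=True))
# Expectation: the two continuations match; in the reported runs the ONNX output is degraded.
```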
System Info
peft.__version__: 0.3.0
accelerate.__version__: 0.20.3
transformers.__version__: 4.30.2
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder
Reproduction
I'm trying to export a model finetuned with PEFT / LoRA to ONNX. The base model is bigscience/bloom-560m.

Basically, I merge the LoRA weights into the base model after finetuning, and then try to convert the resulting merged model to ONNX. When I'm using optimum.onnxruntime.ORTModelForCausalLM, the export works, and I can run inference with the ONNX model, but the model outputs are degraded. Alternatively, using the lower-level torch.onnx.export() approach, I get an error.

Here's a minimal example showing both approaches (optimum.onnxruntime.ORTModelForCausalLM and torch.onnx.export()): https://colab.research.google.com/drive/1ImNLTJ11JBeaSn-76eAejf5Pjk0ahPTY?usp=sharing

(The colab example can run on a free T4 instance.)
Is model export to ONNX after PEFT / LoRA finetuning already supposed to work? I found this issue #118 but I'm not quite sure.
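To make the merge-then-export flow above concrete without opening the notebook, here is a rough sketch of what is meant. The adapter and output paths are placeholders, and the optimum export call (export=True) is an assumption about the notebook rather than a copy of it.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

base_id = "bigscience/bloom-560m"
adapter_dir = "path/to/lora-adapter"   # placeholder: output directory of PEFT finetuning
merged_dir = "path/to/merged-model"    # placeholder

# Load the base model and attach the finetuned LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(base_id)
peft_model = PeftModel.from_pretrained(base_model, adapter_dir)

# Fold the LoRA weights into the base weights and save a plain transformers checkpoint
merged_model = peft_model.merge_and_unload()
merged_model.save_pretrained(merged_dir)
AutoTokenizer.from_pretrained(base_id).save_pretrained(merged_dir)

# Export the merged checkpoint to ONNX via optimum
ort_model = ORTModelForCausalLM.from_pretrained(merged_dir, export=True)
ort_model.save_pretrained("path/to/onnx-model")  # placeholder
```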
Expected behavior
I expect the model outputs to be the same before & after conversion to ONNX. In the minimal example (colab notebook) the difference might look small (nonsensical output either way), but in a real-life use case (involving finetuning) a model that performs very well on a given task can have completely degraded performance after ONNX conversion. I also observed this with a larger model (bigscience/bloom-3b), but that doesn't work in free-tier Google Colab.
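As a concrete reading of "outputs should be the same before & after conversion", a minimal numeric check could look like the sketch below. It is not the notebook's exact code; it assumes the pt_model / ort_model objects from the earlier sketch and that optimum returns torch tensors when given torch inputs.

```python
import torch

# pt_model, ort_model, and tokenizer as loaded in the sketch above
inputs = tokenizer("The quick brown fox", return_tensors="pt")

with torch.no_grad():
    pt_logits = pt_model(**inputs).logits

ort_logits = ort_model(**inputs).logits  # assumed to come back as a torch tensor

max_diff = (pt_logits - ort_logits).abs().max().item()
print(f"max abs difference in next-token logits: {max_diff:.4f}")
# Small numerical drift is expected after export; large differences point to a broken conversion.
```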