The ONNX export of Qwen/Qwen1.5-0.5B-Chat does not produce a cache-enabled model #1747
Comments
Hi @anilmartha, thank you for the issue. #1746 should be merged today, which should make the export straightforward.

Now regarding your code: you should not need anything beyond the following, which just works:

```python
from optimum.exporters.onnx import main_export
from optimum.exporters.onnx.model_configs import LlamaOnnxConfig
from transformers import AutoConfig

# Qwen2's architecture is Llama-like, so the Llama ONNX config can be reused here.
class CustomQwenOnnxConfig(LlamaOnnxConfig):
    pass

model_id = "fxmarty/tiny-dummy-qwen2"
config = AutoConfig.from_pretrained(model_id)

onnx_config_with_past = CustomQwenOnnxConfig(config, task="text-generation", use_past=True)

custom_onnx_configs = {
    "model": onnx_config_with_past,
}

main_export(
    model_id,
    output="Qwen1.5-0.5B-Chat",
    task="text-generation-with-past",
    custom_onnx_configs=custom_onnx_configs,
)
```

Note that #1746 is needed to have the exported model work with ORTModelForCausalLM.
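To sanity-check the export, loading it back could look like this (a minimal sketch, assuming #1746 is merged; the model id and output directory match the snippet above):

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

# Load the cache-enabled ONNX export produced above.
tokenizer = AutoTokenizer.from_pretrained("fxmarty/tiny-dummy-qwen2")
model = ORTModelForCausalLM.from_pretrained("Qwen1.5-0.5B-Chat", use_cache=True)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```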
@fxmarty
The resulting files are as follows:
I would like to ask what these files are for.
Hi @MrRace, these files are an artifact of a step in the ONNX export where all external data are saved under the same file. This will be fixed in #1808.
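If it helps, that consolidation step is roughly equivalent to the following sketch with the `onnx` package (the file names here are placeholders, not the exact artifacts optimum writes):

```python
import onnx

# Load the exported graph together with its external tensor data.
model = onnx.load("Qwen1.5-0.5B-Chat/model.onnx", load_external_data=True)

# Re-save all tensors into a single external data file next to the graph.
onnx.save_model(
    model,
    "Qwen1.5-0.5B-Chat/model.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="model.onnx_data",
)
```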
System Info
Who can help?
Hi @michaelbenayoun,
I have exported the Qwen/Qwen1.5-0.5B-Chat model with the text-generation-with-past task. When running the exported ONNX model with the ORTModelForCausalLM class, the following error is observed:
File "/proj/mldata/users/anilm/repos/qwen/run.py", line 11, in
model = ORTModelForCausalLM.from_pretrained("Qwen1.5-0.5B-Chat")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/proj/mldata/users/anilm/workspace/AIE/miniconda/envs/py311/lib/python3.11/site-packages/optimum/onnxruntime/modeling_ort.py", line 662, in from_pretrained
return super().from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/proj/mldata/users/anilm/workspace/AIE/miniconda/envs/py311/lib/python3.11/site-packages/optimum/modeling_base.py", line 399, in from_pretrained
return from_pretrained_method(
^^^^^^^^^^^^^^^^^^^^^^^
File "/proj/mldata/users/anilm/workspace/AIE/miniconda/envs/py311/lib/python3.11/site-packages/optimum/onnxruntime/modeling_decoder.py", line 559, in _from_pretrained
return init_cls(
^^^^^^^^^
File "/proj/mldata/users/anilm/workspace/AIE/miniconda/envs/py311/lib/python3.11/site-packages/optimum/onnxruntime/modeling_decoder.py", line 169, in init
raise ValueError(
ValueError:
use_cache
was set toTrue
but the loaded model only supportsuse_cache=False
. Please load your current model withuse_cache=False
or export the original model once again withuse_cache=True
when calling thefrom_pretrained
method. To export your model, simply setexport=True
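The error message itself suggests two workarounds; as a sketch (the paths are the ones from my setup above):

```python
from optimum.onnxruntime import ORTModelForCausalLM

# Option 1: load the existing export without the KV cache.
model = ORTModelForCausalLM.from_pretrained("Qwen1.5-0.5B-Chat", use_cache=False)

# Option 2: re-export the original checkpoint with the cache enabled.
model = ORTModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-0.5B-Chat", export=True, use_cache=True
)
```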
I have added the custom export script below.
Information
Tasks
An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
Reproduction (minimal, reproducible, runnable)
Expected behavior
I am exporting the model with the text-generation-with-past task, so loading it with ORTModelForCausalLM should work seamlessly.
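Concretely, I would expect a plain export like the following sketch (no custom ONNX config, assuming native qwen2 support) to yield a cache-enabled model:

```python
from optimum.exporters.onnx import main_export

# Expected: a decoder exported with past key/values (KV cache) enabled.
main_export(
    "Qwen/Qwen1.5-0.5B-Chat",
    output="Qwen1.5-0.5B-Chat",
    task="text-generation-with-past",
)
```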