The exported ONNX model of Qwen/Qwen1.5-0.5B-Chat does not produce a cache-enabled model. #1747

Closed
anilmartha opened this issue Mar 7, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@anilmartha

System Info

transformers-4.38.2
optimum-1.17.1

Who can help?

Hi @michaelbenayoun,

I have exported the Qwen/Qwen1.5-0.5B-Chat model with the text-generation-with-past task. When loading the exported ONNX model with the ORTModelForCausalLM class, the following error is raised:
File "/proj/mldata/users/anilm/repos/qwen/run.py", line 11, in
model = ORTModelForCausalLM.from_pretrained("Qwen1.5-0.5B-Chat")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/proj/mldata/users/anilm/workspace/AIE/miniconda/envs/py311/lib/python3.11/site-packages/optimum/onnxruntime/modeling_ort.py", line 662, in from_pretrained
return super().from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/proj/mldata/users/anilm/workspace/AIE/miniconda/envs/py311/lib/python3.11/site-packages/optimum/modeling_base.py", line 399, in from_pretrained
return from_pretrained_method(
^^^^^^^^^^^^^^^^^^^^^^^
File "/proj/mldata/users/anilm/workspace/AIE/miniconda/envs/py311/lib/python3.11/site-packages/optimum/onnxruntime/modeling_decoder.py", line 559, in _from_pretrained
return init_cls(
^^^^^^^^^
File "/proj/mldata/users/anilm/workspace/AIE/miniconda/envs/py311/lib/python3.11/site-packages/optimum/onnxruntime/modeling_decoder.py", line 169, in init
raise ValueError(
ValueError: use_cache was set to True but the loaded model only supports use_cache=False. Please load your current model with use_cache=False or export the original model once again with use_cache=True when calling the from_pretrained method. To export your model, simply set export=True
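
For reference, the two remedies suggested by the error message look roughly like this (a sketch; the local directory name matches the output of the export script below):

from optimum.onnxruntime import ORTModelForCausalLM

# Remedy 1: load the existing (cache-less) export without past key values
model = ORTModelForCausalLM.from_pretrained("Qwen1.5-0.5B-Chat", use_cache=False)

# Remedy 2: re-export from the original checkpoint with the cache enabled
model = ORTModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-0.5B-Chat", export=True, use_cache=True
)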

I have added the custom export script below.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

from optimum.exporters.onnx import main_export

from transformers import AutoConfig

from optimum.exporters.onnx.config import TextDecoderOnnxConfig, TextDecoderWithPositionIdsOnnxConfig
from optimum.exporters.onnx.base import ConfigBehavior
from optimum.utils import NormalizedTextConfig, DummyPastKeyValuesGenerator
from typing import Dict
import os
import shutil


class QwenDummyPastKeyValuesGenerator(DummyPastKeyValuesGenerator):

    def generate(self, input_name: str, framework: str = "pt"):
        past_key_shape = (
            self.batch_size,
            self.num_attention_heads,
            self.hidden_size // self.num_attention_heads,
            self.sequence_length,
        )
        past_value_shape = (
            self.batch_size,
            self.num_attention_heads,
            self.sequence_length,
            self.hidden_size // self.num_attention_heads,
        )
        return [
            (
                self.random_float_tensor(past_key_shape, framework=framework),
                self.random_float_tensor(past_value_shape, framework=framework),
            )
            for _ in range(self.num_layers)
        ]

class CustomQwenOnnxConfig(TextDecoderOnnxConfig):
    DUMMY_INPUT_GENERATOR_CLASSES = (QwenDummyPastKeyValuesGenerator,) + TextDecoderOnnxConfig.DUMMY_INPUT_GENERATOR_CLASSES
    DUMMY_PKV_GENERATOR_CLASS = QwenDummyPastKeyValuesGenerator

    DEFAULT_ONNX_OPSET = 15  # aten::tril operator requires opset>=14
    NORMALIZED_CONFIG_CLASS = NormalizedTextConfig


    def add_past_key_values(self, inputs_or_outputs: Dict[str, Dict[int, str]], direction: str):
        if direction not in ["inputs", "outputs"]:
            raise ValueError(f'direction must either be "inputs" or "outputs", but {direction} was given')

        if direction == "inputs":
            decoder_sequence_name = "past_sequence_length"
            name = "past_key_values"
        else:
            decoder_sequence_name = "past_sequence_length + 1"
            name = "present"

        for i in range(self._normalized_config.num_layers):
            inputs_or_outputs[f"{name}.{i}.key"] = {0: "batch_size", 3: decoder_sequence_name}
            inputs_or_outputs[f"{name}.{i}.value"] = {0: "batch_size", 2: decoder_sequence_name}


model_id = "Qwen/Qwen1.5-0.5B-Chat"
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)


onnx_config = CustomQwenOnnxConfig(
    config=config,
    task="text-generation",
    use_past=True,
    use_past_in_inputs=False,
)
onnx_config_with_past = CustomQwenOnnxConfig(config, task="text-generation", use_past=True)

custom_onnx_configs = {
    "model": onnx_config_with_past,
}


main_export(
    model_id,
    output="Qwen1.5-0.5B-Chat",
    task="text-generation-with-past",
    trust_remote_code=True,
    custom_onnx_configs=custom_onnx_configs,
    no_post_process=True,
    opset=15
)
Running
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
from optimum.utils import NormalizedTextConfig, NormalizedConfigManager
NormalizedConfigManager._conf['qwen2'] = NormalizedTextConfig

import torch
model_id = "Qwen/Qwen1.5-0.5B-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = ORTModelForCausalLM.from_pretrained("Qwen1.5-0.5B-Chat")

Expected behavior

I am exporting the model with the text-generation-with-past task, so the exported ONNX model should load with use_cache=True and work seamlessly.
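
Concretely, the exported model is expected to load and generate along these lines (a sketch; the prompt and generation arguments are only illustrative):

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B-Chat")
# Load the locally exported ONNX model with the KV cache enabled
model = ORTModelForCausalLM.from_pretrained("Qwen1.5-0.5B-Chat", use_cache=True)

inputs = tokenizer("Hello, who are you?", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))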

@anilmartha anilmartha added the bug Something isn't working label Mar 7, 2024
@fxmarty
Contributor

fxmarty commented Mar 20, 2024

Hi @anilmartha, thank you for the issue. #1746 should be merged today, which should make the export as straightforward as:

optimum-cli export onnx --model Qwen/Qwen1.5-0.5B-Chat qwen_onnx

Now, regarding your code: Qwen/Qwen1.5-0.5B-Chat does not seem to use custom modeling code anymore (maybe it did in the past), as the qwen2 model type is now supported natively in Transformers.

So you do not need to use trust_remote_code=True. Also, the past-key-values generator is not correct: Qwen2 is similar to Llama, and the following:

from optimum.exporters.onnx import main_export
from transformers import AutoConfig
from optimum.exporters.onnx.model_configs import LlamaOnnxConfig

class CustomQwenOnnxConfig(LlamaOnnxConfig):
    pass

model_id = "fxmarty/tiny-dummy-qwen2"
config = AutoConfig.from_pretrained(model_id)

onnx_config_with_past = CustomQwenOnnxConfig(config, task="text-generation", use_past=True)

custom_onnx_configs = {
    "model": onnx_config_with_past,
}

main_export(
    model_id,
    output="Qwen1.5-0.5B-Chat",
    task="text-generation-with-past",
    custom_onnx_configs=custom_onnx_configs,
)

just works.

Note that #1746 is needed to have the exported model work with ORTModelForCausalLM.
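
Once that PR is in, loading the folder produced by the CLI command above should be as simple as (a sketch; "qwen_onnx" is the output directory from the optimum-cli command above):

from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained("qwen_onnx", use_cache=True)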

@fxmarty fxmarty closed this as completed Mar 20, 2024
@MrRace

MrRace commented Apr 10, 2024

@fxmarty
I used the following command to export the ONNX format model:

optimum-cli export onnx --model /share_model_zoo/LLM/Qwen/Qwen1.5-0.5B-Chat --task text-generation-with-past /share_model_zoo/LLM/Qwen/optimum_onnx/Qwen1.5-0.5B-Chat

The resulting files are as follows:

-rw-r--r-- 1 root root  704 Apr 10 11:50 config.json
-rw-r--r-- 1 root root  205 Apr 10 11:50 generation_config.json
-rw-r--r-- 1 root root 1.4K Apr 10 11:50 tokenizer_config.json
-rw-r--r-- 1 root root  367 Apr 10 11:50 special_tokens_map.json
-rw-r--r-- 1 root root   80 Apr 10 11:50 added_tokens.json
-rw-r--r-- 1 root root 2.7M Apr 10 11:50 vocab.json
-rw-r--r-- 1 root root 1.6M Apr 10 11:50 merges.txt
-rw-r--r-- 1 root root 6.8M Apr 10 11:50 tokenizer.json
-rw-r--r-- 1 root root 8.0M Apr 10 12:08 _model_layers.0_self_attn_rotary_emb_Constant_attr__value
-rw-r--r-- 1 root root 8.0M Apr 10 12:08 _model_layers.0_self_attn_rotary_emb_Constant_5_attr__value
-rw-r--r-- 1 root root 1.8G Apr 10 12:29 model.onnx

What are the files _model_layers.0_self_attn_rotary_emb_Constant_attr__value and _model_layers.0_self_attn_rotary_emb_Constant_5_attr__value? Why are these two files generated, and how do I use them during inference? Thank you very much.

@fxmarty
Contributor

fxmarty commented Apr 10, 2024

Hi @MrRace, these files are an artifact of a step in the ONNX export where all external data are consolidated into a single model.onnx_data file. These leftover files are not needed and should be deleted, but currently they are not. You can simply use model.onnx.

Basically, for models larger than 2 GB, torch.onnx.export saves the weight data as many independent files (onnx__MatMul_5747, onnx__MatMul_5748, etc.). These are then fused into a single file (as are the Constant_attr__value files), but the originals are left behind.

This will be fixed in #1808.
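
Until then, a manual cleanup could look like this (a sketch; it only removes the leftover *_attr__value artifacts from the listing above and keeps model.onnx):

import os

export_dir = "/share_model_zoo/LLM/Qwen/optimum_onnx/Qwen1.5-0.5B-Chat"
for name in os.listdir(export_dir):
    # Leftover rotary-embedding constants from the export, per the file listing above
    if name.endswith("_attr__value"):
        os.remove(os.path.join(export_dir, name))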
