
[Question] How to generate a tokenizer.json file for a model that does not have a tokenizer.json file, such as qwen #1670

@MingkangW

Description


❓ General Questions

git clone https://huggingface.co/Qwen/Qwen-1_8B-Chat
python -m mlc_chat convert_weight [model-path] --quantization q4f16_1 -o [output-model-path]
python -m mlc_chat gen_config [model-path] --quantization q4f16_1 --conv-template qwen -o [output-model-path]
python -m mlc_chat compile [output-model-path]/mlc-chat-config.json --device cuda -o [lib-path]
I compiled the Qwen model with the commands above, but gen_config failed because qwen was not in the list of conversation templates. I added qwen to CONV_TEMPLATES in python/mlc_chat/interface/gen_config.py, but when I ran inference on the model I got the message "Cannot find any tokenizer under: dist/Qwen-1_8B-Chat".
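For reference, the edit described above would look roughly like this. This is an illustrative sketch only: the surrounding entries and the exact shape of the collection in gen_config.py may differ from the real file.

```python
# Sketch of python/mlc_chat/interface/gen_config.py (illustrative entries,
# not the file's actual contents).
CONV_TEMPLATES = {
    "llama-2",
    "redpajama_chat",
    "qwen",  # added so that `--conv-template qwen` is accepted
}
```

With the template name registered, gen_config no longer rejects `--conv-template qwen`, but this alone does not produce the tokenizer files the runtime needs.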
I found that no tokenizer.json file is generated, unlike for other models: for example, RedPajama-INCITE-Chat-3B-v1 from the official tutorial ships a tokenizer.json in its original model repo, but Qwen does not. However, the MLC_LLM Qwen model (https://huggingface.co/mlc-ai/Qwen-1_8B-Chat-q4f16_1-MLC/tree/main) does contain a tokenizer.json file. I do not know how to generate Qwen's tokenizer.json file.
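For background, tokenizer.json is the single-file serialization format of Hugging Face's fast (Rust-backed) tokenizers, which is what the "Cannot find any tokenizer" check is looking for. The sketch below (assuming the `tokenizers` package is installed; the tiny corpus and file name are illustrative) shows how such a file is produced and read back. Qwen's original tiktoken-based tokenizer is not in this format, which is likely why the repo lacks the file.

```python
# Minimal sketch: produce and consume a tokenizer.json with the Hugging Face
# `tokenizers` library. The training corpus here is illustrative only.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(special_tokens=["[UNK]"], vocab_size=100)
tokenizer.train_from_iterator(["hello world", "hello tokenizer"], trainer)

# Serialize to the single-file format that the runtime looks for.
tokenizer.save("tokenizer.json")

# Round-trip: reload the file and encode some text.
reloaded = Tokenizer.from_file("tokenizer.json")
print(reloaded.encode("hello world").tokens)
```

One commonly suggested route for real models is `AutoTokenizer.from_pretrained(...).save_pretrained(out_dir)` from `transformers`, which writes tokenizer.json when a fast tokenizer backs the model; whether that works for Qwen's custom tokenizer would need to be verified.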
