
4-bit quantization of Qwen1.5-72B gets stuck at: Quantizing model.layers blocks : 100% .... #4018

Closed
1 task done
camposs1979 opened this issue May 31, 2024 · 0 comments
Labels
wontfix This will not be worked on

Comments

@camposs1979

Reminder

  • I have read the README and searched the existing issues.

Reproduction

#!/bin/bash
# DO NOT use a quantized model or quantization_bit when merging LoRA weights

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 python3.10 export_model.py \
    --model_name_or_path /hy-tmp/models/Qwen1.5-72B-Chat-sft \
    --export_quantization_bit 4 \
    --export_quantization_dataset ../data/c4_demo.json \
    --template qwen \
    --export_dir ../../models/Qwen1.5-72B-Chat-sft-INT4 \
    --export_size 2 \
    --export_device cpu \
    --export_legacy_format False
The process got stuck at:
Quantizing model.layers blocks : 100%|████████████████████████████████████████████████████████████| 80/80 [1:30:22<00:00, 67.78s/it]
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:4225: FutureWarning: _is_quantized_training_enabled is going to be deprecated in transformers 4.39.0. Please use model.hf_quantizer.is_trainable instead
warnings.warn(
and there has been no further progress since.
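
For context, exporting with --export_quantization_bit 4 appears to delegate to GPTQ quantization via transformers/optimum/auto-gptq (the "Quantizing model.layers blocks" progress bar comes from that path). Below is a minimal standalone sketch of that step using the public GPTQConfig API; it reuses the paths from the command above for illustration and is not the exact code that export_model.py runs. Note that after the per-layer progress bar reaches 100%, packing and saving the INT4 weights of a 72B model can itself take a long time without printing anything.

# Hedged sketch: GPTQ 4-bit quantization with transformers + optimum + auto-gptq.
# Paths are placeholders taken from the reproduction command, not verified settings.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "/hy-tmp/models/Qwen1.5-72B-Chat-sft"        # merged (non-quantized) checkpoint
out_dir = "../../models/Qwen1.5-72B-Chat-sft-INT4"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# "c4" lets optimum fetch calibration samples from the C4 dataset;
# a list of raw calibration strings can be passed instead of a dataset name.
quant_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Quantization runs layer by layer while the model is loaded.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config,
)

# Packing and serializing the INT4 weights is slow for a 72B model; be patient here.
model.save_pretrained(out_dir, max_shard_size="2GB")
tokenizer.save_pretrained(out_dir)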

Expected behavior

I expect Qwen1.5-72B to be quantized successfully.

System Info

Package Version


accelerate 0.28.0
addict 2.4.0
aiofiles 23.2.1
aiohttp 3.9.3
aiosignal 1.3.1
aliyun-python-sdk-core 2.15.0
aliyun-python-sdk-kms 2.16.2
altair 5.2.0
annotated-types 0.6.0
anyio 4.3.0
async-timeout 4.0.3
attrs 23.2.0
auto_gptq 0.7.1
bitsandbytes 0.43.0
certifi 2019.11.28
cffi 1.16.0
chardet 3.0.4
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 3.0.0
cmake 3.29.2
coloredlogs 15.0.1
contourpy 1.2.0
crcmod 1.7
cryptography 42.0.5
cupy-cuda12x 12.1.0
cycler 0.12.1
datasets 2.18.0
dbus-python 1.2.16
deepspeed 0.14.0
dill 0.3.8
diskcache 5.6.3
distro 1.4.0
distro-info 0.23ubuntu1
docstring_parser 0.16
einops 0.7.0
exceptiongroup 1.2.0
fastapi 0.110.0
fastrlock 0.8.2
ffmpy 0.3.2
filelock 3.13.3
fire 0.6.0
flash-attn 2.5.9.post1
fonttools 4.50.0
frozenlist 1.4.1
fsspec 2024.2.0
galore-torch 1.0
gast 0.5.4
gekko 1.0.7
gradio 4.29.0
gradio_client 0.16.1
h11 0.14.0
hjson 3.1.0
httpcore 1.0.4
httptools 0.6.1
httpx 0.27.0
huggingface-hub 0.22.0
humanfriendly 10.0
idna 2.8
importlib_metadata 7.1.0
importlib_resources 6.4.0
interegular 0.3.3
Jinja2 3.1.3
jmespath 0.10.0
joblib 1.3.2
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
lark 1.1.9
llvmlite 0.42.0
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.8.3
mdurl 0.1.2
modelscope 1.13.3
mpmath 1.3.0
msgpack 1.0.8
multidict 6.0.5
multiprocess 0.70.16
nest-asyncio 1.6.0
networkx 3.2.1
ninja 1.11.1.1
numba 0.59.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.4.99
nvidia-nvtx-cu12 12.1.105
optimum 1.16.0
orjson 3.9.15
oss2 2.18.4
outlines 0.0.34
packaging 24.0
pandas 2.2.1
peft 0.10.0
pillow 10.2.0
pip 24.0
platformdirs 4.2.0
prometheus_client 0.20.0
protobuf 3.20.3
psutil 5.9.8
py-cpuinfo 9.0.0
pyarrow 15.0.2
pyarrow-hotfix 0.6
pycparser 2.21
pycryptodome 3.20.0
pydantic 2.6.4
pydantic_core 2.16.3
pydub 0.25.1
Pygments 2.17.2
PyGObject 3.36.0
pynvml 11.5.0
pyparsing 3.1.2
python-apt 2.0.1+ubuntu0.20.4.1
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0.1
ray 2.10.0
referencing 0.34.0
regex 2023.12.25
requests 2.31.0
requests-unixsocket 0.2.0
rich 13.7.1
rouge 1.0.1
rpds-py 0.18.0
ruff 0.4.6
safetensors 0.4.2
scipy 1.12.0
semantic-version 2.10.0
sentencepiece 0.2.0
setuptools 69.2.0
shellingham 1.5.4
shtab 1.7.1
simplejson 3.19.2
six 1.14.0
sniffio 1.3.1
sortedcontainers 2.4.0
sse-starlette 2.0.0
ssh-import-id 5.10
starlette 0.36.3
sympy 1.12
termcolor 2.4.0
tiktoken 0.6.0
tokenizers 0.15.2
tomli 2.0.1
tomlkit 0.12.0
toolz 0.12.1
torch 2.1.0
tqdm 4.66.2
transformers 4.39.1
triton 2.1.0
trl 0.8.2
typer 0.12.3
typing_extensions 4.10.0
tyro 0.7.3
tzdata 2024.1
unattended-upgrades 0.1
unsloth 2024.5
urllib3 2.2.1
uvicorn 0.29.0
uvloop 0.19.0
vllm 0.4.0
watchfiles 0.21.0
websockets 11.0.3
wheel 0.43.0
xformers 0.0.22.post7+cu118
xxhash 3.4.1
yapf 0.40.2
yarl 1.9.4
zipp 3.18.1

Others

No response

@camposs1979 shortened the issue title (removing the pasted progress bar and FutureWarning text) May 31, 2024
@hiyouga added the wontfix label and removed the pending label Jun 3, 2024
@hiyouga closed this as not planned Jun 5, 2024