llama2量化后版本加载报错 #15

WheatJH · 2023-08-03T06:47:27Z

llama2-7b-chat-hf，按照提供的量化步骤，得到4bit版本的模型并补齐模型文件，通过AutoModelForCausalLM.from_pretrained方式加载时，报NotImplementedError: Cannot copy out of meta tensor; no data!
环境配置：
accelerate==0.21.0
bitsandbytes==0.40.2
gradio==3.37.0
protobuf==3.20.3
scipy==1.11.1
sentencepiece==0.1.99
transformers==4.31.0
torch==1.13.0a0+340c412
cuda==11.7

chopin1998 · 2023-09-04T09:15:44Z

我看了一下，似乎新版的 transformer 可以直接进行量化后使用，不需要额外的量化过程？

model_id = "meta-llama/Llama-2-13b-chat-hf"

nf4_config = BitsAndBytesConfig(load_in_4bit=True,
                                bnb_4bit_quant_type="nf4",
                                bnb_4bit_use_double_quant=True,
                                bnb_4bit_compute_dtype=torch.bfloat16)


model_nf4 = AutoModelForCausalLM.from_pretrained(model_id, 
                                                 quantization_config=nf4_config)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

llama2量化后版本加载报错 #15

llama2量化后版本加载报错 #15

WheatJH commented Aug 3, 2023

chopin1998 commented Sep 4, 2023

llama2量化后版本加载报错 #15

llama2量化后版本加载报错 #15

Comments

WheatJH commented Aug 3, 2023

chopin1998 commented Sep 4, 2023