
Finetuning the int8-quantized model fails: RuntimeError: self and mat2 must have the same dtype #214

Open
zlht812 opened this issue May 8, 2023 · 6 comments



zlht812 commented May 8, 2023

```
Traceback (most recent call last):
  File "/data/ChatGLM-Tuning/finetune.py", line 117, in <module>
    main()
  File "/data/ChatGLM-Tuning/finetune.py", line 110, in main
    trainer.train()
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/transformers/trainer.py", line 1929, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/transformers/trainer.py", line 2699, in training_step
    loss = self.compute_loss(model, inputs)
  File "/data/ChatGLM-Tuning/finetune.py", line 54, in compute_loss
    return model(
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/peft/peft_model.py", line 678, in forward
    return self.base_model(
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/modeling_chatglm.py", line 1190, in forward
    transformer_outputs = self.transformer(
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/modeling_chatglm.py", line 985, in forward
    layer_ret = torch.utils.checkpoint.checkpoint(
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/modeling_chatglm.py", line 627, in forward
    attention_outputs = self.attention(
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/modeling_chatglm.py", line 445, in forward
    mixed_raw_layer = self.query_key_value(hidden_states)
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/peft/tuners/lora.py", line 565, in forward
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: self and mat2 must have the same dtype
```
Any advice would be appreciated~

@calvinzhan

This happens because the quantized model replaces and wraps query_key_value, and LoRA then swaps it back. As a result the input is float16 while the weight is int8, and the matmul cannot run. Has anyone managed to solve this?
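A minimal sketch of that mismatch (shapes and values are made up for illustration; the exact error text can vary with the torch version and device):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 16, dtype=torch.float16)               # float16 activations
w = torch.randint(-128, 127, (16, 16), dtype=torch.int8)  # int8-quantized weight

# peft's LoRA layer calls F.linear on the raw weight tensor; with
# mismatched dtypes the matmul fails, e.g.
# RuntimeError: self and mat2 must have the same dtype
F.linear(x, w)
```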


zlht812 commented May 10, 2023

Switching to a different training script and the fp16 weights fixed it. The original training script does not support int8.
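For reference, a minimal sketch of the fp16 route (a standard peft LoRA setup with an illustrative config, not the repo's actual script):

```python
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

# Load the full-precision fp16 checkpoint instead of the pre-quantized
# int8 one, so LoRA's F.linear sees matching float16 dtypes.
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b", trust_remote_code=True
).half().cuda()

# Illustrative LoRA config targeting the projection that failed above.
config = LoraConfig(
    r=8, lora_alpha=32, target_modules=["query_key_value"], lora_dropout=0.1
)
model = get_peft_model(model, config)
```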


calvinzhan commented May 11, 2023

@zlht812
So switching training scripts made the int8-quantized model work? And for switching to the fp16 weights, could you explain how you did it? Would you mind adding me on WeChat to discuss? Mine is 229402265.

@songyi1999

I'd also like to know the answer to this, could you share it?


zlht812 commented May 15, 2023

I used this LoRA trainer that supports int8: https://github.com/ssbuild/chatglm_finetuning
Current status: LoRA training completes, but at inference the pretrained model uses the fp16 weights loaded in int8 mode. The LoRA weights load successfully with half(), yet inference raises the same error.
[screenshot: the same RuntimeError: self and mat2 must have the same dtype, raised at inference]
The pretrained model used for inference is the same one used for LoRA training. Suspecting the LoRA weights themselves are already int8, I dropped the half() call, but then the GPU ran out of memory: CUDA out of memory.
I'll test again once the new server is up.
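For anyone trying to reproduce the setup described above, a sketch of what it could look like, assuming the quantize() helper from ChatGLM-6B's custom modeling code and a hypothetical adapter path path/to/lora:

```python
from transformers import AutoModel
from peft import PeftModel

# fp16 base checkpoint, quantized to int8 at load time (quantize() comes
# from ChatGLM-6B's custom modeling code, hence trust_remote_code=True).
base = AutoModel.from_pretrained(
    "THUDM/chatglm-6b", trust_remote_code=True
).half().quantize(8).cuda()

# Load the trained LoRA adapter; casting it with half() was the step that
# still reproduced the dtype error in the report above.
model = PeftModel.from_pretrained(base, "path/to/lora").half()
```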


Xzaohui commented Jun 21, 2023

Is there any solution to this?
