
Cannot evaluate qwen-7b after continual pre-training with the latest code #601

Closed
fengcai24 opened this issue Aug 20, 2023 · 3 comments
Labels: solved (This problem has been already solved.)

Comments

@fengcai24

The error is as follows:

    output = old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/checkpoint-2800/modeling_qwen.py", line 381, in forward
    context_layer = self.core_attention_flash(q, k, v)
  File "/opt/conda/envs/llama_etuning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/checkpoint-2800/modeling_qwen.py", line 95, in forward
    assert all((i.dtype in [torch.float16, torch.bfloat16] for i in (q, k, v)))
AssertionError
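
For context, the assertion on line 95 of modeling_qwen.py only accepts half-precision attention inputs, so it fires whenever the checkpoint runs in fp32 (the default when no torch_dtype is passed at load time). A minimal sketch reproducing the check:

```python
import torch

# fp32 tensors, as produced when a checkpoint is loaded without torch_dtype=...
q = k = v = torch.randn(1, 8, 16, 64)  # defaults to torch.float32

# The same check as modeling_qwen.py line 95; it only passes for fp16/bf16.
assert all(i.dtype in [torch.float16, torch.bfloat16] for i in (q, k, v))  # raises AssertionError
```

Casting the inputs (or loading the whole model) in bf16 makes the same check pass.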

@fengcai24
Author

The model was trained with bf16.

@hiyouga
Owner

hiyouga commented Aug 21, 2023

Disable flashattn in the config.json in the model directory.
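
A minimal sketch of that fix done programmatically, assuming the use_flash_attn flag that Qwen's remote modeling code reads from config.json and a standard transformers loading path; the checkpoint path is hypothetical:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Hypothetical path; substitute your own fine-tuned checkpoint directory.
ckpt = "/path/to/checkpoint-2800"

# Overriding the flag here is equivalent to editing config.json by hand.
config = AutoConfig.from_pretrained(ckpt, trust_remote_code=True)
config.use_flash_attn = False  # disable FlashAttention for evaluation

model = AutoModelForCausalLM.from_pretrained(
    ckpt,
    config=config,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # match the bf16 training dtype
)
```

Alternatively, since the assertion only rejects fp32 inputs, loading the model in bf16 alone may avoid the error without disabling FlashAttention.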

@fengcai24
Author

Solved.

@hiyouga added the "solved" label (This problem has been already solved.) on Aug 22, 2023