
Unable to train DeepSeek models on Ascend cards; is this supported? #4361

Open · 1 task done
sweetning0809 opened this issue Jun 18, 2024 · 3 comments
Labels
npu (This problem is related to NPU devices) · pending (This problem is yet to be addressed)

Comments

@sweetning0809

Reminder

  • I have read the README and searched the existing issues.

System Info

Training DeepSeek-series models on an NPU requires the flash-attn library, but due to a dependency conflict flash-attn cannot be used on NPU, so training fails.
Could support for this be considered? The attention forward pass may need to be replaced with the FlashAttention operator: https://www.hiascend.com/document/detail/zh/Pytorch/60RC1/ptmoddevg/trainingmigrguide/performance_tuning_0027.html
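The dispatch the reporter is asking for could look like the following minimal sketch: prefer the Ascend fused kernel when `torch_npu` is importable, fall back to flash-attn when it is present, and otherwise use eager attention. The function name `choose_attn_impl` is illustrative and not a LLaMA-Factory API.

```python
# Minimal sketch of attention-backend selection (illustrative only):
# pick the NPU fused kernel when torch_npu is installed instead of
# hard-requiring flash-attn.
import importlib.util


def choose_attn_impl() -> str:
    """Return the attention backend to configure for this environment."""
    if importlib.util.find_spec("torch_npu") is not None:
        return "npu_fusion_attention"  # Ascend fused kernel available
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"     # CUDA flash-attn available
    return "eager"                     # portable fallback
```

On a machine without `torch_npu` or `flash_attn` this returns `"eager"`, which is the behavior the reporter wants instead of a hard import failure.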

Reproduction

llamafactory-cli train

Expected behavior

I hope NPU training of DeepSeek models can be supported.

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Jun 18, 2024
@sweetning0809 (Author)

After looking into the code, the parts involved are mainly LlamaFlashAttention2 and LlamaSdpaAttention in longlora.py. Following https://www.hiascend.com/document/detail/zh/Pytorch/60RC1/ptmoddevg/trainingmigrguide/performance_tuning_0027.html, the attention calls at lines 516 and 531 of the transformers LlamaFlashAttention2 implementation would probably need to be replaced with the torch_npu.npu_fusion_attention operator from that guide.
(screenshots of the relevant attention code omitted)
The same applies to the Sdpa path; it could probably be supported in the same way.
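A minimal sketch of the monkey-patch described above. The helper name `patch_attention_forward` and the exact keyword arguments passed to `torch_npu.npu_fusion_attention` are assumptions to verify against the Ascend guide; only the operator name itself comes from the linked documentation.

```python
# Hypothetical monkey-patch sketch: swap the forward method of a
# transformers attention class for one that calls the Ascend fused
# kernel. The torch_npu.npu_fusion_attention arguments shown in the
# comment are abbreviated; check them against the Hiascend tuning guide.

def patch_attention_forward(attn_cls, new_forward):
    """Replace attn_cls.forward with new_forward, keeping the original
    reachable as _orig_forward so the patch can be reverted."""
    attn_cls._orig_forward = attn_cls.forward
    attn_cls.forward = new_forward
    return attn_cls


def npu_fusion_forward(self, query, key, value, attention_mask=None):
    # On an Ascend device this body would be roughly:
    #   import torch_npu
    #   out = torch_npu.npu_fusion_attention(
    #       query, key, value, self.num_heads,
    #       input_layout="BNSD", atten_mask=attention_mask)[0]
    #   return out
    # (exact argument names per the performance_tuning guide linked above)
    raise NotImplementedError("requires torch_npu on an Ascend NPU")


# Applying it to the real classes would look something like:
#   from transformers.models.llama import modeling_llama
#   patch_attention_forward(modeling_llama.LlamaFlashAttention2,
#                           npu_fusion_forward)
```

Keeping `_orig_forward` around makes the replacement reversible, which helps when the same process must also run on non-NPU devices.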

@sweetning0809
Copy link
Author

In addition, modeling_deepseek.py in the model repository also needs to be changed.


@hiyouga hiyouga added the npu This problem is related to NPU devices label Jun 19, 2024