How can I run the program in parallel across multiple GPUs? #37
Comments
Same issue, any solution?
For multi-GPU evaluation, you can follow the implementation in eval_mm/evaluate_caption.py to run across multiple cards with batched inputs.
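For reference, a minimal sketch of that data-parallel pattern: one process per GPU, a full model copy on each, with the prompts sharded across ranks. The model name, prompts, and generation settings are placeholders, and the actual evaluate_caption.py may differ in its details:

```python
# Launch with: torchrun --nproc_per_node=4 eval_sketch.py
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler
from transformers import AutoModelForCausalLM, AutoTokenizer

dist.init_process_group(backend="nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)  # pin this process to its own GPU

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", trust_remote_code=True, torch_dtype=torch.float16
).eval().cuda()

prompts = ["Describe the image."] * 32  # placeholder prompts, equal token length assumed
sampler = DistributedSampler(prompts, shuffle=False)  # shards the data across ranks
loader = DataLoader(prompts, batch_size=8, sampler=sampler)

for batch in loader:
    # Tokenize individually and stack; equal lengths sidestep the padding
    # problem discussed later in this thread.
    ids = torch.cat([tokenizer(p, return_tensors="pt").input_ids for p in batch], dim=0)
    out = model.generate(ids.cuda(), max_new_tokens=128)
    for row in out:
        print(tokenizer.decode(row[ids.shape[1]:], skip_special_tokens=True))
```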
I tried that, but the outputs from model.generate() are far worse than model.chat(). Is there a way to do batched inference with model.chat()?
Same question here. How can this be implemented? @ShuaiBai623
Same problem, asking for help @ShuaiBai623
Hugging Face provides qwen_generation_utils.py. Loop over your queries calling its make_context function to build a batch, then call generate followed by the decode_tokens function it provides.
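A hedged sketch of that recipe. The make_context and decode_tokens signatures below follow the qwen_generation_utils.py shipped alongside Qwen on Hugging Face, but verify them against your copy; the queries are placeholders and, per the next comment, are assumed to tokenize to the same length:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from qwen_generation_utils import make_context, decode_tokens

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", device_map="auto", trust_remote_code=True
).eval()

queries = ["Describe the image.", "What color is the car?"]  # placeholders

# Build one ChatML context per query, collecting raw texts for decoding later.
raw_texts, batch_tokens = [], []
for q in queries:
    raw_text, context_tokens = make_context(
        tokenizer, q, history=None,
        system="You are a helpful assistant.", chat_format="chatml",
    )
    raw_texts.append(raw_text)
    batch_tokens.append(context_tokens)

# All token lists must be the same length here; see the padding caveat below.
input_ids = torch.tensor(batch_tokens, device=model.device)
outputs = model.generate(input_ids, max_new_tokens=256)

for i, out in enumerate(outputs):
    response = decode_tokens(
        out, tokenizer,
        raw_text_len=len(raw_texts[i]),
        context_length=len(batch_tokens[i]),
        chat_format="chatml",
        verbose=False,
        errors="replace",
    )
    print(response)
```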
So the conclusion is that Qwen-VL doesn't support multi-GPU inference? Is there no official plan to support it?
This requires all input queries to be the same length; for queries of different lengths there doesn't seem to be a suitable padding method at the moment.
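One workaround worth trying, though nothing in this thread confirms it works for Qwen-VL: left-pad the shorter token lists and pass an attention_mask so the padding is ignored. A sketch, with the pad id left as an assumption (Qwen's tokenizer defines no pad token by default):

```python
import torch

def left_pad(batch_tokens, pad_id):
    """Left-pad lists of token ids to a common length. Decoder-only models
    generally want left padding so generation continues from real tokens."""
    max_len = max(len(t) for t in batch_tokens)
    input_ids, attention_mask = [], []
    for t in batch_tokens:
        pad = [pad_id] * (max_len - len(t))
        input_ids.append(pad + t)
        attention_mask.append([0] * len(pad) + [1] * len(t))
    return torch.tensor(input_ids), torch.tensor(attention_mask)

# Usage (pad_id is an assumption -- Qwen's tokenizer exposes eod_id, which is
# often reused as a pad id, but confirm this for your checkpoint):
# input_ids, mask = left_pad(batch_tokens, pad_id=tokenizer.eod_id)
# outputs = model.generate(input_ids.cuda(), attention_mask=mask.cuda(),
#                          max_new_tokens=256)
```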
Same question. I have 4 RTX cards with 96 GB of VRAM in total; inference runs only on the first card, and specifying multiple cards raises an error:
@ybshaw Did you solve the problem?
No. For now I'm using the Int4 version, which can run on a single card.
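For anyone taking the same route, loading the quantized checkpoint looks roughly like this (assuming the official Int4 release on Hugging Face, which also needs auto-gptq and optimum installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat-Int4", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat-Int4", device_map="cuda", trust_remote_code=True
).eval()

# Single-query chat on one card; placeholder query.
response, history = model.chat(tokenizer, query="Describe the image.", history=None)
print(response)
```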
Has anyone found a solution yet?
Start Date
No response
Implementation PR
No response
Reference Issues
No response
Summary
How can I run the program in parallel across multiple GPUs?
Basic Example
How can I run the program in parallel across multiple GPUs?
Changing device_map = "cuda" to device_map = "auto" makes the program use multiple cards, but it fails with:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:3!
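A sketch of one commonly suggested workaround, not an official fix: build the device map explicitly so modules that must share a device are never split. The class name QWenBlock and the memory budget below are assumptions; inspect model.hf_device_map after loading to see where things actually landed.

```python
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Qwen/Qwen-VL-Chat"
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)

# Build a weightless skeleton so we can plan the placement without loading weights.
with init_empty_weights():
    empty = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

device_map = infer_auto_device_map(
    empty,
    max_memory={i: "20GiB" for i in range(4)},   # assumed per-card budget
    no_split_module_classes=["QWenBlock"],       # assumed transformer block class
)
# Pinning the vision tower to a single card is a common extra step; the module
# path "transformer.visual" is an assumption -- check empty.named_modules().
device_map["transformer.visual"] = 0

model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map=device_map, trust_remote_code=True
).eval()
```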
Drawbacks
How can I run the program in parallel across multiple GPUs?
Unresolved questions
No response