
How can the program be run in parallel across multiple GPUs? #37

Open
amwork2020 opened this issue Aug 30, 2023 · 13 comments
Labels
question Further information is requested

Comments

@amwork2020

Start Date

No response

Implementation PR

No response

Reference Issues

No response

Summary

How can the program be run in parallel across multiple GPUs?

Basic Example

How can the program be run in parallel across multiple GPUs?
I changed device_map = "cuda" to device_map = "auto".
The program then uses multiple cards, but raises:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:3!
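
For context, a minimal sketch of the loading pattern being described, using the standard transformers API (the checkpoint name and prompt are illustrative). One common cause of this exact RuntimeError is input tensors not living on the card that holds the first model shard, so the sketch moves them there explicitly:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen-VL-Chat"  # illustrative checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
# device_map="auto" lets accelerate shard the layers across all visible GPUs.
model = AutoModelForCausalLM.from_pretrained(
    MODEL, device_map="auto", trust_remote_code=True
).eval()

inputs = tokenizer("Describe the image.", return_tensors="pt")
# model.device reports the card holding the first shard; inputs that live
# elsewhere can trigger "Expected all tensors to be on the same device".
inputs = inputs.to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```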

Drawbacks

How can the program be run in parallel across multiple GPUs?

Unresolved questions

No response

amwork2020 added the question (Further information is requested) label on Aug 30, 2023
@Luccadoremi

Same issue, any solution?

@ShuaiBai623
Collaborator

For multi-GPU evaluation, you can refer to the implementation in eval_mm/evaluate_caption.py for multi-GPU and batched execution.
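
For readers without that file at hand, here is a minimal sketch of the data-parallel pattern it follows, assuming one process per GPU launched with torchrun; the dataset and checkpoint name are illustrative, not the exact code in eval_mm/evaluate_caption.py:

```python
# Launch with: torchrun --nproc_per_node=4 eval_sketch.py
import torch
import torch.distributed as dist
from transformers import AutoModelForCausalLM, AutoTokenizer

dist.init_process_group(backend="nccl")
rank = dist.get_rank()          # equals the local rank on a single node
torch.cuda.set_device(rank)

MODEL = "Qwen/Qwen-VL-Chat"     # illustrative checkpoint name
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
# Each rank holds a full copy of the model on its own GPU (data parallelism).
model = AutoModelForCausalLM.from_pretrained(
    MODEL, trust_remote_code=True
).to(f"cuda:{rank}").eval()

prompts = ["prompt 0", "prompt 1", "prompt 2", "prompt 3"]  # toy dataset
# Shard the workload: rank i handles every world_size-th sample.
for prompt in prompts[rank :: dist.get_world_size()]:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32)
    print(rank, tokenizer.decode(out[0], skip_special_tokens=True))

dist.barrier()
```

The real script additionally batches samples within each rank and gathers the results at the end, but the rank-sharded loop above is the core idea.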

@Keep-lucky

Keep-lucky commented Oct 18, 2023

For multi-GPU evaluation, you can refer to the implementation in eval_mm/evaluate_caption.py for multi-GPU and batched execution.

I tried this, but the output quality from model.generate() is far worse than from model.chat(). Does model.chat() have a batch-inference method?

@atomrun39

For multi-GPU evaluation, you can refer to the implementation in eval_mm/evaluate_caption.py for multi-GPU and batched execution.

I tried this, but the output quality from model.generate() is far worse than from model.chat(). Does model.chat() have a batch-inference method?

Same request here; how can this be implemented? @ShuaiBai623

@iFe1er

iFe1er commented Oct 26, 2023

Same problem here, please help @ShuaiBai623

1 similar comment
@CrazyBrick

Same problem here, please help @ShuaiBai623

@FangGet

FangGet commented Nov 6, 2023

HF provides a qwen_generation_utils.py: call its make_context function in a loop to build the batch, then call the generate function and the provided decode_tokens function.
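
A hedged sketch of that loop, assuming the make_context/decode_tokens helpers shipped in qwen_generation_utils.py alongside the HF checkpoint; for simplicity it assumes all prompts tokenize to the same length (the padding caveat is raised in the next comment):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from qwen_generation_utils import make_context, decode_tokens  # from the HF repo

MODEL = "Qwen/Qwen-VL-Chat"  # illustrative checkpoint name
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, device_map="cuda", trust_remote_code=True
).eval()

queries = ["Describe picture one.", "Describe picture two."]
raw_texts, token_lists = [], []
for q in queries:
    # make_context builds the same ChatML prompt that model.chat() uses.
    raw_text, context_tokens = make_context(
        tokenizer, q, history=None,
        system="You are a helpful assistant.", chat_format="chatml")
    raw_texts.append(raw_text)
    token_lists.append(context_tokens)

# Simplifying assumption: equal-length prompts, so they stack directly.
input_ids = torch.tensor(token_lists, device=model.device)
outs = model.generate(input_ids, max_new_tokens=64)

for raw_text, tokens in zip(raw_texts, outs):
    # decode_tokens strips the prompt and ChatML markup from each row.
    print(decode_tokens(tokens, tokenizer,
                        raw_text_len=len(raw_text),
                        context_length=input_ids.shape[1],
                        chat_format="chatml", verbose=False, errors="replace"))
```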

@peytoncai

So the conclusion is that qwen vl doesn't support multi-GPU inference? Is there no official plan to support it?

@drockser

HF provides a qwen_generation_utils.py: call its make_context function in a loop to build the batch, then call the generate function and the provided decode_tokens function.

This requires all input queries to be the same length; for queries of different lengths there doesn't seem to be a suitable padding method at the moment.
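
One possible workaround, offered as an assumption rather than an official recipe: left-pad the shorter prompts (for example with Qwen's end-of-document token, tokenizer.eod_id) and pass a matching attention_mask to generate(), so the padding is ignored:

```python
import torch

def left_pad(token_lists, pad_id):
    """Left-pad variable-length prompt token lists into one batch.

    Left padding keeps the real tokens flush against the generation
    boundary; pass the returned attention_mask to model.generate().
    """
    max_len = max(len(t) for t in token_lists)
    input_ids = torch.tensor(
        [[pad_id] * (max_len - len(t)) + t for t in token_lists])
    attention_mask = (input_ids != pad_id).long()
    return input_ids, attention_mask
```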

@ybshaw

ybshaw commented May 10, 2024

Same question. Four RTX cards with 96 GB of VRAM in total: inference only runs on the first card, and specifying multiple cards raises RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:3! Is there a way to distribute inference across multiple cards?
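
One workaround that has helped with this class of error is hand-writing the device_map so the embedding, final norm, output head, and visual encoder all sit on one card, instead of letting device_map="auto" split them. The module names below are assumptions based on Qwen-VL's architecture; print the model or model.hf_device_map to confirm them for your checkpoint, and add any modules from_pretrained reports as uncovered:

```python
from transformers import AutoModelForCausalLM

MODEL = "Qwen/Qwen-VL-Chat"  # illustrative checkpoint name

# Pin the boundary modules together on cuda:0 and spread the transformer
# blocks across four cards (assumed module names and a 32-block model).
device_map = {
    "transformer.wte": 0,
    "transformer.visual": 0,
    "transformer.ln_f": 0,
    "lm_head": 0,
}
for i in range(32):
    device_map[f"transformer.h.{i}"] = i // 8  # 8 blocks per GPU

model = AutoModelForCausalLM.from_pretrained(
    MODEL, device_map=device_map, trust_remote_code=True
).eval()
print(model.hf_device_map)  # verify the actual placement
```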

@cnahmgx

cnahmgx commented May 22, 2024

@ybshaw Has this been solved?

@ybshaw

ybshaw commented May 23, 2024

@ybshaw Has this been solved?

No. For now I'm using the Int4 version, which can run on a single card.
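
For reference, a minimal single-card sketch of that fallback, assuming the published Qwen/Qwen-VL-Chat-Int4 checkpoint (it needs the auto-gptq and optimum packages installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen-VL-Chat-Int4"
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
# The GPTQ Int4 weights are small enough to fit on a single card.
model = AutoModelForCausalLM.from_pretrained(
    MODEL, device_map="cuda:0", trust_remote_code=True
).eval()

response, history = model.chat(tokenizer, "Hello", history=None)
print(response)
```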

@yihp

yihp commented Jun 24, 2024

Has anyone found a solution yet?
