About multi-node multi-GPU training #12
Comments
Hello. Because our multi-GPU training currently enables aggregate by default (controlled in the training configuration by the
This is where the all-gather operation happens in the code: Chinese-CLIP/cn_clip/training/train.py Lines 25 to 33 in 2746589
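For readers unfamiliar with all-gather, here is a minimal sketch of what the operation does to per-GPU features. It is a pure-Python simulation, not the code at the lines referenced above: `simulated_all_gather` and `logits_shape` are hypothetical helper names, the "features" are plain numbers, and the per-GPU batch size of 32 is made up for illustration. The shape calculation assumes each rank contrasts all gathered image features against all gathered text features.

```python
# Pure-Python simulation of all-gather; no GPUs or torch.distributed required.
# All names are hypothetical and only illustrate the data movement.

def simulated_all_gather(per_rank_features):
    """Return, for every rank, the concatenation of all ranks' features."""
    gathered = [f for rank_features in per_rank_features for f in rank_features]
    # Every rank ends up holding its own copy of the full list.
    return [list(gathered) for _ in per_rank_features]


def logits_shape(world_size, per_gpu_batch):
    """Shape of the similarity matrix if each rank compares all gathered
    image features with all gathered text features (an assumption)."""
    global_batch = world_size * per_gpu_batch
    return (global_batch, global_batch)


# Two ranks, each holding two "features".
per_rank = [[0.1, 0.2], [0.3, 0.4]]
after = simulated_all_gather(per_rank)
print(after[0])              # rank 0 now sees the features of both ranks
print(logits_shape(8, 32))   # 1 machine x 8 GPUs
print(logits_shape(32, 32))  # 4 machines x 8 GPUs
```

Under these assumptions, each rank's gathered feature set and its similarity matrix grow with the total number of GPUs, even though the per-GPU batch size stays the same.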
Got it, great 👍🏻, thanks~
For example: one machine with 8 V100 GPUs can reach a batch size of 2048, but four machines with 8 V100 GPUs each can only reach a batch size of 1024.
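Hello. Training one epoch on 2 machines with 4 A100 GPUs each takes 10 h, while one epoch on a single machine with 4 A100 GPUs takes 30 min. What causes multi-node multi-GPU training to be so much less efficient?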