Results differ from those in the original paper. #20

Closed
coolcoolcat opened this issue Aug 29, 2020 · 9 comments · Fixed by #35

Comments

@coolcoolcat

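After finishing the environment setup, I hit an exception while training the model: InternalError: GPU sync failed. I suspected that training on too much data at once was exhausting memory, so I changed BATCH_SIZE for the DIN model to 512 and got a result of 0.629. I then changed BATCH_SIZE for the DSIN model to 256 and got a result of 5.63. I can't figure out why. I also tried other BATCH_SIZE values, and the results were much the same. Any advice would be greatly appreciated.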
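One way to test the memory hypothesis systematically, rather than picking batch sizes by hand, is to halve the batch size until training gets through. The following is only a minimal sketch: `fit_with_backoff` and `fake_train` are hypothetical names, not part of this project, and `fake_train` merely simulates a failure above 512 samples. With TensorFlow, one would presumably pass `tf.errors.InternalError` as `retry_on`. Whether a process recovers cleanly after a real GPU error is not guaranteed.

```python
def fit_with_backoff(train_fn, batch_size, min_batch_size=32, retry_on=(RuntimeError,)):
    """Call train_fn(batch_size), halving batch_size after each failure in retry_on.

    Returns (batch_size_that_worked, value_returned_by_train_fn).
    """
    while True:
        try:
            return batch_size, train_fn(batch_size)
        except retry_on:
            if batch_size // 2 < min_batch_size:
                raise  # give up: even the smallest allowed batch fails
            batch_size //= 2


# Hypothetical stand-in for a real training call: it pretends that batches
# larger than 512 samples do not fit in GPU memory.
def fake_train(batch_size):
    if batch_size > 512:
        raise RuntimeError("GPU sync failed")
    return "trained"


used_batch_size, result = fit_with_backoff(fake_train, batch_size=4096)
```

Note that this only addresses the crash. A smaller batch size changes the optimization dynamics, so the resulting AUC may still differ from a run at the original setting.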

@zqlao

zqlao commented Oct 17, 2020

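I ran into the same situation. First I got the exception InternalError: GPU sync failed. I then changed batch_size to 256 and got a result of 0.5668. I would also appreciate any advice.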

@shenweichen
Owner

shenweichen commented Oct 18, 2020

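This is the result from a fresh clone of the project that I just re-ran: the AUC is 0.6352. Please make sure you run the experiment code according to the project's dependency requirements.
[image]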

@coolcoolcat
Author

Thank you for your reply. My configuration probably still doesn't match somewhere. I'll try again.

@coolcoolcat
Author

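I ran into the same situation. First I got the exception InternalError: GPU sync failed. I then changed batch_size to 256 and got a result of 0.5668. I would also appreciate any advice.

Has your InternalError: GPU sync failed problem been solved? Which environment are you using? Could we compare notes?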

@coolcoolcat
Author

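I ran into the same situation. First I got the exception InternalError: GPU sync failed. I then changed batch_size to 256 and got a result of 0.5668. I would also appreciate any advice.

Have you solved it?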

@Arclabs001

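Is this because there isn't enough GPU memory? How much GPU memory is enough? Or are there operations, such as the embedding lookup, that could be moved to run on the CPU?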
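As a rough way to reason about the "how much memory" question, the size of the embedding tables can be estimated as rows × dimension × bytes per value, multiplied by one plus the number of same-shaped buffers the optimizer keeps. The sketch below is a back-of-the-envelope calculation only: the vocabulary sizes, the embedding dimension, and the single optimizer slot are hypothetical placeholders, not values taken from this project.

```python
def embedding_table_bytes(vocab_size, embedding_dim, bytes_per_value=4, optimizer_slots=0):
    """Bytes for one embedding table plus `optimizer_slots` same-shaped optimizer buffers."""
    return vocab_size * embedding_dim * bytes_per_value * (1 + optimizer_slots)


# Hypothetical feature vocabularies (NOT taken from the project).
tables = {"user_id": 1_000_000, "item_id": 800_000, "category_id": 10_000}

# Assumes float32 values, embedding_dim=4, and one optimizer buffer per table.
total_bytes = sum(
    embedding_table_bytes(vocab, embedding_dim=4, optimizer_slots=1)
    for vocab in tables.values()
)
total_mib = total_bytes / 2**20
print(f"embedding tables: about {total_mib:.1f} MiB")
```

This counts parameters only. Activations and input batches scale with the batch size and are not included, so the estimate is a lower bound on what training needs.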

@coolcoolcat
Author

coolcoolcat commented Dec 22, 2020 via email

@Li-fAngyU

@shenweichen May I ask: is a single epoch enough to train the model well?

@Jeaninezpp

Thank you very much for your reply. The problem is now solved. I only have one GPU here, so I can't switch, hahaha. On 2020-12-19 12:39:39, "Zheng Chen" notifications@github.com wrote: Is this because there isn't enough GPU memory? How much GPU memory is enough? Or are there operations, such as the embedding lookup, that could be moved to run on the CPU?

May I ask how you solved it? Was reducing BATCH_SIZE all it took? Thanks!

6 participants