Training data format question #22

Closed
huaxiaohua opened this issue Aug 14, 2019 · 3 comments

Comments

@huaxiaohua

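Hello, I have a few questions:
1) Is the training data format ["article 1", "article 2", "article 3"]? Must it begin and end with []?
2) For the Doupo Cangqiong (斗破苍穹) data, is each chapter treated as one article? The original data contains all the chapters; how did you process it?
3) If a single article is very long, is it truncated (for example, BERT limits the length to 512)?
Thank you very much.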

@Morizeyao
Owner

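1. Yes, the brackets are required.
2. To train on a single, very long document (such as Doupo Cangqiong), use train_single.py.
3. This is not BERT. A long article is truncated: a window is moved over the text with a fixed stride to extract samples.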
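To illustrate points 1 and 3, here is a minimal sketch of reading a JSON list of articles and cutting a long token sequence into windows with a fixed stride. The function name `sliding_window_samples`, the parameters `n_ctx` and `stride`, and the handling of the final partial window are assumptions made for this example, not necessarily what the training scripts do.

```python
import json

def sliding_window_samples(tokens, n_ctx, stride):
    """Cut a token sequence into windows of length n_ctx, stepping by stride.

    Hypothetical helper for illustration only.
    """
    # A sequence that already fits in one window is kept whole.
    if len(tokens) <= n_ctx:
        return [tokens]
    samples = []
    start = 0
    while start + n_ctx <= len(tokens):
        samples.append(tokens[start:start + n_ctx])
        start += stride
    # If the last full window stopped short of the end, add one final
    # window aligned to the end so the tail is not dropped (an assumption).
    if start - stride + n_ctx < len(tokens):
        samples.append(tokens[-n_ctx:])
    return samples

# Point 1: the file content is one JSON list, so the [] are part of the format.
raw = '["article 1", "article 2", "article 3"]'
articles = json.loads(raw)

# Point 3: characters stand in for tokens here.
tokens = list("abcdefghij")
samples = sliding_window_samples(tokens, n_ctx=4, stride=3)
for sample in samples:
    print("".join(sample))
```

With a stride smaller than the window length, consecutive samples overlap, so text near a window boundary also appears with more context in a neighbouring sample.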

@huaxiaohua
Author

Thanks~ One more question: why does single-GPU training always occupy GPU:0? I have specified GPU 7, but every run ends up on GPU 0, which is strange.

@Morizeyao
Owner

Because the system can only see GPU 7, it treats that card as GPU 0.
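The renumbering can be sketched as follows, assuming the GPU was selected through the `CUDA_VISIBLE_DEVICES` environment variable (the thread does not say how it was set). The helper `logical_to_physical` is hypothetical and only mimics the mapping: the devices listed in the variable are re-indexed from 0 inside the process.

```python
import os

# Assumption: the card is chosen by setting CUDA_VISIBLE_DEVICES
# before any CUDA library is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "7"

def logical_to_physical(visible):
    """Map in-process device indices (0, 1, ...) to the ids listed in
    CUDA_VISIBLE_DEVICES. Hypothetical helper for illustration only."""
    ids = [part.strip() for part in visible.split(",") if part.strip()]
    return dict(enumerate(ids))

mapping = logical_to_physical(os.environ["CUDA_VISIBLE_DEVICES"])
print(mapping)  # {0: '7'}
```

Under this assumption, a framework such as PyTorch reports a single device, and `cuda:0` inside the process refers to physical GPU 7.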
