Corpus question #3
Comments
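Hi, you can safely ignore this. It is a self-check in the tokenizer of pytorch-transformer. You can turn the message off by editing the library's source code, or just ignore it.
Thank you very much for the reply...
Generally, at least a 1080Ti or a 2080Ti or better is recommended.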
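For anyone who wants the message gone without patching the library, here is a minimal sketch using Python's standard `logging` filters. It assumes the tokenizer emits the warning through the standard `logging` module. The logger name `ASSUMED_LOGGER_NAME` below is a guess and not something confirmed in this thread, so check the `logging.getLogger(...)` call in your installed `tokenization.py` and adjust it.

```python
import logging

# Assumption: the tokenizer logs under this name. It is a guess, not taken
# from this thread; look up the real name in your installed tokenization.py.
ASSUMED_LOGGER_NAME = "pytorch_pretrained_bert.tokenization"


class DropSequenceLengthWarning(logging.Filter):
    """Discard only the sequence-length warning; let every other record through."""

    def filter(self, record):
        # Returning False tells the logger to drop this record.
        return "Token indices sequence length is longer" not in record.getMessage()


# Filters attached to a logger apply to records logged directly on that logger.
logging.getLogger(ASSUMED_LOGGER_NAME).addFilter(DropSequenceLengthWarning())
```

Run this once before training starts. A filter is narrower than raising the logger's level, so other warnings from the same module still show up.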
I downloaded the 斗破苍穹 corpus, but it is a plain text file rather than a JSON file.
When I changed this in train.py...
# doupo = json.load(f)
doupo = f.read()
I got this warning...
W0726 13:58:53.042458 4556223936 tokenization.py:126] Token indices sequence length is longer than the specified maximum sequence length for this BERT model (5340786 > 512). Running this sequence through BERT will result in indexing errors
What is causing this?
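For context, the numbers in the warning (5340786 > 512) mean the whole text file was tokenized as one sequence, while the model accepts at most 512 positions per input. The warning is harmless only if that sequence is cut into shorter pieces before it reaches the model. Whether and how train.py does this is not shown in this thread. The sketch below assumes plain fixed-size windows, and `split_into_windows` is a hypothetical helper, not a function from the repository.

```python
def split_into_windows(token_ids, max_len=512):
    """Split a list of token ids into consecutive windows of at most max_len ids."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    # Step through the list in strides of max_len; the last window may be shorter.
    return [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]


# Stand-in for the ids of a whole corpus; a real run would take them from the tokenizer.
fake_ids = list(range(1100))
windows = split_into_windows(fake_ids)
```

Each window can then be fed to the model on its own, so no single input exceeds the 512-position limit.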