Error when running on the CMeEE dataset #44
Comments
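The sentence is most likely too long after preprocessing: its length exceeds the configured limit of 1000, which causes the error. If a sentence contains many English words, splitting each word into letters is not recommended, because the sentence can easily become too long for the program to run.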
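To illustrate the failure described above, here is a minimal sketch. It assumes `dis2idx` is a lookup table with 1000 entries that maps a token distance to a bucket id; the bucket boundaries, the use of a plain list, and the helper names `build_dis2idx` and `lookup_fails` are hypothetical and not taken from the project. The largest distance in a sentence of length n is n - 1, so any sentence longer than the table fails at lookup time.

```python
TABLE_SIZE = 1000  # the limit mentioned in this thread


def build_dis2idx(size=TABLE_SIZE):
    """Build an illustrative table: token distance -> bucket id.

    The bucket boundaries below are an assumption made for this sketch.
    """
    table = [0] * size
    for bucket, start in enumerate([1, 2, 4, 8, 16, 32, 64, 128, 256], start=1):
        # Every distance >= start gets this bucket until a later one overrides it.
        table[start:] = [bucket] * (size - start)
    return table


def lookup_fails(table, sentence_length):
    """Return True if the largest distance in the sentence is outside the table."""
    if sentence_length < 1:
        return False
    try:
        table[sentence_length - 1]  # largest pairwise distance is length - 1
        return False
    except IndexError:
        return True


dis2idx = build_dis2idx()
print(lookup_fails(dis2idx, 1000))  # distance 999 is still inside the table
print(lookup_fails(dis2idx, 1001))  # distance 1000 is out of range
```

Splitting English words into letters inflates the sentence length, which is how a sample can cross this boundary.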
OK, I'll try deleting the English sentences. Also, can this 1000 be changed? I tried changing the 1000 here
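Even if you change it here, the input will still exceed BERT's 512-token limit and fail in the same way. It is best to filter out or otherwise handle the sentences that exceed the length limit.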
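One way to handle over-length sentences is to separate them out before training. The sketch below assumes each sample is a dict with a `"sentence"` list of tokens, which is a hypothetical format, and that the caller supplies a `count_pieces` function returning the number of BERT word pieces. Here `len` stands in for it; a real tokenizer's count should be used in practice. The helper name `split_by_length` is also illustrative.

```python
MAX_BERT_TOKENS = 512  # BERT's input limit, as noted in this thread
SPECIAL_TOKENS = 2     # room for [CLS] and [SEP]


def split_by_length(samples, count_pieces, max_tokens=MAX_BERT_TOKENS):
    """Split samples into (kept, dropped) by their BERT input length.

    count_pieces(sentence) must return the word-piece count of a sentence.
    """
    kept, dropped = [], []
    for sample in samples:
        total = count_pieces(sample["sentence"]) + SPECIAL_TOKENS
        if total <= max_tokens:
            kept.append(sample)
        else:
            dropped.append(sample)
    return kept, dropped


# Usage with character-level tokens, where one token is assumed to be one piece.
samples = [
    {"sentence": ["a"] * 510},  # 510 + 2 special tokens = 512, fits
    {"sentence": ["a"] * 511},  # 511 + 2 special tokens = 513, too long
]
kept, dropped = split_by_length(samples, len)
print(len(kept), len(dropped))
```

Returning the dropped samples instead of discarding them silently makes it easy to inspect them or split them into shorter sentences later.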
OK, thanks.
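Your data processing may be at fault. It is best to verify, for every sample, that each entity's indices match the corresponding content in the text.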
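Such a check could be sketched as follows. The sample layout (`"text"`, `"entities"`, `"start_idx"`, `"end_idx"`, `"entity"`) and the inclusive end index are assumptions about the data format, so adjust the field names and `inclusive_end` to your files. The function name `find_misaligned_entities` is hypothetical.

```python
def find_misaligned_entities(sample, inclusive_end=True):
    """Return (entity, actual_span) pairs whose indices do not match the text.

    actual_span is None when the indices fall outside the text.
    """
    text = sample["text"]
    problems = []
    for ent in sample["entities"]:
        start = ent["start_idx"]
        stop = ent["end_idx"] + 1 if inclusive_end else ent["end_idx"]
        if 0 <= start < stop <= len(text):
            span = text[start:stop]
        else:
            span = None  # indices are out of range
        if span != ent["entity"]:
            problems.append((ent, span))
    return problems


sample = {
    "text": "fever and cough",
    "entities": [
        {"start_idx": 0, "end_idx": 4, "entity": "fever"},    # matches the text
        {"start_idx": 10, "end_idx": 14, "entity": "fever"},  # text says "cough"
        {"start_idx": 10, "end_idx": 40, "entity": "cough"},  # out of range
    ],
}
for ent, span in find_misaligned_entities(sample):
    print(ent["entity"], "->", span)
```

Running this over the whole dataset after preprocessing shows whether splitting English words into characters has shifted the entity offsets.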
Debugging shows that the error is raised on this line:
_dist_inputs[i, j] = dis2idx[-_dist_inputs[i, j]] + 9
I don't know how it should be fixed.
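My data looks like this after processing. For text that mixes Chinese and English, I also split the English into single characters, and I'm not sure whether that is the right way to handle it.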