Trying to understand the index_answer function #30
It's probably due to a tokenization inconsistency between the annotated answer span and spaCy's tokenization. It's likely to happen where the corpus has unusual punctuation.
So you're not considering those examples for training, right?
Yes. It's hard to fix the tokenization errors automatically.
I banged my head for some days trying to debug and fix them. It's largely due to a missing space character (' ') just before or just after the answer span in the context. I reduced the errors to 10-15 examples and finally dropped them. Thank you for your help!
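The mismatch described above can be illustrated with a minimal sketch that maps a character-level answer span to token indices. This is not the repository's `index_answer`: the regex tokenizer is a hypothetical stand-in for spaCy, and `tokenize_with_offsets` and `index_answer_sketch` are illustrative names. The sketch returns `(None, None)` whenever an answer boundary falls inside a token, which is the kind of case where an example would be dropped.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical stand-in for spaCy: one token per run of word characters,
# and one token per punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Return (token, start_char, end_char) triples; end_char is exclusive."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def index_answer_sketch(
    context: str, answer_start: int, answer_text: str
) -> Tuple[Optional[int], Optional[int]]:
    """Map a character-level answer span to inclusive token indices.

    Returns (None, None) when the span's start or end does not coincide
    with a token boundary, i.e. the annotation and tokenizer disagree.
    """
    answer_end = answer_start + len(answer_text)
    start_idx = end_idx = None
    for i, (_, start, end) in enumerate(tokenize_with_offsets(context)):
        if start == answer_start:
            start_idx = i
        if end == answer_end:
            end_idx = i
    if start_idx is None or end_idx is None or start_idx > end_idx:
        return None, None
    return start_idx, end_idx


if __name__ == "__main__":
    ok = "He was born in 1990."
    print(index_answer_sketch(ok, ok.index("1990"), "1990"))   # (4, 4)
    # Missing space before the answer: "in1990" is a single token.
    bad = "He was born in1990."
    print(index_answer_sketch(bad, bad.index("1990"), "1990"))  # (None, None)
```

Filtering the training set then amounts to keeping only examples whose result is not `(None, None)`.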
No, they are not necessary.
If you use the lowercased GloVe embeddings, you should lowercase the text before building the vocab. Otherwise, the vocab and the embedding tokens may not match.
Yes, you lowercase the data when using the lowercased GloVe 6B version.
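A minimal sketch of the lowercasing point, assuming the text is already tokenized and the embedding vocabulary has been loaded into a set of words (for a GloVe text file, the first whitespace-separated field of each line). `build_vocab`, `embedding_coverage`, and the `<pad>`/`<unk>` entries are hypothetical, not this repository's code. Comparing coverage with and without lowercasing shows the mismatch.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

SPECIALS = ("<pad>", "<unk>")  # hypothetical special tokens


def build_vocab(
    tokenized_texts: Iterable[List[str]], lowercase: bool = True, min_count: int = 1
) -> Dict[str, int]:
    """Build a word -> index map, lowercasing tokens if the embeddings are lowercased."""
    counts: Counter = Counter()
    for tokens in tokenized_texts:
        counts.update(t.lower() if lowercase else t for t in tokens)
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    for word, count in counts.most_common():
        if count >= min_count and word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def embedding_coverage(vocab: Dict[str, int], embedding_words: Set[str]) -> float:
    """Fraction of non-special vocab entries that have a pretrained vector."""
    words = [w for w in vocab if w not in SPECIALS]
    if not words:
        return 0.0
    return sum(w in embedding_words for w in words) / len(words)


if __name__ == "__main__":
    texts = [["Paris", "is", "in", "France"]]
    glove_words = {"the", "paris", "is", "in", "france"}  # toy lowercased vocabulary
    print(embedding_coverage(build_vocab(texts, lowercase=True), glove_words))   # 1.0
    print(embedding_coverage(build_vocab(texts, lowercase=False), glove_words))  # 0.5
```

Without lowercasing, "Paris" and "France" fall back to the unknown token even though lowercased vectors exist for them.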
Thanks a lot!
The last condition in this function returns `(None, None)`. Does this condition actually arise, or is it just there to avoid a crash? I am trying to implement the same paper, and when I try to get the final labels for my context-question pairs, many answers result in a `ValueError`. Is this a flaw in the dataset?
Thank you.