Tokenizer Fast bug: ValueError: TextInputSequence must be str #7735
Comments
Hi, thanks for opening such a detailed issue with a notebook! Unfortunately, fast tokenizers don't currently work with the QA pipeline. They will in the second pipeline version, expected in a few weeks to a few months, but for now please use the slow tokenizers with the QA pipeline. Thanks!
I think the issue is still there.
Please open a new issue with your environment, an example of what the issue is, and how you expect it to work. Thank you.
I could fix this simply by changing:
to:
I also ran into this problem when using transformers. I checked my data and found that this error is returned when the CSV file contains a lot of null data or zero-length strings. After filtering out those rows, my code ran successfully.
Double-check the data and make sure there are no NaNs in it; this was the problem I encountered.
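The two comments above point to the same root cause: the fast tokenizer raises `ValueError: TextInputSequence must be str` when it receives `None`, NaN (a float), or an otherwise non-string value. A minimal sketch of the suggested pre-filtering in plain Python (the helper name `clean_text_rows` and the sample rows are illustrative, not from the thread):

```python
def clean_text_rows(rows):
    """Keep only non-empty str values.

    None, NaN (which is a float), and other non-string entries would
    trigger 'TextInputSequence must be str' in the fast tokenizer, so
    they are dropped along with empty or whitespace-only strings.
    """
    return [t for t in rows if isinstance(t, str) and t.strip()]

rows = ["What is NLP?", None, float("nan"), "", "   ", "A valid question"]
print(clean_text_rows(rows))  # -> ['What is NLP?', 'A valid question']
```

With pandas, the equivalent would be dropping NaN rows and filtering out zero-length strings before passing the column to the tokenizer.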
Environment info
`transformers` version:
Who can help
@mfuntowicz
Information
Model I am using: Initially Electra, but I tested it out with BERT, DistilBERT, and RoBERTa.
I'm using your scripts, but again, I believe it wouldn't work if I wrote it myself. The model is trained on SQuAD.
Error traceback
To reproduce
Steps to reproduce the behavior:
I've also made a small notebook so you can test it out for yourself: here
Expected behavior
Instead of raising an error, I would expect the tokenizer to just work.