Issue with non-standard characters? #6
Comments
Are the files UTF-8 encoded? Could you post a gist with an example?
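One quick way to answer the encoding question is to try decoding the raw bytes strictly. The sketch below is only an illustration: the helper name `first_invalid_utf8` and the file name `corpus.txt` are made up for this example and are not part of the project.

```python
from pathlib import Path
from typing import Optional


def first_invalid_utf8(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None if the data decodes cleanly."""
    try:
        data.decode("utf-8")  # strict decoding raises on the first bad byte sequence
    except UnicodeDecodeError as err:
        return err.start
    return None


def check_file(path: str) -> Optional[int]:
    """Run the same check on a file read as raw bytes (hypothetical helper)."""
    return first_invalid_utf8(Path(path).read_bytes())
```

Calling `check_file("corpus.txt")` returns `None` for a valid UTF-8 file. A file saved as Latin-1 with characters like Å Ä Ö would typically return the offset of the first such character instead.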
Here is the gist: https://gist.github.com/BrekiTomasson/e9814f32c42376af4fe4795dc3cf95aa. The file is UTF-8 encoded.
Also, for reference, when it has been standing still for a while without outputting anything and I kill it with a simple Ctrl-C, this is what I get:
Actually, never mind. It seems the problem might have been other non-standard characters in the corpus, especially parentheses and quotation marks. Once I cleaned them out, it worked better.
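For anyone who wants to try the same workaround, here is a minimal sketch of that kind of cleanup. It assumes that removing parentheses and double quotation marks is enough. The exact set of characters that caused trouble is not stated in this thread, so the character class below is a guess, and `clean_line` is a hypothetical helper name. Letters such as Å Ä Ö are left untouched.

```python
import re

# Characters to strip: parentheses plus straight and typographic double quotes.
# This set is an assumption; adjust it to whatever the corpus actually contains.
_STRIP = re.compile(r"[()\"“”„«»]")
_SPACES = re.compile(r"\s+")


def clean_line(line: str) -> str:
    """Remove parentheses and quotation marks, then collapse runs of whitespace."""
    without_marks = _STRIP.sub("", line)
    return _SPACES.sub(" ", without_marks).strip()


def clean_corpus(src: str, dst: str) -> None:
    """Write a cleaned copy of a UTF-8 corpus file, one output line per input line."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(clean_line(line) + "\n")
```

Writing to a separate output file keeps the original corpus intact, so the two versions can be compared if the hang comes back.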
Is there a potential problem if the corpus contains non-standard characters? I'm thinking, for example, Swedish characters like Å Ä Ö. I'm testing a corpus which contains this kind of character in most rows, and the process seems to hang without outputting anything.