switch encoding for py2 preprocessing to UTF-8 #52
Should fix the problem described in #48.
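For context, here is a minimal sketch of what decoding input as UTF-8 in a way that behaves the same on Python 2 and Python 3 can look like. It is not the diff from this PR: `read_lines` is a hypothetical helper and the sample file is made up for illustration. The only library call it relies on is `io.open`, which accepts an `encoding` argument on both Python versions.

```python
import io
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Hypothetical helper: read a text file as unicode lines on Python 2 and 3."""
    # io.open takes an explicit encoding on both Python 2 and 3, so the
    # bytes are never decoded with an implicit ASCII or locale default.
    with io.open(path, mode="r", encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


# Write a small non-ASCII sample as UTF-8 bytes, then read it back.
tmp = tempfile.NamedTemporaryFile(suffix=".txt", delete=False)
tmp.write(u"Stra\u00dfe\nna\u00efve\n".encode("utf-8"))
tmp.close()
lines = read_lines(tmp.name)
os.remove(tmp.name)
```

Passing the encoding explicitly, rather than relying on the interpreter default, is what avoids `UnicodeDecodeError` on Python 2 when the data contains non-ASCII characters.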
Hmm, it looks like the ASCII encoding issue is still a problem. I'm using these versions:
And I'm using code from test/translation.py:
My output is:
This kind of thing fixed it for me (messy quick fix), in torchtext/datasets/translation.py:
Why do you need the code in the middle block? Does
For reference, the issue described by @marikgoldstein is not what this PR was trying to fix in the first place. Perhaps raise a new issue specifically about encoding handling in the translation dataset?
Sorry about that, I see now that it was a discussion about encoding in a different part of the code base. I'll make a new issue for it!