Fix encoding for sequence tagging dataset #506

Merged: 2 commits merged into pytorch:master on Feb 22, 2019


akurniawan
Contributor

This PR fixes the following error I encountered. The data file needs to be opened with an explicit encoding before torchtext can load it correctly.

  File "/opt/conda/lib/python3.6/site-packages/torchtext/data/dataset.py", line 78, in splits
    os.path.join(path, train), **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torchtext/datasets/sequence_tagging.py", line 29, in __init__
    for line in input_file:
  File "/opt/conda/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 301: ordinal not in range(128)
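The traceback shows the data file being decoded with the ASCII codec. On Python 3.6, `open()` without an `encoding` argument uses the locale's preferred encoding, which is typically ASCII in minimal containers where no UTF-8 locale is set. The sketch below illustrates the idea behind the fix, assuming the data is UTF-8: pass `encoding` explicitly when opening the file. `read_tagged_sentences` and the sample file are hypothetical and written for illustration. This is not torchtext's `SequenceTaggingDataset` code.

```python
import os
import tempfile


def read_tagged_sentences(path, separator="\t", encoding="utf-8"):
    """Hypothetical reader for column-formatted sequence-tagging data.

    Each non-blank line holds a token and its tag(s) separated by
    `separator`. A blank line ends a sentence. Returns a list of
    sentences, each a list of column tuples.
    """
    sentences, current = [], []
    # Passing `encoding` explicitly avoids relying on the locale default,
    # which may be ASCII and fail on any byte >= 0x80.
    with open(path, encoding=encoding) as input_file:
        for line in input_file:
            line = line.strip()
            if line == "":
                if current:
                    sentences.append(current)
                current = []
            else:
                current.append(tuple(line.split(separator)))
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences


# Build a tiny UTF-8 file; "£" is encoded as the bytes 0xc2 0xa3,
# so its first byte matches the one in the traceback.
sample = "Tickets\tO\ncost\tO\n£5\tB-MONEY\n\nThanks\tO\n"
fd, sample_path = tempfile.mkstemp(suffix=".tsv")
with os.fdopen(fd, "wb") as f:
    f.write(sample.encode("utf-8"))

try:
    # Forcing the ASCII codec mimics an ASCII default locale.
    read_tagged_sentences(sample_path, encoding="ascii")
except UnicodeDecodeError as err:
    print("ascii:", err.reason)  # ordinal not in range(128)

sentences = read_tagged_sentences(sample_path)  # explicit UTF-8 succeeds
print(sentences)
os.remove(sample_path)
```

Making the encoding a keyword argument with a UTF-8 default keeps the common case working in any locale while still letting callers read files stored in another encoding.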

mttk merged commit 28fc055 into pytorch:master on Feb 22, 2019
Contributor

mttk commented Feb 22, 2019

Thanks!
