UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1539: invalid start byte #11

Open
ArriettyDouDou opened this issue Mar 14, 2018 · 13 comments

Comments

@ArriettyDouDou

When I run
python preprocess_asap.py -i training_set_rel3.tsv

I get the following error:
Traceback (most recent call last):
  File "preprocess_asap.py", line 44, in <module>
    dataset = collect_dataset(args.input_file)
  File "preprocess_asap.py", line 34, in collect_dataset
    for line in f:
  File "/home/wyn/anaconda3/envs/py3NLP/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1539: invalid start byte

I don't know why I get this error.

Any help would be appreciated!
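
For readers wondering what the message means: in UTF-8, byte 0x92 has the bit pattern of a continuation byte, so it can never start a character. The sketch below reproduces the error on a made-up byte string (not taken from training_set_rel3.tsv). It also shows that Windows-1252 maps the same byte to a curly apostrophe. That the dataset is actually Windows-1252 is only an assumption here; the thread does not confirm it.

```python
# Hypothetical sample bytes; the real file contents are not shown in this thread.
raw = b"don\x92t"

# 0x92 is 0b10010010. UTF-8 reserves the 10xxxxxx pattern for continuation
# bytes, so a decoder that meets it at the start of a character gives up.
try:
    raw.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)

# Windows-1252 (cp1252) assigns 0x92 to U+2019, the right single quotation mark.
decoded = raw.decode("cp1252")
print(decoded)
```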

@brucerowan

I am getting this error too

@jkdufair

jkdufair commented May 8, 2018

This appears to be related to running nea with Python 3. I had this error until I switched to Python 2.7.

@brucerowan

@jkdufair Even after switching to Python 2.7, I am getting this error:
  File "preprocess_asap.py", line 48, in <module>
    create_dataset(lines, 'fold_%d/%s.tsv' % (fold_idx, dataset_type))
  File "preprocess_asap.py", line 25, in create_dataset
    f_write = open(output_fname, 'w',encoding='utf-16')
TypeError: 'encoding' is an invalid keyword argument for this function

@brucerowan

I just removed the encoding keyword argument and it worked!
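
The TypeError comes from Python 2's built-in open, which does not accept an encoding argument. Dropping the keyword makes the call succeed, but the output is then presumably written as raw bytes rather than UTF-16. A possible alternative, sketched below, is io.open, which takes encoding on both Python 2.7 and Python 3. The helper name write_lines and the sample row are hypothetical and are not part of preprocess_asap.py.

```python
import io
import os
import tempfile


def write_lines(output_fname, lines, encoding="utf-8"):
    """Write text lines with an explicit encoding (hypothetical helper)."""
    # io.open accepts `encoding` on Python 2.7; on Python 3 it is the
    # same function as the built-in open.
    with io.open(output_fname, "w", encoding=encoding) as f_write:
        for line in lines:
            f_write.write(line)  # must be unicode text, not bytes


# Round-trip a made-up row through a temporary file to check the encoding.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "train.tsv")
write_lines(path, [u"1\tIt\u2019s an essay\n"], encoding="utf-16")

with io.open(path, encoding="utf-16") as f_read:
    round_trip = f_read.read()

print(repr(round_trip))
```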

@tim-hub
Copy link

tim-hub commented Feb 5, 2019

I got the same error and fixed it. I also forked the repository to try migrating it to Python 3.6:

https://github.com/tim-hub/nea

@rahuls321

Either switch from Python 3 to Python 2, or open the input file with
with open(input_file, encoding="latin-1") as f:
instead of the original line. That removes the error.
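
A note on why latin-1 makes the error disappear: it assigns a character to every one of the 256 byte values, so decoding can never fail. The trade-off is that byte 0x92 becomes an invisible control character (U+0092) rather than an apostrophe. The sketch below compares latin-1 with cp1252 on a made-up byte string. Whether cp1252 is the right choice for this dataset is an assumption, not something the thread establishes.

```python
# Hypothetical sample bytes, not taken from the dataset.
raw = b"It\x92s"

as_latin1 = raw.decode("latin-1")  # never raises: every byte maps to a code point
as_cp1252 = raw.decode("cp1252")   # maps 0x92 to a printable punctuation mark

print(repr(as_latin1))  # contains the C1 control character U+0092
print(repr(as_cp1252))  # contains U+2019, the right single quotation mark

# latin-1 decodes all 256 possible byte values without error.
all_bytes_decoded = bytes(bytearray(range(256))).decode("latin-1")
```

Note that cp1252 leaves a few byte values unassigned (0x81, for example), so unlike latin-1 it can still raise UnicodeDecodeError on some inputs.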

@DamonCC

DamonCC commented Sep 5, 2019

@tim-hub wrote: I got the same error and fixed it. I also forked the repository to try migrating it to Python 3.6:

https://github.com/tim-hub/nea

Can you get the same result as the paper? My Kappa score is between 0.5 and 0.6, and I can't reach the score of the paper.

@jkdufair

@DamonCC See #13

@eiphy
Copy link

eiphy commented Jul 18, 2020

@jkdufair,

Here is the result of using your method (left: jkdufair's method; right: generated by Python 2):

Screenshot from 2020-07-19 02-27-55

@rahuls321,
latin-1 also produces some mysterious symbols.

My solution is to use with open(input_file, encoding="cp1251", errors='ignore') as f:.
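
Two caveats about this approach, illustrated on made-up bytes rather than the real dataset. First, cp1251 is a Cyrillic code page: it happens to agree with cp1252 on 0x92, but it maps accented Latin bytes such as 0xE9 to Cyrillic letters. Second, errors='ignore' silently drops any byte the codec cannot decode, whereas errors='replace' leaves a visible U+FFFD marker. The second point is demonstrated with the utf-8 codec because the effect is easy to see there.

```python
# Hypothetical sample: "café" followed by a curly apostrophe, as Windows-1252 bytes.
raw = b"caf\xe9 \x92"

as_cp1251 = raw.decode("cp1251")  # Cyrillic code page
as_cp1252 = raw.decode("cp1252")  # Western European code page

print(as_cp1251)  # 0xE9 turns into the Cyrillic letter U+0439
print(as_cp1252)  # 0xE9 stays a Latin e with acute accent

# errors="ignore" removes undecodable bytes without any warning;
# errors="replace" substitutes U+FFFD so the loss stays visible.
bad = b"don\x92t"
ignored = bad.decode("utf-8", errors="ignore")
replaced = bad.decode("utf-8", errors="replace")

print(ignored)
print(repr(replaced))
```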

@varunkhanna1993

@tim-hub wrote: I got the same error and fixed it. I also forked the repository to try migrating it to Python 3.6:

https://github.com/tim-hub/nea

THIS worked. Thanks ;)

@quamar313

I was facing the same problem. I just added encoding="latin-1" after the file name, and the problem was solved.

@zin288

zin288 commented Mar 4, 2023

@quamar313 wrote: I was facing the same problem. I just added encoding="latin-1" after the file name, and the problem was solved.

This worked for me, thank you! :)

@tats2bzr

I am facing the same error.
