UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1539: invalid start byte #11

ArriettyDouDou · 2018-03-14T02:18:39Z

When I run
python preprocess_asap.py -i training_set_rel3.tsv

I get the following error:
Traceback (most recent call last):
  File "preprocess_asap.py", line 44, in <module>
    dataset = collect_dataset(args.input_file)
  File "preprocess_asap.py", line 34, in collect_dataset
    for line in f:
  File "/home/wyn/anaconda3/envs/py3NLP/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1539: invalid start byte

I don't know why I am getting this error.

Please help!
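
For context, the failure in the traceback can be reproduced in isolation. The sketch below assumes the offending byte is a Windows-style apostrophe stored as the single byte 0x92; the thread does not confirm this, and the sample bytes and the `try_decode` helper are made up for illustration.

```python
# Hypothetical sample: text where the apostrophe is stored as the single
# byte 0x92. In UTF-8, 0x92 can only appear as a continuation byte, so it
# cannot begin a character.
raw = b"I don\x92t know"

def try_decode(data, encoding):
    """Return the decoded text, or the error reason if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: " + exc.reason

utf8_result = try_decode(raw, "utf-8")
print(utf8_result)
```

The same message appears whenever a file written in a single-byte Windows encoding is opened with Python 3's default UTF-8 decoding.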

brucerowan · 2018-05-07T02:15:14Z

I am getting this error too

jkdufair · 2018-05-08T19:31:15Z

This appears to be related to trying to run nea w/ Python 3. I had this until I switched to Python 2.7.

brucerowan · 2018-05-09T19:50:10Z

@jkdufair Even after switching to Python 2.7, I am getting this error:
  File "preprocess_asap.py", line 48, in <module>
    create_dataset(lines, 'fold_%d/%s.tsv' % (fold_idx, dataset_type))
  File "preprocess_asap.py", line 25, in create_dataset
    f_write = open(output_fname, 'w',encoding='utf-16')
TypeError: 'encoding' is an invalid keyword argument for this function

brucerowan · 2018-05-09T19:51:50Z

I just removed the encoding keyword and it worked!
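
A note on why this happens: the built-in open in Python 2 does not accept an encoding argument, while io.open does on both Python 2.6+ and Python 3. Removing the keyword makes the error go away, but the output is then no longer written as UTF-16. A minimal sketch of the alternative, assuming a throwaway output path and made-up file contents:

```python
import io
import os
import tempfile

# Hypothetical output path, used only for this illustration.
out_path = os.path.join(tempfile.mkdtemp(), "train.tsv")

# io.open accepts `encoding` on both Python 2.6+ and Python 3
# (on Python 3 it is the same function as the built-in open).
with io.open(out_path, "w", encoding="utf-16") as f_write:
    f_write.write(u"essay_id\tscore\n")

# Read the file back with the same encoding to confirm the round trip.
with io.open(out_path, "r", encoding="utf-16") as f_read:
    round_trip = f_read.read()
```

Whether the rest of preprocess_asap.py expects UTF-16 output is not stated in this thread, so treat this as one option rather than the required fix.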

tim-hub · 2019-02-05T21:55:29Z

I got the same error and fixed it. I also forked the repository to try to migrate it to Python 3.6:

https://github.com/tim-hub/nea

rahuls321 · 2019-06-11T05:57:17Z

Either switch from Python 3 to Python 2, or replace the existing open call with the following line to get rid of the error:
with open(input_file, encoding="latin-1") as f:
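
A minimal sketch of the latin-1 approach on Python 3, assuming a small stand-in file rather than the real training_set_rel3.tsv (the path and row contents are hypothetical):

```python
import os
import tempfile

# Hypothetical stand-in for the input file: one row containing byte 0x92.
input_file = os.path.join(tempfile.mkdtemp(), "sample.tsv")
with open(input_file, "wb") as f:
    f.write(b"1\tI don\x92t know\n")

# latin-1 maps every byte 0x00-0xFF to a code point, so decoding cannot fail.
with open(input_file, encoding="latin-1") as f:
    rows = [line.rstrip("\n").split("\t") for line in f]

print(rows)
```

Because latin-1 never raises, it hides the problem rather than confirming the file's real encoding; bytes such as 0x92 come through as control characters instead of punctuation.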

DamonCC · 2019-09-05T11:45:52Z

> I got the same error and fixed it. I also forked the repository to try to migrate it to Python 3.6:
>
> https://github.com/tim-hub/nea

Can you get the same result as the paper? My Kappa score is between 0.5 and 0.6, and I can't reach the score of the paper.

jkdufair · 2019-09-19T12:56:49Z

@DamonCC See #13

eiphy · 2020-07-18T10:30:14Z

@jkdufair,

Using your method, I get the following result (left: jkdufair's method; right: generated with Python 2):

[image]

@rahuls321,
latin-1 also produces some strange symbols.

My solution is to use with open(input_file, encoding="cp1251", errors='ignore') as f:
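
To illustrate the difference between these encodings, the sketch below decodes the byte from the traceback three ways. The sample bytes are hypothetical, and cp1252 is included only as an assumption about what the file might actually be; nothing in this thread confirms the file's real encoding.

```python
raw = b"don\x92t"  # hypothetical sample containing the byte from the traceback

as_latin1 = raw.decode("latin-1")  # 0x92 becomes the control character U+0092
as_cp1251 = raw.decode("cp1251")   # 0x92 becomes U+2019, a right single quote
as_cp1252 = raw.decode("cp1252")   # 0x92 also becomes U+2019

# errors="ignore" silently drops any byte the codec cannot map.
# 0x98 is unassigned in cp1251, so it disappears from the result.
dropped = b"a\x98b".decode("cp1251", errors="ignore")

print(repr(as_latin1), repr(as_cp1251), repr(as_cp1252), repr(dropped))
```

This may explain the strange symbols seen with latin-1. Note that cp1251 is a Cyrillic code page, so other bytes above 0x7F would decode to Cyrillic letters, and errors='ignore' can discard data without any warning.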

varunkhanna1993 · 2021-04-14T06:03:30Z

> I got the same error and fixed it. I also forked the repository to try to migrate it to Python 3.6:
>
> https://github.com/tim-hub/nea

THIS worked. Thanks ;)

quamar313 · 2022-03-02T19:34:02Z

I was facing the same problem. I just added encoding="latin-1" after the file name and the problem was solved.

zin288 · 2023-03-04T11:36:14Z

> I was facing the same problem. I just added encoding="latin-1" after the file name and the problem was solved.

This worked for me, thank you! :)

tats2bzr · 2023-06-27T21:29:05Z

I am facing the same error.

ys7yoo mentioned this issue Feb 14, 2021

'UnicodeDecodeError' for preprocessing ys7yoo/nea_tf#1

Closed
