UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 2400: ordinal not in range(128) #112

Closed
hmmhsu opened this issue Apr 29, 2019 · 7 comments

Comments

hmmhsu commented Apr 29, 2019

I trained the model on a GPU machine with Python 2.7 installed. After that, I tried to decode with this model on Windows with Python 3.7.

(base) D:\NER\NCRFpp>python main.py --config demo.decode.config
Seed num: 42
MODEL: decode
Traceback (most recent call last):
  File "main.py", line 442, in <module>
    data.load(data.dset_dir)
  File "D:\NER\NCRFpp\utils\data.py", line 305, in load
    tmp_dict = pickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 2400: ordinal not in range(128)

How can I solve it? Thanks in advance.

Owner

jiesutd commented Apr 30, 2019

This is probably an encoding problem. You can try pickle.load(f, encoding='latin1'), or another encoding such as 'utf-8'.
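
A minimal sketch of that suggestion, assuming the file was pickled under Python 2 and contains byte strings with non-ASCII bytes. The `PY2_PICKLE` bytes and the `load_py2_pickle` helper are hypothetical stand-ins for illustration, not part of NCRFpp. Note that the special value accepted by `pickle.load` for keeping raw bytes is spelled `'bytes'`.

```python
import io
import pickle

# Hand-written bytes equivalent to a Python 2 protocol-2 pickle of the
# byte string 'a\xa0b' (it contains the non-ASCII byte 0xa0).
PY2_PICKLE = b"\x80\x02U\x03a\xa0bq\x00."


def load_py2_pickle(f, encoding="latin1"):
    """Load a Python 2 pickle from an open binary file object (hypothetical helper)."""
    return pickle.load(f, encoding=encoding)


# With the default encoding (ASCII), Python 3 cannot decode byte 0xa0.
try:
    pickle.load(io.BytesIO(PY2_PICKLE))
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# latin1 maps every byte to a code point, so decoding never fails.
as_text = load_py2_pickle(io.BytesIO(PY2_PICKLE))
# 'bytes' skips decoding and returns the raw Python 2 str as bytes.
as_bytes = load_py2_pickle(io.BytesIO(PY2_PICKLE), encoding="bytes")
```

Whether 'latin1' yields the intended text depends on how the strings were originally encoded; it only guarantees that loading does not raise.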

Author

hmmhsu commented May 1, 2019

Thank you very much for your reply.
I already tried other encodings ('utf-8', 'latin1', and also 'byte'), but it doesn't work.

Traceback (most recent call last):
  File "main.py", line 448, in <module>
    data.generate_instance('raw')
  File "D:\NER\NCRFpp\utils\data.py", line 274, in generate_instance
    self.raw_texts, self.raw_Ids = read_instance(self.raw_dir, self.word_alphabet, self.char_alphabet, self.feature_alphabets, self.label_alphabet, self.number_normalized, self.MAX_SENTENCE_LENGTH)
  File "D:\NER\NCRFpp\utils\functions.py", line 23, in read_instance
    in_lines = open(input_file,'r').readlines()
  File "C:\Users\Hsu\AppData\Local\Continuum\anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 161: character maps to <undefined>

Owner

jiesutd commented May 2, 2019

Oh, this is a different error, so it seems the original one is solved.

The current error comes from the line in_lines = open(input_file,'r').readlines(), which decodes the file with the platform default encoding (cp1252 in your traceback). See this link: https://stackoverflow.com/questions/10487563/unicode-error-handling-with-python-3s-readlines

You can try adding errors='ignore' to the open call.
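
A small sketch of what that option does, using a hypothetical one-line sample file. The sample content and the assumption that the raw file is really UTF-8 are illustrative only; the thread does not state the file's actual encoding.

```python
import os
import tempfile

# Hypothetical sample: UTF-8 text whose encoding contains byte 0x81
# ('Á' is encoded as the two bytes C3 81).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write("Á B-PER\n".encode("utf-8"))
    path = tmp.name

# Decoding as cp1252 (the codec named in the traceback) fails on 0x81.
try:
    with open(path, "r", encoding="cp1252") as f:
        f.readlines()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors='ignore' suppresses the exception by silently dropping the
# undecodable byte; the remaining bytes are still decoded as cp1252.
with open(path, "r", encoding="cp1252", errors="ignore") as f:
    ignored_lines = f.readlines()

os.remove(path)
```

In this sample the read succeeds, but the first character comes back as 'Ã' rather than 'Á', so the option hides the error without recovering the original text.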

Author

hmmhsu commented May 3, 2019

@jiesutd Thank you for your time.
I have now moved from Windows to CentOS 7, because I think the problem was caused by Python 2 and 3 compatibility.
But the following error occurs. Is this because no GPU driver has been installed? Can it be solved by installing the GPU driver and CUDA? Is there any other way?
Thank you.

[Image: error]

Owner

jiesutd commented May 3, 2019

I guess it should be solved after installing CUDA.

jiesutd closed this as completed May 3, 2019
@kaushikacharya

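Quoting the suggestion above: "You can try to add errors='ignore' in the open function."

@jiesutd
IMHO this would lead to loss of data, because bytes that cannot be decoded are silently dropped.
Instead, one can use io.open() with an explicit encoding, e.g. utf-8.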

I have raised a pull request with the above change: #184
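
A minimal sketch of this approach, assuming the input file is UTF-8 (the thread does not confirm this). The `read_lines` helper and the sample file are hypothetical and are not taken from the pull request.

```python
import io
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Read all lines with an explicit encoding instead of the platform default."""
    with io.open(path, "r", encoding=encoding) as f:
        return f.readlines()


# Hypothetical sample file containing a non-ASCII character.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write("Á B-PER\n".encode("utf-8"))
    sample_path = tmp.name

lines = read_lines(sample_path)
os.remove(sample_path)
```

io.open() accepts the encoding argument on both Python 2 and Python 3, which is presumably why it is preferred here over the built-in open. If the file is not actually UTF-8, this still raises UnicodeDecodeError, which at least surfaces the mismatch instead of discarding bytes.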
