This repository has been archived by the owner on Oct 31, 2022. It is now read-only.

[SOLUTION] UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 29: character maps to <undefined> #10

Open
ghost opened this issue Jun 10, 2019 · 1 comment

ghost commented Jun 10, 2019

On Windows 10, I kept getting the following error when training the model on non-English text (with special characters like čšž), regardless of the model used:

Traceback (most recent call last):
  File "train.py", line 293, in <module>
    main()
  File "train.py", line 258, in main
    generate_samples()
  File "train.py", line 228, in generate_samples
    fp.write('\n'.join(all_text))
  File "C:\Users\6756\AppData\Local\Programs\Python\Python36\lib\encodings\cp1250.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 1581: character maps to <undefined>

I fixed the issue in train.py by adding encoding="utf-8" to the open() call in generate_samples():

with open(
        os.path.join(SAMPLE_DIR, args.run_name,
                     'samples-{}').format(counter), 'w', encoding="utf-8") as fp:
    fp.write('\n'.join(all_text))

I thought someone else might be struggling with this issue and find this useful.
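
For anyone who wants to reproduce this outside of train.py, here is a minimal sketch of why the write fails and why the explicit encoding helps. It assumes the traceback's codec (cp1250) is what open() picked up from the Windows locale; the sample text and the temporary file name are made up for illustration and are not from the repository.

```python
import os
import tempfile

# Illustrative text: čšž exist in cp1250, but U+FFFD (the Unicode
# replacement character from the traceback) does not.
text = "čšž \ufffd"

# Encoding with cp1250 fails on U+FFFD, which is what open(..., 'w')
# does implicitly when the locale default is cp1250.
try:
    text.encode("cp1250")
    cp1250_ok = True
except UnicodeEncodeError:
    cp1250_ok = False

# With encoding="utf-8" the write no longer depends on the system locale.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "samples-1")  # hypothetical file name
    with open(sample_path, "w", encoding="utf-8") as fp:
        fp.write(text)
    with open(sample_path, "r", encoding="utf-8") as fp:
        round_trip = fp.read()

print("cp1250 can encode the text:", cp1250_ok)
print("UTF-8 round trip preserved the text:", round_trip == text)
```

UTF-8 can represent every Unicode code point, so whatever the model generates can be written without this error.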

@ProtoxiDe22

This issue can also occur when running encode.py. The failing call is in load_dataset.py, and it can be solved the same way, by adding encoding="utf-8" to the open() call in the load_dataset() function:

# Plain text
with open(path, 'r', encoding="utf-8") as fp:
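
The read side can be checked in isolation too. The sketch below is only an illustration: read_plain_text() is a hypothetical helper standing in for the plain-text branch of load_dataset(), and dataset.txt is a made-up file name. It also prints the encoding that open() would typically fall back to when none is given, which is usually the quickest way to confirm the cause on a given machine.

```python
import locale
import os
import tempfile


def read_plain_text(path):
    """Hypothetical helper: read a text file as UTF-8, ignoring the locale default."""
    with open(path, "r", encoding="utf-8") as fp:
        return fp.read()


# The locale's preferred encoding, e.g. cp1250 or cp1252 on many Windows setups.
default_encoding = locale.getpreferredencoding(False)
print("Locale default encoding:", default_encoding)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dataset.txt")  # made-up file name
    # Write a small UTF-8 file containing the characters from the report.
    with open(path, "w", encoding="utf-8") as fp:
        fp.write("čšž")
    loaded = read_plain_text(path)

print("Read back intact:", loaded == "čšž")
```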
