This repository has been archived by the owner on Oct 31, 2022. It is now read-only.
Shouldn't all file operations have encoding="utf-8" added to make the code more portable to other systems such as Windows? Unless there is some other global switch that could be applied at the start so that it doesn't crash with a message like "[...]charmap' codec can't encode character[...]"
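For illustration, here is a minimal sketch of what passing the encoding explicitly could look like. The helper names `write_text` and `read_text` and the sample file are hypothetical and not taken from this repository; only the built-in `open()` with its `encoding` parameter is assumed.

```python
import os
import tempfile


def write_text(path, text):
    # Hypothetical helper: always write UTF-8, regardless of the OS default
    # (on Windows the default is usually a legacy code page such as cp1250 or cp1252).
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_text(path):
    # Hypothetical helper: read the file back with the same explicit encoding.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Round-trip a Polish sample through a temporary file.
sample = "zażółć gęślą jaźń"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
write_text(path, sample)
round_tripped = read_text(path)
```

As for a global switch: as far as I know, Python 3.7+ has a UTF-8 mode that can be enabled with the `PYTHONUTF8=1` environment variable or the `-X utf8` command-line option, which makes UTF-8 the default for `open()` without touching each call. `PYTHONIOENCODING` is narrower and only affects stdin/stdout/stderr. Whether either is enough for this project is untested here.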
Yeah, I'm not yet 100% sure myself whether it should be UTF-8, or whether datasets should instead be stored in the system-default encoding and opened as such. I'm trying to train it on Polish text to see the results. Unfortunately it doesn't want to use Polish accented letters; for example, it replaces ł with a plain l in the samples. Maybe I'm missing something, or it still needs more training? (Although it does use ó, which usually exists in 1-byte encodings.)
EDIT: Never mind the above. It seems that the console output is UTF-8, which my CMD simply can't display; it would need to be converted to ANSI using the Polish code page before output. So in my case UTF-8 is the most valid way to read datasets (without BOM!). The sample files look OK.
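A rough sketch of the two points above, for anyone hitting the same thing. The function names `decode_dataset` and `safe_for_console` are hypothetical; the code only assumes the standard `utf-8-sig`, `cp1250` and `cp1252` codecs that ship with Python.

```python
def decode_dataset(raw: bytes) -> str:
    # "utf-8-sig" drops a leading BOM if one is present and otherwise
    # behaves like plain UTF-8, so a dataset saved "with BOM" still loads cleanly.
    return raw.decode("utf-8-sig")


def safe_for_console(text: str, console_encoding: str) -> str:
    # Re-encode to the console's code page, replacing characters it cannot
    # represent with "?" instead of raising a 'charmap' UnicodeEncodeError.
    return text.encode(console_encoding, errors="replace").decode(console_encoding)


with_bom = b"\xef\xbb\xbf" + "ł".encode("utf-8")
without_bom = "ł".encode("utf-8")

# cp1250 is the Central European (Polish) Windows code page; cp1252 is Western.
polish_console = safe_for_console("łódź", "cp1250")
western_console = safe_for_console("łódź", "cp1252")
```

This would also fit the earlier observation: ó exists in cp1252 while ł does not, so on a Western code page only some of the Polish letters survive. On Python 3.7+ `sys.stdout.reconfigure(encoding="utf-8")` is another option, but whether the CMD window then renders it correctly depends on the console's active code page and font.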