
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 0: invalid continuation byte #309

Closed
ngankt2 opened this issue Mar 28, 2021 · 4 comments


ngankt2 commented Mar 28, 2021

(image attached)

ngankt2 closed this as completed Mar 28, 2021
ngankt2 reopened this Mar 28, 2021
ngankt2 (Author) commented Mar 28, 2021

/pyglossary-master/pyglossary/plugins/stardict.py", line 319, in __iter__
word = b_word.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 0: invalid continuation byte
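The same message can be reproduced in plain Python. Byte 0xEA (binary 1110 1010) is the lead byte of a three-byte UTF-8 sequence, so the decoder expects the next byte to be a continuation byte (binary 10xxxxxx); when it is not, decoding fails at the lead byte's position. A minimal sketch, using a hypothetical two-byte input rather than the actual glossary data:

```python
def reproduce_error() -> str:
    # Hypothetical input: 0xEA (a 3-byte lead byte) followed by "A" (0x41),
    # which is not a continuation byte, so decoding fails at position 0.
    bad = b"\xeaA"
    try:
        bad.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)
    return "decoded without error"


print(reproduce_error())
# 'utf-8' codec can't decode byte 0xea in position 0: invalid continuation byte
```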

ilius added the Q&A label and removed the Feature label Mar 29, 2021
ilius (Owner) commented Mar 29, 2021

Please attach your StarDict glossary.

ngankt2 (Author) commented Mar 29, 2021

dict-error.zip

dict-error-2.zip

This is my dict


ilius (Owner) commented Mar 29, 2021

The .dict files contain invalid UTF-8 characters, which is why it's giving an error.
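To locate the offending bytes (for example, to include them in a report to the publisher), a small scan can list every offset where UTF-8 decoding fails. This is a sketch, not part of pyglossary: the function and error-handler names are hypothetical, it assumes an uncompressed .dict file or a dictzip-compressed .dict.dz (dictzip is gzip-compatible), and the reported offsets refer to the uncompressed data.

```python
from __future__ import annotations

import codecs
import gzip


def invalid_utf8_offsets(data: bytes) -> list[int]:
    """Return the byte offset of every invalid UTF-8 sequence in `data`."""
    offsets: list[int] = []

    def record(exc: UnicodeError) -> tuple[str, int]:
        # exc.start is an absolute index into `data`; decoding resumes at exc.end
        offsets.append(exc.start)
        return ("\ufffd", exc.end)

    # Hypothetical handler name; registering again replaces the previous handler.
    codecs.register_error("record_utf8_offsets", record)
    data.decode("utf-8", errors="record_utf8_offsets")
    return offsets


def scan_dict_file(path: str) -> list[int]:
    """Scan a plain .dict file, or a .dict.dz file via the gzip module."""
    opener = gzip.open if path.endswith(".dz") else open
    with opener(path, "rb") as f:
        return invalid_utf8_offsets(f.read())


print(invalid_utf8_offsets(b"ok \xeaA \xff end"))  # [3, 6]
```

A single decode pass with a recording error handler avoids re-slicing the file after every error, which matters if a legacy-encoded file has invalid bytes in nearly every entry.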

I just added a new read-option to convert it regardless.
Add the --read-options unicode_errors=replace flag to your command.
Then it will convert, replacing invalid UTF-8 characters with "�".
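Python's built-in bytes.decode has an errors parameter whose "replace" handler performs the same kind of substitution, which gives a feel for what the converted output will contain. Whether pyglossary's read-option is implemented exactly this way is an assumption here, and the headword bytes below are hypothetical.

```python
raw = b"\xeaA"  # hypothetical headword: an invalid UTF-8 byte followed by ASCII "A"

# The default errors="strict" raises UnicodeDecodeError, as in the traceback.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD ("�") for each undecodable sequence.
word = raw.decode("utf-8", errors="replace")

print(strict_failed)      # True
print(word == "\ufffdA")  # True
```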

You may also want to report this to the glossary author/publisher.
https://sourceforge.net/p/ovdp/discussion/661855/
