UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 0: invalid continuation byte #309

ngankt2 · 2021-03-28T06:50:07Z

ngankt2 · 2021-03-28T06:57:18Z

/pyglossary-master/pyglossary/plugins/stardict.py", line 319, in iter
word = b_word.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 0: invalid continuation byte

ilius · 2021-03-29T03:27:25Z

Please attach your StarDict glossary.

ngankt2 · 2021-03-29T16:14:07Z

dict-error.zip

dict-error-2.zip

This is my dict


ilius · 2021-03-29T23:52:10Z

The .dict files contain invalid UTF-8 byte sequences, which is why the conversion fails with this error.

I just added a new read option that lets the conversion proceed regardless.
Add the --read-options unicode_errors=replace flag to your command.
It will then convert the glossary, replacing invalid UTF-8 characters with "�".
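
For anyone wondering what the replace mode does at the byte level, here is a minimal Python sketch. It is not the actual reader code: decode_word and the sample bytes are hypothetical, and it only assumes that the option value ends up as the errors argument of bytes.decode, which is what the traceback line b_word.decode("utf-8") suggests.

```python
def decode_word(b_word: bytes, unicode_errors: str = "strict") -> str:
    """Decode a raw headword as UTF-8 (hypothetical helper, not PyGlossary's API).

    unicode_errors is handed to bytes.decode as its errors argument.
    """
    return b_word.decode("utf-8", errors=unicode_errors)


# Hypothetical sample: 0xea announces a 3-byte sequence, but the next
# byte ("a") is not a continuation byte, so strict decoding fails.
bad_word = b"\xeaabc"

try:
    decode_word(bad_word)
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)

# With "replace", the offending byte becomes U+FFFD and decoding continues.
print("replace decoding gave:", decode_word(bad_word, "replace"))
```

Note that the replacement is lossy: the original byte cannot be recovered from the converted glossary, so the source files are still worth fixing upstream.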

You may also want to report this to the glossary author/publisher.
https://sourceforge.net/p/ovdp/discussion/661855/

ngankt2 added the Feature label Mar 28, 2021

ngankt2 closed this as completed Mar 28, 2021

ngankt2 reopened this Mar 28, 2021

ilius added Q&A and removed Feature labels Mar 29, 2021

ilius added a commit that referenced this issue Mar 29, 2021

StarDict reader: add option unicode_errors for invalid UTF-8 data, #309

e2e5aa8

ilius closed this as completed Mar 29, 2021

ilius added the Improvement label Mar 30, 2021

ksignorini mentioned this issue Jul 16, 2023

Error converting Stardict to Kobo #506

Closed
