UnicodeDecodeError in Python version when using Cyrillic characters. #9

GoogleCodeExporter · 2015-07-15T12:36:26Z

I want to use Cyrillic characters with diff_match_patch (Python version,
release), but I get errors like:
"UnicodeDecodeError: 'utf8' codec can't decode byte 0xd0 in position 0: unexpected end of data"

Appending ".decode("utf-8").encode("utf-8")" to strings in some places
seems to solve the problem, but I guess not 100%.
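
For readers trying to reproduce the message quoted above, here is a minimal sketch. It assumes Python 3 (the report itself concerns Python 2 byte strings), and the sample word and the `try_decode` helper are illustrative, not part of diff_match_patch. It shows that 0xd0 is the lead byte of a two-byte UTF-8 sequence, so a byte string cut in the middle of a Cyrillic character fails to decode with "unexpected end of data".

```python
# Python 3 sketch; names here are hypothetical, not library code.
text = "Привет"
encoded = text.encode("utf-8")

# Each of these Cyrillic letters takes two bytes in UTF-8.
assert len(encoded) == 2 * len(text)

# Cutting the byte string mid-character leaves a lone lead byte.
truncated = encoded[:1]  # b'\xd0'


def try_decode(data):
    """Return the decoded text, or the decoder's error reason on failure."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return "error: " + exc.reason


result_full = try_decode(encoded)
result_truncated = try_decode(truncated)
```

One plausible reading is that an operation in the library sliced or compared byte strings at a position that split a character, although the thread does not confirm where the failure occurred.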

See the attached patch (and, just in case, the new file).

Alexandr.

Original issue reported on code.google.com by sashul...@gmail.com on 11 May 2008 at 1:12

GoogleCodeExporter · 2015-07-15T12:36:26Z

Thank you Alexandr for the bug report and the patches.  Sorry for the delay.
I have fixed the Unicode issues in diff_fromDelta and patch_fromText.  In both
cases I added:
    if type(text) == unicode:
      text = text.encode("ascii")
These are two functions which are expecting a subset of ASCII characters.
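
The check above is Python 2 code (the `unicode` type no longer exists in Python 3). As a rough Python 3 analogue, the sketch below uses a hypothetical helper, `ensure_ascii_bytes`, and an illustrative delta-style string; neither is taken from the library. It shows the effect of the conversion: ASCII-only text passes through as bytes, while text containing raw non-ASCII characters is rejected.

```python
def ensure_ascii_bytes(text):
    """Hypothetical helper: Python 3 form of the check described above."""
    if isinstance(text, str):
        # Raises UnicodeEncodeError if any character is outside ASCII.
        text = text.encode("ascii")
    return text


# Illustrative delta-style string: non-ASCII content is percent-encoded,
# so the string itself contains only ASCII characters.
delta = "=3\t-2\t+%D0%9F"
as_bytes = ensure_ascii_bytes(delta)

# A raw Cyrillic character cannot be encoded as ASCII.
try:
    ensure_ascii_bytes("+П")
    rejected = False
except UnicodeEncodeError:
    rejected = True
```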

However, your patch also made changes to diff_text1, diff_text2, patch_apply
and patch_obj.__str__.  Despite many tests, I am unable to find scenarios where
the existing code fails when passed Unicode.  An example test case would be most
appreciated.

In the meantime, I've pushed out a new version which includes the Unicode
fixes for diff_fromDelta and patch_fromText in the Python version, as well as a
new unit test in all three versions which verifies the handling of invalid
Unicode sequences (e.g. %c3%xy).
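
To illustrate why a sequence such as %c3%xy counts as invalid, here is a small sketch using Python 3's standard `urllib.parse.unquote_to_bytes`. It is not the library's unit test, and the `decode_percent` helper is hypothetical. `%c3` unescapes to the byte 0xc3, which starts a two-byte UTF-8 sequence, but `%xy` is not a valid escape and is left as literal text, so the resulting bytes are not valid UTF-8.

```python
from urllib.parse import unquote_to_bytes


def decode_percent(s):
    """Hypothetical helper: unescape %XX sequences, then decode strictly."""
    raw = unquote_to_bytes(s)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None  # signal that the input was not valid UTF-8


# '%xy' is not a hex escape, so it survives as the literal bytes b'%xy'.
raw_invalid = unquote_to_bytes("%c3%xy")

valid = decode_percent("%D0%9F")    # a complete two-byte sequence
invalid = decode_percent("%c3%xy")  # lead byte without a continuation byte
```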

Original comment by neil.fra...@gmail.com on 14 May 2008 at 7:47

Changed state: Fixed

GoogleCodeExporter added Priority-Medium Type-Defect auto-migrated labels Jul 15, 2015

GoogleCodeExporter closed this as completed Jul 15, 2015
