I want to use Cyrillic characters with diff_match_patch (Python version, release), but I get errors like:
"UnicodeDecodeError: 'utf8' codec can't decode byte 0xd0 in position 0: unexpected end of data"
Appending ".decode("utf-8").encode("utf-8")" to strings in some places seems to solve the problem, but I suspect it is not a 100% fix.
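The error is consistent with a two-byte UTF-8 character being cut in half: each Cyrillic letter encodes to two bytes, and 0xd0 is the lead byte of letters such as "П". The minimal Python 3 sketch below assumes that cause; it does not call diff_match_patch and only reproduces the codec error (Python 3 names the codec 'utf-8' rather than 'utf8').

```python
# Assumed cause, illustrated in Python 3: slicing UTF-8 *bytes* can split a
# multi-byte Cyrillic character, which reproduces the reported error.
text = "Привет"                  # Cyrillic "hello"
data = text.encode("utf-8")      # each Cyrillic letter occupies two bytes
print(data[:2])                  # b'\xd0\x9f', the whole letter "П"

try:
    data[:1].decode("utf-8")     # keeps only the lead byte 0xd0
    error = None
except UnicodeDecodeError as exc:
    error = exc
print(error)  # 'utf-8' codec can't decode byte 0xd0 in position 0: unexpected end of data

# Slicing the decoded str works on code points, so no character is split.
print(text[:1])                  # П
```

Working on decoded text throughout, rather than round-tripping bytes through decode/encode at selected call sites, is generally the safer way to avoid this kind of split.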
See the attached patch (and, just in case, the new file).
Alexandr.
Original issue reported on code.google.com by sashul...@gmail.com on 11 May 2008 at 1:12
Thank you, Alexandr, for the bug report and the patches. Sorry for the delay. I have fixed the Unicode issues in diff_fromDelta and patch_fromText. In both cases I added:
    if type(text) == unicode:
        text = text.encode("ascii")
These are the two functions that expect a subset of ASCII characters.
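An ASCII-only check is safe for these inputs as long as any non-ASCII text is percent-encoded before it is serialized. The Python 3 sketch below illustrates that idea under stated assumptions: the helper names encode_delta_text and decode_delta_text and the SAFE_CHARS set are hypothetical, chosen for illustration, and are not diff_match_patch's API. The explicit .encode("ascii") call plays the same role as the Python 2 guard above.

```python
from urllib.parse import quote, unquote

# Characters left unescaped; an assumed set chosen for illustration only.
SAFE_CHARS = "!~*'();/?:@&=+$,# "

def encode_delta_text(text: str) -> str:
    """Percent-encode text as UTF-8 so the result is pure ASCII (hypothetical helper)."""
    return quote(text, safe=SAFE_CHARS)

def decode_delta_text(encoded: str) -> str:
    """Reverse encode_delta_text, refusing input that is not ASCII (hypothetical helper)."""
    # Python 3 analogue of the Python 2 guard: raises UnicodeEncodeError
    # if the serialized text contains any non-ASCII character.
    encoded.encode("ascii")
    # errors="strict" makes malformed UTF-8 escapes raise instead of
    # being silently replaced with U+FFFD.
    return unquote(encoded, errors="strict")

print(encode_delta_text("Привет"))  # %D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82
```

With this split, Cyrillic text survives the round trip, while raw non-ASCII in the serialized form is rejected up front.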
However, your patch also made changes to diff_text1, diff_text2, patch_apply and patch_obj.__str__. Despite many tests, I am unable to find scenarios where the existing code fails when passed Unicode. An example test case would be most appreciated.
In the meantime, I've pushed out a new version which includes the Unicode fixes for diff_fromDelta and patch_fromText in the Python version, as well as a new unit test in all three versions which verifies the behaviour on invalid Unicode sequences (e.g. %c3%xy).
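A test for a sequence such as %c3%xy needs a decoder that fails loudly: %c3 is a UTF-8 lead byte that must be followed by a continuation byte, but %xy is not a valid escape and is kept as literal text, so the decoded bytes are malformed. The Python 3 sketch below shows one way to detect this; is_valid_encoded is a hypothetical helper, not part of diff_match_patch. Note that urllib.parse.unquote defaults to errors="replace", which would silently substitute U+FFFD, so the check requests errors="strict".

```python
from urllib.parse import unquote

def is_valid_encoded(encoded: str) -> bool:
    """Return True if `encoded` percent-decodes to well-formed UTF-8 (hypothetical helper)."""
    try:
        # Malformed escapes such as "%xy" are kept as literal text by unquote;
        # errors="strict" makes invalid UTF-8 byte sequences raise.
        unquote(encoded, errors="strict")
    except UnicodeDecodeError:
        return False
    return True

print(is_valid_encoded("%c3%a9"))  # True: C3 A9 is "é"
print(is_valid_encoded("%c3%xy"))  # False: lead byte C3 is followed by "%"
print(repr(unquote("%c3%xy")))     # '\ufffd%xy' with the default errors="replace"
```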
Original comment by neil.fra...@gmail.com on 14 May 2008 at 7:47