BUG: to_clipboard text truncated for Python 3 on Windows for UTF-16 text #25040
For Windows users where Python is compiled with UCS-4 (primarily Python 3), tables copied to the clipboard are missing data from the end whenever the DataFrame contains Unicode characters that have a 4-byte representation in UTF-16 (i.e. in the U+010000 to U+10FFFF range). The bug can be reproduced here:
    import pandas
    obj = pandas.DataFrame([u'\U0001f44d\U0001f44d', u'12345'])
    obj.to_clipboard()
where the clipboard text results in
One character is chopped from the end of the clipboard string for each 4-byte unicode character copied.
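The one-character-per-4-byte-character loss can be illustrated without touching the clipboard. The following is a minimal sketch, assuming Python 3 and assuming the text is cut to as many UTF-16 code units as `len()` reports; the helper name `truncate_to_len_units` is hypothetical and is not part of pandas.

```python
def truncate_to_len_units(text):
    """Simulate a copy that keeps only len(text) UTF-16 code units.

    Hypothetical helper for illustration; this is not pandas code.
    """
    encoded = text.encode("utf-16-le")
    # On Python 3, len() counts code points, but a character outside the
    # Basic Multilingual Plane occupies two 2-byte code units in UTF-16.
    kept_bytes = len(text) * 2
    # errors="ignore" drops a trailing half surrogate pair, if any.
    return encoded[:kept_bytes].decode("utf-16-le", errors="ignore")


sample = u"\U0001f44d\U0001f44d12345"
code_points = len(sample)                         # code points
code_units = len(sample.encode("utf-16-le")) // 2  # UTF-16 code units
truncated = truncate_to_len_units(sample)
```

Here the sample has two 4-byte characters, so the code-unit count exceeds `len()` by two and the last two characters are lost in the simulated copy.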
or more to the point:
The cause of this issue is that
My proposed change (affecting only Windows clipboard operations) first converts the text to UTF-16 little endian, because that is the format used by Windows, then measures the length of the resulting byte string, rather than using Python's
I've tested this change in Python 3.6 and 2.7 on Windows 7 x64. I don't expect it to cause issues on other versions of Windows, but I would appreciate it if anyone on an older version of Windows could double check.
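The sizing approach described above (encode to UTF-16 little endian first, then measure the byte string) might look roughly like the sketch below. It assumes Python 3; the function name `utf16le_buffer_size` and the `include_terminator` parameter are hypothetical, and the extra two bytes assume the clipboard text must be null-terminated. It is not the actual diff from this pull request.

```python
def utf16le_buffer_size(text, include_terminator=True):
    """Return the number of bytes needed to hold `text` as UTF-16-LE.

    Hypothetical helper for illustration; not the code from this PR.
    """
    # Measure the encoded bytes rather than the number of code points,
    # so characters stored as surrogate pairs are counted in full.
    size = len(text.encode("utf-16-le"))
    if include_terminator:
        # Assumption: room for one 2-byte null terminator is required.
        size += 2
    return size


size_with_emoji = utf16le_buffer_size(u"\U0001f44d\U0001f44d12345")
size_ascii = utf16le_buffer_size(u"12345")
```

Sizing from the encoded bytes gives the same result as before for text within the Basic Multilingual Plane, so only strings with 4-byte characters are affected.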
    @@            Coverage Diff             @@
    ##           master   #25040      +/-   ##
    ==========================================
    + Coverage   92.38%   92.38%   +<.01%
    ==========================================
      Files         166      166
      Lines       52401    52402       +1
    ==========================================
    + Hits        48409    48412       +3
    + Misses       3992     3990       -2

    @@            Coverage Diff             @@
    ##           master   #25040      +/-   ##
    ==========================================
    - Coverage   92.38%   92.36%   -0.02%
    ==========================================
      Files         166      166
      Lines       52401    52408       +7
    ==========================================
    - Hits        48409    48408       -1
    - Misses       3992     4000       +8