Tkinter clipboard_get() decodes characters incorrectly #58982
Comments
With the text 'abc€' copied to the clipboard, on Linux, where UTF-8 is the default encoding:
Python 3.2.3 (default, Apr 12 2012, 21:55:50)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tkinter
>>> root = tkinter.Tk()
>>> root.clipboard_get()
'abcâ\x82¬'
>>> 'abc€'.encode('utf-8').decode('latin-1')
'abcâ\x82¬'
I see the same behaviour in 2.7.3 as well (it returns a unicode string u'abc\xe2\x82\xac'). If the clipboard is only accessible at a bytes level, I think clipboard_get should return a bytes object. But I can reliably copy and paste non-ASCII characters between programs, so it looks like it's possible to return unicode. |
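The garbling reported above is UTF-8 bytes interpreted as Latin-1. A minimal sketch of undoing it follows; the helper name `repair_mojibake` is hypothetical and not part of tkinter, and the repair is only valid when the source application really sent UTF-8 bytes.

```python
def repair_mojibake(text):
    """Undo 'UTF-8 bytes decoded as Latin-1'; return text unchanged if that fails."""
    try:
        # Map each character back to its byte, then decode those bytes as UTF-8.
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not a Latin-1 view of valid UTF-8, so leave the text alone.
        return text


garbled = 'abc€'.encode('utf-8').decode('latin-1')
print(ascii(garbled))
print(ascii(repair_mojibake(garbled)))
```

This is a workaround on the caller's side, not a fix: text that was already decoded correctly, or replaced by '?', passes through unchanged.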
Still worse. I get 'abc?'. Linux, Python 3.1, 3.2, and 3.3, UTF-8 locale. |
3.3, Win 7, Idle
>>> root.clipboard_get()
'abc€'
after cut from here |
This issue can be reproduced by pure Tcl/Tk:
$ wish
% clipboard get
abc?
% clipboard get -type STRING
abc?
% clipboard get -type UTF8_STRING
abc€
Use root.clipboard_get(type='UTF8_STRING'). I don't know whether it should just be documented (UTF8_STRING is not even mentioned in the clipboard_get docstring), or whether we need to change the default behavior. |
On this computer, I see this from Tcl:
$ wish
% clipboard get
abc\u20ac
But here Python's following suit:
>>> root.clipboard_get()
'abc\\u20ac'
Which is odd, because as far as I know, my two computers run the same OS (Ubuntu 12.04) in the same configuration. I briefly thought the presence of xsel might be affecting it, but uninstalling it doesn't seem to make any difference. |
As is often the case with Tcl/Tk issues, there are platform differences. On OS X, with the two native Tcl/Tk implementations (Aqua Cocoa and Aqua Carbon), the examples appear to work as is *and* type "UTF8_STRING" does not exist. The less commonly used X11 Tcl/Tk on OS X does support and require "UTF8_STRING" for the example given. So any doc change needs to be carefully worded. |
OK, after a quick bit of reading, I see why I'm confused: the clipboard actually works by requesting the text from the source program, so where you copy it from makes a difference. In my case, copying from firefox gives 'abc\\u20ac', and copying from Geany gives u'abc\xe2\x82\xac'. However, I still think there's something that can be improved in Python. As Serhiy points out, specifying type='UTF8_STRING' makes it work properly from both programs. The Tcl documentation recommends this as the best option for "modern X11 systems"[1]. From what Ned says, we can't make UTF8_STRING the default everywhere, but is there a way to detect if we're inside X11, and use UTF8_STRING by default there? |
There are definitely platform differences. As I noted, the original example works fine on Windows. However:
>>> root.clipboard_get(type='STRING')
'abc€'
>>> root.clipboard_get(type='UTF8_STRING')
Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    root.clipboard_get(type='UTF8_STRING')
  File "C:\Programs\Python33\lib\tkinter\__init__.py", line 549, in clipboard_get
    return self.tk.call(('clipboard', 'get') + self._options(kw))
_tkinter.TclError: CLIPBOARD selection doesn't exist or form "UTF8_STRING" not defined
Of course, on Windows I suspect that the unicode string is not copied to the clipboard as utf8 bytes, so if clipboard contents are tagged, there would not be such a thing. Perhaps clipboards work differently on different OSes.
>>> help(root.clipboard_get)
...
The type keyword specifies the form in which the data is
to be returned and should be an atom name such as STRING
or FILE_NAME. Type defaults to STRING.
(Actually, FILE_NAME gives the same exception as UTF8_STRING.) |
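Since UTF8_STRING raises a TclError on some platforms, calling code can try it first and fall back to the default type. The following is a rough sketch under that assumption; `clipboard_text` and its `tcl_error` parameter are hypothetical names, and the exception type is a parameter only so the sketch can be exercised without a display.

```python
def clipboard_text(widget, tcl_error=Exception):
    """Return clipboard text, preferring UTF8_STRING where Tk offers it.

    `widget` is any object with a tkinter-style clipboard_get() method.
    In real use, pass tcl_error=tkinter.TclError.
    """
    try:
        return widget.clipboard_get(type='UTF8_STRING')
    except tcl_error:
        # The form is not defined here (as reported above for Windows),
        # so fall back to the default type.
        return widget.clipboard_get()


# Typical use (needs a display, so it is left commented out):
# import tkinter
# root = tkinter.Tk()
# text = clipboard_text(root, tcl_error=tkinter.TclError)
```

The fallback call can still raise if the clipboard is empty, which the sketch deliberately leaves to the caller.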
Most likely the best way to determine the windowing system is to use the "tk windowingsystem" command (http://www.tcl.tk/man/tcl8.5/TkCmd/tk.htm#M10), so something like this:
root = tkinter.Tk()
root.call(('tk', 'windowingsystem'))
As documented, the call returns 'x11' for X11-based systems, 'win32' for Windows, and 'aqua' for the native OS X implementations. |
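To illustrate how the windowing system value could drive the default type, here is a small sketch. The function name `default_selection_type` is hypothetical; only the three return values of `tk windowingsystem` come from the Tk documentation cited above.

```python
def default_selection_type(windowing_system):
    """Pick a default selection type from the `tk windowingsystem` result."""
    # 'x11', 'win32' and 'aqua' are the documented return values; only X11
    # is assumed here to benefit from UTF8_STRING.
    return 'UTF8_STRING' if windowing_system == 'x11' else 'STRING'


# Typical use (needs a display, so it is left commented out):
# import tkinter
# root = tkinter.Tk()
# ws = root.tk.call('tk', 'windowingsystem')
# text = root.clipboard_get(type=default_selection_type(ws))
```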
Thanks, Ned. Does it seem like a good idea to test the windowing system like that, and default to UTF8_STRING if it's x11? So far, I've not found any case on X where STRING works but UTF8_STRING doesn't. If it seems reasonable, I'm happy to have a go at making a patch. |
A patch would be great. I don't have a strong opinion about the issue one way or another. I suppose it would simplify things for Python 3 users if the clipboard results were returned properly in the default case when no 'type' argument is passed to clipboard_get(). For Python 2, changing things seems a little more questionable but, as long as it was already returning a unicode object in that case, it sounds like a bug fix rather than a feature. Martin, Andrew: any opinions on this? |
Here's a patch that makes UTF8_STRING the default type for clipboard_get and selection_get when running in X11. |
Patch looks good to me, works fine. |
Indeed, and there don't seem to be any other tests for the clipboard functionality. |
You are right: there are no tests, as is the case for most of tkinter. |
I'm skeptical about the patch. In both 2.7 and 3.x, clipboard_get returns a Unicode string, yet it fails to decode it properly. So I think this is the bug that ought to be fixed (using the proper encoding). Defaulting to UTF8_STRING is a new feature, IMO, and shouldn't be done for 2.7 (or 3.2). |
Martin, is there a way for _tkinter to know whether the result returned from Tcl/Tk is an encoded string or not in this case? With regard to the patch, it would be better to cache the results of the first-time call to get the windowingsystem value so that we don't have to make two calls down into Tcl for each clipboard_get. |
On Fri, 2012-05-11 at 21:25 +0000, Thomas Kluyver wrote:
Perhaps there will be problems with the old (very old) closed source A few years ago (in Debian Sarge) even xsel did not work with the |
But the encoding used seemingly depends on the source application - Geany (GTK 2, I think) seemingly sends UTF8 text anyway, whereas Firefox escapes the unicode character. So I don't think we can correctly decode the STRING value in all cases. The Tk documentation describes UTF8_STRING as being the "most useful" type on modern X11. |
Agree. Opera sends 'abc?' literally. |
Off-hand, I don't know. I suppose there is a way to do this correctly,
That also. |
Ah, ok. IIUC, support for UTF8_STRING would also be in the realm of This I could also accept for 2.7, since it "shouldn't" have a potential |
+1 to Martin's proposal |
OK, I'll produce an updated patch. |
As requested, the second version of the patch (x11-clipboard-try-utf8):
|
Not to bikeshed here, but I think it would be better to cache the windowingsystem value at the module level, since I assume an application could be calling clipboard_get on different tkinter objects and I don't think there is any possibility that the windowingsystem value could vary within one interpreter invocation. |
I'm happy to put the cache at the module level, but I'll give other people a chance to express their views before I dive into the code again. I imagine most applications would only call clipboard_get() on one item, so it wouldn't matter. However, my own interest in this is from IPython, where we create a Tk object just to call clipboard_get() once, so a module level cache would be quicker, albeit only a tiny bit. |
Why is Misc.tk not a module-level variable? |
The 3rd revision of the patch has the cache at the module level. It's a bit awkward, because there's no module level function to call to retrieve it (as far as I know), so it's exposed by objects which can call Tk. Also, serhiy pointed out a mistake in the 2nd revision, which is fixed ('selection' instead of 'clipboard'). |
Serhiy, I don't know why Misc.Tk is not module level but it isn't so caching global attributes there isn't effective. However, upon further consideration, I take back my original suggestion of caching at the module level primarily because I can think of future scenarios where it might be possible that there are different windowing systems supported in the same Python instance. I now think the best solution is to cache at the Tk root object level; that appears to be a simple change to Thomas's 2nd revision. Sorry about that! Here is a fourth version (one for 3.x and one for 2.7) based on the second which includes the fix from the 3rd. I started to write a simple test for the clipboard functions but then realized that there doesn't seem to be a practical way to effectively test in a machine-independent way without destroying the contents of the Tk clipboard and hence the user's desktop clipboard, not a friendly thing to do. For example, the clipboard might contain a data type not supported by the platform's Tk, like pict data on OS X. So I'm not including the test here but it did verify that the attribute was being properly cached across multiple tkinter objects. Thanks to Thomas for the patch and to Serhiy for reviewing. By the way, Thomas, for your patch to be included, you should submit a PSF contributor agreement as described here: http://www.python.org/psf/contrib/. Once that is in place and if the patch looks good to everyone, I'll apply it. |
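The root-level caching described above might look roughly like the sketch below. It is not the committed patch: the class `RootLevelCache` is a hypothetical stand-in for a Tk root object, and `tk_call` is assumed to behave like `root.tk.call`.

```python
class RootLevelCache:
    """Hypothetical stand-in for a Tk root that caches `tk windowingsystem`."""

    def __init__(self, tk_call):
        # tk_call is assumed to behave like root.tk.call
        self._tk_call = tk_call
        self._windowingsystem_cached = None

    @property
    def windowingsystem(self):
        # Ask Tcl only on first access; later reads reuse the stored value.
        if self._windowingsystem_cached is None:
            self._windowingsystem_cached = self._tk_call('tk', 'windowingsystem')
        return self._windowingsystem_cached
```

Keeping the value on the root rather than on the module means two roots backed by different windowing systems would each keep their own answer, which is the scenario raised above.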
I've submitted the contributor agreement, though I've not yet heard anything back about it. |
...And mere minutes after I said I hadn't heard anything, I've got the confirmation email. :-) |
Congratulations! |
I'm ok with last patch version. |
New changeset f70fa654f70e by Ned Deily in branch '2.7':
New changeset 41382250e5e1 by Ned Deily in branch '3.2':
New changeset 97601cbf169f by Ned Deily in branch 'default': |
Applied for release in 2.7.4, 3.2.4 and 3.3.0. Thanks all! |