characters with ord above 65535 fail to display in IDLE #56551

Closed
wujeksrujek mannequin opened this issue Jun 15, 2011 · 22 comments
Assignees
Labels
expert-tkinter type-bug An unexpected behavior, bug, or error

Comments

@wujeksrujek
Mannequin

wujeksrujek mannequin commented Jun 15, 2011

BPO 12342
Nosy @loewis, @terryjreedy, @kbkaiser, @vstinner, @ericvsmith, @ned-deily, @ezio-melotti, @serwy, @bitdancer, @asvetlov, @florentx
Superseder
  • bpo-14200: Idle shell crash on printing non-BMP unicode character
  • Files
  • tcl_unicode_range.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/asvetlov'
    closed_at = <Date 2012-03-14.21:48:11.321>
    created_at = <Date 2011-06-15.20:59:59.587>
    labels = ['type-bug', 'expert-tkinter']
    title = 'characters with ord above 65535 fail to display in IDLE'
    updated_at = <Date 2012-03-14.22:15:48.701>
    user = 'https://bugs.python.org/wujeksrujek'

    bugs.python.org fields:

    activity = <Date 2012-03-14.22:15:48.701>
    actor = 'roger.serwy'
    assignee = 'asvetlov'
    closed = True
    closed_date = <Date 2012-03-14.21:48:11.321>
    closer = 'asvetlov'
    components = ['Tkinter']
    creation = <Date 2011-06-15.20:59:59.587>
    creator = 'wujek.srujek'
    dependencies = []
    files = ['23607']
    hgrepos = []
    issue_num = 12342
    keywords = ['patch']
    message_count = 22.0
    messages = ['138389', '138390', '138392', '138395', '138397', '138402', '138497', '138541', '146663', '146665', '146965', '146983', '146984', '146987', '146991', '146992', '146994', '146999', '154966', '155414', '155804', '155809']
    nosy_count = 14.0
    nosy_names = ['loewis', 'terry.reedy', 'kbk', 'vstinner', 'eric.smith', 'ned.deily', 'ezio.melotti', 'roger.serwy', 'r.david.murray', 'asvetlov', 'flox', 'python-dev', 'wujek.srujek', 'Ramchandra Apte']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '14200'
    type = 'behavior'
    url = 'https://bugs.python.org/issue12342'
    versions = ['Python 3.3']

    The following code produces an exception:

    print('{:c}'.format(65536))

    when executed in IDLE 3.2. The stack trace:

    >>> print('{:c}'.format(65536))
    Traceback (most recent call last):
      File "<pyshell#149>", line 1, in <module>
        print('{:c}'.format(65536))
      File "/usr/lib/python3.2/idlelib/PyShell.py", line 1231, in write
        self.shell.write(s, self.tags)
      File "/usr/lib/python3.2/idlelib/PyShell.py", line 1213, in write
        OutputWindow.write(self, s, tags, "iomark")
      File "/usr/lib/python3.2/idlelib/OutputWindow.py", line 40, in write
        self.text.insert(mark, s, tags)
      File "/usr/lib/python3.2/idlelib/Percolator.py", line 25, in insert
        self.top.insert(index, chars, tags)
      File "/usr/lib/python3.2/idlelib/ColorDelegator.py", line 79, in insert
        self.delegate.insert(index, chars, tags)
      File "/usr/lib/python3.2/idlelib/PyShell.py", line 316, in insert
        UndoDelegator.insert(self, index, chars, tags)
      File "/usr/lib/python3.2/idlelib/UndoDelegator.py", line 81, in insert
        self.addcmd(InsertCommand(index, chars, tags))
      File "/usr/lib/python3.2/idlelib/UndoDelegator.py", line 116, in addcmd
        cmd.do(self.delegate)
      File "/usr/lib/python3.2/idlelib/UndoDelegator.py", line 219, in do
        text.insert(self.index1, self.chars, self.tags)
      File "/usr/lib/python3.2/idlelib/ColorDelegator.py", line 79, in insert
        self.delegate.insert(index, chars, tags)
      File "/usr/lib/python3.2/idlelib/WidgetRedirector.py", line 104, in __call__
        return self.tk_call(self.orig_and_operation + args)
    ValueError: unsupported character

    Seems to work fine in a terminal (Gnome-terminal in this case):

    >>> print('{:c}'.format(0x10000))
    𐀀

    (my font doesn't have the glyph, but otherwise it works)

    Python version:
    >>> print(sys.version)
    3.2 (r32:88445, Mar 25 2011, 19:56:22) 
    [GCC 4.5.2]

    OS:
    wujek@home:~$ uname -a
    Linux studio 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

    wujek@home:~$ cat /etc/issue
    Ubuntu 11.04

    @wujeksrujek wujeksrujek mannequin added expert-IDLE expert-IO type-bug An unexpected behavior, bug, or error labels Jun 15, 2011
    @bitdancer
    Member

    bitdancer commented Jun 15, 2011

    Judging from the stack trace, it isn't str.format that's failing, it's tk failing to display it.
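
The point above can be checked without any GUI. The following is a minimal sketch, assuming Python 3.3 or later (where a string is always a sequence of full code points); it only shows that the formatting step succeeds on its own, so the failure in the traceback must come from the Tk side.

```python
# Formatting a non-BMP code point works without any Tk involvement.
s = '{:c}'.format(65536)

# The result holds the single code point U+10000, the same as chr(0x10000).
code_points = [ord(ch) for ch in s]
is_same_as_chr = (s == chr(0x10000))
```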

    @bitdancer bitdancer changed the title characters with ord above 65535 fail conversion with str.format for '{:c}' in IDLE characters with ord above 65535 fail to display in IDLE Jun 15, 2011
    @vstinner
    Member

    vstinner commented Jun 15, 2011

    U+10000 is not the most common character in fonts. You should try another character in the U+10000-U+10FFFF range (non-BMP characters). The new emoticons are in this range, but I don't know whether your Ubuntu setup includes a font that covers it.
    http://www.unicode.org/charts/PDF/Unicode-6.0/U60-1F600.pdf

    @ned-deily
    Member

    ned-deily commented Jun 15, 2011

    From the discussions here, http://wiki.tcl.tk/1364, it appears that Tcl 8.5 (and earlier) does not support Unicode code points outside the BMP range as in this example. I don't think there is anything practical IDLE or tkinter can do about that.

    @vstinner
    Member

    vstinner commented Jun 15, 2011

    > From the discussions here, http://wiki.tcl.tk/1364, it appears that Tcl 8.5 (and earlier) does not support Unicode code points outside the BMP range as in this example.

    Extract of http://wiki.tcl.tk/1364 :

    "RS 2008-07-09: Unicode out of BMP (> U+FFFF) requires a deeper rework of Tcl and Tk: we'd need 32 bit chars and/or surrogate pairs. UTF-8 at least can deal with 31-bit Unicodes by principle."

    > I don't think there is anything practical IDLE or tkinter can do about that.

    We might raise an error with a better message than ValueError('unsupported character'), but that may be overkill.
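
The wiki extract quoted above mentions surrogate pairs as one way to represent characters beyond U+FFFF in 16-bit units. As a rough illustration of what that means, here is the standard UTF-16 arithmetic in Python. The helper name `to_surrogate_pair` is hypothetical; this is not code from Tcl, Tk, or `_tkinter`.

```python
def to_surrogate_pair(code_point):
    """Split a non-BMP code point into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

pair = to_surrogate_pair(0x10000)
```

A build that stores 16-bit characters and has no such pairing logic has no way to represent U+10000 at all, which matches the behaviour reported in this issue.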

    @ned-deily
    Member

    ned-deily commented Jun 15, 2011

    It looks like that error message has been in _tkinter.c since 2002: http://svn.python.org/view/python/trunk/Modules/_tkinter.c?r1=28989&r2=28990&

    I suppose it could be slightly more informative but it seems pretty unambiguous to me. Martin, any opinions?

    @vstinner
    Member

    vstinner commented Jun 17, 2011

    Instead of
    ValueError: unsupported character
    I suggest:
    ValueError: unsupported character (U+10000): Tcl doesn't support characters outside U+0000-U+FFFF range

    What do you think?

    @terryjreedy
    Member

    terryjreedy commented Jun 17, 2011

    > ValueError: unsupported character (U+10000): Tcl doesn't support characters outside U+0000-U+FFFF range

    Slightly shorter and without the double :s.

    ValueError: character U+10000 is above the range (U+0000-U+FFFF) allowed by Tcl/Tk.

    I agree with a change like this. People are going to increasingly use non-BMP chars and need to find out that the problem is not our fault.
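
For illustration only, the wording proposed in this comment could be produced by a check along these lines. This is a pure-Python sketch with a hypothetical helper name, `check_tcl_range`; the real check lives in C inside `_tkinter.c` and is not reproduced here.

```python
def check_tcl_range(text):
    """Raise ValueError for the first character above U+FFFF, else return text."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(
                "character U+%X is above the range (U+0000-U+FFFF) "
                "allowed by Tcl/Tk" % ord(ch))
    return text

try:
    check_tcl_range('{:c}'.format(65536))
except ValueError as exc:
    message = str(exc)
```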

    @ned-deily
    Member

    ned-deily commented Oct 30, 2011

    (Merging CC list from duplicate bpo-13265.)

    @loewis
    Mannequin

    loewis mannequin commented Oct 30, 2011

    Changing the error message sounds fine to me.

    People in need of the feature should lobby their system vendors to provide a Tcl build that uses a 32-bit Tcl_UniChar. Not sure whether it would actually render the string correctly, but at least it would be able to represent it correctly internally.

    @vstinner
    Member

    vstinner commented Nov 3, 2011

    Here is the patch as a .patch file.

    @loewis
    Mannequin

    loewis mannequin commented Nov 3, 2011

    I'm not sure whether the wording is good English, but apart from that, the patch looks fine.

    @terryjreedy
    Member

    terryjreedy commented Nov 3, 2011

    The patch implements my suggestion. Looking again, I think the English is fine ;-).

    @ezio-melotti
    Member

    ezio-melotti commented Nov 3, 2011

    You could say "Unicode character ..." in the error to make clear what kind of range is U+0000-U+FFFF (people that are not familiar with Unicode and BMP chars might wonder if that's some tcl/tk thing).

    @python-dev
    Mannequin

    python-dev mannequin commented Nov 3, 2011

    New changeset 9a07b73abdb1 by Victor Stinner in branch '3.2':
    Issue bpo-12342: Improve _tkinter error message on unencodable character
    http://hg.python.org/cpython/rev/9a07b73abdb1

    New changeset 5aea95d41ad2 by Victor Stinner in branch 'default':
    (Merge 3.2) Issue bpo-12342: Improve _tkinter error message on unencodable character
    http://hg.python.org/cpython/rev/5aea95d41ad2

    @vstinner
    Member

    vstinner commented Nov 3, 2011

    _tkinter now raises ValueError("character U+10ffff is above the range (U+0000-U+FFFF) allowed by Tcl").

    > You could say "Unicode character ..." in the error to make clear what kind of range is U+0000-U+FFFF (people that are not familiar with Unicode and BMP chars might wonder if that's some tcl/tk thing).

    I consider that U+10ffff in "character U+10ffff" is enough to specify that it is a Unicode character. Even if you don't understand Unicode, you can at least compare numbers (0x10ffff is not in range [0x0000; 0xFFFF]) ;-)

    @vstinner vstinner closed this as completed Nov 3, 2011
    @florentx
    Mannequin

    florentx mannequin commented Nov 4, 2011

    Failed to build these modules: (3.3 on Snow Leopard)
    _tkinter

    ./cpython/Modules/_tkinter.c: In function ‘AsObj’:
    ./cpython/Modules/_tkinter.c:996: warning: dereferencing ‘void *’ pointer
    ./cpython/Modules/_tkinter.c:996: error: invalid use of void expression

    @florentx florentx mannequin reopened this Nov 4, 2011
    @python-dev
    Mannequin

    python-dev mannequin commented Nov 4, 2011

    New changeset 5f49b496d161 by Victor Stinner in branch 'default':
    Issue bpo-12342: Fix compilation on Mac OS X
    http://hg.python.org/cpython/rev/5f49b496d161

    @terryjreedy
    Member

    terryjreedy commented Mar 5, 2012

    In responding to bpo-14200, it occurred to me that better than an exception would be doing what the interpreter does in a Command Prompt window, which is to expand high chars to the '\U0001xxxx' escaped form.
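
A minimal sketch of the escaping idea described in this comment, assuming the goal is simply to replace each character above U+FFFF with its `\UXXXXXXXX` escape before the text reaches Tk. The helper name `escape_non_bmp` is hypothetical, and this is not necessarily how bpo-14200 was resolved.

```python
def escape_non_bmp(text):
    """Replace each character above U+FFFF with its \\UXXXXXXXX escape."""
    return ''.join(
        ch if ord(ch) <= 0xFFFF else '\\U%08x' % ord(ch)
        for ch in text)

escaped = escape_non_bmp('a' + chr(0x10000))
```

BMP text passes through unchanged, so a wrapper like this could sit in front of the widget's insert call without affecting ordinary output.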

    @serwy
    Mannequin

    serwy mannequin commented Mar 11, 2012

    I agree with Terry. The current behavior of raising ValueError will lead to problems in application code in the future if Tkinter gets fixed such that it can render Unicode properly beyond 0xFFFF.

    @asvetlov
    Contributor

    asvetlov commented Mar 14, 2012

    Fixed in bpo-14200

    @asvetlov asvetlov self-assigned this Mar 14, 2012
    @serwy
    Mannequin

    serwy mannequin commented Mar 14, 2012

    Rather than raising a ValueError, would UnicodeEncodeError be more appropriate? I admit that this suggestion may be bike shedding.
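
One detail relevant to this suggestion: in Python's built-in exception hierarchy, `UnicodeEncodeError` derives from `UnicodeError`, which derives from `ValueError`. The short sketch below only demonstrates that relationship; it says nothing about what `_tkinter` actually raises.

```python
# UnicodeEncodeError -> UnicodeError -> ValueError in the built-in hierarchy.
is_subclass = issubclass(UnicodeEncodeError, ValueError)

# A handler written for ValueError therefore also catches UnicodeEncodeError.
try:
    'caf\u00e9'.encode('ascii')
except ValueError as exc:
    caught = type(exc).__name__
```

Code that already catches `ValueError` would therefore keep working if the more specific exception were raised instead.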

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022