curses implementation of Unicode is wrong in Python 3 #56776
curses functions that accept strings implicitly encode character strings to UTF-8. This is wrong. We should either add a function to set the encoding (see issue bpo-6745) or use the wide-character C functions. I don't think that UTF-8 is the right default encoding; I suppose that the locale encoding is a better choice.

Accepting characters (and character strings) but calling byte functions is wrong. For example, addch('é') doesn't work with a UTF-8 locale encoding: it calls waddch(0xE9) (é is U+00E9), whereas waddch(0xC3) + waddch(0xA9) should be called. Workaround in Python:

    for byte in 'é'.encode('utf-8'):
        win.addch(byte)

I see two possible solutions:

A) Add new functions that only accept characters, and stop accepting characters in the existing functions.
B) Fix the existing functions to call the right C function depending on the input type. For example, Python addch(10) and addch(b'\n') would call waddch(10), whereas addch('é') would call wadd_wch(233).

I prefer solution (B) because addch('é') would just work as expected.
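The dispatch rule in (B) can be modelled without a terminal. The sketch below uses a hypothetical function, `c_call_for_addch()`, which only reports the ncurses C function that would receive the argument. It does not call curses. It also simplifies `wadd_wch()`, whose real argument is a `cchar_t`, to a plain code point.

```python
def c_call_for_addch(ch):
    """Hypothetical model of proposal (B): pick the ncurses C function
    that addch(ch) would call, based only on the Python argument type."""
    if isinstance(ch, int):
        # Integers are passed through unchanged to the byte function.
        return ("waddch", ch)
    if isinstance(ch, bytes):
        if len(ch) != 1:
            raise ValueError("expected exactly one byte")
        return ("waddch", ch[0])
    if isinstance(ch, str):
        if len(ch) != 1:
            raise ValueError("expected exactly one character")
        # Characters go to the wide-character function as a code point.
        return ("wadd_wch", ord(ch))
    raise TypeError("expected int, bytes or str, got %s" % type(ch).__name__)


print(c_call_for_addch(0xE9))     # ('waddch', 233): the old behaviour for 'é'
print(list("é".encode("utf-8")))  # [195, 169]: the bytes a UTF-8 terminal needs
print(c_call_for_addch("é"))      # ('wadd_wch', 233): proposal (B)
```

The last two lines show the mismatch described in the report. The code point of 'é' is 233 (0xE9), but a UTF-8 terminal expects the two bytes 195 and 169 (0xC3 0xA9).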
getkey.patch fixes window.getkey(): it uses get_wch() instead of getch() to handle non-ASCII characters correctly. I tested with the key é (U+00E9) under the ISO-8859-1 and UTF-8 locale encodings: getkey() gives the expected result (but addstr() is unable to display it, because addstr() encodes the string to UTF-8 instead of the locale encoding).
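getkey.patch relies on get_wch() doing the decoding. The sketch below shows what a byte-oriented getch() leaves to the caller instead: one byte of a multibyte character arrives per call. It uses a hypothetical helper, `read_char_from_getch()`, and a fake key source in place of a real window. It is not the patch's code, and it assumes the terminal sends the character encoded in the given encoding.

```python
import codecs


def read_char_from_getch(getch, encoding="utf-8"):
    """Assemble one character from successive getch()-style byte values.

    Sketch only: get_wch() does this decoding inside ncursesw, whereas
    getch() returns one byte of a multibyte character per call. Values
    outside 0..255 (special keys such as curses.KEY_LEFT, or -1 for
    "no input" in nodelay mode) are returned unchanged.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        code = getch()
        if not 0 <= code <= 255:
            return code
        text = decoder.decode(bytes([code]))
        if text:  # a complete character has been decoded
            return text


# Simulated key press of 'é' on a UTF-8 terminal: getch() yields 0xC3, 0xA9.
keys = iter([0xC3, 0xA9])
print(read_char_from_getch(lambda: next(keys)))  # é
```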
Oh, by the way: do all platforms have wide-character functions? I don't see any failure on our Python 3.x buildbots, but test_curses is skipped on many buildbots.
I think that some platforms do not have wide-character support, though I could be wrong. The ncurses FAQ (http://invisible-island.net/ncurses/ncurses.faq.html) lists the platforms that do and those that don't, but I don't know how up to date it is.
Patch the _curses module to improve Unicode support:
The check for ncursesw library availability is done in setup.py because the library linked to _curses depends on the readline library (see issues bpo-7384 and bpo-9408). I don't know whether wide-character functions can be available in the curses or ncurses library. Details:
It's not possible to specify an encoding. GetConsoleOutputCP() may not be the right choice on Windows if a text application doesn't run in a Windows console (e.g. if it uses its own terminal emulator). GetOEMCP() may be a better choice, or a function could be added to specify the encoding used by the _curses module (overriding the "locale encoding"). If such a function is added, I think it is better to make it a global function than to add an argument to every function that creates a new window object (initscr(), getwin(), subwin(), derwin(), newpad()).
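The "global function" design can be sketched in pure Python as a single module-level setting, with the locale encoding as the fallback. The names `set_encoding()`, `get_encoding()` and `encode_for_curses()` are hypothetical and do not exist in _curses. The sketch only shows the trade-off: one override applies to every window, so window-creating functions need no extra argument.

```python
import codecs
import locale

# Hypothetical module-level override; None means "use the locale encoding".
_encoding = None


def set_encoding(name):
    """Override the encoding for all windows (validated via codecs.lookup)."""
    global _encoding
    _encoding = codecs.lookup(name).name  # raises LookupError if unknown


def get_encoding():
    """Return the override if set, else the locale's preferred encoding."""
    if _encoding is not None:
        return _encoding
    return locale.getpreferredencoding(False)


def encode_for_curses(text):
    """Encode text the way a byte-oriented curses call would need it."""
    return text.encode(get_encoding())


set_encoding("latin-1")
print(encode_for_curses("é"))  # b'\xe9'
set_encoding("utf-8")
print(encode_for_curses("é"))  # b'\xc3\xa9'
```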
Using curses_unicode.patch:
It would be possible to support multibyte-encoded characters (like é in UTF-8) in addch() by calling addch() once per byte, but I would prefer to keep _curses simple and not work around libncurses limitations (bugs).
See also bpo-6755 (curses.get_wch).
New changeset d98b5e0f0862 by Nadeem Vawda in branch 'default':
Following d98b5e0f0862, I have been able to successfully build the curses
Ack, sorry, forgot to give context: my machine doesn't have libncursesw,
See also bpo-10570.
There are now several bugs dealing with related issues here. Are we any closer to a solution to any of them? The suggested patches look like a good idea; what needs to happen for them to move forward?
I would like a review of curses_unicode.patch.
New changeset b1e03d10391e by Victor Stinner in branch 'default':
I'm not sure that it is correct to call nl_langinfo(CODESET) to get the locale encoding. Perhaps the LC_CTYPE locale should be temporarily set to the user's default locale (""), as locale.getpreferredencoding() does. Or, perhaps better, locale.getpreferredencoding() should simply be called.
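The temporary-setlocale approach can be sketched with the standard `locale` module. It sets LC_CTYPE to "" (the locale taken from the environment), reads `nl_langinfo(CODESET)`, and restores the previous setting. This is roughly what `locale.getpreferredencoding()` does on Unix when `do_setlocale` is true. The helper name `user_locale_codeset()` is hypothetical and this is not the _curses code. The sketch is Unix-only because `nl_langinfo()` is unavailable on Windows.

```python
import locale


def user_locale_codeset():
    """Read the codeset of the user's locale, restoring LC_CTYPE afterwards.

    Unix-only sketch. Falls back to locale.getpreferredencoding(False) if
    the environment's locale cannot be set (e.g. an unsupported LANG value).
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        locale.setlocale(locale.LC_CTYPE, "")  # "" = locale from environment
        return locale.nl_langinfo(locale.CODESET)
    except locale.Error:
        return locale.getpreferredencoding(False)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


print(user_locale_codeset())  # e.g. 'UTF-8' on a UTF-8 system
```

Changing the locale is process-wide and not thread-safe, which is one argument for calling it only once, for example when the module is initialised.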
See also issue bpo-6203.
New changeset 786668a4fb6b by Victor Stinner in branch 'default':
New changeset c3581ca21a57 by Victor Stinner in branch 'default':
This broke several Gentoo buildbots.
New changeset 919259054621 by Victor Stinner in branch 'default': (Oops, wrong issue number, again.)
setup.py is unable to locate curses.h correctly. I added a hack to always search /usr/include/ncursesw/. The hack is needed on Ubuntu 11.10, for example, if you only have libncursesw5-dev but not libncursesw-dev.
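The idea behind the hack is to search the ncursesw include directory before the default one. The sketch below shows that search order with a hypothetical helper, `find_curses_header()`. It is not CPython's setup.py code, and the candidate directories are only examples.

```python
import os


def find_curses_header(dirs=("/usr/include/ncursesw", "/usr/include")):
    """Return the first directory in dirs that contains curses.h, or None.

    Hypothetical helper: listing the ncursesw directory first means its
    wide-character curses.h wins over the narrow one in /usr/include.
    """
    for d in dirs:
        if os.path.isfile(os.path.join(d, "curses.h")):
            return d
    return None


print(find_curses_header())  # e.g. '/usr/include/ncursesw', or None
```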
I am still concerned about the compilation warning on the OpenIndiana buildbots :-(
Compile output on OpenSolaris: Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
New changeset bf51e32b2a81 by Victor Stinner in branch 'default': (Oops, I copy-pasted the issue number from my previous commit, and the issue number was wrong...)
I'm unable to reproduce the issue in my OpenIndiana VM: the compilation of the _curses module fails, not because of Unicode, but because the mvwchgat() function is missing (see issue bpo-3786). I don't know how to install ncursesw on OpenIndiana; I didn't find an official package using pkg search. curses issues on OpenIndiana are serious enough to have their own issue, so I opened issue bpo-13552.
The code has been committed. The remaining task is to fix the OpenIndiana issues: see bpo-13552.
Victor, here are some notes I wrote down when I set up the OpenIndiana buildbots. Maybe they can be useful to you: (compiling from source) """
""" I installed ncurses because of the lack of "mvwchgat" and "wchgat". When compiling Python, I add export "CFLAGS=-I/usr/local/include/ncursesw" to help it find the right library. Hope this is useful.
Hmm, please use issue bpo-13552 for curses issues on OpenIndiana/Solaris.
See issues bpo-3786 and bpo-13552 for this problem.
The curses module is compiled by setup.py, not by the Makefile. It looks like setup.py ignores CFLAGS. I don't know whether setup.py allows specifying such an option.
It looks to me as if the documentation in the release candidates for 2.7.3 and 3.2.3 hasn't been updated to include the new curses fixes. Is that correct?
Yes, it was only fixed for 3.3.
Testing the Python 3.3a2 build on OS X, the exception AttributeError: '_curses.curses window' object has no attribute 'get_wch' is still being raised. I don't have a Linux build I can easily test with. Is this a particular problem with the OS X build?
"still"? Did it work before my last changes? Unicode functions of the (n)curses library are only available if the Python curses module is linked to libncursesw. Is libncursesw available? Is libreadline linked to libncurses or libncursesw? If libreadline is linked to libncurses, the Python curses module is also linked to libncurses.
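To check at run time whether a given build has the wide-character functions, a small probe can look for them. The sketch relies on an assumption: that CPython compiles `curses.unget_wch()` (like the `window.get_wch()` method) only when _curses is linked against a wide-character library. Under that assumption, the function's presence is the signal. The helper name `has_wide_char_support()` is hypothetical.

```python
def has_wide_char_support():
    """Best-effort check that the curses module has wide-character support.

    Assumption: curses.unget_wch() is only compiled into _curses when it is
    linked against a wide-character library such as libncursesw.
    """
    try:
        import curses
    except ImportError:  # e.g. no _curses extension on this platform
        return False
    return hasattr(curses, "unget_wch")


if has_wide_char_support():
    print("curses has unget_wch(); get_wch() should be available")
else:
    print("narrow curses build: expect AttributeError for get_wch")
```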
Nicholas, please open a new issue documenting which Python 3.3 you are using, from which python.org installer or the ./configure parameters if you built it yourself (and whether you supplied a version of GNU readline or used the Apple default of BSD libedit) and an example of how to reproduce the error. Please don't add to closed issues. Note also there is a known open issue with the 32-bit-only OS X installer for 3.3 where the _curses module does not build (bpo-14225) with an older version of GNU readline.
New changeset 2035c5ad4239 by Ned Deily in branch 'default':
It turns out that the Unicode support introduced by this issue didn't build correctly on OS X, either silently failing to build (explaining the problem seen by Nicholas) or causing a compile error (as seen in bpo-14225). This should be working OK (as of 3.3.0b1). BTW, a test of the wide char functions would be nice and might have caught this.
See also bpo-15037 which documents a broken curses.unget_wch and, hence, test_curses when Python is built with ncurses 5.7 or earlier.
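A test of the wide-character functions could look roughly like the sketch below. It pushes characters back with `curses.unget_wch()` and reads them with `window.get_wch()`. It skips itself when curses lacks wide-character support or when no terminal is attached. Per bpo-15037, it may fail with ncurses 5.7 or earlier. The class name is hypothetical and this is not CPython's test_curses.

```python
import os
import sys
import unittest

try:
    import curses
except ImportError:  # _curses not built on this platform
    curses = None


def _have_tty():
    """True if stdin/stdout are terminals and TERM is set."""
    try:
        return (sys.stdin.isatty() and sys.stdout.isatty()
                and bool(os.environ.get("TERM")))
    except (AttributeError, ValueError):  # missing or closed streams
        return False


HAVE_WIDE = curses is not None and hasattr(curses, "unget_wch")


@unittest.skipUnless(HAVE_WIDE, "curses built without wide-character support")
@unittest.skipUnless(_have_tty(), "requires an interactive terminal")
class WideCharRoundTrip(unittest.TestCase):
    def setUp(self):
        self.stdscr = curses.initscr()
        self.addCleanup(curses.endwin)  # restore the terminal afterwards
        curses.cbreak()
        curses.noecho()

    def test_unget_wch_then_get_wch(self):
        # Push characters back into the input queue and read them again.
        for ch in ("a", "\u00e9", "\u20ac"):  # 'a', 'é', '€'
            curses.unget_wch(ch)
            self.assertEqual(self.stdscr.get_wch(), ch)
```

Run it with `python -m unittest` from a real terminal. In a non-interactive environment, the test is reported as skipped, not failed.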