Fix unicode literals #58384
Comments
Now that PEP-414 has been accepted, I can
>>> print u'abcœé€'
abcé
>>>
If these six characters are not rendered correctly, you It is not necessary to give here the list of (I wrote all my Py2 code in a u'unicode mode', Face it. Python has never worked [*], Python does [*] Except the pure ASCII series (Py 1.5) and the No offense. I'm pretty sure the creator of this Regards. |
What exactly is the bug you're reporting?
Python 2.7.2 (default, Oct 27 2011, 22:35:02)
[GCC 4.5.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print u'abcœé€'
abcœé€ |
What operating system and what terminal are you using? If Windows: what code page does your terminal run in? |
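For readers who want to answer that question on their own machine, here is a minimal sketch that collects the encodings Python reports for the current session. The helper name `describe_encodings` is hypothetical and written for Python 3; it only illustrates where to look and is not something used in this thread.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python reports for this session (hypothetical helper)."""
    return {
        # Encoding of the console or pipe attached to stdout/stdin, if any.
        "stdout": getattr(sys.stdout, "encoding", None),
        "stdin": getattr(sys.stdin, "encoding", None),
        # Encoding the locale prefers for text data (the ANSI code page on Windows).
        "preferred": locale.getpreferredencoding(False),
        # Encoding used to convert file names.
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for key, value in describe_encodings().items():
        print(key, "=", value)
```

On Windows the console code page and the ANSI code page are often different, which is why the question asks about the terminal specifically.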
I deliberately hid the information about the used interactive The interactive interpreter was: Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
In that precise case, it was Windows 7 Pro (Windows 7 The technical reasons/aspects: "sys.defaultencoding", [#] For those who do not know, one can not write text Please do not take my aggressive (I recognize it), but sometimes IDLE is not the cause, I use here IDLE to show as an example the I'm not really happy to see this mess again in Py3.3 [†]; the key The Pandora's box is opened. [†] In fact, I will somehow never see or suffer from it. Decisions jmf |
Well, let me soothe your mind then: in Python 3, '...' and u'...' will be absolutely equal, so you won't find any more "mess" with the changes from PEP-414. |
Unless I'm misunderstanding, this is a duplicate of bpo-1602. You will note that the problem is *not* with Python (or open source software in general), the problem is that Microsoft treats the command line as a second (or third, or fourth) class citizen. |
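To make the code page point concrete, here is a small sketch (Python 3) that checks whether the six characters from the report can be encoded in a few code pages. The list of code pages is an assumption chosen for illustration: cp437 and cp850 are common OEM console code pages and cp1252 is the Western European ANSI code page. The thread does not say which one the reporter's console actually used.

```python
# The six characters from the report: a b c œ é €
TEXT = "abc\u0153\u00e9\u20ac"

# Illustrative selection only; not taken from the thread.
CODE_PAGES = ["cp437", "cp850", "cp1252", "utf-8"]


def encodable(text, encoding):
    """Return True if every character of text can be encoded with encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


results = {cp: encodable(TEXT, cp) for cp in CODE_PAGES}

if __name__ == "__main__":
    for cp, ok in results.items():
        print(cp, "can represent the text" if ok else "cannot represent the text")
```

A console running in a code page that lacks œ or € cannot display them, whatever Python does with the string internally.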
Sorry, I neglected the most important information. Python 3.2 is working perfectly. It is simply impossible Like the limited character set I used when I wrote my Porting Py 2 code was child's play. |
OK, so I still don't understand what problem it is you are reporting. What do you mean by "can't create non-valid strings"? Of course you can't. (I don't see how you could do that programmatically, either, although that depends heavily on your definition of non-valid.) Are you reporting that cmd.exe has no support for entering French characters? That wouldn't be a Python bug. Are you reporting that idle lacks the keyboard support for French? (I don't use Idle, so I don't know if that is true or not.) |
I'm changing the title since PEP-414 has no bearing here. |
As I explained to J-M when he posted much the same to python-list, Idle's French keyboard support is faulty because tcl/tk's French keyboard support is faulty. A patch for this was recently applied to tcl/tk. I hope it will be in a released version that we can incorporate in 3.3. I am sure we all wish that Microsoft (and Apple) would take more of a lead in moving to a one Unicode world from a 200 encodings and codepages world. I am sometimes as frustrated at the current situation as J-M. But unless he can identify a valid *Python* bug, we should close this. |
You do not get it or I do not explain it correctly. I do not care if Py 3.3 accepts '...' or u'...'. I'm only I can only use a Py2/Py3 analogy, the types being different. In Python 2, the u'...' and the unicode('...', 'coding') are Once again, an *illustration* with IDLE / Py2.
>>> import unicodedata as ud
>>> for c in u'abcœé€':
        print ud.name(c)

LATIN SMALL LETTER A
Traceback (most recent call last):
  File "<pyshell#3>", line 2, in <module>
    print ud.name(c)
ValueError: no such name
>>> # but
>>> import sys
>>> for c in unicode('abcœé€', sys.stdout.encoding):
        print ud.name(c)

LATIN SMALL LETTER A
Of course, this is actually no problem with Py 3. I know nothing about the internals of Python. I have however So, if this (u'...') works in Py 3.3, the problem can jmf |
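One plausible reading of the session above, offered as an assumption and not confirmed anywhere in this thread, is that the bytes typed into the shell were cp1252 but were decoded as Latin-1 when the u'...' literal was compiled. The Python 3 sketch below simulates that mismatch. The helper `safe_name` is hypothetical and uses the default argument of `unicodedata.name` so that the loop does not stop at the first unnamed character.

```python
import unicodedata

# The six characters from the report: a b c œ é €
TEXT = "abc\u0153\u00e9\u20ac"

# Bytes a cp1252 Windows session would produce for that text.
raw = TEXT.encode("cp1252")

# Assumed failure mode: the same bytes decoded with the wrong codec.
wrong = raw.decode("latin-1")
# Decoding with the codec that matches the bytes.
right = raw.decode("cp1252")


def safe_name(ch):
    """Return the Unicode name of ch, or a marker when it has none."""
    return unicodedata.name(ch, "<no name>")


wrong_names = [safe_name(ch) for ch in wrong]
right_names = [safe_name(ch) for ch in right]

if __name__ == "__main__":
    print(wrong_names)
    print(right_names)
```

Under that assumption, œ and € become the C1 control characters U+009C and U+0080, which have no Unicode name. That would be consistent with the `ValueError: no such name` shown above, and with the explicit `unicode(..., sys.stdout.encoding)` call behaving differently.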
You misunderstand the PEP: in 3.3, '...' and u'...' will be *exactly* the same. The only change is that the interpreter will ignore the u prefix instead of raising SyntaxError. It will be as if 'u' were not there. The only purpose is to let 2.x code run in 3.x without requiring the user to erase the 'u'. I can see how you could misunderstand and think that the 'u' prefix must have some meaning. But it does not. The addition is a bit controversial but Guido approved it with the expectation that it will encourage more conversion of 2.x libraries to run on 3.3. In any case, the tracker is not the place for further discussion of the value of the PEP.
We are painfully aware that 2.x has problems with Unicode. You do not need to tell us. I believe that most of the problems that could be sensibly fixed in 2.x have been fixed. 3.0 fixed more problems by changing the language. 3.3 fixes still more problems by changing the internal implementation of Unicode, along with the C API, and the meaning of the language on some systems. People who want to avoid all the problems that have been fixed should use 3.3 either from the repository or when it is released.
I am glad you agree and I will close the issue. Please use python-list for any further discussion or questions. |
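The claim that the prefix carries no meaning in 3.3 can be checked directly. This is a minimal sketch that requires Python 3.3 or later; the variable names are illustrative only.

```python
# Requires Python 3.3+, where PEP 414 allows the u prefix again.
plain = 'abc\u0153\u00e9\u20ac'      # a b c œ é €
prefixed = u'abc\u0153\u00e9\u20ac'  # same literal with the legacy prefix

# Both literals produce equal str objects of the same type.
same_value = plain == prefixed
same_type = type(plain) is type(prefixed) is str

if __name__ == "__main__":
    print(same_value, same_type, len(prefixed))
```

On Python 3.0 through 3.2 the second assignment is a SyntaxError, which is the only behavior the PEP changes.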
2012/3/3 Terry J. Reedy <report@bugs.python.org>
Preliminary remark. I'm sending this via gmail, so it No, no and no. This is not a tkinter issue. This ----- wxPython 2.8-ansi build.
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "c:\python27\lib\site-packages\wx-2.8-msw-ansi\wx\py\shell.py", line 1242, in writeOut
    self.write(text)
  File "c:\python27\lib\site-packages\wx-2.8-msw-ansi\wx\py\shell.py", line 1000, in write
    self.AddText(text)
  File "c:\python27\lib\site-packages\wx-2.8-msw-ansi\wx\stc.py", line 1425, in AddText
    return _stc.StyledTextCtrl_AddText(*args, **kwargs)
  File "c:\python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 4-5: character maps to <undefined>
abcœé€
---- PySide, passing "unicode" to a text widget. Passing u'abcœé€' works. --- My interactive wx interpreter using wxPython. Strings True ok
Traceback (most recent call last):
  File "<psi last command>", line 1, in <module>
  File "c:\Python27\lib\site-packages\wx-2.8-msw-ansi\wx\_windows.py", line 505, in __init__
    _windows_.Frame_swiginit(self,_windows_.new_Frame(*args, **kwargs))
  File "c:\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 5-6: character maps to <undefined>
True ok --- And so on with many libs. You may argue that these libs are guilty. I may argue that Python is somehow guilty, because it Just to show you, I'm quite comfortable with all this
abcé??
>>> unicode('abcœé€', sys.stdout.encoding)
abcœé€
>>> print u'abcœé€'
abcé??
>>> print unicode('abcœé€', sys.stdout.encoding)
abcœé€
As I am aware of this "feature", all my code is To draw a conclusion. You are wise enough to understand that, when I'm I really, very really, expect all this mess (sorry Let's wait.
'abcœé€'
>>> print('abcœé€')
abcœé€
>>>
Regards, PS The u() trick does not help. |
I'd like to encourage you to not try this sort of thing out from an interactive interpreter (incidentally, where does "<psi last command>" come from? It doesn't look like Python's REPL). As David and Terry noted, interactions with such a console, be it Windows' "cmd" or IDLE, have their very own idiosyncrasies and bugs. That said, in Python 2.x *source files* the following two expressions are identical:
Both result in a Unicode string with the six characters/codepoints you mentioned. There won't be any code that works with one but not the other. Of course there are libraries that do not handle Unicode strings in general (nothing to do with literals!) correctly, but as you yourself said, that is a problem with the libraries. Lastly, please read PEP-414 if you are not completely sure what it is proposing. You will see that it merely affects the available syntax for Unicode literals and allows the "u" again. |
I propose to close this issue as invalid (although out-of-date might be fine as well). Jean-Michel is apparently unable to describe what issue *precisely* he wants to see fixed, rather than just complaining that open source is a disaster. I don't think we can do anything about open source being a disaster, and I'm not able to reproduce that perception. Jean-Michel: please try to use this bug tracker in the way it is intended, i.e. report one bug at a time, following this structure:
|
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.