Issue in unicode args in logging #42451

Closed
tungwaiyip mannequin opened this issue Oct 5, 2005 · 6 comments
@tungwaiyip
Mannequin

tungwaiyip mannequin commented Oct 5, 2005

BPO 1314107
Nosy @malemburg, @vsajip

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/vsajip'
closed_at = <Date 2005-10-07.08:43:25.000>
created_at = <Date 2005-10-05.18:11:46.000>
labels = ['expert-unicode']
title = 'Issue in unicode args in logging '
updated_at = <Date 2005-10-07.08:43:25.000>
user = 'https://bugs.python.org/tungwaiyip'

bugs.python.org fields:

activity = <Date 2005-10-07.08:43:25.000>
actor = 'vinay.sajip'
assignee = 'vinay.sajip'
closed = True
closed_date = None
closer = None
components = ['Unicode']
creation = <Date 2005-10-05.18:11:46.000>
creator = 'tungwaiyip'
dependencies = []
files = []
hgrepos = []
issue_num = 1314107
keywords = []
message_count = 6.0
messages = ['26512', '26513', '26514', '26515', '26516', '26517']
nosy_count = 4.0
nosy_names = ['lemburg', 'nnorwitz', 'vinay.sajip', 'tungwaiyip']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue1314107'
versions = ['Python 2.4']

@tungwaiyip
Mannequin Author

tungwaiyip mannequin commented Oct 5, 2005

logging has a problem handling Unicode object
arguments.

>>> import logging
>>>
>>> class Obj:
...     def __init__(self,name):
...         self.name = name
...     def __str__(self):
...         return self.name
...
>>> # a non-ascii string
...
>>> obj = Obj(u'\u00f6')
>>>
>>> # this will cause error
...
>>> print '%s' % obj
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)
>>>
>>> # this will promote to unicode (and the console also happens to be able to display it)
...
>>> print u'%s' % obj
ö
>>>
>>> # this works fine
... # (other than that logging makes its own decision to encode in utf8)
...
>>> logging.error(u'%s' % obj)
ERROR:root:b
>>>
>>> # THIS IS AN UNEXPECTED PROBLEM!!!
...
>>> logging.error(u'%s', obj)
Traceback (most recent call last):
  File "C:\Python24\lib\logging\__init__.py", line 706, in emit
    msg = self.format(record)
  File "C:\Python24\lib\logging\__init__.py", line 592, in format
    return fmt.format(record)
  File "C:\Python24\lib\logging\__init__.py", line 382, in format
    record.message = record.getMessage()
  File "C:\Python24\lib\logging\__init__.py", line 253, in getMessage
    msg = msg % self.args
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)
>>>
>>> # work around the str() conversion in getMessage()
...
>>> logging.error(u'%s-\u00f6', obj)
ERROR:root:b-b

The issue seems to be in LogRecord.getMessage(). It
attempts to convert msg to a byte string:

   msg = str(self.msg)

I am not sure why it wants to do the conversion. The last
example works around this by making sure msg is not
convertible to a byte string.

@tungwaiyip tungwaiyip mannequin closed this as completed Oct 5, 2005
@tungwaiyip tungwaiyip mannequin assigned vsajip Oct 5, 2005
@tungwaiyip tungwaiyip mannequin added the topic-unicode label Oct 5, 2005
@malemburg
Member

Logged In: YES
user_id=38388

Unassigning the bug. I don't know anything about the logging
module.

Hint: perhaps the logging module should grow an .encoding
attribute which then allows converting Unicode to some
encoding used in the log file ?!

@nnorwitz
Mannequin

nnorwitz mannequin commented Oct 6, 2005

Logged In: YES
user_id=33168

Vinay, any suggestions?

@vsajip
Member

vsajip commented Oct 6, 2005

Logged In: YES
user_id=308438

Miscellaneous changes were backported into Python 2.4.2; please
check that you have this version.

The problem is not with

msg = str(self.msg)

but rather with

msg = msg % args

To ensure good Unicode support, ensure your messages are
either Unicode strings or objects whose __str__() method
returns a Unicode string. Then,

msg = msg % args

should result in a Unicode object. You can pass this to a
FileHandler opened with an encoding argument, or a
StreamHandler whose stream has been opened using
codecs.open(). Ensure your default encoding is set correctly
using sitecustomize.py.
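
As a rough illustration of the "FileHandler opened with an encoding argument" suggestion, here is a minimal sketch written for present-day Python 3, where `str` is already the Unicode type, rather than for the Python 2.4 setup discussed in this thread. The logger name `unicode_demo`, the helper `log_to_utf8_file` and the temporary file are assumptions made up for the example; `logging.FileHandler(path, encoding=...)` is the only logging feature it relies on.

```python
import logging
import os
import tempfile


class Obj:
    """Object whose __str__() returns non-ASCII text, as in the report."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


def log_to_utf8_file(path, obj):
    """Log obj through a FileHandler opened with an explicit encoding."""
    logger = logging.getLogger("unicode_demo")  # hypothetical logger name
    logger.setLevel(logging.ERROR)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = logging.FileHandler(path, encoding="utf-8")
    logger.addHandler(handler)
    try:
        logger.error("%s", obj)  # deferred formatting: msg % args
    finally:
        logger.removeHandler(handler)
        handler.close()
    with open(path, encoding="utf-8") as f:
        return f.read()


fd, demo_path = tempfile.mkstemp(suffix=".log")
os.close(fd)
contents = log_to_utf8_file(demo_path, Obj("\u00f6"))
os.remove(demo_path)
```

With the encoding stated on the handler, the file contents do not depend on the interpreter's default encoding.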

The encoding additions were made in Revision 1.26 of
logging/__init__.py, dated 13/03/2005.

Marking as closed.

@tungwaiyip
Mannequin Author

tungwaiyip mannequin commented Oct 6, 2005

Logged In: YES
user_id=561546

>To ensure good Unicode support, ensure your messages
are either Unicode strings or objects whose __str__() method
returns a Unicode string. Then,

>msg = msg % args

That's what I am doing already.

Let me explain the subtle problem again.

  1. print '%s' % obj - error
  2. logging.error(u'%s' % obj) - ok
  3. logging.error(u'%s', obj) - error
  4. logging.error(u'%s-\u00f6', obj) - ok

I can understand why 1 fails, but I expect 2, 3 and 4 to behave
alike. Contrast 3 with 4 in particular: 4 works where 3 doesn't
because str() fails when applied to u'%s-\u00f6', so getMessage()
falls back to the original Unicode string, which is the correct
behavior in my opinion. In 3, by contrast, u'%s' gets demoted to
the byte string '%s', so it fails just like 1.
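
The contrast between cases 3 and 4 can be modelled in Python 3, which no longer has the implicit ASCII coercion. This is only a simulation under stated assumptions: `format_like_py24` is a hypothetical stand-in for the pre-fix flow described above, and the explicit `.encode("ascii")` calls imitate what Python 2's `str()` did to Unicode values. It is not the actual logging source.

```python
class Obj:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


def format_like_py24(msg, args):
    """Model of the pre-fix flow: try to demote msg to ASCII, then apply %."""
    try:
        msg.encode("ascii")  # stands in for Python 2's str(u'...')
        demoted = True
    except UnicodeEncodeError:
        demoted = False  # fallback: keep the original Unicode message
    if demoted:
        # With a byte-string format, each argument's __str__() result had
        # to survive an ASCII encode in Python 2; imitate that check here.
        for arg in args:
            str(arg).encode("ascii")  # raises for non-ASCII text
    return msg % args


obj = Obj("\u00f6")

# Case 4: the format itself is non-ASCII, so it stays Unicode and works.
case4 = format_like_py24("%s-\u00f6", (obj,))

# Case 3: the pure-ASCII format is demoted, so the argument fails to encode.
try:
    format_like_py24("%s", (obj,))
    case3_failed = False
except UnicodeEncodeError:
    case3_failed = True
```

In this model the only difference between the two calls is whether the format string itself survives the ASCII round trip.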

@vsajip
Member

vsajip commented Oct 7, 2005

Logged In: YES
user_id=308438

Aaah...now I understand! Sorry for being a little slow, and
thanks for explaining in more detail.

I've uploaded a fix to CVS: str(msg) is now called only if msg
is neither a string nor a Unicode object. With the fix,
the following script:
#---------------------------

import logging

class X:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

def main():
    obj = X(u'\u00f6')
    logging.error(u'%s' % obj)
    logging.error(u'%s', obj)
    logging.error(u'%s-\u00f6', obj)

if __name__ == "__main__":
    main()
#

now gives the following output on my system (default
encoding is 'ascii'):

ERROR:root:Â
ERROR:root:Â
ERROR:root:Â-Â
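
The rule described in this comment (call str() only when the message is not already a string) can be sketched as follows. This is a Python 3 approximation, where `str` is the single text type, and not the patch committed to CVS; the function name `get_message` is an assumption for the example.

```python
def get_message(msg, args=()):
    """Sketch of the check: coerce msg only if it is not already text."""
    if not isinstance(msg, str):
        msg = str(msg)  # non-string messages (arbitrary objects) still get str()
    if args:
        msg = msg % args  # a text format stays text, so non-ASCII args are fine
    return msg


class X:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


obj = X("\u00f6")
results = [
    get_message("%s" % obj),           # formatted before the call
    get_message("%s", (obj,)),         # formatting deferred to get_message
    get_message("%s-\u00f6", (obj,)),  # non-ASCII format string
]
```

All three calls take the same path through the function, which is the behaviour the reporter asked for.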

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022