logging encoding failes some situation #51240

Closed
methane opened this issue Sep 25, 2009 · 9 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@methane
Member

methane commented Sep 25, 2009

BPO 6991
Nosy @vsajip, @methane
Files
  • logging_encode.patch: logging.__init__ and test_logging
  • foo.py: DecodeError sample
  • logging_error.py
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/vsajip'
    closed_at = <Date 2009-09-25.18:15:04.910>
    created_at = <Date 2009-09-25.05:29:25.698>
    labels = ['invalid', 'type-bug', 'library']
    title = 'logging encoding failes some situation'
    updated_at = <Date 2009-09-25.19:02:01.894>
    user = 'https://github.com/methane'

    bugs.python.org fields:

    activity = <Date 2009-09-25.19:02:01.894>
    actor = 'methane'
    assignee = 'vinay.sajip'
    closed = True
    closed_date = <Date 2009-09-25.18:15:04.910>
    closer = 'vinay.sajip'
    components = ['Library (Lib)']
    creation = <Date 2009-09-25.05:29:25.698>
    creator = 'methane'
    dependencies = []
    files = ['14970', '14971', '14973']
    hgrepos = []
    issue_num = 6991
    keywords = ['patch']
    message_count = 9.0
    messages = ['93100', '93103', '93104', '93106', '93112', '93114', '93124', '93125', '93126']
    nosy_count = 2.0
    nosy_names = ['vinay.sajip', 'methane']
    pr_nums = []
    priority = 'normal'
    resolution = 'not a bug'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue6991'
    versions = ['Python 2.6', 'Python 3.0', 'Python 3.1', 'Python 2.7', 'Python 3.2']

    @methane
    Member Author

    methane commented Sep 25, 2009

    When the stream is a codecs writer object, stream.write(string) calls
    string.decode() internally, which may raise UnicodeDecodeError.

    In that case, falling back to UTF-8 is not a good choice.
    I think a better fallback would be:

    • When the message is unicode: message.encode(stream.encoding or 'ascii',
      'backslashreplace')
    • When the message is bytes: message.encode('string_escape')

    The attached patch contains this logic, some refactoring, and a test.
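
The two-branch fallback proposed above can be sketched roughly as follows. This is a Python 3 illustration, not the attached patch: the function name `fallback_bytes` is hypothetical, and because the Python 2 `'string_escape'` codec does not exist in Python 3, the bytes branch substitutes decoding with the `backslashreplace` error handler (usable for decoding since Python 3.5), which escapes only non-ASCII bytes.

```python
def fallback_bytes(message, encoding=None):
    """Return bytes that are always safe to write to a byte stream.

    Hypothetical helper illustrating the proposal; not logging's actual code.
    """
    if isinstance(message, str):
        # Text: encode with the stream's encoding (ASCII if unknown);
        # characters that cannot be encoded become backslash escapes.
        return message.encode(encoding or "ascii", "backslashreplace")
    # Bytes: keep ASCII bytes and turn every other byte into a \xNN escape.
    # This is a stand-in for Python 2's 'string_escape', which additionally
    # escapes backslashes and control characters.
    return message.decode("ascii", "backslashreplace").encode("ascii")


print(fallback_bytes("\u76f4\u6a39"))           # text, no stream encoding
print(fallback_bytes("\u76f4\u6a39", "utf-8"))  # text, UTF-8 stream
print(fallback_bytes(b"abc\xaa"))               # byte string with a non-ASCII byte
```

With this approach nothing is lost silently, but the log no longer contains the original characters, only their escaped form.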

    @methane methane added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Sep 25, 2009
    @vsajip
    Member

    vsajip commented Sep 25, 2009

    Thanks, but I'm not sure I understand the reasoning.
    stream.write(unicode_string) should not do decode() internally, though
    of course it would do encode(). Can you explain a little more (with an
    illustrative example) what problem you are trying to solve, and attach a
    small script which shows the problem? Thanks.

    @methane
    Member Author

    methane commented Sep 25, 2009

    Please see and run the attached foo.py.

    In Python 2.6.2, it causes the following error:
    >python foo.py
    Traceback (most recent call last):
      File "foo.py", line 3, in <module>
        f.write('\xaa')
      File "C:\usr\Python2.6\lib\codecs.py", line 686, in write
        return self.writer.write(data)
      File "C:\usr\Python2.6\lib\codecs.py", line 351, in write
        data, consumed = self.encode(object, self.errors)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xaa in position 0: ordinal not in range(128)

    @vsajip
    Member

    vsajip commented Sep 25, 2009

    There seems to be a problem with your foo.py. In it, you are writing a
    byte-string to a stream returned from codecs.open. I don't think this is
    correct: you should be writing a Unicode string to that stream, which
    will convert to bytes using the stream's encoding, and write those bytes
    to file. The following snippet illustrates:

    >>> import codecs
    >>> f = codecs.open('foo.txt', 'w', encoding='utf-8')
    >>> f.write(u'\u76F4\u6A39\u7A32\u7530')
    >>> f.close()
    >>> f = open('foo.txt', 'r')
    >>> f.read()
    '\xe7\x9b\xb4\xe6\xa8\xb9\xe7\xa8\xb2\xe7\x94\xb0'

    As you can see, the Unicode has been converted using UTF-8.

    @methane
    Member Author

    methane commented Sep 25, 2009

    Another sample.

    Traceback (most recent call last):
      File "C:\usr\Python2.6\lib\logging\__init__.py", line 790, in emit
        stream.write(fs % msg.encode("UTF-8"))
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xaa in position 0:
    ordinal not in range(128)

    This is because logging.FileHandler uses codecs.open internally.

    @vsajip
    Member

    vsajip commented Sep 25, 2009

    Your second example (logging_error.py) fails for the same reason -
    you're writing a byte-string to a stream which is expecting Unicode. The
    error occurs in logging only because it tries encoding as UTF-8 as a
    last-ditch attempt - and that only happens because of an earlier
    exception caused by you not writing a Unicode string.

    In summary: If you open a stream via codecs.open, whether directly or
    through the logging module, you are expecting the stream to do encoding
    for you. Therefore, you only write Unicode to the stream - never a
    byte-string. If you have a byte-string in your application which you
    have obtained from somewhere else, convert it to Unicode using whatever
    encoding applies to the source. Then, send the resulting Unicode to the
    encoding stream (or logger).
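
The advice above (decode at the boundary, then hand only Unicode to the encoding stream or logger) might look like this in Python 3. The logger name, the temporary file, and the assumption that the incoming bytes are UTF-8 are all illustrative and not taken from the attached scripts.

```python
import logging
import os
import tempfile

# Bytes obtained from "somewhere else"; assumed here to be UTF-8.
raw = b"\xe7\x9b\xb4\xe6\xa8\xb9"

# Decode once, using the encoding that applies to the source.
text = raw.decode("utf-8")

# Let the handler do the encoding on the way out.
log_path = os.path.join(tempfile.mkdtemp(), "example.log")
handler = logging.FileHandler(log_path, encoding="utf-8")
logger = logging.getLogger("example_decode_first")  # hypothetical name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info("%s", text)  # only Unicode text reaches the logger

# Flush and release the file before reading it back.
logger.removeHandler(handler)
handler.close()

with open(log_path, "rb") as f:
    written = f.read()
```

Reading the file back in binary mode shows the same UTF-8 bytes that were decoded at the start, followed by a line terminator.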

    @methane
    Member Author

    methane commented Sep 25, 2009

    OK, you're right.
    But logging is a very basic feature and is used by a very wide range of
    modules. "All logging code should use unicode strings" is right but
    difficult.

    Logging is also often used for debugging, so I think logging should
    write logs as safely as possible.
    When log.error(...) is called, no one wants a UnicodeDecodeError
    from logging in their log.

    @vsajip
    Member

    vsajip commented Sep 25, 2009

    It's not about logging - your first example (foo.py) didn't have any
    logging code in it.

    The problem is caused only when someone doesn't understand how Unicode
    and codecs.open work, and logging can't fix this.

    The rule is: if you use a stream without an encoding, together with byte
    strings, under Python 2.x, you'll be OK as long as you're using ASCII or
    Latin-1. However, users of systems outside this (e.g. CJK or Cyrillic)
    will not be covered.

    For use anywhere, you really have to work in Unicode internally, decode
    stuff on the way in and encode stuff on the way out. That's what the
    codecs module is for.
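
As a minimal sketch of "decode on the way in, encode on the way out" with the codecs module, the following Python 3 example wraps in-memory byte streams with `codecs.getreader` and `codecs.getwriter`. The cp1251 input encoding, the UTF-8 output encoding, and the sample text are assumptions chosen for illustration.

```python
import codecs
import io

# Bytes arriving from outside, assumed here to be cp1251-encoded Cyrillic.
incoming = io.BytesIO("\u041f\u0440\u0438\u0432\u0435\u0442".encode("cp1251"))

# Decode on the way in: the reader turns bytes into Unicode text.
reader = codecs.getreader("cp1251")(incoming)
text = reader.read()

# Work in Unicode internally.
shouted = text.upper()

# Encode on the way out: the writer turns Unicode text back into bytes.
outgoing = io.BytesIO()
writer = codecs.getwriter("utf-8")(outgoing)
writer.write(shouted)
result = outgoing.getvalue()
```

Only the two boundary objects know about encodings; everything between them handles plain Unicode text.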

    If third-party libraries which you are using don't use Unicode properly,
    then they are broken, and logging can't fix that. Any attempt to "paper
    over the cracks" will fail sooner or later. It's better to identify the
    problem exactly where it occurs: Python's Zen says "Errors should never
    pass silently."

    I'm closing this issue, as it's not really logging-related. Hope that's OK.

    @vsajip vsajip closed this as completed Sep 25, 2009
    @vsajip vsajip added the invalid label Sep 25, 2009
    @methane
    Member Author

    methane commented Sep 25, 2009

    OK, I agree.
    Thank you for your answer.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022