Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

don't mix unicode strings with utf8-in-byte-strings #3470

Merged
merged 4 commits into from Jul 2, 2018

Conversation

ara4n
Copy link
Member

@ara4n ara4n commented Jul 1, 2018

otherwise we explode with:

Traceback (most recent call last):
  File /usr/lib/python2.7/logging/handlers.py, line 78, in emit
    logging.FileHandler.emit(self, record)
  File /usr/lib/python2.7/logging/__init__.py, line 950, in emit
    StreamHandler.emit(self, record)
  File /usr/lib/python2.7/logging/__init__.py, line 887, in emit
    self.handleError(record)
  File /usr/lib/python2.7/logging/__init__.py, line 810, in handleError
    None, sys.stderr)
  File /usr/lib/python2.7/traceback.py, line 124, in print_exception
    _print(file, 'Traceback (most recent call last):')
  File /usr/lib/python2.7/traceback.py, line 13, in _print
    file.write(str+terminator)
  File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_io.py, line 170, in write
    self.log.emit(self.level, format=u{log_io}, log_io=line)
  File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_logger.py, line 144, in emit
    self.observer(event)
  File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_observer.py, line 136, in __call__
    errorLogger = self._errorLoggerForObserver(brokenObserver)
  File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_observer.py, line 156, in _errorLoggerForObserver
    if obs is not observer
  File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_observer.py, line 81, in __init__
    self.log = Logger(observer=self)
  File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_logger.py, line 64, in __init__
    namespace = self._namespaceFromCallingContext()
  File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_logger.py, line 42, in _namespaceFromCallingContext
    return currentframe(2).f_globals[__name__]
  File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/python/compat.py, line 93, in currentframe
    for x in range(n + 1):
RuntimeError: maximum recursion depth exceeded while calling a Python object
Logged from file site.py, line 129
  File /usr/lib/python2.7/logging/__init__.py, line 859, in emit
    msg = self.format(record)
  File /usr/lib/python2.7/logging/__init__.py, line 732, in format
    return fmt.format(record)
  File /usr/lib/python2.7/logging/__init__.py, line 471, in format
    record.message = record.getMessage()
  File /usr/lib/python2.7/logging/__init__.py, line 335, in getMessage
    msg = msg % self.args
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4: ordinal not in range(128)
Logged from file site.py, line 129

...where the logger apparently recurses whilst trying to log the error, hitting the
maximum recursion depth and killing everything badly.

otherwise we explode with:

```
Traceback (most recent call last):
  File /usr/lib/python2.7/logging/handlers.py, line 78, in emit
    logging.FileHandler.emit(self, record)
  File /usr/lib/python2.7/logging/__init__.py, line 950, in emit
    StreamHandler.emit(self, record)
  File /usr/lib/python2.7/logging/__init__.py, line 887, in emit
    self.handleError(record)
  File /usr/lib/python2.7/logging/__init__.py, line 810, in handleError
    None, sys.stderr)
  File /usr/lib/python2.7/traceback.py, line 124, in print_exception
    _print(file, 'Traceback (most recent call last):')
  File /usr/lib/python2.7/traceback.py, line 13, in _print
    file.write(str+terminator)
  File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_io.py, line 170, in write
    self.log.emit(self.level, format=u{log_io}, log_io=line)
  File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_logger.py, line 144, in emit
    self.observer(event)
  File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_observer.py, line 136, in __call__
    errorLogger = self._errorLoggerForObserver(brokenObserver)
  File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_observer.py, line 156, in _errorLoggerForObserver
    if obs is not observer
  File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_observer.py, line 81, in __init__
    self.log = Logger(observer=self)
  File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_logger.py, line 64, in __init__
    namespace = self._namespaceFromCallingContext()
  File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_logger.py, line 42, in _namespaceFromCallingContext
    return currentframe(2).f_globals[__name__]
  File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/python/compat.py, line 93, in currentframe
    for x in range(n + 1):
RuntimeError: maximum recursion depth exceeded while calling a Python object
Logged from file site.py, line 129
  File /usr/lib/python2.7/logging/__init__.py, line 859, in emit
    msg = self.format(record)
  File /usr/lib/python2.7/logging/__init__.py, line 732, in format
    return fmt.format(record)
  File /usr/lib/python2.7/logging/__init__.py, line 471, in format
    record.message = record.getMessage()
  File /usr/lib/python2.7/logging/__init__.py, line 335, in getMessage
    msg = msg % self.args
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4: ordinal not in range(128)
Logged from file site.py, line 129
```

...where the logger apparently recurses whilst trying to log the error, hitting the
maximum recursion depth and killing everything badly.
# c.f. https://github.com/matrix-org/synapse/issues/3471
user_agent = self.get_user_agent()
if user_agent is not None:
user_agent = user_agent.decode("utf-8")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this behave when the UA is not valid utf-8?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it throws:

>>> a="\xa9"
>>> a.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python@2/2.7.15/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 0: invalid start byte

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this is a better bet:

>>> a.decode("utf-8", "replace")
u'\ufffd'

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@richvdh
Copy link
Member

richvdh commented Jul 2, 2018

[it is surprising to me that authenticated_entity is a bytes rather than a unicode, but I guess that's orthogonal here.]

@ara4n
Copy link
Member Author

ara4n commented Jul 2, 2018

i think it's bytes when it describes a server (as it comes from the raw Auth header) versus unicode when it describes a client. but i didn't dig too deep, especially as i don't understand the circumstances where py2 upgrades byte to unicode strings.

@richvdh
Copy link
Member

richvdh commented Jul 2, 2018

i think it's bytes when it describes a server (as it comes from the raw Auth header) versus unicode when it describes a client.

weeps

@richvdh
Copy link
Member

richvdh commented Jul 2, 2018

relatedly: continuing to do this sort of hackery without writing tests will just doom us to repeat the same lessons over and over again.

@richvdh richvdh merged commit f741630 into develop Jul 2, 2018
@hawkowl hawkowl deleted the matthew/fix-utf8-logging branch September 20, 2018 14:01
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants