…ed form values.
The .toFormMarkup() method that generates a <form> HTML structure had a bug
when the form field values contained UTF-8 encoded strings with characters
outside the 7-bit ASCII space.
If the lxml implementation of the ElementTree API was in use, these values
caused a ValueError to be raised (ValueError: All strings must be XML
compatible: Unicode or ASCII, no NULL bytes or control characters). If the
stdlib implementation of ElementTree was used, these characters were silently
replaced by their XML character reference equivalents (&#XXX;).
This patch generates the form using Unicode values for everything and then
serializes the form to a UTF-8 encoded string, ensuring that the final form is
what is expected and is consistent regardless of the ElementTree
implementation in use.
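The approach described above can be sketched as follows. This is a minimal
illustration written for Python 3 (where ``bytes`` plays the role of the old
encoded string and ``str`` the role of the Unicode object), not the library's
actual code; ``to_text`` and ``build_form_markup`` are hypothetical names, and
the field values are assumed to be either text or UTF-8 encoded bytes.

```python
import xml.etree.ElementTree as ET


def to_text(value, encoding="utf-8"):
    """Return text for either text or encoded bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def build_form_markup(fields, action_url):
    """Build a <form> from text-only values, then serialize once to UTF-8."""
    form = ET.Element("form", {
        "action": to_text(action_url),
        "method": "post",
        "accept-charset": "UTF-8",
    })
    for name, value in fields.items():
        # Every attribute is normalized to text before it reaches ElementTree.
        ET.SubElement(form, "input", {
            "type": "hidden",
            "name": to_text(name),
            "value": to_text(value),
        })
    # Encoding happens in exactly one place, at serialization time.
    return ET.tostring(form, encoding="utf-8")
```

Because everything is decoded up front and encoded only at the end, non-ASCII
characters reach the output as UTF-8 bytes rather than as character
references.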
In generating the argument dictionary, the .toPostArgs() method (apparently)
assumed that the values were all Unicode objects and called
``value.encode('utf-8')`` on them unconditionally. However, the values appear
to be a mixed set of Unicode objects and UTF-8 encoded strings (most of them
being the latter).
Calling .encode('utf-8') on a byte string implicitly decodes the string into a
Unicode object before encoding it to the selected encoding. This automatic
decoding uses the ``sys.getdefaultencoding()`` encoding, which is 'ascii' by
default. The original call therefore works only as long as the values are
7-bit ASCII and breaks when they contain non-ASCII bytes.
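The implicit decode can be spelled out explicitly. The sketch below is written
for Python 3, where ``bytes`` has no ``.encode()`` method, so
``py2_style_encode`` only mimics the Python 2 behaviour described above.
``to_utf8`` is a hypothetical helper showing one type-checked way to avoid the
problem; it is not necessarily how the patch itself is written.

```python
def py2_style_encode(value):
    """Mimic Python 2's str.encode('utf-8') on a byte string.

    The bytes are first decoded with the default 'ascii' codec and only
    then encoded, so any non-ASCII byte raises UnicodeDecodeError.
    """
    return value.decode("ascii").encode("utf-8")


def to_utf8(value):
    """Return UTF-8 bytes for text or already-encoded input (hypothetical)."""
    if isinstance(value, str):
        return value.encode("utf-8")
    # Assumed to be UTF-8 encoded bytes already; passed through untouched.
    return value
```

With these definitions, ``py2_style_encode(b"abc")`` succeeds, while passing
the UTF-8 bytes of a non-ASCII character raises ``UnicodeDecodeError``;
``to_utf8`` accepts both kinds of input.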
The patch ensures that the resulting values in the returned dictionary are
UTF-8 encoded strings regardless of whether the input values were Unicode
objects or
… because they contain undecodable characters.
This causes a UnicodeDecodeError to be raised deep inside Python. It only
happens if the XRDS location is not found before a non-ASCII character is
encountered.
- Catch UnicodeDecodeError when searching for Yadis
- Update the check of whether Yadis was used: if the XRDS location is None, it was not
- Add tests; update the previous Unicode test with a comment
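The changes listed above can be illustrated with a small Python 3 sketch. All
names here (``find_xrds_location``, ``where_is_yadis``, ``used_yadis``) are
hypothetical stand-ins, not the library's functions, and the line-by-line
ASCII decode is only an assumption chosen to reproduce the described failure:
an error occurs only when a non-ASCII byte appears before the XRDS location.

```python
import re

# Simplified pattern for illustration; real HTML parsing is more lenient.
META_RE = re.compile(
    r'<meta\s+http-equiv="X-XRDS-Location"\s+content="([^"]*)"',
    re.IGNORECASE,
)


def find_xrds_location(body):
    """Scan the body line by line; may raise UnicodeDecodeError."""
    for line in body.splitlines():
        text = line.decode("ascii")  # fails on the first non-ASCII byte
        match = META_RE.search(text)
        if match:
            return match.group(1)
    return None


def where_is_yadis(body):
    """Return the XRDS location, or None if it cannot be determined."""
    try:
        return find_xrds_location(body)
    except UnicodeDecodeError:
        # Undecodable content before the location: treat it as not found.
        return None


def used_yadis(xrds_location):
    """Yadis was used only if an XRDS location was actually found."""
    return xrds_location is not None
```

A body that declares the location before any non-ASCII content still yields
the location; one that does not yields ``None`` instead of an exception.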