New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid UnicodeDecoderError: use binary stream for xml files, let SAX determine the encoding #303
Comments
SAX is able to determine the encoding of XML files itself if the file contains a correct "encoding" pseudo attribute, e.g.: <?xml version="1.0" encoding="UTF-8" standalone="yes" ?> For this to work, the file stream has to be opened in binary mode, and the parser has to read the stream using a SAX InputStream, which autodetects the encoding. Fix for firewalld#303.
Can you tell me what version of python you're using? I can reproduce this on 3.4 and 3.6, but not 3.7 $ LC_ALL=C python3.4 foo.py $ LC_ALL=C python3.6 foo.py $ LC_ALL=C python3.7 foo.py |
Verify the XML parser can parse unicode.
Currently 3.6.5, not sure which version I used in march. On 3.7 it probably works cause of https://docs.python.org/3/whatsnew/3.7.html#whatsnew37-pep538 |
Indeed. Thanks for the pointer. |
SAX is able to determine the encoding of XML files itself if the file contains a correct "encoding" pseudo attribute, e.g.: <?xml version="1.0" encoding="UTF-8" standalone="yes" ?> For this to work, the file stream has to be opened in binary mode, and the parser has to read the stream using a SAX InputStream, which autodetects the encoding. Fixes: #303 (cherry picked from commit 7cdd802)
Verify the XML parser can parse unicode. (cherry picked from commit 34ac8cd)
Currently zone_reader from zone.py:
firewalld/src/firewall/core/io/zone.py
Line 699 in 10bcda6
and *_reader from the other io classes open the file as character stream, which fails as soon as e.g. the "C" locale is used and the xml files contains non-ASCII character.
SAX allows to feed a binary stream into the parser, and let SAX determine the encoding from the XML encoding pseudo attribute.
The following code works with both Python2 and Python3:
This is able to parse the following XML file:
The text was updated successfully, but these errors were encountered: