ValueError UTF-8 in CSS #213
Comments
What's the traceback? And can you supply test file(s) so I can try this locally? |
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters Example file: Python: 2.7 |
Hmm... The problem appears to be that the string contains "invalid Unicode" and lxml is strict about that. I can confirm that the test file causes an error with Python 3.6 too.
However, it might be possible to make it work. This appears to work after all:
# Exclusively for Python 2.7
from lxml import etree
from lxml.cssselect import CSSSelector
parser = etree.HTMLParser()
html = """<html>
<h1 style="color">Text</h1>
</html>"""
tree = etree.fromstring(html, parser).getroottree()
page = tree.getroot()
for element in CSSSelector("h1")(page):
    element.attrib["style"] = u"\ud83d\ude02"
    # element.attrib["style"] = u"something"
out = etree.tostring(page, encoding="utf-8").decode("utf-8")
print(repr(out))
print(out)
The output becomes:
With Python 3.6...
from lxml import etree
from lxml.cssselect import CSSSelector
parser = etree.HTMLParser()
html = """<html>
<h1 style="color">Text</h1>
</html>"""
tree = etree.fromstring(html, parser).getroottree()
page = tree.getroot()
for element in CSSSelector("h1")(page):
    element.attrib["style"] = "\ud83d\ude02"
    # element.attrib["style"] = "😂"
out = etree.tostring(page, encoding="utf-8").decode("utf-8")
print(repr(out))
print(out)
...you get:
|
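As a side note on the two snippets above: in Python 3, "\ud83d\ude02" is not the emoji but two separate surrogate code points, which cannot be encoded as UTF-8. A minimal sketch of one way to join such a pair into a single character follows. The helper name `join_surrogates` is hypothetical, and this is Python 3 only; it is not something the thread confirms as the fix.

```python
def join_surrogates(text):
    """Combine UTF-16 surrogate pairs that are stored as separate
    code points into the single character they represent.

    Assumption: every surrogate in `text` is part of a valid pair;
    an unpaired surrogate makes the decode step raise an error.
    """
    # "surrogatepass" lets the lone surrogates through the encoder,
    # and the UTF-16 decoder then reads each pair as one code point.
    return text.encode("utf-16", "surrogatepass").decode("utf-16")


broken = "\ud83d\ude02"          # two surrogate code points
fixed = join_surrogates(broken)  # one code point, U+1F602

print(len(broken), len(fixed))
print(fixed == "\U0001F602")
```

Text without surrogates passes through unchanged, so the helper can be applied to a whole HTML string before it is handed to lxml.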
@TZanke I'd like to help, but realistically I won't be able to be of much use. First of all, I no longer use Python 2 for any of my projects. Second of all, I actually don't use |
In other words, some help would be greatly appreciated. If you're stuck, try to clean up your incoming HTML string so it doesn't contain weird Microsoft Unicode that may or may not be UTF-8. |
At the moment I fix the HTML before running Premailer. This works. An upgrade to Python 3 is planned this year, so Python 2 should not be a problem for us in the future. |
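For anyone wanting to try the same workaround, here is a rough sketch of cleaning the HTML string before it reaches Premailer. It assumes that the characters lxml rejects are the ones outside the XML 1.0 `Char` range (NULL bytes, most control characters, lone surrogates); that is inferred from the error message, not confirmed in this thread. The name `clean_for_lxml` is made up for illustration, and this may not address the underlying cssutils behaviour.

```python
import re

# Everything outside the XML 1.0 "Char" production: tab, newline and
# carriage return are allowed, other control characters, surrogates,
# U+FFFE and U+FFFF are not.
_XML_INVALID = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_for_lxml(html):
    """Drop characters that are not valid in XML 1.0 documents.

    Hypothetical helper (Python 3): it removes offending characters
    rather than replacing them, so information may be lost.
    """
    return _XML_INVALID.sub("", html)


sample = "<style>a:after { content: '\u00dc\x00\x08'; }</style>"
print(repr(clean_for_lxml(sample)))
```

The cleaned string could then be passed on, for example as `premailer.transform(clean_for_lxml(html))`. Legitimate non-ASCII characters such as Ü are kept.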
This problem looks like a cssutils problem, so I opened a bug: |
The Bitbucket repository for cssutils looks unmaintained since around 2017. There seems to be a fork at https://github.com/ebook-utils/css-parser, which is also packaged in Debian as python-css-parser. Does the fork fix the UTF-8 encoding issue? |
I installed css_parser 1.0.4 but the error still exists. Thanks for your help! |
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
The mail contains UTF-8 in its CSS. This looks like valid CSS.
Mail head CSS:
character: https://www.htmlsymbols.xyz/unicode/U+00DC