Mojibake: With unicode crescent moon 🌙 0x1f319 #157

Open
pzrq opened this issue Jan 18, 2016 · 4 comments

pzrq (Contributor) commented Jan 18, 2016

Might be similar to #142, #152.

$ python
Python 3.4.3 (default, Jan  5 2016, 10:40:34) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from premailer.premailer import Premailer
>>> html = '''
... <html xmlns="http://www.w3.org/1999/xhtml">
... <body>
...             Dear Skyelar 🌙,
... </body>
... </html>
... '''
>>> p = Premailer(html)
>>> mojibake = p.transform()
>>> print(mojibake)
<html>
<head></head>
<body><p>h   t   m   l       x   m   l   n   s   =   "   h   t   t   p   :   /   /   w   w   w   .   w   3   .   o   r   g   /   1   9   9   9   /   x   h   t   m   l   "   &gt;   
   </p></body>
</html>

>>> 
pzrq changed the title from "Mojibake: Adding lots of extra spaces with 🌙 and xmlns" to "Mojibake: With 🌙" Jan 18, 2016
pzrq changed the title from "Mojibake: With 🌙" to "Mojibake: With unicode crescent moon 🌙" Jan 18, 2016
pzrq changed the title from "Mojibake: With unicode crescent moon 🌙" to "Mojibake: With unicode crescent moon 🌙 0x1f319" Jan 18, 2016
pzrq (Contributor, Author) commented Jan 19, 2016

I have put this into a test on the following branch: master...mathspace:unicode-crescent-moon

For me, it works fine under Python 2.6 and 2.7 tox environments, but fails under Python 3.4 and 3.5 tox environments.

It fails at etree.tostring(root, **kwargs), which suggests that this issue, at least, is an lxml bug.
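
One way to read the garbled output in the first comment: the three spaces after every character, and the seven spaces between "l" and "x", are what you would see if the markup were stored at four bytes per character (as in UTF-32) and then read back one byte at a time. This is only a reading of the output, not a confirmed diagnosis of lxml. The helper name `as_four_byte_view` below is made up for illustration and uses only the standard library:

```python
def as_four_byte_view(text):
    """Encode text as UTF-32 (little endian), then misread it byte by byte.

    Hypothetical helper for illustration only; premailer and lxml do not
    contain this function. NUL bytes are shown as spaces so the pattern
    is visible when printed.
    """
    raw = text.encode("utf-32-le")          # 4 bytes per character
    return raw.decode("latin-1").replace("\x00", " ")


# Each character is followed by three padding bytes; a real space adds
# one more, so "l x" shows seven blanks between the letters.
print(repr(as_four_byte_view('html xmlns="h')))
```

If that reading is right, the spacing is a symptom of a width mismatch somewhere in parsing or serialization rather than anything specific to the crescent moon itself.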

peterbe (Owner) commented Jan 19, 2016

What's so special about that emoji specifically? Is it not a problem with other emojis?

pzrq (Contributor, Author) commented Jan 20, 2016

@peterbe It was the emoji that came up in our issue tracker in a staging environment. There is nothing particularly special about it, and the issue may well affect a significant set of Unicode characters, e.g. (untested) perhaps those outside the Basic Multilingual Plane, which need more than 16 bits per code point and four bytes in UTF-8.
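
To make the untested guess above concrete, here is a small standard-library sketch that reports how wide a character is in UTF-8 and UTF-16. It only shows which characters fall above U+FFFF; it does not prove that every such character triggers the bug. The function name `encoding_widths` is an illustrative choice, not part of premailer:

```python
def encoding_widths(ch):
    """Return (code point, UTF-8 byte count, UTF-16 code unit count)."""
    return (ord(ch), len(ch.encode("utf-8")), len(ch.encode("utf-16-le")) // 2)


moon = "\U0001F319"  # crescent moon, the character from this issue

for ch in ("a", "\u00e9", "\u20ac", moon):
    cp, utf8_bytes, utf16_units = encoding_widths(ch)
    # Code points above U+FFFF do not fit in 16 bits.
    print("U+%04X utf-8 bytes=%d utf-16 units=%d above_bmp=%s"
          % (cp, utf8_bytes, utf16_units, cp > 0xFFFF))
```

A test suite could loop over a few characters on each side of U+FFFF to see whether the failure really tracks that boundary.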

peterbe (Owner) commented Jan 20, 2016

@pzrq Bummer. Why don't you put up a PR with your work so far and perhaps other people can comment to help make it work in all environments.

OrangeDog added a commit to OrangeDog/premailer that referenced this issue Jun 15, 2016
Passing `--encoding ascii` should fix issues peterbe#93, peterbe#100, peterbe#152 and peterbe#157.
peterbe pushed a commit that referenced this issue Jul 11, 2016
* Allow setting encoding from command-line.

Passing `--encoding ascii` should fix issues #93, #100, #152 and #157.
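
As a rough illustration of why an ASCII output encoding can sidestep the problem: when markup is serialized to ASCII, characters that do not fit are typically written as numeric character references, so no multi-byte sequence reaches the output. The sketch below imitates that effect with Python's built-in `xmlcharrefreplace` error handler. Whether premailer and lxml take exactly this path for `--encoding ascii` is an assumption here, and `to_ascii_markup` is a hypothetical helper:

```python
def to_ascii_markup(text):
    """Encode text as ASCII, replacing other characters with &#NNN; references.

    Hypothetical helper for illustration; not premailer's implementation.
    """
    return text.encode("ascii", "xmlcharrefreplace")


greeting = "Dear Skyelar \U0001F319,"
print(to_ascii_markup(greeting))
```

A browser or mail client rendering the result still shows the crescent moon, because the reference is resolved back to U+1F319 at display time.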

* PEP8 fixes