Should not escape HTML entities #43

Closed
listen5k opened this issue Apr 13, 2016 · 5 comments

Comments

@listen5k

First off, thanks for the great library. I recently upgraded the lib from 0.5.1 to 0.7.1, and now it breaks my emails.

How to reproduce
$ python -V
Python 2.7.6
$ pip freeze | grep pynliner
pynliner==0.7.1
$ python -c "import pynliner; print pynliner.fromString('<p>&nbsp;</p>')"
Expected
<p>&nbsp;</p>
Actual
<p> </p>
@rennat
Owner

rennat commented Apr 14, 2016

Thanks for using it and reporting bugs!
We recently upgraded to BeautifulSoup4, which changed some behavior here. First, though, please confirm the issue: I think there is a typo in your example code, since &npsp; isn't a valid HTML entity. I'm assuming you meant &nbsp;, which does work as expected.

>>> import pynliner
>>> pynliner.fromString('<p>&nbsp;')
u'<p>\xa0</p>'
>>> print _
<p> </p>

@listen5k
Author

@rennat Correct. That was a typo. I updated the description.

I expect the lib not to convert HTML entities. Please elaborate if you disagree.

The reason I have to preserve &nbsp; in my HTML is that Outlook needs to have them in an empty <td></td> for rendering a table correctly.
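As a stopgap, the entity could be put back after inlining with a plain string replacement. This is only a sketch: `restore_nbsp` is a hypothetical helper, not part of pynliner, and the `inlined` string stands in for the 0.7.1 output reported above rather than calling the library.

```python
# Hypothetical helper, not part of pynliner: post-process the inlined HTML.
NBSP = u"\u00a0"  # the Unicode no-break space that &nbsp; is decoded to


def restore_nbsp(html):
    """Replace literal U+00A0 characters with the &nbsp; entity."""
    return html.replace(NBSP, u"&nbsp;")


# Stand-in for what pynliner 0.7.1 returns for '<p>&nbsp;</p>' in this thread.
inlined = u"<p>\u00a0</p>"
restored = restore_nbsp(inlined)
print(restored)
```

This only handles the no-break space; any other entity that was decoded would need its own replacement.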

@rennat
Owner

rennat commented Apr 22, 2016

BeautifulSoup 4 doesn't offer the option to preserve HTML entities. It converts all of them to Unicode characters. See the pull request #45 for discussion.

@ShadowKyogre

ShadowKyogre commented May 21, 2016

Hey everyone. Thought I'd pop into this discussion.

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#output-formatters

You can use a formatter on the soup to restore HTML entities after BS4 slurps up the input string. I tested it with the OP's example, but are there any edge cases that the formatter might not capture?

>>> from bs4 import BeautifulSoup
>>> s = BeautifulSoup("<p>&nbsp;</p>", "html.parser")
>>> s.prettify(formatter="html")
'<p>\n &nbsp;\n</p>'

EDIT: prettify() adds newlines and indentation, so this wouldn't be wanted in cases where minified output is absolutely necessary.
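For the minified case, a possible variant is to pass the same formatter to `decode()` instead of `prettify()`, which serializes the soup without adding whitespace. This is a minimal sketch against bs4 directly, assuming the `html.parser` backend; it does not show how pynliner itself produces its output.

```python
from bs4 import BeautifulSoup

# Parsing decodes &nbsp; into the Unicode character U+00A0.
soup = BeautifulSoup("<p>&nbsp;</p>", "html.parser")

# decode() serializes without pretty-printing; formatter="html" turns
# characters that have a named HTML entity back into that entity.
compact = soup.decode(formatter="html")
print(compact)
```

One thing to watch for: `formatter="html"` applies to every character with a named entity, so accented letters such as é would also come out as entities (`&eacute;`), not just the no-break space.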

@rennat
Owner

rennat commented Sep 20, 2016

fixed in 0.7.2

@rennat rennat closed this as completed Sep 20, 2016