Should not escape HTML entities #43

listen5k · 2016-04-13T08:38:00Z

First off, thanks for the great library. I recently upgraded from 0.5.1 to 0.7.1, and now it breaks my emails.

How to reproduce

$ python -V
Python 2.7.6
$ pip freeze | grep pynliner
pynliner==0.7.1
$ python -c "import pynliner; print pynliner.fromString('<p>&nbsp;</p>')"

Expected

<p>&nbsp;</p>

Actual

<p> </p>


rennat · 2016-04-14T04:43:35Z

Thanks for using it and reporting bugs!
We recently upgraded to BeautifulSoup4, which changed some behavior here. But first, please confirm the issue, because I think your example code has a typo: &npsp; isn't a valid HTML entity. I'm assuming you meant &nbsp;, which does work as expected.

>>> import pynliner
>>> pynliner.fromString('<p>&nbsp;')
u'<p>\xa0</p>'
>>> print _
<p> </p>
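To see why the printed result looks like a plain space: when the input is parsed, the entity is decoded into the character it names, U+00A0 NO-BREAK SPACE. A minimal sketch of that decoding step using Python 3's standard-library html.unescape (not BeautifulSoup itself, just the same kind of character-reference decoding):

```python
import html
import unicodedata

# html.unescape decodes character references the way an HTML parser does:
# the named entity is replaced by the single character it stands for.
decoded = html.unescape("<p>&nbsp;</p>")

print(repr(decoded))                 # '<p>\xa0</p>'
print(unicodedata.name(decoded[3]))  # NO-BREAK SPACE
```

So u'<p>\xa0</p>' and the printed <p> </p> are the same string; after parsing, the literal &nbsp; is gone and only the character remains.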

listen5k · 2016-04-14T07:21:07Z

@rennat Correct. That was a typo. I updated the description.

I expected the library not to convert HTML entities. Please explain if you disagree.

The reason I have to preserve &nbsp; in my HTML is that Outlook needs one in an empty <td></td> to render a table correctly.
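For the Outlook case specifically, one possible stopgap is to post-process the inlined output and put a literal &nbsp; back into every empty cell. This is a hypothetical helper (fill_empty_cells is not part of pynliner), and it uses a regular expression, so it assumes simple markup with no ">" inside <td> attribute values:

```python
import re

# Matches a <td> whose only content is whitespace. For str patterns Python's
# \s also matches U+00A0, so a cell holding a decoded &nbsp; counts as empty.
# Assumption: no ">" characters inside the <td> attributes.
_EMPTY_TD = re.compile(r"<td([^>]*)>\s*</td>")


def fill_empty_cells(html_text: str) -> str:
    """Hypothetical helper: write a literal &nbsp; entity into every empty <td>."""
    return _EMPTY_TD.sub(r"<td\1>&nbsp;</td>", html_text)


row = '<tr><td style="width:10px">\u00a0</td><td></td><td>x</td></tr>'
print(fill_empty_cells(row))
# <tr><td style="width:10px">&nbsp;</td><td>&nbsp;</td><td>x</td></tr>
```

Cells with real content, like the third one, are left alone. A proper fix belongs in the serializer rather than in a regex pass over the output.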

rennat · 2016-04-22T07:30:46Z

BeautifulSoup 4 doesn't offer an option to preserve HTML entities: when parsing, it converts all of them to Unicode characters. See pull request #45 for discussion.
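Since the entities are lost on input, they can only be re-created on output. A minimal stdlib sketch of that idea, assuming Python 3 and a swapped-in approach (not what pynliner or BS4 do internally): map every non-ASCII character that has an HTML 4 entity name in html.entities.codepoint2name back to that name.

```python
from html.entities import codepoint2name


def reencode_named_entities(text: str) -> str:
    """Replace non-ASCII characters that have an HTML 4 entity name
    (e.g. U+00A0 -> &nbsp;, U+00E9 -> &eacute;) with that named entity.

    ASCII is never touched, so the <, > and & that make up the serialized
    markup stay as they are. Characters without a name pass through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        name = codepoint2name.get(cp) if cp > 127 else None
        out.append(f"&{name};" if name else ch)
    return "".join(out)


print(reencode_named_entities("<p>\u00a0caf\u00e9 \u00a9</p>"))
# <p>&nbsp;caf&eacute; &copy;</p>
```

Note that this re-creates entities for every named character, not only the ones the original input spelled as entities; that information is gone once the parser has run.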

ShadowKyogre · 2016-05-21T16:25:27Z

Hey everyone. Thought I'd pop into this discussion.

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#output-formatters

You can use an output formatter to restore HTML entities after BS4 has converted them while parsing the input string. I tested it with the OP's example, but are there any edge cases that the formatter might not capture?

>>> from bs4 import BeautifulSoup
>>> s = BeautifulSoup("<p>&nbsp;</p>", "html.parser")
>>> s.prettify(formatter="html")
'<p>\n &nbsp;\n</p>'

EDIT: This may not be wanted where minified output is absolutely necessary, since prettify() adds newlines and indentation.
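To get the entity back without the pretty-printing, a sketch assuming bs4 is installed and Python 3: as far as I know, BeautifulSoup's decode() accepts the same formatter argument as prettify() but emits compact output. This is plain BS4 usage, not pynliner's own fix from #45.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>&nbsp;</p>", "html.parser")

# Default serialization uses the "minimal" formatter, which leaves U+00A0
# as a raw character.
print(repr(str(soup)))               # '<p>\xa0</p>'

# The "html" formatter converts characters back to named entities, and
# decode() does not add the newlines and indentation that prettify() does.
print(soup.decode(formatter="html"))  # <p>&nbsp;</p>
```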

rennat · 2016-09-20T07:25:38Z

fixed in 0.7.2

rennat mentioned this issue Apr 22, 2016 in pull request #45, "Add ability to preserve html entities through pynlining process" (merged).

rennat closed this as completed Sep 20, 2016
