newlines in <pre> tags shouldn't get removed #130

sknebel · 2018-02-07T20:22:22Z

Just noticed when reading https://aaronparecki.com/2018/02/07/7/indieauth in my feedreader (@aaronpk uses granary to provide an Atom feed) that the contents of <pre> tags also get their newlines stripped and thus the code examples are missing them.

Given that granary doesn't seem to need to parse the HTML anywhere, I totally get if this is WON'T FIX. Given that the microformats parser returns the newlines as they are, it also seems to be the wrong place to handle this(?)

(Ref #80 for why newlines are stripped)


sknebel · 2018-02-07T20:53:34Z

After further reading, it seems that if the feed generation could know at the end of the process whether the source text is HTML or plain text, this could be solved by keeping HTML unmodified. But the ActivityStreams 1 (AS1) format does not support keeping that distinction? Changing this seems more realistic, but still quite a bit of effort.

granary/granary/microformats2.py

Line 385 in 0fb9d68

'content': get_html(prop.get('content')),

aaronpk · 2018-02-07T20:57:38Z

Just want to point out that I may have some problems with my own HTML/newline handling right now. My HTML currently has newlines but no <br> tags, and I use CSS to get the whitespace to show up right. That means any consumers treating it as HTML will not see the newlines, since literal newlines in HTML are not significant. I think I'm going to need to update how my site handles newlines in general.

snarfed · 2018-02-07T21:33:40Z

thanks for filing @sknebel, and for the in depth sleuthing! whee, whitespace handling. always entertaining. i'll take a look soon.

snarfed · 2018-02-14T19:00:34Z

for my own notes: @aaronpk may be right above about his HTML in general, but for this specific case, the offending content is indeed inside <pre>s, which granary could still theoretically detect and preserve.
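As a rough illustration of detecting and preserving <pre>s, here is a minimal sketch. It is not granary's actual code: the helper name is hypothetical, collapsing newlines to a single space is an assumption about what "stripping" means, and a regex split like this does not handle nested or malformed <pre> tags.

```python
import re

# Hypothetical helper, not granary's actual code. The capturing group makes
# re.split keep the matched <pre>...</pre> blocks in its result.
PRE_BLOCK = re.compile(r'(<pre\b[^>]*>.*?</pre>)', re.DOTALL | re.IGNORECASE)


def collapse_newlines_outside_pre(html):
    """Replace newlines with a single space, except inside <pre> blocks."""
    parts = PRE_BLOCK.split(html)
    # With one capturing group, the <pre> blocks land at the odd indexes,
    # so only the even-indexed parts (text outside <pre>) are modified.
    for i in range(0, len(parts), 2):
        parts[i] = re.sub(r'\s*\n\s*', ' ', parts[i])
    return ''.join(parts)


if __name__ == '__main__':
    print(collapse_newlines_outside_pre('one\ntwo<pre>a\n  b</pre>three\nfour'))
```

A real HTML parser would be more robust than a regex here; the sketch only shows that the <pre> content can pass through untouched while the rest is normalized.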

sknebel · 2018-02-22T07:42:53Z

Some more thoughts, both assuming keeping AS1 as the central format:

Since AS1 generally assumes HTML for content, plain text properties could be turned into HTML on the input conversion in a way that transparently converts back on text-only outputs.
Several Python templating libraries have the concept of a special string type for HTML (e.g. available as jinja2.Markup or markupsafe.Markup) which does not get escaped on output, so the object could know if it contains HTML or not.
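The second idea could look roughly like the sketch below. To stay dependency-free it uses a hypothetical `HtmlString` marker class standing in for something like markupsafe.Markup, and `render_html` is an illustrative name, not a granary function. Turning plain-text newlines into <br /> is likewise just one possible choice for the plain-text branch.

```python
import html


class HtmlString(str):
    """Hypothetical marker type: a str whose content is already HTML."""


def render_html(content):
    """Return content as HTML for an HTML output such as an Atom feed."""
    if isinstance(content, HtmlString):
        # Already HTML: pass through unmodified, newlines included.
        return str(content)
    # Plain text: escape it, then make the newlines visible in HTML.
    return html.escape(content, quote=False).replace('\n', '<br />\n')


if __name__ == '__main__':
    print(render_html(HtmlString('<pre>a\nb</pre>')))
    print(render_html('a < b\nc'))
```

Because the marker is a str subclass, code that does not care about the distinction can keep treating the value as an ordinary string.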


snarfed · 2018-02-26T06:27:20Z

i don't regret tackling this just yet...but i'm sure i will eventually. 🤣

snarfed · 2018-02-26T14:06:24Z

thanks again for the ideas @sknebel. i handled this by adding a custom content_is_html property to AS1 when we generate it from HTML, and then use that to determine whether to strip newlines. we'll see what else breaks.
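A minimal sketch of how such a flag might be consumed, for readers following along. Only the `content_is_html` property name comes from the comment above. The `render_content` function is hypothetical, and collapsing newlines to spaces in the non-HTML branch is an assumption made for illustration; granary's real handling may differ.

```python
import re


def render_content(obj):
    """Return the content of an AS1-style object dict for output.

    Assumption: 'content_is_html' is set when the object was generated
    from HTML; everything else here is illustrative, not granary's code.
    """
    content = obj.get('content', '')
    if obj.get('content_is_html'):
        # Content came from HTML: keep it as-is, so newlines inside
        # <pre> blocks survive.
        return content
    # Otherwise strip newlines (assumed here to mean collapsing to a space).
    return re.sub(r'\s*\n\s*', ' ', content)


if __name__ == '__main__':
    print(render_content({'content': '<pre>a\nb</pre>', 'content_is_html': True}))
    print(render_content({'content': 'a\nb'}))
```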

snarfed added a commit that referenced this issue Feb 26, 2018

mf2: remember HTML content, keep newlines, don't translate to <br>

6260e32

for #130, also re #80. i highly suspect this will cause a regression somewhere, but i'm not quite sure where yet. :/

snarfed closed this as completed Feb 26, 2018

snarfed added a commit to snarfed/bridgy-fed that referenced this issue Mar 25, 2018

fix tests for new granary newline handling in snarfed/granary#130

8c3797c
