newlines in <pre> tags shouldn't get removed #130

Closed
sknebel opened this Issue Feb 7, 2018 · 7 comments

@sknebel
Contributor

sknebel commented Feb 7, 2018

Just noticed when reading https://aaronparecki.com/2018/02/07/7/indieauth in my feedreader (@aaronpk uses granary to provide an Atom feed) that the contents of <pre> tags also get their newlines stripped and thus the code examples are missing them.

Given that granary doesn't seem to need to parse the HTML anywhere, I totally get if this is WON'T FIX. Given that the microformats parser returns the newlines as they are, the parser also seems to be the wrong place to handle this(?)

(Ref #80 for why newlines are stripped)

@sknebel

Contributor

sknebel commented Feb 7, 2018

After further reading, it seems this could be solved by keeping HTML unmodified, if the feed generation at the end of the process could know whether the source text is HTML or plain text. But the ActivityStreams 1 format does not support keeping that distinction? Changing this seems more realistic, but still quite a bit of effort.

'content': get_html(prop.get('content')),

@aaronpk

aaronpk commented Feb 7, 2018

Just want to point out that I may have some problems with my own HTML/newline handling right now. My HTML currently has newlines but no <br> tags, and I use CSS to get the whitespace to show up right. That means any consumers treating it as HTML will not see the newlines, since literal newlines in HTML are not significant. I think I'm going to need to update how my site handles newlines in general.

@snarfed

Owner

snarfed commented Feb 7, 2018

thanks for filing @sknebel, and for the in depth sleuthing! whee, whitespace handling. always entertaining. i'll take a look soon.

@snarfed

Owner

snarfed commented Feb 14, 2018

for my own notes: @aaronpk may be right above about his HTML in general, but for this specific case, the offending content is indeed inside <pre>s, which granary could still theoretically detect and preserve.
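
To illustrate what "detect and preserve" could look like, here is a minimal sketch, not granary's actual code. The function name `collapse_newlines_outside_pre` and the regex are assumptions made up for this example, and a regex will not cope with nested or malformed `<pre>` elements the way a real HTML parser would.

```python
import re

# Matches a whole <pre>...</pre> element, case-insensitively, across lines.
# Hypothetical helper for illustration; not part of granary.
PRE_RE = re.compile(r'(<pre\b[^>]*>.*?</pre>)', re.IGNORECASE | re.DOTALL)


def collapse_newlines_outside_pre(html):
    """Replace newlines with single spaces everywhere except inside <pre>."""
    parts = PRE_RE.split(html)
    # With one capturing group, re.split puts the <pre> blocks at odd indexes.
    return ''.join(
        part if i % 2 else re.sub(r'\s*\n\s*', ' ', part)
        for i, part in enumerate(parts)
    )


html = '<p>one\ntwo</p>\n<pre>a = 1\nb = 2</pre>'
print(collapse_newlines_outside_pre(html))
```

The newline inside the `<p>` is collapsed to a space, while the one inside the `<pre>` is left alone.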

@sknebel

Contributor

sknebel commented Feb 22, 2018

Some more thoughts, both assuming AS1 stays the central format:

  1. Since AS1 generally assumes HTML for content, plain text properties could be turned to HTML on the input conversion in a way that transparently converts back on text-only outputs.

  2. Several Python templating libraries have a special string type for HTML (e.g. jinja2.Markup, or markupsafe.Markup in MarkupSafe) which does not get escaped on output, so the object could know if it contains HTML or not.
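
The second idea can be sketched with the standard library alone. This is an illustration of the concept, not MarkupSafe's implementation and not anything in granary: `HtmlStr` and `to_html` are hypothetical names, and turning plain-text newlines into `<br />` is an assumption about what a feed would want.

```python
import html


class HtmlStr(str):
    """Hypothetical marker type: a str whose contents are already HTML."""


def to_html(value):
    """Return value as HTML, escaping it only if it is plain text."""
    if isinstance(value, HtmlStr):
        # Already HTML: pass through untouched, newlines included.
        return value
    # Plain text: escape markup characters and make line breaks visible
    # (the <br /> conversion is an assumption for this sketch).
    return HtmlStr(html.escape(value).replace('\n', '<br />\n'))


print(to_html('a < b\nc'))
print(to_html(HtmlStr('<pre>x\ny</pre>')))
```

Because the result is itself an `HtmlStr`, passing it through `to_html` a second time does not escape it again.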

snarfed added a commit that referenced this issue Feb 26, 2018

mf2: remember HTML content, keep newlines, don't translate to <br>
for #130, also re #80. i highly suspect this will cause a regression somewhere, but i'm not quite sure where yet. :/
@snarfed

Owner

snarfed commented Feb 26, 2018

i don't regret tackling this just yet...but i'm sure i will eventually. 🤣

@snarfed

Owner

snarfed commented Feb 26, 2018

thanks again for the ideas @sknebel. i handled this by adding a custom content_is_html property to AS1 when we generate it from HTML, and then use that to determine whether to strip newlines. we'll see what else breaks.
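
A rough sketch of the approach described in that comment, under stated assumptions: only the `content_is_html` property name comes from the thread. The helpers `content_to_as1` and `render_content` are hypothetical, the input shape follows the usual mf2 JSON form for `e-content` (a dict with `html` and `value`), and the plain-text branch (escape, then `<br />`) is a guess rather than the committed behavior.

```python
import html


def content_to_as1(mf2_content):
    """Build AS1 content fields from an mf2 content value (hypothetical)."""
    if isinstance(mf2_content, dict) and mf2_content.get('html'):
        # Source was HTML: remember that so later steps leave it alone.
        return {'content': mf2_content['html'], 'content_is_html': True}
    if isinstance(mf2_content, dict):
        return {'content': mf2_content.get('value', '')}
    return {'content': mf2_content}


def render_content(obj):
    """Render an AS1-style object's content for an HTML feed entry."""
    content = obj.get('content', '')
    if obj.get('content_is_html'):
        # Keep HTML unmodified so newlines inside <pre> survive.
        return content
    # Plain text: escape and convert line breaks (assumed behavior).
    return html.escape(content).replace('\n', '<br />\n')


obj = content_to_as1({'html': '<pre>a\nb</pre>', 'value': 'a\nb'})
print(render_content(obj))
print(render_content(content_to_as1('a\nb')))
```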

@snarfed snarfed closed this Feb 26, 2018

snarfed added a commit to snarfed/bridgy-fed that referenced this issue Mar 25, 2018
