Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid HTML generated? #660

Closed
ioquatix opened this issue May 2, 2020 · 10 comments
Closed

Invalid HTML generated? #660

ioquatix opened this issue May 2, 2020 · 10 comments
Assignees

Comments

@ioquatix
Copy link

@ioquatix ioquatix commented May 2, 2020

The following brackets should probably be escaped:

Represents an XML Instruction; IE, <? ... ?>

e.g.

Kramdown::Document.new("Represents an XML Instruction; IE, <? ... ?>").to_html
=> "<p>Represents an XML Instruction; IE, <? ... ?></p>\n"

Kramdown::Document.new("Represents an XML Instruction; IE, `<? ... ?>`").to_html
=> "<p>Represents an XML Instruction; IE, <code>&lt;? ... ?&gt;</code></p>\n"

Using backticks is probably the correct approach here, but this documentation/text is coming from a 3rd party so hard to make changes. Maybe this is valid behaviour for Kramdown, if so feel free to close. FYI.

@gettalong gettalong self-assigned this May 2, 2020
@gettalong
Copy link
Owner

@gettalong gettalong commented May 2, 2020

Yes, this behaviour is intended, see the last paragraph of the HTML blocks description, i.e. processing instructions are parsed and converted to :xml_pi elements in the internal abstract syntax tree.

This means that you could adapt/subclass the HTML converter to do something different with it.

@ioquatix
Copy link
Author

@ioquatix ioquatix commented May 4, 2020

Is <? ... ?> valid output in an HTML5 paragraph?

@gettalong
Copy link
Owner

@gettalong gettalong commented May 4, 2020

Not in the HTML syntax, I guess, but in the XML syntax of HTML5. The output of kramdown's HTML converter works with both syntaxes. So processing instructions should only be used when one uses the XML syntax of HTML5.

@ioquatix
Copy link
Author

@ioquatix ioquatix commented May 4, 2020

Maybe you should add XHTML and HTML5 as separate conversions because the <? ... ?> definitely seem invalid.

@gettalong gettalong reopened this May 4, 2020
@gettalong
Copy link
Owner

@gettalong gettalong commented May 4, 2020

I don't want to remove the parsing side of things because this would potentially invalidate existing documents. However, on the conversion side we could add an option to disable handling of the processing instruction by outputting it like any other text - what do you think?

@ioquatix
Copy link
Author

@ioquatix ioquatix commented May 4, 2020

I guess my advice would be to ensure you generate valid HTML5 and that should be the default. XHTML is definitely not that common.

@gettalong
Copy link
Owner

@gettalong gettalong commented May 5, 2020

Hmm... After a bit more research I guess that this could considered to be a bug fix. SGML processing instructions were actually allowed in HTML4 but are not in HTML5.

@gettalong
Copy link
Owner

@gettalong gettalong commented May 7, 2020

Okay, so I actually removed the parsing side of things although this might break documents. Changing the converter would have also broken documents on HTML output, though.

The reason for doing this in the parser is that other markup that would appear between the begin/end markers are now correctly parsed, as would be expected if the PIs are ignored.

@cabo
Copy link
Contributor

@cabo cabo commented Aug 8, 2020

Hmm, this is a breaking change for kramdown-rfc2629.
Can the parsing code maybe be factored out into a gem?

@gettalong
Copy link
Owner

@gettalong gettalong commented Aug 11, 2020

@cabo Sorry to hear that! I think the best way to tackle this would be to adapt the parser to contain that functionality in your code base(see commit 3f31e87 for the changes).

You would need to:

  • Define HTML_BLOCK_START_WITH_PI and HTML_SPAN_START_WITH_PI as they were before.
  • Define parse_block_html_with_pi and parse_span_html_with_pi that each contains the now missing elsif part and that delegates to parse_block_html/parse_span_html respectively if no PI was found.
  • Define the parsers like define_parser(:block_html_with_pil, HTML_BLOCK_START_WITH_PI), similar for the span case.
  • Use those parsers in your code by replacing the :block_html/:span_html parts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants