HTML parsing block (url-encoding tags) #390

Closed
wmnnd opened this issue Oct 8, 2020 · 6 comments

@wmnnd

wmnnd commented Oct 8, 2020

Hi there, after upgrading from 1.4.1 to 1.4.10, one of my tests started failing. Earmark now seems to have a problem parsing an empty inline HTML tag.
It seems the bug was introduced in Earmark 1.4.5; 1.4.4 doesn't exhibit this behavior.

<a href="https://example.org/html-link"></a> was previously parsed correctly (i.e. left untouched).
Now Earmark.as_html turns it into <a href="https://example.org/html-link%22%3E%3C/a%3E">, i.e. the closing quote and the </a> end up URL-encoded inside the href.

@RobertDober
Collaborator

Thank you for reporting this, and for tracking down the exact version; that may indeed be very helpful.

That said, I cannot reproduce the bug; perhaps I misunderstood something in the description:

iex(6)> md
"<a href=\"https://example.org/html-link\"></a>"
iex(7)> Earmark.as_html(md)
{:ok, "<a href=\"https://example.org/html-link\"></a>", []}

Am I missing some context here?

@wmnnd
Author

wmnnd commented Oct 8, 2020

Here is the behavior in 1.4.5:

iex(1)>       url = "https://example.org/html-link"
"https://example.org/html-link"
iex(2)>       md = """
...(2)>       Foo2 <a href="#{url}"></a>
...(2)>       """
"Foo2 <a href=\"https://example.org/html-link\"></a>\n"
iex(3)>       Earmark.as_html!(md)
"<p>\n  Foo2 &lt;a href=”\n  <a href=\"https://example.org/html-link%22%3E%3C/a%3E\">\n    https://example.org/html-link”&gt;&lt;/a&gt;\n  </a>\n</p>\n"

Here is the same code in 1.4.4:

iex(1)>       url = "https://example.org/html-link"
"https://example.org/html-link"
iex(2)>       md = """
...(2)>       Foo2 <a href="#{url}"></a>
...(2)>       """
"Foo2 <a href=\"https://example.org/html-link\"></a>\n"
iex(3)>       Earmark.as_html!(md)
"<p>Foo2 <a href=\"https://example.org/html-link\"></a></p>\n"
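The odd href in the 1.4.5 output is easier to read once you notice it is just the URL plus the rest of the raw tag, `"></a>`, percent-encoded. The sketch below reproduces only that encoding step with Elixir's standard `URI` module. It assumes, without having checked Earmark's source, that the autolinker grabbed everything up to the next whitespace as the URL and escapes it in a way equivalent to `URI.encode/1`; the variable names are illustrative.

```elixir
# Assumption: this is the text the autolinker captured as the "URL", i.e. the
# real URL plus the tail of the tag that was not recognized as HTML.
captured = ~s(https://example.org/html-link"></a>)

# `"`, `>` and `<` are neither reserved nor unreserved URI characters, so
# URI.encode/1 escapes them; `/` is reserved and is kept as-is.
encoded = URI.encode(captured)
IO.puts(encoded)
# prints https://example.org/html-link%22%3E%3C/a%3E

# Decoding recovers the raw tag tail.
IO.puts(URI.decode(encoded))
# prints https://example.org/html-link"></a>
```

The printed value matches the href in the 1.4.5 output, which suggests the tag was treated as plain text rather than as inline HTML.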

@RobertDober
Collaborator

Oh, I see now; thank you for taking the trouble.

Actually, this is not a bug: HTML is not recognized unless it is on its own line, and the implementation before 1.4.5 was too lenient.

I apologise for the earlier lazy implementation, but I am afraid there is not much I can do now without introducing a regression against the spec (the regression you experienced was really more of a fix).

I completely understand that this is unexpected and even bothersome, but I am afraid there is nothing I can do.

@wmnnd
Author

wmnnd commented Oct 8, 2020

Thanks for looking into this!

But are you sure the spec requires a new line for all HTML elements?

John Gruber’s spec specifically says that only block-level elements need to be on their own lines:

For any markup that is not covered by Markdown’s syntax, you simply use HTML itself. There’s no need to preface it or delimit it to indicate that you’re switching from Markdown to HTML; you just use the tags.

The only restrictions are that block-level HTML elements — e.g. <div>, <table>, <pre>, <p>, etc. — must be separated from surrounding content by blank lines, and the start and end tags of the block should not be indented with tabs or spaces. Markdown is smart enough not to add extra (unwanted) <p> tags around HTML block-level tags.

https://daringfireball.net/projects/markdown/syntax#html

And GitHub’s GFM spec has an example that is almost identical to mine:

Foo <responsive-image src="foo.jpg" /> is parsed as <p>Foo <responsive-image src="foo.jpg" /></p>

https://github.github.com/gfm/#example-636
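The rule quoted above can be made concrete with a small line classifier. This is a hypothetical sketch, not Earmark's implementation: the module name `HtmlLineSketch`, the regex, and the short tag list (only the examples Gruber names) are all assumptions. It treats a line as the start of an HTML block only when it begins, unindented, with a block-level tag; anything else stays a paragraph, where inline HTML such as `<a>` or `<responsive-image>` would be passed through untouched.

```elixir
defmodule HtmlLineSketch do
  # Hypothetical, incomplete list: only the block-level tags Gruber cites.
  @block_tags ~w(div table pre p)

  # Returns :html_block if the line starts (at column 0) with a block-level
  # opening tag, otherwise :paragraph (inline HTML stays inside the paragraph).
  def classify(line) do
    case Regex.run(~r/\A<([a-zA-Z][a-zA-Z0-9-]*)[\s>\/]/, line) do
      [_, tag] ->
        if String.downcase(tag) in @block_tags, do: :html_block, else: :paragraph

      nil ->
        :paragraph
    end
  end
end

HtmlLineSketch.classify("<div>")
# returns :html_block
HtmlLineSketch.classify(~s(Foo2 <a href="https://example.org/html-link"></a>))
# returns :paragraph
HtmlLineSketch.classify(~s(Foo <responsive-image src="foo.jpg" />))
# returns :paragraph
```

Under this reading, the reported line is an ordinary paragraph containing inline HTML, which is the 1.4.4 behavior. CommonMark/GFM define HTML blocks in more detail (for example, a complete tag alone on its line can also open a block), so the sketch only models Gruber's simpler rule.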

@RobertDober
Collaborator

RobertDober commented Oct 8, 2020

Firstly you are correct and secondly you are 100% correct...
Such bad wording from YHS (your humble servant); I should have written "docs" instead of "specs":

https://github.com/pragdave/earmark#html-blocks

That is, I treat the docs as the spec; however, they are a moving target, and I intend to define how Earmark parses HTML in 1.5. Please keep an eye on that.

If I am not mistaken, the behavior described in RobertDober/earmark_parser#7 is pretty much what you want, correct?

If so, I will close this as a duplicate.

BTW, I hope you understand why I won't touch this in 1.4.*.

@wmnnd
Author

wmnnd commented Oct 8, 2020

Thanks for the clarification. I will stick with 1.4.4 until this has been resolved 👍
