Open HTML tag parsing depends on tag contents in strange way #4094

Closed
link2xt opened this issue Nov 25, 2017 · 7 comments

@link2xt
Collaborator

link2xt commented Nov 25, 2017

I just want the Muse reader to interpret <code>&foo></code> as a code block containing the literal &foo>, but the Markdown reader is also affected:

$ pandoc -f markdown -t native
<code>foo</code>
[Para [RawInline (Format "html") "<code>",Str "foo",RawInline (Format "html") "</code>"]]
$ pandoc -f markdown -t native
<code>&foo</code>
[Para [RawInline (Format "html") "<code>&foo</code>"]]
$ pandoc -f markdown -t native
<code>&foo<</code>
[Para [RawInline (Format "html") "<code>&foo<</code>"]]
$ pandoc -f markdown -t native
<code>&foo></code>
[Para [RawInline (Format "html") "<code>&foo>",RawInline (Format "html") "</code>"]]
$ pandoc -f muse -t native
<code>&foo></code>
[Para [Code ("",[],[]) ""]]
$ pandoc -f muse -t native
<code>foo&foo></code>
[Para [Code ("",[],[]) "foo&foo>"]]

It looks like reading <code>&foo></code> gives me an empty code element instead of "&foo>". I don't want to parse any HTML/XML entities inside tags; I just want to collect a string of anyChars and parse it as inlines:

content <- manyTill anyChar endtag

Nothing in my code treats & specially, so the problem is probably in htmlTag.
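
For reference, the manyTill approach can be sketched in isolation with plain Parsec. This is only a minimal sketch under stated assumptions, not the Muse reader's actual code: codeTag and parseCode are hypothetical names, and the opening and closing tags are matched as literal strings instead of going through htmlTag.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper (not pandoc's code): read everything between
-- <code> and </code> verbatim, with no entity or tag handling.
codeTag :: Parser String
codeTag = do
  _ <- string "<code>"
  -- 'try' lets the end-tag parser back out when it sees a '<'
  -- that does not start "</code>".
  manyTill anyChar (try (string "</code>"))

-- Hypothetical wrapper that runs the parser on a plain String.
parseCode :: String -> Either ParseError String
parseCode = parse codeTag "input"
```

In ghci, parseCode "<code>&foo></code>" evaluates to Right "&foo>". With the tags matched literally, the & causes no trouble, which is consistent with the suspicion that the empty result comes from the tag parser and not from manyTill anyChar.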

@jgm
Owner

jgm commented Nov 28, 2017

You could investigate this by using htmlTag in ghci to try to figure out why it's not working.

@mb21
Collaborator

mb21 commented Nov 28, 2017

Is this the same issue as #4088?

@jgm
Owner

jgm commented Nov 29, 2017

> runParser (htmlTag (const True)) defaultParserState "hi" "<code>&foo>"
Right (TagOpen "code" [],"<code>&foo>")

And for #4088:

> runParser (htmlTag (const True)) defaultParserState "hi" "<input type=\"checkbox\" disabled=\"\" checked=\"\">&thinsp;I'm a _checked_ task</li>"
Right (TagOpen "input" [("type","checkbox"),("disabled",""),("checked","")],"<input type=\"checkbox\" disabled=\"\" checked=\"\">&thinsp;I'm a _checked_ task</li>")

Definitely an issue with htmlTag here!

@jgm
Owner

jgm commented Nov 29, 2017

Here's the tokenization for <code>&foo>:

[TagPosition 1 1,TagOpen "code" [],TagPosition 1 8,TagText "&foo> "]

@jgm
Owner

jgm commented Nov 29, 2017

I think this is a bug in tag-soup:

Prelude Text.HTML.TagSoup> parseTagsOptions  Text.HTML.TagSoup.parseOptions{ optTagWarning = False, optTagPosition = True} "<a>hi"
[TagPosition 1 1,TagOpen "a" [],TagPosition 1 4,TagText "hi"]
Prelude Text.HTML.TagSoup> parseTagsOptions  Text.HTML.TagSoup.parseOptions{ optTagWarning = False, optTagPosition = True} "<a>&hi>"
[TagPosition 1 1,TagOpen "a" [],TagPosition 1 5,TagText "&hi>"]

I don't see why the TagPosition after the <a> tag should depend on whether it is followed by a &.

@jgm
Owner

jgm commented Nov 29, 2017

Even clearer:

Prelude Text.HTML.TagSoup> parseTagsOptions  Text.HTML.TagSoup.parseOptions{ optTagWarning = False, optTagPosition = True} "<a>&</a>"
[TagPosition 1 1,TagOpen "a" [],TagPosition 1 5,TagText "&",TagPosition 1 5,TagClose "a"]
Prelude Text.HTML.TagSoup> parseTagsOptions  Text.HTML.TagSoup.parseOptions{ optTagWarning = False, optTagPosition = True} "<a>x</a>"
[TagPosition 1 1,TagOpen "a" [],TagPosition 1 4,TagText "x",TagPosition 1 5,TagClose "a"]

Note that in the first case, the TagPosition values before and after the & are the same!
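
The invariant being violated here can be stated as a small check: each reported position should be strictly later than the previous one. The following is a minimal, dependency-free sketch; Pos, strictlyIncreasing, withAmpersand, and withPlainText are hypothetical names, and the position lists are copied by hand from the two ghci outputs above instead of being produced by tagsoup.

```haskell
-- Positions are (row, column) pairs, mirroring tagsoup's TagPosition.
type Pos = (Int, Int)

-- True when every position is strictly later than the one before it
-- (tuples compare by row first, then by column).
strictlyIncreasing :: [Pos] -> Bool
strictlyIncreasing ps = and (zipWith (<) ps (drop 1 ps))

-- Positions copied from the two ghci outputs above.
withAmpersand, withPlainText :: [Pos]
withAmpersand = [(1, 1), (1, 5), (1, 5)]  -- from "<a>&</a>"
withPlainText = [(1, 1), (1, 4), (1, 5)]  -- from "<a>x</a>"
```

In ghci, strictlyIncreasing withPlainText is True and strictlyIncreasing withAmpersand is False, because the text and the closing tag are both reported at column 5.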

@jgm
Owner

jgm commented Nov 30, 2017

This should be fixed by the new tagsoup version, once it is released.
