Open HTML tag parsing depends on tag contents in strange way #4094
Comments
You could investigate this by using |
Is this the same issue as #4088?
> runParser (htmlTag (const True)) defaultParserState "hi" "<code>&foo>"
Right (TagOpen "code" [],"<code>&foo>")
and for #4088
Definitely an issue with |
Here's the tokenization for [TagPosition 1 1,TagOpen "code" [],TagPosition 1 8,TagText "&foo> "] |
I think this is a bug in tagsoup:
Prelude Text.HTML.TagSoup> parseTagsOptions Text.HTML.TagSoup.parseOptions{ optTagWarning = False, optTagPosition = True} "<a>hi"
[TagPosition 1 1,TagOpen "a" [],TagPosition 1 4,TagText "hi"]
Prelude Text.HTML.TagSoup> parseTagsOptions Text.HTML.TagSoup.parseOptions{ optTagWarning = False, optTagPosition = True} "<a>&hi>"
[TagPosition 1 1,TagOpen "a" [],TagPosition 1 5,TagText "&hi>"]
I don't see why the TagPosition after the |
Even clearer:
Prelude Text.HTML.TagSoup> parseTagsOptions Text.HTML.TagSoup.parseOptions{ optTagWarning = False, optTagPosition = True} "<a>&</a>"
[TagPosition 1 1,TagOpen "a" [],TagPosition 1 5,TagText "&",TagPosition 1 5,TagClose "a"]
Prelude Text.HTML.TagSoup> parseTagsOptions Text.HTML.TagSoup.parseOptions{ optTagWarning = False, optTagPosition = True} "<a>x</a>"
[TagPosition 1 1,TagOpen "a" [],TagPosition 1 4,TagText "x",TagPosition 1 5,TagClose "a"]
Note that in the first case, the TagPosition before and after the |
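For comparison, the positions one would expect here can be derived by simply counting consumed characters. The following is only a minimal sketch, assuming positions are 1-based (row, column) pairs and ignoring any special handling of tabs; advance and positionAfter are hypothetical helpers written for illustration, not tagsoup functions.

```haskell
-- Hypothetical helper (not part of tagsoup): move a (row, column)
-- position past one consumed character.
advance :: (Int, Int) -> Char -> (Int, Int)
advance (row, _)   '\n' = (row + 1, 1)
advance (row, col) _    = (row, col + 1)

-- Position of the token that starts right after the given prefix,
-- counting from row 1, column 1.
positionAfter :: String -> (Int, Int)
positionAfter = foldl advance (1, 1)

main :: IO ()
main = do
  print (positionAfter "<a>")   -- start of the text token: (1,4)
  print (positionAfter "<a>x")  -- start of </a> in "<a>x</a>": (1,5)
  print (positionAfter "<a>&")  -- start of </a> in "<a>&</a>": (1,5)
```

Under these assumptions the text token should start at column 4 whether it is "x" or "&", which matches the "<a>x</a>" output above but not the "<a>&</a>" one.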
This should be fixed by the new tagsoup version, once it is released.
I just want to make the Muse reader interpret <code>&foo></code> as a code block with the literal &foo>, but Markdown is also affected:
It looks like, when reading <code>&foo></code>, instead of "&foo>" I get nothing inside the code tag. I don't want to parse any HTML/XML entities inside tags; I just want to get a string of anyChars and parse them as inlines:
pandoc/src/Text/Pandoc/Readers/Muse.hs
Line 107 in 5ba890a
It looks like my code has nothing that treats & in a special way, so the problem is probably in htmlTag.