Leak in HTML parsing #9809

Closed
hugopaquet opened this issue May 28, 2024 · 4 comments

hugopaquet commented May 28, 2024

For the following code,

<body>
  <ul>
    <li>Top-level bullet point. </li>
    <p>Nested line.</p>
  </ul>
</body>

the command pandoc -f html -t html returns only the top-level bullet point:

<ul>
<li>Top-level bullet point.</li>
</ul>

This is pandoc 3.1.12.3 (on macOS Ventura).

tarleb (Collaborator) commented May 28, 2024

This isn't really a bug: the given HTML is malformed, because a <p> is not allowed as a direct child of <ul>. You can check that with any HTML validator, e.g., here.

So the question is whether it makes sense to support this kind of buggy input, and if so, how it would be handled. My personal opinion is that pandoc's current behavior is fine.
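
To make the validity point concrete, here is a minimal sketch of the check a validator performs for this case. It uses only Python's standard `html.parser`. The names `ListChildChecker`, `stray_list_children` and `SAMPLE` are made up for illustration. The allowed set reflects the HTML standard's content model for <ul> and <ol> (<li> plus the script-supporting elements); this is not how pandoc's reader works.

```python
from html.parser import HTMLParser

# Direct children the HTML standard permits inside <ul>/<ol>.
ALLOWED_LIST_CHILDREN = {"li", "script", "template"}
# Void elements have no closing tag, so they are never pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class ListChildChecker(HTMLParser):
    """Collects (tag, line) for elements that sit directly inside <ul>/<ol>
    but are not allowed there."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements, innermost last
        self.problems = []   # (tag, line number) of each stray child

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else None
        if parent in ("ul", "ol") and tag not in ALLOWED_LIST_CHILDREN:
            self.problems.append((tag, self.getpos()[0]))
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore unmatched closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def stray_list_children(html):
    checker = ListChildChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


# The input from this issue.
SAMPLE = """<body>
  <ul>
    <li>Top-level bullet point. </li>
    <p>Nested line.</p>
  </ul>
</body>
"""

if __name__ == "__main__":
    for tag, line in stray_list_children(SAMPLE):
        print(f"<{tag}> is a direct child of a list (line {line})")
```

Run on the input from this issue, the checker reports the <p> as the only offending element; the <li> before it is accepted.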

tarleb (Collaborator) commented May 28, 2024

Possibly relevant: #8150 and #9187.

jgm (Owner) commented May 28, 2024

I think current behavior is okay, but here are some possible improvements:

  1. emit a warning about the omitted content
  2. try to interpret this case as browsers do. Safari, at least, seems to treat the <p> as a child of the first <li>.

Given that this is invalid HTML, though, these things are low priority for me.
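
For illustration, the second option could look roughly like the sketch below. It works on a toy representation (a list of `(tag, text)` pairs for the direct children of a list) and is not pandoc's reader or its AST. The function name `fold_stray_children` and the `"text"` label are hypothetical. A stray child placed before any <li> becomes its own item here, which is an arbitrary choice.

```python
def fold_stray_children(children):
    """Group the direct children of a list into items.

    `children` is a list of (tag, text) pairs in document order.
    Each <li> starts a new item; any other element is appended to the
    item opened by the preceding <li>, as if it were that <li>'s child.
    """
    items = []
    for tag, text in children:
        if tag == "li":
            # A proper list item: start a new item holding its own text.
            items.append([("text", text)])
        elif items:
            # Stray element after an <li>: attach it to that item.
            items[-1].append((tag, text))
        else:
            # Stray element before any <li>: keep it as its own item
            # rather than dropping it (assumption for this sketch).
            items.append([(tag, text)])
    return items


if __name__ == "__main__":
    children = [("li", "Top-level bullet point."), ("p", "Nested line.")]
    for item in fold_stray_children(children):
        print(item)
```

With the input from this issue, the result is a single item that contains both the bullet text and the paragraph, so nothing is silently dropped.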

jgm (Owner) commented May 28, 2024

Interesting: the fix for #9187, 13e1b49, says:

This change will give a similar treatment to

    <ul>
      <li>L1</li>
      <p>foobar</p>
    </ul>

which also seems to match browser behavior.

That is essentially this case, but pandoc doesn't seem to be doing what that comment says it will...

jgm closed this as completed in 29fa97a on May 28, 2024