Mozilla style HTML nested list gets extra bullet in Markdown #9187
Comments
I'm a bit lost in your description of the issue. Can you post (inline) the HTML, the markdown pandoc currently produces, and the markdown you would expect?
input (same as #8150 (comment)):
current output (
correct output:
Please note again the issue is only the extra |
I find this suggested output surprising and would like to understand by what criterion it is judged to be correct. The |
@jgm I included the context in my original post to explain how the Mozilla HTML-generating applications have always marked up nested lists. To my knowledge (around 12 years of observation at this point), all HTML editors like Thunderbird (and SeaMonkey, less popular) place nested lists as list items without enclosing them in So the criterion for judging it "correct" is that the more deeply nested I don't know what to say about the 2nd bullet This problem can be easily reproduced in the latest (and any) version of Thunderbird.... just now I've created a bulleted list that looks like this: header
... which creates HTML exactly like this:
If it's not compelling that the ubiquitous Thunderbird generates this markup, it may be more satisfying to think of the Mozilla parallel standard as: "For brevity's sake, lists can also be items of other lists, and therefore we don't put them inside list item tags."

Regardless of how one interprets the markup, or what one thinks of the Mozilla standard, I am simply saying that the extra bullet shouldn't be there. Running the example above back through I hope that is enough to decide whether supporting Mozilla flavoured HTML is of appeal to this project. I appreciate your consideration so far in those 2 earlier issues, and we'd all be able to use
We treat it like
(opening a new list item implicitly to hold the content). I'm not too inclined to spend much more time trying to support invalid HTML. Why don't you ask Mozilla to fix their broken HTML instead?
Thanks @jgm for all the time you have spent so far. I saw last year when you asked the same question in #8150 (comment) and called it a "bug", but Mozilla originally & continuously supporting the abbreviated list style as acceptable HTML is a design decision: an error maybe, but not a casual one & not an oversight.

It's hard to find documentation on this issue because 1) neither Mozilla nor the Thunderbird developers who have continued the practice have advertised either the difference or their position about it, and 2) the language to search for it on the Internet is too ambiguous to target the issue. There are just lots of little reports of it like this one, hoping for various kinds of support for the unconventional list style: generally meeting with refusals for application support and comments like "your HTML is wrong" (which doesn't fix the operational problem for us or anyone else).

Mozilla won't correct for the problem because all browsers originally supported the legacy abbreviated list style, and have continuously ever since. (This fact in itself suggests a "feature incompleteness" for I'm not implying you're obligated to support it but the benefit to the overall community would be huge. HTML and other open source document formats should be readable for a hundred years to come, and we need something to recondition commercially broken document standards so they can be preserved. It seems an ideal choice for Note (also please for other readers) I've tried to find / configure a pre-parser for

In any case I wanted to make this one last request to see if you can fix this long-standing, somewhat intractable problem at the destination: considering it will never be fixed at the source. This would be consistent with
This is why we're getting the extra bullet. If
... then we can see
Put another way: this would be fixed if bare (unwrapped with |
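For readers who want a workaround on older versions, the wrapping idea described above can be sketched as a pre-processing step. This is only an illustration, not pandoc's actual fix: the function name `wrap_bare_nested_lists` is hypothetical, the item text is made up, and the sketch assumes the fragment is well-formed XML, which real Thunderbird output often is not (a lenient HTML parser would be needed there).

```python
import xml.etree.ElementTree as ET

LIST_TAGS = {"ul", "ol"}


def wrap_bare_nested_lists(root):
    """Move every list that is a direct child of another list
    into the <li> that precedes it (hypothetical helper)."""
    # Snapshot the elements first, because the tree is mutated while walking.
    for parent in list(root.iter()):
        if parent.tag not in LIST_TAGS:
            continue
        previous_item = None
        for child in list(parent):
            if child.tag == "li":
                previous_item = child
            elif child.tag in LIST_TAGS:
                index = list(parent).index(child)
                parent.remove(child)
                if previous_item is None:
                    # No preceding item: create an empty one to hold the list.
                    previous_item = ET.Element("li")
                    parent.insert(index, previous_item)
                previous_item.append(child)
    return root


# Mozilla-style input: the nested <ul> is a sibling of the <li> elements.
mozilla_html = "<ul><li>header</li><ul><li>a</li><li>b</li></ul><li>c</li></ul>"
fixed = wrap_bare_nested_lists(ET.fromstring(mozilla_html))
print(ET.tostring(fixed, encoding="unicode"))
# prints <ul><li>header<ul><li>a</li><li>b</li></ul></li><li>c</li></ul>
```

The nested list is attached to the preceding item rather than to a new, empty one, which is what avoids the extra bullet in the converted markdown.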
Yes, I understand that.
OK, it should now be fixed.
At least two other test cases in the issue queue show that HTML generated by Mozilla applications, not strictly standards compliant but recently supported by `pandoc`, is still generating markdown with a different structure than the original HTML: specifically an extra bullet when the indentation level deepens:
1 - #9161 (comment)
Most recently reported, and in the latest release. Under stdout (markdown): note that the `pandoc` output for the 2nd level list item has two bullets in front of it in the markdown: `- - a`
2 - #8150
An earlier test case that currently demonstrates the problem, and also acknowledges (@jgm @tarleb beginning at #8150 (comment)) that the posted Mozilla syntax should be supported due to "widespread" use.
But although the code has been changed so that `pandoc` now recognises this markup as a nested list, it still places a double bullet before the first more deeply nested item. Here's the current `pandoc` output from this same test case rendered as markdown: https://gist.github.com/rphair/0fc0e6a35389b039906d2490c872a2d6
Once the fix to #9161 is released, we should get this output also quoted in #8150 (comment), with tight spacing and without a double bullet in front of the item L3.1:
This problem can be verified by running the same test case as in #8150 (comment) - though the output looks different today after 0d7f80c fixed the bulk of the problem.
(First found this issue on Linux in `pandoc` version `3.1.1` and it still persists in the currently latest version `3.1.9` Debian package.)