Incorrect smart quote conversion for input HTML with markup #3424

lisaah · 2017-02-06T04:40:18Z

Version: 1.19.2.1
Command: pandoc -o output.html input.html -S

input.html

"Hello world..." "Hello world..."

"Hello world..." "Hello world..."

output.html

"Hello world…" “Hello world…”

"Hello world…" “Hello world…”

Expected:

“Hello world…” “Hello world…”

“Hello world…” “Hello world…”

Having any HTML tag in the input seems to break the smart quote conversion. The ellipsis conversion seems fine. The smart quotes convert correctly when a Markdown version of this is used (e.g. "Hello *world*..." "Hello world..."). Am I missing something or is this a bug?

jgm · 2017-02-06T09:08:22Z

As a workaround, try converting to Markdown without --smart, then converting the result with --smart.
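A rough sketch of that two-pass workaround, driven from Python. It assumes pandoc 1.x (where `--smart`, the long form of `-S`, exists) is on the PATH; the intermediate file name and the helper names `workaround_commands` and `run_workaround` are made up for illustration.

```python
import subprocess


def workaround_commands(src="input.html", tmp="intermediate.md", dst="output.html"):
    """Build the two pandoc invocations for the two-pass workaround."""
    # Pass 1: HTML -> Markdown with smart punctuation off, so quotes stay plain.
    to_markdown = ["pandoc", "-f", "html", "-t", "markdown", "-o", tmp, src]
    # Pass 2: Markdown -> HTML with --smart; the Markdown reader sees the
    # quotes and the emphasized text as one run and can pair the quotes.
    to_html = ["pandoc", "--smart", "-f", "markdown", "-t", "html", "-o", dst, tmp]
    return [to_markdown, to_html]


def run_workaround(**paths):
    """Run both passes in order; raises CalledProcessError if pandoc fails."""
    for argv in workaround_commands(**paths):
        subprocess.run(argv, check=True)
```

Calling `run_workaround()` with no arguments uses the file names from the report; pass `src=`, `tmp=`, or `dst=` to override them.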

lisaah · 2017-02-07T02:31:24Z

Hah, yep, that workaround will do for now. Thanks for looking into it!

jgm · 2019-01-11T05:34:51Z

The issue is that the HTML reader applies smartPunctuation only when parsing tag contents, so it only works, as it were, between tags. The reasons for this are a bit complex: the HTML reader parses a stream of tokens produced by an HTML5 tokenizer. Our existing smart punctuation code operates on strings, so we can't run it over that token stream, but we can run it on the tag contents.
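To make the failure mode concrete, here is a toy model in Python (not pandoc's Haskell code). It assumes the input had an `<em>` around "world", mirroring the `*world*` in the Markdown example; `smart`, `split_tokens`, and `smart_per_text_token` are hypothetical names, and the regex is only a stand-in for the real quote rules.

```python
import re


def smart(text):
    """Toy smart punctuation: curl a double-quoted span only when both
    quotes are in the same string, then convert '...' to an ellipsis."""
    # An opening quote must not be followed by whitespace.
    text = re.sub(r'"(?!\s)([^"]*)"', r"“\1”", text)
    return text.replace("...", "…")


def split_tokens(html):
    """Crude tokenizer: alternate tag tokens and text tokens."""
    return [t for t in re.split(r"(<[^>]+>)", html) if t]


def smart_per_text_token(html):
    """Apply smart() to each text token in isolation, leaving tags alone."""
    return "".join(
        t if t.startswith("<") else smart(t) for t in split_tokens(html)
    )


source = '"Hello <em>world</em>..." "Hello world..."'
print(smart_per_text_token(source))
# → "Hello <em>world</em>…" “Hello world…”
```

The first quoted phrase is split into the text tokens `"Hello `, `world`, and `..." "Hello world..."`. The opening quote never shares a string with its closing quote, so it stays straight, while the ellipsis and the second, tag-free phrase convert as expected.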

One could ~~simply~~ duplicate smart punctuation parsing logic using token parsers in the HTML reader, for a better solution. (Crossed out 'simply' because it's not that simple; I guess it would require splitting tag contents so that quotes were separately recognizable tokens.)
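A minimal sketch of that idea, again as a Python toy rather than anything from the pandoc source: text is split so that each double quote is its own token, and quotes are then paired across the whole token stream, tags included. The names `tokenize` and `smart_over_tokens` are hypothetical, and the pairing rule is deliberately simplistic.

```python
import re


def tokenize(html):
    """Split into tag tokens, lone double-quote tokens, and text runs."""
    return [t for t in re.split(r'(<[^>]+>|")', html) if t]


def smart_over_tokens(html):
    """Pair double quotes across the token stream, ignoring tag boundaries."""
    tokens = tokenize(html)
    open_at = None  # index of an opening quote still waiting for its partner
    for i, tok in enumerate(tokens):
        if tok != '"':
            continue
        if open_at is not None:
            # Close the pending quote: curl both ends.
            tokens[open_at], tokens[i] = "“", "”"
            open_at = None
        else:
            # A quote opens only if the next non-tag token does not
            # start with whitespace.
            following = next(
                (t for t in tokens[i + 1:] if not t.startswith("<")), ""
            )
            if following and not following[0].isspace():
                open_at = i
    # Ellipses are still handled per text token; tags pass through untouched.
    return "".join(
        t if t.startswith("<") else t.replace("...", "…") for t in tokens
    )


print(smart_over_tokens('"Hello <em>world</em>..." "Hello world..."'))
# → “Hello <em>world</em>…” “Hello world…”
```

An opening quote that never finds a partner is left straight. Nested quotes, single quotes, and apostrophes are ignored here, which is part of why the real fix is not that simple.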

jgm added bug format:HTML reader labels Feb 6, 2017

fiapps mentioned this issue May 9, 2018

Markdown: problem parsing single quotes after raw LaTeX #4637

Closed

jgm mentioned this issue Jan 11, 2019

Emphasis in titles trips up double→single quotation marks conversion if using JSON db jgm/pandoc-citeproc#373

Closed
