Pandoc escapes characters over-aggressively when writing markdown #6259

khatchad · 2020-04-07T15:26:42Z

Suppose I want to use pandoc to convert between markdown flavors:

$ echo "# Header #1" | pandoc -t markdown

I get the following output with the #1 escaped:

Header \#1
==========

How do I prevent pandoc from doing this?

Version Info

$ pandoc --version
pandoc 2.5
Compiled with pandoc-types 1.17.5.4, texmath 0.11.2.2, skylighting 0.7.7
Default user data directory: /home/rk1424/.pandoc
Copyright (C) 2006-2018 John MacFarlane
Web:  http://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose


jgm · 2020-04-07T16:15:45Z

There isn't a way to prevent this at the moment, but the escaping isn't "incorrect," only unnecessary. This is perfectly valid markdown. # is escaped to avoid interpretation as a markdown control character. For example, there's a difference between

# Header #

which yields

<h1>Header</h1>

and

# Header \#

which yields

<h1>Header #</h1>

In this particular case the escaping isn't necessary. Pandoc could probably be smarter about detecting such cases, but this isn't a bug.
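To make the distinction above concrete, here is a minimal sketch of how an ATX heading's optional closing `#` sequence might be handled. It is a simplified illustration, not pandoc's parser; the function name `atx_heading_text` and both regular expressions are assumptions made for this example.

```python
import re

# Simplified ATX heading pattern: 1-6 '#', whitespace, then the content.
# Illustration only; this is NOT pandoc's actual implementation.
ATX = re.compile(r"^(#{1,6})[ \t]+(.*?)[ \t]*$")

# Optional closing sequence: a run of '#' at the end of the content that is
# preceded by whitespace (or makes up the whole content).
CLOSING = re.compile(r"(?:^|[ \t]+)#+$")


def atx_heading_text(line):
    """Return the heading text of a single ATX heading line, or None."""
    m = ATX.match(line)
    if m is None:
        return None
    content = CLOSING.sub("", m.group(2))
    # A backslash-escaped '#' is literal text, so resolve it last.
    return content.replace("\\#", "#").strip()


print(atx_heading_text("# Header #"))     # closing sequence is dropped
print(atx_heading_text(r"# Header \#"))   # escaped '#' is kept as text
print(atx_heading_text("# Header #1"))    # '#1' is not a closing sequence
```

The third call shows why the escape in the original report is unnecessary: `#1` is followed by a character, so it cannot be read as a closing sequence in this simplified model.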

khatchad · 2020-04-07T17:47:41Z

Thanks for the feedback. I changed the title, but I still think whether this is a bug remains an open question, for one reason:

[Markdown's] key design goal is readability – that the language be readable as-is, without looking like it has been marked up with tags or formatting instructions ... -- Markdown, https://en.wikipedia.org/w/index.php?title=Markdown&oldid=946233394 (last visited Apr. 7, 2020).

Thus, the readability of the produced markdown is questionable.

jgm · 2020-04-07T19:31:20Z

I certainly agree that it would be better not to put in backslashes except when necessary.
We could try to improve the heuristics the markdown writer is currently using.

jayrobwilliams · 2020-09-25T15:37:06Z

Depending on your use case, I've found a potential workaround for the time being. I'm using R Markdown to render .Rmd files to .md for Jekyll. Suppose you create a file fn.md containing:

This is a footenote`[^1]:`{=html}.

`[^1]:`{=html} And here is the content of the footnote.

If you then render it to gfm, pandoc passes the raw 'html' through and drops the attribute tags, so pandoc fn.md -t gfm results in:

This is a footenote[^1]:.

[^1]: And here is the content of the footnote.

effectively preventing the markdown from getting backslash escaped, and giving me working footnotes for Jekyll. The key is to render it to gfm, because regular markdown will keep the attribute tags; pandoc fn.md -t markdown yields:

This is a footenote`[^1]:`{=html}.

`[^1]:`{=html} And here is the content of the footnote.

laoshaw · 2021-05-16T16:41:33Z

It does the same thing when converting from HTML to markdown. It does not break anything, but it's truly an eyesore.

Is there a list of what it escapes, so I can use sed or similar to remove the added escapes (\#, \., \[, \], ...) as a second processing stage?

jgm · 2021-05-16T18:49:52Z

<, >, \, `, *, _, [, ], #
@ if citations extension is enabled
| if pipe_tables enabled
^ if superscript enabled
~ if strikeout or subscript enabled
$ if tex_math_dollars enabled
. (when followed by .), ", ', - (when followed by -) if smart enabled
_ if necessary
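As a rough illustration of the second-stage cleanup asked about above, the sketch below removes only the `\#` escapes that are clearly unnecessary: those that sit mid-line and are directly followed by a word character, as in `\#1`. The function name `unescape_midline_hashes` is hypothetical, and the approach is deliberately conservative. It does not know about code blocks or code spans, so it should be treated as a starting point rather than a safe general-purpose tool.

```python
import re

# '\#' directly followed by a word character, and not itself preceded by a
# backslash (so an escaped backslash followed by '#' is left alone).
_ESCAPED_HASH = re.compile(r"(?<!\\)\\#(?=\w)")


def unescape_midline_hashes(text):
    """Drop backslashes before '#' where the '#' cannot act as heading syntax.

    Assumption: the input is pandoc-written markdown without literal
    backslash-hash sequences inside code; code is NOT detected or skipped.
    """
    out = []
    for line in text.splitlines(keepends=True):
        stripped = line.lstrip()
        indent = line[: len(line) - len(stripped)]
        if stripped.startswith("\\#"):
            # Keep an escape at the start of a line: unescaped, the '#'
            # could be read as the start of a heading by some parsers.
            head, rest = stripped[:2], stripped[2:]
        else:
            head, rest = "", stripped
        out.append(indent + head + _ESCAPED_HASH.sub("#", rest))
    return "".join(out)


print(unescape_midline_hashes("Header \\#1\n==========\n"))
```

Escapes for the other characters in the list (`*`, `_`, `[`, `]` and so on) are left untouched, because removing them blindly can change how the text is parsed.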

jgm · 2021-05-16T19:01:27Z

@jayrobwilliams this seems unduly complex, given that pandoc has built in support for this style of footnotes. Have you tried -t gfm+footnotes?


jgm · 2021-05-16T19:26:02Z

I've pushed a change that should reduce unnecessary escapes for # and >.

laoshaw · 2021-05-17T00:11:20Z

> <, >, \, `, *, _, [, ], #
> @ if citations extension is enabled
> | if pipe_tables enabled
> ^ if superscript enabled
> ~ if strikeout or subscript enabled
> $ if tex_math_dollars enabled
> . (when followed by .), ", ', - (when followed by -) if smart enabled
> _ if necessary

I think ., +, and - are also affected, and fenced code blocks seem to be escaped unnecessarily as well.

jgm · 2021-05-17T00:30:35Z

+ is only escaped if it occurs at the start of a line (and is followed by whitespace), because unescaped it would start a list.

- is only escaped in a potential list context (see + above) or in the context -- (where it would be an en dash if smart is enabled)

., as noted, is only escaped in the context .. (if smart is enabled).
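The three context rules above can be written down as a small predicate. This is only a sketch of the rules as stated in this comment, not pandoc's writer code; the function name `needs_escape` and the `smart` parameter are assumptions for illustration.

```python
def needs_escape(line, i, smart=True):
    """Return True if line[i] would need a backslash under the rules above.

    Only '+', '-' and '.' are modelled; every other character returns False.
    """
    ch = line[i]
    nxt = line[i + 1] if i + 1 < len(line) else ""
    at_line_start = line[:i].strip() == ""

    # '+' or '-' at the start of a line and followed by whitespace would
    # otherwise start a bullet list.
    if ch in "+-" and at_line_start and nxt in (" ", "\t"):
        return True

    # With 'smart' enabled, '--' and '..' would be read as smart punctuation.
    if smart and ch in "-." and nxt == ch:
        return True

    return False


print(needs_escape("+ item", 0))   # list context
print(needs_escape("a + b", 2))    # mid-line plus sign
print(needs_escape("a -- b", 2))   # first dash of '--' with smart enabled
```

Under these rules a `+` or `-` in running text and a single `.` at the end of a sentence are never escaped, which matches the description in the comment.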

jayrobwilliams · 2021-05-19T15:34:38Z

> @jayrobwilliams this seems unduly complex, given that pandoc has built in support for this style of footnotes. Have you tried -t gfm+footnotes?

@jgm works perfectly! I totally missed in the documentation that you can append extensions to output formats; I didn't even think to look for that, since footnotes work natively with standard markdown. Thank you!

aslmx · 2021-10-31T10:43:32Z

I'm also having issues with character escaping.

I want my template to allow for the inclusion of a PDF that is placed before the PDF produced from the markdown.

To keep it simple (it could be optimized), I have two variables: one that is checked to decide whether the pages should be included, and a second for the file name. (The use case is to include the assignment you are solving, which explains the variable names ;))

assignment:
  include: 1 
  file: "assignment/task_2.pdf"

I have the following code in the template, using the package pdfpages

%debug: include assignment? $assignment.include$ $assignment.file$ 

$if(assignment.include)$
% include assignment seems on?
\includepdf[pages=-]{$assignment.file$}
$else$
%include assignment was off?
$endif$

However, if the filename contains an underscore, as shown above, the intermediate .tex will have it escaped,

like assignment/task\_2.pdf

The conversion to PDF then fails.

It works fine without underscores.

I have not yet found out how to either unescape this in LaTeX (suboptimal, I'd say) or (better) keep pandoc from escaping it in the first place.

Any ideas? Is it possible?

I have tried putting the variable value in single quotes, double quotes, backticks (` )... to no avail.

Thanks

jgm · 2021-10-31T16:46:30Z

There's an easy solution in your case @aslmx :

assignment:
  include: 1 
  file:  '`assignment/task_2.pdf`{=latex}'

(This is the "raw attribute" and will cause the content to be passed to LaTeX unmodified.)

aslmx · 2021-11-02T07:34:38Z

> This is the "raw attribute" and will cause the content to be passed to LaTeX unmodified.

Thanks. I was looking for something like this. I will try it and report back if it does not work; I assume it works fine.

thanks & br


jeffkimbrel · 2023-03-08T19:59:42Z

In case anyone finds this useful: it wasn't immediately obvious to me how to get @jayrobwilliams's solution to work for multiline code, as I was initially trying to add the {=html} after the closing three backticks. Adding it after the opening three backticks, as though you were specifying the syntax highlighting, works for me and keeps the lines separate.

For example, a multiline quote placed as normal markdown gets collapsed to one line...

> line 1
> line 2

turns to

> line1 line2

But this works correctly after pandoc conversion...

```{=html}
> test 1
> test 2
```

W1Real · 2023-08-21T20:24:48Z

For anyone looking for a simple compromise or workaround: if your goal is just readability, you can use rst (reStructuredText). Be warned that someone mentioned it doesn't deal well with links; I haven't tested links. For a plain .docx (text only, no formatting) it worked well for me.

Shout out to this StackOverflow answer to the question Pandoc Markdown to Plain Text Formatting: https://stackoverflow.com/a/61622727

khatchad changed the title ~~Pandoc incorrectly escapes characters when converting to markdown from markdown~~ Pandoc undesirably escapes characters when converting to markdown from markdown Apr 7, 2020

mb21 added format:Markdown writer labels Apr 7, 2020

cderv mentioned this issue Jul 14, 2020

Square brackets in md_document rstudio/rmarkdown#1860

Closed

3 tasks

jgm changed the title ~~Pandoc undesirably escapes characters when converting to markdown from markdown~~ Pandoc escapes characters over-aggressively when writing markdown May 16, 2021

jgm added a commit that referenced this issue May 16, 2021

Markdown writer: fewer unneeded escapes for #.

5a6399d

See #6259.

aslmx mentioned this issue Nov 2, 2021

Include PDF (e.g. for assignments) wbh-community/pandoc-wbh-template#22

Merged

jgm closed this as completed Nov 2, 2021

aslmx pushed a commit to aslmx/pandoc-wbh-template that referenced this issue Nov 3, 2021

integrate changes as suggested in pandoc issue: jgm/pandoc#6259 (comm…

4a02bf9

…ent)

jgm mentioned this issue Dec 2, 2021

round tripping escaped atx header syntax can result in headers where none were intended #7726

Closed
