
Unescaped characters in asciidoc output #2337

Open
tsagkase opened this Issue Jul 30, 2015 · 8 comments


tsagkase commented Jul 30, 2015

Pandoc 1.15.0.6 doesn't correctly escape special characters in asciidoc output:

$ echo '<a href="http://example.com">][</a>' | pandoc -f html -t asciidoc
http://example.com[][]

which asciidoc renders back as:

$ echo '<a href="http://example.com">][</a>' | pandoc -f html -t asciidoc|asciidoc - |grep example\.com
<div class="paragraph"><p><a href="http://example.com">http://example.com</a>[]</p></div>

Unfortunately, the rules for escaping asciidoc special characters are complex, and I cannot point to a single place in the asciidoc documentation that describes them. The general rule is that the backslash (\) is used as the escape character. So with correct escaping:

$ echo '<a href="http://example.com">\][</a>' | pandoc -f html -t asciidoc|asciidoc - |grep example\.com
<div class="paragraph"><p><a href="http://example.com">][</a></p></div>
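A writer could handle this particular case by backslash-escaping `]` inside the link text of the URL macro. The Python sketch below is only an illustration. The function name `asciidoc_link` is hypothetical and is not pandoc's writer code. It also assumes, based solely on the example above, that asciidoc honours a backslash before `]` in that position. As later comments in this thread show, backslash escapes are not honoured everywhere.

```python
def asciidoc_link(url: str, text: str) -> str:
    """Render an asciidoc URL macro, escaping ']' in the link text.

    Hypothetical helper. Assumption (from the example in this issue):
    a backslash before ']' acts as an escape inside link text, so
    '][' becomes '\\][' and no longer closes the macro early.
    """
    escaped = text.replace("]", "\\]")
    return f"{url}[{escaped}]"


print(asciidoc_link("http://example.com", "]["))  # http://example.com[\][]
```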


@jgm jgm added the bug label Nov 20, 2015

@jgm jgm added the writer label Mar 11, 2017

@jgm jgm added this to the pandoc 2.0 milestone Mar 11, 2017

Owner

jgm commented May 6, 2017

It would be easy to escape all these special characters, but the output would likely be ugly.
Not sure it's worth it if these cases are rare...

@jgm jgm removed this from the pandoc 2.0 milestone May 6, 2017

mako4 commented Apr 10, 2018

Escaping with backslashes is not that easy in asciidoc, because it is very picky: it only accepts a backslash escape in exactly those cases where it would otherwise recognize markup (with exceptions), and elsewhere it renders the backslash literally. (I'm using asciidoctor as the reference here; I haven't tried the original implementation.)

For example, escaping <<...>> so that asciidoc does not render it as an in-document reference:

\<<not a proper reference>

\<<proper reference>>

renders a literal backslash in the first line:

\<<not a proper reference>
<<proper reference>>

Edit: Apparently there is a much more reliable way to do this with passthroughs: ++<<++proper reference>> works just fine. This is the unconstrained version of the +...+ passthrough markers. The relevant section of the documentation is "Escaping unconstrained quotes".
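A minimal Python sketch of that trick follows. It wraps every literal `<<` in an unconstrained `++...++` passthrough so asciidoc never sees it as the start of a cross reference. The helper name `protect_xref_openers` is hypothetical. The sketch assumes the surrounding text does not already contain `++` markers, and it handles only `<<`, not other special sequences.

```python
def protect_xref_openers(text: str) -> str:
    """Wrap each literal '<<' in an unconstrained passthrough (++<<++).

    Hypothetical helper following the ++<<++ trick described above.
    Assumption: the input contains no existing '++' markers that the
    added passthroughs could collide with.
    """
    return text.replace("<<", "++<<++")


print(protect_xref_openers("<<proper reference>>"))  # ++<<++proper reference>>
```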

lisa commented Jan 9, 2019

I believe I have two more instances of this, with this mediawiki input:

# pandoc-mediawiki-asciidoc-bug.mediawiki file
Syntax defect begin <code>[a-zA-Z_][a-zA-Z0-9_]*</code> (syntax defect middle <code>__</code>) syntax defect near-end <code>[a-zA-Z_:][a-zA-Z0-9_:]*</code>. syntax defect end.

I have used variations of the phrase "syntax defect" as a way to sanitize and minimize the real-life source, and to illustrate the defect. Converting the file with pandoc -s -f mediawiki pandoc-mediawiki-asciidoc-bug.mediawiki -t asciidoc provides this output:

Syntax defect begin `[a-zA-Z_][a-zA-Z0-9_]*` (syntax defect middle `__`)
syntax defect near-end `[a-zA-Z_:][a-zA-Z0-9_:]*`. syntax defect end.

There are two escape issues with the output identified below with ^ characters:

Syntax defect begin `[a-zA-Z_][a-zA-Z0-9_]*` (syntax defect middle `__`)
                                          ^                         ^
syntax defect near-end `[a-zA-Z_:][a-zA-Z0-9_:]*`. syntax defect end.

Each character marked with ^ should be preceded by a literal \ to escape it, as in:

Syntax defect begin `[a-zA-Z_][a-zA-Z0-9_]\*` (syntax defect middle `\__`)
syntax defect near-end `[a-zA-Z_:][a-zA-Z0-9_:]*`. syntax defect end.

To summarize: I believe there are two separate defects broadly related to unescaped characters:

  1. The two regular expressions appear to interact with one another, with the * character in the first regex appearing to act as a bold start and the * in the second regex acting as the bold end.
  2. The __ appears to act as a single _ when it should be treated as a literal __ because it is between <code></code> mediawiki tags.

Version information:

pandoc 2.5
Compiled with pandoc-types 1.17.5.4, texmath 0.11.1.2, skylighting 0.7.4

MacOS 10.13.6, pandoc installed via homebrew.

Owner

jgm commented Jan 9, 2019

Asciidoc is crazy!!
With this input

`[0-9]*`
`[0-9]*`

asciidoctor gives you

<code><strong class="0-9"></code>
<code>[0-9]</strong></code>

which isn't even well-formed HTML.
But with

`0-9*`
`0-9*`

you get

<code>0-9*</code>
<code>0-9*</code>

I have to believe this is a bug in asciidoctor and not the intended behavior. I'm not going to try to work around all these quirks.

Owner

jgm commented Jan 9, 2019

Even worse, if you try to escape the *s in the first example above

`[0-9]\*`
`[0-9]\*`

you get

<code>[0-9]*</code>
<code>[0-9]\*</code>

The first backslash acts as an escape and the second one doesn't!
If this is intentional, it's an insane design decision. How are users supposed to keep track of what a backslash does in these contexts??

lisa commented Jan 9, 2019

@jgm Would it help if we opened an issue about this with the upstream project, or supported you (as the owner of this repo) in that endeavour?

Owner

jgm commented Jan 9, 2019

@lisa If you'd like to inquire upstream about whether this is intended behavior, and ask them to clarify the escaping rules, that would be great.

mako4 commented Feb 4, 2019

Passthrough quotes fix this as well:

`++[0-9]*++`
`++[0-9]*++`

will produce the intended output.
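A writer could apply this uniformly by emitting every inline code span as `` `++...++` ``, so characters such as `*` and `__` inside the span are never parsed as formatting. This covers both cases in the mediawiki report above. The Python sketch below is only an illustration, and `asciidoc_code_span` is a hypothetical name, not pandoc's writer code. It assumes the code text never contains `++` itself, because that would close the passthrough early. Such input is rejected here rather than handled.

```python
def asciidoc_code_span(code: str) -> str:
    """Render inline code as `++...++` so its contents pass through verbatim.

    Hypothetical helper. Assumption: the code text does not contain '++',
    which would terminate the unconstrained passthrough early; that case
    raises ValueError instead of being handled.
    """
    if "++" in code:
        raise ValueError("code span contains '++'; needs a different strategy")
    return f"`++{code}++`"


for snippet in ["[a-zA-Z_][a-zA-Z0-9_]*", "__", "[a-zA-Z_:][a-zA-Z0-9_:]*"]:
    print(asciidoc_code_span(snippet))
```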

It is still a bug in asciidoctor, though, since its output isn't well-formed HTML.
