$ symbol in Display Math #7942

Closed
PassionPenguin opened this issue Feb 26, 2022 · 11 comments

@PassionPenguin

Explain the problem.

I've written a Lua filter that converts math to SVG using LaTeX and dvisvgm. Because LaTeX cannot handle CJK characters inside mhchem or chemfig commands, I have to wrap those characters in \text{}, and because \text is a math-mode control sequence, I then have to wrap it in $...$ again.

With that wrapping the code compiles in LaTeX, but it raises a new problem: pandoc seems to parse inline math before display math, so the block as a whole is not recognized as display math and the output looks like this:

<p>$$ \ce{<span
class="math inline">$\underset{\ce{CO2}}{\ce{6[1C]}}$</span> + 6[5C]
-&gt; <span
class="math inline">$\underset{\text{3-磷酸甘油酸}}{\ce{12[3C]}}$</span>}\
\ce{12[3C] -&gt;[ATP] <span
class="math inline">$\underset{\text{1,3-二磷酸甘油酸}}{\ce{12[3C]}}$</span>}\
\ce{12[3C] -&gt;[NaDPH] <span
class="math inline">$\underset{\text{3-磷酸甘油醛/二羟丙酮磷酸}}{\ce{12[3C]}}$</span>}\
\ce{<span
class="math inline">$\underset{3-磷酸甘油醛}{\ce{2[3C]}}$</span>
-&gt;[\text{酶}] <span
class="math inline">$\underset{\text{葡萄糖}}{\ce{1[6C]}}$</span>}\
\ce{10[3C] -&gt;[\text{酶}][ATP] 6[5C]} $$</p>

Reproduce

tmp.md:

$$
\ce{$\underset{\ce{CO2}}{\ce{6[1C]}}$ + 6[5C] -> $\underset{\text{3-磷酸甘油酸}}{\ce{12[3C]}}$}\\
\ce{12[3C] ->[ATP] $\underset{\text{1,3-二磷酸甘油酸}}{\ce{12[3C]}}$}\\
\ce{12[3C] ->[NaDPH] $\underset{\text{3-磷酸甘油醛/二羟丙酮磷酸}}{\ce{12[3C]}}$}\\
\ce{$\underset{3-磷酸甘油醛}{\ce{2[3C]}}$ ->[\text{酶}] $\underset{\text{葡萄糖}}{\ce{1[6C]}}$}\\
\ce{10[3C] ->[\text{酶}][ATP] 6[5C]}
$$

shell command:

pandoc --lua-filter=filters\texsvg.lua -s 'tmp.md' -o tmp.html -f commonmark_x

Pandoc version?
pandoc.exe 2.17.1.1
Compiled with pandoc-types 1.22.1, texmath 0.12.4, skylighting 0.12.2,
citeproc 0.6.0.1, ipynb 0.2
User data directory: C:\Users\Hoarfroster\AppData\Roaming\pandoc
Copyright (C) 2006-2022 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.

@jgm
Owner

jgm commented Feb 26, 2022

I believe the real issue here is the \ce command, which pandoc/texmath doesn't understand. (texmath would have to know that this is a command that can occur in math mode but whose contents are interpreted as text mode...) What is this command, and what package defines it?

@PassionPenguin
Author

It's actually mhchem's control sequence \ce.

Thanks for your answer.

I have another reproducible case, as follows:

tmp.md

$$
\begin{align}
This is $\text{Hello World}$
\end{align}
$$

sh

pandoc --lua-filter=filters\texsvg.lua -s 'tmp.md' -o tmp.html -f commonmark_x

tmp.html

...
<p>$$ \begin{align} This is <span class="math inline">Hello World</span>
\end{align} $$</p>
...

@PassionPenguin
Author

If there is no $ inside the $$...$$ block, as below, it is parsed as DisplayMath correctly:

$$
\begin{align}
This is Hello World
\end{align}
$$
<p><span class="math display">$$
\begin{align}
This is Hello World
\end{align}
$$</span></p>

$ is needed with several packages (chemfig, mhchem) because they do not seem to support non-ASCII characters, so I have to use math-mode \text{} or something similar to wrap those characters.

Currently I have to change all of this code to CodeBlock elements, which is a lot of work.

@jgm
Owner

jgm commented Feb 27, 2022

This isn't valid LaTeX, though:

$$
\begin{align}
This is $\text{Hello World}$
\end{align}
$$

Try that with pdflatex.
Even if you remove the $$s it is not valid.

@PassionPenguin
Author

\begin{align}
This is $\text{Hello World}$
\end{align}

Sorry for the wrong example.

Anyway, using $ directly inside \ce or \chemfig is valid in LaTeX, but pandoc does not handle it.

@PassionPenguin
Author

I found something interesting:

markdown:

$$
\schemestart
\chemfig{*6([0,.5]------)} \arrow{->[$\text{Hello}$]}
\schemestop
$$
| sh 1 | sh 2 |
| --- | --- |
| pandoc -s tmp.md -o tmp.html | pandoc -s tmp.md -o tmp.html -f commonmark_x |
| math-display | math-inline |

sh 1
<p><span class="math display">$$
\schemestart
\chemfig{*6([0,.5]------)} \arrow{-&gt;[$\text{Hello}$]}
\schemestop
$$</span></p>
sh 2
<p>$$ \schemestart \chemfig{*6([0,.5]——)} \arrow{-&gt;[<span
class="math inline">Hello</span>]} \schemestop $$</p>

What causes the problem?


The code above is valid LaTeX once expanded into a full document:

\documentclass[12pt,preview]{standalone}
\usepackage{ctex}
\usepackage[version=4]{mhchem}
\usepackage{chemfig}
\usepackage[libertine]{newtxmath}

\begin{document}
\begin{preview}
\title{Hello World}
\author{Hoarfroster}


\schemestart
\chemfig{*6([0,.5]------)} \arrow{->[$\text{Hello}$]}
\schemestop

\end{preview}
\end{document}

@jgm
Owner

jgm commented Feb 28, 2022

mhchem isn't supported yet; see #6668 and give the filter there a try.

@jgm
Owner

jgm commented Feb 28, 2022

OK, now I understand your issue better. The markdown reader doesn't cause the problem; it's just commonmark with the tex_math_dollars extension. The following cases illustrate the problem:

% pandoc -f commonmark+tex_math_dollars -t native
$$
\text{Hi $x$}
$$
^D
[ Para
    [ Str "$$"
    , SoftBreak
    , Str "\\text{Hi"
    , Space
    , Math InlineMath "x"
    , Str "}"
    , SoftBreak
    , Str "$$"
    ]
]
% pandoc -f commonmark+tex_math_dollars -t native
$$
\text{Hi}
$$
^D
[ Para [ Math DisplayMath "\n\\text{Hi}\n" ] ]

@jgm
Owner

jgm commented Feb 28, 2022

I'm going to transfer this to commonmark-hs, because that's where the issue lies.

jgm added a commit to jgm/commonmark-hs that referenced this issue Feb 28, 2022
...embedded inline math.  See jgm/pandoc#7942.

Note, however, that there's a larger issue this doesn't
solve.  For in principle you could have embedded math
in inline math, e.g.

```
$\text{hi $x$ there}$
```

Pandoc's markdown parser gets this right, but that's
because it uses a LaTeX tokenizer.
@jgm
Owner

jgm commented Feb 28, 2022

I've added a fix for this in commonmark-extensions, but it's not perfect. Looking back at pandoc's LaTeX reader, I see that we have code that counts groupings:

dollarsMath :: PandocMonad m => LP m Inlines
dollarsMath = do
  symbol '$'
  display <- option False (True <$ symbol '$')
  (do contents <- try $ untokenize <$> pDollarsMath 0
      if display
         then mathDisplay contents <$ symbol '$'
         else return $ mathInline contents)
   <|> (guard display >> return (mathInline ""))

-- Int is number of embedded groupings
pDollarsMath :: PandocMonad m => Int -> LP m [Tok]
pDollarsMath n = do
  tk@(Tok _ toktype t) <- anyTok
  case toktype of
       Symbol | t == "$"
              , n == 0 -> return []
              | t == "\\" -> do
                  tk' <- anyTok
                  (tk :) . (tk' :) <$> pDollarsMath n
              | t == "{" -> (tk :) <$> pDollarsMath (n+1)
              | t == "}" ->
                if n > 0
                then (tk :) <$> pDollarsMath (n-1)
                else mzero
       _ -> (tk :) <$> pDollarsMath n

This could be ported over to commonmark-hs.
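
As a rough illustration of the grouping-count idea in the Haskell above, here is a minimal character-level sketch in Python. It is not the code that went into commonmark-extensions. The function name `scan_dollars_math` and its return shape are invented for this example, and it works on raw characters rather than LaTeX tokens, so treat it as an approximation under those assumptions.

```python
from typing import Optional, Tuple


def scan_dollars_math(src: str, start: int = 0) -> Optional[Tuple[str, str, int]]:
    """Scan one $...$ or $$...$$ span beginning at src[start].

    Hypothetical helper: returns (kind, contents, end_index), or None if
    no well-formed math span starts at this position.
    """
    if not src.startswith("$", start):
        return None
    display = src.startswith("$$", start)
    i = start + (2 if display else 1)
    body_start = i
    depth = 0  # number of '{' groups opened and not yet closed
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            # A backslash escapes the next character, so \$, \{ and \}
            # do not affect delimiter matching or the depth count.
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                return None  # unbalanced closing brace: give up
            depth -= 1
        elif ch == "$" and depth == 0:
            contents = src[body_start:i]
            if not display:
                return ("inline", contents, i + 1)
            # Display math must be closed by a second '$'.
            if src.startswith("$$", i):
                return ("display", contents, i + 2)
            return None
        i += 1
    return None  # reached the end without a closing delimiter
```

On the display example from this thread, the embedded `$x$` sits at depth 1 inside `\text{...}`, so the scan passes over it and only the final `$$`, at depth 0, closes the span. The same depth check lets `$\text{hi $x$ there}$` come back as a single inline span.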

jgm closed this as completed in f387d9b Feb 28, 2022

@PassionPenguin
Author

That's it. Thanks for addressing the problem!

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jan 25, 2023
0.2.3.3
* Fix definition_lists extension (#96). We were not properly consuming
  indentation in definitions, which caused problems when the definitions
  themselves contained lists.

0.2.3.2
* Update lower version bounds for commonmark (#93, David Thrane
  Christiansen).

0.2.3.1
* math extension: don't fail when display math contains embedded inline
  math. See jgm/pandoc#7942.
* Make math parsing more sophisticated. Count embeddings inside {..}, since
  math can contain e.g. \text{...} which itself contains math delimiters.
* Small improvement in pipe table parsing. The old parser failed on some
  edge cases with extra whitespace after pipes (which we should just
  ignore).
* fancy_list extension: improve list type ambiguity resolution (#89).