LaTeX to HTML: 'align' environment is incorrectly converted to 'aligned' #7968

Closed
SunderB opened this issue Mar 16, 2022 · 6 comments

SunderB commented Mar 16, 2022

Explain the problem.
I'm attempting to convert some LaTeX files to HTML (using MathJax for maths rendering) and am trying to get AMS equation numbering to work. I noticed in the generated HTML files that pandoc had converted align environments to aligned, which seems to break the AMS numbering. In researching this issue I also found a Stack Exchange post about it: https://tex.stackexchange.com/questions/561133/how-to-prevent-pandoc-from-converting-align-environment-to-aligned-environment.

Example:
Taken from https://tex.stackexchange.com/questions/561133/how-to-prevent-pandoc-from-converting-align-environment-to-aligned-environment. Try it yourself on the pandoc website.

Input:

\documentclass{article}
\usepackage{amsmath}
\title{Demo}
\begin{document}

\begin{align}
1 + 0 & = 1, \label{eq1} \\
1 + 1 & = 2. \label{eq2}
\end{align}

Equation \( \eqref{eq1} \) and \( \eqref{eq2} \) describe eternal
truths.

\end{document}

Output:

<p><span class="math display">\[\begin{aligned}
1 + 0 &amp; = 1, \label{eq1} \\
1 + 1 &amp; = 2. \label{eq2}\end{aligned}\]</span></p>
<p>Equation <span class="math inline">\(\eqref{eq1}\)</span> and <span
class="math inline">\(\eqref{eq2}\)</span> describe eternal truths.</p>

Expected output:

<p><span class="math display">\[\begin{align}
1 + 0 &amp; = 1, \label{eq1} \\
1 + 1 &amp; = 2. \label{eq2}\end{align}\]</span></p>
<p>Equation <span class="math inline">\(\eqref{eq1}\)</span> and <span
class="math inline">\(\eqref{eq2}\)</span> describe eternal truths.</p>

Pandoc version?
pandoc 2.17.1.1 on Windows 10

SunderB added the bug label Mar 16, 2022

jgm (Owner) commented Mar 16, 2022

Actually your expected output is INVALID LaTeX, even though MathJax seems to accept it:

\[\begin{align}
1 + 0 &amp; = 1, \label{eq1} \\
1 + 1 &amp; = 2. \label{eq2}\end{align}\]

You can't put the align environment inside math mode (\[..\]). That's why pandoc does this transformation when parsing math.
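
To make the distinction concrete, here is a minimal sketch reusing the equations from the report (an illustration, not taken from the pandoc source): amsmath's align opens its own numbered display, whereas aligned is meant to be nested inside an existing math display and produces no row numbers of its own.

```latex
% Valid: align starts its own display and numbers each row.
\begin{align}
1 + 0 & = 1, \label{eq1} \\
1 + 1 & = 2. \label{eq2}
\end{align}

% Valid: aligned sits inside display math; the rows are not numbered.
\[\begin{aligned}
1 + 0 & = 1, \\
1 + 1 & = 2.
\end{aligned}\]
```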

jgm (Owner) commented Mar 16, 2022

Hard to know how to handle this. We could change the LaTeX reader so that if the raw_tex extension is enabled, we parse align environments as raw tex rather than math; the HTML writer already knows how to render these for MathJax. But this has the drawback that raw_tex will cause the LaTeX reader to parse certain other things in a way that will make them invisible in HTML output (e.g., lettrine, hbox, mbox, input).

Maybe we need a new extension that tells the LaTeX reader to parse just math environments as raw LaTeX. But this is a bit unprincipled; it's an extension solely motivated by one particular conversion (latex -> html with mathjax).

jgm (Owner) commented Mar 16, 2022

Here's the diff for parsing these as raw latex when raw_tex is enabled:

% git diff
diff --git a/src/Text/Pandoc/Readers/LaTeX/Math.hs b/src/Text/Pandoc/Readers/LaTeX/Math.hs
index 9f3d6fe53..bdb8be1b6 100644
--- a/src/Text/Pandoc/Readers/LaTeX/Math.hs
+++ b/src/Text/Pandoc/Readers/LaTeX/Math.hs
@@ -25,6 +25,8 @@ import Control.Applicative ((<|>), optional)
 import Control.Monad (guard, mzero)
 import qualified Data.Map as M
 import Data.Text (Text)
+import Text.Pandoc.Extensions (extensionEnabled, Extension(Ext_raw_tex))
+import Text.Pandoc.Options (ReaderOptions(readerExtensions))
 
 dollarsMath :: PandocMonad m => LP m Inlines
 dollarsMath = do
@@ -76,9 +78,15 @@ mathEnv name = do
 
 inlineEnvironment :: PandocMonad m => LP m Inlines
 inlineEnvironment = try $ do
-  controlSeq "begin"
-  name <- untokenize <$> braced
-  M.findWithDefault mzero name inlineEnvironments
+  (name, rawstart) <- withRaw (controlSeq "begin" *> braced)
+  case M.lookup (untokenize name) inlineEnvironments of
+    Nothing -> mzero
+    Just parser -> do
+      parseRaw <- extensionEnabled Ext_raw_tex <$> getOption readerExtensions
+      if parseRaw
+         then rawInline "latex" . untokenize . (rawstart ++) . snd
+                <$> withRaw parser
+         else parser
 
 inlineEnvironments :: PandocMonad m => M.Map Text (LP m Inlines)

jgm (Owner) commented Mar 16, 2022

Maybe a good solution would be a Lua filter that matches Math elements beginning with \begin{aligned} and changes them to RawInline "latex" elements with aligned changed to align. A bit kludgy but not too hard.

jgm (Owner) commented Mar 16, 2022

Here you go!

% cat align.lua 
function Math(el)
  if el.text:match("\\begin{aligned}") then
    local raw = el.text:gsub("{aligned}","{align}")
    return pandoc.RawInline("latex", raw)
  end
end
% pandoc -L align.lua -f latex -t html --mathjax
 
\begin{align}
1 + 0 & = 1, \label{eq1} \\
1 + 1 & = 2. \label{eq2}
\end{align}
^D
<p><span class="math display">\[\begin{align}
1 + 0 &amp; = 1, \label{eq1} \\
1 + 1 &amp; = 2. \label{eq2}\end{align}\]</span></p>
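
A slightly stricter variant of the filter above, offered as a sketch under assumptions rather than a tested fix: it only touches display math whose whole content is a single outer aligned environment, and it leaves any aligned nested further inside unchanged. The helper name `to_align` is hypothetical. Whether every aligned display math element in a given document really came from an align environment is also an assumption, since the reader produces the same Math element for both.

```lua
-- Rewrite an outer aligned environment to align.
-- Returns nil when the math text is not wrapped in exactly one outer
-- \begin{aligned} ... \end{aligned} pair.
function to_align(text)
  -- The greedy (.*) runs to the last \end{aligned}, so inner aligned
  -- environments stay part of the body and are not rewritten.
  local body = text:match("^%s*\\begin{aligned}(.*)\\end{aligned}%s*$")
  if body == nil then
    return nil
  end
  return "\\begin{align}" .. body .. "\\end{align}"
end

-- Pandoc filter entry point, called once for every Math element.
function Math(el)
  -- An align environment is read as display math, so inline math is skipped.
  if el.mathtype ~= "DisplayMath" then
    return nil
  end
  local raw = to_align(el.text)
  if raw then
    return pandoc.RawInline("latex", raw)
  end
  -- Returning nothing keeps the element unchanged.
end
```

It is invoked the same way as the original, for example `pandoc -L align.lua -f latex -t html --mathjax`.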

tarleb (Collaborator) commented Jun 16, 2022

This appears to be a duplicate of #4104; thus closing.

tarleb closed this as not planned Jun 16, 2022