
pandoc modifies tex environments #4104

Open
mtomassoli opened this issue Nov 28, 2017 · 14 comments

Comments

@mtomassoli

mtomassoli commented Nov 28, 2017

Consider this minimal tex file:

\documentclass[english]{article}
\begin{document}

\begin{align}
    x &= 3\\
    y &= 2
\end{align}

\end{document}

pandoc test.tex -o test.md produces a test.md file with the following content:

$$\begin{aligned}
    x &= 3\\
    y &= 2\end{aligned}$$

If I use a filter, the filter receives aligned rather than align, so by then it's already too late.

The only workaround I've found is to wrap the environment in $$, but to do that programmatically I'd need to parse the TeX file properly.
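A minimal sketch of that workaround, assuming the align/align* environments in the file are never nested, never commented out, and not already inside $$...$$ or \[...\]. It is a regular-expression pass, not a real TeX parser, and the function name wrap_align_in_dollars is hypothetical.

```python
import re

# Matches \begin{align}...\end{align} or \begin{align*}...\end{align*},
# non-greedily, across lines. The backreference \1 makes the \end match
# the same (starred or unstarred) name as the \begin.
ALIGN_ENV = re.compile(r"\\begin\{(align\*?)\}.*?\\end\{\1\}", re.DOTALL)


def wrap_align_in_dollars(tex: str) -> str:
    """Wrap each align/align* environment in $$...$$ (hypothetical helper).

    Assumes no nested, commented-out, or already-delimited environments.
    """
    return ALIGN_ENV.sub(lambda m: "$$" + m.group(0) + "$$", tex)
```

For example, read test.tex, pass its text through wrap_align_in_dollars, write the result to a new file, and run pandoc on that file instead. A file containing \begin{align} inside comments or verbatim blocks would still need a proper parser, which is exactly the concern raised above.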

Is this a bug or what?

I forgot to mention my version info:

pandoc 1.19.2.1
Compiled with pandoc-types 1.17.0.4, texmath 0.9, skylighting 0.1.1.4

Running on Windows 10.

@jgm
Owner

jgm commented Nov 28, 2017 via email

@jgm
Owner

jgm commented Nov 28, 2017

One good solution might be to modify mathEnvWith in the LaTeX reader so that, if the raw_tex extension is enabled, these environments are parsed as raw latex; otherwise, we do as before and parse as math with necessary modifications.

Since raw_tex is enabled by default in pandoc markdown, this might mean that some existing documents would break on conversion to Word, so that's a potential worry.
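To make the proposal concrete, here is a hypothetical model (not pandoc's Haskell code) of what the reader would emit for a single environment, written against pandoc's JSON AST: a RawBlock in the latex format when raw_tex is on, and the current display-math Para otherwise. The function name and its parameters are invented for illustration.

```python
from typing import Optional


def math_env_to_block(env: str, body: str, raw_tex: bool,
                      math_mode_env: Optional[str]) -> dict:
    """Hypothetical model of the proposed reader behaviour (pandoc JSON AST).

    env           -- environment name as written in the source, e.g. "align"
    body          -- text between \\begin{env} and \\end{env}
    raw_tex       -- whether the raw_tex extension is enabled
    math_mode_env -- math-mode replacement such as "aligned", or None
    """
    if raw_tex:
        # Proposed: keep the environment verbatim as a raw LaTeX block.
        raw = "\\begin{%s}%s\\end{%s}" % (env, body, env)
        return {"t": "RawBlock", "c": ["latex", raw]}
    # Existing behaviour: display math, with the environment downshifted
    # (or dropped entirely when there is no math-mode replacement).
    if math_mode_env is None:
        text = body
    else:
        text = "\\begin{%s}%s\\end{%s}" % (math_mode_env, body, math_mode_env)
    return {"t": "Para",
            "c": [{"t": "Math", "c": [{"t": "DisplayMath"}, text]}]}
```

The trade-off jgm mentions shows up directly here: with raw_tex on, writers that cannot pass raw LaTeX through (such as docx) receive a RawBlock they will likely drop instead of a Math element they can convert.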

@mtomassoli
Copy link
Author

Couldn't you enable this behavior just for MathJax?

@jgm
Owner

jgm commented Nov 28, 2017 via email

@mtomassoli
Author

mtomassoli commented Nov 28, 2017

Would it be possible to preserve information about the original math environments as some kind of "metadata" so that filters could recover them? That way you wouldn't break anything.

@shawnohare

> One good solution might be to modify mathEnvWith in the LaTeX reader so that, if the raw_tex extension is enabled, these environments are parsed as raw latex; otherwise, we do as before and parse as math with necessary modifications.

> Since raw_tex is enabled by default in pandoc markdown, this might mean that some existing documents would break on conversion to Word, so that's a potential worry.

I was wondering whether there have been any updates regarding your comment above. As of pandoc 2.10, it seems that the raw_tex extension does less in the LaTeX reader than in the markdown reader. For example, given the simple LaTeX snippet

% emc2.tex
\begin{equation}
  E=mc^2
\end{equation}

I see that

pandoc --mathjax --from latex+raw_tex emc2.tex --to html

outputs

<p><span class="math display">\[E=mc^2\]</span></p>

This doesn't integrate well with MathJax, e.g. if one wants MathJax to handle equation numbering and references directly.

For context, I was hoping to use pandoc to produce a static site and/or a personal journal where the content documents are simple MathJax-compatible LaTeX instead of Markdown (to leverage TeX editor plugins), but the LaTeX reader's modification of basic LaTeX environments seems to be a blocker. Is there any way around this that doesn't involve modifying the pandoc codebase itself?

@jgm
Owner

jgm commented Oct 15, 2020

Indeed, an equation environment is parsed as a Math element (rather than raw tex) even if raw_tex is set. We could try changing that, but it may have some unanticipated consequences.
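Until that changes, one possible stopgap for the MathJax use case is a JSON filter that re-wraps standalone display math in an equation environment and emits it as raw HTML, leaving MathJax to number it. This is a minimal sketch, assuming MathJax is configured to process TeX environments that appear directly in the page text (its processEnvironments option) and has numbering enabled (for example tags: 'ams' in MathJax 3). Because the reader no longer records which environment was used, it numbers every standalone display equation, including ones that came from equation* or \[...\]. The function names are hypothetical, and only top-level blocks are visited.

```python
import html
import json
import sys


def number_display_math(block: dict) -> dict:
    """If a Para holds only display math, replace it with raw HTML text
    \\begin{equation}...\\end{equation} for MathJax to typeset and number."""
    if block.get("t") == "Para" and len(block.get("c", [])) == 1:
        inline = block["c"][0]
        if inline.get("t") == "Math" and inline["c"][0]["t"] == "DisplayMath":
            raw = "\\begin{equation}" + inline["c"][1] + "\\end{equation}"
            # Escape <, > and & so the browser keeps the TeX as plain text.
            return {"t": "RawBlock",
                    "c": ["html", html.escape(raw, quote=False)]}
    return block


def filter_document(doc: dict) -> dict:
    """Apply number_display_math to each top-level block of the AST."""
    doc["blocks"] = [number_display_math(b) for b in doc["blocks"]]
    return doc


def main() -> None:
    """Read the JSON AST from stdin and write the filtered AST to stdout."""
    json.dump(filter_document(json.load(sys.stdin)), sys.stdout)
```

To use it as a filter, add a call to main() at the end of the script, make it executable, and pass it with --filter, e.g. pandoc --mathjax --from latex+raw_tex emc2.tex --to html --filter ./number_math.py (script name hypothetical).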

@jgm
Owner

jgm commented Oct 15, 2020

As a workaround, you could try using a custom environment (not equation).
When compiling with LaTeX, you could simply define it as equivalent to equation.

@tarleb
Collaborator

tarleb commented Jun 15, 2022

Would it be possible to move the special handling of align environments from the reader to the writer? TeXMath seems to parse the align environment ok. Are there other issues that would make such a change problematic?

@jgm
Owner

jgm commented Jun 16, 2022

> Would it be possible to move the special handling of align environments from the reader to the writer? TeXMath seems to parse the align environment ok. Are there other issues that would make such a change problematic?

That's an interesting question. The general expectation is that the contents of a Math element should be valid in LaTeX math mode, which the align environment is not.

However, if texmath can handle all of the special math environments we handle in the reader (not just align), then a case could be made that we don't need to enforce this expectation. The real question then would be about formats where we pass the LaTeX math through unchanged (as opposed to converting it with texmath). One of those is LaTeX itself, and if that were the only one, we could move this code to the writer. But there are others; the one I'm thinking of at the moment is HTML. Oddly, though, I think MathJax actually does allow align environments inside math contexts, even though LaTeX itself does not. So maybe it could work. There are other formats to consider too: Org, maybe RST? I'd be reluctant to make such a change without a lot more research.
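A rough sketch of the writer-side idea, assuming the reader left align untouched inside the Math element. For targets that place the TeX inside display-math delimiters and cannot cope with align, the writer rewrites it to aligned; for targets that accept it (possibly MathJax, as noted above), it passes the text through unchanged. The mapping covers only align and align*, the environments discussed in this issue, and the function name and flag are hypothetical.

```python
import re

# Illustrative subset only; the reader handles more environments than this.
MATH_MODE_EQUIVALENT = {"align": "aligned", "align*": "aligned"}

# Matches a whole Math string of the form \begin{env}...\end{env}.
ENV = re.compile(r"^\s*\\begin\{([a-z]+\*?)\}(.*)\\end\{\1\}\s*$", re.DOTALL)


def downshift_for_writer(tex: str, accepts_environment: bool) -> str:
    """Hypothetical writer step: keep align for targets that accept it,
    otherwise rewrite it to its math-mode equivalent (e.g. aligned)."""
    if accepts_environment:
        return tex
    m = ENV.match(tex)
    if m and m.group(1) in MATH_MODE_EQUIVALENT:
        inner = MATH_MODE_EQUIVALENT[m.group(1)]
        return "\\begin{%s}%s\\end{%s}" % (inner, m.group(2), inner)
    return tex
```

The research jgm mentions would amount to deciding, per output format, what accepts_environment should be.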

@jgm
Owner

jgm commented Jun 16, 2022

@tarleb It would be good to hear about why you propose this. If it's because you'd like a writer or filter to know whether align or aligned was used in the original, we could perhaps address that by having the reader add a containing span with an attribute when an environment like align is downshifted to its math-mode equivalent.
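Assuming a wrapper like that existed, a filter could undo the downshift for LaTeX output. The sketch below uses a made-up attribute name, orig-env, on the Span (the actual name would be up to pandoc) and works on a single inline element of pandoc's JSON AST. It replaces the Span with raw LaTeX restoring the original align or align* environment and leaves everything else untouched.

```python
import re

# Captures the body of a downshifted \begin{aligned}...\end{aligned}.
ALIGNED = re.compile(r"^\\begin\{aligned\}(.*)\\end\{aligned\}$", re.DOTALL)


def restore_align(inline: dict) -> dict:
    """Turn Span[orig-env=align] [Math DisplayMath "\\begin{aligned}..."]
    back into a RawInline latex \\begin{align}...\\end{align}.

    The orig-env attribute is hypothetical; pandoc does not emit it today.
    """
    if inline.get("t") != "Span":
        return inline
    (_ident, _classes, attrs), contents = inline["c"]
    env = dict(attrs).get("orig-env")
    if env not in ("align", "align*") or len(contents) != 1:
        return inline
    math = contents[0]
    if math.get("t") != "Math" or math["c"][0]["t"] != "DisplayMath":
        return inline
    m = ALIGNED.match(math["c"][1])
    if not m:
        return inline
    raw = "\\begin{%s}%s\\end{%s}" % (env, m.group(1), env)
    return {"t": "RawInline", "c": ["latex", raw]}
```

Writers other than LaTeX could simply ignore the Span and keep using the Math element inside it, which is what makes this approach backward compatible.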

@tarleb
Collaborator

tarleb commented Jun 16, 2022

This came up in a discussion about editor support for Quarto's Markdown. More specifically, the question was whether math should always be a dollar-delimited entity, or whether it can make more sense to use raw LaTeX in some cases. The relevant issue is linked above.

The problems that came up could be solved with a filter. But issues like #8122 and this one make me wonder whether a more fundamental change would be a better long-term solution. It might also make it easier to add support for align to Org (#6703).

I don't understand all the constraints well enough to have a fully formed opinion; I'm mostly thinking out loud.

@epignatelli

epignatelli commented Aug 26, 2022

The align environment is fundamental to any scientific text; I am very surprised this issue is still open after so many years.
Any pointers on where to start looking, @jgm?

@ajdobner

> Would it be possible to preserve information about the original math environments as some kind of "metadata" so that filters could recover them? That way you wouldn't break anything.

The solution mentioned here seems like a good one. Something like enabling attributes on the Math data type would solve the problem of determining whether something was an align environment, and would also allow for equation labels and numbering (cf. pull request jgm/pandoc-types#97). I'm assuming the downside is that it requires an API change?
