Feature Request: MathJax-node in Pandoc #3153

Closed
ickc opened this issue Oct 8, 2016 · 24 comments

Comments

@ickc
Contributor

ickc commented Oct 8, 2016

The suggestion has come up as a sidetrack in a few issues: #2758 and jgm/pandoc-templates#219. I think a dedicated issue should be made.

What Does It Do?

It does what MathJax does, but on the "server side": rather than rendering the math every time the page is viewed, it pre-renders it.

What Problems Can It Solve?

Alternative Rendering Engine for MathML Output

At the very least, it provides another choice for MathML output. If features unique to MathJax are needed, mathjax-node would be a better alternative to the existing MathML output options.

Static, Non-Rasterized and HTML Rendering Engine Independent

There are already a few choices in pandoc that provide truly static output (not depending on JavaScript): MathML and PNG. But PNG is rasterized, and MathML support is broken. MathJax-node has CHTML/SVG options that are superior to both (MathJax also has MathML and PNG output; CHTML is the newest and fastest option).

Self-contained MathJax

It is difficult to make MathJax self-contained, since mathjax.js requests additional .js files. To make the math self-contained, either you sacrifice MathJax (and use some other option, which might lack some of the MathJax features you rely on), or you embed MathJax in your package (see mathjax/MathJax-grunt-cleaner for making the MathJax footprint smaller for embedding).

In this sense, MathJax-node provides a self-contained way of using MathJax, and would be useful in, say, ePub(3) output.

How to implement it in pandoc?

No idea, but the following are my observations:

MathJax-node/bin/tex2html could be used to create a filter that processes each math element and turns it into raw HTML.

That would add an external dependency, installed with npm install mathjax-node -g. The package is big (63.8 MB), since it includes all of MathJax (54.5 MB). However, if pandoc only uses some particular features, MathJax can be trimmed with MathJax-grunt-cleaner to about 2 MB (say, for the CHTML option only). Maybe MathJax and MathJax-node could be forked to create a small-footprint version specifically for pandoc?

As for the command line option, one way is that whenever pandoc ... --self-contained ... --mathjax ... is used, pandoc actually uses MathJax-node to provide a "self-contained MathJax". Alternatively, a --mathjax-node option could be provided.

"I want to do it now"

Install MathJax-node by npm install mathjax-node -g.

page2html as a post-processor

If the output is HTML5, MathJax-node/bin/page2html can be used as a post-processor after pandoc. The description of page2html specifically says it

Reads an HTML5 file…

page2html as a pre-processor

I tried processing the markdown file directly with page2html and it seems OK. Here page2html is used beyond what it was designed for, but markdown syntax will probably be left alone, because page2html treats it as plain text. I'm not sure whether raw HTML, divs, or spans in the pandoc markdown source might confuse page2html, though.

In this case, page2html is used as a pre-processor that processes the markdown first. The math becomes raw HTML, which should be fine when parsed by pandoc, and can then be output to any HTML-related format, including ePub(3).

Caveat

@jgm
Owner

jgm commented Oct 10, 2016

Can you clarify how this would help? As far as I can see, tex2html from mathjax-node produces regular HTML that depends on MathJax CSS and fonts. So MathJax fonts would have to be included to get self-contained output. Perhaps this can be done, though.

Is there a way to get pure SVG output that doesn't depend on fonts?

@jgm
Owner

jgm commented Oct 10, 2016

By the way, if we did do this, I think a --mathjax-node option would be the way to go, and the user should be required to install mathjax-node separately; we don't want to package this with pandoc.

@ickc
Contributor Author

ickc commented Oct 11, 2016

My bad: I had installed the MathJax fonts locally, so I didn't see it fetch resources from cdn.mathjax.org. I should have checked the source.

I made ickc/pandoc-MathJax-node public. There are some test files showing the results of the different bin scripts it provides. I can't get the PNG output to work (batik throws an error), but that is not important for this discussion. The SVG and MML outputs don't call cdn.mathjax.org. The SVG output is perfect and self-contained, so my suggestion above should have been page2svg, not page2html.
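Since locally installed fonts can hide CDN requests, it helps to check the generated HTML itself rather than how it looks in one browser. A minimal sketch in Python, using only the standard-library html.parser: it reports src/href attributes, and text inside elements such as style, that mention cdn.mathjax.org. The class and function names are illustrative, and this is a quick sanity check under those assumptions, not a complete audit (it does not follow linked CSS files, for example).

```python
from html.parser import HTMLParser


class ExternalRefFinder(HTMLParser):
    """Collect places where the page still refers to a given host."""

    def __init__(self, host: str) -> None:
        super().__init__()
        self.host = host
        self.hits = []

    def handle_starttag(self, tag, attrs):
        # <script src=...>, <link href=...>, <img src=...> and the like
        for name, value in attrs:
            if name in ("src", "href") and value and self.host in value:
                self.hits.append(f"{tag}[{name}]: {value}")

    def handle_data(self, data):
        # Text content, including the bodies of <style> and <script>,
        # e.g. @font-face { src: url(https://cdn.mathjax.org/...) }
        if self.host in data:
            self.hits.append("text: " + data.strip()[:80])


def external_refs(html: str, host: str = "cdn.mathjax.org") -> list:
    """Return every reference to `host` found in the HTML string."""
    finder = ExternalRefFinder(host)
    finder.feed(html)
    finder.close()
    return finder.hits
```

An empty list for a page2svg result is a good sign that the output really is self-contained with respect to the MathJax CDN.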

@ickc
Contributor Author

ickc commented Oct 11, 2016

As a side note, the makefile in ickc/pandoc-MathJax-node and its result show that the pandoc HTML reader cannot parse math. That isn't important for the demo, though.

@jgm
Owner

jgm commented Oct 11, 2016

It can indeed parse MathML in math tags.
It doesn't parse tex math because that isn't part of HTML.

@jgm
Owner

jgm commented Oct 14, 2016

Thanks for showing how to do this! But I wonder whether it's even necessary to make changes to pandoc. Look how easy it is. Create a markdown file, newsample.md:

The Lorenz Equations
--------------------

$$
\begin{aligned}
\dot{x} & = \sigma(y-x) \\
\dot{y} & = \rho x - y - xz \\
\dot{z} & = -\beta z + xy
\end{aligned}
$$


The Cauchy-Schwarz Inequality
-----------------------------

$$
\left( \sum_{k=1}^n a_k b_k \right)^{\!\!2} \leq
\left( \sum_{k=1}^n a_k^2 \right) \left( \sum_{k=1}^n b_k^2 \right)
$$

A Cross Product Formula
-----------------------

$$
\mathbf{V}_1 \times \mathbf{V}_2 = \begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
\frac{\partial X}{\partial u} & \frac{\partial Y}{\partial u} & 0 \\
\frac{\partial X}{\partial v} & \frac{\partial Y}{\partial v} & 0 \\
\end{vmatrix}
$$

The probability of getting \(k\) heads when flipping \(n\) coins is:
------------------------------------------------------------------------
$$
P(E) = {n \choose k} p^k (1-p)^{ n-k}
$$

An Identity of Ramanujan
------------------------
$$
\frac{1}{(\sqrt{\phi \sqrt{5}}-\phi) e^{\frac25 \pi}} =
1+\frac{e^{-2\pi}} {1+\frac{e^{-4\pi}} {1+\frac{e^{-6\pi}}
{1+\frac{e^{-8\pi}} {1+\ldots} } } }
$$

Now all you need to create an HTML file with SVG math is:

pandoc -s --mathjax newsample.md | page2svg > newsample.svg.html

I don't see any particular changes that are needed in pandoc, since this simple pipe suffices.
It would be nice to document this in the manual, of course.

It would be possible to add a new option --mathjax-node that pipes the output through page2svg behind the scenes, but I think I prefer the explicit approach above. It lets people know that they need the page2svg executable, and it lets them set the options they want on the executable.

@jgm
Owner

jgm commented Oct 14, 2016

OK, two reasons for more integration into pandoc:

  1. the pipe trick won't work with EPUB output
  2. we don't want the link to the mathjax svg included, and the above command does that

So, proposal: add an option --mathjax-node or maybe --mathjax-svg that pipes the output of the HTML writer through page2svg and does not insert a link. If page2svg is missing, an error should be issued telling where to get it.
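A minimal sketch of what such an option would do, written as an external Python wrapper rather than inside pandoc: it pipes HTML through page2svg, fails with a pointer to mathjax-node if the executable is missing, and then strips the MathJax script link. The function name and the regular expression are illustrative assumptions; in particular, the pattern assumes the script tag's src contains "MathJax", which may not match every template.

```python
import re
import shutil
import subprocess
import sys

# ASSUMPTION: pandoc's --mathjax link is a <script> whose src mentions "MathJax".
MATHJAX_SCRIPT = re.compile(
    r"<script\b[^>]*\bsrc=[\"'][^\"']*MathJax[^\"']*[\"'][^>]*>\s*</script>\s*",
    re.IGNORECASE,
)


def mathjax_svg(html: str, page2svg: str = "page2svg") -> str:
    """Pipe HTML through page2svg, then drop the (now unneeded) MathJax link."""
    exe = shutil.which(page2svg)
    if exe is None:
        # Tell the user where to get the missing executable.
        raise RuntimeError(
            f"{page2svg} not found on PATH; it comes with mathjax-node "
            "(npm install mathjax-node -g)"
        )
    result = subprocess.run([exe], input=html, capture_output=True, text=True, check=True)
    return MATHJAX_SCRIPT.sub("", result.stdout)


# Usage: pandoc -s --mathjax in.md | python3 mathjax_svg.py page2svg > out.html
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.stdout.write(mathjax_svg(sys.stdin.read(), sys.argv[1]))
```

Doing this outside pandoc keeps the HTML writer pure, at the cost of not covering EPUB, which is exactly the limitation noted above.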

Unfortunately, since the HTML writer is pure, we can't do this piping in the writer. We need to do it in two places, pandoc.hs (for HTML, revealjs, etc.) and the EPUB writer (for EPUB). A bit ugly but it could work.

@ickc
Contributor Author

ickc commented Oct 14, 2016

Maybe a filter that passes each math element through tex2svg? The filter can check for HTML-related formats only. That way it is "universal". Strictly speaking, the pipe method should only work for HTML5 output, since that is what the script assumes its input to be (though I guess XHTML should work fine).

The remaining question then is if this filter should be officially embedded in pandoc or as an external filter.
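A minimal sketch of such a filter in Python, using pandoc's JSON filter protocol (pandoc passes the target format as the first argument and the document AST as JSON on stdin). It replaces every Math node with a RawInline HTML node produced by a renderer. The default renderer's invocation of tex2svg (a single TeX argument, SVG on stdout) is an assumption, as is the partial list of HTML-like format names; check the tool and formats you actually use.

```python
import json
import subprocess
import sys

# Partial list of pandoc output formats where raw HTML makes sense.
HTML_FORMATS = {"html", "html4", "html5", "epub", "epub2", "epub3", "revealjs"}


def tex2svg(tex: str, display: bool) -> str:
    """Default renderer. ASSUMPTION: `tex2svg <tex>` prints an SVG on stdout.
    `display` is ignored here; adapt the command to however your tool
    distinguishes inline from display math."""
    result = subprocess.run(["tex2svg", tex], capture_output=True, text=True, check=True)
    return result.stdout


def walk(node, render):
    """Return a copy of the AST with every Math node replaced by raw HTML."""
    if isinstance(node, list):
        return [walk(child, render) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "Math":
            math_type, tex = node["c"]
            svg = render(tex, math_type["t"] == "DisplayMath")
            return {"t": "RawInline", "c": ["html", svg]}
        return {key: walk(value, render) for key, value in node.items()}
    return node  # strings, numbers, etc. are left untouched


def run_filter(doc: dict, fmt: str, render=tex2svg) -> dict:
    if fmt not in HTML_FORMATS:
        return doc  # leave math alone for LaTeX, docx, ...
    return walk(doc, render)


# pandoc runs JSON filters as `filter FORMAT`, with the AST on stdin.
if __name__ == "__main__" and len(sys.argv) > 1:
    document = json.load(sys.stdin)
    json.dump(run_filter(document, sys.argv[1]), sys.stdout)
```

Because the renderer is a parameter, the same walk can be reused with a cached or long-lived renderer without touching the AST logic.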

@ickc
Contributor Author

ickc commented Oct 14, 2016

I replied too quickly, before thinking it through: it doesn't quite work as an external filter, since the --mathjax option is needed, and currently the script tag pointing to the CDN will be included.

@jgm
Owner

jgm commented Oct 16, 2016

It would certainly be possible to write a filter that works
for every input format and sends math bits through tex2svg.
The question is what to do with the output. We could put
them into data uris on image elements, I suppose.


@ickc
Contributor Author

ickc commented Oct 19, 2016

Data URIs seem to have some disadvantages: increased file size (though gzip seems to make that increase negligible) and more computation (battery life, rendering time, though today's smartphones are very capable, so that might be negligible). And since HTML supports inline SVG, inline SVG might be better, i.e. the math elements are turned into raw HTML elements and left as inline HTML in the output.
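The two embedding options can be compared directly. A minimal Python sketch, assuming the renderer has already returned an SVG string: one helper wraps it in an img element with a base64 data URI (the data-URI idea), the other leaves it as inline SVG. The helper names are illustrative. Base64 encodes every 3 bytes as 4 characters, so the data-URI version is about a third larger before compression.

```python
import base64
import html


def as_data_uri_img(svg: str, alt: str = "") -> str:
    """Wrap an SVG string in an <img> whose src is a base64 data URI."""
    b64 = base64.b64encode(svg.encode("utf-8")).decode("ascii")
    return f'<img alt="{html.escape(alt)}" src="data:image/svg+xml;base64,{b64}">'


def as_inline_svg(svg: str) -> str:
    """Leave the SVG as raw HTML, so it ends up inline in the output."""
    return svg


svg = '<svg xmlns="http://www.w3.org/2000/svg" width="8" height="8"><rect width="8" height="8"/></svg>'
for label, markup in [("data URI", as_data_uri_img(svg, "x^2")), ("inline", as_inline_svg(svg))]:
    print(f"{label}: {len(markup)} characters")
```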

@jgm
Owner

jgm commented Nov 20, 2016

I've created a standalone filter -- try it out!
https://github.com/jgm/pandoc-tex2svg

@jgm jgm closed this as completed Nov 20, 2016
@ickc
Contributor Author

ickc commented Nov 20, 2016

I will try it soon.

Would it be possible to bundle it in pandoc's binary installer, similar to citeproc?

@jgm
Owner

jgm commented Nov 21, 2016

Possible, yes, but I don't know if it's desirable, since it
depends on an external program that needs to be installed
and in path.

@suknat

suknat commented Dec 14, 2016

Beauty, thy name is pandoc-tex2svg!

I've moved to Homebrew so having to use Haskell was a bit of pain, but I've never had so much gain from so little pain!

Thanks.

@jgm
Owner

jgm commented Dec 14, 2016 via email

@ickc
Contributor Author

ickc commented Dec 15, 2016

You could write a brew formula for it. I've seen quite a few pandoc filters written in Haskell that have brew formulae; you can look at those as examples (a recent one is pandoc-sidenote).

@suknat

suknat commented Dec 15, 2016

@jgm It may be slow, but it works so I'm happy for now.

@ickc Writing brew formulae is above my pay grade at the moment.

@ickc
Contributor Author

ickc commented Jan 9, 2017

(I considered posting this in pandoc-discuss, but there isn't a lot of discussion over there, so maybe it is easier to index and reference if it is all together here.)

Some updates:

  • According to the discussion in "re-organizing the examples in /bin" (mathjax/MathJax-node#208), MathJax-node will no longer include those bin scripts in the future; they will be split out into mathjax-node-page (currently at pkra/mathjax-node-page), which corresponds to the current page2... scripts. I'm asking them about the future of the bin scripts for standalone equations, e.g. the tex2svg that pandoc-tex2svg uses.

  • In "support (html) fragments" (pkra/mathjax-node-page#6), @pkra mentioned the intention for mjpage to support markdown input as well. I updated the tests in ickc/pandoc-MathJax-node to use this method (i.e. preprocess the markdown with mjpage before pandoc). The result is actually quite good, except for some problems with \ref, which pandoc-tex2svg doesn't support anyway. One example shows a problem: mjpage "ate" some of the markdown horizontal rules. A filter like pandoc-tex2svg guarantees this never happens.

  • Speed: really quick. Try running make in my repository; it's almost instantaneous.

@shreevatsa

In reply to #3153 (comment)

Unfortunately pandoc-tex2svg is too slow. The reason is clear: there's a huge startup cost when node loads the library. If the filter were written in node, the library could be loaded once, not once for each equation. I started writing this, actually, but got confused about how to deal with the callbacks. If a JavaScript programmer wants to collaborate on this, let me know.

I'm not a real JavaScript programmer, but I try :-) I looked into this. Firstly, how slow is pandoc-tex2svg? To measure, I removed caching from pandoc-tex2svg.hs, took math-samples.md (https://github.com/jgm/pandoc-tex2svg/blob/f4154482/math-samples.md) and made bigger copies of it:

% cp math-samples.md math-samples-1.md
% function next() { cat "math-samples.md" "math-samples-$1.md" > "math-samples-$((1+$1)).md" }
% for i in {1..9}; next $i

(so for example math-samples-10.md is 480 lines long, because math-samples.md is 48 lines long) and timed how long the filter takes:

% for i in {1..10}; do echo $i; time pandoc math-samples-$i.md --filter pandoc-tex2svg -s -t html5 -o math-samples-$i.html; done
1
   5.71s user 0.68s system 105% cpu 6.031 total
2
   11.40s user 1.32s system 107% cpu 11.847 total
3
   17.04s user 1.99s system 107% cpu 17.671 total
4
   22.92s user 2.63s system 107% cpu 23.686 total
5
   28.86s user 3.35s system 108% cpu 29.744 total
6
   34.87s user 4.03s system 108% cpu 35.785 total
7
   41.05s user 4.78s system 108% cpu 42.138 total
8
   46.26s user 5.30s system 109% cpu 47.266 total
9
   52.46s user 6.02s system 109% cpu 53.421 total
10
   58.43s user 6.70s system 109% cpu 59.451 total

Next I wrote a pandoc filter in node (this took most of the time). It is indeed confusing how to deal with the callbacks, and it needs a change to the pandoc-filter-node library (reported here: mvhenderson/pandoc-filter-node#7). I found a workaround using async/await, which is supported in NodeJS 7.6 or later (released recently, in February 2017); I'm not enough of a JavaScript programmer to figure out how to make it work in earlier versions. The result is this filter: https://gist.github.com/shreevatsa/170a1a8f217b20d86b5836e5e4821021

With this, the times are as follows:

% for i in {1..10}; do echo $i; time pandoc math-samples-$i.md --filter ../pandoc-mathjax-svg-filter.js -s -t html5 -o math-samples-$i.html; done
1
  0.83s user 0.09s system 99% cpu 0.922 total
2
  0.95s user 0.09s system 108% cpu 0.955 total
3
  1.12s user 0.09s system 112% cpu 1.075 total
4
  1.25s user 0.10s system 114% cpu 1.179 total
5
  1.38s user 0.10s system 115% cpu 1.278 total
6
  1.44s user 0.10s system 114% cpu 1.340 total
7
  1.57s user 0.10s system 115% cpu 1.447 total
8
  1.76s user 0.11s system 117% cpu 1.589 total
9
  1.85s user 0.11s system 117% cpu 1.662 total
10
  1.93s user 0.11s system 117% cpu 1.745 total

So speed-wise, this seems reasonable. The filter may need some more work to be production-quality (for example it doesn't report errors), but it seems good enough for me for now!
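To quantify the difference, a straight line can be fitted to each set of timings above. A minimal Python sketch using ordinary least squares on the reported wall-clock totals (the "total" column), with the number of copies as x. On these numbers it gives roughly 5.95 s per copy with essentially no fixed cost for pandoc-tex2svg, versus about 0.79 s fixed plus about 0.096 s per copy for the node filter, i.e. a per-copy cost roughly 60 times lower. That pattern is consistent with the startup-cost explanation, though the fit alone cannot say how the fixed cost splits between pandoc and node.

```python
# Wall-clock totals (seconds) from the two timing runs above, for 1..10 copies.
TEX2SVG = [6.031, 11.847, 17.671, 23.686, 29.744, 35.785, 42.138, 47.266, 53.421, 59.451]
NODE_FILTER = [0.922, 0.955, 1.075, 1.179, 1.278, 1.340, 1.447, 1.589, 1.662, 1.745]


def fit_line(ys):
    """Ordinary least squares for total = intercept + slope * copies."""
    xs = range(1, len(ys) + 1)
    n = len(ys)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


for name, ys in [("pandoc-tex2svg", TEX2SVG), ("node filter", NODE_FILTER)]:
    intercept, slope = fit_line(ys)
    print(f"{name}: {intercept:.2f} s fixed + {slope:.3f} s per copy")
```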

@jgm
Owner

jgm commented Nov 10, 2017 via email

@shreevatsa

@jgm Sure, published here: https://github.com/shreevatsa/pandoc-mathjax-filter

It still needs to be made npm-installable and robust and all that, but hopefully I will either figure it out or someone who knows such things will help :-)

@mb21
Collaborator

mb21 commented Nov 10, 2017

I've added the link to the wiki, where there was already https://github.com/lierdakil/mathjax-pandoc-filter

@shreevatsa

Haha oops… that one was written nearly 3 years ago, and even the code looks somewhat similar. There's even a package (two, actually) on npm. I thought I had searched thoroughly before getting here, but I must have been searching for the wrong thing. Anyway, I tried it and couldn't quite get it to work, so I've added a note to that one as well.
