move mj-page out of mathjax-node #206
[EDIT: disregard this comment -- it was supposed to be on #205] Here's the first draft for a separate module based on svg2png:

var mjAPI = require("mathjax-node/lib/mj-single.js");
var svg2png = require("svg2png");

// start MathJax once, when the module is loaded, not on every call
mjAPI.start();

exports.math2png = function (mjoptions, pngoptions, callback) {
  // make sure SVG output will be generated
  mjoptions.svg = true;
  mjAPI.typeset(mjoptions, function (result) {
    // on errors, hand back the results so far (with the error list) right away
    if (result.errors) return callback(result);
    var sourceBuffer = Buffer.from(result.svg, "utf-8");
    // result.svg is a string, so it has no width/height properties;
    // pngoptions (e.g. width/height in pixels) is passed to svg2png as-is
    var returnBuffer = svg2png.sync(sourceBuffer, pngoptions);
    // maybe have something for scaling / dpi?
    result.png = { data: "data:image/png;base64," + returnBuffer.toString("base64") };
    callback(result);
  });
};
[EDIT: you can disregard this comment -- it is a response to a misplaced comment regarding #205] A couple of comments:
Also, the The main culprit is the Similarly, the
should do it. Here, if there are errors, the callback is called immediately with the results so far (including the error indications); otherwise
rather than creating an empty object and then adding a property to it. That is what I see at the moment.
[EDIT: you can disregard this comment -- it is a response to a misplaced comment regarding #205] Thanks for these. As mentioned, it was a first draft just to accompany the PR. It did not require such a detailed response.
Hi, I don't understand the MathJax internals well enough to follow all of the above, but it might be useful to document how and why we currently use mj-page.js: We have a production tool that does error checking on large numbers of full-text XML files that include LaTeX markup. It's not uncommon for an XML file to have thousands of equations, nor is it uncommon for us to process thousands of XML files daily. In this tool we don't care about MathJax rendering at all; we're only interested in finding TeX errors. For each XML file being processed, our tool extracts all of the individual pieces of TeX and creates a single HTML file containing only the equations and no other markup. We then run this HTML through mathjax-node, using your components customized for our use. These components are derived as follows:
Obviously we can continue to use our customized mj-page.js in the future, but this thread suggests that there is a better way to accomplish what we're doing. How would we approach the above without mj-page.js? What's the better way? Thanks,
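The page-building step Fred describes (one HTML file per article, holding only the equations) might look roughly like this. It is a minimal sketch, assuming MathJax v2's default tex2jax inline delimiters `\(...\)`; `buildEquationPage`, `escapeHtml`, and the `data-fragment` attribute are hypothetical names, not part of mathjax-node. HTML-escaping matters because TeX such as `a<b` would otherwise be parsed as markup; a fragment that itself contains `\)` would still need extra handling.

```javascript
// Escape the characters that would otherwise be read as HTML markup.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a single HTML page containing only the TeX fragments of one article.
// Each fragment is wrapped in \( ... \) (assumed tex2jax inline delimiters)
// and tagged with its index so that errors can be mapped back to the source.
function buildEquationPage(fragments) {
  const body = fragments
    .map((tex, i) => `<div data-fragment="${i}">\\(${escapeHtml(tex)}\\)</div>`)
    .join("\n");
  return `<!DOCTYPE html>\n<html><head><meta charset="utf-8"></head><body>\n${body}\n</body></html>\n`;
}
```

The index attribute is one possible design choice; any stable identifier from the source XML would serve the same purpose.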
Thanks for sharing your setup, Fred! First off, I just realized that the preceding discussion is on the wrong issue, which is totally my fault. Comments 2, 3, and 4 are about #205, moving PNG support out of mathjax-node. Sorry for the confusion! 😞

Back to your setup. I think it is a good example of why I initially proposed dropping mj-page. The question is: why are you using mj-page in the first place? You get individual TeX fragments out of your XML, so mj-single seems more natural to use here. I'm guessing the motivation for going with mj-page might be performance: the assumption that the overhead of jsdom and MathJax would be large enough to make a single mj-page run faster than looping through the fragments and calling mj-single for each one. However, we noticed that that's not actually the case; performance is pretty much identical (either method can be slightly faster, depending on the context, but they're virtually the same). So a key motivation for mj-page, from our end, is no longer present.

Additionally, mj-page can lead to subtle problems. Your bug report is a nice example of this, and also of how too many options seem to prevent people from finding them. The fact that we have mj-page leads to other questions and requests, e.g., an equivalent tool for XML (where jsdom sometimes causes issues) or for markdown files, or something using alternative DOM-like libraries (e.g., cheerio). Adding solutions for these would increase mathjax-node's maintenance burden considerably. So we think it's better to leave those tools in the hands of the community, while providing ideas and, of course, help with any questions.

As part of this deprecation and our work on MathJax v3.0, we will be providing a modernized pre-processor library that will allow developers to incorporate pre-processing more easily into their workflows. In your case, though, you don't seem to need it.

To recap, I'd expect that a) you could keep using something like mj-page, or b) you could just loop through your TeX fragments with mj-single and collect the errors.
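Option b) can be sketched as a sequential loop inside one Node process. This is a minimal sketch, assuming mj-single's `typeset(options, callback)` shape and that TeX problems show up in the result's `errors` array; `collectTexErrors` is a hypothetical helper. The typeset function is passed in as a parameter so the loop can be exercised without MathJax installed.

```javascript
// Check many TeX fragments for errors inside ONE Node process by typesetting
// them one at a time. `typeset` is assumed to behave like mj-single's
// typeset(options, callback), whose result carries `errors` on failure.
function collectTexErrors(fragments, typeset, done) {
  const failures = [];
  let i = 0;

  function next() {
    if (i >= fragments.length) {
      done(failures);
      return;
    }
    const index = i++;
    const tex = fragments[index];
    // Only MathML output is requested: the goal is finding TeX errors, not rendering.
    typeset({ math: tex, format: "TeX", mml: true }, function (result) {
      if (result.errors) {
        failures.push({ index: index, tex: tex, errors: result.errors });
      }
      next(); // move on only after the previous fragment has finished
    });
  }

  next();
}
```

With mathjax-node, the wiring would be roughly `var mjAPI = require("mathjax-node/lib/mj-single.js"); mjAPI.start(); collectTexErrors(fragments, function (o, cb) { mjAPI.typeset(o, cb); }, report);`, where `report` is whatever receives the list of failures. All fragments of an article go through the same process, so the number of `node` invocations stays at one per article.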
I am in the process of writing an in-browser ebook reader application for the calibre content server (this allows reading ebooks directly in the browser, loaded from an embedded HTML server running in calibre). I am investigating adding support for MathML by using MathJax (similar to how the calibre desktop ebook viewer renders maths using MathJax). The server already does some pre-processing on individual HTML files, so for me the ability to process an HTML page containing math server side is invaluable: it is both more performant, and it means I don't have to patch MathJax itself to change how it loads its resources (I need the in-browser reader to work offline). Bottom line, the ability to render math in HTML files server side is valuable to me, so I would like to vote against removing this facility.
Peter, you're correct that we process a page at a time for performance. But to give you more detail: our QC tool is written in Java. It creates the page with an article's worth of equations and then runs a single invocation of node to process that file. In other words, node is not running continuously, but is called via the command line, once for each article. So, if I have 1000 XML files, that's 1000 calls to node. But if I use single-equation processing and have 1000 XML files, each with 1000 equations, that's 1,000,000 calls to node. But I take your point, and I appreciate your advice. We'll keep this issue on our radar as we move forward. Thanks,
@kovidgoyal thanks for sharing your use case! Since I've worked on ebook pre-processing quite a bit, I can say that
@fptoth to clarify: I'm not suggesting to call
[Edited the above: "not suggesting", @fptoth.]
@pkra calibre already has the tools to serialize arbitrary HTML4/5/XHTML into canonical representations, so parsing is not a problem for me. However, after exploring this some more, I decided to just use MathJax client side, storing it in IndexedDB for offline mode. It turns out that the patching MathJax requires for this is minimal. The problem with mathjax-node is that its dependency chain is too large to bundle with calibre (since calibre is cross-platform, I cannot simply rely on using system libraries). Someday I might decide to revisit this, but in that case I'll most likely end up just porting the parts of MathJax that I need to run in Python directly, server side.
Oh, I would be interested in hearing more. Just don't use the HTML-CSS output for that; it will break easily across browser and OS versions.
Copy that. MathJax v3.0 should make that significantly lighter (ideally: zero).
This is client side, so why would HTML+CSS output break? What I am doing is exactly the same as adding MathJax to a normal HTML page; the only difference is that MathJax is loaded from IndexedDB instead of from a CDN.
Sorry! I hadn't read your posting carefully enough. It probably won't break anything if you're reusing the output in the same browser session (although changes in the surrounding CSS, such as text size or font, might cause minor issues). But if you're using IndexedDB, you can use the CommonHTML output anyway, which is more robust and faster, so I'd still use that (and you could then even use it to generate something permanent to send to the server and use across sessions).
Yeah, probably a good idea to use CommonHTML -- I am only using HTML+CSS to start with as I am more familiar with how it works. Once I have that working, I can always switch it to CommonHTML fairly easily.
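In MathJax v2, switching from HTML-CSS to CommonHTML is mostly a matter of naming a different output jax in the configuration. The fragment below is a sketch, assuming MathJax 2.6 or later (where the CommonHTML output is available), pages that contain MathML `<math>` elements, and that this global is set before MathJax.js itself is loaded (in a browser, `globalThis` is `window`).

```javascript
// Configuration read by MathJax v2 at startup, if defined before MathJax.js loads.
globalThis.MathJax = {
  // MathML input rendered through the CommonHTML output jax instead of HTML-CSS
  jax: ["input/MathML", "output/CommonHTML"],
  // mml2jax locates the <math> elements in the page
  extensions: ["mml2jax.js"]
};
```

How MathJax.js is then injected (from a CDN or, as in this case, from IndexedDB) does not change this configuration.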
Here's a random question that popped into my mind: what shall we call
Here's a simple example (for
[removed; wrong issue]
I've started to work on a replacement for mj-page. For now, the code is at https://github.com/pkra/mathjax-node-page. Feedback would be very welcome.
As per F2F (cf #191), we will move mj-page out of mathjax-node; there will be alternative modules to use.