Fixed issue #980: incorrect LaTeX capturing and MathJax rendering #2220

Merged (8 commits) on Mar 9, 2017

Contributor

@joined joined commented Feb 23, 2017

I wrote a bugfix for issue #980.

Background

The LaTeX code that needs to be rendered inline in the text cells is specified using delimiters. In particular, the following delimiters are supported:

  1. $
  2. $$
  3. \begin and \end
  4. \( and \)

(A note on the last one: it's necessary to write \\( instead of \( because in the latter case Markdown interprets the backslash as an escape for the parenthesis.)

There is a mechanism in place in order for the LaTeX code not to get interpreted as Markdown. It works as follows:

  • The user enters the Markdown with inline LaTeX in the text cell, for example *abc* $x_1 = 1, x_2 = 2$ _def_, and clicks execute cell.
  • The function remove_math in notebook/static/notebook/js/mathjaxutils.js is used to extract the LaTeX groups from the text, put them in a separate array, and replace the groups in the text with placeholders. In this case, for example, the function will return ['*abc* @@0@@ _def_', ['$x_1 = 1, x_2 = 2$']].
  • The text gets rendered as Markdown.
  • The placeholders get replaced with the LaTeX groups that were backed up.
  • The rendered Markdown is processed by MathJax to render the LaTeX.

This procedure is necessary since, otherwise, the underscores in the LaTeX group would be interpreted as italic delimiters, in this example.
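
The extract-and-restore steps above can be sketched as follows. This is a deliberately reduced illustration, not the code in mathjaxutils.js: it only recognizes $...$ groups, and the names removeMath and replaceMath are stand-ins for the real remove_math and replace_math.

```javascript
// Simplified sketch: only handles $...$ groups, unlike the real remove_math.
function removeMath(text) {
    var math = [];
    // Swap each LaTeX group for a numbered placeholder and back it up.
    var stripped = text.replace(/\$[^$]+\$/g, function (group) {
        math.push(group);
        return "@@" + (math.length - 1) + "@@";
    });
    return [stripped, math];
}

// Put the backed-up groups back in place of their placeholders.
function replaceMath(text, math) {
    return text.replace(/@@(\d+)@@/g, function (match, n) {
        return math[n];
    });
}

var result = removeMath("*abc* $x_1 = 1, x_2 = 2$ _def_");
// result[0] is "*abc* @@0@@ _def_"
// result[1] is ["$x_1 = 1, x_2 = 2$"]
```

Markdown rendering applied to result[0] cannot touch the underscores, because the group stays hidden behind @@0@@ until replaceMath puts it back.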

The core of the problem is that the function remove_math extracts from the text only the groups delimited by 1, 2 and 3, but not 4.

This is easy to see in the third line of the example in the issue, where Markdown interprets the underscores as italic delimiters.

Fix

The changes are made in the notebook/static/notebook/js/mathjaxutils.js file.

The remove_math function contains some logic to identify the LaTeX blocks and extract them. The first step in this procedure is to split the text on all the possible group delimiters. This is done using the MATHSPLIT regular expression defined on line 62. As the comments in the code say, it's a bit "magical" in the sense that its workings are not crystal clear.

The \\( and \\) delimiters were missing from the regular expression, so I added them by appending \\\\(?:\(|\)) to it. Moreover, since the regular expression was already matching the text \\ as a delimiter, I had to remove that alternative (otherwise the block \\( would not be grouped together and we would not be able to identify it as a group delimiter). The purpose of splitting on \\ is not clear to me, since we're only looking for LaTeX group delimiters and \\ is not one. I suspect it is the result of a copy and paste from somewhere else, since the comments cite different sources for the code.

After being split, the text is processed by iterating over the blocks and looking for start and end LaTeX delimiters. On line 181 I added the missing logic to handle the case in which the text \\( is the start delimiter and the text \\) is the end delimiter.

The last change is on line 208. It is necessary because the LaTeX code is extracted, backed up, and reinserted in the text after the Markdown is rendered, so the \\( that was necessary for Markdown is never processed by it (which would have produced \(). We therefore have to manually replace the instances of \\( and \\) with \( and \) respectively, which are the delimiters that MathJax recognizes.
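
A minimal sketch of the two ideas described in this section: splitting on a pattern that includes \\( and \\), and mapping those delimiters back to the ones MathJax expects. The SPLIT pattern is a hypothetical, much reduced stand-in for MATHSPLIT, and toMathJaxDelimiters is an illustrative name rather than a function in the codebase.

```javascript
// Hypothetical, reduced pattern: the real MATHSPLIT covers more delimiters.
// The capture group makes split() keep the delimiters as separate blocks.
var SPLIT = /(\$\$?|\\\\(?:\(|\)))/;

// In the cell source the user types two backslashes before each parenthesis,
// which this string literal spells as four.
var source = "see \\\\(x_1\\\\) and $y_2$";
var blocks = source.split(SPLIT);
// blocks[1] is the two-backslash opening delimiter, blocks[3] the closing one.

// After Markdown rendering, turn \\( ... \\) into \( ... \) for MathJax.
// A string pattern in replace() only affects the first occurrence.
function toMathJaxDelimiters(group) {
    return group
        .replace("\\\\(", "\\(")
        .replace("\\\\)", "\\)");
}
```

If the pattern also matched a bare \\ on its own, the opening delimiter would be cut into \\ and (, which is why that alternative had to go.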

Member

@blink1073 blink1073 left a comment

Tested locally against the test cases in #980, looks good!

@takluyver
Member

@michaelpacer could you have a look at this?

  • Have we ever explicitly aimed to support \( and \) as math delimiters? I don't recall that, and I don't think it's implemented in nbconvert to HTML.
  • If we should fix this, should we try to get it in for 5.0?

@Carreau Carreau requested a review from mpacer February 27, 2017 18:34
@mpacer
Member

mpacer commented Mar 1, 2017

Hi all, a few thoughts (took a couple days to get my bearings on this one as there are a few related issues with which I was getting it confused). My unraveling this confusion led to these different points.

  1. This seems (based on the exposition…I've not tried testing this yet) to address an issue I opened ages ago: Bug in formatting mathjax/LaTeX inconsistently depending on chosen delimiters, markdown interaction #759. If that is the scope of this PR, I think it should be fine.

  2. Technically the use of \\( …\\) inside the notebook isn't "escaping" the \ so that it can be interpreted by the underlying javascript, but it is specifically the syntax we use in contrast to traditional \( \). I think we made an explicit decision to not support those as delimiters (see next point). I don't think that this is actually doing that (based on the number of \s that are present in the various instances of the commit). That said, that regular expression is a beast to read and not test, so I'll need a little time to error test it thoroughly.

2.1. This should probably have some accompanying tests to check for the previously aberrant behaviour (in #759 and #980), but that assumes that we're still expanding our test suite. In other cases (e.g., #2048) it was decided that tests weren't required. This seems like a particularly precarious case (as it's a "bugfix" rather than a "feature request") and so if there is any planned further expansion of our test suite, then this would merit inclusion.

  3. If this is allowing \( \) as delimiters (nb: I don't think it is), can someone explain why this is ok but other PRs that seemed to try to do something similar weren't (e.g., Add \[ \] and \( \) as maths delimiters #1935 & Incorrect conversion of equations with \[ \] delimiters nbconvert#477)? I'm not opposed to including it in theory, but strictly speaking it is a departure from the CommonMark approach, as quoted previously by @rgbkrk:

the problem with that syntax is that \[ and \( already have clearly defined meanings: they are escaped [ and ( characters, and it is very important that there be a way to escape these characters in certain circumstances

I think this observation is unrelated to this PR beyond clarifying a point about when certain syntaxes are valid or invalid. However, if we wanted to allow \[ after two new lines (i.e., \n\n\\\[, a \[ with only a blank line above it…), which I believe is currently needed for the \begin{ syntax to be detected, that could be fine. If you wanted to get back the \[ escape, then you just include a space in the previous line, nbd. But I'm hard pressed to think of a way to allow \( and the escaping of ( at the same time (since it is intended to be "inline" mathematics, no whitespace can be used to distinguish it cleanly).

@mpacer
Member

mpacer commented Mar 1, 2017

Simpler comment: this should also cover \[ and \] (by indirectly including \\[ and \\] in the parsing code in places that are equivalent to the current \\( \\)).

@@ -178,6 +178,11 @@ define([
             end = block;
             braces = 0;
         }
+        else if (block === "\\\\\(") {
Member

This modification should also cover the case where block === "\\\\\[" and end = "\\\\\]".

@@ -200,7 +205,9 @@ define([
 //
 var replace_math = function (text, math) {
     text = text.replace(/@@(\d+)@@/g, function (match, n) {
-        return math[n];
+        return math[n]
+            .replace("\\\\\(", "\\\(")
Member

I believe (following this pattern) this should also cover .replace("\\\\\[", "\\\[") and .replace("\\\\\]", "\\\]").

However, I have a code smell around this pattern of sequentially applied string replacement…

Shouldn't it be sufficient to look exclusively for these as the beginning and ending elements of the string passed into the second function call (since there shouldn't be more than one instance of each in any call to the anonymous function with the match, n signature)? Are there cases where these wouldn't be the prefix and the suffix of what is passed into the function that provides the second part of the text return value?

I should mention that I would prefer (given that no this=that referencing needs to occur inside these anonymous functions) that the second anonymous function be defined separately, with a proper name. Then the second part of the assignment to the text variable would directly call that function. This would be a lot easier to read.
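
One way the suggestion above could look, as a sketch only: a named helper that inspects just the start and end of each backed-up group instead of chaining replace calls. The names DELIMITER_PAIRS, restoreDelimiters and replaceMath are hypothetical, and this is not necessarily the shape the merged code took.

```javascript
// Each entry: [notebook open, notebook close, MathJax open, MathJax close].
// Hypothetical table; the notebook forms carry two backslashes.
var DELIMITER_PAIRS = [
    ["\\\\(", "\\\\)", "\\(", "\\)"],
    ["\\\\[", "\\\\]", "\\[", "\\]"]
];

// Swap the delimiters only when they are the prefix and suffix of the group.
function restoreDelimiters(group) {
    for (var i = 0; i < DELIMITER_PAIRS.length; i++) {
        var pair = DELIMITER_PAIRS[i];
        var open = pair[0];
        var close = pair[1];
        if (group.length >= open.length + close.length &&
                group.slice(0, open.length) === open &&
                group.slice(-close.length) === close) {
            var body = group.slice(open.length, group.length - close.length);
            return pair[2] + body + pair[3];
        }
    }
    // Groups with other delimiters ($, $$, \begin...) pass through unchanged.
    return group;
}

function replaceMath(text, math) {
    return text.replace(/@@(\d+)@@/g, function (match, n) {
        return restoreDelimiters(math[n]);
    });
}
```

Because only the two ends are inspected, a \\( that happens to appear in the middle of a group is left alone.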

@rgbkrk
Member

rgbkrk commented Mar 1, 2017

@joined - thank you for taking the time to open a PR and write out a very complete description of the problem.

Thank you @michaelpacer for the incredible amount of detail and care you're putting into standardization on the format. Happy you're handling that. I can't speak much to the changes here today, so I'll have to come back to this. I can't be a blocker though, I leave it in your and @blink1073's hands.

@takluyver
Member

takluyver commented Mar 1, 2017

If this is allowing \( \) as delimiters (nb: I don't think it is), can someone explain why this is ok but other PRs that seemed to try to do something similar weren't

@rgbkrk's comment that you quoted makes sense to me (I don't remember the earlier PRs on the same subject), and thus I don't think that we should support \( and \) as math delimiters. I don't recall us ever indicating that they were supported delimiters, and we don't try to support all Latex syntax.

@blink1073
Member

@takluyver, sounds fair to me.

@joined
Contributor Author

joined commented Mar 1, 2017

I incorporated your feedback, I hope I got it right.

Thanks for clarifying the confusion with \\( and \(, I now realise better why the first one was chosen as the delimiter.

And since you are discussing it: no, I never attempted to add support for \( as a math delimiter.

@mpacer
Member

mpacer commented Mar 1, 2017

@takluyver are you ok merging this without tests like before?

If so this looks good to me, though a comment explaining the last replace call & the new function would be useful for future debugging. Not a blocker though.

@mpacer
Member

mpacer commented Mar 1, 2017

This goes beyond just fixing #980, and so is in no way blocking…

@joined can you explain why $ is given a special value as inline? I can't see any reason, but if there is, is that something that should also apply to \\\\\( & \\\\\) since those are also inline math delimiters? If there's no reason, would you just replace the use of that variable to instead be a hardcoded string like the rest of the delimiters?

@mpacer
Member

mpacer commented Mar 1, 2017

K nvm, I just made that change.

It's small enough of a difference that, if the tests pass, I'll merge, as it would seem to leave everything else untouched and that LGTM.

Member

@ivanov ivanov left a comment

I raised this in #2048, and I want to reiterate that I think it is a mistake to add new functionality and provide fixes (especially around tricky issues) without tests. This is not conducive to writing robust software.

@ivanov
Member

ivanov commented Mar 2, 2017

I want to add that I am grateful for the work everyone has put in here, thank you @joined for providing this fix, and for everyone else (especially @michaelpacer) for the reviews. So I'm not trying to pick on you, I just don't know how else to push back against what I think is a bad policy of merging features or fixes which do not have tests.

@mpacer
Member

mpacer commented Mar 2, 2017

@ivanov I understand your concerns — would you be able to put some cycles into fixing the js testing suite? That could include us pair coding our way through it since you're in the area. I'm not currently equipped to tackle the general notebook js testing problem in an efficient manner (and don't have the free cycles to teach myself about js testing from scratch). However, I think that I could pick up a lot once I got some intellectual momentum by working with someone who knows the ins and outs of the problem space.

@ivanov
Member

ivanov commented Mar 2, 2017

As far as I know, the JS test suite mostly works. It breaks because either some tests or some of the utility functions that the tests use end up having race conditions. I've created the Flaky JavaScript tests omnibus ticket #2243 where we can note down tests which occasionally fail (with any relevant details) so we can address this when it comes up.

You can read about writing and running JS tests for the notebook here.

The starting point for this specific PR would probably be to look at and add to notebook/tests/notebook/markdown.js

I'm happy to meet with you @michaelpacer, but I hope you can start at the two links above so you can figure out what's missing and improve the docs and have a place to direct others for how to write and run these tests.

@joined
Contributor Author

joined commented Mar 2, 2017

So I looked into the testing for the rendered LaTeX code, it seems quite messy.

For example rendering a cell with the content $ a*b $, $ c*d $ results in the following:

<p><span class="MathJax_Preview" style="color: inherit;"></span><span class="MathJax" id="MathJax-Element-7-Frame" tabindex="0" data-mathml="<math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;><mi>a</mi><mo>&amp;#x2217;</mo><mi>b</mi></math>" role="presentation" style="position: relative;"><nobr aria-hidden="true"><span class="math" id="MathJax-Span-40" role="math" style="width: 2.443em; display: inline-block;"><span style="display: inline-block; position: relative; width: 2.027em; height: 0px; font-size: 120%;"><span style="position: absolute; clip: rect(1.67em 1002.03em 2.682em -999.997em); top: -2.497em; left: 0.003em;"><span class="mrow" id="MathJax-Span-41"><span class="mi" id="MathJax-Span-42" style="font-family: STIXMathJax_Main-italic;">a</span><span class="mo" id="MathJax-Span-43" style="font-family: STIXMathJax_Main; padding-left: 0.241em;"></span><span class="mi" id="MathJax-Span-44" style="font-family: STIXMathJax_Main-italic; padding-left: 0.241em;">b</span></span><span style="display: inline-block; width: 0px; height: 2.503em;"></span></span>
    </span><span style="display: inline-block; overflow: hidden; vertical-align: -0.068em; border-left: 0px solid; width: 0px; height: 1.004em;"></span></span>
    </nobr><span class="MJX_Assistive_MathML" role="presentation"><math xmlns="http://www.w3.org/1998/Math/MathML"><mi>a</mi><mo></mo><mi>b</mi></math></span></span>
    <script type="math/tex" id="MathJax-Element-7"> a*b </script>, <span class="MathJax_Preview" style="color: inherit;"></span><span class="MathJax" id="MathJax-Element-8-Frame" tabindex="0" data-mathml="<math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;><mi>c</mi><mo>&amp;#x2217;</mo><mi>d</mi></math>" role="presentation" style="position: relative;"><nobr aria-hidden="true"><span class="math" id="MathJax-Span-45" role="math" style="width: 2.384em; display: inline-block;"><span style="display: inline-block; position: relative; width: 1.967em; height: 0px; font-size: 120%;"><span style="position: absolute; clip: rect(1.67em 1001.97em 2.682em -999.997em); top: -2.497em; left: 0.003em;"><span class="mrow" id="MathJax-Span-46"><span class="mi" id="MathJax-Span-47" style="font-family: STIXMathJax_Main-italic;">c</span><span class="mo" id="MathJax-Span-48" style="font-family: STIXMathJax_Main; padding-left: 0.241em;"></span><span class="mi" id="MathJax-Span-49" style="font-family: STIXMathJax_Main-italic; padding-left: 0.241em;">d<span style="display: inline-block; overflow: hidden; height: 1px; width: 0.003em;"></span></span>
    </span><span style="display: inline-block; width: 0px; height: 2.503em;"></span></span>
    </span><span style="display: inline-block; overflow: hidden; vertical-align: -0.068em; border-left: 0px solid; width: 0px; height: 1.004em;"></span></span>
    </nobr><span class="MJX_Assistive_MathML" role="presentation"><math xmlns="http://www.w3.org/1998/Math/MathML"><mi>c</mi><mo></mo><mi>d</mi></math></span></span>
    <script type="math/tex" id="MathJax-Element-8"> c*d </script>
</p>

I see many things that could go wrong when making an assertion on the rendered HTML output. For example, MathJax assigns incremental ids to each rendered LaTeX group, and the numbering is global to the notebook.

It also seems that the inline CSS that MathJax outputs could change based on the environment, making the test fail.

Any ideas?

@ivanov changed the title from "Fixed issue #980" to "Fixed issue #980: incorrect inline Latex rendering" on Mar 2, 2017
@ivanov
Member

ivanov commented Mar 2, 2017

I agree that trying to assert the whole MathJax generated DOM subtree would be difficult. What about drilling down into parts of it? Or asserting how many mo and mi elements there are in a given test case? (I also just updated the title of the PR to be more descriptive)

@joined
Contributor Author

joined commented Mar 4, 2017

So I wrote the following code in notebook/markdown.js to test the LaTeX rendering with the $ delimiter, by looking at the number of mi and mo tags as suggested by @ivanov:

// Test LaTeX rendering with delimiter $
output = this.evaluate(function () {
    IPython.notebook.to_markdown();
    var cell = IPython.notebook.get_selected_cell();
    cell.set_text('$ a*b $, $ c*d $');
    cell.render();

    var mi = cell.element[0].getElementsByTagName('mi'),
        mo = cell.element[0].getElementsByTagName('mo');

    return {
        'n_mi': mi.length > 0 ? mi.length : 0,
        'n_mo': mo.length > 0 ? mo.length : 0
    };
});
this.test.assertEquals(output, {'n_mi': 4, 'n_mo': 2}, 'LaTeX rendering with delimiter $ works');

Unfortunately the test fails with the following output:

$ python -m notebook.jstest notebook/markdown.js
[...]
FAIL LaTeX rendering with delimiter $ works
#    type: assertEquals
#    file: /Users/joined/Desktop/Software Architecture/notebook/notebook/tests/notebook/markdown.js
#    subject: {"n_mi":0,"n_mo":0}
#    expected: {"n_mi":4,"n_mo":2}
[...]

By checking the HTML contents of the rendered cell as seen by CasperJS with console.log this is what I get:

<div class="text_cell_render rendered_html" tabindex="-1"><p>$ a*b $, $ c*d $</p>\n</div>

So it looks like MathJax doesn't render the cell, even if calling cell.render() should trigger it.

Any ideas?


Update

Well, I now realise that the problem is that MathJax rendering is asynchronous. I guess it's necessary to trigger an event when the MathJax rendering is completed (no such event seems to be triggered at the moment) and then use CasperJS's waitFor to wait for that event before evaluating the contents of the rendered cell.

It seems that the way to go to trigger an event after the cell has rendered is to push the trigger function to the MathJax Queue just after having queued the rendering.
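
The ordering idea can be illustrated with a stand-in queue. This sketch does not use MathJax at all: makeQueue is a hypothetical helper that runs pushed callbacks in order, which is the property the approach relies on. The key detail is that the notification is pushed as a function, so it runs only when the queue reaches it.

```javascript
// Stand-in for a callback queue (hypothetical; not MathJax itself).
function makeQueue() {
    var pending = [];
    return {
        push: function (callback) { pending.push(callback); },
        // Run queued callbacks in the order they were pushed.
        flush: function () {
            while (pending.length > 0) {
                pending.shift()();
            }
        }
    };
}

var log = [];
var queue = makeQueue();

// 1. Queue the (simulated) typesetting step.
queue.push(function () { log.push("typeset"); });
// 2. Queue the notification as a function so it runs after typesetting.
queue.push(function () { log.push("rendered.MarkdownCell"); });

// The code that queued the work carries on before anything has run.
log.push("render() returned");
queue.flush();
// log is now ["render() returned", "typeset", "rendered.MarkdownCell"]
```

A test that waits for the notification, rather than inspecting the cell right after render() returns, would then see the typeset output.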

Maybe here instead of passing only the element I can pass the entire this object to the typeset function and then instead of triggering rendered.MarkdownCell from here I can change the typeset function to do it, like so

var typeset = function(cell) {
    var $el = cell.element.jquery ? cell.element : $(cell.element);

    if(!window.MathJax){
        return;
    }
    $el.each(function(){
        // MathJax takes a DOM node: $.each makes `this` the context
        MathJax.Hub.Queue(["Typeset", MathJax.Hub, this]);
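        // Queue a function so the event fires only once typesetting has finished;
        // passing the result of trigger(...) directly would fire it immediately.
        MathJax.Hub.Queue(function () {
            cell.events.trigger("rendered.MarkdownCell", {'cell': cell});
        });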
    });
};

(I don't see the reason for returning the MathJax.Hub.Queue objects, the return value doesn't seem to be used anywhere. I also don't see the reason for the second text argument of the function, because it's never called with a second parameter)

Thoughts? To be honest I feel like this is getting a bit bigger than expected, and I think changes of this kind are better made by someone who has worked with the codebase before.

@ivanov
Member

ivanov commented Mar 5, 2017

Thank you a bunch for looking more into this, @joined, this will help us make progress. I met with @michaelpacer to get him up to speed on the javascript testing stuff earlier this week. We'll get there.

@mpacer changed the title from "Fixed issue #980: incorrect inline Latex rendering" to "Fixed issue #980: incorrect LaTeX capturing and MathJax rendering" on Mar 5, 2017
@gnestor gnestor added this to the 5.0 milestone Mar 7, 2017
@gnestor
Contributor

gnestor commented Mar 7, 2017

Fix on jupyterlab with tests: jupyterlab/jupyterlab#1846

@gnestor
Contributor

gnestor commented Mar 7, 2017

@michaelpacer Let me know what I can do to help get this merged!

@minrk
Member

minrk commented Mar 8, 2017

Test added and passing for a case confirmed to fail on master. Thanks, @michaelpacer!

@mpacer
Member

mpacer commented Mar 8, 2017

Also, this closes #759.

Member

@ivanov ivanov left a comment

Thank you for the tests @michaelpacer and @minrk 😄 ! Beautiful.

@gnestor
Contributor

gnestor commented Mar 9, 2017

Are we ready to merge!?

@rgbkrk rgbkrk merged commit ead73b9 into jupyter:master Mar 9, 2017
@rgbkrk
Member

rgbkrk commented Mar 9, 2017

Thank you @joined, @michaelpacer, and all the reviewers

@mgeier
Contributor

mgeier commented Apr 13, 2017

I just happened to notice that now \\( is supported in the notebook as math delimiter, and I think this is a really bad idea!

This goes very much against the spirit of Markdown and is very inconsistent from a Markdown standpoint. A backslash is normally used to prevent something from being interpreted as a formatting command; here it is used for the exact opposite!

It also increases the already bad fragmentation of Markdown extensions.

What about just not supporting this style of separators?
We already have $ and $$ and \begin{equation}...\end{equation} (and many other math environments) ... why do we need more?

@mpacer
Member

mpacer commented Apr 13, 2017

@mgeier many people prefer to use this style of equation formatting command (even if I do not)

Technically, the LaTeX community discourages the use of $$ as it is a TeX command that has different spacing rules than the LaTeX command (in which case the one backslash version \[…\] is the officially preferred option which doesn't apply for the aforementioned reason).

Probably most importantly though, these delimiters have been supported in the notebook for a while now, they are not new. Now, they are being parsed correctly when rendered (making previous issues with subscript rendering vs. italics rendering disappear).

@rgbkrk
Member

rgbkrk commented Apr 13, 2017

Of note: nteract only supports the $ and $$ varieties, largely to stick with @jgm recommendations with commonmark and a hopeful future commonmark extension pathway.

@mgeier
Contributor

mgeier commented Apr 14, 2017

@mpacer

Probably most importantly though, these delimiters have been supported in the notebook for a while now, they are not new.

Thanks for the information, I wasn't aware of that.

I still think it was an unfortunate decision to support them.

I understand that it may be hard to get rid of them now, but I would still discourage their use in the future.
And probably with some future version of nbformat, they could be automatically converted to their preferred alternative?

the LaTeX community discourages the use of $$

That's fair enough, I can totally live without those. I wouldn't even mention them in the documentation (and I don't mention them when I write documentation).
I highly recommend using math environments (with \begin{...} and \end{...}) instead.
They can and should be used in "normal" LaTeX documents, they don't clash with Markdown, they are fully supported in Pandoc (which is kind of the reference, isn't it?), and they are much more flexible anyway.

@rgbkrk

... and a hopeful future commonmark extension pathway

Exactly.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 7, 2021