Multiline math #717

mpacer · 2017-12-06T18:09:00Z

This is a modification of the logic in #715 (now #716) that takes the approach I suggested in #715 (comment).

In particular it uses the MathBlockLexer as a way to pass the multiline sections of math blocks through to the MathInlineLexer, which will do the actual parsing.

I don't want this to be merged until we can figure out how to simplify the test examples, as they are making a set of already hard-to-read tests even harder to read.

I also really want to know why our previous test with a LaTeX environment with a * in its name (the first instance of the test) didn't work to catch this.
Specifically, that test case is:

nbconvert/nbconvert/filters/tests/test_markdown.py, lines 124 to 128 in 3e203ce:

    "\\begin{equation*}\n" +
    ("\\left( \\sum_{k=1}^n a_k b_k \\right)^2 "
     "\\leq \\left( \\sum_{k=1}^n a_k^2 \\right) "
     "\\left( \\sum_{k=1}^n b_k^2 \\right)\n") +
    "\\end{equation*}"),

The block lexer/parser was splitting equations like this: $$ x = 2 $$. As a result, the inline lexer/parser never saw the whole equation, and it wasn't rendered properly. This fixes the breakage by adding a block-level lexer/parser for LaTeX equations written as either $$...$$ or \\[...\\]. The inline "block math" parsing code was kept as is, since the above equation could have been part of a paragraph like "$$x = 2$$"; this keeps compatibility with the Jupyter Notebook rendering engine (and there's a test enforcing that behavior).

takluyver · 2017-12-06T18:28:09Z

nbconvert/filters/markdown_mistune.py

+    identify math content spanning multiple lines. These are used by the 
+    MathBlockLexer.
+    """
+    multi_math_str = "|".join([r"(^\$\$.*?\$\$)",


Do you know about the re.VERBOSE option? It lets you split a regex over several lines and have comments, without needing to assemble the string yourself.

Could you give an example in this case?

Because I genuinely prefer reading individual regexes joined with a pipe to indicate alternations to one long regex that is individually difficult to parse.

Something like this:

    multiline_math = re.compile(r"""(^\$\$.*?\$\$)|
                                    (^\\\\\[.*?\\\\\])|
                                    (^\\begin\{([a-z]*\*?)\}(.*?)\\end\{\4\})""",
                                re.DOTALL | re.VERBOSE)

Personally I prefer that to the string-joining approach, but you're the one writing it, so you get to decide on code style.

Interesting, in this case I find them equally easy to read. It's more in cases like this one in mistune, https://github.com/lepture/mistune/blob/92b7f32664bad1a4b3740ee81eda47e5246e780f/mistune.py#L120-L134, where individual chunks need to be split across multiple lines, that detecting an alternation becomes a lot harder.

Since I find these to be equally readable, I'm going to go with your style in this case. But I also figured out a way to simplify it further (since we don't really need the groups for grabbing the content, as you pointed out).

Can I ask your rationale for disliking the string-joining approach? I'm just curious as to why it would be dispreferred, so that I can update my preferences accordingly.

Ok… actually I'm going to keep using the join method, but with simpler component strings.

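On a closer look I didn't find the two equally readable after all: in the re.VERBOSE version the pipe is used both for alternations inside the regex and as the bitwise OR that combines the flags, whereas the join version doesn't need the second flag. I had skimmed over that detail at first glance.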
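For readers comparing the two styles discussed in this thread, here is a minimal sketch that builds the same alternation both ways and checks that they match identically. The simplified component patterns (a single capturing group, hence the \1 backreference) and the sample strings are assumptions for illustration, not necessarily the exact patterns that ended up in the PR.

```python
import re

# Style 1: individual patterns joined with "|" (only re.DOTALL needed).
# These simplified components are an assumption made for this sketch.
components = [
    r"^\$\$.*?\$\$",                             # $$ ... $$
    r"^\\\\\[.*?\\\\\]",                         # \\[ ... \\]
    r"^\\begin\{([a-z]*\*?)\}.*?\\end\{\1\}",    # \begin{env} ... \end{env}
]
joined = re.compile("|".join(components), re.DOTALL)

# Style 2: one regex written across several lines with re.VERBOSE,
# which ignores unescaped whitespace and allows trailing comments.
verbose = re.compile(r"""
      ^\$\$.*?\$\$                               # dollar-delimited block
    | ^\\\\\[.*?\\\\\]                           # bracket-delimited block
    | ^\\begin\{([a-z]*\*?)\}.*?\\end\{\1\}      # LaTeX environment
    """, re.DOTALL | re.VERBOSE)

# Hypothetical inputs: each starts with a multiline math block.
samples = [
    "$$x\n= 2$$\ntrailing text",
    "\\\\[x\n= 2\\\\]\ntrailing text",
    "\\begin{align*}\nx &= 2\n\\end{align*}\ntrailing text",
]

for s in samples:
    assert joined.match(s).group(0) == verbose.match(s).group(0)
```

Both versions compile to equivalent patterns, so the choice between them is purely one of readability, as the thread concludes.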

takluyver · 2017-12-06T18:31:07Z

nbconvert/filters/markdown_mistune.py

        super(MarkdownWithMath, self).__init__(renderer, **kwargs)

+
+    def output_multiline_math(self):
+        return self.inline(self.token["text"])


Does this mean we're parsing it again with the inline parser? Can we skip that, given that we just want to put it in the output unmodified?

We're parsing it semantically for the first time in the inline parser. We actually aren't putting it in the output unmodified, because we need to escape the individual parts of it for the purposes of putting > and < in html. That occurs in the rendering step.

Gotcha, thanks.
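A small illustration of why the math text cannot simply be copied into the HTML output unmodified. This sketch uses the standard library's html.escape as a stand-in for whatever escaping the renderer actually performs, which is an assumption; the point is only that characters such as < and > inside the math must be escaped.

```python
import html

# Math content containing characters that are special in HTML.
math_source = "$$a<b&b>c$$"

# quote=False escapes only &, < and >, leaving quotes untouched.
escaped = html.escape(math_source, quote=False)
print(escaped)
```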

takluyver · 2017-12-06T18:48:21Z

nbconvert/filters/markdown_mistune.py

+        """Add token to pass through mutiline math."""
+        self.tokens.append({
+            "type": "multiline_math",
+            "text": m.group(1) or m.group(2) or m.group(3)


Since the three groups are each the whole of their alternative, this is equivalent to m.group(0), right?

Nice! Much cleaner, grazie.
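A quick check of the equivalence pointed out above, using the three-group regex from earlier in the review. The sample inputs are hypothetical. Because each outer group wraps the whole of its alternative, whichever one matches is the same text as the full match.

```python
import re

# The three-alternative pattern from the review, without re.VERBOSE.
three_groups = re.compile(
    r"(^\$\$.*?\$\$)|(^\\\\\[.*?\\\\\])|(^\\begin\{([a-z]*\*?)\}(.*?)\\end\{\4\})",
    re.DOTALL,
)

def text_by_groups(m):
    """Pick whichever outer group matched (the original approach)."""
    return m.group(1) or m.group(2) or m.group(3)

# Hypothetical inputs, one per alternative.
samples = [
    "$$x\n= 2$$\nmore text",
    "\\\\[x\n= 2\\\\]\nmore text",
    "\\begin{equation*}\nx = 2\n\\end{equation*}\nmore text",
]

for s in samples:
    m = three_groups.match(s)
    assert text_by_groups(m) == m.group(0)
```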


thx @takluyver!

takluyver · 2017-12-07T10:44:14Z

nbconvert/filters/markdown_mistune.py

+        """Add token to pass through mutiline math."""
+        self.tokens.append({
+            "type": "multiline_math",
+            "text": m.string


I think this should be m.group(0), which is the full match, not m.string, which is the string it tried to match against.

    >>> re.match(r'\d+-', '12-AB').string
    '12-AB'
    >>> re.match(r'\d+-', '12-AB').group(0)
    '12-'

I haven't checked how mistune works, but m.string could be wrong, whereas m.group(0) can't be. So let's go with the more robust option.

mpacer · 2017-12-08T21:03:14Z

Ok simplified the tests, escaped them and fixed the m.string thing.

takluyver · 2017-12-09T12:16:13Z

nbconvert/filters/tests/test_markdown.py

@@ -137,6 +137,18 @@ def test_markdown2html_math(self):
            "$$a<b&b<lt$$",
            "$$a<b&lt;b>a;a-b<0$$",
            "$$<k'>$$",
+            ("$$x\\n"


Shouldn't these be actual newlines, rather than making a string with a backslash followed by a lowercase n? You don't type \n in markdown, as far as I'm aware. What am I missing?

mpacer · 2017-12-09T17:31:23Z

These are encoded as strings now, not raw strings. Additionally, they are implicitly joined, not in triple quotes. If you look at the other examples, except for the last one, this style is more in keeping with the rest of our tests. This should be functionally identical in practice to the raw triple-quoted string with literal newlines.

Honestly, I was doing it to keep the style consistent with what was already there. There were reasons that I originally went along with that style long ago, because of some third-party regex tools (e.g., regex101.com). I'd need to reevaluate the state of the art to determine whether those issues are still applicable constraints.

If you'd prefer them to be multi-line, we should make all the old multi-line strings into literal multi-line strings and make that our uniform style for those tests.


takluyver · 2017-12-09T18:53:13Z

I'm happy with implicitly joined strings like this. But I think they should be e.g. $$x\n rather than $$x\\n. The former makes a string which ends with a newline (because Python's parser interprets \n as such), whereas the latter makes a string which ends with the characters \n, because the backslash is escaped.

See e.g. line 129 in the same file for an example of what I mean.
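The distinction can be checked directly in Python. The following is a minimal sketch with made-up test strings: the implicitly joined string with \n equals the triple-quoted string with a literal newline, while the \\n version contains a backslash followed by the letter n instead.

```python
# Implicitly joined, with a real newline escape.
real_newline = ("$$x\n"
                "=2$$")

# Implicitly joined, but with the backslash escaped: no newline at all.
escaped_backslash = ("$$x\\n"
                     "=2$$")

# Triple-quoted string with a literal newline.
triple_quoted = """$$x
=2$$"""

assert real_newline == triple_quoted
assert escaped_backslash != triple_quoted

# "\\n" in a normal string is the same two characters as r"\n" in a raw string.
assert escaped_backslash == r"$$x\n" + "=2$$"
assert "\n" not in escaped_backslash

print(repr(real_newline))
print(repr(escaped_backslash))
```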

mpacer · 2017-12-10T00:07:03Z

Oops! That's what I get for "s/\/\\/g"ing. I will fix it in a bit.

mpacer · 2017-12-11T22:32:21Z

Ok… force pushed onto the last bit since this conceptually seemed like part of the immediately previous commit.

danilobellini and others added 3 commits December 4, 2017 03:36

use the multiline math block lexer as a pass through to the inlinelexer

542f6de

add docstrings to explain the logic of the classes

2ab8b3f

takluyver reviewed Dec 6, 2017


mpacer added 2 commits December 6, 2017 10:56

Use m.group(0) as it captures the entirety of the main captured group

d0105e7

thx @takluyver!

replace m.group(0) with m.string, arrange associated Grammars and Lexers

6a2fd6b

takluyver reviewed Dec 7, 2017


mpacer force-pushed the multiline_math branch from 1bc9118 to 1e70e7a Compare December 8, 2017 21:03

takluyver reviewed Dec 9, 2017


return m.string to m.group(0); simplify and escape new tests

773b403

mpacer force-pushed the multiline_math branch from 1e70e7a to 773b403 Compare December 11, 2017 22:24

takluyver merged commit f849836 into jupyter:master Dec 12, 2017

takluyver mentioned this pull request Dec 12, 2017

Add Markdown block lexer/parser for LaTeX blocks #716

Merged

mpacer added this to the 5.4 milestone Feb 8, 2018
