New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiline math #717

Merged
merged 6 commits into from Dec 12, 2017

Conversation

Projects
None yet
3 participants
@mpacer
Copy link
Member

mpacer commented Dec 6, 2017

This is a modification of the logic #715 (now #716) that takes the approach I suggested in #715 (comment).

In particular it uses the MathBlockLexer as a way to pass the multiline sections of math blocks through to the MathInlineLexer, which will do the actual parsing.

I don't want this to be merged until we can figure out how to simplify the test examples as they are making a set of already hard-to-read tests even more hard-to-read.

I also really want to know why our previous test with a LaTeX environment with a * in its name (the first instance of the test) didn't work to catch this.
Specifically, that test case is:

"\\begin{equation*}\n" +
("\\left( \\sum_{k=1}^n a_k b_k \\right)^2 "
"\\leq \\left( \\sum_{k=1}^n a_k^2 \\right) "
"\\left( \\sum_{k=1}^n b_k^2 \\right)\n") +
"\\end{equation*}"),

danilobellini and others added some commits Dec 4, 2017

Add Markdown block lexer/parser for LaTeX blocks
The block lexer/parser was splitting equations like this

$$
x
=
2
$$

So the inline lexer/parser was never seeing the whole equation, and
it wasn't getting properly rendered. This fixes such breaking by
adding a block-level lexer/parser to the LaTeX equations written as
either $$...$$ or \\[...\\]

The inline "block math" parsing code was kept as is, since the above
equation could have been part of a paragraph like "$$x = 2$$" to keep
the compatibility with Jupyter Notebook rendering engine (and because
there's a test enforcing that behavior)
identify math content spanning multiple lines. These are used by the
MathBlockLexer.
"""
multi_math_str = "|".join([r"(^\$\$.*?\$\$)",

This comment has been minimized.

@takluyver

takluyver Dec 6, 2017

Member

Do you know about the re.VERBOSE option? It lets you split a regex over several lines and have comments, without needing to assemble the string yourself.

This comment has been minimized.

@mpacer

mpacer Dec 6, 2017

Author Member

Could you give an example in this case?

Because I genuinely prefer reading individual regexes joined with a pipe to indicate alternations to one long regex that is individually difficult to parse.

This comment has been minimized.

@takluyver

takluyver Dec 6, 2017

Member

Something like this:

multiline_math = re.compile(r"""(^\$\$.*?\$\$)|
                                (^\\\\\[.*?\\\\\])|
                                (^\\begin\{([a-z]*\*?)\}(.*?)\\end\{\4\})""",
                            re.DOTALL | re.VERBOSE)

Personally I prefer that to the string-joining approach, but you're the one writing it, so you get to decide on code style.

This comment has been minimized.

@mpacer

mpacer Dec 6, 2017

Author Member

Interesting, in this case I find them equally easy to read. it's more in cases like this in mistune https://github.com/lepture/mistune/blob/92b7f32664bad1a4b3740ee81eda47e5246e780f/mistune.py#L120-L134 where individual chunks need to split multiple lines, and accordingly detecting that an alternation is happening is a lot harder.

This comment has been minimized.

@mpacer

mpacer Dec 7, 2017

Author Member

Since I find these to be equally readable, I'm going to go with your style in this case. But I also figured out a way to simplify it further (since we don't really need the groups for grabbing the content, as you pointed out).

This comment has been minimized.

@mpacer

mpacer Dec 7, 2017

Author Member

can I ask your rationale for disliking the string joining approach? I'm just curious as to why it would be dispreferred so that I can update my preferences accordingly.

This comment has been minimized.

@mpacer

mpacer Dec 7, 2017

Author Member

Ok… actually I'm going to keep using the join method, but with simpler component strings.

I ended up not finding the pipe's use for alternations and the bitwise OR to be equally readable as not needing the second flag. I had skimmed over that detail at a first glance.

super(MarkdownWithMath, self).__init__(renderer, **kwargs)


def output_multiline_math(self):
return self.inline(self.token["text"])

This comment has been minimized.

@takluyver

takluyver Dec 6, 2017

Member

Does this mean we're parsing it again with the inline parser? Can we skip that, given that we just want to put it in the output unmodified?

This comment has been minimized.

@mpacer

mpacer Dec 6, 2017

Author Member

We're parsing it semantically for the first time in the inline parser. We actually aren't putting it in the output unmodified, because we need to escape the individual parts of it for the purposes of putting > and < in html. That occurs in the rendering step.

This comment has been minimized.

@takluyver

takluyver Dec 6, 2017

Member

Gotcha, thanks.

"""Add token to pass through mutiline math."""
self.tokens.append({
"type": "multiline_math",
"text": m.group(1) or m.group(2) or m.group(3)

This comment has been minimized.

@takluyver

takluyver Dec 6, 2017

Member

Since the three groups are each the whole of their alternative, this is equivalent to m.group(0), right?

This comment has been minimized.

@mpacer

mpacer Dec 6, 2017

Author Member

Nice! Much cleaner, grazie.

"""Add token to pass through mutiline math."""
self.tokens.append({
"type": "multiline_math",
"text": m.string

This comment has been minimized.

@takluyver

takluyver Dec 7, 2017

Member

I think this should be m.group(0), which is the full match, not m.string, which is the string it tried to match against.

 3│ re.match(r'\d+-', '12-AB').string
 3> '12-AB'

 4│ re.match(r'\d+-', '12-AB').group(0)
 4> '12-'

I haven't checked how mistune works, but m.string could be wrong, whereas m.group(0) can't be. So let's go with the more robust option.

@mpacer mpacer force-pushed the mpacer:multiline_math branch from 1bc9118 to 1e70e7a Dec 8, 2017

@mpacer

This comment has been minimized.

Copy link
Member Author

mpacer commented Dec 8, 2017

Ok simplified the tests, escaped them and fixed the m.string thing.

@@ -137,6 +137,18 @@ def test_markdown2html_math(self):
"$$a<b&b<lt$$",
"$$a<b&lt;b>a;a-b<0$$",
"$$<k'>$$",
("$$x\\n"

This comment has been minimized.

@takluyver

takluyver Dec 9, 2017

Member

Shouldn't these be actual newlines, rather than making a string with a backslash followed by a lowercase n? You don't type \n in markdown, as far as I'm aware. What am I missing?

@mpacer

This comment has been minimized.

Copy link
Member Author

mpacer commented Dec 9, 2017

@takluyver

This comment has been minimized.

Copy link
Member

takluyver commented Dec 9, 2017

I'm happy with implicitly joined strings like this. But I think they should be e.g. $$x\n rather than $$x\\n. The former makes a string which ends with a newline (because Python's parser interprets \n as such), whereas the latter makes a string which ends with the characters \n, because the backslash is escaped.

See e.g. line 129 in the same file for an example of what I mean.

@mpacer

This comment has been minimized.

Copy link
Member Author

mpacer commented Dec 10, 2017

Oops! That's what I get for "s/\/\\/g"ing. I will fix it in a bit.

@mpacer mpacer force-pushed the mpacer:multiline_math branch from 1e70e7a to 773b403 Dec 11, 2017

@mpacer

This comment has been minimized.

Copy link
Member Author

mpacer commented Dec 11, 2017

Ok… force pushed onto the last bit since this conceptually seemed like part of the immediately previous commit.

@takluyver takluyver merged commit f849836 into jupyter:master Dec 12, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@mpacer mpacer added this to the 5.4 milestone Feb 8, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment