# Fix bug when there is "<" in the math formula #514

Merged
merged 7 commits into from Feb 20, 2017

## Conversation

Projects
None yet
3 participants
Contributor

### zasdfgbnm commented Jan 18, 2017

 The easiest way to trigger the bug is to add a markdown cell and put . When the notebook is converted to html, the "<" is left unescaped, which makes browsers parse this as a html tag
 Fix bug when there is "<" in the math formula 
The easiest way to trigger the bug is to add a markdown cell and put $$<k'>$$.
When the notebook is converted to html, the "<" is left unescaped,
which makes browsers parse this as a html tag
 656f67f 

Open

### takluyver reviewed Jan 18, 2017

 @@ -104,15 +104,20 @@ def header(self, text, level, raw=None): html = super(IPythonRenderer, self).header(text, level, raw=raw) return add_anchor(html) def escape_lt(self,text): return text.replace('<','<')

#### takluyver Jan 18, 2017

Member

Shouldn't we also be translating >? There's a function html.escape() which we might want to use.

#### zasdfgbnm Jan 18, 2017

Author Contributor

I didn't find any bug triggered by ">". But it is a good idea to escape all these special characters.

Member

### takluyver commented Jan 18, 2017

 Could you add a test for this in nbconvert.filters.tests.test_markdown? Thanks!
Contributor Author

### mpacer reviewed Jan 18, 2017

 @@ -104,15 +104,20 @@ def header(self, text, level, raw=None): html = super(IPythonRenderer, self).header(text, level, raw=raw) return add_anchor(html) def escape_lt(self,text): return text.replace('<','<') # Pass math through unaltered - mathjax does the rendering in the browser

#### mpacer Jan 18, 2017

Member

This will no longer be true, we're deciding to follow a different standard than MathJax in this case, as MathJax does not always treat "<" and ">" as characters to be escaped (which leads to the "bugs" that this PR is trying to address).

### zasdfgbnm added some commits Jan 18, 2017

 Use html.escape to escape <, > and & 
Add tests for escape
 abd54ca 
 change escape and unescape for compatibility 
 8864bd6 
 fix assertNotRegex for compatibility of python 2.7 
 6b9b481 
Contributor Author

### zasdfgbnm commented Jan 19, 2017

 It should work now.
Member

### takluyver commented Jan 19, 2017

 Thanks. This looks OK to me now, but I'll let @michaelpacer have another look over it.

### mpacer reviewed Jan 20, 2017

 @@ -171,7 +190,8 @@ def test_markdown2html_math_paragraph(self): ] for case in cases: self.assertIn(case, markdown2html(case)) s = markdown2html(case) self.assertIn(case,self._unescape(s))

#### mpacer Jan 20, 2017

Member

Do we expect markdown2html() anywhere else to be returning unescaped values? Is there any way to systematically detect that expectation (vs. not)?

I ask because it seems a little off to be using a private function in our testing suite just to make tests pass in the way that they were before. It almost seems like we should change the test's expected output (in this case, the case value) rather than to use this _unescape so our tests don't need any special treatment in order to pass. This way we're more explicit about the expected behaviour.

That said, if there's any purpose to having the unescaping function, we should probably surface it in the regular code base and test it separately.

#### takluyver Jan 24, 2017

Member

It's a bit ugly, but it was a bit of an ugly test before - it checks that the Markdown rendering doesn't change these samples. So it only works with samples that don't use any Markdown syntax (other than the math syntax we add). I wouldn't particularly ask @zasdfgbnm to fix this. Obviously we welcome improvements to the test, but that's probably best as a later PR.

#### mpacer Jan 24, 2017

Member

Yes that was the original test before. Currently this transformation does not leave these samples unchanged, which is why the unescaping needs to happen. I get that to make the test pass as it is it's easier to not have to concern yourself with the escaped bits.

Also, if the idea of this test is that the math processing leaves markdown unaffected, we should probably include some <, > and & in the markdown section and not automatically unescape them (because the math processing is leaking out into the markdown processing).

Also, this test should include a

\begin{}
…
\end{}


(i.e., including the new lines around the declaration) since, not being wrapped in any number of $, there is an absence of the easiest cues to delimiting math vs. markdown.  # all the "<", ">", "&" must be escaped correctly cases = [ "$aa;a-b<0$", "$$"] #### mpacer Jan 20, 2017 Member None of these cases take into account displayed math (i.e., between $$…$$). This problem probably applies equally there. #### mpacer Jan 20, 2017 Member These also don't take into account cases where there are no … delimiters of any kind but where there are LaTeX commands included in their own paragraph see: #232 and the above test. If you unify this test with the above test, this may address the problem. Also, one of the common cases where & is used in LaTeX is in the context of tables, it may be worthwhile to include a table example in the test. #### takluyver Jan 24, 2017 Member +1 to adding a few more test cases.  def test_markdown2html_math(self): # Mathematical expressions should be passed through unaltered def test_markdown2html_math_noescape(self): # Mathematical expressions not containing <, >, & should be passed through unaltered #### mpacer Jan 20, 2017 Member It seems like we should unify this with the other test so that we can have a unified set of tests that check for a single set of expected behaviour that covers both those things that should not be escaped and those things that should be. #### takluyver Jan 24, 2017 Member Unifying them is probably ideal, but I'd merge a PR if that was the only thing outstanding. Member ### mpacer commented Jan 20, 2017 • edited  I just realised that I had had those comments in a sort of "staged" sense for the past day or so… Apologies for appearing to have gone "dark" on this. And thank you @takluyver for saying that explicitly…because the implication that I hadn't had another look at it helped me realise that I had never "submitted" my review. I didn't make these "required changes" because I think that they're not absolutely necessary but would be better. In particular, if @takluyver thinks that we don't need to unify the tests since their separation no longer seems to make sense not use a private function in our tests to make them pass in a way that would not pass if it were expected behaviour in the main library cover all the test cases that this the escaping is supposed to cover (e.g., within \begin{}… \end{} and $$…$$) then I'm cool with merging. Contributor Author ### zasdfgbnm commented Jan 20, 2017  @michaelpacer You are actually making good suggestions. What I was doing on the test was just trying not to make too much changes on existing code. If @takluyver thinks it is better to restructure several tests in a simple PR, I have no problem doing this job. Member ### mpacer commented Jan 21, 2017  @zasdfgbnm That instinct makes sense and in general I'm guessing is a good practice. I just think in this case it may merit a bigger change to the tests. That said, some of my comments can be addressed without changing the structure of the tests just by changing the set of cases in test_markdown2html_math_escape. ### zasdfgbnm added some commits Feb 6, 2017  add more test cases   85e6ee2   merge escape & nonescape cases   9c16783   handle eqn with different beginnings, multiline   8b80daf  Contributor Author ### zasdfgbnm commented Feb 6, 2017  @takluyver @michaelpacer Sorry for the long wait. I was busy the past days. I merged the nonescape cases and escape cases for math formula. I also added test case for $$...$\$ and table as @michaelpacer says. I didn't do anything to the _unescape, because I have no idea on exactly what to do to improve it.
Contributor Author

### zasdfgbnm commented Feb 6, 2017 • edited

 Actually I don't think _unescape is something ugly only to make test passes as @michaelpacer said. On python 3 there is a unescape on official library that does the same thing, but it's not on python 2. That's all the reason I write my own _unescape. See https://wiki.python.org/moin/EscapingHtml for detail. What makes things ugly is not the design of test, but the fact of having two incompatible version of python.

Member

### takluyver commented Feb 20, 2017

 Thanks @zasdfgbnm - I'm happy enough to live with the test being a bit ugly for now.

### takluyver merged commit ec51671 into jupyter:master Feb 20, 2017 1 check passed

#### 1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

Closed