New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug when there is "<" in the math formula #514

Merged
merged 7 commits into from Feb 20, 2017

Conversation

Projects
None yet
3 participants
@zasdfgbnm
Copy link
Contributor

zasdfgbnm commented Jan 18, 2017

The easiest way to trigger the bug is to add a markdown cell and put $$<k'>$$.
When the notebook is converted to html, the "<" is left unescaped,
which makes browsers parse this as a html tag

Fix bug when there is "<" in the math formula
The easiest way to trigger the bug is to add a markdown cell and put $$<k'>$$.
When the notebook is converted to html, the "<" is left unescaped,
which makes browsers parse this as a html tag

@zasdfgbnm zasdfgbnm referenced this pull request Jan 18, 2017

Open

issues with latex < #504

@@ -104,15 +104,20 @@ def header(self, text, level, raw=None):
html = super(IPythonRenderer, self).header(text, level, raw=raw)
return add_anchor(html)

def escape_lt(self,text):
return text.replace('<','&lt;')

This comment has been minimized.

@takluyver

takluyver Jan 18, 2017

Member

Shouldn't we also be translating >? There's a function html.escape() which we might want to use.

This comment has been minimized.

@zasdfgbnm

zasdfgbnm Jan 18, 2017

Author Contributor

I didn't find any bug triggered by ">". But it is a good idea to escape all these special characters.

@takluyver

This comment has been minimized.

Copy link
Member

takluyver commented Jan 18, 2017

Could you add a test for this in nbconvert.filters.tests.test_markdown? Thanks!

@zasdfgbnm

This comment has been minimized.

Copy link
Contributor Author

zasdfgbnm commented Jan 18, 2017

I will add the test

@@ -104,15 +104,20 @@ def header(self, text, level, raw=None):
html = super(IPythonRenderer, self).header(text, level, raw=raw)
return add_anchor(html)

def escape_lt(self,text):
return text.replace('<','&lt;')

# Pass math through unaltered - mathjax does the rendering in the browser

This comment has been minimized.

@mpacer

mpacer Jan 18, 2017

Member

This will no longer be true, we're deciding to follow a different standard than MathJax in this case, as MathJax does not always treat "<" and ">" as characters to be escaped (which leads to the "bugs" that this PR is trying to address).

zasdfgbnm added some commits Jan 18, 2017

Use html.escape to escape <, > and &
Add tests for escape
@zasdfgbnm

This comment has been minimized.

Copy link
Contributor Author

zasdfgbnm commented Jan 19, 2017

It should work now.

@takluyver

This comment has been minimized.

Copy link
Member

takluyver commented Jan 19, 2017

Thanks. This looks OK to me now, but I'll let @michaelpacer have another look over it.

@@ -171,7 +190,8 @@ def test_markdown2html_math_paragraph(self):
]

for case in cases:
self.assertIn(case, markdown2html(case))
s = markdown2html(case)
self.assertIn(case,self._unescape(s))

This comment has been minimized.

@mpacer

mpacer Jan 20, 2017

Member

Do we expect markdown2html() anywhere else to be returning unescaped values? Is there any way to systematically detect that expectation (vs. not)?

I ask because it seems a little off to be using a private function in our testing suite just to make tests pass in the way that they were before. It almost seems like we should change the test's expected output (in this case, the case value) rather than to use this _unescape so our tests don't need any special treatment in order to pass. This way we're more explicit about the expected behaviour.

That said, if there's any purpose to having the unescaping function, we should probably surface it in the regular code base and test it separately.

This comment has been minimized.

@takluyver

takluyver Jan 24, 2017

Member

It's a bit ugly, but it was a bit of an ugly test before - it checks that the Markdown rendering doesn't change these samples. So it only works with samples that don't use any Markdown syntax (other than the math syntax we add). I wouldn't particularly ask @zasdfgbnm to fix this. Obviously we welcome improvements to the test, but that's probably best as a later PR.

This comment has been minimized.

@mpacer

mpacer Jan 24, 2017

Member

Yes that was the original test before. Currently this transformation does not leave these samples unchanged, which is why the unescaping needs to happen. I get that to make the test pass as it is it's easier to not have to concern yourself with the escaped bits.

Also, if the idea of this test is that the math processing leaves markdown unaffected, we should probably include some <, > and & in the markdown section and not automatically unescape them (because the math processing is leaking out into the markdown processing).

Also, this test should include a

\begin{}
… 
\end{}

(i.e., including the new lines around the declaration) since, not being wrapped in any number of $, there is an absence of the easiest cues to delimiting math vs. markdown.

# all the "<", ">", "&" must be escaped correctly
cases = [ "$a<b&b<lt$",
"$a<b&lt;b>a;a-b<0$",
"$<k'>$"]

This comment has been minimized.

@mpacer

mpacer Jan 20, 2017

Member

None of these cases take into account displayed math (i.e., between $$…$$). This problem probably applies equally there.

This comment has been minimized.

@mpacer

mpacer Jan 20, 2017

Member

These also don't take into account cases where there are no $…$ delimiters of any kind but where there are LaTeX commands included in their own paragraph see: #232 and the above test.

If you unify this test with the above test, this may address the problem.

Also, one of the common cases where & is used in LaTeX is in the context of tables, it may be worthwhile to include a table example in the test.

This comment has been minimized.

@takluyver

takluyver Jan 24, 2017

Member

+1 to adding a few more test cases.

def test_markdown2html_math(self):
# Mathematical expressions should be passed through unaltered
def test_markdown2html_math_noescape(self):
# Mathematical expressions not containing <, >, & should be passed through unaltered

This comment has been minimized.

@mpacer

mpacer Jan 20, 2017

Member

It seems like we should unify this with the other test so that we can have a unified set of tests that check for a single set of expected behaviour that covers both those things that should not be escaped and those things that should be.

This comment has been minimized.

@takluyver

takluyver Jan 24, 2017

Member

Unifying them is probably ideal, but I'd merge a PR if that was the only thing outstanding.

@mpacer

This comment has been minimized.

Copy link
Member

mpacer commented Jan 20, 2017

I just realised that I had had those comments in a sort of "staged" sense for the past day or so… Apologies for appearing to have gone "dark" on this. And thank you @takluyver for saying that explicitly…because the implication that I hadn't had another look at it helped me realise that I had never "submitted" my review.

I didn't make these "required changes" because I think that they're not absolutely necessary but would be better.

In particular, if @takluyver thinks that we don't need to

  1. unify the tests since their separation no longer seems to make sense
  2. not use a private function in our tests to make them pass in a way that would not pass if it were expected behaviour in the main library
  3. cover all the test cases that this the escaping is supposed to cover (e.g., within \begin{}… \end{} and $$…$$)

then I'm cool with merging.

@zasdfgbnm

This comment has been minimized.

Copy link
Contributor Author

zasdfgbnm commented Jan 20, 2017

@michaelpacer You are actually making good suggestions. What I was doing on the test was just trying not to make too much changes on existing code. If @takluyver thinks it is better to restructure several tests in a simple PR, I have no problem doing this job.

@mpacer

This comment has been minimized.

Copy link
Member

mpacer commented Jan 21, 2017

@zasdfgbnm That instinct makes sense and in general I'm guessing is a good practice. I just think in this case it may merit a bigger change to the tests.

That said, some of my comments can be addressed without changing the structure of the tests just by changing the set of cases in test_markdown2html_math_escape.

zasdfgbnm added some commits Feb 6, 2017

@zasdfgbnm

This comment has been minimized.

Copy link
Contributor Author

zasdfgbnm commented Feb 6, 2017

@takluyver @michaelpacer Sorry for the long wait. I was busy the past days. I merged the nonescape cases and escape cases for math formula. I also added test case for $$...$$ and table as @michaelpacer says. I didn't do anything to the _unescape, because I have no idea on exactly what to do to improve it.

@zasdfgbnm

This comment has been minimized.

Copy link
Contributor Author

zasdfgbnm commented Feb 6, 2017

Actually I don't think _unescape is something ugly only to make test passes as @michaelpacer said. On python 3 there is a unescape on official library that does the same thing, but it's not on python 2. That's all the reason I write my own _unescape. See https://wiki.python.org/moin/EscapingHtml for detail. What makes things ugly is not the design of test, but the fact of having two incompatible version of python.

@takluyver takluyver added this to the 5.2 milestone Feb 20, 2017

@takluyver

This comment has been minimized.

Copy link
Member

takluyver commented Feb 20, 2017

Thanks @zasdfgbnm - I'm happy enough to live with the test being a bit ugly for now.

@takluyver takluyver merged commit ec51671 into jupyter:master Feb 20, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@zasdfgbnm zasdfgbnm deleted the zasdfgbnm:patch-1 branch Feb 20, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment