
Fixed bug RF#29521: HTML math output not always XHTML compatible

The characters < and & are not allowed in a script tag in XHTML.
So since the HTML converter uses script tags for math elements,
whenever these characters appear in a value the value is wrapped
in a CDATA section to make the output XHTML compatible.
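A minimal, standalone Ruby sketch of that rule, mirroring the `convert_math` change in the diff below. The method name `math_script_tag` and its `value`/`block` parameters are hypothetical stand-ins for kramdown's element object (`el.value`, `el.options[:category]`).

```ruby
# Hypothetical standalone version of the commit's convert_math logic.
# `value` is the raw LaTeX source; `block` is true for display math.
def math_script_tag(value, block)
  # Wrap in CDATA only when the value contains characters that XHTML
  # does not allow verbatim inside a <script> element.
  body = value =~ /<|&/ ? "<![CDATA[#{value}]]>" : value
  type = block ? 'math/tex; mode=display' : 'math/tex'
  "<script type=\"#{type}\">#{body}</script>#{block ? "\n" : ''}"
end

puts math_script_tag('x > 5', false) # prints <script type="math/tex">x > 5</script>
puts math_script_tag('x < 5', false) # prints <script type="math/tex"><![CDATA[x < 5]]></script>
```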
Commit 512b00a6d050f506c43b824861f7bbf459862a19 (parent 8f11194), committed by @gettalong on Feb 19, 2012.
5 additions and 4 deletions in 3 files:
  1. +2 −1 lib/kramdown/converter/html.rb
  2. +1 −1 lib/kramdown/parser/html.rb
  3. +2 −2 test/testcases/block/15_math/normal.html
@@ -315,7 +315,8 @@ def convert_smart_quote(el, indent)
def convert_math(el, indent)
block = (el.options[:category] == :block)
- "<script type=\"math/tex#{block ? '; mode=display' : ''}\">#{el.value}</script>#{block ? "\n" : ''}"
+ value = (el.value =~ /<|&/ ? "<![CDATA[#{el.value}]]>" : el.value)
+ "<script type=\"math/tex#{block ? '; mode=display' : ''}\">#{value}</script>#{block ? "\n" : ''}"
xiw Apr 20, 2012

With this commit, MathJax displays <![CDATA and ]]> around equations, which is incorrect.

To make math output MathJax compatible, I am using the following workaround.

-        "<script type=\"math/tex#{block ? '; mode=display' : ''}\">#{value}</script>#{block ? "\n" : ''}"
+        lb = block ? '\[' : '\('
+        rb = block ? '\]'"\n" : '\)'
+        "#{lb}#{value}#{rb}"
gioele Apr 20, 2012 Contributor

I originally reported the problem with unescaped < and & characters. I also think it would be better to just use \(...\) and \[...\] instead of <script>. In both cases, though, value must either be wrapped in a CDATA section (which, you say, MathJax doesn't like) or, as I'd prefer, HTML-escaped.

Why not HTML-escape the value instead of using CDATA?

Please note that not HTML-escaping external data will lead to security problems.

xiw Apr 20, 2012

CDATA in \[...\] works for MathJax, but not in the <script> tag.

&amp; in <script> (without CDATA) doesn't work for MathJax either.

Maybe it's better to get rid of the <script> tag?

BTW, any suggestion on how to do HTML escaping here?

gioele Apr 20, 2012 Contributor

I'm all for getting rid of the <script> tag, also because it (probably; I haven't tested thoroughly) hides the content from browsers with JavaScript disabled. I'd like to know the reason behind the MathJax authors' preference for it.

HTML escape can be easily performed with CGI.escapeHTML(value) (CGI is part of the stdlib).
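For reference, a minimal example of `CGI.escapeHTML` (it needs `require 'cgi'`). It escapes `&`, `<`, `>` and `"` but leaves backslashes untouched, so a LaTeX `\\` line break passes through unchanged.

```ruby
require 'cgi'

# Single quotes: this string is  a < b & c \\ d  (two literal backslashes).
latex = 'a < b & c \\\\ d'

escaped = CGI.escapeHTML(latex)
puts escaped # prints a &lt; b &amp; c \\ d

# Unescaping restores the original LaTeX source exactly.
puts CGI.unescapeHTML(escaped) == latex # prints true
```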

xiw Apr 20, 2012

Actually I tried this patch.

-        value = (el.value =~ /<|&/ ? "<![CDATA[#{el.value}]]>" : el.value)
-        "<script type=\"math/tex#{block ? '; mode=display' : ''}\">#{value}</script>#{block ? "\n" : ''}"
+        value = CGI.escapeHTML(el.value)
+        lb = block ? '\[' : '\('
+        rb = block ? '\]'"\n" : '\)'
+        "#{lb}#{value}#{rb}"

Then every \\ (line break) in my LaTeX source was turned into &#92;, which of course didn't work in MathJax. Did I miss anything?

xiw Apr 20, 2012

Oops, it has nothing to do with CGI.escapeHTML: the \\ does appear in kramdown's output. Maybe Octopress or Jekyll does some magic (I have been using kramdown as the Markdown engine in Octopress).

xiw Apr 21, 2012

Here's a summary. I am using Octopress with MathJax and kramdown.

Methods that work

  • <script> w/ original LaTeX (i.e., before this commit). The problem is that it is not XHTML compatible when the LaTeX source contains & or <.
  • \[...\] w/ CDATA (i.e., the workaround I am using). Not sure how safe this is. We probably need to make sure there is no ]]> in the LaTeX source. Any other concerns?
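On the ]]> concern: one common XML technique (not something proposed in this thread) is to split the terminator across two adjacent CDATA sections, so a literal ]]> in the source cannot end the section early. A minimal sketch; `cdata_wrap` is a hypothetical helper name.

```ruby
# Hypothetical helper: wrap text in a CDATA section, splitting any literal
# "]]>" into "]]" + ">" across two sections so it cannot close the first one.
def cdata_wrap(text)
  "<![CDATA[#{text.gsub(']]>', ']]]]><![CDATA[>')}]]>"
end

puts cdata_wrap('x < y')  # prints <![CDATA[x < y]]>
puts cdata_wrap('a]]>b')  # prints <![CDATA[a]]]]><![CDATA[>b]]>
```

An XML parser concatenates the two sections back into the original text, so the LaTeX seen by MathJax is unchanged.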

Methods that do not work

  • <script> w/ CDATA. MathJax displays <![CDATA and ]]>.
  • <script> w/ escaped HTML. MathJax displays amp; for &amp;.
  • \[...\] w/ original LaTeX. The line break \\ in LaTeX becomes &#92;. It is also not XHTML compatible.
  • \[...\] w/ escaped HTML. The line break \\ in LaTeX becomes &#92;.

Looks like the only method that both works for MathJax and remains XHTML compatible is \[...\] w/ CDATA.

gettalong Apr 21, 2012 Owner

One reason for using the <script> tag is so that converting back from HTML to kramdown works. Using \[ and \] would be much more complicated...

Another way to solve this is by using the following startup hook for MathJax:

MathJax.Hub.Register.StartupHook("TeX Jax Ready",function () {
  MathJax.InputJax.TeX.prefilterHooks.Add(function (data) {
    data.math = data.math.replace(/^\s*<!\[CDATA\[\s*((?:\n|.)*)\s*\]\]>\s*$/m,"$1");
  });
});
gioele Apr 21, 2012 Contributor

@xiw, can you file a bug with Octopress/Jekyll? They should take care of this \\ thing.

I filed the original bug because I would like to use XHTML5 and XML tools, so I need kramdown to generate valid XHTML5 (both well-formed and semantically correct, so no CDATA inside <script>).

Also, could we have an option to choose which delimiters to use for display and inline math? For example, some may prefer $...$ (mathexchange-like) or $$...$$ (and maybe some <span class='math'> to make the conversion back to kramdown easier), while others may be fine with <script> + CDATA.
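To make the suggestion concrete, a rough sketch of how such an option might map to delimiters. The style names `:script`, `:dollar` and `:bracket`, the `MATH_DELIMITERS` table and `wrap_math` are all hypothetical, not existing kramdown options, and the sketch deliberately ignores escaping, which would still need CDATA or HTML escaping as discussed above.

```ruby
# Hypothetical delimiter table:
# style => [inline open, inline close, display open, display close]
MATH_DELIMITERS = {
  script:  ['<script type="math/tex">', '</script>',
            '<script type="math/tex; mode=display">', '</script>'],
  dollar:  ['$', '$', '$$', '$$'],
  bracket: ['\(', '\)', '\[', '\]'],
}.freeze

# Wrap LaTeX source in the delimiters for the chosen (hypothetical) style.
# Raises KeyError for an unknown style.
def wrap_math(value, block, style = :script)
  open_i, close_i, open_d, close_d = MATH_DELIMITERS.fetch(style)
  block ? "#{open_d}#{value}#{close_d}" : "#{open_i}#{value}#{close_i}"
end

puts wrap_math('x^2', false, :dollar)  # prints $x^2$
puts wrap_math('x^2', true, :bracket)  # prints \[x^2\]
```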

xiw Apr 21, 2012

@gettalong I like your workaround. I'll go for that. Thanks a lot!

@gioele Not sure if it's a Jekyll or Octopress thing.

Adding some option sounds nice. Since I trust the LaTeX source I wrote, using $..$ and $$..$$ would work for me.

gettalong Apr 22, 2012 Owner

@gioele, @xiw: I think I found a one-size-fits-all solution for this problem. Just as CDATA sections in JavaScript code are commented out so the script itself never sees them, we can use LaTeX comments to hide the CDATA markers from MathJax.

Generated MathJax <script> elements would look like this:

<script type="math/tex">% <![CDATA[
\begin{align*}
< &=5 \\
&=6 \\
\end{align*} %]]> </script>

Could you tell me if this would work for both of you?

xiw Apr 22, 2012

Nice trick! Works for me.

BTW, we need a linebreak after <![CDATA[ for inline math as well, right?

<p>How about <script type="math/tex">%<![CDATA[
x < y
gettalong Apr 22, 2012 Owner

Yes, because otherwise the LaTeX code would end up in the comment, too.
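A minimal Ruby sketch of the proposed output, assuming the same "wrap only when < or & is present" rule as this commit. The method name `commented_cdata_script` is hypothetical, and the sketch does not handle a literal ]]> in the source.

```ruby
# Hide the CDATA markers behind LaTeX "%" comments: an XML parser still sees
# a CDATA section, while MathJax treats both markers as LaTeX comments.
def commented_cdata_script(value, block)
  type = block ? 'math/tex; mode=display' : 'math/tex'
  if value =~ /<|&/
    # The newline after "% <![CDATA[" ends that LaTeX comment; without it
    # the math itself would be commented out too (for inline math as well).
    value = "% <![CDATA[\n#{value} %]]>"
  end
  "<script type=\"#{type}\">#{value}</script>#{block ? "\n" : ''}"
end

puts commented_cdata_script('x < y', false)
```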

def convert_abbreviation(el, indent)
@@ -529,7 +529,7 @@ def is_math_tag?(el)
def handle_math_tag(el)
set_basics(el, :math, :category => (el.attr['type'] =~ /mode=display/ ? :block : :span))
- el.value = el.children.shift.value
+ el.value = el.children.shift.value.sub(/\A<!\[CDATA\[(.*)\]\]>\z/m, '\1')
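The parser-side regex above undoes the converter's wrapping, which keeps the HTML to kramdown round trip working. A small standalone check; `CDATA_WRAPPER` and `unwrap_math` are hypothetical names around the regex taken from the diff.

```ruby
# Regex from the parser change: strip a CDATA wrapper spanning the whole
# script content. In Ruby, /m makes "." also match newlines.
CDATA_WRAPPER = /\A<!\[CDATA\[(.*)\]\]>\z/m

def unwrap_math(text)
  text.sub(CDATA_WRAPPER, '\1')
end

original = "\\begin{align*}\n&=5 \\\\\n\\end{align*}"
wrapped  = "<![CDATA[#{original}]]>"

puts unwrap_math(wrapped) == original # prints true
puts unwrap_math('x > 5')             # prints x > 5 (no wrapper, unchanged)
```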
@@ -6,10 +6,10 @@
<p><script type="math/tex">\lambda_\alpha > 5</script>
This is a para.</p>
-<script type="math/tex; mode=display">\begin{align*}
+<script type="math/tex; mode=display"><![CDATA[\begin{align*}
&=5 \\
&=6 \\
<script type="math/tex; mode=display">5+5</script>
