& escaped to &amp; in tex ? #4251

Carreau · 2013-09-22T12:11:50Z

Reported on list.

Hi all (most probably Matthias),

This is the first time I have had to use nbconvert, and I have noticed
that the ampersand symbol used in the eqnarray environment to align
equations is not replaced properly in the latex output. In the latex
file I still have &amp;=&amp; (which can't be compiled by latex), while
in the JSON file I have the correct form, &=&. Is this a known issue? If
so, what is the magic fix?

Cheers,
Zoltán

ping @jdfreder

jakobgager · 2013-09-22T16:16:35Z

Can you be more specific about what you are trying to convert (and which ipython version you are using)?
If I enter the following in a markdown cell

\begin{eqnarray}
1 &=& 2*3\\
41 &=& 3*6
\end{eqnarray}

this (currently) remains unchanged when converted to latex and compiles fine (ipython 1.1.0).
I haven't met an &amp; yet.

v923z · 2013-09-23T07:20:18Z

Oh, sorry. The ipython version is the latest from github (version 2.0). In fact, I believe that for version 1.1 nbconvert wasn't even part of ipython, it was a separate tool.
If I take your example, then my latex output is

\begin{eqnarray}
1 &amp;=&amp; 2*3\\
41 &amp;=&amp; 3*6
\end{eqnarray}

I do have the ampersands in the output. I invoked nbconvert as

python nbconvert --to latex Untitled0.ipynb

without any extra switches. Untitled0.ipynb contains only the cell that you had, nothing more.

jakobgager · 2013-09-23T08:01:52Z

Strange! Which version of pandoc do you have installed?
I cannot reproduce with ipython master and pandoc 1.9.4.1. and 1.12.0.1

python nbconvert --to latex Untitled0.ipynb

I guess you mean ipython ...

Btw. the incorporation of nbconvert into ipython was THE major milestone for the 1.0 release.

v923z · 2013-09-23T09:15:27Z

My pandoc version is 1.9.4.2.
Yes, I meant ipython:)

jakobgager · 2013-09-23T09:28:03Z

Can you provide your ipynb file (although it might be really simple).
I don't understand why pandoc uses html escapes here!

v923z · 2013-09-23T12:18:36Z

Hi Jakob,

Sure. Here is my notebook file, and also the output of

ipython nbconvert --to latex

Cheers,
Zoltán

On 23/09/13 11:28, Jakob Gager wrote:

Can you provide your ipynb file (although it might be really simple).
I don't understand why pandoc uses html escapes here!


{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "\\begin{eqnarray}\n",
      "1 &=& 2*3\\\\\n",
      "41 &=& 3*6\n",
      "\\end{eqnarray}"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}

jdfreder · 2013-09-23T16:58:30Z

I don't understand why Pandoc uses html escapes here!

@jakobgager Maybe it's not recognizing that latex environment? Strange that it isn't escaping incorrectly on yours. Do you have the branch we were working on earlier, including your fix? I know my fix will not parse this correctly (as you mentioned, it doesn't do inline latex)

jdfreder · 2013-09-23T17:11:35Z

Link to #4234 and #3503

jakobgager · 2013-09-23T19:18:07Z

I tried with pandoc 1.10.1 and also checked the pandoc JSON: pandoc clearly recognizes the \begin{} and \end{} as raw latex (block). Raw latex is passed through by pandoc when converting markdown to latex.
From my point of view, there is absolutely no reason to get any html escapes here.

@jdfreder I tried with ipython 1.1.0 and current master - not with any "advanced" markdown treatment. But maybe we should add such a check as well, because if we convert to html first this might appear in #3503.

@v923z Can you try reinstalling ipython (doesn't matter if master or 1.1.0) and convert again?

v923z · 2013-09-23T19:38:55Z

@v923z Can you try reinstalling ipython (doesn't matter if master or 1.1.0) and convert again?

Hm. I thought it had been established that the problem originates from pandoc. As for re-installation, I do that every morning when I check out the new code from github. But there is another machine, on which I installed ipython completely from scratch just two days ago (pulled code from github, version 2.0.0), and the problem still persists. pandoc version is 1.9.1.1.

For now, I can simply post-process the output (I only have to replace &amp; with &, I presume), so I can live with this. On the other hand, I guess, this has to be sorted out sooner or later...
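The post-processing workaround mentioned above could look roughly like the following. This is only a minimal sketch: the function name `unescape_tex_ampersands` is made up for illustration and is not part of nbconvert, and it assumes that `&amp;` never appears intentionally in the generated LaTeX.

```python
def unescape_tex_ampersands(tex):
    """Turn HTML-escaped ampersands in LaTeX source back into a plain '&'.

    Hypothetical helper for illustration only; it blindly replaces every
    '&amp;', so it assumes the document never needs a literal '&amp;'.
    """
    return tex.replace("&amp;", "&")


# The broken eqnarray row from the report, as it appeared in the .tex file.
broken = "1 &amp;=&amp; 2*3\\\\"
fixed = unescape_tex_ampersands(broken)
print(fixed)
```

This only hides the symptom in the output file; it does not address whatever step introduces the escape in the first place.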

jakobgager · 2013-09-23T19:58:14Z

Well I'm still not perfectly sure it is a pandoc issue.
Can you try in a shell:

> echo '\begin{eqnarray}1&=&2\end{eqnarray}' | pandoc -f markdown -t json

v923z · 2013-09-23T20:03:34Z

Yes, here is the output:

[{"docTitle":[],"docAuthors":[],"docDate":[]},[{"RawBlock":["latex","\\begin{eqnarray}1&=&2\\end{eqnarray}"]}]]

jakobgager · 2013-09-23T20:19:10Z

So pandoc works as expected! The block is recognized as raw latex and the & is not html-escaped.
❓❓❓ still no idea
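For anyone who wants to repeat this check without reading the JSON by eye, here is a small sketch that parses the output quoted above and lists the raw latex blocks. The helper name `raw_latex_blocks` is an assumption for illustration, and the `[metadata, blocks]` layout is simply what the pandoc version in this thread printed; other pandoc versions may use a different JSON layout.

```python
import json

# The pandoc JSON output quoted in the comment above, kept verbatim.
pandoc_json = r'''[{"docTitle":[],"docAuthors":[],"docDate":[]},[{"RawBlock":["latex","\\begin{eqnarray}1&=&2\\end{eqnarray}"]}]]'''


def raw_latex_blocks(blocks):
    """Return the text of every block that pandoc tagged as raw latex."""
    found = []
    for block in blocks:
        raw = block.get("RawBlock")
        if raw is not None and raw[0] == "latex":
            found.append(raw[1])
    return found


# In this (old) layout the document is a two-element list: metadata, blocks.
meta, blocks = json.loads(pandoc_json)
latex_blocks = raw_latex_blocks(blocks)
print(latex_blocks)

# If pandoc were the culprit, an HTML escape would already show up here.
print(any("&amp;" in text for text in latex_blocks))
```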

jakobgager · 2013-09-24T06:35:49Z

Maybe just for completeness, can you try in a shell:

> echo '\begin{eqnarray}1&=&2\end{eqnarray}' | pandoc -f markdown -t latex

v923z · 2013-09-24T06:42:20Z

That leaves the input unchanged. Is it what's supposed to happen?

jakobgager · 2013-09-24T07:05:10Z

Yes, exactly!

jakobgager · 2013-09-24T10:22:45Z

I downgraded pandoc to 1.9.1.1 but still get correct conversion (with ipython master).
The only way I get pandoc (I tried now with 1.9.1.1) to produce something similar is

> pandoc -f markdown -t html
\\begin{eqnarray}
1 &=& 2*3\\\\
\\end{eqnarray}

This leads to

<p>\begin{eqnarray}1 &amp;=&amp; 2*3\\ \end{eqnarray}</p>

This looks similar but has some distinct differences; most importantly, it is a conversion to html and not to latex.

Maybe we have to start more fundamentally, which python, OS and shell are you using?

v923z · 2013-09-24T11:22:48Z

Maybe we have to start more fundamentally, which python, OS and shell are you using?

The OS is either Mint Nadia, or ubuntu 12.04, and I ran the commands from the bash shell. The python version is 2.7, but I am not sure why that should matter. If you say that you don't see this problem anywhere else, then it's really weird...

jakobgager · 2013-09-24T11:45:57Z

So far I tried with:

ubuntu 12.10 - python 2.7 - ipython 1.1.0 and master - pandoc 1.12.1
ubuntu 13.04 - python 2.7 - ipython 1.1.0 - pandoc 1.10.1
scientific linux 6 - python 2.6 - ipython 1.1.0 - pandoc 1.9.4.1

-> all worked fine
BUT!!! I finally managed to reproduce your issue with

xubuntu 12.04 - python 2.7 - ipython master - pandoc 1.9.1.1
(all required packages are taken from the repository > sphinx, jinja2, pygments, tornado, pyzmq)

So maybe it is related to the "old" jinja2?? (asking @jdfreder )

jakobgager · 2013-09-24T12:26:26Z

The problem persists after a pandoc update (and a jinja update). So no pandoc problem in the end!

jakobgager · 2013-09-24T13:17:40Z

Ok, the problem comes from the citation2latex filter -> it only appears in master (not in 1.1.0).
The funny thing is that it works with 12.10, but not with 12.04.
12.04:

In [1]: from IPython.nbconvert.filters import citation2latex

In [2]: s = "\\begin{eqnarray}\n1 &=&2 \\\\\n2 &=& 4\n\\end{eqnarray}"

In [3]: citation2latex(s)
Out[3]: u'\\begin{eqnarray}\n1 &amp;=&amp;2 \\\\\n2 &amp;=&amp; 4\n\\end{eqnarray}'

12.10:

In [1]: from IPython.nbconvert.filters import citation2latex

In [2]: s = "\\begin{eqnarray}\n1 &=&2 \\\\\n2 &=& 4\n\\end{eqnarray}"

In [3]: citation2latex(s)
Out[3]: '\\begin{eqnarray}\n1 &=&2 \\\\\n2 &=& 4\n\\end{eqnarray}'

I tried updating lxml but no success - any ideas @jdfreder, @ellisonbg

v923z · 2013-09-24T13:22:32Z

Jakob, I completely lost the thread on the other issue: this particular problem with the ampersand is not related to the missing eqnarray environment in the html output, is it?

jakobgager · 2013-09-24T13:24:17Z

No, the other issue IS a pandoc issue 😄

jdfreder · 2013-09-24T21:03:58Z

It looks like lxml is designed to replace & with &amp;:
http://stackoverflow.com/questions/4972210/escaping-characters-in-a-xml-file-with-python

Looking at citation.py, all of the text is processed through the lxml logic.
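The escaping itself is ordinary serializer behaviour and is easy to reproduce. The sketch below uses the standard library's `xml.etree.ElementTree` as a stand-in, so it runs without lxml installed; it is not the code in citation.py, only an illustration of why text that passes through an XML/HTML serializer comes back with `&amp;`.

```python
import xml.etree.ElementTree as ET

# Put the LaTeX text into an element, the way a tree-based filter would.
div = ET.Element("div")
div.text = "1 &=& 2*3"

# Serializing the tree escapes '&' so that the result is valid markup.
serialized = ET.tostring(div, encoding="unicode")
print(serialized)  # <div>1 &amp;=&amp; 2*3</div>
```

A filter that serializes its tree and then strips the wrapper element hands the escaped text on to the LaTeX template unless it unescapes it again.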

jakobgager · 2013-09-24T21:19:51Z

The strange thing is that with 12.10 and 13.04 lxml does not replace & with &amp;, maybe some default argument has changed?

jdfreder · 2013-09-24T21:20:32Z

recover=True maybe

jakobgager · 2013-09-24T21:31:08Z

Following your link, this might be a good starting point!
I'll try tomorrow.
Actually I tried to upgrade lxml using pip install --upgrade lxml
but this didn't change anything.

v923z · 2013-09-25T07:52:40Z

If this is the case, wouldn't it make more sense to test for whether & is replaced, and then clean up the output, if it is?

jakobgager · 2013-09-25T08:02:09Z

I prefer to correct this bug at its origin instead of just concealing its effects. If there is really no way around it, then subsequent testing and correction could be applied, however.

jakobgager · 2013-09-25T08:50:53Z

As @jdfreder noted, lxml always replaces & with &amp;. The "correct" conversions I saw with 12.10 and 13.04 were due to the test VE (there was no lxml installed) and the try...else implementation in citation2latex, which then hands back the input unchanged -> my mistake!
So the issue is present with all versions of lxml.
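Since the misleading "works for me" results came from an environment where the optional dependency was silently missing, a quick check like the one below can help when comparing machines. It is only a sketch; the helper name `optional_dependency_status` is invented here, and it only handles top-level module names.

```python
import importlib.util


def optional_dependency_status(names=("lxml",)):
    """Report which top-level modules can be imported in this environment.

    Hypothetical helper: it looks each module up without importing it.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


# 'json' is always available; the second name is deliberately bogus.
status = optional_dependency_status(("json", "surely_missing_module_xyz"))
print(status)
```

Running it with the default argument in each test environment shows whether the lxml code path is exercised at all.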

jakobgager · 2013-09-25T09:09:49Z

@v923z in the end your replace approach might be the best solution...
we could use the sax library

from lxml import html
s = "\\begin{eqnarray}\n1 &=&2 \\\\\n2 &=& 4\n\\end{eqnarray}"
t1 = html.fragment_fromstring(s, create_parent='div')
html.tostring(t1, method='html')
from xml.sax.saxutils import unescape
unescape(html.tostring(t1))

gives

'<div>\\begin{eqnarray}\n1 &=&2 \\\\\n2 &=& 4\n\\end{eqnarray}</div>'

However

I discovered something more severe in this context - if someone has the completely insane idea of writing something like 1<2 this would break the lxml approach

from lxml import html
s = "\\begin{eqnarray}\n1 &<&2 \\\\\n2 &=& 4\n\\end{eqnarray}"
t1 = html.fragment_fromstring(s, create_parent='div')
html.tostring(t1, method='html')
from xml.sax.saxutils import unescape
unescape(html.tostring(t1))

gives

'<div>\\begin{eqnarray}\n1 &</div>'

or with our methods:

citation2latex("1<2 is obviously true")

gives

u'1'

So I recommend to rewrite the citation2latex filter - pinging @jdfreder, @ellisonbg.
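To make the idea of an lxml-free rewrite more concrete, here is a rough sketch based on the standard library's `HTMLParser`. It is not the code from any of the referenced pull requests. The class and function names are made up, and it assumes citations are written as `<cite data-cite="key">...</cite>` and should become `\cite{key}`. Everything else is re-emitted as it was read instead of being serialized from a tree, so a bare `&` or a `<` followed by a digit survives.

```python
from html.parser import HTMLParser


class CitationSketch(HTMLParser):
    """Hypothetical parser: rewrites <cite data-cite="key"> to \\cite{key}."""

    def __init__(self):
        # Keep entity references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_cite = False

    def handle_starttag(self, tag, attrs):
        key = dict(attrs).get("data-cite")
        if tag == "cite" and key:
            self.out.append("\\cite{%s}" % key)
            self.in_cite = True  # drop the human-readable citation text
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through verbatim.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "cite" and self.in_cite:
            self.in_cite = False
        else:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.in_cite:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.in_cite:
            self.out.append("&%s;" % name)

    def handle_charref(self, name):
        if not self.in_cite:
            self.out.append("&#%s;" % name)


def citation_sketch(text):
    """Run the hypothetical parser over text and return the rewritten string."""
    parser = CitationSketch()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


print(citation_sketch("1<2 is obviously true"))
print(citation_sketch('x &=& y <cite data-cite="granger">(Granger, 2013)</cite>'))
```

Known limits of this sketch: an `&` or `<` that is immediately followed by a letter is still read as the start of an entity or a tag, so such input would need extra care.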

jakobgager mentioned this issue Sep 23, 2013

Handle raw html tags in markdown during conversion to latex #3503

Closed

This was referenced Sep 25, 2013

Add LaTeX citation handling to nbconvert #4090

Merged

A < in a markdown cell strips cell content when converting to latex #4283

Closed

[DRAFT/DEMO] lxml free citation2latex #4284

Closed

Take 2: citation2latex filter (using HTMLParser) #4323

Closed

ghost assigned jdfreder Jan 26, 2014

jdfreder mentioned this issue Feb 4, 2014

citation2latex filter (using HTMLParser) #5025

Merged

ellisonbg closed this as completed in #5025 Feb 4, 2014

minrk modified the milestones: 2.0, 3.0 Feb 6, 2014
