& escaped to &amp; in tex ? #4251

Closed
Carreau opened this issue Sep 22, 2013 · 31 comments · Fixed by #5025

@Carreau
Member

Carreau commented Sep 22, 2013

Reported on list.

Hi all (most probably Matthias),

This is the first time I have had to use nbconvert, and I have noticed
that the ampersand symbol used in the eqnarray environment to align
equations is not replaced properly in the latex output. In the latex
file I still have &amp;=&amp; (which can't be compiled by latex), while
in the JSON file I have the correct form, &=&. Is this a known issue? If
so, what is the magic fix?

Cheers,
Zoltán

ping @jdfreder

@jakobgager
Contributor

Can you be more specific about what you are trying to convert (and which ipython version you are using)?
If I enter in a markdown cell

\begin{eqnarray}
1 &=& 2*3\\
41 &=& 3*6
\end{eqnarray}

this (currently) remains unchanged when converted to latex and compiles fine (ipython 1.1.0).
I haven't met an &amp yet.

@v923z
Contributor

v923z commented Sep 23, 2013

Oh, sorry. The ipython version is the latest from github (version 2.0). In fact, I believe that for version 1.1 nbconvert wasn't even part of ipython, it was a separate tool.
If I take your example, then my latex output is

\begin{eqnarray}
1 &amp;=&amp; 2*3\\
41 &amp;=&amp; 3*6
\end{eqnarray}

I do have the &amp; entities in the output. I invoked nbconvert as

python nbconvert --to latex Untitled0.ipynb

without any extra switches. Untitled0.ipynb contains only the cell that you had, nothing more.

@jakobgager
Contributor

Strange! Which version of pandoc do you have installed?
I cannot reproduce with ipython master and pandoc 1.9.4.1. and 1.12.0.1

python nbconvert --to latex Untitled0.ipynb

I guess you mean ipython ...

Btw. the incorporation of nbconvert into ipython was THE major milestone for the 1.0 release.

@v923z
Contributor

v923z commented Sep 23, 2013

My pandoc version is 1.9.4.2.
Yes, I meant ipython:)

@jakobgager
Contributor

Can you provide your ipynb file (although it might be really simple)?
I don't understand why pandoc uses html escapes here!

@v923z
Contributor

v923z commented Sep 23, 2013

Hi Jakob,

Sure. Here is my notebook file, and also the output of

ipython nbconvert --to latex

Cheers,
Zoltán


{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\\begin{eqnarray}\n",
"1 &=& 2*3\\\\\n",
"41 &=& 3*6\n",
"\\end{eqnarray}"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}

@jdfreder
Member

I don't understand why Pandoc uses html escapes here!

@jakobgager Maybe it's not recognizing that latex environment? Strange that it isn't escaping incorrectly on yours. Do you have the branch we were working on earlier, including your fix? I know my fix will not parse this correctly (as you mentioned, it doesn't do inline latex)

@jdfreder
Member

Link to #4234 and #3503

@jakobgager
Contributor

I tried with pandoc 1.10.1 and checked also the pandoc Json and pandoc clearly recognizes the \begin{} and \end{} as raw latex (block). Raw latex is passed through by pandoc when converting markdown to latex.
From my point of view, there is absolutely no reason to get any html escapes here.

@jdfreder I tried with ipython 1.1.0 and current master - not with any "advanced" markdown treatment. But maybe we should add such a check as well, because if we convert to html first this might appear in #3503.

@v923z Can you try reinstalling ipython (doesn't matter if master or 1.1.0) and convert again?

@v923z
Contributor

v923z commented Sep 23, 2013

@v923z Can you try reinstalling ipython (doesn't matter if master or 1.1.0) and convert again?

Hm. I thought it had been established that the problem originates from pandoc. As for re-installation, I do that every morning when I check out the new code from github. But there is another machine, on which I installed ipython completely from scratch just two days ago (pulled code from github, version 2.0.0), and the problem still persists. pandoc version is 1.9.1.1.

For now, I can simply post-process the output (I only have to replace &amp; with &, I presume), so I can live with this. On the other hand, I guess this has to be sorted out sooner or later...
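
A minimal sketch of that post-processing workaround, assuming the only damage in the generated .tex source is the &amp; entity. The function name `unescape_ampersands` is hypothetical and not part of nbconvert.

```python
def unescape_ampersands(tex_source):
    """Turn HTML-escaped ampersands back into the plain '&' that LaTeX expects.

    Hypothetical helper: it assumes '&amp;' never appears on purpose
    in the LaTeX source.
    """
    return tex_source.replace("&amp;", "&")


# A line as it shows up in the broken latex output.
broken = "1 &amp;=&amp; 2*3\\\\"
fixed = unescape_ampersands(broken)
print(fixed)
```

This only hides the symptom; it does nothing about the step that introduces the entities in the first place.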

@jakobgager
Contributor

Well I'm still not perfectly sure it is a pandoc issue.
Can you try in a shell:

> echo '\begin{eqnarray}1&=&2\end{eqnarray}' | pandoc -f markdown -t json

@v923z
Contributor

v923z commented Sep 23, 2013

Yes, here is the output:

[{"docTitle":[],"docAuthors":[],"docDate":[]},[{"RawBlock":["latex","\\begin{eqnarray}1&=&2\\end{eqnarray}"]}]]

@jakobgager
Contributor

So pandoc works as expected! The block is recognized as raw latex and the & is not html-escaped.
❓❓❓ still no idea

@jakobgager
Contributor

Maybe just for completeness, can you try in a shell:

> echo '\begin{eqnarray}1&=&2\end{eqnarray}' | pandoc -f markdown -t latex

@v923z
Contributor

v923z commented Sep 24, 2013

That leaves the input unchanged. Is it what's supposed to happen?

@jakobgager
Contributor

Yes, exactly!

@jakobgager
Contributor

I downgraded pandoc to 1.9.1.1 but still get correct conversion (with ipython master).
The only way I get pandoc (I tried now with 1.9.1.1) to produce something similar is

> pandoc -f markdown -t html
\\begin{eqnarray}
1 &=& 2*3\\\\
\\end{eqnarray}

This leads to

<p>\begin{eqnarray}1 &amp;=&amp; 2*3\\ \end{eqnarray}</p>

This looks similar but has some distinct differences and, most importantly, it is a conversion to html and not to latex.

Maybe we have to start more fundamentally, which python, OS and shell are you using?

@v923z
Contributor

v923z commented Sep 24, 2013

Maybe we have to start more fundamentally, which python, OS and shell are you using?

The OS is either Mint Nadia, or ubuntu 12.04, and I ran the commands from the bash shell. The python version is 2.7, but I am not sure why that should matter. If you say that you don't see this problem anywhere else, then it's really weird...

@jakobgager
Contributor

So far I tried with:

  • ubuntu 12.10 - python 2.7 - ipython 1.1.0 and master - pandoc 1.12.1
  • ubuntu 13.04 - python 2.7 - ipython 1.1.0 - pandoc 1.10.1
  • scientific linux 6 - python 2.6 - ipython 1.1.0 - pandoc 1.9.4.1

-> all worked fine
BUT!!! I finally managed to reproduce your issue with

  • xubuntu 12.04 - python 2.7 - ipython master - pandoc 1.9.1.1
    (all required packages are taken from the repository > sphinx, jinja2, pygments, tornado, pyzmq)

So maybe it is related to the "old" jinja2?? (asking @jdfreder )

@jakobgager
Contributor

The problem persists with a pandoc update (and a jinja update). So no pandoc problem in the end!

@jakobgager
Contributor

OK, the problem comes from the citation2latex filter -> it only appears in master (not in 1.1.0).
The funny thing is that it works with 12.10, but not with 12.04.
12.04:

In [1]: from IPython.nbconvert.filters import citation2latex

In [2]: s = "\\begin{eqnarray}\n1 &=&2 \\\\\n2 &=& 4\n\\end{eqnarray}"

In [3]: citation2latex(s)
Out[3]: u'\\begin{eqnarray}\n1 &amp;=&amp;2 \\\\\n2 &amp;=&amp; 4\n\\end{eqnarray}'

12.10:

In [1]: from IPython.nbconvert.filters import citation2latex

In [2]: s = "\\begin{eqnarray}\n1 &=&2 \\\\\n2 &=& 4\n\\end{eqnarray}"

In [3]: citation2latex(s)
Out[3]: '\\begin{eqnarray}\n1 &=&2 \\\\\n2 &=& 4\n\\end{eqnarray}'

I tried updating lxml but had no success. Any ideas, @jdfreder, @ellisonbg?

@v923z
Contributor

v923z commented Sep 24, 2013

Jakob, I completely lost the thread on the other issue: this particular problem with the ampersand is not related to the missing eqnarray environment in the html output, is it?

@jakobgager
Contributor

No, the other issue IS a pandoc issue 😄

@jdfreder
Member

It looks like lxml is designed to replace & with &amp;:
http://stackoverflow.com/questions/4972210/escaping-characters-in-a-xml-file-with-python

Looking at citation.py, all of the text is processed through the lxml logic.
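
To illustrate the escaping behaviour described here, a small sketch using the standard library's `xml.etree.ElementTree` as a stand-in for lxml (an assumption made only so the snippet runs without extra packages; lxml's serializer is a different implementation). Any XML/HTML serializer has to escape a bare & in text content, so the round trip through a tree changes the string.

```python
import xml.etree.ElementTree as ET

# Put the equation line into an element as plain text content.
el = ET.Element("div")
el.text = "1 &=& 2*3"

# Serializing the element escapes '&' so the result is well-formed markup.
serialized = ET.tostring(el, encoding="unicode")
print(serialized)  # <div>1 &amp;=&amp; 2*3</div>
```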

@jakobgager
Contributor

The strange thing is that with 12.10 and 13.04 lxml does not replace & with &amp;, maybe some default argument has changed?

@jdfreder
Member

recover=True maybe

@jakobgager
Contributor

Following your link, this might be a good starting point!
I'll try tomorrow.
Actually I tried to upgrade lxml using pip install --upgrade lxml
but this didn't change anything.

@v923z
Contributor

v923z commented Sep 25, 2013

If this is the case, wouldn't it make more sense to test for whether & is replaced, and then clean up the output, if it is?

@jakobgager
Contributor

I prefer to correct this bug at its origin instead of just concealing its effects. If there is really no way around it, then subsequent testing and correction could be applied, however.

@jakobgager
Contributor

As @jdfreder noted, lxml always replaces & with &amp;. The "correct" conversions I saw with 12.10 and 13.04 are due to the test virtualenv (there was no lxml installed) and the try...else implementation in citation2latex -> my mistake!
So the issue is present with all versions of lxml.
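
A sketch of why the result depended on the machine, assuming the filter silently returns its input when lxml cannot be imported. Both function names and the `have_lxml` parameter are hypothetical, and the escaping step is only a stand-in for the real lxml round trip.

```python
import importlib.util


def escape_like_serializer(text):
    """Stand-in for the lxml code path: serializing escapes '&'."""
    return text.replace("&", "&amp;")


def convert(text, have_lxml=None):
    """Hypothetical filter with an optional dependency.

    Without lxml the text passes through untouched; with lxml it goes
    through the (escaping) tree round trip.
    """
    if have_lxml is None:
        # Detect the optional dependency the way the environment provides it.
        have_lxml = importlib.util.find_spec("lxml") is not None
    if not have_lxml:
        return text
    return escape_like_serializer(text)


print(convert("1 &=& 2", have_lxml=False))  # environment without lxml
print(convert("1 &=& 2", have_lxml=True))   # environment with lxml
```

The same input gives two different outputs, which matches the 12.04 versus 12.10 observations above.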

@jakobgager
Contributor

@v923z in the end your replace approach might be the best solution...
we could use the sax library

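from xml.sax.saxutils import unescape
from lxml import html

s = "\\begin{eqnarray}\n1 &=&2 \\\\\n2 &=& 4\n\\end{eqnarray}"
# Wrap the fragment in a <div> so lxml can parse it.
t1 = html.fragment_fromstring(s, create_parent='div')
# Serializing escapes & as &amp;
html.tostring(t1, method='html')
# Undo the escaping on the serialized string.
unescape(html.tostring(t1))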

gives

'<div>\\begin{eqnarray}\n1 &=&2 \\\\\n2 &=& 4\n\\end{eqnarray}</div>'

However

I discovered something more severe in this context - if someone has the completely insane idea of writing something like 1<2 this would break the lxml approach

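from xml.sax.saxutils import unescape
from lxml import html

# Same input as before, but with a literal < inside the equation.
s = "\\begin{eqnarray}\n1 &<&2 \\\\\n2 &=& 4\n\\end{eqnarray}"
t1 = html.fragment_fromstring(s, create_parent='div')
html.tostring(t1, method='html')
# Everything after the < is lost, so unescaping cannot recover it.
unescape(html.tostring(t1))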

gives

'<div>\\begin{eqnarray}\n1 &</div>'

or with our methods:

citation2latex("1<2 is obviously true")

gives

u'1'

So I recommend rewriting the citation2latex filter - pinging @jdfreder, @ellisonbg.
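
One possible direction for such a rewrite, as a hedged sketch only: touch nothing but the citation markup and leave all other text alone, so that &, < and raw latex survive unchanged. It assumes citations are written as `<cite data-cite="KEY">...</cite>`; the regex and the name `cite_to_latex` are illustrative and are not the filter's actual implementation.

```python
import re

# Assumed citation markup: <cite data-cite="KEY">any text</cite>
CITE_RE = re.compile(r'<cite\s+data-cite="([^"]+)"\s*>.*?</cite>', re.DOTALL)


def cite_to_latex(text):
    """Replace the assumed <cite data-cite="KEY"> spans with \\cite{KEY}.

    Everything outside a matching <cite> element is returned untouched,
    so no HTML parsing or escaping is applied to the rest of the cell.
    """
    # A callable replacement avoids backslash handling in the template.
    return CITE_RE.sub(lambda m: "\\cite{" + m.group(1) + "}", text)


print(cite_to_latex('See <cite data-cite="granger">(Granger, 2013)</cite>.'))
print(cite_to_latex("1<2 is obviously true"))
```

A regex will not cope with every way the markup could be written (for example extra attributes), so this shows the idea, not a complete replacement.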
