Upcoming issues with nbconvert #3603

Closed
jakobgager opened this Issue Jul 10, 2013 · 31 comments

4 participants

@jakobgager

Here, I want to collect some issues already present or expected due to some recent additions to the notebook. Currently they appear when converting to latex (or sphinx?)

  • escape backslash in output (see #3588). FIXED in #3951
    • would work in \verbatim but not in the current \Verbatim
  • selected styles of section headers (related to #3576, #3601, #3531)
    • e.g. strikeout is converted to \sout which cannot be used in section headers (http://tex.stackexchange.com/questions/22410/strikethrough-in-section-title)
    • as the python markdown method is replaced by the pandoc version, latex code is passed as-is during latex conversion, e.g. \alpha (without $) is passed without backslash escape. This is considered INTENDED, see e.g. #4090
    • \itemize could appear but is not allowed in latex section headers. FIXED in #3601
  • Terminal colors (256 or 24bit) cf. #3618
    • the current implementation allows only colors defined in IPython.utils.coloransi.color_templates. The colors are manually added to latex using the color package
  • (re)newcommand calls embedded in math environments in the notebook (to be recognized by mathjax) are placed into a math environment during conversion as well. However, this is not permitted in LaTeX and is ignored. (reported by @dpsanders)
    • add a RE-filter to capture these calls and remove the $s?
    • define some sort of %%header magic to get access to the latex preamble?
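
The RE-filter idea from the last item could be sketched roughly as follows. This is only an illustration and not code from nbconvert: the names `NEWCOMMAND_IN_MATH` and `unwrap_newcommands` are made up here, and the sketch assumes that a definition never contains a `$` itself and sits alone inside its math environment. It also does not check whether a given `$` opens or closes an environment.

```python
import re

# Hypothetical filter: matches $...$ or $$...$$ whose content starts with
# \newcommand or \renewcommand. Assumes no '$' inside the definition.
NEWCOMMAND_IN_MATH = re.compile(
    r"(\${1,2})\s*(\\(?:re)?newcommand\b[^$]*?)\s*\1"
)


def unwrap_newcommands(text):
    """Remove the surrounding $s from (re)newcommand definitions."""
    return NEWCOMMAND_IN_MATH.sub(lambda m: m.group(2), text)


if __name__ == "__main__":
    cell = r"$\newcommand{\R}{\mathbb{R}}$ and $x \in \R$"
    print(unwrap_newcommands(cell))
```

Ordinary math such as `$x \in \R$` is left untouched, since only environments that begin with a (re)newcommand call match the pattern.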
@minrk
IPython member

It should not be possible for itemize to appear in section headers.

@jakobgager

Well, they are not rendered, but they are stored in the json and therefore get converted to latex!

@dpsanders

@jakobgager Actually, \newcommand etc. are allowed inside math environments.
The problem is that they are local to those environments!

I thought of just sticking the \newcommand's in a %%latex magic cell, but that gets
horribly mangled in the conversion process, rather than just passed straight through as it should!

Probably this separate %%latex is a reasonable solution, but at the moment it copies the \newcommand etc. to the output -- maybe this could somehow be automatically suppressed?

There is also a funny issue with the converter with the following in a markdown cell:

$
y = f(x)
$

This is correctly interpreted in the markdown cell as an inline equation, but also gets mangled in the LaTeX output.
$y = f(x)$ all on one line works correctly...
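
Since the one-line form converts correctly, one possible workaround is to join multi-line inline math before the markdown is handed to the converter. The following is a rough sketch under that assumption, not something nbconvert does; `INLINE_MATH` and `join_inline_math` are hypothetical names. It also rewrites whitespace in every single-dollar span, so text with literal dollar signs (prices, for instance) would need extra care.

```python
import re

# Single-dollar math only: the lookarounds skip '$$' display math and
# escaped '\$'. Assumes inline math never contains a '$' itself.
INLINE_MATH = re.compile(r"(?<![\\$])\$(?!\$)([^$]+?)\$(?!\$)")


def join_inline_math(markdown_source):
    """Collapse newlines and repeated spaces inside $...$ spans."""
    def collapse(match):
        return "$" + " ".join(match.group(1).split()) + "$"
    return INLINE_MATH.sub(collapse, markdown_source)


if __name__ == "__main__":
    print(join_inline_math("Before\n$\ny = f(x)\n$\nafter"))
```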

@dpsanders

Could there be some kind of %%header magic, which could import an external header file for all of the extra \usepackage stuff one could need in the latex converter? I presume other converters may also need similar header boilerplate.

@dpsanders

For the LaTeX backend and any others which can end up as PDF there must be an option to have the figures exported as PDF rather than PNG. Matplotlib exports excellent PDFs so I guess this is pretty easy for somebody who understands how the code works. I'll give it a try once I get my head round the structure.

Is there a good description somewhere of the nbconvert process?

@jakobgager

Thanks, @dpsanders, for clarifying the \newcommand issue. I only tested it the other way, so sorry for the somewhat wrong info.

@jakobgager

Well, there is the option to specify the matplotlib output as svg, which finally gets converted to pdf during the nbconversion process. So I guess this should solve your problem. Should be something like:
>>> %config InlineBackend.figure_format = 'svg'

@dpsanders

Nice idea (is that documented somewhere?) but it doesn't seem to work -- still bitmapped images in the PDF.

@jakobgager

Regarding the Latex Header stuff, I proposed a PR some time ago #3570 which splits the markdown cell into

  • markdown,
  • html tags, and
  • something I called tagged comments.

The latter are basically provided to inject invisible latex code, so this could be used to define such newcommands.
However, I think includes have to be defined in the preamble and are not permitted in the latex body - please correct me if I'm wrong here (again 😄 )

@jakobgager

I found the svg config here: http://ipython.org/ipython-doc/dev/interactive/qtconsole.html
It's really strange that these svgs get pixeled! Will try this tomorrow too.

@dpsanders

You're correct that includes are not allowed after \begin{document}.
Of course, it would be possible to collect all the \usepackage's from the rest of the document and output them all together...
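
As a rough illustration of that idea, here is a sketch that pulls \usepackage lines out of a converted body so that they can be emitted together in the preamble. The helper name `hoist_usepackages` is made up, and the sketch assumes each \usepackage sits on its own line with at most one optional-argument block.

```python
import re

# Assumption: one \usepackage per line, optionally with [options].
USEPACKAGE = re.compile(
    r"^[ \t]*\\usepackage(?:\[[^\]]*\])?\{[^}]*\}[ \t]*\n?",
    re.MULTILINE,
)


def hoist_usepackages(body):
    """Return (unique \\usepackage lines in order, body without them)."""
    found = []

    def collect(match):
        line = match.group(0).strip()
        if line not in found:  # drop duplicates, keep first occurrence
            found.append(line)
        return ""

    stripped = USEPACKAGE.sub(collect, body)
    return found, stripped


if __name__ == "__main__":
    body = "\\usepackage{amsmath}\nSome text\n\\usepackage[utf8]{inputenc}\n"
    packages, rest = hoist_usepackages(body)
    print("\n".join(packages))
    print(rest)
```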

@dpsanders

From what I understand in #3607 , inkscape (!) is used to convert SVG to PDF.
This seems to me to be a non-useful solution, at least as far as the LaTeX backend is concerned,
since a better solution would just be to output PDF straight from matplotlib.

There is no reason (from my naive, non-developer's point of view) why the inline backend for viewing in the notebook web app should be related to the backend that nbconvert decides to use for a particular output format.

@minrk
IPython member

since a better solution would just be to output PDF straight from matplotlib.

Yes, absolutely. But this decision has to be made at plot-time, not nbconvert-time, and we have to add a PDF formatter for the PDF to be included in the notebook document.

There is no reason (from my naive, non-developer's point of view) why the inline backend for viewing in the notebook web app should be related to the backend that nbconvert decides to use for a particular output format.

There is, in that the inline backend determines what matplotlib generates, and thus what is embedded into the notebook document. A very important part of nbconvert is that it does not re-run code to generate new figures, it just takes the output already saved in the notebook document, and plops that into the destination format.

The right way to do proper plots for pdf (in my opinion) is to register a PDF formatter for figures in addition to the PNG / SVG, since PDF won't be visible in the live notebook. This PDF will then be used by nbconvert for latex output. This should be helped by some of the cleanup in IPEP 13 / notebook format 4. But all of this happens before nbconvert is involved at all.

@dpsanders

Ah, thanks for the clarification -- fundamental misunderstanding on my part.
I completely agree with your proposed solution.

@dpsanders

Having said that, while I understand that this is the idea of nbconvert, there then needs to be some other tool that can "run" a given "notebook document" (@Carreau is right: there is a real language problem here) in different ways to produce different output, with the correct configuration for each.

Indeed, I understand that the whole point of the notebook is finally to be able to use a single document to produce, say

  • a reveal slideshow
  • a publishable paper
  • an HTML blog version
  • a documented Python library

We don't want to have to be going into the Notebook app and manually changing around things like matplotlib backends to get out all these different beasts. Are there proposals in this kind of direction that I can look at?

@minrk
IPython member

It's actually very simple to 'run' a notebook in a headless environment. Stepping through a notebook and running cells is something we should add to IPython. This is independent of nbconvert, in my mind.

That said, I think a Transformer could be used by nbconvert to include re-execution as a part of the export process, ignoring the output saved in the document.
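
To make "stepping through a notebook and running cells" concrete, here is a toy sketch using only the standard library. It is not how IPython does it: there is no kernel, magics like %config would fail, and outputs are not captured. It assumes a notebook JSON layout with a top-level `cells` list whose entries carry `cell_type` and `source`; `run_code_cells` is a hypothetical name.

```python
import json


def run_code_cells(notebook_json):
    """Execute the code cells of a notebook in order, in one namespace."""
    nb = json.loads(notebook_json)
    namespace = {}
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are skipped
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be stored as a list of lines
            source = "".join(source)
        exec(compile(source, "<cell>", "exec"), namespace)
    return namespace


if __name__ == "__main__":
    nb = json.dumps({"cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {"cell_type": "code", "source": ["x = 2\n", "y = x * 3\n"]},
    ]})
    print(run_code_cells(nb)["y"])
```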

@dpsanders

I was under the impression that with the _repr_... methods, objects could have different representations for different purposes. So there could be one for the display in the notebook and a different one for LaTeX output (so that no PDF representation would be needed in the notebook application for example). Is this correct? Where can I check this in detail?

But of course if I'm working interactively in the notebook app, I want to see inline SVG's and not generate PDFs separately at the same time.

I guess I'm again misunderstanding nbconvert.
Thanks for the link to stepping through the notebook @minrk , definitely something to add in to IPython.

@minrk
IPython member

I was under the impression that with the _repr_... methods, objects could have different representations for different purposes. So there could be one for the display in the notebook and a different one for LaTeX output (so that no PDF representation would be needed in the notebook application for example). Is this correct? Where can I check this in detail?

Sort of. Objects can have different representations for use in different contexts, but all of them are rendered at once. It is in the live notebook that these representations are computed and then all of them stored in the notebook document (A JSON structure). Note that only one is displayed in the live notebook UI, but all are stored in the document.

In NbConvert, no code execution happens and thus no new representations are computed - only transformations of the input and output from what is already stored in this notebook document.
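
A small sketch of what "only transformations of what is already stored" means in practice: the converter can only choose among the representations saved with an output. The priority lists and the name `pick_representation` are invented for illustration, and the sketch assumes outputs that keep their representations in a `data` dict keyed by MIME type.

```python
# Hypothetical preference order per target format (not nbconvert's own).
PRIORITY = {
    "latex": ["application/pdf", "image/svg+xml", "image/png", "text/plain"],
    "html": ["text/html", "image/svg+xml", "image/png", "text/plain"],
}


def pick_representation(output, target):
    """Pick the best stored representation for a target; nothing is re-run."""
    data = output.get("data", {})
    for mime in PRIORITY[target]:
        if mime in data:
            return mime, data[mime]
    return None  # nothing usable was stored at execution time


if __name__ == "__main__":
    stored = {"data": {"image/png": "<base64 data>", "text/plain": "<Figure>"}}
    print(pick_representation(stored, "latex"))
```

If no PDF was stored at execution time, the latex target can only fall back to whatever else is in the document, which is why the format has to be chosen before nbconvert runs.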

I guess I'm again misunderstanding nbconvert.

I think the main point is that representations are computed at 'execution' time, which is distinct from 'nbconvert' time, which is given the static input and output of the notebook document.

That said, a script that 're-runs' the notebook can certainly be used to PDF-ify figures by executing the notebook again after enabling PDF output. However, this is only feasible if re-running the notebook actually makes sense (e.g. it doesn't depend on time-changing or otherwise unavailable data, and it doesn't take a few days to execute). But this would just be another 'execution' of a notebook, and still done prior to using nbconvert to translate to other formats.

Further, it is not out of the question that a 'Transformer', in the current terminology, could be used to actually re-execute the entire notebook and generate new output. This might be a misuse of nbconvert, I'm not sure. Nonetheless, it would not be the standard behavior.

@dpsanders

OK, let me step back a bit.
I've been looking at the SVG / PDF business.

Apparently, inkscape is the correct solution for the conversion. I now even have a sneaking feeling that perhaps matplotlib uses this somehow, since the output seems to be identical. (I will confirm this tomorrow.)

There is in fact a package 'svg' on CTAN (the TeX package repository), but I cannot get it to work at the moment.
In any case, it's just a question of running

inkscape myfig.svg --export-pdf=myfig.pdf

on all the SVG's and problem solved!
I guess this is easy ("trivial", for those for whom it is trivial :P) to put into the transformation technology.
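
A minimal sketch of running that command over a directory of SVGs. The helper names are made up, and the `--export-pdf` option is taken from the command above; other Inkscape releases may use different option names, so check `inkscape --help` first. The `run` parameter exists only so the loop can be tried without Inkscape installed.

```python
import subprocess
from pathlib import Path


def pdf_path_for(svg_path):
    """myfig.svg -> myfig.pdf, next to the original."""
    return Path(svg_path).with_suffix(".pdf")


def inkscape_pdf_command(svg_path):
    """Build the command line quoted above for a single file."""
    return ["inkscape", str(svg_path),
            "--export-pdf=" + str(pdf_path_for(svg_path))]


def convert_all_svgs(directory, run=subprocess.check_call):
    """Convert every *.svg in a directory; returns the PDF paths."""
    converted = []
    for svg in sorted(Path(directory).glob("*.svg")):
        run(inkscape_pdf_command(svg))
        converted.append(pdf_path_for(svg))
    return converted


if __name__ == "__main__":
    print(inkscape_pdf_command("myfig.svg"))
```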

I think I was just remembering that a few years ago SVG support was very bad, so that PDF was the correct vectorized solution. Apparently now SVG is just as good?

To summarize: SVG is the way to go!

@dpsanders

By the way, why is it not possible to stretch (with the mouse) SVG figures in the Notebook app? It seems that they should be ideally suited for this treatment, being vector format (unlike PNGs, for which it is possible, but which look terrible in the process)? Could this be fixed?

@minrk
IPython member

It probably can be fixed - ironically it's easier to make a plain image resizable with a single call than it is an SVG.

@dpsanders

Please give me a pointer to the right place in the code and I'll have a look...

@minrk
IPython member

For resizable SVG? I would first look up a general solution for how to add resize handles on SVGs in a page, then the place to add it in the code would be here.

@dpsanders

The sphinx_howto format also fails to process the SVGs.

@Carreau
IPython member

Could there be some kind of %%header magic, which could import an external header file for all of the extra \usepackage stuff one could need in the latex converter? I presume other converters may also need similar header boilerplate.

You have raw cells, which should pass through unchanged in the latex exporter.

That said, I think a Transformer could be used by nbconvert to include re-execution as a part of the export process, ignoring the output saved in the document.

Well, this goes into the ipynb -> ipynb conversion process. It is meaningful to me. You could even have a "template" notebook, and nbconvert could spit out derivatives of it.

There is in fact a package 'svg' on CTAN (the TeX package repository), but I cannot get it to work at the moment.
In any case, it's just a question of running

I got it to work (if it's the one I think of). It's ... complex and awesome. You can have the text of the svg typeset by LaTeX at compile time, which means the .tex has access to what is inside the svg and vice versa: cross references, environments.... The problem is that it is a PAIN to put the text at the right place, as Matplotlib (for example) does not know the exact size of the font that will be used.

@jakobgager

The issue with the %%header magic was to get some includes which are only allowed in the preamble of the tex file. Raw cells, however, are placed inside the tex body, so this won't work.

@dpsanders

Everything seems to already be setup in the code for the LaTeX exporter to convert SVGs to PDFs using Inkscape.
But nonetheless it does not work...

@jakobgager

Yeah, I can confirm this problem. Can you open a separate issue for this one? I want to keep the present issue focused on non-blocking problems 😄

@dpsanders

Some important nbconvert issues I have opened:
#3701, #3702, #3703

@jakobgager

I'm closing this here, since the IPythonMarkdownPandocLimitations.ipynb notebook by @jdfreder lists all these.

@jakobgager jakobgager closed this Oct 4, 2013