Anchor links and conversion to LaTeX #11

Closed
flying-sheep opened this issue Apr 22, 2015 · 14 comments
Labels
format:LaTeX pertains to exporting to the LaTeX format
Milestone

Comments

@flying-sheep
Contributor

Converting Markdown like [Header](#Header) to LaTeX just strips out the anchor link instead of inserting \ref{header} (even though \label{header} is actually inserted next to the header).

@minrk
Member

minrk commented Apr 22, 2015

I suspect this is a pandoc issue, rather than an IPython one. What version of pandoc do you have?

@flying-sheep
Contributor Author

pandoc 1.13.1.

@takluyver added the format:LaTeX label Jun 17, 2015
@jankatins
Contributor

I've the same issue when I want to convert a notebook to pdf via the notebook UI. The notebook contains a TOC (which I manually inserted via a md cell and [headline](#headlineLink) links).

Could that be a problem due to the way the cells are processed: one by one instead of a complete document? E.g. pandoc sees only the link, but can't find the anchor for that link because it is in another cell?

@takluyver
Member

That could well be the issue.

@stsievert

I am having this same issue. I can replicate this when I include the section I'm linking to in a different cell. When I'm linking to a section in the same cell, the issue goes away.

I have also found it doesn't work when I include math in the titles. When I convert to latex with nbconvert, I see these lines in the output:

\protect\hyperlink{proof-for-ux24Vux5fux5cux257Bpux5fnux5cux257Dux24}{the
appendix}, by induction we can show that
% ...
\subsection{\texorpdfstring{Proof for
\(V_{p_n}\)}{Proof for V\_\{p\_n\}}}\label{proof-for-vux5fpux5fn}
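The two identifiers in that output can be compared by undoing the `uxNN` escapes. The sketch below assumes each `uxNN` run stands for one ASCII character written as two lowercase hex digits; that rule is inferred from the output above and is not confirmed anywhere in this thread. The helper name `decode_ux_escapes` is hypothetical.

```python
import re


def decode_ux_escapes(label: str) -> str:
    """Replace each assumed `uxNN` escape with the ASCII character it encodes."""
    return re.sub(r"ux([0-9a-f]{2})", lambda m: chr(int(m.group(1), 16)), label)


# Both strings are copied from the nbconvert output quoted above.
link_target = "proof-for-ux24Vux5fux5cux257Bpux5fnux5cux257Dux24"
header_label = "proof-for-vux5fpux5fn"

print(decode_ux_escapes(link_target))
print(decode_ux_escapes(header_label))
```

Under that assumption the link target decodes to `proof-for-$V_\%7Bp_n\%7D$` while the header label decodes to `proof-for-v_p_n`, so the link appears to carry the raw math source and the label carries the stripped, lowercased identifier. That would explain why the two never match.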

@mpacer
Member

mpacer commented Aug 18, 2016

Is the consensus that this is a pandoc issue and not a nbconvert issue? Based on @stsievert's success within a cell and @JanSchulz's hypothesis about the parsing granularity, it sounds like it might be an interaction, but one that could be somewhat alleviated if we were to parse the entire notebook at once instead of at the cell level.

I'm assuming there's a reason for not doing that (possibly having to do with passing the contents of individual cells to pandoc vs. concatenating entire notebooks before passing data to pandoc), but what is it? At least in the context of passing something to pandoc, couldn't we make this an option?

@takluyver
Member

I think various things could be improved if we were to pass entire notebooks to pandoc, but we also lose quite a bit of the customisation we want, at least if we do it the simple way (convert entire notebook to markdown and then pass into latex). I think it may make sense to experiment with some alternative pathways from notebook to PDF, one of which would rely more heavily on pandoc, but at least for the time being, I don't think we should try to replace the workings of the existing LatexExporter.

@mpacer added this to the 6.0 milestone Aug 23, 2016
@mpacer
Member

mpacer commented Aug 23, 2016

@takluyver We talked about this a bit today, and @minrk suggested instead looking into pandoc's intermediate representation format, which can still mark up everything as separate pieces (and is apparently quite like nbformat) but would still be able to resolve the cross-references globally.

@mpacer
Member

mpacer commented Aug 23, 2016

Also, I'm pretty sure the only reason any of this worked at all is that pandoc does some automatic reference/identifier name conversion as part of its handling of headers: http://pandoc.org/MANUAL.html#extension-auto_identifiers

specifically:
• Remove all formatting, links, etc.
• Remove all footnotes.
• Remove all punctuation, except underscores, hyphens, and periods.
• Replace all spaces and newlines with hyphens.
• Convert all alphabetic characters to lowercase.
• Remove everything up to the first letter (identifiers may not begin with a number or punctuation mark).
• If nothing is left after this, use the identifier "section".
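The rules listed above can be sketched in Python. This is a simplified approximation and not pandoc's own implementation: it assumes the header has already been reduced to plain text (formatting, links and footnotes removed), and it collapses each run of whitespace into a single hyphen. The function name `auto_identifier` is hypothetical.

```python
def auto_identifier(header_text: str) -> str:
    """Approximate the auto_identifiers rules for an already plain-text header."""
    # Remove all punctuation except underscores, hyphens, and periods.
    kept = "".join(
        ch for ch in header_text
        if ch.isalnum() or ch.isspace() or ch in "_-."
    )
    # Replace spaces and newlines with hyphens (assumption: runs collapse to one).
    hyphenated = "-".join(kept.split())
    # Convert all alphabetic characters to lowercase.
    lowered = hyphenated.lower()
    # Remove everything up to the first letter.
    for index, ch in enumerate(lowered):
        if ch.isalpha():
            return lowered[index:]
    # Nothing is left, so fall back to the fixed identifier.
    return "section"


for title in ["Header", "Heading identifiers in HTML", "3. Applications", "33"]:
    print(repr(title), "->", auto_identifier(title))
```

With this sketch, a header titled `Header` maps to `header`, which is why a link written as `#Header` would not match the generated label unless the case is normalized somewhere.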

If we want to support this feature, we should probably add something about this to the documentation, or at least point to the pandoc resource explaining it (ping @willingc, which approach do you think we should take?).

Also, we can specify references explicitly by giving headers a unique, CSS-style id attribute.

Headers can be assigned attributes using this syntax at the end of the line containing the header text:

{#identifier .class .class key=value key=value}
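To make the attribute syntax concrete, here is a small Python sketch that splits a header line into its text and its trailing attribute block. It is an illustration only, not pandoc's parser: the helper `parse_header_attributes` is hypothetical, and quoted values containing spaces are not handled.

```python
import re

# A trailing {...} block at the end of the header line (assumed form).
ATTR_BLOCK = re.compile(r"\s*\{([^{}]*)\}\s*$")


def parse_header_attributes(line: str):
    """Return (header_text, identifier, classes, key_values) for one header line."""
    match = ATTR_BLOCK.search(line)
    if match is None:
        return line.strip(), None, [], {}
    identifier, classes, key_values = None, [], {}
    for token in match.group(1).split():
        if token.startswith("#"):
            identifier = token[1:]
        elif token.startswith("."):
            classes.append(token[1:])
        elif "=" in token:
            key, value = token.split("=", 1)
            key_values[key] = value
    return line[:match.start()].strip(), identifier, classes, key_values


# A math header given an explicit identifier, so links need not guess the automatic one.
print(parse_header_attributes("## Proof for $V_{p_n}$ {#proof-v .unnumbered lang=en}"))
```

An explicit identifier like `#proof-v` sidesteps the question of what the automatic rules produce for a header containing math.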

@willingc
Member

> If we want to support this feature, we should probably add something about this to the documentation, or at least point to the pandoc resource explaining it (ping @willingc, which approach do you think we should take?).

My recommendation for the nbconvert docs would be the following:

  • if the underlying nbconvert code is changed then we should be more detailed in the nbconvert docs
  • if there is no new nbconvert code to accomplish this, I would recommend linking to the pandoc resource for explanation with perhaps a couple of sentences in the nbconvert docs.

@mpacer
Member

mpacer commented Aug 23, 2016

> My recommendation for the nbconvert docs would be the following:
>
>   • if the underlying nbconvert code is changed then we should be more detailed in the nbconvert docs
>   • if there is no new nbconvert code to accomplish this, I would recommend linking to the pandoc resource for explanation with perhaps a couple of sentences in the nbconvert docs.

So right now, this works in a minimal case using standard pandoc code, but only within a single cell. That's such limited functionality that I'm not sure it merits inclusion yet.

That said, the auto-identifier formatting might be weird for people… but that leads to a slightly different issue…

@Carreau Do we currently support anything like tab completion for these kinds of selectors? I don't think we do. Should that be a new issue, or a second aim after getting cross-references to work across the entire document? If we are going to mention auto-identifiers in the documentation, then given how tricky they can be to write correctly, we should have some way to complete them automatically (at least at the cell level).

OK, all that said, it looks like in the long run the syntax can follow pandoc's, but support for the feature is going to vary dramatically between now and then (if I can figure out how to make it work). That means the way forward with documentation might be a partial note on this now, with a more elaborate exploration later, once it actually works across cells.

@takluyver
Member

Discussed this with Michael today. We can relatively easily preserve these links by using a pandoc filter to convert. We have such a filter in bookbook already (this is actually a bit more complex than we need for nbconvert, since it also deals with references between notebooks).

There will be some performance penalty in using a filter, whether we run pandoc twice (Markdown -> JSON, then JSON -> LaTeX) or run pandoc once and let it invoke a Python subprocess for the filter. @michaelpacer is investigating how large the penalty is.
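As a rough illustration of what such a filter might do, the sketch below walks a pandoc JSON document that has already been loaded into Python objects and replaces links whose target starts with `#` by raw LaTeX `\hyperref[...]{...}` wrapped around the original link text. It is not the bookbook filter and does not use the pandocfilters library. It assumes the three-part `Link` layout (attributes, inlines, target) used by newer pandoc JSON, and that the link target already equals the label pandoc writes for the header. The names `rewrite_links`, `is_internal_link` and `raw_latex` are hypothetical.

```python
def raw_latex(code):
    """Build a RawInline element carrying LaTeX source."""
    return {"t": "RawInline", "c": ["latex", code]}


def is_internal_link(node):
    """True for a Link element whose URL is a same-document anchor (#...)."""
    if not (isinstance(node, dict) and node.get("t") == "Link"):
        return False
    content = node.get("c", [])
    # Assumed layout: [attributes, inlines, [url, title]].
    return len(content) == 3 and content[2][0].startswith("#")


def rewrite_links(node):
    """Return a copy of the AST with internal links turned into \\hyperref calls."""
    if isinstance(node, list):
        rewritten = []
        for child in node:
            if is_internal_link(child):
                _attr, inlines, (url, _title) = child["c"]
                # Splice: opening raw LaTeX, the original link text, closing brace.
                rewritten.append(raw_latex("\\hyperref[" + url[1:] + "]{"))
                rewritten.extend(rewrite_links(inlines))
                rewritten.append(raw_latex("}"))
            else:
                rewritten.append(rewrite_links(child))
        return rewritten
    if isinstance(node, dict):
        return {key: rewrite_links(value) for key, value in node.items()}
    return node


paragraph = {"t": "Para", "c": [
    {"t": "Str", "c": "See"},
    {"t": "Space"},
    {"t": "Link", "c": [["", [], []], [{"t": "Str", "c": "Header"}], ["#header", ""]]},
]}
print(rewrite_links(paragraph))
```

In the two-pass variant, the JSON would come from `pandoc -t json` and the rewritten document would be fed back through `pandoc -f json -t latex`. External links are left untouched because their URLs do not start with `#`.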

@mpacer
Member

mpacer commented Nov 3, 2016

I took a quick-and-dirty approach to this. On a benchmark set of documents, we're seeing running time increase to 1.5× when calling pandoc twice from within Python and to 2.5× when invoking Python from within pandoc. Either way, we're using the pandocfilters library.

Now, to make the solutions not quick and dirty.

@willingc
Member

willingc commented Nov 3, 2016

Excellent @michaelpacer. We could perhaps try to optimize further next week too.

7 participants