Add LaTeX citation handling to nbconvert #4090

ellisonbg · 2013-08-21T21:26:05Z

This PR adds the ability for nbconvert to manage LaTeX citations. These are entered into the markdown cells using data attributes:

<strong data-cite="granger">(Granger, 2013)</strong>

Which gets converted to the following in the LaTeX document: \cite{granger}

We then run BibTeX. The user also has to override the bibliography block of the nbconvert template. See https://github.com/ipython/nbconvert-examples for a full example.
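
To make the conversion concrete, here is a minimal sketch of the data-cite to \cite{} rewrite. It is not the implementation in this PR (the filter discussed below is built on lxml); it uses only the standard library's html.parser, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class _CiteRewriter(HTMLParser):
    """Hypothetical helper: swap any element carrying data-cite for \\cite{key}."""

    def __init__(self):
        super().__init__()      # default convert_charrefs=True: entities arrive unescaped
        self.out = []
        self.skip_tag = None    # name of the citation element being dropped
        self.depth = 0          # open same-named tags inside that element

    def handle_starttag(self, tag, attrs):
        if self.skip_tag is not None:
            if tag == self.skip_tag:
                self.depth += 1
            return
        key = dict(attrs).get("data-cite")
        if key is None:
            self.out.append(self.get_starttag_text())
        else:
            self.out.append("\\cite{%s}" % key)
            self.skip_tag, self.depth = tag, 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through unchanged.
        if self.skip_tag is None:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_tag is None:
            self.out.append("</%s>" % tag)
        elif tag == self.skip_tag:
            self.depth -= 1
            if self.depth == 0:
                self.skip_tag = None

    def handle_data(self, data):
        if self.skip_tag is None:
            self.out.append(data)


def cite_to_latex(markdown):
    """Return markdown with every data-cite element replaced by \\cite{key}."""
    parser = _CiteRewriter()
    parser.feed(markdown)
    parser.close()              # flush any trailing text
    return "".join(parser.out)
```

Because the whole element is dropped, the HTML inside it can be styled however the author likes without affecting the LaTeX output.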

jdfreder · 2013-08-21T23:22:01Z

👍

minrk · 2013-08-22T08:35:35Z

IPython/nbconvert/postprocessors/pdf.py

@@ -30,31 +30,101 @@ class PDFPostProcessor(PostProcessorBase):
        How many times pdflatex will be called.
        """)

-    command = List(["pdflatex", "{filename}"], config=True, help="""
+    pdflatex_command = List(["pdflatex", "{filename}"], config=True, help="""


This should be 'latex_command', if it should change. It is not specific to pdflatex (I use xelatex, for example).
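
The review point is that the command is a template, not something tied to one engine. Below is a minimal sketch of how such a template could be expanded into a sequence of calls. Only the "{filename}" placeholder and the latex_command name come from this thread; the build_commands helper, the bib_command parameter, and the ordering (one LaTeX pass, then BibTeX, then the remaining passes) are assumptions made for illustration.

```python
def build_commands(filename,
                   latex_command=("xelatex", "{filename}"),
                   bib_command=("bibtex", "{filename}"),
                   latex_count=3):
    """Hypothetical helper: expand command templates into argument lists."""
    latex = [part.format(filename=filename) for part in latex_command]
    bib = [part.format(filename=filename) for part in bib_command]
    # Assumed order: one LaTeX pass to write the .aux file, then BibTeX,
    # then the remaining LaTeX passes to resolve the citations.
    return [latex, bib] + [latex] * max(latex_count - 1, 0)
```

Swapping engines then only means passing a different latex_command, for example ("pdflatex", "{filename}").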


ellisonbg · 2013-08-22T21:11:28Z

OK @minrk review comments are addressed.

minrk · 2013-08-24T11:26:11Z

IPython/nbconvert/filters/citation.py

+            pass
+    else:
+        for child in node:
+            _process_node_cite(child)


Which branches of this block are exercised in your test?

ellisonbg · 2013-08-27

With my latest PR all branches are covered.

minrk · 2013-08-24T11:30:55Z

What are the chances, do you suppose, of supporting the citation syntax in other output formats (rst, etc.)? If we plan to do that, then parse_citation is probably the wrong name, since it is specifically converting the citation from HTML to latex (citation2latex, maybe?). Unless you plan to have just one filter that always parses citations correctly, switching based on the output format.
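
The "one filter that switches on the output format" option could look roughly like the sketch below. All names here are hypothetical and none of this is in the PR; the rst form uses reStructuredText's citation-reference syntax, [key]_.

```python
def cite_latex(key):
    return "\\cite{%s}" % key


def cite_rst(key):
    return "[%s]_" % key


# Hypothetical registry mapping an output format to its citation renderer.
CITE_RENDERERS = {"latex": cite_latex, "rst": cite_rst}


def render_citation(key, output_format):
    """Render one citation key for the given output format."""
    try:
        renderer = CITE_RENDERERS[output_format]
    except KeyError:
        raise ValueError("no citation renderer for %r" % output_format)
    return renderer(key)
```

With separate per-format filters instead, a name like citation2latex says exactly which of these conversions a filter performs.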

ellisonbg · 2013-08-26T20:39:12Z

Yes, I think that rename makes sense. There would need to be different
handling for other output formats.



ellisonbg · 2013-08-27T04:55:42Z

OK I think all review comments have been addressed.

ellisonbg · 2013-08-28T04:59:11Z

Let's not merge this until we have a chance to talk at this week's dev meeting. I have been looking at enabling MathJax's equation numbering. In this mode, MathJax starts to parse more LaTeX syntax such as \ref{eq:foo}. I am wondering if we should just embrace using the \cite{granger} syntax in Markdown and write a MathJax extension to render that pleasantly in HTML. I don't think this is as flexible as the scheme in this PR in terms of what the HTML shows.

Carreau · 2013-08-28T06:45:41Z

I am wondering if we should just embrace using the \cite{granger} syntax in Markdown and write a MathJax extension to render that pleasantly in HTML

If it's a macro-style syntax that is expanded by CodeMirror, why not, but adding extra syntax to Markdown... you know the drill.
Maybe we could have a look at what the JATS format (cf. #4119) needs for citation/xlink? It might give us more insight into what will be needed.

ellisonbg · 2013-08-28T16:33:43Z

Yes, it is more syntax we would be adding to Markdown, but if we want
equation numbering in the live notebook through MathJax, we have to embrace
this "LaTeX in Markdown" model anyway. This syntax would be parsed into
HTML when MathJax renders the Markdown cell, so CodeMirror wouldn't be
involved.


ellisonbg · 2013-08-30T20:55:15Z

At the dev meeting this week we decided to go forward with this approach, merging.


jakobgager · 2013-09-25T13:58:07Z

With respect to the issues in #4251, I wonder why the citation parsing uses lxml instead of a simple regex.
Please correct me if I'm wrong, but this could be achieved using something like:

import re

s2 = u"Before <STRONG data-cite='granger'>(Granger, 2013)</strong> between <cite data-cite='foo'>(foo, 2012)</cite> behind"
# Raw strings keep the backslashes intact, so the replacement emits a literal \cite{...}.
re.sub(r"<(?P<tag>[a-z]*) data-cite='(?P<label>[^']*).*?/(?P=tag)>", r"\\cite{\g<label>}", s2, flags=re.S | re.I)

this gives

u'Before \\cite{granger} between \\cite{foo} behind'

It should be quite robust (the flags argument of re.sub requires Python >= 2.7).

Carreau · 2013-09-25T15:43:53Z

Because 'data-' is a general microdata format for HTML, and we'll probably use it for things other than citations.

ellisonbg · 2013-09-25T16:54:00Z

Parsing HTML with regex is almost always a bad idea. The problems arise
when you pass nested HTML tags to the parser. The tags having the
data-cite attribute don't have to be simple tags; they could be an entire
nested HTML structure, like a table.
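
The nesting problem can be reproduced with the regex proposed above. A small sketch (the regex_cite wrapper name is hypothetical; the pattern is the one from the previous comment, written with raw strings so it also runs on Python 3):

```python
import re

# Pattern from the regex proposal earlier in this thread.
PATTERN = re.compile(
    r"<(?P<tag>[a-z]*) data-cite='(?P<label>[^']*).*?/(?P=tag)>",
    re.S | re.I,
)


def regex_cite(text):
    """Replace data-cite elements with \\cite{label} using the regex only."""
    return PATTERN.sub(r"\\cite{\g<label>}", text)


flat = "Before <strong data-cite='granger'>(Granger, 2013)</strong> after"
nested = "<div data-cite='foo'><div>inner</div> tail</div> after"

print(regex_cite(flat))    # flat element: replaced cleanly
print(regex_cite(nested))  # nested element: match stops at the first </div>
```

In the nested case the lazy match ends at the inner closing tag, so " tail</div>" is left behind in the output. A parser that tracks element depth does not have this problem.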


jakobgager · 2013-09-25T21:10:30Z

I totally agree that parsing a complete HTML structure with regex is an odd idea. However, in the present case the data-cite microdata looks like just some sort of tag for a particular HTML container within a markdown text. This container is subsequently completely replaced by \cite{...}. Hence I don't see any problems using a simple regex here.
Moreover, this would fix the bug with < in markdown text (once a < appears, the rest of the markdown cell is stripped; see the bottom of #4251).

ellisonbg · 2013-09-26T00:27:34Z

The data-cite attribute was designed to work with any HTML tag, so that
users can style the citation in HTML in whatever way they want.


jakobgager · 2013-09-26T09:08:02Z

Sorry I'm still not convinced lxml is necessary 😕
I've submitted a Draft/Demo PR with my approach to better discuss this.

JohnGriffiths · 2013-11-25T05:05:08Z

I am now using this very nice feature to, I think, very nice effect. I'm posting a question here because I wasn't sure that e.g. Stack Overflow would have much wisdom on this.

Q = how to change the colour of citations in the final pdf doc? At the moment mine are green, and as I've shifted to APA ('apalike') format rather than numbered references, there's lots of green words all over the text which are rather unsightly.

Where should I look to change such a setting?

Thanks,

john

jakobgager · 2013-11-25T07:36:47Z

The coloring of the citation links in the final pdf is done by the hyperref package (see .tex file). The default options will color it green, which is what you see. To disable coloring, I guess you could add the option colorlinks=false. See e.g. http://stackoverflow.com/q/2770347/2870069 and ftp://tug.ctan.org/pub/tex-archive/macros/latex/contrib/hyperref/doc/options.pdf
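
One way to try that suggestion without editing the template is to patch the generated .tex source before compiling it. This is a rough sketch only: disable_link_colors is a hypothetical helper, not something nbconvert provides, and it assumes the document loads hyperref and contains a literal \begin{document}.

```python
def disable_link_colors(tex_source):
    """Insert \\hypersetup{colorlinks=false} just before \\begin{document}."""
    marker = "\\begin{document}"
    if marker not in tex_source:
        return tex_source  # nothing to patch
    patch = "\\hypersetup{colorlinks=false}\n"
    return tex_source.replace(marker, patch + marker, 1)
```

If link borders show up in the PDF viewer instead of colors, hyperref's hidelinks option is another one to try.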


bramtayl · 2015-08-22T06:14:37Z

Is this still functional?

I'm trying to use citations in IPython. I'm having some issues.

My notebook has one markdown cell:

<cite data-cite="debreu_theory_1959">(Debreu, 1959)</cite>

My template looks like this:

((*- extends 'article.tplx' -*))

((* block bibliography *))
\bibliographystyle{unsrt}
\bibliography{bibliography}
((* endblock bibliography *))

And my bibliography has the section

@book{debreu_theory_1959,
  address = {New Haven, {CT}},
  title = {Theory of value: An axiomatic analysis of economic equilibrium},
  shorttitle = {Theory of value},
  url = {http://books.google.com/books?hl=en\&lr=\&id=QkX10epC46cC\&oi=fnd\&pg=PA1\&dq=debreau+theory+of+value\&ots=9DNz2653qg\&sig=fKS1CYQ6ZFmXIUmOgDq3DRLPEmQ},
  timestamp = {2014-04-17 14:32:20},
  number = {17},
  urldate = {2014-04-17},
  publisher = {Yale University Press},
  author = {Debreu, Gerard},
  year = {1959},
  file = {Snapshot:/home/haldane/.mozilla/firefox/8qzu92vh.default/zotero/storage/CIEI6RPS/books.html:text/html}
}

My command line call is

ipython3 nbconvert --to pdf bibliography.ipynb --template bibliography.tplx

The pdf is created, but all it contains is

[?]

Yes, bibliography.ipynb, bibliography.tplx, and bibliography.bib are all in the same folder.
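
A "[?]" in LaTeX output generally means a \cite key was not resolved. One thing that can be ruled out quickly is a mismatch between the keys cited in the notebook and the keys defined in the .bib file. Below is a rough diagnostic sketch with hypothetical helper names and deliberately simple regexes; it is not part of nbconvert.

```python
import re


def cited_keys(markdown):
    """Collect the values of data-cite attributes in a markdown/HTML string."""
    return set(re.findall(r"""data-cite=["']([^"']+)["']""", markdown))


def bib_keys(bibtex):
    """Collect entry keys such as the one in '@book{debreu_theory_1959,'."""
    return set(re.findall(r"@\w+\s*\{\s*([^,\s]+)\s*,", bibtex))


def missing_keys(markdown, bibtex):
    """Return cited keys that have no matching BibTeX entry."""
    return cited_keys(markdown) - bib_keys(bibtex)
```

An empty result means the keys line up, which points toward the BibTeX run itself (or the paths it sees) and away from the notebook content.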

Carreau · 2015-08-22T14:20:53Z

@bramtayl try opening an issue so we can track it and debug it there.
We are low on manpower these days, so it might be forgotten if you leave it as a comment on a merged pull request.

I believe it should work.

bramtayl · 2015-08-22T14:31:51Z

Ok, see #8760

ellisonbg added 3 commits August 21, 2013 12:29

Adding citation support.

c4c07de

Adding better logic to the PDF postprocessor.

6636876

Adding docs about latex citations.

f289f1f

jdfreder mentioned this pull request Aug 21, 2013

Handle raw html tags in markdown during conversion to latex #3503

Closed

minrk reviewed Aug 22, 2013

Fixing review comments:

37ebf77

* Don't use lstrip/rstrip in that way. * Renaming things in the pdf postprocessor.

minrk reviewed Aug 24, 2013

jdfreder mentioned this pull request Aug 26, 2013

nbconvert: Latex template refactor #4112

Merged

ellisonbg added 2 commits August 26, 2013 21:53

Addressing review comments.

5ac3d9a

* Rename parse_citation to citation2latex. * Add <p> block to test markdown.

Fixing docstring.

4a9e333

Fixing attribute access.

99bb6f0

ellisonbg added a commit that referenced this pull request Aug 30, 2013

Merge pull request #4090 from ellisonbg/citation

9f92804

Add LaTeX citation handling to nbconvert

ellisonbg merged commit 9f92804 into ipython:master Aug 30, 2013

jakobgager mentioned this pull request Sep 9, 2013

Upcoming issues with nbconvert #3603

Closed


jakobgager mentioned this pull request Sep 26, 2013

[DRAFT/DEMO] lxml free citation2latex #4284

Closed

mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014

Merge pull request ipython#4090 from ellisonbg/citation

18c0f39

Add LaTeX citation handling to nbconvert
