
Add LaTeX citation handling to nbconvert #4090

Merged
merged 7 commits into from Aug 30, 2013


ellisonbg
Member

This PR adds the ability for nbconvert to manage LaTeX citations. These are entered in Markdown cells using data attributes:

<strong data-cite="granger">(Granger, 2013)</strong>

This gets converted to \cite{granger} in the LaTeX document.

We then run BibTeX. The user also has to override the bibliography block of the nbconvert template; see https://github.com/ipython/nbconvert-examples for a full example.
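Because \cite{...} only resolves after BibTeX has run, the LaTeX engine has to be called more than once. As a rough sketch of the conventional compile order (not nbconvert's actual post-processor code), the hypothetical helper `latex_bibtex_steps` below lists the commands for a file `stem.tex`; the command names and the default of three LaTeX passes are assumptions.

```python
def latex_bibtex_steps(stem, engine="pdflatex", latex_runs=3):
    """Return the conventional LaTeX/BibTeX command sequence for stem.tex.

    Hypothetical helper. The first LaTeX pass writes the \\cite keys to
    stem.aux; bibtex reads stem.aux and writes stem.bbl; the remaining
    LaTeX passes read stem.bbl and resolve the citation labels.
    """
    if latex_runs < 2:
        raise ValueError("citations need at least two LaTeX passes")
    tex = stem + ".tex"
    steps = [[engine, tex], ["bibtex", stem]]
    steps += [[engine, tex] for _ in range(latex_runs - 1)]
    return steps


for step in latex_bibtex_steps("notebook"):
    print(" ".join(step))
```

Each step could then be passed to `subprocess.run(step, check=True)` from the directory holding the .tex and .bib files.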

@jdfreder
Member

👍

@@ -30,31 +30,101 @@ class PDFPostProcessor(PostProcessorBase):
How many times pdflatex will be called.
""")

-command = List(["pdflatex", "{filename}"], config=True, help="""
+pdflatex_command = List(["pdflatex", "{filename}"], config=True, help="""
Member


If it changes at all, this should be 'latex_command'. It is not specific to pdflatex (I use xelatex, for example).
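The reviewer's point is that the command is a template, so the engine is just its first element. A minimal sketch of how a `{filename}` placeholder in such a list can be filled in; `format_command` is a hypothetical helper, not nbconvert's API.

```python
def format_command(template, filename):
    """Fill the {filename} placeholder in every argument of a command template.

    Hypothetical helper: keeping the command as a list (rather than one shell
    string) lets it go straight to subprocess without shell quoting, and the
    engine is just the first element, so pdflatex and xelatex work alike.
    """
    return [arg.format(filename=filename) for arg in template]


print(format_command(["pdflatex", "{filename}"], "notebook.tex"))
print(format_command(["xelatex", "-interaction=nonstopmode", "{filename}"], "notebook.tex"))
```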

* Don't use lstrip/rstrip in that way.
* Renaming things in the pdf postprocessor.
@ellisonbg
Member Author

OK @minrk, review comments are addressed.

    pass
else:
    for child in node:
        _process_node_cite(child)
Member


Which branches of this block are exercised in your test?

Member Author


With my latest PR all branches are covered.
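The fragment above is the tail of a recursive pass with two branches: replace an element that carries data-cite, or recurse into an ordinary element. A minimal sketch of such a pass, using the standard-library xml.etree.ElementTree instead of lxml (which the thread says nbconvert used); `replace_cite_nodes` is a hypothetical name, and ElementTree only accepts well-formed markup. Feeding it one citation element and one ordinary element exercises both branches.

```python
import xml.etree.ElementTree as ET


def replace_cite_nodes(node):
    """Recursively replace child elements carrying data-cite with \\cite{label}.

    Hypothetical helper mirroring the shape of the recursive pass in the diff.
    """
    prev = None  # last child of `node` that we kept
    for child in list(node):  # copy, because children are removed while looping
        label = child.get("data-cite")
        if label is None:
            # Branch 1: ordinary element, so recurse into its children.
            replace_cite_nodes(child)
            prev = child
            continue
        # Branch 2: citation element, so swap it (and everything inside it)
        # for plain text, keeping the text that followed it (its tail).
        replacement = "\\cite{%s}%s" % (label, child.tail or "")
        if prev is None:
            node.text = (node.text or "") + replacement
        else:
            prev.tail = (prev.tail or "") + replacement
        node.remove(child)


html = ("<div><p>See <strong data-cite='granger'>(Granger, 2013)</strong>"
        " and <em>this</em>.</p></div>")
root = ET.fromstring(html)
replace_cite_nodes(root)
print(ET.tostring(root, encoding="unicode"))
```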

@minrk
Member

minrk commented Aug 24, 2013

What are the chances, do you suppose, of supporting the citation syntax in other output formats (rst, etc.)? If we plan to do that, then parse_citation is probably the wrong name, since it is specifically converting the citation from HTML to latex (citation2latex, maybe?). Unless you plan to have just one filter that always parses citations correctly, switching based on the output format.

@ellisonbg
Member Author

Yes, I think that rename makes sense. There would need to be different handling for other output formats.


Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger@calpoly.edu and ellisonbg@gmail.com

* Rename parse_citation to citation2latex.
* Add <p> block to test markdown.
@ellisonbg
Member Author

OK I think all review comments have been addressed.

@ellisonbg
Member Author

Let's not merge this until we have a chance to talk at this week's dev meeting. I have been looking at enabling MathJax's equation numbering. In this mode, MathJax starts to parse more LaTeX syntax, such as \ref{eq:foo}. I am wondering if we should just embrace using the \cite{granger} syntax in Markdown and write a MathJax extension to render it pleasantly in HTML. I don't think this is as flexible as the scheme in this PR in terms of what the HTML shows.

@Carreau
Member

Carreau commented Aug 28, 2013

> I am wondering if we should just embrace using the \cite{granger} syntax in Markdown and write a MathJax extension to render that pleasantly in HTML

If it's a macro that gets expanded by CodeMirror, why not, but adding extra syntax to Markdown... you know the drill.
Maybe we could look at what the JATS format (cf. #4119) needs for citations/xlink? It might give us more insight into what will be needed.

@ellisonbg
Member Author

Yes, it is more syntax we would be adding to Markdown, but if we want equation numbering in the live notebook through MathJax, we have to embrace this "LaTeX in Markdown" model anyway. This syntax would be parsed into HTML when MathJax renders the Markdown cell, so CodeMirror wouldn't be involved.


@ellisonbg
Member Author

At the dev meeting this week we decided to go forward with this approach, merging.

ellisonbg added a commit that referenced this pull request on Aug 30, 2013:
Add LaTeX citation handling to nbconvert
ellisonbg merged commit 9f92804 into ipython:master on Aug 30, 2013
jakobgager mentioned this pull request on Sep 9, 2013
@jakobgager
Contributor

With respect to the issues in #4251, I wonder why the citation parsing uses lxml instead of a simple regex?
Please correct me if I'm wrong, but this could be achieved with something like:

import re
s2 = u"Before <STRONG data-cite='granger'>(Granger, 2013)</strong> between <cite data-cite='foo'>(foo, 2012)</cite> behind"
re.sub(r"<(?P<tag>[a-z]*) data-cite='(?P<label>[^']*).*?/(?P=tag)>", r'\\cite{\g<label>}', s2, flags=re.S|re.I)

this gives

u'Before \\cite{granger} between \\cite{foo} behind'

It should be quite robust (the flags argument requires Python >= 2.7).

@Carreau
Member

Carreau commented Sep 25, 2013

Because 'data-*' attributes are a general HTML mechanism for custom data, and we'll probably use them for things other than citations.

@ellisonbg
Member Author

Parsing HTML with regex is almost always a bad idea. The problems arise when you pass nested HTML tags to the parser. The tags having the data-cite attribute don't have to be simple tags; they could be an entire nested HTML structure, like a table.
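To make the nesting concern concrete, here is a minimal sketch comparing the regex proposed earlier in the thread with a converter built on Python's standard-library html.parser (nbconvert itself used lxml at the time). `regex_cite`, `CiteConverter` and `parser_cite` are hypothetical names. The converter counts open tags of the same name, so a citation element that contains a nested tag of its own kind is dropped whole, while the regex's lazy match stops at the first closing tag.

```python
import re
from html.parser import HTMLParser

# The regex proposed earlier in the thread, written with raw strings.
CITE_RE = re.compile(
    r"<(?P<tag>[a-z]+) data-cite='(?P<label>[^']*).*?/(?P=tag)>",
    flags=re.S | re.I,
)
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def regex_cite(text):
    return CITE_RE.sub(r"\\cite{\g<label>}", text)


class CiteConverter(HTMLParser):
    """Copy HTML through, replacing each data-cite element (and its whole
    subtree) with \\cite{label}. Hypothetical helper, not nbconvert code."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_tag = None  # tag name of the citation element being dropped
        self.depth = 0        # same-named tags currently open inside it

    def handle_starttag(self, tag, attrs):
        if self.skip_tag is not None:
            if tag == self.skip_tag:
                self.depth += 1
            return
        label = dict(attrs).get("data-cite")
        if label is None:
            self.out.append(self.get_starttag_text())
            return
        self.out.append("\\cite{%s}" % label)
        if tag not in VOID_TAGS:  # void tags have no content or end tag
            self.skip_tag, self.depth = tag, 1

    def handle_startendtag(self, tag, attrs):
        if self.skip_tag is not None:
            return
        label = dict(attrs).get("data-cite")
        self.out.append("\\cite{%s}" % label if label is not None
                        else self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_tag is not None:
            if tag == self.skip_tag:
                self.depth -= 1
                if self.depth == 0:
                    self.skip_tag = None
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.skip_tag is None:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.skip_tag is None:
            self.out.append("&%s;" % name)

    def handle_charref(self, name):
        if self.skip_tag is None:
            self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        if self.skip_tag is None:
            self.out.append("<!--%s-->" % data)


def parser_cite(text):
    conv = CiteConverter()
    conv.feed(text)
    conv.close()
    return "".join(conv.out)


nested = "<span data-cite='a'><span>x</span> y</span> z"
print(regex_cite(nested))   # lazy match stops at the first </span>
print(parser_cite(nested))  # depth counting drops the whole element
```

Note that HTMLParser lowercases tag names, so end tags are re-emitted in lowercase in this sketch.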


@jakobgager
Contributor

I totally agree that parsing a complete HTML structure with regex is an odd idea; however, in the present case the data-cite attribute looks like just a marker on a particular HTML container within Markdown text. This container is subsequently replaced completely by \cite{...}, so I don't see any problem with using a simple regex here.
Moreover, this would fix the bug with < in Markdown text (once a < appears, the rest of the Markdown cell is stripped; see the bottom of #4251).

@ellisonbg
Member Author

The data-cite attribute was designed to work with any HTML tag; that way, users can style the citation in HTML in whatever way they want.


@jakobgager
Contributor

Sorry, I'm still not convinced lxml is necessary 😕
I've submitted a Draft/Demo PR with my approach to better discuss this.

@JohnGriffiths

I'm now using this very nice feature to, I think, very nice effect. I'm posting a question here because I wasn't sure that, e.g., Stack Overflow would have much wisdom on this.

Q: how do I change the colour of citations in the final PDF? At the moment mine are green, and as I've shifted to APA ('apalike') format rather than numbered references, there are lots of green words all over the text, which are rather unsightly.

Where should I look to change such a setting?

Thanks,

john

@jakobgager
Contributor

The coloring of the citation links in the final PDF is done by the hyperref package (see the .tex file). The default options color them green, which is what you see. To disable coloring, I guess you could add the option colorlinks=false. See e.g. http://stackoverflow.com/q/2770347/2870069 and ftp://tug.ctan.org/pub/tex-archive/macros/latex/contrib/hyperref/doc/options.pdf

mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014
Add LaTeX citation handling to nbconvert
@bramtayl

Is this still functional?

I'm trying to use citations in IPython, and I'm having some issues.

My notebook has one Markdown cell:

<cite data-cite="debreu_theory_1959">(Debreu, 1959)</cite>

My template looks like this:

((*- extends 'article.tplx' -*))

((* block bibliography *))
\bibliographystyle{unsrt}
\bibliography{bibliography}
((* endblock bibliography *))

And my bibliography file has the entry:

@book{debreu_theory_1959,
  address = {New Haven, {CT}},
  title = {Theory of value: An axiomatic analysis of economic equilibrium},
  shorttitle = {Theory of value},
  url = {http://books.google.com/books?hl=en\&lr=\&id=QkX10epC46cC\&oi=fnd\&pg=PA1\&dq=debreau+theory+of+value\&ots=9DNz2653qg\&sig=fKS1CYQ6ZFmXIUmOgDq3DRLPEmQ},
  timestamp = {2014-04-17 14:32:20},
  number = {17},
  urldate = {2014-04-17},
  publisher = {Yale University Press},
  author = {Debreu, Gerard},
  year = {1959},
  file = {Snapshot:/home/haldane/.mozilla/firefox/8qzu92vh.default/zotero/storage/CIEI6RPS/books.html:text/html}
}

My command line call is

ipython3 nbconvert --to pdf bibliography.ipynb --template bibliography.tplx

The PDF is created, but all it contains is

[?]

Yes, bibliography.ipynb, bibliography.tplx, and bibliography.bib are all in the same folder.
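A "[?]" in the PDF generally means LaTeX could not resolve the citation: BibTeX did not run, could not find the .bib file, or the key is absent. To rule out only the last cause, here is a hypothetical diagnostic sketch (not part of nbconvert) that compares the data-cite keys in the Markdown with the entry keys in the .bib text; `CiteKeyCollector`, `bib_keys` and `missing_keys` are illustrative names, and the .bib key regex is a simplification that assumes entries of the form `@type{key,`.

```python
import re
from html.parser import HTMLParser


class CiteKeyCollector(HTMLParser):
    """Collect the values of data-cite attributes from Markdown/HTML text."""

    def __init__(self):
        super().__init__()
        self.keys = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-cite" and value:
                self.keys.append(value)


def bib_keys(bib_text):
    """Entry keys from .bib text, e.g. 'debreu_theory_1959' in '@book{debreu_theory_1959,'."""
    return set(re.findall(r"@\w+\s*\{\s*([^,\s]+)\s*,", bib_text))


def missing_keys(markdown, bib_text):
    """Cited keys that have no matching entry in the .bib text."""
    collector = CiteKeyCollector()
    collector.feed(markdown)
    collector.close()
    return sorted(set(collector.keys) - bib_keys(bib_text))


md = '<cite data-cite="debreu_theory_1959">(Debreu, 1959)</cite>'
bib = "@book{debreu_theory_1959,\n  author = {Debreu, Gerard},\n  year = {1959}\n}\n"
print(missing_keys(md, bib))
```

An empty list points away from a key mismatch and toward the BibTeX run or the .bib lookup.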

@Carreau
Member

Carreau commented Aug 22, 2015

@bramtayl, try opening an issue so we can track it and debug it there.
We are low on manpower these days, so it might be forgotten if you leave it as a comment on a merged pull request.

I believe it should work.

@bramtayl

Ok, see #8760


7 participants