
Introduction to a citation processor in org-ref

As a potential solution for citations in org-mode for non-LaTeX export, here we introduce csl (citation syntax lisp). The idea is heavily influenced by the XML-based Citation Style Language, but uses Lisp sexps instead.

Briefly, there is a csl file that contains two variables: citation-style and bibliography-style. The citation-style defines how the in-text citations are represented for different types of citations. The bibliography-style defines how the bibliography is constructed.
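To make the shape of such a file concrete, here is a minimal sketch. The variable names, keys, and values below are hypothetical and only illustrate nested alists of the kind the lookups later in this document suggest; they are not the contents of any real style file.

```emacs-lisp
;; Hypothetical style data: every key and value here is an assumption.
(defvar my-citation-style
  '((cite    . ((vertical-align . superscript)
                (delimiter . ",")))
    (citenum . ((vertical-align . baseline)
                (delimiter . ","))))
  "Alist mapping a citation type to an alist of formatting properties.")

(defvar my-bibliography-style
  '((sort . nil))
  "Alist of properties controlling how the bibliography is built.")

(defun my-get-citation-style (property type)
  "Return PROPERTY for citation TYPE from `my-citation-style'."
  (cdr (assoc property (cdr (assoc type my-citation-style)))))

(my-get-citation-style 'vertical-align 'citenum) ; => baseline
```

Plain alists keep the style readable and easy to edit by hand, which is most of the appeal over XML.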

What do we gain by this?

  1. No need for an external citeproc program, and the code stays hackable by org-mode experts.
  2. Punctuation transposition and space chomping, i.e. put superscripts on the right side of punctuation if you want it, and remove whitespace before superscripts if you want it.
  3. Total tunability of the citation format to different backends.
  4. Easy to change bibliography format with the bibliographystyle link.
  5. The use of Bibtex databases. These are plain text, and flexible.

The real code for this is too long to blog about. Instead, you should check it out here:

Reference types

  • A book cite:kittel-2005-introd-solid.
  • An article cite:kitchin-2015-examp
  • A miscellaneous bibtex type cite:xu-suppor.
  • A technical report cite:2011-mater-genom
  • An MS thesis: cite:ding-2012-metal-oxide
  • A PhD thesis: cite:Hossein-thesis
  • in proceedings: cite:abolhasani-2011-model

There is work to do in supporting other entry types that are common in bibtex files.

Citation types

  • Regular citation: cite:kitchin-2015-examp
  • citenum: See Ref. citenum:kitchin-2015-examp
  • citeauthor: citeauthor:kitchin-2015-examp
  • citeyear: citeyear:kitchin-2015-examp

There is work to do in supporting other types of citations.

Multiple citations and sorting within citation

You can specify that the cites within a citation are consistently sorted in the export.

  • a,b: cite:kitchin-2015-examp,kitchin-2015-data-surfac-scien
  • b,a: cite:kitchin-2015-data-surfac-scien,kitchin-2015-examp

There is work to do for range collapsing, e.g. to turn 1,2,3 into 1-3.
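As a rough sketch of what consistent sorting means for a numeric style: map each key to its position in the bibliography and sort the numbers. The function name is hypothetical, and it assumes every key appears in the bibliography list.

```emacs-lisp
(require 'cl-lib)

(defun my-sorted-citation-numbers (keys bibliography-keys)
  "Return the 1-based bibliography numbers of KEYS in increasing order.
BIBLIOGRAPHY-KEYS is the list of keys in bibliography order.
Assumes every element of KEYS is present in BIBLIOGRAPHY-KEYS."
  (sort (mapcar (lambda (key)
                  (1+ (cl-position key bibliography-keys :test #'string=)))
                keys)
        #'<))

;; Both orderings of the same two keys give the same result.
(my-sorted-citation-numbers
 '("kitchin-2015-data-surfac-scien" "kitchin-2015-examp")
 '("kitchin-2015-examp" "kitchin-2015-data-surfac-scien"))
```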

Space chomping and punctuation testing

I think citations should always be placed in the sentence they logically belong to. LaTeX has a feature (through natbib, I think) where for some styles, e.g. superscripts, the citations are moved to the right side of punctuation, and the preceding whitespace is chomped so the superscript sits next to the word instead of being separated by a space. We can do that here too.

  • Citation at end of sentence cite:kitchin-2015-examp.
  • Citation in clause cite:kitchin-2015-examp,kitchin-2015-data-surfac-scien, with a comma.
  • Citation in middle of cite:kitchin-2015-examp,kitchin-2015-data-surfac-scien a sentence.
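Here is a minimal sketch of the idea on a plain string. It is not the code in org-ref-citeproc: the function name is made up, the citation keys stand in for the real replacement text, and the regexp assumes keys contain no whitespace or punctuation other than hyphens.

```emacs-lisp
(defun my-transpose-citation (text)
  "Chomp whitespace before cite links in TEXT and move them after punctuation.
The keys are wrapped in ^{...} as a stand-in for a real superscript."
  (replace-regexp-in-string
   ;; Group 1: comma-separated keys.  Group 2: optional trailing punctuation.
   "[ \t]*cite:\\([^ \t\n,.;:!?]+\\(?:,[^ \t\n,.;:!?]+\\)*\\)\\([.,;:!?]?\\)"
   (lambda (match)
     ;; Punctuation first, then the superscript.
     (concat (match-string 2 match) "^{" (match-string 1 match) "}"))
   text t t))

(my-transpose-citation "Citation at end of sentence cite:kitchin-2015-examp.")
```

The same function handles the mid-sentence case, where there is no punctuation to move and only the leading space is removed.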


At the moment, you have to add a hook function to put the replacements in the document before parsing.

(add-to-list 'load-path ".")
(require 'org-ref-citeproc)

(when (file-exists-p "readme.html") (delete-file "readme.html"))
(let ((org-export-before-parsing-hook '(orcp-citeproc)))
  (browse-url (org-html-export-to-html)))

An org export for footnotes. The footnotes don’t go where I expect.

(add-to-list 'load-path ".")
(require 'org-ref-citeproc)

(let ((org-export-before-parsing-hook '(orcp-citeproc)))
  (org-open-file (org-org-export-to-org)))

Summary thoughts

This looks promising. There is probably a lot of work to do to make this as robust as say citeproc-js or the Zotero handler. I am not sure if we could write this in a way to directly use the CSL. My feeling is it would not be as flexible as this, and we would have to add to it anyway.

Here are some remaining things that could be worked on if we continue this direction.

  1. Other bibtex entries need to be tested out.
  2. Remaining bibtex fields need to be defined.
  3. Standardization of the styling that can be done. Not all features described in my csl are supported, e.g. et al., and probably others.
  4. The author-year style needs name disambiguation somehow.
  5. Hyperlinking in org, html.
  6. Make sure export to other backends works. Can this make a LaTeX output with bibitems?
  7. Can this work for notes-based styles?


You use a bibliographystyle link to specify a csl. These are similar to bibtex styles, and in some cases no change is needed for LaTeX export (although you may have to remove the citeproc hook function).

bibliographystyle:unsrt-footnote bibliography:readme.bib

Archive work

These are sections I worked on developing the working approach.

Parsing bibtex names

Partially derived from

see also

Bibtex names have 4 parts:

  1. First name, which also includes middle name/initial
  2. von which is part of the last name, but in lower case letters
  3. Last name
  4. Jr.

Here are some acceptable formats for names in Bibtex.

"First von Last"
"von Last, First"
"von Last, Jr, First"

Note that it is also possible to protect part of a name with braces in Bibtex, in which case the von part becomes part of the last name.

"Jan A. {van Mieghem}"
"{van Mieghem}, Jan A."

And also

De Gaulle, Charles
Van Buren, Martin
Scott Thomas, Kristin
Bonham Carter, Helena
Garcia Pascual, Antonio
Del Negro, Marco
Van Gogh, Vincent
Della Francesca, Piero

Charles {De Gaulle}
Martin {Van Buren}
Kristin {Scott Thomas}
Helena {Bonham Carter}
Antonio {Garcia Pascual}
Marco {Del Negro}
Vincent {Van Gogh}
Piero {Della Francesca}

Dealing with these brackets is somewhat tricky. We cannot simply split on spaces with these present. A hack I will use is to replace them temporarily with something that looks like a name part, parse, and then put them back. Otherwise, we need a real recursive descent parser. I will use a uuid that starts with a capital letter. We develop an elisp function to parse these into a data structure: (first von last jr).
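A minimal sketch of that placeholder hack follows. It is not the actual implementation: the function names are hypothetical, it uses a simple counter-based token instead of a uuid, and it assumes the braces in the name are balanced.

```emacs-lisp
(defun my-protect-braces (name)
  "Replace top-level {...} groups in NAME with capitalized placeholder tokens.
Return (PROTECTED . ALIST), where ALIST maps each token to its original text.
Assumes braces in NAME are balanced."
  (let ((depth 0) (start nil) (out "") (alist '()) (counter 0) (i 0))
    (while (< i (length name))
      (let ((ch (aref name i)))
        (cond
         ((eq ch ?{)
          (when (= depth 0) (setq start i))
          (setq depth (1+ depth)))
         ((and (eq ch ?}) (> depth 0))
          (setq depth (1- depth))
          (when (= depth 0)
            ;; The token starts with a capital so it parses like a name part.
            (let ((token (format "Protected%dX" counter)))
              (push (cons token (substring name start (1+ i))) alist)
              (setq out (concat out token)
                    counter (1+ counter)))))
         ;; Copy characters that are outside any brace group.
         ((= depth 0)
          (setq out (concat out (string ch))))))
      (setq i (1+ i)))
    (cons out (nreverse alist))))

(defun my-restore-braces (part alist)
  "Put the original brace groups from ALIST back into the string PART."
  (dolist (pair alist part)
    (setq part (replace-regexp-in-string
                (regexp-quote (car pair)) (cdr pair) part t t))))

(my-protect-braces "{van {M}ieghem}, Jan A.")
```

Tracking the brace depth is what lets nested groups such as {van {M}ieghem} survive as one unit without a full parser.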

Here are examples of case 1.

(cl-loop for author in '("John R. Kitchin"
                      "John von Kitchin"
                      "John von de La von Kitchin"
                      "John von de Kitchin Jr."
                      "John {von de Kitchin}"
                      "John Paul Jones"
                      "Charles Louis Xavier Joseph de la Vallee Poussin"
                      "Ludwig von Beethoven"
                      "Jan A. {van Mieghem}"
                      "Vincent {Van Gogh}"
                      "Piero {Della Francesca}"
                      "{Barnes and Noble}"
                      "{Barnes & Noble}")
      collect (citeproc-parse-authorname author))
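For the "First von Last" form, the split can be sketched roughly as below. This is my own illustration, not `citeproc-parse-authorname': the function name is hypothetical, it ignores the Jr part, and it assumes any brace groups have already been swapped for capitalized placeholders.

```emacs-lisp
(require 'cl-lib)

(defun my-split-first-von-last (name)
  "Split NAME, written as \"First von Last\", into a list (FIRST VON LAST).
The von part runs from the first to the last lowercase word, not counting
the final word, which always belongs to the last name."
  (let* ((tokens (split-string name "[ \t]+" t))
         (n (length tokens))
         (lower-p (lambda (tok)
                    ;; Bind `case-fold-search' so [:lower:] is case sensitive.
                    (let ((case-fold-search nil))
                      (string-match-p "\\`[[:lower:]]" tok))))
         (von-start (cl-position-if lower-p tokens :end (max 0 (1- n))))
         (von-end (and von-start
                       (cl-position-if lower-p tokens
                                       :end (1- n) :from-end t)))
         (join (lambda (parts) (mapconcat #'identity parts " "))))
    (if von-start
        (list (funcall join (cl-subseq tokens 0 von-start))
              (funcall join (cl-subseq tokens von-start (1+ von-end)))
              (funcall join (cl-subseq tokens (1+ von-end))))
      ;; No lowercase word: everything but the final word is the first name.
      (list (funcall join (butlast tokens))
            ""
            (or (car (last tokens)) "")))))

(my-split-first-von-last "Charles Louis Xavier Joseph de la Vallee Poussin")
```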

Case 2.

(cl-loop for author in '("von Beethoven, Ludwig"
                      "{van {M}ieghem}, Jan A."
                      "De Gaulle, Charles"
                      "Van Buren, Martin"
                      "Scott Thomas, Kristin"
                      "Bonham Carter, Helena"
                      "Garcia Pascual, Antonio"
                      "Del Negro, Marco"
                      "Van Gogh, Vincent"
                      "Della Francesca, Piero")
      collect (citeproc-parse-authorname author))

And case 3

(cl-loop for author in '("von de la Kitchin, Sr., John Robert"
                      "von Kitchin, Sr., John Robert")
      collect (citeproc-parse-authorname author))
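The three cases above can be told apart by counting commas. A small sketch, with a hypothetical function name, and assuming that brace groups (which may themselves contain commas) have already been replaced by placeholders:

```emacs-lisp
(require 'cl-lib)

(defun my-name-format (name)
  "Return a symbol naming the Bibtex format that NAME is written in.
Assumes commas inside brace groups were protected beforehand."
  (pcase (cl-count ?, name)
    (0 'first-von-last)
    (1 'von-last-first)
    (_ 'von-last-jr-first)))

(mapcar #'my-name-format
        '("Ludwig von Beethoven"
          "von Beethoven, Ludwig"
          "von Kitchin, Sr., John Robert"))
```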

I am pretty satisfied with that. This is a foundation for formatting author names in a bibliography. Note that the bibtex function `bibtex-autokey-demangle-name' has some similar code for getting a last name to use as the key.


(cl-loop for author in '("Aa Bb"
                      "Aa bb"
                      "Aa bb Cc"
                      "Aa bb Cc dd Ee"
                      "Aa 1B cc dd"
                      "Aa 1b cc dd"
                      "Aa {b}B cc dd"
                      "Aa {b}b cc dd"
                      "Aa {B}b cc dd"
                      "Aa {B}B cc dd"
                      "Aa \\Bb{b} cc dd"
                      "Aa \\bb{b} cc dd"
                      "{Lastname with {,} in it}, Formerly Known as"
                      "Aa {bb} cc Dd"
                      "Aa bb {cc} Dd"
                      "Aa {bb} Cc")
      collect (citeproc-parse-authorname author))

I think this does about what it is supposed to do. Another function would be responsible for formatting the author name as required, e.g. putting initials in, the order of the names, etc…

Another function after that would be responsible for formatting a group of authors, e.g. specifying the delimiter between them, the trailing character after the authors, if et al should be used after some number of authors, etc…
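Those two formatting functions might look something like the sketch below. The names, the argument conventions, and the "et al." wording are all assumptions for illustration; the input is the (first von last jr) structure described earlier.

```emacs-lisp
(require 'cl-lib)

(defun my-initials (first)
  "Reduce the first-name string FIRST to initials, e.g. \"J. R.\"."
  (mapconcat (lambda (tok) (concat (substring tok 0 1) "."))
             (split-string (or first "") "[ \t]+" t)
             " "))

(defun my-format-author (parsed)
  "Format PARSED, a list (FIRST VON LAST JR), as initials von last jr."
  (mapconcat #'identity
             (cl-remove-if (lambda (s) (or (null s) (string= s "")))
                           (list (my-initials (nth 0 parsed))
                                 (nth 1 parsed)
                                 (nth 2 parsed)
                                 (nth 3 parsed)))
             " "))

(defun my-format-authors (names &optional delimiter max-names)
  "Join the formatted strings in NAMES with DELIMITER (default \", \").
If MAX-NAMES is non-nil and there are more names than that, keep only
the first MAX-NAMES and append \" et al.\"."
  (let* ((truncated (and max-names (> (length names) max-names)))
         (shown (if truncated (cl-subseq names 0 max-names) names)))
    (concat (mapconcat #'identity shown (or delimiter ", "))
            (if truncated " et al." ""))))

(my-format-authors
 (mapcar #'my-format-author
         '(("John Robert" "von" "Kitchin" "")
           ("Ludwig" "von" "Beethoven" "")
           ("Vincent" "" "{Van Gogh}" "")))
 ", " 2)
```

Keeping the single-name and group steps separate means a style only has to change one of them to switch, say, the delimiter or the et al. threshold.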

Finally, I don’t consider any transformations of the author strings, e.g. if there are LaTeX commands or other markup in them. This gets passed verbatim to whatever is next.

Collapse numeric ranges

Make (1 2 3 4 7 9) be “1-4,7,9”

(defun collapse-range (cites)
  "Collapse CITES, a sorted list of integers, into a string like \"1-4,7,9\".
Runs of three or more consecutive numbers are written as a range."
  (let (n
        (groups '()))
    ;; Group consecutive numbers.  Each group is built in reverse order.
    (while cites
      (setq n (pop cites))
      (if (and (caar groups) (= (- n 1) (caar groups)))
          (setf (car groups) (cons n (car groups)))
        (push (list n) groups)))
    ;; Now format each group and join the pieces with commas.
    (mapconcat
     (lambda (lst)
       (cond
        ((>= (length lst) 3)
         (format "%s-%s" (car lst) (car (last lst))))
        ((= (length lst) 2)
         (format "%s,%s" (nth 0 lst) (nth 1 lst)))
        (t
         (number-to-string (car lst)))))
     (mapcar #'reverse (reverse groups))
     ",")))

(collapse-range '(1 2))

CSL parsing

Info nodes

This seems to be information about a CSL.

(let (xml)
  (setq xml (with-temp-buffer
              (insert-file-contents "acs-nano.csl")
              (libxml-parse-xml-region (point-min) (point-max))))
  (xml-get-children xml 'info))

Macro nodes

These seem to define inputs to a function that formats each field of an entry.

(let (xml)
  (setq xml (with-temp-buffer
              (insert-file-contents "acs-nano.csl")
              (libxml-parse-xml-region (point-min) (point-max))))
  (xml-get-children xml 'macro))

Citation overlay

This seems to define the layout of a citation in the text, how they are sorted, collapsed, and delimited.

(let (xml)
  (setq xml (with-temp-buffer
              (insert-file-contents "acs-nano.csl")
              (libxml-parse-xml-region (point-min) (point-max))))
  (xml-get-children xml 'citation))
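To show how individual settings could be pulled out of such a node, here is a small self-contained sketch. The inline XML is a made-up fragment, not the contents of acs-nano.csl, and the function names are hypothetical. It uses `xml-parse-region' from xml.el so that it runs even without libxml support.

```emacs-lisp
(require 'xml)

(defun my-parse-xml-string (string)
  "Parse the XML in STRING and return its root node."
  (with-temp-buffer
    (insert string)
    (car (xml-parse-region (point-min) (point-max)))))

(defun my-citation-layout-attributes (style)
  "Return (COLLAPSE DELIMITER VERTICAL-ALIGN) from the parsed STYLE node."
  (let* ((citation (car (xml-get-children style 'citation)))
         (layout (car (xml-get-children citation 'layout))))
    (list (xml-get-attribute citation 'collapse)
          (xml-get-attribute layout 'delimiter)
          (xml-get-attribute layout 'vertical-align))))

;; An illustrative fragment only.
(my-citation-layout-attributes
 (my-parse-xml-string
  (concat "<style><citation collapse=\"citation-number\">"
          "<layout delimiter=\",\" vertical-align=\"sup\"/>"
          "</citation></style>")))
```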


This seems to lay out how the bibliography is constructed.

(let (xml)
  (setq xml (with-temp-buffer
              (insert-file-contents "acs-nano.csl")
              (libxml-parse-xml-region (point-min) (point-max))))
  (xml-get-children xml 'bibliography))

Layout of entry

Here we get the layout of an entry.

(let (xml)
  (setq xml (with-temp-buffer
              (insert-file-contents "acs-nano.csl")
              (libxml-parse-xml-region (point-min) (point-max))))
  (car (xml-get-children (car (xml-get-children xml 'bibliography)) 'layout)))

An overview of a CSL for ACS Nano

(with-temp-buffer
  (insert-file-contents "acs-nano.csl")
  (libxml-parse-xml-region (point-min) (point-max)))

Working example

(add-to-list 'load-path ".")
(require 'org-ref-citeproc)
(require 'unsrt)

My data cite:kitchin-2015-examp in Ref. citenum:kitchin-2015-data-surfac-scien.

A multicite cite:kitchin-2015-examp,kitchin-2015-data-surfac-scien

Getting the citations

(org-element-map (org-element-parse-buffer) 'link
  (lambda (link)
    (org-element-property :type link)))

I need a key to entry function

(orcp-key-to-entry "kitchin-2015-examp")

(cdr (assoc "year" (orcp-key-to-entry "kitchin-2015-examp")))




Sorting the entries

The unique entries will make up the bibliography. They might get sorted in a variety of ways, e.g. unsorted, alphabetically, by year, etc…

(orcp-sort-entries-increasing-year (orcp-collect-unique-entries))


(orcp-sort-entries-alphabetical (orcp-collect-unique-entries))
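As an illustration of what such sorters could do, here is a sketch that works on entries stored as alists with string keys, like the "year" lookup above. The function names are hypothetical, and sorting alphabetically by the raw author string is my assumption, not necessarily what the orcp functions do.

```emacs-lisp
(defun my-entry-field (field entry)
  "Return the value of FIELD (a string) in ENTRY, an alist, or nil."
  (cdr (assoc field entry)))

(defun my-sort-entries-increasing-year (entries)
  "Return a copy of ENTRIES sorted by increasing year."
  (sort (copy-sequence entries)
        (lambda (a b)
          (< (string-to-number (or (my-entry-field "year" a) "0"))
             (string-to-number (or (my-entry-field "year" b) "0"))))))

(defun my-sort-entries-alphabetical (entries)
  "Return a copy of ENTRIES sorted by author string, ignoring case."
  (sort (copy-sequence entries)
        (lambda (a b)
          (string-lessp (downcase (or (my-entry-field "author" a) ""))
                        (downcase (or (my-entry-field "author" b) ""))))))

;; Made-up entries for illustration.
(my-sort-entries-increasing-year
 '((("author" . "Kitchin") ("year" . "2015"))
   (("author" . "Kittel") ("year" . "2005"))))
```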

Getting a citation replacement

This will be taking a key, and the unique replacements, and computing a replacement for that key.

(orcp-sort-entries-alphabetical (orcp-collect-unique-entries))



(orcp-get-citation-style 'vertical-align 'citenum)

(cdr (assoc 'vertical-align (cdr (assoc 'citenum citation-style))))
(setq orcp-unique-bibliography-links (orcp-collect-unique-entries))

(org-element-map (org-element-parse-buffer) 'link
  (lambda (link)
    (org-element-property :type link)
    (orcp-get-text-replacement link)))

(orcp-author (nth 3 (orcp-collect-unique-entries)))

;; Assumes each orcp-* accessor returns a string.
(let ((entry (nth 3 (orcp-collect-unique-entries))))
  (concat
   (orcp-author entry)
   (orcp-title entry)
   (orcp-journal entry)
   (orcp-volume entry)
   (orcp-pages entry)
   (orcp-year entry)
   (orcp-doi entry)))

;; One formatted string per entry, joined with newlines (assumed separator).
(mapconcat 'identity
           (cl-loop for entry in (orcp-collect-unique-entries)
                    collect
                    (concat
                     (orcp-author entry)
                     (orcp-title entry)
                     (orcp-journal entry)
                     (orcp-volume entry)
                     (orcp-pages entry)
                     (orcp-year entry)
                     (orcp-doi entry)))
           "\n")





Reading my csl

(require 'unsrt)
(assoc 'sort bibliography-style)


Putting it together

You run these to get the replacements

(cl-loop for link in *orcp-citation-links*
         for repl in (orcp-get-citation-replacements)
         collect
         (list repl
               (org-element-property :begin link)
               (org-element-property :end link)))
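Once you have a list of (replacement begin end) triples like that, one way to apply them is to work from the end of the buffer backwards, so that earlier positions stay valid. This is only a sketch with a hypothetical function name, not the code the hook actually runs.

```emacs-lisp
(defun my-apply-replacements (replacements)
  "Apply REPLACEMENTS, a list of (TEXT BEGIN END), to the current buffer.
The edits are made from the last position to the first so that the
BEGIN and END positions of the remaining edits are not shifted."
  (dolist (r (sort (copy-sequence replacements)
                   (lambda (a b) (> (nth 1 a) (nth 1 b)))))
    (let ((text (nth 0 r))
          (begin (nth 1 r))
          (end (nth 2 r)))
      (delete-region begin end)
      (goto-char begin)
      (insert text))))

(with-temp-buffer
  (insert "See cite:a and cite:b.")
  ;; Positions are 1-based buffer positions, END exclusive.
  (my-apply-replacements '(("[1]" 5 11) ("[2]" 16 22)))
  (buffer-string))
```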

(cl-loop for link in (org-element-map
                         (org-element-parse-buffer) 'link 'identity)
         if (string= "bibliographystyle"
                     (org-element-property :type link))
         ;; Get the path for the style and load it.  `load-library' takes
         ;; a string, so the path is passed as is.
         do (load-library (org-element-property :path link)))