(Canonical) Citation Extractor




This software supports Python 2.7 and has been tested only on POSIX-compliant operating systems (Linux, Mac OS X, FreeBSD, etc.).
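As a quick sanity check before installing, the requirements above can be tested from Python. This is only a minimal sketch: `environment_problems` is a hypothetical helper, not part of the CitationExtractor, and it assumes that `os.name == "posix"` is a good enough proxy for "POSIX-compliant".

```python
import os
import sys


def environment_problems(version_info=None, os_name=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper (not part of citation_extractor). The arguments
    default to the running interpreter and OS and exist only to make the
    function easy to test.
    """
    version_info = sys.version_info if version_info is None else version_info
    os_name = os.name if os_name is None else os_name

    major_minor = tuple(version_info[:2])
    problems = []
    if major_minor != (2, 7):
        problems.append("Python 2.7 is required, found %d.%d" % major_minor)
    if os_name != "posix":
        # Assumption: os.name == "posix" stands in for "POSIX-compliant".
        problems.append("only POSIX systems were tested, found os.name=%r" % os_name)
    return problems
```

Calling `environment_problems()` with no arguments inspects the current interpreter; an empty list means both requirements are met.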

Installing TreeTagger

The CitationExtractor relies on TreeTagger for the PoS tagging of input texts.

There is a handy script to install it.

To run it without having to clone this repo:

wget -O
chmod a+x

Or, after cloning the repo:


git clone
cd CitationExtractor
chmod a+x

With pip

To install the CitationExtractor, first run:

$ pip install
$ pip install

followed by:

$ pip install citation-extractor

NB: all other dependencies are installed automatically, but for some reason (which I'm still trying to figure out) these two are not picked up.

Verify installation

To double-check that everything was installed correctly, try running the following lines (it should take about 20 seconds):

from citation_extractor.settings import crfsuite
from citation_extractor.pipeline import get_extractor
extractor = get_extractor(crfsuite)
assert extractor is not None

If the code above runs without throwing an exception, the library is installed correctly.
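If the check fails, it can help to find out which import is the culprit. The sketch below tries each module separately and reports the error and the time taken. `can_import` is a hypothetical helper, not part of the library. The two module names are the ones used in the check above. Note that this only tests the imports; it does not call `get_extractor`, so it is a first diagnostic step rather than a full verification.

```python
import importlib
import time


def can_import(module_name):
    """Try to import module_name; return (ok, elapsed_seconds, error).

    Hypothetical helper (not part of citation_extractor).
    """
    start = time.time()
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or any error raised while the module loads
        return False, time.time() - start, exc
    return True, time.time() - start, None


if __name__ == "__main__":
    # Module names taken from the verification snippet in this README.
    for name in ("citation_extractor.settings", "citation_extractor.pipeline"):
        ok, elapsed, error = can_import(name)
        status = "ok" if ok else "FAILED: %r" % (error,)
        print("%s: %s (%.1fs)" % (name, status, elapsed))
```

A failure on `citation_extractor.settings` usually points at the package itself not being installed, while a failure further down the list suggests a missing dependency; the printed exception should say which one.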


I'm working on it ;-)

For the time being, you can find a concrete example of how to use the library in this notebook.