(Canonical) Citation Extractor
This software supports Python 2.7 and has been tested only on POSIX-compliant operating systems (Linux, Mac OS X, FreeBSD, etc.).
CitationExtractor relies on TreeTagger for the PoS tagging of input texts.
There is a handy script to install it.
To run it without having to clone this repo:
    wget -O install_treetagger.sh https://raw.githubusercontent.com/mromanello/CitationExtractor/master/install_treetagger.sh
    chmod a+x install_treetagger.sh
    ./install_treetagger.sh
    rm install_treetagger.sh
Alternatively, clone the repo and run the script from there:

    git clone https://github.com/mromanello/CitationExtractor.git
    cd CitationExtractor
    chmod a+x install_treetagger.sh
    ./install_treetagger.sh
    rm install_treetagger.sh
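After the script has run, it can be useful to confirm that the tagger is actually reachable. The sketch below is only an illustration and makes two assumptions that are not stated in this README: that the TreeTagger binary is named `tree-tagger`, and that its directory has been added to `PATH`. The helper `find_executable` is a hypothetical name, not part of CitationExtractor; it avoids `shutil.which` so that it also runs on Python 2.7.

```python
import os


def find_executable(name, search_dirs):
    """Return the full path of `name` in the first directory holding it, else None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        # The file must exist and be executable by the current user.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    # Assumption: the binary is called "tree-tagger" and lives on PATH;
    # adjust the name or pass the install directory explicitly if not.
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    location = find_executable("tree-tagger", path_dirs)
    print(location or "tree-tagger not found on PATH")
```

If the install script places TreeTagger somewhere that is not on `PATH`, pass that directory in `search_dirs` instead.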
To install CitationExtractor, first run:
    $ pip install http://www.antlr3.org/download/Python/antlr_python_runtime-3.1.3.tar.gz#egg=antlr_python_runtime-3.1.3
    $ pip install https://github.com/mromanello/treetagger-python/archive/master.zip#egg=treetagger-1.0.1

Then install the package itself:

    $ pip install citation-extractor
NB: the installation of all other dependencies is handled by setup.py, but for some reason (that I'm still trying to figure out) it does not pick up these two.
To double check that everything was installed correctly, try running the following lines (it should take ~20s):
    from citation_extractor.settings import crfsuite
    from citation_extractor.pipeline import get_extractor
    extractor = get_extractor(crfsuite)
    assert extractor is not None
If the code above runs without throwing an exception, you have managed to install the library!
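If the check fails instead, a first step is to see which packages cannot be imported at all. The following is a minimal sketch, not part of the library: `missing_modules` is a hypothetical helper, and the import names `antlr3` (for antlr_python_runtime) and `treetagger` (for treetagger-python) are assumptions, so adjust them if your installation differs.

```python
import importlib


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names for the two manually installed dependencies,
    # plus the package imported in the installation check.
    print(missing_modules(["antlr3", "treetagger", "citation_extractor"]))
```

An empty list means all three packages are importable, which points to a problem elsewhere, for example with the TreeTagger installation.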
As for the documentation, I'm working on it ;-)
For the time being, you can find a concrete example of how to use the library in this notebook.