Currently, the code is a proof of concept for the task of extracting Canonical References as defined here.
As input data I used the XML output of JSTOR’s DFR API.
This API returns different pieces of information about a given document in the JSTOR archive. For my purposes, I call the API to get the references contained in a document. JSTOR already pre-processes the text content of its documents and identifies some chunks of information, references among them.
After you have installed all the dependencies, install the CRefEx Python module as usual by typing:
python setup.py install
CRefEx relies on the following external modules/libraries:
- CRF++: a C++ implementation of CRF written by Taku Kudo (I’m using version 0.53), which provides a Python bridge.
To install the Python module that calls CRF++, please refer to the instructions in the ./python/README file (the path is relative to CRF++’s installation folder)
- two python scripts for Cross Validation (partitioner.py and crossvalidationconstructor.py) written by Michael G. Noll
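Before installing CRefEx, it can help to confirm that the CRF++ Python bridge is importable. The following is a minimal sketch, not part of CRefEx: it assumes the bridge is installed under the module name CRFPP, and the helper name module_available is made up for this example.

```python
import importlib


def module_available(name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Assumption: the CRF++ Python bridge installs a module named CRFPP.
    if module_available("CRFPP"):
        print("CRF++ Python bridge found")
    else:
        print("CRF++ Python bridge missing; see ./python/README in the CRF++ folder")
```

If the check fails, the bridge was most likely not built or was installed for a different Python interpreter.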
Before running the code, set the following variables according to your local settings:
- EVAL_PATH in eval.py: path to the directory where some data files will be written to
- LOG_FILE: path to the log file
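As an illustration only, the two settings could look like the sketch below. The paths are placeholders, and the helper prepare_paths and the logger name are hypothetical; they are not taken from the CRefEx code, which may use these variables differently.

```python
import logging
import os
import tempfile

# Placeholder values: replace them with paths on your machine.
EVAL_PATH = os.path.join(tempfile.gettempdir(), "crefex_eval")
LOG_FILE = os.path.join(EVAL_PATH, "crefex.log")


def prepare_paths(eval_path, log_file):
    """Create the output directory if needed and send log records to log_file."""
    if not os.path.isdir(eval_path):
        os.makedirs(eval_path)
    logger = logging.getLogger("crefex_example")  # hypothetical logger name
    logger.addHandler(logging.FileHandler(log_file))
    logger.setLevel(logging.INFO)
    return logger
```

Creating the directory up front avoids a failure when the first data file or log record is written.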
To run the provided example, type: