Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
128 lines (101 sloc) 4.45 KB
Please see the README.TXT first to familiarize yourself with the
binaries (shell scripts) for this software package. You should first
attempt to get the basic command line clients working first. The
Troubleshooting section of the home page (in doc/index.html in this
distribution; identical to the ParsCit home page at also contains some notes of
Please note: Currently the software is not supported by any auto
install package -- we expect a UNIX-knowledgable person to be able to
get the installation working within 30 minutes to an hour.
You must have ruby and perl and a working C++ compiler on your local
ParsCit use the following Perl libraries from CPAN
- Class::Struct
- Getopt::Long
- Getopt::Std
- File::Basename
- File::Spec
- FindBin
- HTML::Entities
- IO::File
- XML::Parser
- XML::Twig
- XML::Writer
- XML::Writer::String
**Command-line client**
These are the two scripts: and The
second script is a subset of the first, and should be your first
target to get running.
1) You will first need to install CRF++. CRF++ is the
core conditional random field learner, that is re-distributed in this
ParsCit software. It is due to Taku Kudo. We are using CRF++ version 0.51.
You have two options:
a) You may install CRF++ anywhere, and set the environment variable
CRFPP_HOME to tell ParsCit where to find it. CRFPP_HOME should be set
such that "$CRFPP_HOME/bin" contains the binaries crf_test and crf_learn.
For example if you run a default "make install" on Linux and those
binaries are installed to "/usr/local/bin", CRFPP_HOME should be
b) You may place the binaries in the crfpp/ directory within the ParsCit
source. You do not then need to set any environment variables, although
you may need to edit files.
Installation methods:
a) From source, in the crfpp directory:
$ cd crfpp
# we're going to rebuild CRF 51
$ rm -Rf CRF++-0.51
$ tar -xvzf crf++-0.51.tar.gz
$ cd CRF++-0.51
$ ./configure
# important! the first "make" command fails for me; not sure why
# Also see the Troubleshooting notes in the ParsCit home page
# (doc/index.html) for other possible problems with this step.
$ make
$ make clean
$ make
# optional, you may want to use sudo or root privileges to install CRF binaries so that other applications can use CRF. Thanks to Priya Venkateshan for this.
$ sudo make install
# hopefully by here, you have a successful build of both crf_test and crf_learn
# move executables to where parscit expects to find them
$ cp crf_learn crf_test ..
# on Windows you may have to do this instead, as the executables are named with .exe
$ copy crf_learn.exe ../crf_learn
$ copy crf_test.exe ../crf_test
$ cd .libs
$ cp -Rf * ../../.libs
b) With apt, on a Debian-based Linux distribution:
sudo apt-add-repository 'deb oneiric all'
sudo apt-get update
sudo apt-get install libcrf++-dev crf++
c) With Homebrew, on OS X:
brew install crf++
2) Once the binaries are placed properly, if you are not using the
CRFPP_HOME environment variable, you may need to edit the
lib/ParsCit/ file to point to the proper directories on your
machine. If you are using CRFPP_HOME, you may want to set it in
your .bashrc or .profile file.
3) Edit the shebang lines (first line) of the scripts in the bin/
directory to point to the proper versions of perl
4) Try running and on the .txt data
found in demodata. It should produce the same result as found in the
.out files. In the bin/ directory try:
./ -m extract_all ../demodata/sample2.txt
./ -i xml -m extract_all ../demodata/E06-1050.xml
**Web Service**
These are the remaining parts of the system. There are two sets of
client/server pairs (perl ones tested at IST and ruby ones tested at
NUS). You can use either and the installation does not require both.
1) In order to use the ParsCit web service you will need the following
modules in your perl library:
2) Edit lib/ParsCit/ to provide values appropriate for your
environment. Also edit wsdl/ParsCit.wsdl to reflect any changes to
your service URL.