REx: Relation Extraction
Modernized rewrite of the code from the master's thesis "Relation Extraction using Distant Supervision, SVMs, and Probabilistic First-Order Logic".
This project uses sbt for build management. If you're unfamiliar with sbt, see the last section for some pointers.
To download all dependencies and compile the code, run:
sbt compile
To run all tests, execute:
sbt test
Moreover, to see code coverage, first run coverage and then test (for example, sbt coverage test). The coverage report will be output as an HTML file.
Command Line Applications
To produce bash scripts that will execute each individual command-line application within this project, run sbt pack. The scripts are written to target/pack/bin/.
This project includes data that allows one to distantly supervise relation mentions in text. The files are located under data/; a local README further explains the data's content and format.
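As the term is usually used, distant supervision labels a sentence with relation r whenever the sentence mentions an entity pair that a knowledge base lists under r, and treats pairs absent from the knowledge base as negatives. The sketch below illustrates that heuristic on toy in-memory values. The names DistantSupervisionSketch, KbFact, Mention, and distantlyLabel are hypothetical and are not REx's API; the actual files under data/ have their own format, described in the local README.

```scala
// Hypothetical sketch of distant-supervision labeling; not part of REx's API.
object DistantSupervisionSketch {

  /** A knowledge-base fact: relation(arg1, arg2). */
  final case class KbFact(relation: String, arg1: String, arg2: String)

  /** An ordered pair of entities mentioned together in one sentence. */
  final case class Mention(sentence: String, arg1: String, arg2: String)

  /** Label given to pairs that no knowledge-base fact covers. */
  val NoRelation: String = "NO_RELATION"

  /**
   * Labels each mention with every KB relation that holds for its (arg1, arg2) pair,
   * or with NoRelation if the pair is absent from the KB.
   */
  def distantlyLabel(kb: Seq[KbFact], mentions: Seq[Mention]): Seq[(Mention, Set[String])] = {
    // Index the KB by ordered argument pair so each lookup is a single map access.
    val relationsByPair: Map[(String, String), Set[String]] =
      kb.groupBy(f => (f.arg1, f.arg2)).map { case (pair, facts) => pair -> facts.map(_.relation).toSet }

    mentions.map(m => m -> relationsByPair.getOrElse((m.arg1, m.arg2), Set(NoRelation)))
  }
}
```

For example, given the single fact born_in(Obama, Hawaii), a mention pairing Obama with Hawaii is labeled born_in, while a mention pairing Obama with Paris is labeled NO_RELATION. The heuristic is noisy by design, since not every sentence that mentions a pair actually expresses the relation.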
These files are large and are stored using git-lfs. Be sure to follow the appropriate installation instructions and ensure that you've set up this git plugin (i.e., that you have run git lfs install once).
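If git-lfs was not set up before cloning, the files under data/ are small text pointers instead of the real data, and any program reading them will not see the actual corpus. A Git LFS pointer begins with the line version https://git-lfs.github.com/spec/v1, so a quick check is possible. The following Scala sketch is a hypothetical helper (LfsPointerCheck is not part of REx); the 1024-byte size threshold is an assumption based on pointer files being only a few hundred bytes long.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical helper (not part of REx): detects Git LFS pointer files left
// in place of real data when git-lfs was not set up before cloning.
object LfsPointerCheck {

  /** First line of a Git LFS (spec v1) pointer file. */
  val PointerHeader: String = "version https://git-lfs.github.com/spec/v1"

  /** Assumed upper bound on pointer size; real pointers are a few hundred bytes. */
  val MaxPointerBytes: Long = 1024L

  /** True if `path` is a small regular file whose content starts with the LFS header. */
  def isLfsPointer(path: Path): Boolean =
    Files.isRegularFile(path) &&
      Files.size(path) < MaxPointerBytes && // skip large files without reading them
      new String(Files.readAllBytes(path), StandardCharsets.UTF_8).startsWith(PointerHeader)

  /** All pointer files found by recursively walking `dir`. */
  def pointerFilesUnder(dir: Path): Seq[Path] = {
    val stream = Files.walk(dir)
    try {
      val it = stream.iterator()
      val found = Seq.newBuilder[Path]
      while (it.hasNext) {
        val p = it.next()
        if (isLfsPointer(p)) found += p
      }
      found.result()
    } finally stream.close()
  }
}
```

Calling LfsPointerCheck.pointerFilesUnder(java.nio.file.Paths.get("data")) from sbt console lists any files that were never downloaded; running git lfs install followed by git lfs pull in the repository is the usual way to fetch them.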
To evaluate relation extraction performance on the UIUC relation dataset using 3-fold cross-validation, first build the executable scripts with sbt pack, then execute:
./target/pack/bin/relation-extraction-learning-main \
  learn_eval \
  -li data/uiuc_cog_comp_group-entity_and_relation_recognition_corpora/all.corp \
  --input_format uiuc \
  -cg true \
  --cost 1 \
  --epsilon 0.003 \
  --n_cv_folds 3
learn_eval is the command for the script
-li specifies where the labeled relation data lives
--input_format tells the program how to interpret the file at -li
  uiuc means to use the UIUC relation classification data format
-cg true means that candidate generation is performed
--cost indicates the cost-sensitive learning parameter for the SVM
--epsilon controls weight convergence: training stops when weight updates are smaller than this value
--n_cv_folds indicates the number of folds to use for cross-validation
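--n_cv_folds sets k for k-fold cross-validation: the labeled data is split into k disjoint folds, each fold serves once as the held-out test set while a model is trained on the other k - 1 folds, and the k scores are then averaged. The sketch below shows that splitting scheme generically; it is not REx's implementation. KFoldSketch, kFoldSplits, and crossValidate are hypothetical names, and the round-robin fold assignment (example i goes to fold i mod k) is an assumption; many implementations shuffle the data first.

```scala
// Hypothetical, generic k-fold cross-validation sketch; not REx's code.
object KFoldSketch {

  /**
   * Splits `data` into `k` (train, test) pairs.
   * Example i is assigned to fold (i % k), so fold sizes differ by at most one.
   */
  def kFoldSplits[A](data: IndexedSeq[A], k: Int): IndexedSeq[(IndexedSeq[A], IndexedSeq[A])] = {
    require(k >= 2, s"need at least 2 folds, got $k")
    require(data.size >= k, s"need at least k=$k examples, got ${data.size}")

    // Pair each example with the fold it belongs to.
    val withFold: IndexedSeq[(A, Int)] = data.zipWithIndex.map { case (x, i) => (x, i % k) }

    (0 until k).map { fold =>
      val test  = withFold.collect { case (x, f) if f == fold => x }
      val train = withFold.collect { case (x, f) if f != fold => x }
      (train, test)
    }
  }

  /** Averages a per-fold score, e.g. F1 from training on `train` and evaluating on `test`. */
  def crossValidate[A](data: IndexedSeq[A], k: Int)(score: (IndexedSeq[A], IndexedSeq[A]) => Double): Double = {
    val scores = kFoldSplits(data, k).map { case (train, test) => score(train, test) }
    scores.sum / scores.size
  }
}
```

With 10 examples and k = 3, the held-out folds have sizes 4, 3, and 3, and every example is held out exactly once.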
Invoking this program with the --help flag, or with no arguments, will output a detailed help message to stdout.
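The -cg true option above turns on candidate generation. In the usual formulation, this step turns the entity mentions in a sentence into the pairs that the relation classifier will score. The sketch below enumerates every ordered pair of distinct mentions; it is a generic illustration under that assumption, not REx's implementation, and any filtering REx may apply is not shown. CandidateGenerationSketch, Entity, Candidate, and candidates are hypothetical names.

```scala
// Hypothetical candidate-generation sketch; not REx's implementation.
object CandidateGenerationSketch {

  /** An entity mention: its surface text and its token span [start, end). */
  final case class Entity(text: String, start: Int, end: Int)

  /** A candidate relation instance between two distinct mentions in one sentence. */
  final case class Candidate(arg1: Entity, arg2: Entity)

  /**
   * Enumerates every ordered pair of distinct entity mentions.
   * For n mentions this yields n * (n - 1) candidates; a classifier later
   * decides which relation, if any, each candidate expresses.
   */
  def candidates(entities: Seq[Entity]): Seq[Candidate] =
    for {
      (e1, i) <- entities.zipWithIndex
      (e2, j) <- entities.zipWithIndex
      if i != j // a mention is never paired with itself
    } yield Candidate(e1, e2)
}
```

For a sentence with three entity mentions this yields 3 * 2 = 6 candidates. Ordered pairs are kept because many relations are directional, such as born_in(person, location).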
Everything within this repository is copyright (2015-) by Malcolm Greaves.
Use of this code is permitted according to the stipulations of the Apache 2 license.
How to use sbt
When using sbt, it is best to start it in "interactive shell mode". To do this, simply execute from the command line:
sbt
After starting up (give it a few seconds), you can execute the following commands:
compile   // compiles code
pack      // creates executable scripts
test      // runs tests
coverage  // initializes the code-coverage system; use right before 'test'
reload    // re-loads the sbt build definition, including plugin definitions
update    // grabs all dependencies
There are many more sbt commands, and a large number of community plugins that extend sbt.
Not necessary! Just a few suggestions...
We recommend using the following configuration for sbt:
sbt -J-XX:MaxPermSize=768m -J-Xmx2g -J-XX:+UseConcMarkSweepGC -J-XX:+CMSClassUnloadingEnabled
This gives sbt more memory (a 2 GB maximum heap and up to 768 MB of PermGen), switches it to the concurrent mark-sweep (CMS) garbage collector, and enables class unloading under CMS.
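To confirm that options like -J-Xmx2g actually reached the JVM, one can inspect the running JVM from within sbt console, which by default runs inside sbt's own JVM. The following sketch uses only the standard java.lang.Runtime and java.lang.management APIs; JvmSettingsCheck is a hypothetical helper, not part of REx.

```scala
import java.lang.management.ManagementFactory

// Hypothetical helper (not part of REx) for checking JVM settings at runtime.
object JvmSettingsCheck {

  /** Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx). */
  def maxHeapMiB: Long = Runtime.getRuntime.maxMemory / (1024L * 1024L)

  /** Names of the garbage collectors active in this JVM. */
  def collectorNames: Seq[String] = {
    val beans = ManagementFactory.getGarbageCollectorMXBeans
    (0 until beans.size).map(i => beans.get(i).getName)
  }

  /** One-line summary suitable for printing in the console. */
  def report(): String =
    s"max heap: $maxHeapMiB MiB; collectors: ${collectorNames.mkString(", ")}"
}
```

With -J-Xmx2g, maxHeapMiB should report a value close to, and often slightly below, 2048. With CMS enabled on a Java 8 JVM, the collector names typically include ConcurrentMarkSweep.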
Also, to limit the logging output of the Spark framework, export this environment variable before running tests: