UCL/UMass BioNLP Event Extractor


Get Maven, untar/unzip, and then run

$ mvn compile

If it looks like dependencies cannot be found, first try this.

ucleed stores preprocessed data in a MongoDB database, so you need to install MongoDB and run the server:

$ mongod

You should also have an installation of the BioNLP reranking parser by David McClosky on your machine.

You also need to configure a few directory locations. Copy the example in src/main/resources/props/example.prop and modify as needed.
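Once you have a copy of the prop file, a quick check that it sets the keys named in this README can save a failed run. The sketch below assumes the prop file uses `key=value` lines (as in the `weightsSrc=...` example further down); the function name `missing_keys` is hypothetical and not part of ucleed.

```shell
#!/bin/sh
# missing_keys FILE KEY...: print each KEY that has no "KEY=" line in FILE.
# Hypothetical helper; assumes Java-properties-style "key=value" lines.
missing_keys() {
    file=$1
    shift
    for key in "$@"; do
        # Match "key=" at the start of a line, allowing spaces around the key.
        grep -Eq "^[[:space:]]*$key[[:space:]]*=" "$file" || echo "$key"
    done
}
```

For example, `missing_keys src/main/resources/props/local.prop rerankparser biomodel outDir` prints any of those three keys that the (hypothetically named) `local.prop` does not set, and prints nothing if all are present.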

Setting up the syntactic parser

ucleed uses the reranking parser by David McClosky, in combination with his Improved self-trained biomedical parsing model. In the configuration file, set rerankparser to the main directory of the parser, and biomodel to the directory of the biomedical parsing model. Note that recent versions of the bllip reranker expect bzip2 files, not the gzip files provided in the biomodel. You can fix this by calling

$ gunzip *.gz && bzip2 *

in the biomodel/reranker directory.
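If the directory may contain files that are not gzip-compressed, a slightly more defensive variant recompresses only the `.gz` files one by one. This is a minimal sketch, not part of ucleed; the function name `recompress_gz_to_bz2` is hypothetical, and it assumes `gunzip` and `bzip2` are on your PATH.

```shell
#!/bin/sh
# recompress_gz_to_bz2 DIR: replace every DIR/*.gz with the same file as .bz2.
# Hypothetical helper; files without a .gz suffix are left untouched.
recompress_gz_to_bz2() {
    dir=$1
    for f in "$dir"/*.gz; do
        # Skip the unexpanded pattern when DIR contains no .gz files.
        [ -e "$f" ] || continue
        # gunzip strips the .gz suffix; bzip2 then writes NAME.bz2.
        gunzip "$f" && bzip2 "${f%.gz}"
    done
}
```

Calling `recompress_gz_to_bz2 .` from inside the biomodel/reranker directory has the same effect as the one-liner when every file there ends in `.gz`.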


Before we train, we need to go through two preprocessing steps that prepare the data.

Data preprocessing

First call

$ mvn exec:exec -Dexec.executable="java" -Dexec.args="-Xmx1g -Dprop=props/example.prop  -cp %classpath cc.refectorie.proj.bionlp2011.ClearRaw"

to clear the database (this is actually only necessary if you want to rerun experiments but it shouldn't hurt). Then do

$ mvn exec:exec -Dexec.executable="java" -Dexec.args="-Xmx1g -Dprop=props/example.prop  -cp %classpath cc.refectorie.proj.bionlp2011.LowLevelAnnotation dev train test"

This will tokenize, sentence-split, etc. the data specified in the prop file.

Feature preprocessing

Next we run

$ mvn exec:exec -Dexec.executable="java" -Dexec.args="-Xmx1g -Dprop=props/example.prop  -cp %classpath cc.refectorie.proj.bionlp2011.ClearAnnotated"

to initialize the feature preprocessing database. Then do:

$ mvn exec:exec -Dexec.executable="java" -Dexec.args="-Xmx1g -Dprop=props/example.prop  -cp %classpath cc.refectorie.proj.bionlp2011.App dev train test"

This will prepare some candidate structures that are used during inference/learning.


Now copy data with features to the learning KB:

$ mvn exec:exec -Dexec.executable="java" -Dexec.args="-Xmx1g -Dprop=props/example.prop  -cp %classpath cc.refectorie.proj.bionlp2011.ClearLearningKB"

Finally, you're ready to train the model:

$ mvn exec:exec -Dexec.executable="java" -Dexec.args="-Xmx8g -Dprop=props/example.prop -cp %classpath cc.refectorie.proj.bionlp2011.BioNLPLearner"

This will store the weights for each epoch in $UMASSDIR/weights/[epoch].

Learning also runs evaluation on test and development sets. The results will appear in the outDir specified in the prop file.
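The preprocessing and training commands above can be chained so that a failure in one step stops the rest. The following is a sketch under assumptions, not a script shipped with ucleed: the names `run_step`, `preprocess_and_train`, `PROP`, `PKG`, and `DRY_RUN` are all hypothetical, and the class names and heap sizes are copied from the commands in this README.

```shell
#!/bin/sh
# Hypothetical wrapper around the mvn exec:exec calls from this README.
PROP=${PROP:-props/example.prop}
PKG=cc.refectorie.proj.bionlp2011

# run_step HEAP CLASS [ARGS...]: run one class through mvn exec:exec,
# or only print the command when DRY_RUN is non-empty.
run_step() {
    heap=$1
    class=$2
    shift 2
    exec_args="-Xmx$heap -Dprop=$PROP -cp %classpath $PKG.$class"
    if [ "$#" -gt 0 ]; then
        exec_args="$exec_args $*"
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        echo "mvn exec:exec -Dexec.executable=java -Dexec.args=\"$exec_args\""
    else
        mvn exec:exec -Dexec.executable=java "-Dexec.args=$exec_args"
    fi
}

# Run the six steps in README order, stopping at the first failure.
preprocess_and_train() {
    run_step 1g ClearRaw &&
    run_step 1g LowLevelAnnotation dev train test &&
    run_step 1g ClearAnnotated &&
    run_step 1g App dev train test &&
    run_step 1g ClearLearningKB &&
    run_step 8g BioNLPLearner
}
```

Running `DRY_RUN=1 preprocess_and_train` after sourcing the file prints the six Maven commands without executing them, which is a cheap way to review them first. Note that the chain begins with ClearRaw, so it clears the database each time.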


You can use the stored weights in a standalone tool that applies the complete preprocessing chain and the event extractor model to input files. To do this, first set weightsSrc=weights/[epoch of choice] in the prop file. Epoch 4 or 5 generally seems to give good results, but you can check what works best on the dev set.

Then run the standalone tool as follows:

$ mvn exec:exec -Dexec.executable="java" -Dexec.args="-Xmx80g -Dprop=props/example.prop -cp %classpath cc.refectorie.proj.bionlp2011.UMassBioEventExtractor [txt file] [a1file] [destfile]"
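To process many documents, the standalone command can be wrapped in a loop. The sketch below is an illustration under assumptions, not part of ucleed: `extract_dir`, `PROP`, `HEAP`, `MAIN`, and `DRY_RUN` are hypothetical names, each `.txt` file is assumed to have its `.a1` file next to it with the same base name, and the `.a2` suffix for the destination file is an assumption borrowed from the BioNLP shared task file naming.

```shell
#!/bin/sh
# Hypothetical batch helper for the standalone extractor.
PROP=${PROP:-props/example.prop}
HEAP=${HEAP:-80g}   # heap size taken from the README command
MAIN=cc.refectorie.proj.bionlp2011.UMassBioEventExtractor

# extract_dir IN_DIR OUT_DIR: run the extractor on every IN_DIR/*.txt that
# has a matching .a1 file; with DRY_RUN set, only print the commands.
extract_dir() {
    in_dir=$1
    out_dir=$2
    mkdir -p "$out_dir"
    for txt in "$in_dir"/*.txt; do
        # Skip the unexpanded pattern when IN_DIR has no .txt files.
        [ -e "$txt" ] || continue
        base=$(basename "$txt" .txt)
        a1="$in_dir/$base.a1"
        if [ ! -f "$a1" ]; then
            echo "skipping $txt: no $a1" >&2
            continue
        fi
        exec_args="-Xmx$HEAP -Dprop=$PROP -cp %classpath $MAIN $txt $a1 $out_dir/$base.a2"
        if [ -n "${DRY_RUN:-}" ]; then
            echo "mvn exec:exec -Dexec.executable=java -Dexec.args=\"$exec_args\""
        else
            mvn exec:exec -Dexec.executable=java "-Dexec.args=$exec_args" || return 1
        fi
    done
}
```

Because the arguments are passed to Java as a single whitespace-separated string, this sketch does not handle paths that contain spaces. Each document starts a fresh Maven and JVM process, so this is convenient rather than fast.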

Further Reading and Citations

The most relevant citation for this work is our EMNLP paper. Further details can be found in our BioNLP shared task papers on system combination and dual decomposition.


UCL (BioNLP) (E)vent (E)xtractor based on dual (d)ecomposition





